Open-Source Data Ingestion – Amazon OpenSearch Service – Amazon Web Services

Ingest, transform and route data at scale to Amazon OpenSearch Domains and Serverless collections

Why Amazon OpenSearch Service Ingestion?

Amazon OpenSearch Ingestion is a feature of Amazon OpenSearch Service that lets you ingest, filter, transform, enrich, and route data to an Amazon OpenSearch Service domain or an Amazon OpenSearch Serverless collection. It ingests data from a wide variety of sources and provides a rich set of built-in processors for your data transformation needs. Because it is serverless, Amazon OpenSearch Ingestion scales automatically to meet the demands of your workloads, so you can focus on your business logic instead of managing data pipelines for your observability and security use cases.

Cost optimization

Realize storage cost reductions by deduplicating, sampling, and routing noisy data to lower cost storage.

Data quality

Enforce data quality by transforming, filtering, and enriching data with built-in processors and by adopting schemas to accelerate observability and reduce security investigation times.

Data protection

Protect sensitive data by redacting and obfuscating sensitive information before it gets to a destination.

Security and compliance

Route data using conditional logic to maintain compliance with data residency laws.

Key features

AWS is a leading contributor to the OpenSearch project, which many customers use. You get all of the new innovations in OpenSearch Data Prepper within this managed service. Beyond those features, which the community drives and contributes to, Amazon OpenSearch Ingestion also brings these capabilities:

Ingestion FAQs

Amazon OpenSearch Ingestion is a data ingestion tier that enables you to filter, enrich, transform, normalize, and aggregate data for downstream analytics and visualization in Amazon OpenSearch Service domains and Amazon OpenSearch Serverless collections. Amazon OpenSearch Ingestion allows you to create custom data pipelines to improve the operational view of your applications. Its serverless nature removes the complexity of self-managing data pipelines and ensures that the processing capacity of your pipelines scales automatically with the demands of your workloads. With Amazon OpenSearch Ingestion, you can

An Amazon OpenSearch Ingestion pipeline consists of three major components: a source, one or more processors, and a sink.

Amazon OpenSearch Ingestion supports ingesting all types of data that you would normally index in an Amazon OpenSearch Service domain. This includes, but is not limited to, structured, unstructured, textual, numerical, and geospatial data. OpenSearch Ingestion also supports all three pillars of observability data: logs, metrics, and traces. You can use its rich ecosystem of data sources, processors, and sinks to transform your data before storing it in Amazon OpenSearch Service domains. With OpenSearch Ingestion, you no longer have to write custom AWS Lambda functions or self-manage Logstash or Elasticsearch ingest nodes to ingest data that needs to be indexed in Amazon OpenSearch clusters. Refer to the documentation for the list of sources, processors, and sinks supported by Amazon OpenSearch Ingestion.

Amazon OpenSearch Ingestion is a data ingestion tier that pre-processes data before it is indexed in Amazon OpenSearch Service. OpenSearch Ingestion is built with Data Prepper, which is a component of the OpenSearch project, and supports all data formats, sources, processors, and sinks supported by Data Prepper.

To get started with Amazon OpenSearch Ingestion, you begin by defining a data pipeline. An OpenSearch Ingestion pipeline is the core of your business logic and consists of a source, one or more processors, and a sink. You define your pipeline configuration in a YAML file that contains the details of your source, processors, and sinks. OpenSearch Ingestion also lets you set the minimum and maximum number of OpenSearch Compute Units (OCUs) for each pipeline. Finally, you can choose how your data reaches your OpenSearch Ingestion pipelines:

You can create a data pipeline by using the AWS Management Console or the AWS Command Line Interface (AWS CLI).
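
As a rough illustration of the steps described above (a YAML configuration with a source, processors, and a sink, plus a minimum and maximum OCU capacity), the following Python sketch assembles a pipeline creation request. It uses the AWS SDK for Python (boto3) rather than the console or the AWS CLI. The pipeline name, ingestion path, index name, domain endpoint, and IAM role ARN are hypothetical placeholders, and the capacity check is only an assumed sanity check, not a statement of the service limits.

```python
import json

# Hypothetical pipeline definition. The pipeline name, path, index,
# domain endpoint, and role ARN are placeholders, not values from this page.
PIPELINE_YAML = """\
version: "2"
log-pipeline:
  source:
    http:
      path: "/log-pipeline/logs"
  processor:
    - date:
        from_time_received: true
        destination: "@timestamp"
  sink:
    - opensearch:
        hosts: ["https://search-example-domain.us-east-1.es.amazonaws.com"]
        index: "application-logs"
        aws:
          sts_role_arn: "arn:aws:iam::123456789012:role/example-pipeline-role"
          region: "us-east-1"
"""


def build_create_pipeline_request(name, min_units, max_units, config_body):
    """Assemble keyword arguments for the OpenSearch Ingestion CreatePipeline call."""
    # Assumed sanity check on the OCU range; the real service limits
    # are not restated here.
    if not 1 <= min_units <= max_units:
        raise ValueError("expected 1 <= min_units <= max_units")
    return {
        "PipelineName": name,
        "MinUnits": min_units,
        "MaxUnits": max_units,
        "PipelineConfigurationBody": config_body,
    }


def create_pipeline(request):
    """Send the request with boto3 (needs boto3 installed and AWS credentials)."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    client = boto3.client("osis")
    return client.create_pipeline(**request)


request = build_create_pipeline_request("log-pipeline", 1, 4, PIPELINE_YAML)

# Print the capacity settings without the (long) YAML body.
summary = {k: v for k, v in request.items() if k != "PipelineConfigurationBody"}
print(json.dumps(summary, indent=2))
```

Calling `create_pipeline(request)` would submit the request to the service. It is kept separate from the request builder so that the configuration can be reviewed or tested without AWS credentials.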
