Open-Source Data Ingestion – Amazon OpenSearch Service – Amazon Web Services

Ingest, transform and route data at scale to Amazon OpenSearch Domains and Serverless collections

Why Amazon OpenSearch Service Ingestion?

Amazon OpenSearch Ingestion is a feature of Amazon OpenSearch Service that allows you to ingest, filter, transform, enrich, and route data to an Amazon OpenSearch domain or Serverless collection. It can ingest data from a wide variety of sources and has a rich ecosystem of built-in processors to take care of your most complex data transformation needs. Amazon OpenSearch Ingestion is serverless and scales automatically to meet the requirements of your most demanding workloads, so you can focus on your business logic instead of managing data pipelines for your observability and security use cases.

Cost optimization

Reduce storage costs by deduplicating, sampling, and routing noisy data to lower-cost storage.

Data quality

Enforce data quality by transforming, filtering, and enriching data with built-in processors and by adopting schemas to accelerate observability and reduce security investigation times.

Data protection

Protect sensitive data by redacting and obfuscating sensitive information before it gets to a destination.

Security and compliance

Route data using conditional logic to maintain compliance with data residency laws.

Key features

AWS is a leading contributor to the OpenSearch project, which many customers use. You’ll get all of the new innovations for OpenSearch Data Prepper within this managed service. Beyond those features, which the community drives and contributes to, Amazon OpenSearch Ingestion also brings these capabilities:

AWS-managed software installation and patching
AWS monitors and repairs the service, 24x7
AWS upgrades versions
Zero downtime for updates and upgrades
Availability SLA: 99.9%
Serverless, with automatic scaling for ingestion workloads

Ingestion FAQs

Amazon OpenSearch Ingestion is a data ingestion tier that enables you to filter, enrich, transform, normalize, and aggregate data for downstream analytics and visualization in Amazon OpenSearch domains and Amazon OpenSearch Serverless collections. Amazon OpenSearch Ingestion allows you to create custom data pipelines to improve the operational view of your applications. Because the service is serverless, it abstracts away the complexities of self-managing data pipelines and ensures that the processing capacity of your pipelines scales automatically with the demands of your workloads. With Amazon OpenSearch Ingestion, you can:

Reduce storage costs by deduplicating and sampling data to prevent noisy data from being indexed in Amazon OpenSearch.
Enforce data quality and adopt common schemas by transforming, formatting, and enriching data before it is indexed in Amazon OpenSearch domains, making it easier to troubleshoot issues.
Redact or obfuscate sensitive information before it gets to a destination, enabling compliance with data residency laws.

An Amazon OpenSearch Ingestion pipeline consists of three major components:

Source is the input component of a pipeline. It defines the mechanism through which a pipeline consumes records. The source can consume records either by receiving data over HTTP/S or by reading from external third-party endpoints.
Processors are intermediate processing units that can filter, transform, and enrich records into a desired format before publishing them to the sink. The processor is an optional component of a pipeline. If you don't define a processor, records are published in the format defined in the source. You can have more than one processor. Processors are executed in the order that you define them in the pipeline.
Sink is the output component of a pipeline. It defines one or more destinations to which a pipeline publishes records. A sink can also be another pipeline, which allows you to chain multiple pipelines together.
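The relationship between the three components can be illustrated with a small conceptual model. This is a sketch only, not the service's API or Data Prepper code: the Pipeline class, the drop_debug and redact_email functions, and the sample records are all hypothetical names made up for illustration. It shows processors running in their defined order and one pipeline acting as the sink of another.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, str]
# A processor returns the (possibly transformed) record, or None to drop it.
Processor = Callable[[Record], Optional[Record]]
Sink = Callable[[Record], None]


@dataclass
class Pipeline:
    """Toy model (hypothetical): source -> processors in order -> sinks."""
    processors: List[Processor] = field(default_factory=list)
    sinks: List[Sink] = field(default_factory=list)

    def ingest(self, record: Record) -> None:
        # Processors are applied in the order in which they were defined.
        for process in self.processors:
            result = process(record)
            if result is None:
                return  # the record was filtered out
            record = result
        # With no processors, the record reaches the sinks unchanged.
        for sink in self.sinks:
            sink(record)

    def run(self, source: Iterable[Record]) -> None:
        # The source is modeled as any iterable of records.
        for record in source:
            self.ingest(record)


def drop_debug(record: Record) -> Optional[Record]:
    """Filter: discard noisy DEBUG records."""
    return None if record.get("level") == "DEBUG" else record


def redact_email(record: Record) -> Optional[Record]:
    """Transform: mask the email field if present."""
    return {**record, "email": "***"} if "email" in record else record


indexed: List[Record] = []  # stands in for an OpenSearch destination

# A sink can be another pipeline: upstream publishes into downstream.
downstream = Pipeline(processors=[redact_email], sinks=[indexed.append])
upstream = Pipeline(processors=[drop_debug], sinks=[downstream.ingest])

upstream.run([
    {"level": "DEBUG", "message": "cache miss"},
    {"level": "ERROR", "message": "login failed", "email": "a@example.com"},
])
```

After the run, only the ERROR record reaches the indexed list, with its email field masked. In the managed service these stages are declared in the pipeline configuration rather than written as code.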

Amazon OpenSearch Ingestion supports ingesting all types of data that you would normally index in an Amazon OpenSearch domain. This includes, but is not limited to, structured, unstructured, textual, numerical, and geospatial data. OpenSearch Ingestion also supports all three pillars of observability data: logs, metrics, and traces. You can use OpenSearch Ingestion, along with its rich ecosystem of data sources, processors, and sinks, to transform your data before storing it in Amazon OpenSearch domains. With OpenSearch Ingestion, you no longer have to write custom AWS Lambda functions or self-manage Logstash and Elasticsearch ingest nodes to ingest data that needs to be indexed in Amazon OpenSearch clusters. Please refer to our documentation page to see the list of sources, processors, and sinks supported by Amazon OpenSearch Ingestion.

Amazon OpenSearch Ingestion is a data ingestion tier that pre-processes data before the data is indexed in Amazon OpenSearch Service. OpenSearch Ingestion is built with Data Prepper, which is a component of the OpenSearch project, and supports all data formats, sources, processors, and sinks supported by Data Prepper.

To get started with Amazon OpenSearch Ingestion, you begin by defining a data pipeline. An OpenSearch Ingestion pipeline is the core of your business logic and consists of a source, an optional series of one or more processors, and a sink. You define your pipeline configuration in a YAML file that contains the details of your source, processors, and sinks. OpenSearch Ingestion also lets you set the minimum and maximum capacity, in OpenSearch Compute Units for Ingestion (OCUs), for each pipeline. Finally, you can choose how your data reaches your OpenSearch Ingestion pipelines:

VPC access: For VPC access, we establish an AWS PrivateLink connection from your VPC to the Amazon OpenSearch Ingestion pipeline. This provides private connectivity to your pipelines without exposing your traffic to the public internet.
Public access: In this network configuration, data flows to your OpenSearch Ingestion pipelines over the public internet.

You can get started with creating a data pipeline via the AWS Console or the AWS command line.
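As a rough illustration of the steps above, the following Python sketch assembles the inputs for creating a pipeline programmatically. It assumes the parameter names used by the AWS SDK for Python (boto3) "osis" client's create_pipeline operation. The helper build_create_pipeline_request, the pipeline name, the domain endpoint, the role ARN, the subnet ID, and the security group ID are all placeholders invented for this example, and the YAML body is a minimal Data Prepper style configuration that should be checked against the documentation before use.

```python
from typing import Dict, List, Optional

# Placeholder pipeline configuration: HTTP source, one processor, one sink.
# The endpoint, index name, and role ARN below are made-up values.
PIPELINE_YAML = """\
version: "2"
log-pipeline:
  source:
    http:
      path: "/log/ingest"
  processor:
    - date:
        from_time_received: true
        destination: "@timestamp"
  sink:
    - opensearch:
        hosts: ["https://search-example.us-east-1.es.amazonaws.com"]
        index: "application_logs"
        aws:
          sts_role_arn: "arn:aws:iam::123456789012:role/example-pipeline-role"
          region: "us-east-1"
"""


def build_create_pipeline_request(
    name: str,
    config_body: str,
    min_units: int,
    max_units: int,
    subnet_ids: Optional[List[str]] = None,
    security_group_ids: Optional[List[str]] = None,
) -> Dict[str, object]:
    """Hypothetical helper: build keyword arguments for create_pipeline."""
    # Sanity check on the OCU range before calling the service.
    if not 1 <= min_units <= max_units:
        raise ValueError("expected 1 <= min_units <= max_units")
    request: Dict[str, object] = {
        "PipelineName": name,
        "MinUnits": min_units,  # minimum Ingestion OCUs
        "MaxUnits": max_units,  # maximum Ingestion OCUs
        "PipelineConfigurationBody": config_body,
    }
    # VPC access: supply subnets (and optionally security groups).
    # Public access: leave VpcOptions out (assumed default behavior).
    if subnet_ids:
        request["VpcOptions"] = {
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids or []),
        }
    return request


request = build_create_pipeline_request("example-logs", PIPELINE_YAML, 1, 4)
vpc_request = build_create_pipeline_request(
    "example-logs-vpc", PIPELINE_YAML, 1, 4,
    subnet_ids=["subnet-0123456789abcdef0"],
    security_group_ids=["sg-0123456789abcdef0"],
)

# With boto3 installed and AWS credentials configured, the request could
# be submitted like this (not executed here):
#   import boto3
#   boto3.client("osis").create_pipeline(**request)
```

Keeping the request construction separate from the service call makes it possible to review or validate the capacity range and network options before anything is created.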
