Data pipelines
Both AI/ML and traditional data analytics need clean and accessible data in a format that's usable by analytics tools and AI algorithms.

AWS Pipeline Analytics and ETL Services:
1. Data Ingestion Services

Service Name | Key Attributes | Use Cases |
---|---|---|
Amazon Kinesis Data Streams | - Serverless, real-time data streaming.<br>- Massively scalable for terabytes of data.<br>- Automatic provisioning and scaling (in on-demand capacity mode). | - Real-time analytics (e.g., clickstreams).<br>- Log and event data collection at scale.<br>- IoT data ingestion from sensors. |
Amazon Data Firehose | - Fully managed, near real-time data loading.<br>- Built-in data transformation (ETL).<br>- Delivers directly to Amazon S3, Amazon Redshift, and Amazon OpenSearch Service. | - Streaming ETL pipelines.<br>- Simple delivery of logs to analytics tools.<br>- Ingesting IoT data directly into a data lake. |

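
As an illustration of the ingestion step, here is a minimal Python sketch that sends one IoT-style event to Kinesis Data Streams with boto3's `put_record`. The stream name `example-stream`, the `device_id` field, and both helper functions are hypothetical names chosen for this example. The network call assumes boto3 is installed, AWS credentials are configured, and the stream already exists.

```python
import json
from typing import Any, Dict


def build_put_record_args(stream_name: str, event: Dict[str, Any], key_field: str) -> Dict[str, Any]:
    """Build the keyword arguments for a Kinesis Data Streams PutRecord call."""
    return {
        "StreamName": stream_name,
        # Kinesis expects the payload as bytes.
        "Data": json.dumps(event, sort_keys=True).encode("utf-8"),
        # Records that share a partition key are routed to the same shard.
        "PartitionKey": str(event[key_field]),
    }


def send_event(stream_name: str, event: Dict[str, Any], key_field: str = "device_id") -> Dict[str, Any]:
    """Send one event. Requires boto3, AWS credentials, and an existing stream."""
    import boto3  # imported lazily so the builder above works without AWS access

    client = boto3.client("kinesis")
    return client.put_record(**build_put_record_args(stream_name, event, key_field))


if __name__ == "__main__":
    # Hypothetical sensor reading; only the request is built here, nothing is sent.
    sample = {"device_id": "sensor-42", "temperature_c": 21.5}
    print(build_put_record_args("example-stream", sample, "device_id"))
```

Keeping the request construction separate from the network call makes the payload easy to inspect and test locally. A Firehose producer would look similar but uses the `firehose` client, whose `put_record` takes `DeliveryStreamName` and a `Record` dictionary instead.
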
2. Data Storage Services

Service Name | Key Attributes | Use Cases |
---|---|---|
Amazon S3 | - Highly scalable, durable object storage.<br>- The foundation for data lakes.<br>- Stores any type of data (structured/unstructured). | - Central data lake for raw data.<br>- Archiving and backup.<br>- Source/destination for analytics and ML services. |
Amazon Redshift | - Fully managed, petabyte-scale data warehouse.<br>- High-performance with columnar storage.<br>- Optimized for complex SQL queries. | - Business intelligence (BI) and reporting.<br>- High-performance analytical workloads.<br>- Storing structured, transformed data. |

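
One way to use S3 as the central store for raw data is to write objects under date-partitioned keys. The sketch below builds a Hive-style key (`year=.../month=.../day=...`), a common convention that partition-aware query engines can recognize, and uploads with boto3's `put_object`. The `raw/` prefix, the bucket and source names, and both helper functions are assumptions made for this example, not a layout prescribed by AWS.

```python
from datetime import datetime, timezone


def raw_object_key(source: str, event_time: datetime, filename: str) -> str:
    """Return a Hive-style partitioned key for a raw data object."""
    # Normalize to UTC so partitions do not depend on the producer's time zone.
    # Pass a timezone-aware datetime; naive values are treated as local time.
    t = event_time.astimezone(timezone.utc)
    return f"raw/{source}/year={t:%Y}/month={t:%m}/day={t:%d}/{filename}"


def upload_raw(bucket: str, source: str, event_time: datetime, filename: str, body: bytes) -> str:
    """Upload one object. Requires boto3, AWS credentials, and an existing bucket."""
    import boto3  # imported lazily so the key builder works without AWS access

    key = raw_object_key(source, event_time, filename)
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body)
    return key


if __name__ == "__main__":
    when = datetime(2024, 1, 5, 12, 0, tzinfo=timezone.utc)
    print(raw_object_key("clickstream", when, "events.json"))
```
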
3. Data Cataloging Services

Service Name | Key Attributes | Use Cases |
---|---|---|
AWS Glue Data Catalog | - Centralized, managed metadata repository.<br>- Automatic schema discovery with crawlers.<br>- Integrates with Athena, EMR, and Redshift. | - Defining schemas for data in S3.<br>- Enabling data discovery for a data lake.<br>- Providing a unified data view for analytics. |

4. Data Processing Services

Service Name | Key Attributes | Use Cases |
---|---|---|
AWS Glue | - Serverless, fully managed ETL service.<br>- Automated schema discovery and code generation.<br>- Pay-as-you-go pricing based on job run time. | - Transforming raw data into structured formats.<br>- Cleaning, enriching, and validating data.<br>- Automating data preparation workflows. |
Amazon EMR | - Managed big data platform for frameworks such as Apache Spark and Hadoop.<br>- Handles infrastructure provisioning and scaling.<br>- Cost-effective with Spot Instance integration. | - Large-scale, petabyte-level data processing.<br>- Machine learning and ETL with big data frameworks.<br>- Genomic and scientific data analysis. |

5. Data Analysis and Visualization Services

Service Name | Key Attributes | Use Cases |
---|---|---|
Amazon Athena | - Serverless, interactive query service.<br>- Uses standard SQL to query data in place (in S3).<br>- Pay-per-query cost model (billed by the amount of data scanned). | - Ad-hoc data discovery on data lakes.<br>- Quickly querying log files without loading them.<br>- Serverless BI and reporting. |
Amazon Redshift | - Fully managed, petabyte-scale data warehouse.<br>- High-performance with columnar storage.<br>- Optimized for complex SQL queries. | - Business intelligence (BI) and reporting.<br>- High-performance analytical workloads.<br>- Storing structured, transformed data. |
Amazon QuickSight | - Serverless, cloud-native BI service.<br>- Interactive dashboards and reports.<br>- Natural language querying with Amazon Q. | - Creating executive and operational dashboards.<br>- Data visualization for business users.<br>- Embedding analytics into applications. |
Amazon OpenSearch Service | - Managed service for OpenSearch clusters.<br>- Real-time log analytics and application monitoring.<br>- Full-text search capabilities. | - Interactive log analytics and troubleshooting.<br>- Powering search functionality for applications.<br>- Real-time monitoring dashboards. |
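
To make the "query data in place" idea concrete, the following sketch submits a SQL statement to Athena with boto3 and polls until it finishes. The database name, the S3 results location, and the helper names are placeholders invented for this example. Running `run_query` assumes boto3, AWS credentials, and a table already defined in the Glue Data Catalog.

```python
import time
from typing import Any, Dict, List

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def build_query_request(sql: str, database: str, output_location: str) -> Dict[str, Any]:
    """Build the keyword arguments for Athena's StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes result files to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(sql: str, database: str, output_location: str,
              poll_seconds: float = 1.0) -> List[Dict[str, Any]]:
    """Run a query and return the first page of result rows."""
    import boto3  # imported lazily so the request builder works without AWS access

    athena = boto3.client("athena")
    request = build_query_request(sql, database, output_location)
    query_id = athena.start_query_execution(**request)["QueryExecutionId"]

    # Queries run asynchronously, so poll until a terminal state is reached.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]
        if status["State"] in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {status['State']}")
    # Only the first page is returned here; larger results need pagination.
    return athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]


if __name__ == "__main__":
    # Placeholder names; only the request is built here, nothing is sent.
    print(build_query_request("SELECT 1", "example_db", "s3://example-bucket/athena-results/"))
```
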