Guide to AWS Athena: Create, Manage, and Optimize Costs

Last Updated : 23 Jul, 2025

AWS Athena is a serverless query service provided by AWS for analyzing data directly in Amazon S3 using standard SQL. It offers high scalability, cost-effectiveness, and an easy-to-use platform for running complex queries without extensive infrastructure setup. This article discusses what AWS Athena is, its architecture, benefits, limitations, advantages, and disadvantages, and how it differs from Amazon Redshift, AWS Glue, and Microsoft SQL Server.

What is AWS Athena?

AWS Athena is a serverless interactive query service that enables analysis of data in Amazon S3 using standard SQL. Athena is based on Presto, a distributed SQL query engine, so it can query data in Amazon S3 quickly using conventional SQL syntax. There is no infrastructure to manage with Athena, so you can focus on analyzing data at scale. To get a better idea of AWS Athena, let us first understand its architecture.

AWS Athena Architecture

Apache Presto, an open-source distributed SQL query engine, serves as the foundation for Athena. When a user submits a query, Athena generates a query plan and sends it to Presto for execution. Presto then distributes the query over numerous cluster nodes for parallel processing. The results are subsequently compiled and presented to the user. Athena stores table and partition metadata in a managed, Hive-compatible metastore.

When a query runs, Athena retrieves the metadata from the metastore to determine the data's location and format. Athena also integrates with AWS Glue, a fully managed extract, transform, and load (ETL) service, allowing customers to create and manage data catalogs and ETL processes. Next, we will go through the various components of AWS Athena.

Figure: Amazon Athena architecture

How to Set Up AWS Athena? A Step-By-Step Guide

Setting up the AWS Athena service is straightforward and involves a few key steps. The following steps guide you through getting started with querying your data in Amazon S3 using Athena.

Step 1: Sign in to AWS Management Console

Step 2: Navigate to AWS Athena

Step 3: Set up a Query Result Location

Step 4: Create a Database

CREATE DATABASE mydatabase;

Step 5: Create a Table

CREATE EXTERNAL TABLE mytable (
  id INT,
  name STRING,
  age INT
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = ','
)
LOCATION 's3://your-bucket-name/data/';
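When several tables share the same CSV layout, the statement above can be generated rather than typed by hand. The following is a minimal Python sketch under that assumption; the helper name `build_csv_table_ddl` is hypothetical, and the trailing-slash check on the location is a conservative choice made here, not something the article requires.

```python
def build_csv_table_ddl(table, columns, location):
    """Return a CREATE EXTERNAL TABLE statement for comma-delimited files in S3.

    columns is a list of (name, sql_type) pairs, e.g. [("id", "INT")].
    """
    # Conservative check (an assumption of this sketch): require an s3:// prefix
    # that ends with a slash, like the location used in the example above.
    if not location.startswith("s3://") or not location.endswith("/"):
        raise ValueError("location must be an s3:// prefix ending with '/'")
    column_lines = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n"
        f"{column_lines}\n"
        ")\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'\n"
        "WITH SERDEPROPERTIES (\n"
        "  'serialization.format' = ','\n"
        ")\n"
        f"LOCATION '{location}';"
    )


if __name__ == "__main__":
    print(
        build_csv_table_ddl(
            "mytable",
            [("id", "INT"), ("name", "STRING"), ("age", "INT")],
            "s3://your-bucket-name/data/",
        )
    )
```

The generated text can be pasted into the Athena query editor in the same way as the hand-written statement.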

Step 6: Query Your Data

SELECT * FROM mytable;

Step 7: Explore and Analyze Data

How to Set Up AWS Athena Using AWS CloudFormation Templates? A Step-By-Step Guide

The following steps guide you through setting up AWS Athena using AWS CloudFormation templates to automate the process:

Step 1: Install and Configure AWS CLI

pip install awscli

aws configure

Step 2: Create Amazon S3 Bucket

aws s3 mb s3://my-athena-bucket

Step 3: Write the CloudFormation Template

AWSTemplateFormatVersion: '2010-09-09'
Resources:
  AthenaWorkGroup:
    Type: AWS::Athena::WorkGroup
    Properties:
      Name: MyWorkGroup
      Description: "WorkGroup for Athena queries"
      State: ENABLED
      WorkGroupConfiguration:
        ResultConfiguration:
          OutputLocation: s3://my-athena-bucket/athena-results/
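CloudFormation also accepts templates written in JSON, so the same workgroup definition can be produced from code, which makes the nesting of the properties explicit. The sketch below is one possible way to do that; the function name `athena_workgroup_template` is hypothetical, and the property names are taken from the template above.

```python
import json


def athena_workgroup_template(workgroup_name, output_location):
    """Build the workgroup template from this step as a plain dictionary."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "AthenaWorkGroup": {
                "Type": "AWS::Athena::WorkGroup",
                "Properties": {
                    "Name": workgroup_name,
                    "Description": "WorkGroup for Athena queries",
                    "State": "ENABLED",
                    "WorkGroupConfiguration": {
                        "ResultConfiguration": {
                            "OutputLocation": output_location,
                        }
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    template = athena_workgroup_template(
        "MyWorkGroup", "s3://my-athena-bucket/athena-results/"
    )
    # Print the JSON form; it could be saved to a file and deployed instead of the YAML.
    print(json.dumps(template, indent=2))
```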

Step 4: Deploy the CloudFormation Stack

aws cloudformation package --template-file template.yaml --output-template-file packaged-template.yaml --s3-bucket my-athena-bucket

aws cloudformation deploy --template-file packaged-template.yaml --stack-name AthenaSetupStack --capabilities CAPABILITY_NAMED_IAM

Step 5: Verify the Deployment

aws cloudformation describe-stacks --stack-name AthenaSetupStack

aws athena list-work-groups

Step 6: Query the Data with Athena

aws athena start-query-execution --query-string "SELECT * FROM my_table;" --query-execution-context Database=my_database --result-configuration OutputLocation=s3://my-athena-bucket/query-results/

How to Run Amazon Athena Queries?

Amazon Athena runs SQL queries on data stored in Amazon S3. The query results can then be passed on to other AWS services, for example saved in Amazon S3, visualized in Amazon QuickSight, or announced through Amazon SNS notifications. The following steps guide you through running Amazon Athena queries:

Step 1: Install and Configure AWS CLI

pip install awscli
aws configure

Step 2: Create an S3 Bucket for Query Results

aws s3 mb s3://my-athena-query-results

Step 3: Run a Query using AWS CLI

aws athena start-query-execution --query-string "SELECT * FROM my_table;" --query-execution-context Database=my_database --result-configuration OutputLocation=s3://my-athena-query-results/
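The same call can be made from Python. The sketch below assumes a boto3 Athena client (`boto3.client("athena")`) is passed in; the wrapper name `start_query` is hypothetical. Passing the client as a parameter keeps the function easy to exercise with a stand-in object.

```python
def start_query(athena_client, sql, database, output_location):
    """Submit a query and return its execution ID.

    athena_client is expected to behave like boto3.client("athena").
    """
    response = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    # Athena answers immediately with an ID; the query itself runs asynchronously.
    return response["QueryExecutionId"]


# Example usage (requires boto3 and configured AWS credentials):
#   import boto3
#   query_id = start_query(
#       boto3.client("athena"),
#       "SELECT * FROM my_table;",
#       "my_database",
#       "s3://my-athena-query-results/",
#   )
```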

Step 4: Check Query Execution Status

aws athena get-query-execution --query-execution-id <query-execution-id>
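Because queries run asynchronously, the status usually has to be checked more than once. A minimal polling sketch follows, assuming a boto3 Athena client is passed in; the function name `wait_for_query` and the default interval and retry count are illustrative choices, not recommended values.

```python
import time

# States after which the query will not change any more.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_query(athena_client, query_execution_id,
                   poll_seconds=1.0, max_polls=60, sleep=time.sleep):
    """Poll get_query_execution until the query reaches a terminal state."""
    for _ in range(max_polls):
        response = athena_client.get_query_execution(
            QueryExecutionId=query_execution_id
        )
        state = response["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return state
        # Still QUEUED or RUNNING: wait before asking again.
        sleep(poll_seconds)
    raise TimeoutError(
        f"query {query_execution_id} did not finish after {max_polls} polls"
    )
```

The `sleep` parameter exists only so the wait can be replaced in tests; in normal use the default `time.sleep` applies.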

Step 5: Fetch Query Results

aws athena get-query-results --query-execution-id <query-execution-id>
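The response from `get-query-results` nests every cell inside `ResultSet.Rows[].Data[]`. The sketch below flattens one page of such a response into dictionaries. It assumes the first row of the page holds the column names, which is typically the case for the first page of a SELECT query, and it treats a cell without `VarCharValue` as a null; the function name `rows_to_dicts` is hypothetical.

```python
def rows_to_dicts(result_page):
    """Convert one get_query_results page into a list of {column: value} dicts."""
    rows = result_page["ResultSet"]["Rows"]
    if not rows:
        return []
    # Assumption: the first row carries the column headers.
    header = [cell.get("VarCharValue") for cell in rows[0]["Data"]]
    records = []
    for row in rows[1:]:
        # A missing VarCharValue is mapped to None.
        values = [cell.get("VarCharValue") for cell in row["Data"]]
        records.append(dict(zip(header, values)))
    return records
```

All values arrive as strings, so numeric columns still need to be converted by the caller.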

How to Report Data to Other Resources?

The following steps show how to report the data to other resources:

Step 1: Save Results to Amazon S3

aws s3 cp s3://my-athena-query-results/<query-execution-id>.csv .
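Rather than assembling the result path by hand, the location of the result file can be read from the `get-query-execution` response, which reports it under `QueryExecution.ResultConfiguration.OutputLocation`. The following sketch splits that URI into a bucket and key; the function name `result_location` is hypothetical.

```python
from urllib.parse import urlparse


def result_location(query_execution_response):
    """Return (bucket, key) of the result file named in a get_query_execution response."""
    uri = query_execution_response["QueryExecution"]["ResultConfiguration"]["OutputLocation"]
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # netloc is the bucket name; the path without its leading slash is the object key.
    return parsed.netloc, parsed.path.lstrip("/")
```

The returned pair can then be handed to whichever S3 download call is in use.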

Step 2: Visualize the Data in Amazon QuickSight

Step 3: Send Notifications via Amazon SNS

aws sns create-topic --name MyAthenaResultsTopic

aws sns subscribe --topic-arn <topic-arn> --protocol email --notification-endpoint myemail@example.com

aws sns publish --topic-arn <topic-arn> --message "Athena query results are available at s3://my-athena-query-results/<query-execution-id>.csv"
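The publish step can also be scripted so that the notification is sent once the results exist. The sketch below assumes a boto3 SNS client (`boto3.client("sns")`) is passed in; the function name `announce_results` and the subject line are hypothetical.

```python
def announce_results(sns_client, topic_arn, result_uri):
    """Publish a short message pointing subscribers at the query results."""
    message = f"Athena query results are available at {result_uri}"
    response = sns_client.publish(
        TopicArn=topic_arn,
        Subject="Athena query finished",  # hypothetical subject line
        Message=message,
    )
    return response["MessageId"]


# Example usage (requires boto3 and configured AWS credentials):
#   import boto3
#   announce_results(boto3.client("sns"), topic_arn, result_uri)
```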

What are the benefits of using Amazon Athena?

The following are the benefits of using Amazon Athena:

What are some Amazon Athena Limitations?

The following are some limitations of Amazon Athena:

Features of AWS Athena

The following are the features of AWS Athena:

Advantages of AWS Athena

The following are the advantages of AWS Athena:

Disadvantages of AWS Athena

The following are the disadvantages of AWS Athena:

Amazon Athena Pricing: How Much Does Athena Cost?

The following table summarizes Amazon Athena pricing:

| Pricing Component | Description | Cost |
|---|---|---|
| Query Execution | Charges are based on the amount of data scanned by your queries. | $5 per TB of data scanned |
| Data Scanned | You can reduce costs by compressing data, partitioning data, and using columnar formats. | N/A |
| Storage | Athena queries data directly in Amazon S3, so you only pay for S3 storage. | Based on Amazon S3 pricing |
| Data Transfer | Data transfer within the same AWS region is free; cross-region data transfer costs apply. | Based on AWS Data Transfer pricing |
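To make the per-terabyte rate concrete, the sketch below estimates the charge for a single query from the number of bytes it scanned. It uses the $5 per TB rate from the table. The rounding up to whole megabytes, the 10 MB minimum per query, and the use of binary units (1 TB = 1024^4 bytes) are assumptions of this sketch that should be checked against the current AWS pricing page; the function name `estimate_query_cost` is hypothetical.

```python
PRICE_PER_TB_USD = 5.0          # rate from the pricing table
BYTES_PER_MB = 1024 ** 2        # assumption: binary megabytes
BYTES_PER_TB = 1024 ** 4        # assumption: binary terabytes
MIN_BILLED_BYTES = 10 * BYTES_PER_MB  # assumption: 10 MB minimum per query


def estimate_query_cost(bytes_scanned):
    """Estimate the USD charge for one query that scanned bytes_scanned bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must not be negative")
    # Round up to a whole number of megabytes using integer arithmetic.
    billed_mb = -(-bytes_scanned // BYTES_PER_MB)
    billed_bytes = max(billed_mb * BYTES_PER_MB, MIN_BILLED_BYTES)
    return billed_bytes / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    for scanned in (1024 ** 4, 200 * 1024 ** 3, 1):
        print(scanned, "bytes ->", f"${estimate_query_cost(scanned):.6f}")
```

The number of bytes a finished query scanned is reported in the `Statistics.DataScannedInBytes` field of the `get-query-execution` response, so the estimate can be compared against real queries.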

How Does Amazon Athena Compare to Amazon Redshift, Microsoft SQL Server, and AWS Glue?

The following table compares Amazon Athena with Amazon Redshift, Microsoft SQL Server, and AWS Glue:

| Features | Amazon Athena | Amazon Redshift | Microsoft SQL Server | AWS Glue |
|---|---|---|---|---|
| Service Type | Serverless interactive query service | Fully managed data warehouse | Relational database management system | Serverless data integration service |
| Primary Use Case | Ad hoc querying of data in Amazon S3 | Data warehousing and OLAP | Transactional and analytical processing | ETL and data cataloging |
| Pricing Model | Pay per query based on data scanned ($5/TB) | Pay per node/hour and additional storage costs | Licensing costs and pay-per-usage for cloud | Pay per usage (job runs, data catalog storage) |
| Data Storage | Amazon S3 | Redshift managed storage, integrates with S3 | Local or cloud storage, depends on setup | Amazon S3 and other data sources |
| Performance | Optimized for quick queries on large datasets | High performance for complex queries and large datasets | High performance for transactional and analytical workloads | Optimized for ETL operations and data transformation |
| Maintenance | Fully managed, no maintenance required | Managed service, but requires some administration | Requires regular maintenance and updates | Fully managed, minimal maintenance required |
| Data Formats Supported | JSON, CSV, Parquet, ORC, Avro | JSON, CSV, Parquet, ORC, Avro, and more | Traditional RDBMS formats | JSON, CSV, Parquet, ORC, Avro, and more |

How to Optimize AWS Athena Costs Quickly and Accurately?

By following the practices suggested below, you can optimize AWS Athena costs quickly and accurately:
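Two of the cost levers named in the pricing section, partitioning and columnar formats, can be applied together with a CREATE TABLE AS SELECT (CTAS) statement that rewrites an existing table as partitioned Parquet. The Python sketch below only builds such a statement as text. The function name `build_parquet_ctas`, the target table name, the S3 prefix, and the `region` partition column are hypothetical and do not come from the article's example table.

```python
def build_parquet_ctas(source_table, target_table, columns,
                       partition_columns, external_location):
    """Build an Athena CTAS statement that writes partitioned Parquet files."""
    # Partition columns are placed last in the SELECT list, as CTAS expects.
    select_list = ", ".join(list(columns) + list(partition_columns))
    partitions = ", ".join(f"'{name}'" for name in partition_columns)
    return (
        f"CREATE TABLE {target_table}\n"
        "WITH (\n"
        "  format = 'PARQUET',\n"
        f"  external_location = '{external_location}',\n"
        f"  partitioned_by = ARRAY[{partitions}]\n"
        ")\n"
        f"AS SELECT {select_list}\n"
        f"FROM {source_table};"
    )


if __name__ == "__main__":
    print(
        build_parquet_ctas(
            "mytable",
            "mytable_parquet",               # hypothetical target table
            ["id", "name", "age"],
            ["region"],                      # hypothetical partition column
            "s3://your-bucket-name/parquet/",  # hypothetical output prefix
        )
    )
```

Queries against the new table that filter on the partition column and select only the columns they need should scan less data than a full scan of the CSV files, which is what the per-terabyte charge is based on.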

Use Cases of AWS Athena

The following are the use cases of AWS Athena:

Conclusion

In conclusion, Amazon Athena is a serverless query service that allows customers to run standard SQL queries to analyze data in S3. Its serverless design, standard SQL support, integration with the AWS ecosystem, cost-effective pricing, and integration with BI tools are among its primary characteristics. Its architecture is built on top of Apache Presto and interfaces with AWS Glue.