Using Amazon S3 Tables with AWS analytics services
To make tables in your account accessible by AWS analytics services, you integrate your Amazon S3 table buckets with Amazon SageMaker Lakehouse. This integration allows AWS analytics services to automatically discover and access your table data. You can use this integration to work with your tables in services such as Amazon Athena, Amazon Redshift, Amazon QuickSight, and Amazon Data Firehose.
Note
This integration uses the AWS Glue and AWS Lake Formation services and might incur AWS Glue request and storage costs. For more information, see AWS Glue Pricing.
Additional pricing applies for running queries on your S3 tables. For more information, see pricing information for the query engine that you're using.
How the integration works
When you create a table bucket in the console, Amazon S3 initiates the following actions to integrate table buckets in the Region that you have selected with AWS analytics services:
- Creates a new AWS Identity and Access Management (IAM) service role that gives Lake Formation access to all your table buckets.
- Using the service role, Lake Formation registers table buckets in the current Region. This allows Lake Formation to manage access, permissions, and governance for all current and future table buckets in that Region.
- Adds the s3tablescatalog catalog to the AWS Glue Data Catalog in the current Region. Adding the s3tablescatalog catalog allows all your table buckets, namespaces, and tables to be populated in the Data Catalog.
Note
These actions are automated through the Amazon S3 console. If you perform this integration programmatically, you must manually take all of these actions.
You integrate your table buckets once per AWS Region. After the integration is completed, all current and future table buckets, namespaces, and tables are added to the AWS Glue Data Catalog in that Region.
The following illustration shows how the s3tablescatalog catalog automatically populates table buckets, namespaces, and tables in the current Region as corresponding objects in the Data Catalog. Table buckets are populated as subcatalogs. Namespaces within a table bucket are populated as databases within their respective subcatalogs. Tables are populated as tables in their respective databases.
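As a rough sketch of this mapping, the following shell helper builds the subcatalog ID for a table bucket. The function name is hypothetical, and the ID format (account ID, then s3tablescatalog/, then the table bucket name) is assumed from the CatalogId value used in the grant-permissions example in this topic.

```shell
#!/bin/sh
# Hypothetical helper (not part of the AWS CLI): prints the Data Catalog
# subcatalog ID that a table bucket is expected to appear under.
#   $1 = AWS account ID, $2 = table bucket name
s3tables_catalog_id() {
  printf '%s:s3tablescatalog/%s\n' "$1" "$2"
}

# Table bucket -> subcatalog, namespace -> database, table -> table.
echo "catalog:  $(s3tables_catalog_id 111122223333 amzn-s3-demo-bucket)"
echo "database: test_namespace"
echo "table:    test_table"
```

The namespace and table names pass through unchanged; only the table bucket is wrapped in the s3tablescatalog prefix.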
How permissions work
We recommend integrating your table buckets with AWS analytics services so that you can work with your table data across services that use the AWS Glue Data Catalog as a metadata store. The integration enables fine-grained access control through AWS Lake Formation. This security approach means that, in addition to AWS Identity and Access Management (IAM) permissions, you must grant your IAM principal Lake Formation permissions on your tables before you can work with them.
There are two main types of permissions in AWS Lake Formation:
- Metadata access permissions control the ability to create, read, update, and delete metadata databases and tables in the Data Catalog.
- Underlying data access permissions control the ability to read and write data to the underlying Amazon S3 locations that the Data Catalog resources point to.
Lake Formation uses a combination of its own permissions model and the IAM permissions model to control access to Data Catalog resources and underlying data:
- For a request to access Data Catalog resources or underlying data to succeed, the request must pass permission checks by both IAM and Lake Formation.
- IAM permissions control access to the Lake Formation and AWS Glue APIs and resources, whereas Lake Formation permissions control access to the Data Catalog resources, Amazon S3 locations, and the underlying data.
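The two-layer check can be pictured with a toy model. The following sketch is illustrative only: the functions and their allow/deny arguments are hypothetical, and nothing here calls IAM or Lake Formation.

```shell
#!/bin/sh
# Toy model: a request succeeds only when both permission layers allow it.
#   $1 = IAM decision, $2 = Lake Formation decision ("allow" or "deny")
request_allowed() {
  [ "$1" = "allow" ] && [ "$2" = "allow" ]
}

# Prints a one-line summary of the combined outcome.
describe_request() {
  if request_allowed "$1" "$2"; then
    echo "IAM=$1, Lake Formation=$2: request succeeds"
  else
    echo "IAM=$1, Lake Formation=$2: request fails"
  fi
}

describe_request allow allow
describe_request allow deny
describe_request deny allow
```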
Lake Formation permissions apply only in the Region in which they were granted, and a principal must be authorized by a data lake administrator or another principal with the necessary permissions in order to be granted Lake Formation permissions.
For more information, see Overview of Lake Formation permissions in the AWS Lake Formation Developer Guide.
Make sure that you follow the steps in Prerequisites for integration and Integrating table buckets with AWS analytics services so that you have the appropriate permissions to access the AWS Glue Data Catalog and your table resources, and to work with AWS analytics services.
Important
If you aren't the user who performed the table buckets integration with AWS analytics services for your account, you must be granted the necessary Lake Formation permissions on the table. For more information, see Granting permission on a table or database.
Prerequisites for integration
The following prerequisites are required to integrate table buckets with AWS analytics services:
- Create a table bucket.
- Attach the AWSLakeFormationDataAdmin AWS managed policy to your AWS Identity and Access Management (IAM) principal to make that user a data lake administrator. For more information about how to create a data lake administrator, see Create a data lake administrator in the AWS Lake Formation Developer Guide.
- Add permissions for the glue:PassConnection operation to your IAM principal.
- Add permissions for the lakeformation:RegisterResource and lakeformation:RegisterResourceWithPrivilegedAccess operations to your IAM principal.
- Update to the latest version of the AWS Command Line Interface (AWS CLI).
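As one possible way to cover the two permission items above, the following sketch prints an identity-based policy document that allows the three listed operations. The function name is hypothetical, and the "Resource": "*" scope is an assumption made for brevity; narrow it to fit your own security requirements.

```shell
#!/bin/sh
# Hypothetical helper: prints an IAM policy document that allows the
# operations named in the prerequisites. "Resource": "*" is an assumption.
print_prereq_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "glue:PassConnection",
        "lakeformation:RegisterResource",
        "lakeformation:RegisterResourceWithPrivilegedAccess"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}

print_prereq_policy
```

You could redirect the output to a file and attach it to your principal with a command such as aws iam put-user-policy or aws iam put-role-policy, depending on the principal type.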
Important
When creating tables, make sure that you use all lowercase letters in your table names and table definitions. For example, make sure that your column names are all lowercase. If your table name or table definition contains capital letters, the table isn't supported by AWS Lake Formation or the AWS Glue Data Catalog. In this case, your table won't be visible to AWS analytics services such as Amazon Athena, even if your table buckets are integrated with AWS analytics services.
If your table definition contains capital letters, you receive the following error message when running a SELECT query in Athena: "GENERIC_INTERNAL_ERROR: Get table request failed: com.amazonaws.services.glue.model.ValidationException: Unsupported Federation Resource - Invalid table or column names."
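A simple pre-flight check can catch this before a table is created. The following is a minimal sketch, not an official validation routine: the function name is hypothetical, and it checks only for uppercase letters, not for any other naming rules.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeeds only if the given table or column
# name contains no uppercase letters.
is_all_lowercase() {
  case "$1" in
    *[[:upper:]]*) return 1 ;;
    *) return 0 ;;
  esac
}

for name in test_table Test_Table order_id orderId; do
  if is_all_lowercase "$name"; then
    echo "ok: $name"
  else
    echo "rename before creating: $name"
  fi
done
```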
Integrating table buckets with AWS analytics services
This integration must be done once per AWS Region.
Important
The AWS analytics services integration now uses the WithPrivilegedAccess option in the RegisterResource Lake Formation API operation to register S3 table buckets. The integration also now creates the s3tablescatalog catalog in the AWS Glue Data Catalog by using the AllowFullTableExternalDataAccess option in the CreateCatalog AWS Glue API operation.
If you set up the integration with the preview release, you can continue to use your current integration. However, the updated integration process provides performance improvements, so we recommend migrating. To migrate to the updated integration, see Migrating to the updated integration process.
- Open the Amazon S3 console at https://console.aws.amazon.com/s3/.
- In the left navigation pane, choose Table buckets.
- Choose Create table bucket. The Create table bucket page opens.
- Enter a Table bucket name and make sure that the Enable integration checkbox is selected.
- Choose Create table bucket. Amazon S3 will attempt to automatically integrate your table buckets in that Region.
The first time that you integrate table buckets in any Region, Amazon S3 creates a new IAM service role on your behalf. This role allows Lake Formation to access all table buckets in your account and federate access to your tables in AWS Glue Data Catalog.
To integrate table buckets using the AWS CLI
The following steps show how to use the AWS CLI to integrate table buckets. To use these steps, replace the `user input placeholders` with your own information.
- Create a table bucket.
aws s3tables create-table-bucket \
--region us-east-1 \
--name amzn-s3-demo-table-bucket
- Create an IAM service role that allows Lake Formation to access your table resources.
- Create a file called Role-Trust-Policy.json that contains the following trust policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "LakeFormationDataAccessPolicy",
      "Effect": "Allow",
      "Principal": {
        "Service": "lakeformation.amazonaws.com"
      },
      "Action": [
        "sts:AssumeRole",
        "sts:SetContext",
        "sts:SetSourceIdentity"
      ],
      "Condition": {
        "StringEquals": {
          "aws:SourceAccount": "111122223333"
        }
      }
    }
  ]
}
Create the IAM service role by using the following command:
aws iam create-role \
--role-name S3TablesRoleForLakeFormation \
--assume-role-policy-document file://Role-Trust-Policy.json
- Create a file called LF-GluePolicy.json that contains the following policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "LakeFormationPermissionsForS3ListTableBucket",
      "Effect": "Allow",
      "Action": [
        "s3tables:ListTableBuckets"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "LakeFormationDataAccessPermissionsForS3TableBucket",
      "Effect": "Allow",
      "Action": [
        "s3tables:CreateTableBucket",
        "s3tables:GetTableBucket",
        "s3tables:CreateNamespace",
        "s3tables:GetNamespace",
        "s3tables:ListNamespaces",
        "s3tables:DeleteNamespace",
        "s3tables:DeleteTableBucket",
        "s3tables:CreateTable",
        "s3tables:DeleteTable",
        "s3tables:GetTable",
        "s3tables:ListTables",
        "s3tables:RenameTable",
        "s3tables:UpdateTableMetadataLocation",
        "s3tables:GetTableMetadataLocation",
        "s3tables:GetTableData",
        "s3tables:PutTableData"
      ],
      "Resource": [
        "arn:aws:s3tables:us-east-1:111122223333:bucket/*"
      ]
    }
  ]
}
Attach the policy to the role by using the following command:
aws iam put-role-policy \
--role-name S3TablesRoleForLakeFormation \
--policy-name LakeFormationDataAccessPermissionsForS3TableBucket \
--policy-document file://LF-GluePolicy.json
- Create a file called input.json that contains the following:
{
"ResourceArn": "arn:aws:s3tables:us-east-1:111122223333:bucket/*",
"WithFederation": true,
"RoleArn": "arn:aws:iam::111122223333:role/S3TablesRoleForLakeFormation"
}
Register table buckets with Lake Formation by using the following command:
aws lakeformation register-resource \
--region us-east-1 \
--with-privileged-access \
--cli-input-json file://input.json
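Because the same Region and account ID appear in several of these files, it can help to generate them from variables rather than editing each one by hand. The following is a minimal sketch for input.json; the function name and its arguments are hypothetical, and the role name matches the one created earlier in these steps.

```shell
#!/bin/sh
# Hypothetical helper: prints the register-resource input for a Region and account.
#   $1 = AWS Region, $2 = AWS account ID
print_register_input() {
  cat <<EOF
{
  "ResourceArn": "arn:aws:s3tables:$1:$2:bucket/*",
  "WithFederation": true,
  "RoleArn": "arn:aws:iam::$2:role/S3TablesRoleForLakeFormation"
}
EOF
}

print_register_input us-east-1 111122223333
```

Redirecting the output, for example with print_register_input us-east-1 111122223333 > input.json, produces a file that has the same content as the one shown above.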
- Create a file called catalog.json that contains the following catalog:
{
"Name": "s3tablescatalog",
"CatalogInput": {
"FederatedCatalog": {
"Identifier": "arn:aws:s3tables:us-east-1:111122223333:bucket/*",
"ConnectionName": "aws:s3tables"
},
"CreateDatabaseDefaultPermissions":[],
"CreateTableDefaultPermissions":[]
}
}
Create the s3tablescatalog catalog by using the following command. Creating this catalog populates the AWS Glue Data Catalog with objects corresponding to table buckets, namespaces, and tables.
aws glue create-catalog \
--cli-input-json file://catalog.json
- Verify that the s3tablescatalog catalog was added in AWS Glue by using the following command:
aws glue get-catalog --catalog-id s3tablescatalog
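In a script, you might wrap this check so that later steps run only when the catalog exists. The following sketch assumes only that the AWS CLI returns a nonzero exit status when the call fails; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: succeeds if the get-catalog call shown above succeeds.
#   $1 = catalog ID (for example, s3tablescatalog)
catalog_exists() {
  aws glue get-catalog --catalog-id "$1" >/dev/null 2>&1
}

# Example use (requires AWS credentials, so it is left commented out):
# if catalog_exists s3tablescatalog; then
#   echo "catalog found"
# else
#   echo "catalog missing"
# fi
```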
Migrating to the updated integration process
The AWS analytics services integration process has been updated. If you've set up the integration with the preview release, you can continue to use your current integration. However, the updated integration process provides performance improvements, so we recommend migrating by using the following steps. For more information about the migration or integration process, see Creating an Amazon S3 Tables catalog in the AWS Glue Data Catalog in the AWS Lake Formation Developer Guide.
- Open the AWS Lake Formation console at https://console.aws.amazon.com/lakeformation/, and sign in as a data lake administrator. For more information about how to create a data lake administrator, see Create a data lake administrator in the AWS Lake Formation Developer Guide.
- Delete your s3tablescatalog catalog by doing the following:
- In the left navigation pane, choose Catalogs.
- Select the option button next to the s3tablescatalog catalog in the Catalogs list. On the Actions menu, choose Delete.
- Deregister the data location for the s3tablescatalog catalog by doing the following:
- In the left navigation pane, go to the Administration section, and choose Data lake locations.
- Select the option button next to the s3tablescatalog data lake location, for example, s3://tables:`region`:`account-id`:bucket/*.
- On the Actions menu, choose Remove.
- In the confirmation dialog box that appears, choose Remove.
- Now that you've deleted your s3tablescatalog catalog and data lake location, you can follow the steps to integrate your table buckets with AWS analytics services by using the updated integration process.
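If you prefer the AWS CLI to the console for this cleanup, the following sketch prints commands that are assumed to be equivalent to the console steps above. It assumes that the data lake location was registered with the bucket/* ARN used elsewhere in this topic. The function name is hypothetical, and you should review the printed commands before running them.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the cleanup commands.
#   $1 = AWS Region, $2 = AWS account ID
print_migration_cleanup() {
  # Step 1: delete the s3tablescatalog catalog from the AWS Glue Data Catalog.
  echo "aws glue delete-catalog --region $1 --catalog-id s3tablescatalog"
  # Step 2: deregister the data lake location from Lake Formation.
  echo "aws lakeformation deregister-resource --region $1 --resource-arn 'arn:aws:s3tables:$1:$2:bucket/*'"
}

print_migration_cleanup us-east-1 111122223333
```

Printing the commands instead of running them keeps the destructive steps under manual control.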
Creating a resource link to your table's namespaces (Amazon Data Firehose)
To access your tables, Amazon Data Firehose needs a resource link that targets your table's namespace. A resource link is a Data Catalog object that acts as an alias or pointer to another Data Catalog resource, such as a database or table. The link is stored in the Data Catalog of the account or Region where it's created. For more information, see How resource links work in the AWS Lake Formation Developer Guide.
After you've integrated your table buckets with the AWS analytics services, you can create resource links to work with your tables in Amazon Data Firehose. For more information about creating these links, see Streaming data to tables with Amazon Data Firehose.
Granting Lake Formation permissions on your table resources
After your table buckets are integrated with the AWS analytics services, Lake Formation manages access to your table resources. Lake Formation uses its own permissions model (Lake Formation permissions) that enables fine-grained access control for Data Catalog resources. Lake Formation requires that each IAM principal (user or role) be authorized to perform actions on Lake Formation–managed resources. For more information, see Overview of Lake Formation permissions in the AWS Lake Formation Developer Guide. For information about cross-account data sharing, see Cross-account data sharing in Lake Formation in the AWS Lake Formation Developer Guide.
Before IAM principals can access tables in AWS analytics services, you must grant them Lake Formation permissions on those resources.
Note
If you're the user who performed the table bucket integration, you already have Lake Formation permissions to your tables. If you're the only principal who will access your tables, you can skip this step. You only need to grant Lake Formation permissions on your tables to other IAM principals. This allows other principals to access the table when running queries. For more information, see Granting permission on a table or database.
You must grant other IAM principals Lake Formation permissions on your table resources to work with them in the following services:
- Amazon Redshift
- Amazon Data Firehose
- Amazon QuickSight
- Amazon Athena
Note
For Amazon Data Firehose, which uses a resource link to access your tables, you must separately grant permissions to both the resource link and the target (linked) namespace. For more information, see Granting permission on a resource link.
Granting permission on a table or database
You can grant a principal Lake Formation permissions on a table or database in a table bucket, either through the Lake Formation console or the AWS CLI.
Note
When you grant Lake Formation permissions on a Data Catalog resource to an external account or directly to an IAM principal in another account, Lake Formation uses the AWS Resource Access Manager (AWS RAM) service to share the resource. If the grantee account is in the same organization as the grantor account, the shared resource is available immediately to the grantee. If the grantee account is not in the same organization, AWS RAM sends an invitation to the grantee account to accept or reject the resource grant. Then, to make the shared resource available, the data lake administrator in the grantee account must use the AWS RAM console or AWS CLI to accept the invitation. For more information about cross-account data sharing, see Cross-account data sharing in Lake Formation in the AWS Lake Formation Developer Guide.
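For a grantee account outside the organization, accepting the invitation from the AWS CLI might look like the following sketch. The wrapper function is hypothetical; the invitation ARN would come from the output of aws ram get-resource-share-invitations.

```shell
#!/bin/sh
# Hypothetical wrapper: accepts one AWS RAM resource share invitation.
#   $1 = invitation ARN, as listed by: aws ram get-resource-share-invitations
accept_share_invitation() {
  aws ram accept-resource-share-invitation --resource-share-invitation-arn "$1"
}

# Example use (requires credentials in the grantee account, so it is commented out):
# accept_share_invitation "invitation-arn"
```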
Console
- Open the AWS Lake Formation console at https://console.aws.amazon.com/lakeformation/, and sign in as a data lake administrator. For more information about how to create a data lake administrator, see Create a data lake administrator in the AWS Lake Formation Developer Guide.
- In the navigation pane, choose Data permissions, and then choose Grant.
- On the Grant Permissions page, under Principals, do one of the following:
- For Amazon Athena or Amazon Redshift, choose IAM users and roles, and select the IAM principal you use for queries.
- For Amazon Data Firehose, choose IAM users and roles, and select the service role that you created to stream to tables.
- For QuickSight, choose SAML users and groups, and enter the Amazon Resource Name (ARN) of your QuickSight admin user.
- Under LF-Tags or catalog resources, choose Named Data Catalog resources.
- For Catalogs, choose the subcatalog that you created when you integrated your table bucket, for example, `account-id`:s3tablescatalog/`amzn-s3-demo-bucket`.
- For Databases, choose the S3 table bucket namespace that you created.
- (Optional) For Tables, choose the S3 table that you created in your table bucket.
Note
If you're creating a new table in the Athena query editor, don't select a table.
- Do one of the following:
- If you specified a table in the prior step, for Table permissions, choose Super.
- If you didn't specify a table in the prior step, go to Database permissions. For cross-account data sharing, you can't choose Super to grant the other principal all permissions on your database. Instead, choose more fine-grained permissions, such as Describe.
- Choose Grant.
CLI
- Make sure that you're running the following AWS CLI commands as a data lake administrator. For more information, see Create a data lake administrator in the AWS Lake Formation Developer Guide.
- Run the following command to grant an IAM principal Lake Formation permissions on a table in an S3 table bucket so that the principal can access the table. To use this example, replace the `user input placeholders` with your own information.
aws lakeformation grant-permissions \
--region us-east-1 \
--cli-input-json \
'{
"Principal": {
"DataLakePrincipalIdentifier": "user or role ARN, for example, arn:aws:iam::account-id:role/example-role"
},
"Resource": {
"Table": {
"CatalogId": "account-id:s3tablescatalog/amzn-s3-demo-bucket",
"DatabaseName": "S3 table bucket namespace, for example, test_namespace",
"Name": "S3 table bucket table name, for example test_table"
}
},
"Permissions": [
"ALL"
]
}'
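When no table is specified, the console steps grant permissions at the database (namespace) level instead. A rough CLI counterpart is sketched below. It assumes the Database resource shape of the Lake Formation GrantPermissions API and uses DESCRIBE as the fine-grained permission; the function name and argument order are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints a grant-permissions payload for a namespace.
#   $1 = principal ARN, $2 = account ID, $3 = table bucket name, $4 = namespace
print_database_grant() {
  cat <<EOF
{
  "Principal": {
    "DataLakePrincipalIdentifier": "$1"
  },
  "Resource": {
    "Database": {
      "CatalogId": "$2:s3tablescatalog/$3",
      "Name": "$4"
    }
  },
  "Permissions": ["DESCRIBE"]
}
EOF
}

print_database_grant arn:aws:iam::111122223333:role/example-role \
  111122223333 amzn-s3-demo-bucket test_namespace
```

The printed JSON could then be passed to aws lakeformation grant-permissions through the --cli-input-json option, in the same way as the table-level example above.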