Using Amazon S3 Tables with AWS analytics services

To make tables in your account accessible to AWS analytics services, you integrate your Amazon S3 table buckets with Amazon SageMaker Lakehouse. This integration allows AWS analytics services to automatically discover and access your table data. After the integration is complete, you can work with your tables in services such as Amazon Athena, Amazon Redshift, Amazon QuickSight, and Amazon Data Firehose.

Note

This integration uses the AWS Glue and AWS Lake Formation services and might incur AWS Glue request and storage costs. For more information, see AWS Glue Pricing.

Additional pricing applies for running queries on your S3 tables. For more information, see pricing information for the query engine that you're using.

How the integration works

When you create a table bucket in the console, Amazon S3 initiates the following actions to integrate table buckets in the Region that you have selected with AWS analytics services:

Creates a new AWS Identity and Access Management (IAM) service role that gives Lake Formation access to all your table buckets.
Using the service role, Lake Formation registers table buckets in the current Region. This allows Lake Formation to manage access, permissions, and governance for all current and future table buckets in that Region.
Adds the s3tablescatalog catalog to the AWS Glue Data Catalog in the current Region. Adding the s3tablescatalog catalog allows all your table buckets, namespaces, and tables to be populated in the Data Catalog.

Note

These actions are automated through the Amazon S3 console. If you perform this integration programmatically, you must manually take all of these actions.

You integrate your table buckets once per AWS Region. After the integration is completed, all current and future table buckets, namespaces, and tables are added to the AWS Glue Data Catalog in that Region.

The following illustration shows how the s3tablescatalog catalog automatically populates table buckets, namespaces, and tables in the current Region as corresponding objects in the Data Catalog. Table buckets are populated as subcatalogs. Namespaces within a table bucket are populated as databases within their respective subcatalogs. Tables are populated as tables in their respective databases.

Figure: The ways that table resources are represented in the AWS Glue Data Catalog.
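
The mapping described above can be sketched in a few lines of Python. This is only an illustration: the `CatalogLocation` type and the `to_catalog_location` function are hypothetical names, and the `account-id:s3tablescatalog/bucket-name` catalog ID format is assumed from the grant-permissions example later in this topic.

```python
from typing import NamedTuple


class CatalogLocation(NamedTuple):
    """Where an S3 table appears in the Data Catalog (illustrative only)."""
    catalog_id: str  # subcatalog that represents the table bucket
    database: str    # database that represents the namespace
    table: str       # table that represents the S3 table


def to_catalog_location(account_id: str, table_bucket: str,
                        namespace: str, table: str) -> CatalogLocation:
    # Table bucket -> subcatalog under the s3tablescatalog catalog
    catalog_id = f"{account_id}:s3tablescatalog/{table_bucket}"
    # Namespace -> database, table -> table (names carry over unchanged)
    return CatalogLocation(catalog_id, namespace, table)


location = to_catalog_location(
    "111122223333", "amzn-s3-demo-table-bucket", "test_namespace", "test_table"
)
print(location.catalog_id)
print(location.database, location.table)
```

The account ID, bucket name, namespace, and table name are placeholders. Replace them with your own values.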

How permissions work

We recommend integrating your table buckets with AWS analytics services so that you can work with your table data across services that use the AWS Glue Data Catalog as a metadata store. The integration enables fine-grained access control through AWS Lake Formation. This security approach means that, in addition to AWS Identity and Access Management (IAM) permissions, you must grant your IAM principal Lake Formation permissions on your tables before you can work with them.

There are two main types of permissions in AWS Lake Formation:

Metadata access permissions control the ability to create, read, update, and delete metadata databases and tables in the Data Catalog.
Underlying data access permissions control the ability to read and write data to the underlying Amazon S3 locations that the Data Catalog resources point to.

Lake Formation uses a combination of its own permissions model and the IAM permissions model to control access to Data Catalog resources and underlying data:

For a request to access Data Catalog resources or underlying data to succeed, the request must pass permission checks by both IAM and Lake Formation.
IAM permissions control access to the Lake Formation and AWS Glue APIs and resources, whereas Lake Formation permissions control access to the Data Catalog resources, Amazon S3 locations, and the underlying data.

Lake Formation permissions apply only in the Region in which they were granted. To receive Lake Formation permissions, a principal must be authorized by a data lake administrator or by another principal that has the necessary permissions.
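
The following toy model illustrates the two rules above: a request must pass both the IAM check and the Lake Formation check, and Lake Formation grants are scoped to a Region. It is a simplified sketch for reasoning only, not how either service evaluates policies. The `AccessModel` class, the `analyst` principal, and the resource string are hypothetical.

```python
class AccessModel:
    """Toy model: access requires an IAM allow AND a Lake Formation grant."""

    def __init__(self):
        self.iam_allowed = set()  # entries: (principal, action)
        self.lf_grants = set()    # entries: (region, principal, resource, permission)

    def can_access(self, region, principal, action, resource, permission):
        iam_ok = (principal, action) in self.iam_allowed
        # The Region is part of the key because grants apply only
        # in the Region in which they were made.
        lf_ok = (region, principal, resource, permission) in self.lf_grants
        return iam_ok and lf_ok


model = AccessModel()
model.iam_allowed.add(("analyst", "glue:GetTable"))
model.lf_grants.add(("us-east-1", "analyst", "test_namespace.test_table", "SELECT"))

same_region = model.can_access(
    "us-east-1", "analyst", "glue:GetTable", "test_namespace.test_table", "SELECT")
other_region = model.can_access(
    "us-west-2", "analyst", "glue:GetTable", "test_namespace.test_table", "SELECT")
print(same_region, other_region)
```

In this sketch, the same principal is allowed in us-east-1 but denied in us-west-2, because no Lake Formation grant exists in the second Region.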

For more information, see Overview of Lake Formation permissions in the AWS Lake Formation Developer Guide.

Make sure that you follow the steps in Prerequisites for integration and Integrating table buckets with AWS analytics services so that you have the appropriate permissions to access the AWS Glue Data Catalog and your table resources, and to work with AWS analytics services.

Important

If you aren't the user who performed the table buckets integration with AWS analytics services for your account, you must be granted the necessary Lake Formation permissions on the table. For more information, see Granting permission on a table or database.

Prerequisites for integration

The following prerequisites are required to integrate table buckets with AWS analytics services:

Create a table bucket.
Attach the AWSLakeFormationDataAdmin AWS managed policy to your AWS Identity and Access Management (IAM) principal to make that user a data lake administrator. For more information about how to create a data lake administrator, see Create a data lake administrator in the AWS Lake Formation Developer Guide.
Add permissions for the glue:PassConnection operation to your IAM principal.
Add permissions for the lakeformation:RegisterResource and lakeformation:RegisterResourceWithPrivilegedAccess operations to your IAM principal.
Update to the latest version of the AWS Command Line Interface (AWS CLI).

Important

When creating tables, make sure that you use all lowercase letters in your table names and table definitions. For example, make sure that your column names are all lowercase. If your table name or table definition contains capital letters, the table isn't supported by AWS Lake Formation or the AWS Glue Data Catalog. In this case, your table won't be visible to AWS analytics services such as Amazon Athena, even if your table buckets are integrated with AWS analytics services.

If your table definition contains capital letters, you receive the following error message when running a SELECT query in Athena: "GENERIC_INTERNAL_ERROR: Get table request failed: com.amazonaws.services.glue.model.ValidationException: Unsupported Federation Resource - Invalid table or column names."
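
Before you create a table, you might check its names for capital letters. The following Python sketch shows one way to do that. The function name is hypothetical, and the check covers only the lowercase requirement described above, not any other naming rules.

```python
def find_names_with_capitals(table_name, column_names):
    """Return every table or column name that contains an uppercase letter."""
    candidates = [table_name, *column_names]
    return [name for name in candidates
            if any(ch.isupper() for ch in name)]


# Example: one column name contains capital letters.
offenders = find_names_with_capitals(
    "daily_sales", ["sale_date", "Product_Category", "sales_amount"]
)
print(offenders)
```

An empty list means that no capital letters were found in the table name or the column names.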

Integrating table buckets with AWS analytics services

This integration must be done once per AWS Region.

Important

The AWS analytics services integration now uses the WithPrivilegedAccess option in the RegisterResource Lake Formation API operation to register S3 table buckets. The integration also now creates the s3tablescatalog catalog in the AWS Glue Data Catalog by using the AllowFullTableExternalDataAccess option in the CreateCatalog AWS Glue API operation.

If you set up the integration with the preview release, you can continue to use your current integration. However, the updated integration process provides performance improvements, so we recommend migrating. To migrate to the updated integration, see Migrating to the updated integration process.

To integrate table buckets using the Amazon S3 console

Open the Amazon S3 console at https://console.aws.amazon.com/s3/.
In the left navigation pane, choose Table buckets.
Choose Create table bucket.
The Create table bucket page opens.
Enter a Table bucket name and make sure that the Enable integration checkbox is selected.
Choose Create table bucket. Amazon S3 will attempt to automatically integrate your table buckets in that Region.

The first time that you integrate table buckets in any Region, Amazon S3 creates a new IAM service role on your behalf. This role allows Lake Formation to access all table buckets in your account and federate access to your tables in AWS Glue Data Catalog.

To integrate table buckets using the AWS CLI

The following steps show how to use the AWS CLI to integrate table buckets. To use these steps, replace the `user input placeholders` with your own information.

Create a table bucket.

aws s3tables create-table-bucket \
--region us-east-1 \
--name amzn-s3-demo-table-bucket

Create an IAM service role that allows Lake Formation to access your table resources.

Create a file called Role-Trust-Policy.json that contains the following trust policy:

{  
    "Version": "2012-10-17",  
    "Statement": [  
      {  
        "Sid": "LakeFormationDataAccessPolicy",  
        "Effect": "Allow",  
        "Principal": {  
          "Service": "lakeformation.amazonaws.com"  
        },  
        "Action": [  
            "sts:AssumeRole",  
            "sts:SetContext",  
            "sts:SetSourceIdentity"  
        ],  
        "Condition": {  
          "StringEquals": {  
            "aws:SourceAccount": "111122223333"  
          }  
        }  
      }  
    ]  
}

Create the IAM service role by using the following command:

aws iam create-role \
--role-name S3TablesRoleForLakeFormation \
--assume-role-policy-document file://Role-Trust-Policy.json
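
If you prefer to generate the trust policy instead of editing the account ID by hand, a small Python helper such as the following might be useful. This is a minimal sketch: the function name is hypothetical, and the policy content mirrors the Role-Trust-Policy.json example above.

```python
import json


def build_trust_policy(account_id: str) -> dict:
    """Build the Lake Formation trust policy for the given AWS account ID."""
    # AWS account IDs are 12-digit numbers; reject anything else early.
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("account_id must be a 12-digit string")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "LakeFormationDataAccessPolicy",
                "Effect": "Allow",
                "Principal": {"Service": "lakeformation.amazonaws.com"},
                "Action": [
                    "sts:AssumeRole",
                    "sts:SetContext",
                    "sts:SetSourceIdentity",
                ],
                # Limit role assumption to requests made on behalf of this account.
                "Condition": {"StringEquals": {"aws:SourceAccount": account_id}},
            }
        ],
    }


# Placeholder account ID; replace it with your own.
print(json.dumps(build_trust_policy("111122223333"), indent=4))
```

You can save the printed output as Role-Trust-Policy.json and pass it to the create-role command.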

Create a file called LF-GluePolicy.json that contains the following policy:

{  
    "Version": "2012-10-17",  
    "Statement": [  
        {  
            "Sid": "LakeFormationPermissionsForS3ListTableBucket",  
            "Effect": "Allow",  
            "Action": [  
                "s3tables:ListTableBuckets"  
            ],  
            "Resource": [  
                "*"  
            ]  
        },  
        {  
            "Sid": "LakeFormationDataAccessPermissionsForS3TableBucket",  
            "Effect": "Allow",  
            "Action": [  
                "s3tables:CreateTableBucket",  
                "s3tables:GetTableBucket",  
                "s3tables:CreateNamespace",  
                "s3tables:GetNamespace",  
                "s3tables:ListNamespaces",  
                "s3tables:DeleteNamespace",  
                "s3tables:DeleteTableBucket",  
                "s3tables:CreateTable",  
                "s3tables:DeleteTable",  
                "s3tables:GetTable",  
                "s3tables:ListTables",  
                "s3tables:RenameTable",  
                "s3tables:UpdateTableMetadataLocation",  
                "s3tables:GetTableMetadataLocation",  
                "s3tables:GetTableData",  
                "s3tables:PutTableData"  
            ],  
            "Resource": [  
                "arn:aws:s3tables:us-east-1:111122223333:bucket/*"  
            ]  
        }  
    ]  
}

Attach the policy to the role by using the following command:

aws iam put-role-policy \
--role-name S3TablesRoleForLakeFormation \
--policy-name LakeFormationDataAccessPermissionsForS3TableBucket \
--policy-document file://LF-GluePolicy.json

Create a file called input.json that contains the following:

{  
    "ResourceArn": "arn:aws:s3tables:us-east-1:111122223333:bucket/*",  
    "WithFederation": true,  
    "RoleArn": "arn:aws:iam::111122223333:role/S3TablesRoleForLakeFormation"  
}

Register your table buckets with Lake Formation by using the following command:

aws lakeformation register-resource \
--region us-east-1 \
--with-privileged-access \
--cli-input-json file://input.json

Create a file called catalog.json that contains the following catalog:

{  
   "Name": "s3tablescatalog",  
   "CatalogInput": {  
      "FederatedCatalog": {  
          "Identifier": "arn:aws:s3tables:us-east-1:111122223333:bucket/*",  
          "ConnectionName": "aws:s3tables"  
       },  
       "CreateDatabaseDefaultPermissions":[],  
       "CreateTableDefaultPermissions":[],  
       "AllowFullTableExternalDataAccess": "True"  
   }  
}

Create the s3tablescatalog catalog by using the following command. Creating this catalog populates the AWS Glue Data Catalog with objects corresponding to table buckets, namespaces, and tables.

aws glue create-catalog \
--region us-east-1 \
--cli-input-json file://catalog.json
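
The input.json and catalog.json files both refer to the same wildcard table bucket ARN, so it can help to generate them from one Region and account ID. The following Python sketch shows one way to do that. The function names are hypothetical, the `aws` partition is assumed as in the examples above, and the field values mirror the two JSON files shown earlier.

```python
import json


def table_bucket_wildcard_arn(region: str, account_id: str) -> str:
    """ARN pattern that covers all table buckets in one Region and account."""
    return f"arn:aws:s3tables:{region}:{account_id}:bucket/*"


def build_register_input(region: str, account_id: str, role_name: str) -> dict:
    """Content for input.json (used with register-resource)."""
    return {
        "ResourceArn": table_bucket_wildcard_arn(region, account_id),
        "WithFederation": True,
        "RoleArn": f"arn:aws:iam::{account_id}:role/{role_name}",
    }


def build_catalog_input(region: str, account_id: str) -> dict:
    """Content for catalog.json (used with create-catalog)."""
    return {
        "Name": "s3tablescatalog",
        "CatalogInput": {
            "FederatedCatalog": {
                "Identifier": table_bucket_wildcard_arn(region, account_id),
                "ConnectionName": "aws:s3tables",
            },
            "CreateDatabaseDefaultPermissions": [],
            "CreateTableDefaultPermissions": [],
            "AllowFullTableExternalDataAccess": "True",
        },
    }


# Placeholder Region, account ID, and role name; replace them with your own.
register_input = build_register_input(
    "us-east-1", "111122223333", "S3TablesRoleForLakeFormation")
catalog_input = build_catalog_input("us-east-1", "111122223333")
print(json.dumps(register_input, indent=4))
print(json.dumps(catalog_input, indent=4))
```

Because both documents are built from the same helper, the registered resource ARN and the federated catalog identifier can't drift apart.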

Verify that the s3tablescatalog catalog was added in AWS Glue by using the following command:

aws glue get-catalog --catalog-id s3tablescatalog

Migrating to the updated integration process

The AWS analytics services integration process has been updated. If you set up the integration with the preview release, you can continue to use your current integration. However, the updated integration process provides performance improvements, so we recommend migrating by using the following steps. For more information about the migration or integration process, see Creating an Amazon S3 Tables catalog in the AWS Glue Data Catalog in the AWS Lake Formation Developer Guide.

Open the AWS Lake Formation console at https://console.aws.amazon.com/lakeformation/, and sign in as a data lake administrator. For more information about how to create a data lake administrator, see Create a data lake administrator in the AWS Lake Formation Developer Guide.
Delete your s3tablescatalog catalog by doing the following:
- In the left navigation pane, choose Catalogs.
- Select the option button next to the s3tablescatalog catalog in the Catalogs list. On the Actions menu, choose Delete.
Deregister the data location for the s3tablescatalog catalog by doing the following:
- In the left navigation pane, go to the Administration section, and choose Data lake locations.
- Select the option button next to the s3tablescatalog data lake location, for example, s3://tables:`region`:`account-id`:bucket/*.
- On the Actions menu, choose Remove.
- In the confirmation dialog box that appears, choose Remove.
Now that you've deleted your s3tablescatalog catalog and data lake location, you can follow the steps to integrate your table buckets with AWS analytics services by using the updated integration process.

Granting Lake Formation permissions on your table resources

After your table buckets are integrated with the AWS analytics services, Lake Formation manages access to your table resources. Lake Formation uses its own permissions model (Lake Formation permissions) that enables fine-grained access control for Data Catalog resources. Lake Formation requires that each IAM principal (user or role) be authorized to perform actions on Lake Formation–managed resources. For more information, see Overview of Lake Formation permissions in the AWS Lake Formation Developer Guide. For information about cross-account data sharing, see Cross-account data sharing in Lake Formation in the AWS Lake Formation Developer Guide.

Before IAM principals can access tables in AWS analytics services, you must grant them Lake Formation permissions on those resources.

Note

If you're the user who performed the table bucket integration, you already have Lake Formation permissions to your tables. If you're the only principal who will access your tables, you can skip this step. You only need to grant Lake Formation permissions on your tables to other IAM principals. This allows other principals to access the table when running queries. For more information, see Granting permission on a table or database.

You must grant other IAM principals Lake Formation permissions on your table resources to work with them in the following services:

Amazon Redshift
Amazon Data Firehose
Amazon QuickSight
Amazon Athena

Granting permission on a table or database

You can grant a principal Lake Formation permissions on a table or database in a table bucket, either through the Lake Formation console or the AWS CLI.

Note

When you grant Lake Formation permissions on a Data Catalog resource to an external account or directly to an IAM principal in another account, Lake Formation uses the AWS Resource Access Manager (AWS RAM) service to share the resource. If the grantee account is in the same organization as the grantor account, the shared resource is available immediately to the grantee. If the grantee account is not in the same organization, AWS RAM sends an invitation to the grantee account to accept or reject the resource grant. Then, to make the shared resource available, the data lake administrator in the grantee account must use the AWS RAM console or AWS CLI to accept the invitation. For more information about cross-account data sharing, see Cross-account data sharing in Lake Formation in the AWS Lake Formation Developer Guide.

Console

Open the AWS Lake Formation console at https://console.aws.amazon.com/lakeformation/, and sign in as a data lake administrator. For more information about how to create a data lake administrator, see Create a data lake administrator in the AWS Lake Formation Developer Guide.
In the navigation pane, choose Data permissions, and then choose Grant.
On the Grant Permissions page, under Principals, do one of the following:
- For Amazon Athena or Amazon Redshift, choose IAM users and roles, and select the IAM principal you use for queries.
- For Amazon Data Firehose, choose IAM users and roles, and select the service role that you created to stream to tables.
- For QuickSight, choose SAML users and groups, and enter the Amazon Resource Name (ARN) of your QuickSight admin user.
Under LF-Tags or catalog resources, choose Named Data Catalog resources.
For Catalogs, choose the subcatalog that you created when you integrated your table bucket, for example, `account-id`:s3tablescatalog/`amzn-s3-demo-bucket`.
For Databases, choose the S3 table bucket namespace that you created.
(Optional) For Tables, choose the S3 table that you created in your table bucket.

Note

If you're creating a new table in the Athena query editor, don't select a table.

Do one of the following:

If you specified a table in the prior step, for Table permissions, choose Super.
If you didn't specify a table in the prior step, go to Database permissions. For cross-account data sharing, you can't choose Super to grant the other principal all permissions on your database. Instead, choose more fine-grained permissions, such as Describe.

Choose Grant.

CLI

Make sure that you're running the following AWS CLI commands as a data lake administrator. For more information, see Create a data lake administrator in the AWS Lake Formation Developer Guide.
Run the following command to grant an IAM principal Lake Formation permissions on a table in an S3 table bucket. To use this example, replace the `user input placeholders` with your own information.

aws lakeformation grant-permissions \
--region us-east-1 \
--cli-input-json \
'{  
    "Principal": {  
        "DataLakePrincipalIdentifier": "user or role ARN, for example, arn:aws:iam::account-id:role/example-role"  
    },  
    "Resource": {  
        "Table": {  
            "CatalogId": "account-id:s3tablescatalog/amzn-s3-demo-bucket",  
            "DatabaseName": "S3 table bucket namespace, for example, test_namespace",  
            "Name": "S3 table bucket table name, for example, test_table"
        }  
    },  
    "Permissions": [  
        "ALL"  
    ]  
}'
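
If you grant permissions on many tables, you might build the request JSON in a script instead of editing it inline. The following Python sketch assembles the same structure as the preceding command. The function name and the example role ARN are hypothetical. The field names and the catalog ID format are taken from the example above.

```python
import json


def build_grant_request(principal_arn, account_id, table_bucket,
                        namespace, table, permissions=("ALL",)):
    """Build the JSON input for `aws lakeformation grant-permissions`."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "Table": {
                # The table bucket appears as a subcatalog of s3tablescatalog.
                "CatalogId": f"{account_id}:s3tablescatalog/{table_bucket}",
                "DatabaseName": namespace,  # S3 table bucket namespace
                "Name": table,              # S3 table name
            }
        },
        "Permissions": list(permissions),
    }


# Placeholder values; replace them with your own.
request = build_grant_request(
    principal_arn="arn:aws:iam::111122223333:role/example-role",
    account_id="111122223333",
    table_bucket="amzn-s3-demo-bucket",
    namespace="test_namespace",
    table="test_table",
)
print(json.dumps(request, indent=4))
```

You can save the printed output to a file and pass it to the command with the --cli-input-json file:// form that the earlier steps in this topic use.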