Polaris Catalog

Get started with Polaris Catalog in LocalStack for Snowflake

Introduction

Polaris Catalog is an open-source catalog for Apache Iceberg that implements the Iceberg REST catalog API. It provides a single place to discover, understand, and govern Iceberg tables used by Snowflake and other engines, making it easier to find and use the right data for analytics and machine learning projects.

The Snowflake emulator supports creating Iceberg tables that use a Polaris catalog through the CREATE CATALOG INTEGRATION statement. LocalStack also provides a localstack/polaris Docker image for running a local Polaris REST catalog.

Getting started

This guide is for users new to Iceberg tables with a Polaris catalog and assumes basic knowledge of SQL and Snowflake. Start your Snowflake emulator and connect to it with an SQL client to run the queries below. The shell commands also require docker, curl, jq, openssl, and awslocal.

This guide shows how to use a Polaris REST catalog to create Iceberg tables in the Snowflake emulator by:

  • Launching the Polaris Catalog service
  • Creating a Polaris catalog and an S3 bucket
  • Setting up an external volume
  • Creating a catalog integration
  • Creating an Iceberg table
  • Querying the Iceberg table

Start the Polaris catalog container

The following command starts the Polaris catalog container using the localstack/polaris Docker image:

docker run -d --name polaris-test \
  -p 8181:8181 -p 8182:8182 \
  -e AWS_REGION=us-east-1 \
  -e AWS_ACCESS_KEY_ID=test \
  -e AWS_SECRET_ACCESS_KEY=test \
  -e AWS_ENDPOINT_URL=http://localhost:4566 \
  -e POLARIS_BOOTSTRAP_CREDENTIALS=default-realm,root,s3cr3t \
  -e polaris.realm-context.realms=default-realm \
  -e quarkus.otel.sdk.disabled=true \
  localstack/polaris:latest

Wait for Polaris to become healthy:

curl -X GET http://localhost:8182/health
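The health check above is a single request, so it can fail while the container is still starting. A minimal polling sketch follows; the function name wait_for_polaris and its default attempt count and delay are illustrative assumptions, and it treats any 2xx response from the health URL as healthy:

```shell
# Hypothetical helper: poll a Polaris health URL until it returns a 2xx status
# or the attempts run out.
# Usage: wait_for_polaris [url] [max_attempts] [delay_seconds]
wait_for_polaris() {
  url="${1:-http://localhost:8182/health}"
  max_attempts="${2:-30}"
  delay="${3:-2}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # -s hides progress output, -f makes HTTP 4xx/5xx count as failures,
    # --max-time stops a single request from hanging.
    if curl -sf --max-time 2 -o /dev/null "$url"; then
      echo "Polaris is healthy (attempt $attempt)"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "Polaris did not become healthy after $max_attempts attempts" >&2
  return 1
}
```

For example, run wait_for_polaris before the authentication step; it returns non-zero if Polaris never responds.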

Authenticate and create Polaris catalog

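Set the following variables and retrieve an access token. Run all shell commands in this guide in the same shell session, since later steps reuse these variables: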

REALM="default-realm"
CLIENT_ID="root"
CLIENT_SECRET="s3cr3t"
BUCKET_NAME="test-bucket-$(openssl rand -hex 4)"
CATALOG_NAME="polaris"

TOKEN=$(curl -s -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
  -H "Polaris-Realm: $REALM" \
  -d "grant_type=client_credentials&client_id=$CLIENT_ID&client_secret=$CLIENT_SECRET&scope=PRINCIPAL_ROLE:ALL" | jq -r '.access_token')

The TOKEN variable will contain the access token.
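If the credentials or realm are wrong, the token response may not contain an access_token field, and jq -r then prints the literal string null. A small guard, sketched here with a hypothetical check_token helper, can stop the walkthrough early instead of sending requests with an invalid token:

```shell
# Hypothetical helper: fail if the token is empty or the literal "null"
# that jq -r prints when .access_token is missing from the response.
check_token() {
  case "$1" in
    "" | null)
      echo "Failed to obtain a Polaris access token" >&2
      return 1
      ;;
  esac
  return 0
}
```

For example: check_token "$TOKEN" && echo "Token acquired".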

Create a catalog:

curl -s -X POST http://localhost:8181/api/management/v1/catalogs \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "catalog": {
      "name": "'"$CATALOG_NAME"'",
      "type": "INTERNAL",
      "properties": {
        "default-base-location": "s3://'"$BUCKET_NAME"'/test"
      },
      "storageConfigInfo": {
        "storageType": "S3_COMPATIBLE",
        "allowedLocations": ["s3://'"$BUCKET_NAME"'/"],
        "s3.roleArn": "arn:aws:iam::000000000000:role/'"$BUCKET_NAME"'",
        "region": "us-east-1",
        "s3.pathStyleAccess": true,
        "s3.endpoint": "http://localhost:4566"
      }
    }
  }'
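Because the catalog request uses curl -s and ignores the response, an error (for example, a rejected token) is easy to miss. One way to surface it, assuming you only need the HTTP status and not the response body, is a wrapper such as the hypothetical expect_2xx below:

```shell
# Hypothetical helper: run curl with the given arguments, discard the body,
# and succeed only if the HTTP status code is 2xx. Prints the status code.
expect_2xx() {
  # -o /dev/null discards the body; -w prints only the status code.
  # "|| true" keeps a failed connection from aborting shells run with set -e;
  # curl still prints 000 in that case.
  status=$(curl -s -o /dev/null -w '%{http_code}' "$@" || true)
  case "$status" in
    2??)
      echo "$status"
      return 0
      ;;
    *)
      echo "Request failed with HTTP status ${status:-unknown}" >&2
      return 1
      ;;
  esac
}
```

To use it, replace curl -s at the start of the catalog request above with expect_2xx and keep the remaining arguments unchanged. The same wrapper also works for the grant request.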

Grant the catalog_admin catalog role the TABLE_WRITE_DATA privilege on the catalog:

curl -s -X PUT "http://localhost:8181/api/management/v1/catalogs/$CATALOG_NAME/catalog-roles/catalog_admin/grants" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"type": "catalog", "privilege": "TABLE_WRITE_DATA"}'

Create a bucket

Create the bucket referenced in the catalog's storage configuration using awslocal:

awslocal s3 mb "s3://$BUCKET_NAME"

Create an external volume

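In your SQL client, create an external volume using the CREATE EXTERNAL VOLUME statement. The statement below uses test-bucket as a placeholder; replace it in STORAGE_BASE_URL and STORAGE_AWS_ROLE_ARN with the bucket name stored in $BUCKET_NAME (print it with echo $BUCKET_NAME), so that the volume points at the bucket used by the Polaris catalog: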

CREATE EXTERNAL VOLUME polaris_volume
STORAGE_LOCATIONS = (
  (
    NAME = aws_s3_test
    STORAGE_PROVIDER = S3
    STORAGE_BASE_URL = 's3://test-bucket/'
    STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::000000000000:role/test-bucket'
    ENCRYPTION = (TYPE = AWS_SSE_S3)
  )
)
ALLOW_WRITES = TRUE;
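SQL clients do not expand shell variables, so the bucket name has to be substituted by hand. As a convenience, a shell helper (the name external_volume_sql is hypothetical) can print the statement above with the generated bucket name filled in:

```shell
# Hypothetical helper: print the CREATE EXTERNAL VOLUME statement for a bucket,
# so STORAGE_BASE_URL and the role ARN match the bucket created earlier.
external_volume_sql() {
  bucket="$1"
  cat <<EOF
CREATE EXTERNAL VOLUME polaris_volume
STORAGE_LOCATIONS = (
  (
    NAME = aws_s3_test
    STORAGE_PROVIDER = S3
    STORAGE_BASE_URL = 's3://${bucket}/'
    STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::000000000000:role/${bucket}'
    ENCRYPTION = (TYPE = AWS_SSE_S3)
  )
)
ALLOW_WRITES = TRUE;
EOF
}
```

For example, external_volume_sql "$BUCKET_NAME" > create_volume.sql writes the statement to a file that you can open in your SQL client.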

Create catalog integration

Create a catalog integration using the CREATE CATALOG INTEGRATION statement:

CREATE CATALOG INTEGRATION polaris_catalog
CATALOG_SOURCE = ICEBERG_REST
TABLE_FORMAT = ICEBERG
CATALOG_NAMESPACE = 'test_namespace'
REST_CONFIG = (
  CATALOG_URI = 'http://localhost:8181',
  CATALOG_NAME = 'polaris'
)
REST_AUTHENTICATION = (
  TYPE = OAUTH,
  OAUTH_CLIENT_ID = 'root',
  OAUTH_CLIENT_SECRET = 's3cr3t',
  OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
)
ENABLED = TRUE
REFRESH_INTERVAL_SECONDS = 60
COMMENT = 'Polaris catalog integration';

Create and query an Iceberg table

Create an Iceberg table that uses the catalog integration and the external volume:

CREATE ICEBERG TABLE polaris_iceberg_table (c1 TEXT)
CATALOG = 'polaris_catalog',
EXTERNAL_VOLUME = 'polaris_volume',
BASE_LOCATION = 'test/test_namespace';

Insert and query data:

INSERT INTO polaris_iceberg_table(c1) VALUES ('test'), ('polaris'), ('iceberg');

SELECT * FROM polaris_iceberg_table;

The output should be:

+----------+
| c1       |
|----------|
| iceberg  |
| polaris  |
| test     |
+----------+

The table data and metadata are persisted in the S3 bucket. List them with:

awslocal s3 ls s3://$BUCKET_NAME/test/test_namespace/

You will see:

  • data/ with .parquet files
  • metadata/ with Iceberg metadata files
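Besides the S3 listing, the table should also be visible through the Polaris Iceberg REST API. The following is a sketch, assuming Polaris serves the Iceberg REST endpoints under /api/catalog and uses the catalog name as the URL prefix; the polaris_tables_url helper is hypothetical and handles single-level namespaces only:

```shell
# Hypothetical helper: build the Iceberg REST "list tables" URL for a
# single-level namespace in a Polaris catalog.
# Usage: polaris_tables_url [base_url] [catalog] [namespace]
polaris_tables_url() {
  base="${1:-http://localhost:8181}"
  catalog="${2:-polaris}"
  namespace="${3:-test_namespace}"
  echo "${base}/api/catalog/v1/${catalog}/namespaces/${namespace}/tables"
}

# Example (reuses TOKEN, REALM, and CATALOG_NAME from the authentication step):
# curl -s -H "Authorization: Bearer $TOKEN" -H "Polaris-Realm: $REALM" \
#   "$(polaris_tables_url http://localhost:8181 "$CATALOG_NAME" test_namespace)" | jq
```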

Configuration options

The following configuration options are available for the Polaris Catalog Docker image provided by LocalStack:

| Environment variable          | Description                                                                   | Default value | Required                     |
|-------------------------------|-------------------------------------------------------------------------------|---------------|------------------------------|
| AWS_REGION                    | The AWS region to use                                                         | us-east-1     | Yes                          |
| AWS_ACCESS_KEY_ID             | AWS access key ID for accessing AWS services                                  | -             | Yes, when using AWS services |
| AWS_SECRET_ACCESS_KEY         | AWS secret access key for accessing AWS services                              | -             | Yes, when using AWS services |
| AWS_ENDPOINT_URL              | Custom endpoint URL for AWS services (e.g., for LocalStack)                   | -             | No                           |
| POLARIS_BOOTSTRAP_CREDENTIALS | Initial realm, username, and password, in the format realm,username,password | -             | Yes                          |
| polaris.realm-context.realms  | List of realms to create or use                                               | -             | Yes                          |
| quarkus.otel.sdk.disabled     | Disables the OpenTelemetry SDK                                                | false         | No                           |
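A malformed POLARIS_BOOTSTRAP_CREDENTIALS value is an easy mistake to make. The following sketch, with a hypothetical valid_bootstrap_credentials helper, checks that a value contains exactly three non-empty, comma-separated fields. It covers a single realm,username,password triple only:

```shell
# Hypothetical helper: succeed only for values of the form realm,username,password
# where all three fields are non-empty.
valid_bootstrap_credentials() {
  case "$1" in
    *,*,*,*) return 1 ;;   # three or more commas: too many fields
    ?*,?*,?*) return 0 ;;  # exactly two commas with non-empty fields
    *) return 1 ;;         # too few fields or an empty field
  esac
}
```

For example, valid_bootstrap_credentials "default-realm,root,s3cr3t" succeeds, while "default-realm,root" fails.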

The following logging options are available for the Polaris Catalog Docker image:

| Logging option                                      | Description                                                             |
|-----------------------------------------------------|-------------------------------------------------------------------------|
| quarkus.log.level                                   | Sets the overall logging level (e.g., DEBUG)                            |
| quarkus.log.console.level                           | Sets the console logging level (e.g., DEBUG)                            |
| quarkus.log.category."org.apache.polaris".level     | Sets the logging level specifically for the Polaris components          |
| quarkus.log.category."org.apache.polaris".min-level | Sets the minimum logging level for the Polaris components (e.g., TRACE) |
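To enable debug logging, these options can be passed to docker run with -e, in the same way as polaris.realm-context.realms in the docker run command at the start of this guide. The hypothetical polaris_debug_log_args helper below prints the -e flags for a given level; it assumes the image accepts these property names verbatim as environment variables:

```shell
# Hypothetical helper: print docker "-e" flags that set the logging options
# from the table above to the given level (default DEBUG), one word per line.
polaris_debug_log_args() {
  level="${1:-DEBUG}"
  printf '%s\n' \
    -e "quarkus.log.level=$level" \
    -e "quarkus.log.console.level=$level" \
    -e "quarkus.log.category.\"org.apache.polaris\".level=$level"
}
```

For example: docker run -d --name polaris-test ... $(polaris_debug_log_args DEBUG) localstack/polaris:latest, keeping the other flags from the original command. For TRACE output, the min-level option from the table may also need to be lowered.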