
Apache Iceberg Support in Collate

Apache Iceberg is an open table format for huge analytic datasets. Collate provides comprehensive support for Iceberg tables, enabling you to discover, profile, and track lineage for your Iceberg-based data assets.

How Collate Supports Iceberg Tables

Collate supports Apache Iceberg tables through multiple approaches, depending on where and how your Iceberg tables are managed:

1. Through Query Engines & Data Platforms

For most users, the recommended approach is to use Collate’s existing connectors for the query engine or data platform that accesses your Iceberg tables. This provides full metadata ingestion, profiling, and lineage capabilities without requiring a separate Iceberg connector.

2. Direct Iceberg Catalog Connection

For advanced use cases or when Iceberg tables are not accessible through a supported query engine, Collate provides a dedicated Iceberg connector that connects directly to Iceberg catalogs.

Connectors with Iceberg Support

Production Connectors

The following connectors provide full production support for Iceberg tables:

  • Databricks
  • Snowflake

Dedicated Iceberg Connector

For direct catalog access, Collate provides a dedicated Iceberg connector that supports multiple catalog backends.

Supported Catalog Types

Hive Metastore

Connect to Iceberg tables using Hive Metastore as the catalog backend.

Configuration Requirements:
  • Hive Metastore URI
  • Authentication credentials
  • Warehouse location

REST Catalog

Connect to Iceberg tables using a REST catalog API.

Configuration Requirements:
  • REST catalog endpoint URL
  • Authentication tokens (if required)
  • Warehouse location

AWS Glue

Connect to Iceberg tables using AWS Glue Data Catalog as the backend.

Configuration Requirements:
  • AWS credentials (Access Key ID, Secret Access Key)
  • AWS region
  • Warehouse location (S3 path)

DynamoDB

Connect to Iceberg tables using Amazon DynamoDB as the catalog backend.

Configuration Requirements:
  • DynamoDB table name
  • AWS credentials
  • AWS region
  • Warehouse location
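
As a rough sketch, a direct-connection ingestion recipe for one of these backends could look like the following. This assumes an OpenMetadata-style YAML layout; every key name below (`serviceConnection`, `catalog`, `warehouseLocation`, `uri`) is illustrative, so consult the connector reference for the exact schema.

```yaml
# Illustrative only: key names are assumptions, not the authoritative schema.
source:
  type: iceberg
  serviceName: my_iceberg_service
  serviceConnection:
    config:
      type: Iceberg
      catalog:
        name: analytics_catalog                # custom catalog name
        warehouseLocation: s3://my-bucket/warehouse
        connection:
          # Hive Metastore backend shown here; swap in REST, Glue,
          # or DynamoDB settings per the requirements listed above.
          uri: thrift://metastore.internal:9083
```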

File System Support

The Iceberg connector supports the following file systems for table data:
  • Local File System: For development and testing
  • Amazon S3: Production deployments with S3-based data lakes
  • Azure Blob Storage: Azure-based data lake deployments

Key Features

  • Custom Catalog Naming: Configure catalog names to match your organization’s naming conventions
  • Warehouse Location: Specify the base path where Iceberg table data is stored
  • Owner Property Mapping: Map Iceberg table properties to Collate ownership metadata
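
A hypothetical fragment tying these three features together; the key names (`name`, `warehouseLocation`, `ownershipProperty`) are assumptions for illustration, not the exact Collate schema:

```yaml
# Illustrative only: key names are assumptions.
catalog:
  name: finance_catalog                         # Custom Catalog Naming
  warehouseLocation: s3://datalake/warehouse    # Warehouse Location (base path)
ownershipProperty: owner   # Iceberg table property mapped to Collate ownership
```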

Other Connectors with Iceberg Compatibility

While not explicitly marketed as Iceberg-first connectors, the following Collate connectors may work with Iceberg tables through their respective query engines:
  • Trino
  • Presto
  • Dremio

Choosing the Right Approach

Use this decision tree to select the best connector for your Iceberg tables:

Step 1: Are your Iceberg tables in Databricks?
  • Yes → Use the Databricks connector
  • No → Continue to Step 2

Step 2: Are your Iceberg tables in Snowflake?
  • Yes → Use the Snowflake connector
  • No → Continue to Step 3

Step 3: Do you query Iceberg tables through Trino, Presto, or Dremio?
  • Yes → Use the respective connector (Trino, Presto, or Dremio)
  • No → Continue to Step 4

Step 4: Do you need direct catalog access?
  • Yes → Use the dedicated Iceberg connector (contact support for setup)
  • No → Contact Collate support to discuss your use case
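
The steps above can be sketched as a small helper. The platform names and return strings are just labels for the recommendations in this tree, not Collate API identifiers.

```python
from typing import Optional


def recommend_connector(platform: Optional[str] = None,
                        query_engine: Optional[str] = None,
                        need_direct_catalog_access: bool = False) -> str:
    """Walk the decision tree above and return the recommended connector.

    The `platform` and `query_engine` values are illustrative labels,
    not Collate configuration identifiers.
    """
    if platform == "databricks":                        # Step 1
        return "Databricks connector"
    if platform == "snowflake":                         # Step 2
        return "Snowflake connector"
    if query_engine in {"trino", "presto", "dremio"}:   # Step 3
        return f"{query_engine.title()} connector"
    if need_direct_catalog_access:                      # Step 4
        return "dedicated Iceberg connector"
    return "contact Collate support"


print(recommend_connector(platform="snowflake"))   # Snowflake connector
print(recommend_connector(query_engine="trino"))   # Trino connector
```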

Need Help?

If you’re unsure which connector to use for your Iceberg tables, or if you’re experiencing issues with Iceberg table ingestion: