
Apache Iceberg Support in Collate

Apache Iceberg is an open table format for huge analytic datasets. Collate provides comprehensive support for Iceberg tables, enabling you to discover, profile, and track lineage for your Iceberg-based data assets.

How Collate Supports Iceberg Tables

Collate supports Apache Iceberg tables through multiple approaches, depending on where and how your Iceberg tables are managed:

1. Through Query Engines & Data Platforms

For most users, the recommended approach is to use Collate’s existing connectors for the query engine or data platform that accesses your Iceberg tables. This provides full metadata ingestion, profiling, and lineage capabilities without requiring a separate Iceberg connector.

2. Direct Iceberg Catalog Connection

For advanced use cases or when Iceberg tables are not accessible through a supported query engine, Collate provides a dedicated Iceberg connector that connects directly to Iceberg catalogs.

Connectors with Iceberg Support

Production Connectors

The following connectors provide full production support for Iceberg tables:

  • Databricks
  • Snowflake

Dedicated Iceberg Connector

For direct catalog access, Collate provides a dedicated Iceberg connector that supports multiple catalog backends.

Supported Catalog Types

Hive Metastore

Connect to Iceberg tables using Hive Metastore as the catalog backend.

Configuration Requirements:
  • Hive Metastore URI
  • Authentication credentials
  • Warehouse location

REST Catalog

Connect to Iceberg tables using a REST catalog API.

Configuration Requirements:
  • REST catalog endpoint URL
  • Authentication tokens (if required)
  • Warehouse location

AWS Glue

Connect to Iceberg tables using AWS Glue Data Catalog as the backend.

Configuration Requirements:
  • AWS credentials (Access Key ID, Secret Access Key)
  • AWS region
  • Warehouse location (S3 path)

DynamoDB

Connect to Iceberg tables using Amazon DynamoDB as the catalog backend.

Configuration Requirements:
  • DynamoDB table name
  • AWS credentials
  • AWS region
  • Warehouse location
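
As a rough sketch, a direct-connection ingestion recipe for one of these backends could look like the following. This assumes an OpenMetadata-style YAML layout; every key name below (`serviceConnection`, `catalog`, `warehouseLocation`, `uri`) is illustrative, so consult the connector reference for the exact schema.

```yaml
# Illustrative only: key names are assumptions, not the authoritative schema.
source:
  type: iceberg
  serviceName: my_iceberg_service
  serviceConnection:
    config:
      type: Iceberg
      catalog:
        name: analytics_catalog                # custom catalog name
        warehouseLocation: s3://my-bucket/warehouse
        connection:
          # Hive Metastore backend shown here; swap in REST, Glue,
          # or DynamoDB settings per the requirements listed above.
          uri: thrift://metastore.internal:9083
```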

File System Support

The Iceberg connector supports the following file systems for table data:
  • Local File System: For development and testing
  • Amazon S3: Production deployments with S3-based data lakes
  • Azure Blob Storage: Azure-based data lake deployments

Key Features

  • Custom Catalog Naming: Configure catalog names to match your organization’s naming conventions
  • Warehouse Location: Specify the base path where Iceberg table data is stored
  • Owner Property Mapping: Map Iceberg table properties to Collate ownership metadata
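
A hypothetical fragment tying these three features together; the key names (`name`, `warehouseLocation`, `ownershipProperty`) are assumptions for illustration, not the exact Collate schema:

```yaml
# Illustrative only: key names are assumptions.
catalog:
  name: finance_catalog                         # Custom Catalog Naming
  warehouseLocation: s3://datalake/warehouse    # Warehouse Location (base path)
ownershipProperty: owner   # Iceberg table property mapped to Collate ownership
```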

Other Connectors with Iceberg Compatibility

While not explicitly marketed as Iceberg-first connectors, the following Collate connectors may work with Iceberg tables through their respective query engines:
  • Trino
  • Presto
  • Dremio

Choosing the Right Approach

Use this decision tree to select the best connector for your Iceberg tables:

Step 1: Are your Iceberg tables in Databricks?
  • Yes → Use the Databricks connector
  • No → Continue to Step 2

Step 2: Are your Iceberg tables in Snowflake?
  • Yes → Use the Snowflake connector
  • No → Continue to Step 3

Step 3: Do you query Iceberg tables through Trino, Presto, or Dremio?
  • Yes → Use the respective connector (Trino, Presto, or Dremio)
  • No → Continue to Step 4

Step 4: Do you need direct catalog access?
  • Yes → Use the dedicated Iceberg connector (contact support for setup)
  • No → Contact Collate support to discuss your use case
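
The steps above can be sketched as a small helper. The platform names and return strings are just labels for the recommendations in this tree, not Collate API identifiers.

```python
from typing import Optional


def recommend_connector(platform: Optional[str] = None,
                        query_engine: Optional[str] = None,
                        need_direct_catalog_access: bool = False) -> str:
    """Walk the decision tree above and return the recommended connector.

    The `platform` and `query_engine` values are illustrative labels,
    not Collate configuration identifiers.
    """
    if platform == "databricks":                        # Step 1
        return "Databricks connector"
    if platform == "snowflake":                         # Step 2
        return "Snowflake connector"
    if query_engine in {"trino", "presto", "dremio"}:   # Step 3
        return f"{query_engine.title()} connector"
    if need_direct_catalog_access:                      # Step 4
        return "dedicated Iceberg connector"
    return "contact Collate support"


print(recommend_connector(platform="snowflake"))   # Snowflake connector
print(recommend_connector(query_engine="trino"))   # Trino connector
```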

Need Help?

If you’re unsure which connector to use for your Iceberg tables, or if you’re experiencing issues with Iceberg table ingestion: