Duckberg!

I wrote a previous small blog about PyIceberg and the Glue Iceberg REST API.
This week I saw the announcement of Duckberg, which combines all the favorites in a single library: PyIceberg, DuckDB and Iceberg.
I rewrote my previous code with it. Make sure you have the following dependencies installed with pip/poetry/uv (e.g. in the dependencies array of your pyproject.toml):
dependencies = [
"duckberg>=0.3.1",
"pyarrow>=19.0.1",
]
Code
from duckberg import DuckBerg


def main():
    region = "eu-central-1"
    catalog_config: dict[str, str] = {
        "type": "rest",  # Iceberg catalog type
        "uri": f"https://glue.{region}.amazonaws.com/iceberg",
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "glue",
        "rest.signing-region": region
    }

    db = DuckBerg(
        catalog_name="aws_glue",
        database_names=["ibtest"],
        catalog_config=catalog_config)

    print(db.list_tables())

    query = "SELECT * FROM 'ibtest.ibtest1'"
    df = db.select(sql=query).read_pandas()
    print(df)


if __name__ == "__main__":
    main()
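Since database_names takes a list, pointing DuckBerg at more than one Glue database should just be a matter of adding entries. A minimal sketch (the second database name "sales" is made up):

db = DuckBerg(
    catalog_name="aws_glue",
    database_names=["ibtest", "sales"],  # "sales" is a hypothetical second Glue database
    catalog_config=catalog_config)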
Run the script with your AWS credentials in the environment (AWS_PROFILE, or AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY, and the correct region).
- catalog_name = any name you like
- database_names = your Glue database(s)
- table name = your Glue table(s), referenced in the query

The output:
['ibtest.ibtest1']
id name created
0 001 test 2024-12-22 13:48:31.381
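Any SQL that DuckDB understands should work through the same select call. A small follow-up query against the columns shown above (just a sketch, not taken from the Duckberg docs):

query = "SELECT id, name FROM 'ibtest.ibtest1' WHERE id = '001'"
df = db.select(sql=query).read_pandas()
print(df)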
Note
There is a SQL parser included that extracts the table name from the query to validate that it is an Iceberg table. This parser requires the table to be referenced in 'database.table' format with single quotes.
It could be a nice option to use sqlglot here as a more advanced SQL parsing library.
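A minimal sketch of what table extraction with sqlglot could look like (sqlglot is not part of Duckberg; this is just an illustration of the idea, and it would not need the single-quote convention):

import sqlglot
from sqlglot import exp

def extract_tables(sql: str) -> list[str]:
    # Parse the query and return every referenced table as "database.table".
    parsed = sqlglot.parse_one(sql)
    return [
        f"{t.db}.{t.name}" if t.db else t.name
        for t in parsed.find_all(exp.Table)
    ]

print(extract_tables("SELECT * FROM ibtest.ibtest1"))  # ['ibtest.ibtest1']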