Duckberg!

I wrote a previous small blog about PyIceberg and the Glue Iceberg REST API.
This week I saw the announcement of Duckberg, which combines all the favorites in a single library: PyIceberg, DuckDB and Iceberg.
I rewrote my previous code with it. Make sure you have the following dependencies installed with pip/poetry/uv (e.g. in the dependencies array of your pyproject.toml):
dependencies = [
"duckberg>=0.3.1",
"pyarrow>=19.0.1",
]
Code
from duckberg import DuckBerg


def main():
    region = "eu-central-1"
    catalog_config: dict[str, str] = {
        "type": "rest",  # Iceberg catalog type
        "uri": f"https://glue.{region}.amazonaws.com/iceberg",
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "glue",
        "rest.signing-region": region
    }

    db = DuckBerg(
        catalog_name="aws_glue",
        database_names=["ibtest"],
        catalog_config=catalog_config)

    print(db.list_tables())

    query = "SELECT * FROM 'ibtest.ibtest1'"
    df = db.select(sql=query).read_pandas()
    print(df)


if __name__ == "__main__":
    main()
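Since database_names takes a list, pointing DuckBerg at more than one Glue database should just be a matter of adding entries. A minimal sketch (the second database name "sales" is made up):

db = DuckBerg(
    catalog_name="aws_glue",
    database_names=["ibtest", "sales"],  # "sales" is a hypothetical second Glue database
    catalog_config=catalog_config)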
Run the script with your AWS credentials in the environment (AWS_PROFILE, or AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY, and the correct region).
- catalog_name = any name you like
- database_names = your Glue database(s)
- table name = your Glue table(s), referenced in the query

The output:
['ibtest.ibtest1']
id name created
0 001 test 2024-12-22 13:48:31.381
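Any SQL that DuckDB understands should work through the same select call. A small follow-up query against the columns shown above (just a sketch, not taken from the Duckberg docs):

query = "SELECT id, name FROM 'ibtest.ibtest1' WHERE id = '001'"
df = db.select(sql=query).read_pandas()
print(df)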
Note
There is a SQL parser included that extracts the table name from the query to validate that it is an Iceberg table. This parser requires the table to be referenced in 'database.table' format with single quotes.
It could be a nice option to use sqlglot here as a more advanced SQL parsing library.
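A minimal sketch of what table extraction with sqlglot could look like (sqlglot is not part of Duckberg; this is just an illustration of the idea, and it would not need the single-quote convention):

import sqlglot
from sqlglot import exp

def extract_tables(sql: str) -> list[str]:
    # Parse the query and return every referenced table as "database.table".
    parsed = sqlglot.parse_one(sql)
    return [
        f"{t.db}.{t.name}" if t.db else t.name
        for t in parsed.find_all(exp.Table)
    ]

print(extract_tables("SELECT * FROM ibtest.ibtest1"))  # ['ibtest.ibtest1']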