Best practices for migrating to Apache Iceberg

Thursday, November 21, 2024 at 8:30 pm (IST)

About 1 hour

About this event

[Key highlights]

A deep dive into file formats, compression strategies, and write patterns
Practical guidance on Merge on Read (MoR) vs Copy on Write (CoW) implementation
Essential configurations for maintenance and monitoring
Benchmarks (duration and cost) compared with Amazon EMR Trino, Snowflake, Snowflake Iceberg, Starburst, and Athena

[What you'll learn]

How to select tables for migration and assess critical queries
Optimal compaction strategies (BinPack, Sort, Z-order)
Key configurations for production deployment
Monitoring best practices using Iceberg virtual tables

Hosted by

External speaker

Yonatan Dolan Principal Analytics Specialist @ AWS

Yonatan Dolan is a Principal Analytics Specialist at AWS, focusing on Big Data & Analytics in Israel. He is an Apache Iceberg evangelist and actively drives data lake innovations. Before AWS, he led Intel's Pharma Analytics Platform, developing edge-to-cloud AI solutions for clinical trials, and spent 9 years driving advanced analytics projects at Intel.

External speaker

Amit Gilad Data Engineer

Amit Gilad is a Data Engineer who has been actively working with Apache Iceberg and data lakes. He currently leads data engineering at a stealth-mode company and previously worked as a data engineer at Cloudinary. He has hands-on experience with EMR, Athena, and Spark, and recently shared insights about Iceberg implementations without Spark at the Chill Data Summit.

Team member

Harsha Kalbalia GTM @ Datazip | Founding Member @ Datazip

Harsha is a user-first GTM specialist at Datazip, transforming early-stage startups from zero to one. With a knack for technical market strategy and a startup enthusiast's mindset, she bridges the gap between innovative solutions and meaningful market adoption.

OLake by Datazip

Fastest way to replicate your data to Apache Iceberg.

OLake is an open-source data ingestion tool available on GitHub, developed by Datazip, Inc. Its primary function is to replicate data from transactional databases and streaming platforms (such as PostgreSQL, MySQL, MongoDB, Oracle, and Kafka) into open data lakehouse formats such as Apache Iceberg.
