Mastering Iceberg Table Maintenance: Balancing Cost, Performance & Scalability

Thursday, September 25, 2025 at 8:30 PM (IST)

About 1 hour

About this event

Webinar Overview

Apache Iceberg has quickly become the backbone of modern data lakes, but maintaining tables efficiently is just as critical as building them. This session dives into the art of Iceberg table maintenance, from compaction strategies to metadata cleanup, with a focus on balancing query performance and compute cost. Attendees will walk away with actionable strategies and best practices to keep their Iceberg tables lean, fast, and future-proof.

Who Should Attend

Data Engineers managing large-scale Iceberg deployments
Platform Engineers optimizing lakehouse infrastructure costs
Data Architects designing scalable data lake solutions
DevOps Engineers responsible for data pipeline maintenance
Technical Leaders overseeing data platform performance and budgets

Webinar Agenda

Introduction & The Maintenance Challenge: Why Iceberg table maintenance is critical for production data lakes
Compaction Strategies Deep Dive: Bin-packing vs. sorting vs. Z-ordering, and when to use each approach
Metadata & Snapshot Management: Snapshot expiration policies, orphan file cleanup, and manifest rewrites
File Layout Optimization: Solving the small file problem and right-sizing files for optimal performance
Cost-Performance Optimization Framework: Measuring ROI of maintenance operations and scheduling strategies
Q&A

Hosted by

External speaker

Amit Gilad Data Engineer

Amit Gilad is a Data Engineer who has been actively working with Apache Iceberg and data lakes. He currently leads data engineering at a company in stealth and previously worked as a data engineer at Cloudinary. He has hands-on experience with EMR, Athena, and Spark, and recently shared insights about Iceberg implementations without Spark at the Chill Data Summit.
Team member

Harsha Kalbalia GTM @ Datazip | Founding Member @ Datazip

Harsha is a user-first GTM specialist at Datazip, transforming early-stage startups from zero to one. With a knack for technical market strategy and a startup enthusiast's mindset, she bridges the gap between innovative solutions and meaningful market adoption.

OLake by Datazip

Fastest way to replicate your data to Apache Iceberg.

OLake is an open-source data ingestion tool available on GitHub, developed by Datazip, Inc. Its primary function is to replicate data from transactional databases and streaming platforms (such as PostgreSQL, MySQL, MongoDB, Oracle, and Kafka) into open data lakehouse formats such as Apache Iceberg.
