OLake by Datazip invites you to their event

Mastering Iceberg Table Maintenance: Balancing Cost, Performance & Scalability

About this event

Webinar Overview

Apache Iceberg has quickly become the backbone of modern data lakes, but maintaining tables efficiently is just as critical as building them. This session dives into the art of Iceberg table maintenance, from compaction strategies to metadata cleanup, with a focus on balancing query performance and compute cost. Attendees will walk away with actionable strategies and best practices to keep their Iceberg tables lean, fast, and future-proof.


Who Should Attend

  • Data Engineers managing large-scale Iceberg deployments
  • Platform Engineers optimizing lakehouse infrastructure costs
  • Data Architects designing scalable data lake solutions
  • DevOps Engineers responsible for data pipeline maintenance
  • Technical Leaders overseeing data platform performance and budgets

Webinar Agenda

  1. Introduction & The Maintenance Challenge: Why Iceberg table maintenance is critical for production data lakes
  2. Compaction Strategies Deep Dive: Bin-packing vs. sorting vs. Z-ordering, and when to use each approach
  3. Metadata & Snapshot Management: Snapshot expiration policies, orphan file cleanup, and manifest rewrites
  4. File Layout Optimization: Solving the small-file problem and right-sizing files for optimal performance
  5. Cost-Performance Optimization Framework: Measuring the ROI of maintenance operations and scheduling strategies
  6. Q&A

Hosted by

  • Guest speaker
    Amit Gilad, Data Engineer

    Amit Gilad is a Data Engineer who has been working actively with Apache Iceberg and data lakes. He currently leads data engineering at a company in stealth mode and previously worked as a data engineer at Cloudinary. He has hands-on experience with EMR, Athena, and Spark, and recently shared insights on implementing Iceberg without Spark at the Chill Data Summit.

  • Team member
    Harsha Kalbalia, GTM @ Datazip | Founding Member @ Datazip

    Harsha is a user-first GTM specialist at Datazip who helps early-stage startups go from zero to one. With a knack for technical market strategy and a startup enthusiast's mindset, she bridges the gap between innovative solutions and meaningful market adoption.

  • Team member
    Akshay Sharma, DevRel @ Datazip

    Akshay is a Developer Advocate at Datazip, helping engineers and contributors adopt open lakehouse technologies. He manages the contributor community and showcases how OLake delivers the fastest data replication framework to teams building at scale.

OLake by Datazip

Fastest way to replicate your data to Apache Iceberg.

OLake is an open-source data ingestion tool available on GitHub, developed by Datazip, Inc. Its primary function is to replicate data from transactional databases and streaming platforms (such as PostgreSQL, MySQL, MongoDB, Oracle, and Kafka) into open data lakehouse formats such as Apache Iceberg.