OLake by Datazip invites you to their event

Apache Arrow: The In-Memory Layer Your Iceberg, Spark, and Parquet Stack Depends On

About this event

10 years of Arrow. 30 minutes to understand why it's everywhere.

If you work with modern data infrastructure, Arrow is almost certainly running somewhere in your stack. Most engineers never notice it.

Arrow solved a real problem: moving data between systems required serializing and deserializing at every boundary. CPU cycles, memory copies, latency. At scale, that cost compounds fast. Arrow's solution was a language-agnostic columnar memory format any system could share without copying. What started as a memory layout spec became the execution substrate of the modern data stack.

In this 30-minute session, Badal Singh, who has contributed to Apache Iceberg Go and built OLake's Arrow-based ingestion writer (which sustains 550,000+ rows/second), will cover:

  • From niche interoperability project to de-facto standard: Apache Arrow's 10-year journey
  • What Arrow actually is beyond "columnar in-memory format" and why that definition undersells it
  • How zero-copy data sharing eliminates serialization overhead and what that means for pipeline performance
  • Where Arrow runs today: Spark, pandas, ClickHouse, Polars, and inside open table format implementations like Apache Iceberg Go
  • What's next: Arrow Flight, ADBC, nanoarrow, and the ecosystem reshaping how data systems talk to each other
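For readers new to the zero-copy idea on the agenda, here is a minimal sketch of the underlying principle. It deliberately uses only Python's standard-library `array` and `memoryview` rather than Arrow itself, so it is an analogy for sharing one contiguous columnar buffer without copying, not Arrow's actual API. The column names and the `consumer_sum` function are hypothetical.

```python
from array import array

# Row-oriented layout: each record is its own Python object.
rows = [(1, 10.5), (2, 20.0), (3, 30.25)]

# Columnar layout: one contiguous typed buffer per column.
ids = array("q", (r[0] for r in rows))      # signed 64-bit integers
amounts = array("d", (r[1] for r in rows))  # 64-bit floats


def consumer_sum(buf: memoryview) -> float:
    """Hypothetical downstream consumer that reads the buffer in place."""
    return sum(buf)


# memoryview exposes the same memory without copying or serializing it.
view = memoryview(amounts)
total = consumer_sum(view)

# A slice of a memoryview is still a view over the original bytes.
tail = view[1:]

# A write by the producer is visible through the existing view,
# which shows that no copy was made.
amounts[2] = 100.0
shared = tail[1] == 100.0
```

Arrow applies the same principle across languages and processes by standardizing the buffer layout, so a consumer can read a producer's columns in place instead of deserializing them.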

Hosted by

  • External speaker
    Badal Singh Software Engineer @ Datazip

    Badal is a Software Engineer at Datazip working on distributed data systems and lakehouse infrastructure. His day-to-day involves building high-performance data writers and working deep in the internals of Apache Iceberg, Apache Arrow, and storage engines. He contributed to Apache Iceberg Go, implementing a partitioned table writer using Apache Arrow. At Datazip, he built V0 of OLake's Arrow-based Full Load and CDC writer for ingestion into Apache Iceberg tables, pushing full load throughput beyond 550,000 rows per second. He doesn't just use Arrow. He builds on top of it.

  • Team member
    Sandeep Devarapalli Co-founder and CEO @ Datazip, Inc.
  • Team member
    Harsha Kalbalia GTM @ Datazip | Founding Member @ Datazip

    Harsha is a user-first GTM specialist at Datazip who takes early-stage startups from zero to one. With a knack for technical market strategy and a startup enthusiast's mindset, she bridges the gap between innovative solutions and meaningful market adoption.

OLake by Datazip

Fastest way to replicate your data to Apache Iceberg.

OLake is an open-source data ingestion tool developed by Datazip, Inc. and available on GitHub. Its primary function is to replicate data from transactional databases and streaming platforms (such as PostgreSQL, MySQL, MongoDB, Oracle, and Kafka) into open lakehouse table formats such as Apache Iceberg.