About this event
10 years of Arrow. 30 minutes to understand why it's everywhere.
If you work with modern data infrastructure, Arrow is almost certainly running somewhere in your stack. Most engineers never notice it.
Arrow solved a real problem: moving data between systems required serializing and deserializing at every boundary. CPU cycles, memory copies, latency. At scale, that cost compounds fast. Arrow's solution was a language-agnostic columnar memory format any system could share without copying. What started as a memory layout spec became the execution substrate of the modern data stack.
In this 30-minute session, Badal Singh, who has contributed to Apache Iceberg Go and built OLake's Arrow-based ingestion writer pushing 550,000+ rows per second, will break down how Arrow works and why it ended up everywhere in the modern data stack.
Hosted by
Badal is a Software Engineer at Datazip working on distributed data systems and lakehouse infrastructure. His day-to-day involves building high-performance data writers and working deep in the internals of Apache Iceberg, Apache Arrow, and storage engines. He contributed to Apache Iceberg Go, implementing a partitioned table writer using Apache Arrow. At Datazip, he built V0 of OLake's Arrow-based Full Load and CDC writer for ingestion into Apache Iceberg tables — pushing full load throughput beyond 550,000 rows per second. He doesn't just use Arrow. He builds on top of it.
Harsha is a user-first GTM specialist at Datazip who helps early-stage startups go from zero to one. With a flair for technical go-to-market strategy and a startup enthusiast's mindset, she bridges the gap between innovative products and meaningful market adoption.
OLake is an open-source data ingestion tool developed by Datazip, Inc. and available on GitHub. It replicates data from transactional databases and streaming platforms (PostgreSQL, MySQL, MongoDB, Oracle, and Kafka) into open lakehouse table formats such as Apache Iceberg.