Pruna AI invites you to their event

Example Selection and Post-Training Quantization for Large-Scale Machine Learning

About this event

Discover how approximation with simple linear error feedback is transforming optimization and model compression in machine learning.

Join us to:

  1. Look at training example ordering for stochastic gradient descent, which has long been known to affect the convergence rate. We will develop a theoretical characterization of what it is about the example order that affects convergence, and use it to motivate GraB (gradient balancing), an efficient linear-error-feedback-based example selection algorithm that achieves a theoretically optimal convergence rate, faster than the classic random-reshuffling scheme (a simplified sketch of the balancing idea appears after this list).
  2. Look at post-training quantization (PTQ), an especially important task in the practice of Large Language Model (LLM) inference, where a trained model is compressed without any additional fine-tuning. A theoretical characterization of the accuracy of "adaptive" linear-feedback quantization schemes will motivate QuIP (Quantization with Incoherence Processing), a line of work on quantization that enables highly compressed (2-bit!) LLMs and comes with theoretical error guarantees (see the second sketch after this list).
  3. Reflect on future work along these lines in machine learning systems.
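
To give a concrete feel for the balancing idea behind GraB, here is a minimal Python sketch, not the authors' implementation: it greedily assigns each centered per-example gradient a +1 or -1 sign so that a running error-feedback sum stays small, then places the +1 examples at the front of the next epoch's order and the -1 examples, reversed, at the back. The function name and the toy data are illustrative assumptions.

```python
import numpy as np

def balanced_reorder(gradients):
    """Greedy sign balancing ("herding") on centered per-example gradients.

    Assigns each example a +1 or -1 sign so that the running sum of signed,
    centered gradients stays small, then returns a permutation for the next
    epoch: +1 examples first (in visit order), -1 examples last (reversed).
    """
    mean = gradients.mean(axis=0)
    running = np.zeros_like(mean)   # linear error-feedback accumulator
    front, back = [], []
    for i, g in enumerate(gradients):
        c = g - mean                # centered gradient of example i
        # pick the sign that keeps the accumulated error smaller
        if np.linalg.norm(running + c) <= np.linalg.norm(running - c):
            running += c
            front.append(i)
        else:
            running -= c
            back.append(i)
    return front + back[::-1]

# Toy usage: order 8 examples whose "gradients" are random 3-d vectors.
rng = np.random.default_rng(0)
grads = rng.normal(size=(8, 3))
print(balanced_reorder(grads))
```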
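
And here is a deliberately simplified illustration of the "adaptive" linear-feedback rounding that QuIP builds on: each weight is rounded to a grid, and its rounding error is pushed through a linear feedback matrix onto the weights that have not been rounded yet. Real methods such as QuIP derive the feedback from second-order information and add incoherence processing; the feedback matrix and scale below are toy assumptions.

```python
import numpy as np

def error_feedback_quantize(w, scale, feedback):
    """Quantize a weight vector entry by entry with linear error feedback.

    `feedback` is a strictly lower-triangular matrix: feedback[j, i] says how
    much of entry i's rounding error is added to entry j before j is rounded.
    With feedback = 0 this reduces to plain round-to-nearest.
    """
    w = w.astype(float).copy()
    q = np.zeros_like(w)
    for i in range(len(w)):
        q[i] = np.round(w[i] / scale) * scale   # round entry i to the grid
        err = w[i] - q[i]                       # its rounding error
        w[i + 1:] += feedback[i + 1:, i] * err  # feed the error forward
    return q

# Toy usage: 4 weights, each error flows entirely onto the next weight.
w = np.array([0.30, 0.70, 0.20, 0.90])
F = np.zeros((4, 4))
F[np.arange(1, 4), np.arange(0, 3)] = 1.0
print(error_feedback_quantize(w, scale=0.5, feedback=F))
```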

Featuring:

  • Chris De Sa (Associate Professor in the Computer Science department at Cornell University, member of the Cornell Machine Learning Group, and Team Lead of the Relax ML Lab)

Don't miss out on this chance!

Register now and stay ahead in ML innovation!

Hosted by

  • Team member
    Bertrand Charpentier (Founder, President, Chief Scientist @ Pruna AI)

  • Guest speaker
    Christopher De Sa (Associate Professor & Team Lead @ Cornell University & Relax ML Lab)

Pruna AI

The AI Inference Framework.

In one line of code, you can make your AI model 2-10x cheaper, faster, smaller, and greener.