Parallel Data Laboratory

Big Learning (Systems for ML)

Data analytics and AI applications have emerged as a primary data processing activity for business, science, and online services that attempt to extract insight from large quantities of observation data. Increasingly, such analytics center on statistical machine learning (ML), in which an algorithm determines model parameters that make a chosen statistical model best fit the training data. Once fit (trained), such models can expose relationships among data items (e.g., for grouping documents into topics), predict outcomes for new data items based on selected characteristics (e.g., for recommendation systems), correlate effects with causes (e.g., for genomic analyses of diseases), and so on.
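As a small illustration of what "determining model parameters that best fit the training data" means, the sketch below fits a straight line to a handful of points by gradient descent on mean squared error. The model, the toy data, and the names `fit_line`, `lr`, and `steps` are assumptions chosen for illustration; they are not taken from any PDL system.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ w*x + b by gradient descent on mean squared error.

    Hypothetical helper for illustration only.
    """
    w, b = 0.0, 0.0  # model parameters, starting from an arbitrary guess
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Move the parameters a small step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy training data generated from y = 2x + 1 (no noise).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)

# Once trained, the model can predict an outcome for a new data item.
prediction = w * 5.0 + b
```

Real workloads use far richer models and far more data, which is where the cost of each pass over the training set starts to matter.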

Growth in data sizes and desired model precision generally dictates parallel execution of ML algorithms on clusters of servers. Naturally, parallel ML involves the same work distribution, synchronization, and data consistency challenges as other parallel computations. The PDL big-learning group has attacked these challenges over the past 10 years, creating and refining powerful new approaches for supporting large-scale ML on Big Data. Current focuses include cluster scheduling for DNN jobs, efficient ML data pipelines, automated cloud AI support, and efficient use of heterogeneous accelerators. This short article overviews an inter-related collection of some of our prior efforts in this space.
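To make the work distribution and synchronization challenge concrete, here is a minimal data-parallel sketch: the training data is split into shards, each worker computes a partial gradient on its shard, and the partial results are combined before the parameters would be updated. Threads stand in for servers, and the names `partial_gradient` and `parallel_gradient` are hypothetical. This is a generic illustration under those assumptions, not a description of any specific PDL system.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_gradient(shard, w, b):
    """Un-normalized squared-error gradient sums for one shard of (x, y) pairs."""
    grad_w = sum(2 * (w * x + b - y) * x for x, y in shard)
    grad_b = sum(2 * (w * x + b - y) for x, y in shard)
    return grad_w, grad_b, len(shard)


def parallel_gradient(data, w, b, num_workers=2):
    """Average gradient over all data, computed shard by shard."""
    # Work distribution: worker i takes every num_workers-th item.
    shards = [data[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Synchronization: list() waits until every worker has finished.
        parts = list(pool.map(lambda s: partial_gradient(s, w, b), shards))
    n = sum(count for _, _, count in parts)
    grad_w = sum(gw for gw, _, _ in parts) / n
    grad_b = sum(gb for _, gb, _ in parts) / n
    return grad_w, grad_b


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

serial = parallel_gradient(data, 0.5, 0.0, num_workers=1)
sharded = parallel_gradient(data, 0.5, 0.0, num_workers=3)
```

Because every worker reads the same `w` and `b` and the results are combined only after all workers finish, the sharded gradient matches the single-worker one. Relaxing that barrier so workers may proceed with slightly stale parameters is where the data consistency questions mentioned above arise.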

People

FACULTY

Greg Ganger
Phil Gibbons
Garth Gibson
Eric Xing

GRAD STUDENTS

Jinliang Wei
Aaron Harlap
Jin Kyu Kim
Henggang Cui
Wei Dai
Qirong Ho
James Cipar

Acknowledgements

We thank the members and companies of the PDL Consortium: Amazon, Bloomberg LP, Datadog, Google, Honda, Intel Corporation, Jane Street, LayerZero Research, Meta, Microsoft Research, Oracle Corporation, Oracle Cloud Infrastructure, Pure Storage, Salesforce, Samsung Semiconductor Inc., and Western Digital for their interest, insights, feedback, and support.