1st Workshop on Data Management for End-to-End Machine Learning


The 1st Workshop on Data Management for End-to-End Machine Learning, May 14, 2017.

Held in conjunction with ACM SIGMOD 2017
Raleigh, NC, USA, May 14-19, 2017


Applying Machine Learning (ML) in real-world scenarios is a challenging task. In recent years, the main focus of the database community has been on creating systems and abstractions for the efficient training of ML models on large datasets. However, model training is only one of many steps in an end-to-end ML application, and the large-scale use of ML raises a number of orthogonal data management problems that require the attention of the data management community.

For example, data preprocessing and feature extraction workloads result in complex pipelines that often require the simultaneous execution of relational and linear algebraic operations. Next, the class of ML model to use needs to be chosen; this often means trying out a set of popular approaches, such as linear models, decision trees, and deep neural networks, on the problem at hand. The prediction quality of such ML models depends heavily on the choice of features and hyperparameters, which are typically selected in a costly offline evaluation process that offers huge opportunities for parallelization and optimization. Afterwards, the resulting models must be deployed and integrated into existing business workflows in a way that enables fast and efficient predictions while still allowing the lifecycle of models (which become stale over time) to be managed. As a further complication, the resulting systems need to take the target audience of ML applications into account. This audience is very heterogeneous, ranging from analysts without programming skills, who may prefer an easy-to-use cloud-based solution, to teams of data processing experts and statisticians who develop and deploy custom-tailored algorithms.

Therefore, DEEM aims to bring together researchers and practitioners at the intersection of applied machine learning, data management, and systems research, with the goal of discussing the data management issues that arise in ML application scenarios. The workshop solicits *regular research papers describing preliminary and ongoing research results*. In addition, the workshop encourages the submission of *industrial experience reports of end-to-end ML deployments*. Submissions can be either *short papers (4 pages)* or *long papers (up to 10 pages)* following the ACM proceedings format, as described in https://www.acm.org/publications/proceedings-template.

Areas of particular interest for the workshop include (but are not limited to):

– Data Management in Machine Learning Applications
– Definition, Execution, and Optimization of Complex ML Pipelines
– Systems for Managing the Lifecycle of Machine Learning Models
– Systems for Efficient Hyperparameter Search and Feature Selection
– Machine Learning Services in the Cloud
– Modeling, Storage, and Lineage of ML Experimentation Data
– Integration of Machine Learning and Dataflow Systems
– Integration of Machine Learning and ETL Processing
– Benchmarking of Machine Learning Applications
– Definition and Execution of Complex Ensemble Predictors
– Architectures for Streaming Machine Learning


Paper submission deadline: February 1, 2017
Author notification: March 1, 2017
Deadline for camera-ready copy: March 20, 2017
Workshop: Sunday, May 14, 2017


The workshop will have two tracks: one for regular research papers (including research in progress) and one for industrial papers (e.g., industrial experience reports of end-to-end ML deployments). Submissions can be either *short papers (4 pages)* or *long papers (up to 10 pages)* following the ACM proceedings format, as described in https://www.acm.org/publications/proceedings-template.


The workshop proceedings will be published in the ACM Digital Library.


– Sebastian Schelter (Amazon)
– Reza Zadeh (Stanford & Matroid)
– Markus Weimer (Microsoft)
– Rajeev Rastogi (Amazon)
– Volker Markl (TU Berlin)


– Sunita Sarawagi (IIT Bombay)
– Sudip Roy (Google)
– Rainer Gemulla (University of Mannheim)
– Matthias Boehm (IBM Research)
– Matthias Seeger (Amazon)
– Evan Sparks (UC Berkeley)
– Chris Ré (Stanford)
– Ted Dunning (MapR Technologies)
– Dionysios Logothetis (Facebook)
– Nedelina Teneva (University of Chicago)
– Vasia Kalavri (KTH Stockholm)
– Venu Satuluri (Twitter)
– Shannon Quinn (University of Georgia)
– Dmitriy Lyubimov (Apache Mahout)
– Tilmann Rabl (TU Berlin)
– Max Heimel (Snowflake)
– Felix Biessmann (Amazon)
– Arun Kumar (UC San Diego)
