
Paper accepted to ACM SoCC’13

Our paper on debugging systems for data-intensive analytics got accepted to the ACM Symposium on Cloud Computing. The paper presents Newt, a scalable architecture for capturing and querying data lineage information, to find and resolve errors in processing pipelines.

Newt provides a flexible instrumentation API that allows system developers to collect fine-grained lineage from a range of data-intensive scalable computing (DISC) architectures. Newt pairs this API with a scale-out, fault-tolerant lineage store and query engine.
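To make fine-grained lineage a little more concrete, here is a minimal in-memory sketch. It is not Newt's API. The class `LineageStore` and its methods `record` and `trace_back` are hypothetical names chosen for illustration, and a real lineage store like Newt's is scale-out and fault-tolerant rather than a single dictionary. Each instrumented operator reports which input records produced each of its output records. A backward trace then follows those links to the source records behind a suspicious output.

```python
from collections import defaultdict


class LineageStore:
    """Hypothetical in-memory lineage store (illustration only, not Newt's API)."""

    def __init__(self):
        # (operator, output_record) -> set of (upstream_operator, input_record)
        self._edges = defaultdict(set)

    def record(self, operator, output_id, upstream, input_ids):
        """Instrumentation hook: `output_id`, produced by `operator`,
        was derived from `input_ids`, produced by `upstream`."""
        for input_id in input_ids:
            self._edges[(operator, output_id)].add((upstream, input_id))

    def trace_back(self, operator, output_id):
        """Return the source records (nodes with no recorded parents)
        that contributed to the given output record."""
        sources, seen = set(), set()
        stack = [(operator, output_id)]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            parents = self._edges.get(node)  # .get() does not insert into the defaultdict
            if not parents:
                sources.add(node)
            else:
                stack.extend(parents)
        return sources


# A word-count-like pipeline: a map stage reads input lines, a reduce stage emits counts.
store = LineageStore()
store.record("map", "m1", "input", ["line1"])
store.record("map", "m2", "input", ["line2"])
store.record("reduce", "count(a)", "map", ["m1", "m2"])
store.record("reduce", "count(b)", "map", ["m1"])

print(sorted(store.trace_back("reduce", "count(b)")))  # [('input', 'line1')]
```

If `count(b)` looks wrong, the trace narrows the search to `line1` alone. That is the debugging workflow that record-level lineage is meant to support.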

Until the camera-ready version is available, take a look at the technical report here.

Paper at CIKM/CloudDB’12

Large-scale graphs, such as social networks, are highly dynamic and the ability to mine them in real time can enable better services, such as timely friend and content recommendation on social networks. In this paper, we present the basic concept behind our real-time graph mining system. We show how the technique of memoization can be used to transparently compute graph algorithms in an incremental fashion, speeding up the computation when the input graph changes.
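Here is a toy illustration of the memoization idea. It is a sketch under our own assumptions, not the system described in the paper. The code computes connected components by min-label propagation and caches each per-vertex update on its exact inputs: the vertex, its current label, and its neighbors' labels. When the graph changes and the algorithm is rerun, any update whose inputs are unchanged is served from the cache instead of being recomputed. The class name `MemoizedComponents` is hypothetical. The per-vertex work here (a `min`) is cheap, so the savings only matter when that work is expensive.

```python
class MemoizedComponents:
    """Hypothetical sketch: connected components via min-label propagation,
    with every per-vertex update memoized on its exact inputs."""

    def __init__(self):
        self.cache = {}  # (vertex, own_label, neighbor_labels) -> new label
        self.hits = 0
        self.misses = 0

    def _update(self, v, own, neighbor_labels):
        key = (v, own, neighbor_labels)
        if key in self.cache:
            self.hits += 1  # inputs seen before: reuse the stored result
        else:
            self.misses += 1
            self.cache[key] = min((own,) + neighbor_labels)
        return self.cache[key]

    def run(self, adjacency):
        """adjacency: dict mapping each vertex to a list of its neighbors (undirected)."""
        labels = {v: v for v in adjacency}
        changed = True
        while changed:  # one iteration = one superstep
            changed = False
            new_labels = {}
            for v, nbrs in adjacency.items():
                neighbor_labels = tuple(sorted(labels[u] for u in nbrs))
                new_labels[v] = self._update(v, labels[v], neighbor_labels)
                changed |= new_labels[v] != labels[v]
            labels = new_labels
        return labels


engine = MemoizedComponents()
before = engine.run({1: [2], 2: [1], 3: [4], 4: [3]})
# The graph changes: edge 2-3 is added, merging the two components.
after = engine.run({1: [2], 2: [1, 3], 3: [4, 2], 4: [3]})
print(before)  # {1: 1, 2: 1, 3: 3, 4: 3}
print(after)   # {1: 1, 2: 1, 3: 1, 4: 1}
print(engine.hits > 0)  # True: some updates in the second run came from the cache
```

In this sketch the cache grows without bound, and each key includes the full tuple of neighbor labels. A practical system would also need a policy for evicting stale entries and for deciding which inputs make up a task's key.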

Our paper at the CIKM/CloudDB’12 workshop presents an initial study and some early results on our approach. Find a copy here.