Our paper on debugging systems for data-intensive analytics has been accepted to the ACM Symposium on Cloud Computing. The paper presents Newt, a scalable architecture for capturing and querying data lineage that helps find and resolve errors in processing pipelines.
Newt provides a flexible instrumentation API that allows system developers to collect fine-grain lineage from a range of data-intensive scalable computing (DISC) architectures. Newt pairs this API with a scale-out, fault-tolerant lineage store and query engine.
Until the camera-ready version is available, take a look at the technical report here.
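To give a feel for what capturing and querying lineage means, here is a minimal in-memory sketch. It is not Newt's API: the `LineageStore` class, its `record` and `backward_trace` methods, and the record identifiers are all hypothetical names made up for illustration, and the sketch ignores the scale-out and fault-tolerance aspects entirely. It only shows the basic idea of logging which inputs produced which outputs and then tracing a suspicious output back to its sources.

```python
from collections import defaultdict


class LineageStore:
    """Hypothetical in-memory lineage log (illustration only, not Newt's API)."""

    def __init__(self):
        # output record id -> set of (actor name, input record id)
        self._parents = defaultdict(set)

    def record(self, actor, inputs, output):
        """Note that `actor` derived `output` from each record in `inputs`."""
        for record_id in inputs:
            self._parents[output].add((actor, record_id))

    def backward_trace(self, record_id):
        """Return every upstream record id that contributed to `record_id`."""
        seen = set()
        stack = [record_id]
        while stack:
            current = stack.pop()
            for _actor, parent in self._parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# A toy word-count pipeline: two input lines, a map step, a reduce step.
store = LineageStore()
store.record("map", ["line1"], "a@1")
store.record("map", ["line2"], "a@2")
store.record("map", ["line2"], "b@2")
store.record("reduce", ["a@1", "a@2"], "count:a")
store.record("reduce", ["b@2"], "count:b")

sources_of_a = store.backward_trace("count:a")
sources_of_b = store.backward_trace("count:b")
```

In this toy run, tracing `count:a` backwards reaches both input lines, while `count:b` depends only on `line2`, which is the kind of question a lineage query is meant to answer when an output looks wrong.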
Consider submitting your work to the 1st Workshop on Large-Scale Recommender Systems (LSRS), which will be co-located with RecSys’13.
Consider submitting your work to this year’s IEEE International Conference on Peer-to-Peer Computing.
Large-scale graphs, such as social networks, are highly dynamic, and the ability to mine them in real time can enable better services, for example timely friend and content recommendations. In this paper, we present the basic concept behind our real-time graph mining system. We show how the technique of memoization can be used to transparently compute graph algorithms in an incremental fashion, speeding up the computation when the input graph changes.
Our paper at the CICM/CloudDB’12 workshop presents an initial study and some early results on our approach. Find a copy here.
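As a rough illustration of the memoization idea, here is a small sketch, not the system from the paper. It assumes a simple friend-of-friend recommendation as the graph algorithm, and the names `build_graph` and `MemoizedRecommender` are hypothetical. Each per-vertex result is cached under a key that captures exactly the input it depends on (the vertex's neighbours and their neighbour sets), so after the graph changes only vertices whose input actually changed are recomputed.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Graph = Dict[int, FrozenSet[int]]


def build_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build an immutable undirected adjacency map from an edge list."""
    adj: Dict[int, Set[int]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return {v: frozenset(ns) for v, ns in adj.items()}


class MemoizedRecommender:
    """Hypothetical friend-of-friend recommender with per-vertex memoization."""

    def __init__(self) -> None:
        self.cache = {}
        self.misses = 0  # number of per-vertex results actually computed

    def candidates(self, graph: Graph, v: int) -> FrozenSet[int]:
        # The key is the full input of this computation: v, its neighbours,
        # and each neighbour's neighbour set.
        key = (v, frozenset((u, graph[u]) for u in graph[v]))
        if key not in self.cache:
            self.misses += 1
            two_hop: Set[int] = set()
            for u in graph[v]:
                two_hop |= graph[u]
            # Recommend two-hop vertices that are not already friends of v.
            self.cache[key] = frozenset(two_hop - graph[v] - {v})
        return self.cache[key]

    def run(self, graph: Graph) -> Dict[int, FrozenSet[int]]:
        return {v: self.candidates(graph, v) for v in graph}


edges = [(1, 2), (2, 3), (3, 4), (5, 6)]
rec = MemoizedRecommender()

before = rec.run(build_graph(edges))
misses_initial = rec.misses

# The graph changes: one new friendship between vertices 4 and 5.
after = rec.run(build_graph(edges + [(4, 5)]))
misses_update = rec.misses - misses_initial
```

The first run computes a result for all six vertices. After the edge between 4 and 5 is added, only vertices 3, 4, 5 and 6 see a changed input, so the second run recomputes four results and reuses the cached ones for vertices 1 and 2. Building the key in this sketch still costs time proportional to the neighbourhood size, which a real system would need to avoid.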