[cs-talks] John Liagouris, Friday March 31st, 11:00am @ MCS 148
Harrington, Jacob Walter
jwharrin at bu.edu
Thu Mar 30 16:10:17 EDT 2017
Understanding Distributed Dataflow Systems
John Liagouris, Post-Doctoral Researcher, ETH Zurich
Friday March 31st, 11am – 12:30pm, MCS 148
In this talk, I will present our recent work on understanding distributed dataflow systems like Apache Spark, Apache Flink, and Google’s TensorFlow. The first part of the talk will focus on understanding the semantics of distributed dataflows: Why does a dataflow return certain results, and what should output explanations look like? To answer such questions, we leverage existing work in data provenance, and we advance the state of the art to provide output explanations that are both sufficient and concise. The second part of the talk will focus on understanding the performance of distributed dataflows: Why is a dataflow execution slow, and where are the bottlenecks in the pipeline? To answer such questions, we leverage existing work on critical path methods, and we advance the state of the art to analyse the performance of dynamic and continuous computations in near-real time. We have implemented our ideas in a prototype system, Strymon, that builds on top of the novel Timely Dataflow framework written in Rust. Strymon’s ultimate goal is to provide fast and meaningful insights into complex enterprise datacenters by processing, in real time, logs of events collected at all levels of the software and hardware stack.
John Liagouris is a post-doctoral researcher at ETH Zurich and a member of the Systems Group. Before joining ETHZ, he was a visiting research fellow at the University of Hong Kong (2013-2014) and a research assistant at the Institute for the Management of Information Systems (IMIS) of the Research and Innovation Center 'Athena', Greece (2009-2015). Dr. Liagouris obtained a diploma in Electrical and Computer Engineering in 2008 and a PhD in 2015, both from NTU Athens, Greece. His research interests lie in the areas of datacenter monitoring, modelling and simulation, real-time analytics, graph data management, distributed system profiling, and software-defined networks.