
Workflow-centric tracing (also called end-to-end tracing or distributed-systems tracing) captures the work done within and among distributed-system components to service individual requests.  Because it provides deep visibility into complex distributed-system behaviors, it is rapidly being adopted by industry (e.g., by Facebook, Google, Yelp).  However, there is a dangerous belief in both academia and industry that a single workflow-centric tracing design can serve all of the use cases commonly attributed to it (e.g., diagnosing different types of problems, resource attribution).

For this paper, we teamed up with other academics and practitioners working on workflow-centric tracing to distill its key design axes.  For each axis, we identified the design choices best suited to various tracing use cases.  We also discussed how seemingly innocuous design choices for different axes can lead to poor outcomes because of the way they interact with one another.

We have been trying to get this paper published for four years, so I’m very happy about this acceptance!  The initial technical report version of this paper, which we published in 2014, has already been cited by dozens of other research papers and covered in various “Papers We Love” meetups.


August 21st, 2016

Our paper, “Principled Workflow-centric tracing of distributed systems,” was accepted to SoCC’16