Workflow-centric tracing (also called end-to-end tracing or distributed-systems tracing) captures the work done within and among distributed-system components to service individual requests. Because it provides deep visibility into complex distributed-system behaviors, it is rapidly being adopted by industry (e.g., by Facebook, Google, Yelp). However, there is a dangerous belief in both academia and industry that a single workflow-centric tracing design can serve all of the use cases commonly attributed to it (e.g., diagnosing different types of problems, resource attribution).
For this paper, we teamed up with other academics and practitioners working on workflow-centric tracing to distill its key design axes. For each axis, we identified the design choices best suited to various tracing use cases. We also discussed how seemingly innocuous design choices for different axes can lead to poor outcomes because of the way they interact with one another.
We have been trying to get this paper published for four years, so I’m very happy about this acceptance! The initial technical report version of this paper, which we published in 2014, has already been cited by dozens of other research papers and covered in various “Papers We Love” meetups.