Raja R. Sambasivan, Ilari Shafer, Jonathan Mace, Benjamin H. Sigelman, Rodrigo Fonseca, Gregory R. Ganger
In Proceedings of SoCC 2016
Publication year: 2016

Workflow-centric tracing captures the workflow of causally related events (e.g., work done to process a request) within and among the components of a distributed system. As distributed systems grow in scale and complexity, such tracing is becoming a critical tool for understanding their behavior. Yet there is a fundamental lack of clarity about how tracing infrastructures should be designed to provide maximum benefit for important management tasks, such as resource accounting and diagnosis. Without research into this issue, workflow-centric tracing risks falling short of its full potential. To help, this paper distills the design space of workflow-centric tracing and describes key design choices that can help or hinder a tracing infrastructure’s utility for important tasks. Our design space and the design choices we suggest are based on our experiences developing several previous workflow-centric tracing infrastructures.