Raja R. Sambasivan, Rodrigo Fonseca, Ilari Shafer, Gregory R. Ganger
Carnegie Mellon Parallel Data Lab Technical Report CMU-PDL-14-102
Publication year: 2014

Note: This technical report has been superseded by our SoCC '16 paper, "Principled workflow-centric tracing of distributed systems."

End-to-end tracing captures the workflow of causally related activity (e.g., work done to process a request) within and among the components of a distributed system. As distributed systems grow in scale and complexity, such tracing is becoming a critical tool for management tasks like diagnosis and resource accounting. Drawing upon our experiences building and using end-to-end tracing infrastructures, this paper distills the key design axes that dictate trace utility for important use cases. Developing tracing infrastructures without explicitly understanding these axes and the choices available along them will likely result in infrastructures that are not useful for their intended purposes. In addition to identifying the design axes, this paper identifies good design choices for various tracing use cases, contrasts them with choices made by previous tracing implementations, and shows where prior implementations fall short. It also identifies remaining challenges on the path to making tracing an integral part of distributed system design.