Raja R. Sambasivan, Alice X. Zheng, Eno Thereska, Gregory R. Ganger
In Proceedings of HotAC 2007
Publication year: 2007

Making request flow tracing an integral part of software systems creates the potential to better understand their operation. The resulting traces can be converted to per-request graphs of the work performed by a service, representing the flow and timing of each request's processing. Collectively, these graphs contain detailed and comprehensive data about the system's behavior and the workload that induced it, leaving the challenge of extracting insights. Categorizing and differencing such graphs should greatly improve our ability to understand the runtime behavior of complex distributed services and diagnose problems. Clustering the set of graphs can identify common request processing paths and expose outliers. Moreover, clustering two sets of graphs can expose differences between the two; for example, a programmer could diagnose a problem that arises by comparing current request processing with that of an earlier non-problem period and focusing on the aspects that change. Such categorizing and differencing of system behavior can be a big step in the direction of automated problem diagnosis.
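As a rough illustration of the idea, and not the authors' algorithm, the sketch below groups request graphs by identical structure and compares each group's share of requests across two periods. Everything in it is an assumption made for the example: the representation of a graph as a list of (caller, callee) edges, the function names `categorize` and `diff_periods`, and the component names. It ignores timing entirely and uses exact structural matching in place of a real clustering method.

```python
from collections import Counter


def categorize(graphs):
    """Count requests per structural category.

    Each graph is an iterable of (caller, callee) edges (a hypothetical
    representation); two graphs fall in the same category when their
    edge sets are identical.
    """
    return Counter(frozenset(g) for g in graphs)


def diff_periods(baseline, current):
    """Return, per category, the change in its share of requests.

    Positive values mean the category is more common in `current`
    than in `baseline`; categories seen in only one period are included.
    """
    base, cur = categorize(baseline), categorize(current)
    n_base = sum(base.values()) or 1  # avoid division by zero on empty input
    n_cur = sum(cur.values()) or 1
    return {
        shape: cur[shape] / n_cur - base[shape] / n_base
        for shape in base.keys() | cur.keys()
    }


# Hypothetical request shapes: a cache hit and a cache miss that goes to disk.
cache_hit = [("client", "frontend"), ("frontend", "cache")]
cache_miss = cache_hit + [("frontend", "disk")]

baseline = [cache_hit] * 8 + [cache_miss] * 2   # earlier non-problem period
current = [cache_hit] * 3 + [cache_miss] * 7    # period under diagnosis

changes = diff_periods(baseline, current)
# Print categories ordered by how much their share changed.
for shape, delta in sorted(changes.items(), key=lambda kv: -abs(kv[1])):
    print(sorted(shape), round(delta, 2))
```

In this made-up data the cache-miss shape grows from a small fraction of requests to the majority, which is the kind of change a programmer would then focus on. A fuller treatment would also compare timing within each category and use a similarity measure so that near-identical graphs cluster together.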
