Note: This techreport has been superseded by our NSDI’11 paper, “Diagnosing performance changes by comparing request flows.”
The causes of performance changes in a distributed system often elude even its developers. This paper develops a new technique for gaining insight into such changes: comparing system behaviours from two executions (e.g., of two system versions or time periods). Building on end-to-end request flow tracing within and across components, algorithms are described for identifying and ranking changes in the flow and/or timing of request processing. The implementation of these algorithms in a tool called Spectroscope is described and evaluated. Five case studies are presented of using Spectroscope to diagnose performance changes in a distributed storage system caused by code changes and configuration modifications, demonstrating the value and efficacy of comparing system behaviours.