Many of our students presented at Red Hat’s developer conference (DevConf.US) this year. I’ve listed abstracts and talk videos below.
Logging what matters: Just-in-time instrumentation and tracing (Lily Sturmann and Emre Ates) (Video): Diagnosing problems in distributed systems is time-consuming and relies heavily on developer guesswork about where to instrument the system. The Pythia “Just-in-Time” Instrumentation Framework uses statistical measures to detect where instrumentation is needed in a distributed system to isolate specific problems as they occur. We will demonstrate an initial proof of concept by showing that one key statistical measure, high variation in performance among work that is expected to perform similarly, can predict where additional instrumentation is needed.
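To make the statistical measure in this abstract concrete, here is a minimal sketch of flagging groups of requests whose latencies vary widely even though they are expected to perform similarly. This is not Pythia's implementation: the function names, the choice of coefficient of variation as the measure, the 0.5 threshold, and the sample data are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def coefficient_of_variation(latencies):
    """Return population stddev divided by mean (0.0 if the mean is 0)."""
    m = mean(latencies)
    return pstdev(latencies) / m if m else 0.0


def groups_needing_instrumentation(groups, threshold=0.5):
    """Return sorted names of groups whose latency variation exceeds threshold.

    `groups` maps a group name to latencies of requests that are expected
    to perform similarly. The threshold is an arbitrary illustrative value.
    """
    return sorted(
        name
        for name, latencies in groups.items()
        if len(latencies) > 1 and coefficient_of_variation(latencies) > threshold
    )


# Hypothetical latencies in milliseconds, grouped by request type.
groups = {
    "read_small_object": [10.0, 11.0, 9.0, 10.0],
    "write_small_object": [10.0, 50.0, 12.0, 90.0],
}
flagged = groups_needing_instrumentation(groups)
print(flagged)
```

In this sketch, only the group with widely spread latencies is flagged, which is where additional instrumentation would be enabled first.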
Skua: Extending distributed-systems tracing into the Linux Kernel (Harshal Sheth and Andrew Sun) (Video): Modern applications are often architected as a sprawling fleet of microservices. While this does have benefits, it also makes it incredibly difficult for developers to diagnose issues with their applications. Many tools have been developed to trace applications by recording timing data and resolving service dependencies. However, these tools miss an important part of application performance: the kernel. We present Skua, a modified suite of tracing utilities that gains insight into both application- and kernel-level behavior. Logging information produced by LTTng is augmented with tracing context information and integrated into the existing distributed-systems tracing framework provided by Jaeger.
Tracing Ceph using Jaeger-Blkkin (Mania Abdi) (Video): Blkkin is a custom end-to-end tracing infrastructure for Ceph. It captures the work done to process individual requests within and among Ceph’s components. However, it can only be turned on for individual requests and cannot be left always on because of the resulting overhead. We present Jaeger-Blkkin, which can be used in an always-on fashion in production with low overhead. Jaeger-Blkkin is constructed by replacing much of Blkkin’s tracing functionality with that of Jaeger, a widely deployed open-source tracing infrastructure. Jaeger-Blkkin is OpenTracing compatible, meaning that it can be replaced easily with other, even more advanced tracing infrastructures when they become available.