Many of our students presented at Red Hat’s developer conference (DevConf.US) this year. I’ve listed abstracts and talk videos below.

Logging what matters: Just-in-time instrumentation and tracing (Lily Sturmann and Emre Ates) (Video): Diagnosing problems in distributed systems is time-consuming and relies heavily on developer guesswork about where to instrument the system. The Pythia “Just-in-Time” Instrumentation Framework uses statistical measures to detect where instrumentation is needed in a distributed system to isolate specific problems as they occur. We will demonstrate an initial proof of concept by showing that one key statistical measure, high performance variation among work that is expected to perform similarly, can predict where additional instrumentation is needed.
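
To make the idea of "high performance variation among work that is expected to perform similarly" concrete, here is a minimal sketch. It is not Pythia's implementation: the grouping key, the use of the coefficient of variation as the measure, the threshold, and the function name `high_variation_groups` are all assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def high_variation_groups(samples, threshold=0.5, min_samples=3):
    """Return {group: coefficient_of_variation} for groups that vary a lot.

    samples: iterable of (group_key, latency_ms) pairs, where requests that
    share a group_key are expected to perform similarly (an assumption).
    threshold and min_samples are illustrative values, not tuned ones.
    """
    by_group = defaultdict(list)
    for key, latency in samples:
        by_group[key].append(latency)

    flagged = {}
    for key, latencies in by_group.items():
        if len(latencies) < min_samples:
            continue  # too little data to judge variation
        avg = mean(latencies)
        if avg == 0:
            continue  # avoid dividing by zero
        cv = pstdev(latencies) / avg  # standard deviation relative to the mean
        if cv > threshold:
            flagged[key] = cv
    return flagged


# Hypothetical latency samples for two request types.
samples = [
    ("GET /volumes", 10.0), ("GET /volumes", 11.0), ("GET /volumes", 9.0),
    ("POST /servers", 20.0), ("POST /servers", 200.0), ("POST /servers", 25.0),
]
flagged = high_variation_groups(samples)
print(sorted(flagged))
```

In this toy data only the second request type is flagged, which is where a just-in-time framework might choose to enable additional instrumentation.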

Skua: Extending distributed-systems tracing into the Linux Kernel (Harshal Sheth and Andrew Sun) (Video): Modern applications are often architected as a sprawling fleet of microservices. While this architecture has benefits, it also makes it very difficult for developers to diagnose issues with their applications. Many tools have been developed to trace applications by recording timing data and resolving service dependencies. However, these tools miss an important part of application performance: the kernel. We present Skua, a modified suite of tracing utilities that gains insight into both application- and kernel-level behavior. Logging information produced by LTTng is augmented with tracing context information and integrated into the existing distributed-systems tracing framework provided by Jaeger.
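
As a rough illustration of what augmenting kernel-level events with tracing context makes possible, the sketch below attaches kernel events to the application spans they belong to. It does not use the LTTng or Jaeger APIs and is not Skua's code: the `KernelEvent` and `Span` types, their fields, and the event names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class KernelEvent:
    name: str
    timestamp_us: int
    trace_id: str  # tracing context carried with the kernel event (assumed)
    span_id: str


@dataclass
class Span:
    trace_id: str
    span_id: str
    operation: str
    kernel_events: list = field(default_factory=list)


def attach_kernel_events(spans, events):
    """Attach each kernel event to the span with the same context.

    Returns the events whose context matches no known span.
    """
    index = {(s.trace_id, s.span_id): s for s in spans}
    unmatched = []
    for ev in sorted(events, key=lambda e: e.timestamp_us):
        span = index.get((ev.trace_id, ev.span_id))
        if span is None:
            unmatched.append(ev)
        else:
            span.kernel_events.append(ev)
    return unmatched


# Hypothetical spans and kernel events for one request.
spans = [Span("t1", "s1", "handle_request"), Span("t1", "s2", "read_config")]
events = [
    KernelEvent("sys_read_exit", 120, "t1", "s2"),
    KernelEvent("sys_read_entry", 100, "t1", "s2"),
    KernelEvent("sched_switch", 110, "t9", "s9"),  # belongs to no known span
]
unmatched = attach_kernel_events(spans, events)
print([e.name for e in spans[1].kernel_events], len(unmatched))
```

Because every kernel event carries a trace and span identifier, a tracing front end can show kernel activity alongside the application-level work that triggered it.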

Tracing Ceph using Jaeger-Blkkin (Mania Abdi) (Video): Blkkin is a custom end-to-end tracing infrastructure for Ceph. It captures the work done to process individual requests within and among Ceph’s components. However, it can only be turned on for individual requests and cannot be left always-on due to the resulting overhead. We present Jaeger-Blkkin, which can be used in an always-on fashion in production with low overhead. Jaeger-Blkkin is constructed by replacing much of Blkkin’s tracing functionality with that of Jaeger, a widely deployed open-source tracing infrastructure. Jaeger-Blkkin is OpenTracing compatible, meaning that it can be replaced easily with other, more advanced tracing infrastructures when they become available.


December 20th, 2018

DOCC Lab celebrates its first end-of-semester dinner

My students and I celebrated the end of Fall ’18 with dinner at the Q restaurant in downtown Boston last […]

December 10th, 2018

Lily presented, “Logging what matters: The Pythia just-in-time instrumentation framework for distributed applications,” at the 2018 Observability summit

Lily did a fabulous job presenting her early work on this research. I’ve listed the abstract and video below. Logging […]

August 19th, 2018

DOCC-Lab students presented on diagnosis research at 2018 DevConf.US

August 2nd, 2018

Our NSF proposal, “A just-in-time, cross-layer instrumentation framework for diagnosing performance problems in distributed applications,” was funded

Diagnosing and fixing problems in distributed applications running in cloud environments is extremely challenging.  One key reason is a lack […]

April 29th, 2018

Attending CSR Aspiring PIs Workshop

Thanks to NSF for selecting me to attend this workshop and for funding my travel costs.  I’m looking forward to […]

October 1st, 2017

Harshal and Andrew named Siemens Research Competition semifinalists!

Harshal and Andrew’s project, Tarpan: a router that supports evolvability, involved implementing a robust version of D-BGP in Quagga. D-BGP […]

May 5th, 2017

Our paper, “Bootstrapping evolvability for inter-domain routing with D-BGP,” was accepted to SIGCOMM’17!

August 21st, 2016

Our paper, “Principled Workflow-centric tracing of distributed systems,” was accepted to SoCC’16

Workflow-centric tracing (also called end-to-end tracing or distributed-systems tracing) captures the work done within and among distributed-system components to service […]