From 2013 to 2016, I was a postdoctoral researcher in the Computer Science Department at Carnegie Mellon University (CMU). I worked on the XIA project and was advised by Professor Peter Steenkiste. My research focused on enabling evolvability for inter-domain routing and was published in SIGCOMM’17. In fall 2013, I developed and taught the initial offering of CMU’s graduate class on cloud computing (15-719).
I completed my Ph.D. in the Electrical & Computer Engineering Department at CMU in May 2013. I worked at the Parallel Data Lab (PDL) and was advised by Professor Greg Ganger. My dissertation work focused on how to use distributed-system tracing techniques to automate problem diagnosis tasks in cloud environments. Conference papers related to my dissertation were published in NSDI’11, InfoVis’13, and SoCC’16.
In 2007, I appeared in a PhDComics strip encouraging CS grad students to wear lab coats to work. In my spare time, I enjoy playing tennis, running, and photography. I also occasionally blog at Formalized Curiosity.
Harshal and Andrew’s work involves propagating trace context into the Linux kernel and extending LTTng with support for recording that context. This work was part of their MIT PRIMES high-school research project. Their presentation of the project at DevConf.US, the Red Hat developers conference, was well received.
Diagnosing and fixing problems in distributed applications running in cloud environments is extremely challenging. One key reason is a lack of needed instrumentation: it is difficult to predict a priori where instrumentation is needed, what instrumentation is needed, and within what datacenter stack layer (e.g., application, virtualization, network) instrumentation is needed to provide visibility into future problems.
To help, this proposal describes a framework that will explore the search space of possible instrumentation choices to automatically enable the instrumentation needed to help engineers diagnose a new problem. This work builds on three foundations: workflow-centric tracing (also called end-to-end tracing or distributed tracing), which was a focus of my dissertation work; machine-learning techniques; and domain-specific knowledge.
My Co-PIs and I are very excited to make progress on this project!
NSF CNS CSR Small: A just-in-time, cross-layer instrumentation framework to help diagnose performance problems in distributed applications. Raja R. Sambasivan, Ayse K. Coskun, Orran Krieger. $460,249.
Thanks to NSF for selecting me to attend this workshop and for funding my travel costs. I’m looking forward to learning more about NSF’s programs and how to write great proposals :).
Harshal and Andrew’s project, “Tarpan: a router that supports evolvability,” involved implementing a robust version of D-BGP in Quagga. D-BGP is a version of BGP with extensions that let it bootstrap evolvability to new inter-domain routing protocols; that is, it facilitates their deployment and gradually deprecates itself in favor of one or more of them. D-BGP was the focus of my postdoc research and was published in SIGCOMM’17. Harshal and Andrew are high school students in the MIT PRIMES research program.
Congrats to Harshal and Andrew on this accomplishment!
Our paper, “Bootstrapping evolvability for inter-domain routing with D-BGP,” was accepted to SIGCOMM’17!
Our paper, “Principled workflow-centric tracing of distributed systems,” was accepted to SoCC’16! Workflow-centric tracing (also called end-to-end tracing or distributed-systems tracing) captures the work done within and among distributed-system components to service individual requests. Due to its ability to provide deep visibility into complex distributed system behavior, it is rapidly being adopted by industry (e.g., by Facebook, Google, Yelp).
This paper identifies the design decisions that must be made within a workflow-centric tracing infrastructure to support common tracing use cases, such as diagnosis and resource accounting. To do so, it draws on the authors’ extensive previous experience with tracing and systematizes the breadth of past work on this topic. A key outcome of this work is that no single tracing infrastructure design can satisfy all of the common tracing use cases.