Path-Based Failure and Evolution Management

We present a new approach to managing failures and evolution in large, complex distributed systems using runtime paths. We use the paths that requests follow as they move through the system as our core abstraction, and our "macro" approach focuses on component interactions rather than the details of the components themselves. Paths record component performance and interactions, are user- and request-centric, and occur in sufficient volume to enable statistical analysis, all in a way that is easily reusable across applications. Automated statistical analysis of multiple paths allows for the detection and diagnosis of complex failures and the assessment of evolution issues. In particular, our approach enables significantly stronger capabilities in failure detection, failure diagnosis, impact analysis, and understanding system evolution. We explore these capabilities with three real implementations, two of which service millions of requests per day. Our contributions include the approa...
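The abstract does not spell out which statistical tests are applied to the paths. As a rough illustration only, the sketch below assumes a runtime path is an ordered list of component names, counts caller-to-callee interactions across many paths, and scores a new window of traffic against a baseline window with a Pearson chi-square statistic. The per-interaction terms double as a crude diagnosis ranking. The function names `interaction_counts`, `chi_square_contributions`, and `detect`, the smoothing constant `alpha`, and the toy component names are all hypothetical; this is a minimal sketch under those assumptions, not the authors' method.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical representation: a runtime path is the ordered list of
# component names that a single request traversed.
Path = Sequence[str]
Edge = Tuple[str, str]


def interaction_counts(paths: Iterable[Path]) -> Counter:
    """Count caller -> callee component interactions over a set of paths."""
    counts: Counter = Counter()
    for path in paths:
        for caller, callee in zip(path, path[1:]):
            counts[(caller, callee)] += 1
    return counts


def chi_square_contributions(
    baseline: Counter, current: Counter, alpha: float = 0.5
) -> Dict[Edge, float]:
    """Per-edge Pearson chi-square terms for `current` versus `baseline`.

    `alpha` is additive smoothing (an assumed choice) so that an interaction
    never seen in the baseline still gets a small, nonzero expected count.
    """
    edges = set(baseline) | set(current)
    total_cur = sum(current.values())
    if total_cur == 0:
        return {}  # no traffic in the current window: nothing to score
    total_base = sum(baseline.values()) + alpha * len(edges)
    contributions: Dict[Edge, float] = {}
    for edge in edges:
        # Expected count scales the smoothed baseline share to the current volume.
        expected = (baseline[edge] + alpha) / total_base * total_cur
        observed = current[edge]
        contributions[edge] = (observed - expected) ** 2 / expected
    return contributions


def detect(
    baseline_paths: Iterable[Path], current_paths: Iterable[Path], top_k: int = 3
) -> Tuple[float, List[Edge]]:
    """Return an overall anomaly score and the interactions contributing most."""
    contributions = chi_square_contributions(
        interaction_counts(baseline_paths), interaction_counts(current_paths)
    )
    score = sum(contributions.values())
    suspects = sorted(contributions, key=contributions.get, reverse=True)[:top_k]
    return score, suspects


if __name__ == "__main__":
    # Toy traffic: a healthy baseline, a healthy window, and a window where
    # half the requests take a new path through a hypothetical "error" component.
    baseline = [["web", "app", "db"]] * 90 + [["web", "app", "cache"]] * 10
    normal = [["web", "app", "db"]] * 9 + [["web", "app", "cache"]]
    failing = [["web", "app", "db"]] * 5 + [["web", "error"]] * 5
    print(detect(baseline, normal))   # small score: mix matches the baseline
    print(detect(baseline, failing))  # large score: ("web", "error") ranked first
```

Comparing interaction counts rather than per-component internals mirrors the "macro" focus described above. Turning the score into an alarm would still need a threshold, for example a chi-square critical value or one calibrated on historical windows; that choice is left open here.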
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2004
Where NSDI
Authors Mike Y. Chen, Anthony Accardi, Emre Kiciman, David A. Patterson, Armando Fox, Eric A. Brewer