"There's an incredible amount of depth and thinking in the practices described here, and it's impressive to see it all in one place."
-Win Treese, coauthor of Designing Systems for Internet Commerce
The Practice of Cloud System Administration, Volume 2, focuses on "distributed" or "cloud" computing and brings a DevOps/SRE sensibility to the practice of system administration. Unsatisfied with books that cover either design or operations in isolation, the authors created this authoritative reference centered on a comprehensive approach.
Case studies and examples from Google, Etsy, Twitter, Facebook, Netflix, Amazon, and other industry giants are explained in practical ways that are useful to all enterprises. The new companion to the best-selling first volume, The Practice of System and Network Administration, Second Edition, this guide offers expert coverage of the following and many other crucial topics:
Designing and building modern web and distributed systems
Fundamentals of large system design
Understand the new software engineering implications of cloud administration
Make systems that are resilient to failure and grow and scale dynamically
Implement DevOps principles and cultural changes
IaaS/PaaS/SaaS and virtual platform selection
Operating and running systems using the latest DevOps/SRE strategies
Upgrade production systems with zero downtime
What and how to automate; how to decide what not to automate
On-call best practices that improve uptime
Why distributed systems require fundamentally different system administration techniques
Identify and resolve resiliency problems before they surprise you
Assessing and evaluating your team's operational effectiveness
Manage the scientific process of continuous improvement
A forty-page, pain-free assessment system you can start using today
Part I: Design: Building It
Chapter 1: Designing in a Distributed World
Chapter 2: Designing for Operations
Chapter 3: Selecting a Service Platform
Chapter 4: Application Architectures
Chapter 5: Design Patterns for Scaling
Chapter 6: Design Patterns for Resiliency
Part II: Operations: Running It
Chapter 7: Operations in a Distributed World
Chapter 8: DevOps Culture
Chapter 9: Service Delivery: The Build Phase
Chapter 10: Service Delivery: The Deployment Phase
Chapter 11: Upgrading Live Services
Chapter 12: Automation
Chapter 13: Design Documents
Chapter 14: Oncall
Chapter 15: Disaster Preparedness
Chapter 16: Monitoring Fundamentals
Chapter 17: Monitoring Architecture and Practice
Chapter 18: Capacity Planning
Chapter 19: Creating KPIs
Chapter 20: Operational Excellence