Main takeaways
Site Reliability Engineering
20+ years
Gum Trees
"Evolution selects for resilience, adaptability and evolvability" - Alicia Juarrero
Humerus fracture
Egypt 1539–1075 BC
Man Made Disaster
Control Theory
Latent Failure Theory
Normal Accident Theory
Cognitive Sciences
Cybernetics
Sociology
Psychology
Human Factors
Complex Systems Science and Philosophy
Safety Science
Socio-Technical Systems
Philosophy and the sciences can agree on complex systems:
They have many components that interact locally, not globally
Small local changes can have unintended global effects
They are embedded in their environments; they adapt, grow and are sensitive to changes
They require constant energy; entropy is constant and equilibrium is impossible
Hierarchy imposes constraints, added layers become more abstract
They have a history, which is crucial to their growth
It is a Field and a Community
It's not a tool or a product.
It is multi-disciplinary and crosses multiple industries. Its origins date back several decades, but it has become more of a "thing" in the past 15 years. In other words, there is a lot of academic material, and it is highly opinionated, which is great because it provokes good discussion.
resiliencepapers.club
A resilient system is able effectively to adjust its functioning prior to, during, or following changes and disturbances, so that it can continue to perform as required after a disruption or a major mishap, and in the presence of continuous stresses.
* Sustained Adaptive Capacity
* Graceful Extensibility
* Continuous Adaptability
It is what your organisation does. Not what it has.
How well do your people and systems adapt...
To failure?
To unplanned work?
To new architectures, platforms and technology?
Production environment!
Complex systems will behave in unexpected ways
...you can't always code the ability to dynamically adapt
Learn from incidents as much as possible. They are part of normal complex system behavior. Use them.
You can't wait for resilience to evolve naturally. It must become an ongoing practice.
Create conditions and environments where teams can sustain adaptive capacity, wherever the work is done
Understand the interactions between people and technology. Don't isolate them as separate challenges.
Constraints can also enable innovation. Think of them as probabilities for change, not restrictions.
Prefer safe-to-fail over fail-safe. Diversity of thought will increase robustness. Strict controls can lead to brittle and static systems.
Avoid restrictive control structures. Keep feedback loops open and innovation enabled
Don't blindly persist with operational models that become commoditized, e.g. cloud computing.
Thank you!
https://res-eng.hatchman76.com
@hatchman76