Systems In The Wild

Architecture observations on complex distributed systems.

Studying how complex systems behave outside architectural diagrams.

Platform Governance

Translating OpenShift Health into Business Risk

The gap no one owns. Most OpenShift environments can report their health status with precision. Very few can report their risk position with confidence. Clusters expose thousands of signals: node conditions, operator status, etcd latency, certificate countdowns… The data exists. What rarely exists is a structured translation layer between platform health and business risk. ...

March 4, 2026 · 10 min · Andre Rocha
Platform Governance

Why Most OpenShift DR Strategies Fail at Executive Level

Most enterprise OpenShift disaster recovery strategies are designed to satisfy audits, not to survive real incidents. They describe recovery procedures, declare RPO and RTO targets, and satisfy audit checklists. What they rarely do is demonstrate recovery capability under realistic conditions. This distinction matters more than it appears. Having a DR plan and having DR capability are fundamentally different things. The first is a document. The second is a measurable organizational competence that requires investment, testing, and continuous validation. ...

March 2, 2026 · 10 min · Andre Rocha
Platform Governance

Platform Governance as a Control System in Multi-Cluster Kubernetes

Does it really matter? Let’s explore five items and try to answer that question. 1. Multi Clusters. Organizations operating multi-cluster Kubernetes fleets face a structural risk that is rarely discussed in architectural reviews: governance gaps that remain invisible until an audit fails or an incident escalates. The cost is measurable. Undetected configuration drift increases incident blast radius. Inconsistent RBAC baselines extend audit preparation from days to weeks. Clusters onboarded without active policy enforcement create compliance blind spots that accumulate silently. ...

February 26, 2026 · 5 min · Andre Rocha