1. Beyer, B., Jones, C., Petoff, J., & Murphy, N. R. (2016). Site Reliability Engineering: How Google Runs Production Systems. O'Reilly Media.
Reference: Chapter 5, "Eliminating Toil." The chapter defines toil and argues that "the SRE organization should be focused on engineering projects that reduce toil and improve the service. If a team is buried in toil, its members don’t have time to engineer." Automating cluster updates is a prime example of eliminating such toil.
2. Cloud Native Computing Foundation (CNCF). GitOps - The Bad Parts. CNCF Blog.
Reference: This official blog post, while discussing challenges, operates on the foundational premise of GitOps, which is automation. It states, "GitOps is a way to do Kubernetes cluster management and application delivery. It works by using Git as a single source of truth for declarative infrastructure and applications." The entire model is built on automating the reconciliation of the cluster state with the state defined in Git, including version updates.
3. Kubernetes Documentation. Upgrading a cluster.
Reference: Section "Automating cluster upgrades." The official documentation notes, "The Kubernetes project recommends that you automate cluster upgrades. Automation can reduce the amount of manual work involved, and can also make the upgrade process more consistent." This directly supports the prioritization of automation for efficiency and reliability.