Thursday, October 20, 2011

Radical Automation

There are a lot of tools out there to help with operations.
They cover numerous areas, including but not at all limited to:

  • config file management
  • patch and package management
  • server builds
  • remote management (even for virtualized/remote console)
  • monitoring/alerting
  • data gathering and data graphing (can be separate from monitoring/alerting)
  • CMDB and inventory management
  • capacity planning/management
  • performance monitoring/management/planning
  • security monitoring/prevention/policy enforcement
  • correlation engines
  • release tools
  • lifecycle management (hardware, software, etc.)
  • control management (automated error correction like restarting a failing application instance)
  • reporting tools
Each of these items is useful in its own right.
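
To make one of these items concrete, here is a minimal sketch of control management (automated error correction) in Python. It is only an illustration, not the tooling described in this post: `ensure_running`, `is_healthy`, and `restart` are hypothetical names, and the two callables stand in for whatever health check and restart command your environment actually uses.

```python
from typing import Callable


def ensure_running(
    is_healthy: Callable[[], bool],
    restart: Callable[[], None],
    max_restarts: int = 3,
) -> bool:
    """Restart a failing instance until it is healthy or the budget runs out.

    Both callables are hypothetical hooks supplied by the caller.
    Returns True if the instance ends up healthy, False otherwise.
    """
    for _ in range(max_restarts):
        if is_healthy():
            return True
        # The instance failed its check, so attempt the automated correction.
        restart()
    # Check once more after the final restart attempt.
    return is_healthy()
```

Capping the number of restarts matters: an instance that never recovers should end up in front of a human (through monitoring/alerting) rather than being restarted forever.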

When you have enough of these systems, they can seem like a lot to look at, and a lot of overhead.  But without them you are blind.

The solution is radical automation.

I've worked in a radically automated shop.  This was in the 1990s, in the days before so much open source software.  We wrote systems that handled most of these issues end-to-end, and that were written flexibly so they could be extended.  We had what I think of as code-perfectionists who wrote self-documenting code, in Perl no less, that anyone could modify.  I started out helping with it, and then others who were better at it and more dedicated to this space took over.  But even years later I could go and modify our code and automation because it was written so well.  But I digress.

Having all of these systems very tightly integrated did many things for us:

  • Automation means less administrator-caused downtime.  Also, less experienced folks could take on more complex tasks that were "wrapped" in automation that safeguarded against most stupid mistakes and "fat fingering" problems.
  • We had the data we wanted now, and on an ongoing basis.  Seeing, using, and sharing that data was easy.
    • Weekly reporting could be changed from week to week but otherwise stayed consistent.
  • We could tackle enormous tasks.  When building and deploying a server is entering a tiny bit of data in a database about it and its role, and plugging it in and waiting, it is not hard to deploy hundreds of servers in a day with a few people.
  • Budgeting and planning were eventually largely based on rules from automated capacity planning.
  • We knew where and what and how many of everything there was and why.
  • We could focus a LOT more on vendor management once we were not so busy staring at our navels.
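
The "tiny bit of data in a database" idea, together with the point about wrapping tasks in safeguards, can be sketched roughly as follows. This is a hedged illustration in Python, not the system we actually ran: `ServerRecord`, `ROLE_PACKAGES`, `build_plan`, the package names, and the step strings are all hypothetical, and a real build system would execute the steps rather than return them as text.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical mapping from a server's role to the packages it needs.
ROLE_PACKAGES = {
    "web": ["httpd"],
    "db": ["postgresql-server"],
}

# Conservative hostname check to catch "fat fingered" entries early.
HOSTNAME_RE = re.compile(r"^[a-z][a-z0-9-]{0,62}$")


@dataclass(frozen=True)
class ServerRecord:
    """The small amount of data an admin enters for a new server."""
    hostname: str
    role: str


def build_plan(record: ServerRecord) -> List[str]:
    """Validate the record, then derive the build steps from its role."""
    if not HOSTNAME_RE.fullmatch(record.hostname):
        raise ValueError(f"invalid hostname: {record.hostname!r}")
    if record.role not in ROLE_PACKAGES:
        raise ValueError(f"unknown role: {record.role!r}")
    steps = [f"set-hostname {record.hostname}"]
    steps += [f"install {pkg}" for pkg in ROLE_PACKAGES[record.role]]
    steps.append(f"register-monitoring {record.hostname}")
    return steps
```

For example, `build_plan(ServerRecord("web01", "web"))` yields the hostname, install, and monitoring steps for a web server, while a mistyped role such as `"wbe"` is rejected before anything touches a machine. Because everything is derived from the role, deploying the hundredth server costs the same few keystrokes as the first.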

There are a ton of frameworks for automation, but none of them replaces having your team roll up its sleeves and get to work writing some automation.  You can start small and grow from there.  Pay your sysadmins to be programmers and not just sysadmins.
