
Where we want to be & how we propose to get there.

Where we aim to be:

  • Able to accommodate a high rate of change.
  • Able to make constant small improvements through small, frequent feedback loops:
    • “Knowledge exists at the edges, not at the centre”
    • "Turn local discoveries into global improvements"
    • "Delivering value continuously"
    • The aim is not merely to allow daily work to happen but to improve daily work.
    • Changes should have short lead times.
  • Able to make use of the organization's 24-hour global nature:
    • Make code changes and deployments at any time of day without chaos or disruption.
    • It's always 9.00 AM somewhere, so there is no time when everyone is asleep or inactive.
    • Code and environments have to be safe to change (and we must be able to recover from mistakes quickly).
    • Source code, data, etc. all have to be safe and versioned so that, if need be, we can rebuild from a specific point in time or version.
    • This requires a high-trust environment in which we rely on our team members.
  • Able to become and stay flexible:
    • Ability to restructure easily as needs or providers change.
    • Ability to scale services up or down and/or move to a different provider without disruption.
    • Ability to keep an eye on provider costs and avoid being locked in by what we have already built.
  • Able to become and stay standards-based:
    • Standards, encompassing the sum of our organizational knowledge, should be easier to follow than to ignore.
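One way to make "short lead times" measurable is to track lead time for changes: the time from a commit to that change running in production. The sketch below assumes change records are available as (commit time, deploy time) pairs; the record format and the function names are hypothetical, not part of any existing tool.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical change record: (commit time, production deploy time).
Change = tuple[datetime, datetime]


def lead_times(changes: list[Change]) -> list[timedelta]:
    """Return the commit-to-deploy lead time of each change."""
    return [deployed - committed for committed, deployed in changes]


def median_lead_time(changes: list[Change]) -> timedelta:
    """Median lead time; one unusually slow change skews it less than a mean would."""
    times = lead_times(changes)
    if not times:
        raise ValueError("no changes recorded")
    return median(times)


# Example: three changes committed at 09:00 and deployed 2 h, 4 h and 1 day later.
start = datetime(2024, 1, 8, 9, 0)
example = [
    (start, start + timedelta(hours=2)),
    (start, start + timedelta(hours=4)),
    (start, start + timedelta(days=1)),
]
print(median_lead_time(example))  # 4:00:00
```

Watching this number over time shows whether the small, frequent feedback loops described above are actually getting shorter.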

When things go wrong:

  • Have monitoring in place to see that it has gone wrong & what the problem is.
  • Have nightly backups and possibly constant database replication for high value, high change services.
  • Have systems in place to restore service right through to a complete rebuild on a new instance.
  • Have the systems in place to learn from events and either prevent or more rapidly detect & recover.
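Restoring to a point in time starts with choosing the right backup. This is a minimal sketch that assumes backups are identified only by their timestamps; the function name and data layout are hypothetical. It picks the most recent backup taken at or before the target time. Anything written between that backup and the target would have to come from replication or would be lost, which is why replication is suggested for high-value, high-change services.

```python
from bisect import bisect_right
from datetime import datetime


def backup_for_restore(backups: list[datetime], target: datetime) -> datetime:
    """Return the most recent backup taken at or before `target`."""
    ordered = sorted(backups)
    # bisect_right places `target` after any backup with the same timestamp,
    # so everything left of index i was taken at or before the target.
    i = bisect_right(ordered, target)
    if i == 0:
        raise LookupError(f"no backup exists at or before {target}")
    return ordered[i - 1]


# Example: nightly backups at 02:00; restore to mid-afternoon on 2 March.
nightly = [datetime(2024, 3, day, 2, 0) for day in (1, 2, 3)]
chosen = backup_for_restore(nightly, datetime(2024, 3, 2, 15, 30))
print(chosen)  # 2024-03-02 02:00:00
```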

How we get there:

Automation using Ansible:

  • Infrastructure automation:
    • Deliver compute, network, storage, and security (Infrastructure as a Service).
      • We use a fixed version of Linux (Ubuntu) with a standard setup (fail2ban, firewall, users, key-based SSH access, sudo rights, etc.).
  • Build automation:
    • Deliver application components as immutable images (Continuous Integration).
      • We build a .deb package and publish it to our Nexus repository. If a web container (e.g. Tomcat) is required, we build that into the package too.
  • Application automation:
    • Deliver complete environments (Continuous Delivery).
      • Specific roles, for example the terminology server or MLDS.
      • Includes backup to an external location (e.g. AWS) from day 1 of production.
      • Provide SSL via a standard Nginx setup, so the application does not need to manage SSL itself.
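Idempotency is what lets the same playbook run repeatedly against a server and leave it in the same state. Ansible modules such as `lineinfile` follow a desired-state pattern: check the current state, change it only if it differs, and report whether anything changed. The hypothetical `ensure_line` function below imitates that pattern in plain Python. It is an illustration, not Ansible's implementation, and the `sshd_config` line is just an example of the kind of standard setup described above.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Ensure `line` is present in the file at `path` (hypothetical helper).

    Returns True if the file was changed and False if it was already in the
    desired state, like Ansible's "changed" and "ok" task results.
    """
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # desired state already reached: leave the file alone
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


# Running the same "task" twice: only the first run changes anything.
with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "sshd_config"
    first_run = ensure_line(config, "PasswordAuthentication no")
    second_run = ensure_line(config, "PasswordAuthentication no")
    contents = config.read_text()

print(first_run, second_run)  # True False
```

Because a second run is a no-op, the same automation can build instance A and instance B identically, or be re-applied to an existing instance without side effects.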

Feedback via Freshdesk:

  • Used to feed back into the production cycle.
    • Useful ideas will usually come from users rather than developers, as users know what will make their lives easier & more productive.
  • Used as a human-based monitoring system.
    • For example, users report when a system is down, running slowly or has any other problem.

Monitoring using Zabbix:

  • We can monitor systems on almost any metric, from disk space to CPU usage to the number of queries a given database is receiving.
    • This allows us to gather the metrics to then work out if a service needs more or fewer resources.
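As an illustration of turning collected metrics into a sizing decision, the sketch below takes CPU utilization samples (for example, exported from Zabbix) and compares their 95th percentile with two thresholds. The thresholds, the function name and the choice of percentile are assumptions for illustration, not Zabbix features.

```python
from statistics import quantiles


def sizing_advice(cpu_percent: list[float],
                  upper: float = 80.0, lower: float = 20.0) -> str:
    """Suggest a resource change from CPU utilization samples (in percent).

    The 95th percentile is used so that short spikes do not dominate.
    The 80 % / 20 % thresholds are hypothetical defaults, not Zabbix settings.
    """
    if len(cpu_percent) < 2:
        raise ValueError("need at least two samples")
    # method="inclusive" keeps the estimate within the observed range.
    p95 = quantiles(cpu_percent, n=100, method="inclusive")[94]
    if p95 > upper:
        return "more resources"
    if p95 < lower:
        return "fewer resources"
    return "no change"


# Moderately loaded service with one brief spike: the spike alone does not trigger a resize.
print(sizing_advice([30.0] * 99 + [100.0]))  # no change
```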

Summary:

  • Improve information flow, break down silos, improve speed.
  • Build, measure, learn, repeat.
  • Automate in an idempotent manner for repeatability: instance A is the same as instance B. This allows for rebuilds, movement and flexibility.
  • Visibility: get improvements in front of users as fast as possible.
  • Solidity: standards & monitoring so that we know of any problems and fix them before users are aware.