Where we want to be & how we propose to get there.
Where we want to be:
- Able to accommodate a high rate of change.
- Able to make constant small improvements through small, frequent feedback loops:
- “Knowledge exists at the edges, not at the centre”
- "Turn local discoveries into global improvements"
- "Delivering value continuously"
- The aim is not just to get daily work done but to improve daily work.
- Changes should have short lead times.
- Able to use the 24-hour, global nature of the organization.
- Code changes and deployments at any time of the day without massive chaos and disruption.
- It's always 9:00 AM somewhere, so there is no time when everyone is asleep or inactive.
- Code and environments have to be safe to change (and we can recover from mistakes quickly)
- Source code, data, etc. all have to be safe and versioned so that, if need be, we can rebuild from a specific point in time or version.
- This means a high-trust environment in which we rely on our team members.
- Able to become and stay flexible.
- Ability to restructure easily as needs or providers change.
- Ability to upsize or downsize services and/or move to a different provider without disruption.
- Ability to keep an eye on provider costs and not get trapped by what we have done/created.
- Able to become and stay standards-based.
- Standards, encompassing the sum of our organizational knowledge, should be easier to follow than to ignore.
When things go wrong:
- Have monitoring in place to see that something has gone wrong and what the problem is.
- Have nightly backups and possibly constant database replication for high-value, high-change services.
- Have systems in place to restore service, up to and including a complete rebuild on a new instance.
- Have systems in place to learn from events and either prevent a recurrence or detect and recover from it more rapidly.
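The nightly backup point above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `create_backup` and `prune_backups`, the `.tar.gz` format, and the retention count are all hypothetical, and a real setup would also copy the archive to external storage.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def create_backup(source_dir: Path, backup_dir: Path, now: datetime = None) -> Path:
    """Archive source_dir into a timestamped .tar.gz inside backup_dir."""
    if now is None:
        now = datetime.now(timezone.utc)
    backup_dir.mkdir(parents=True, exist_ok=True)
    # A sortable UTC timestamp in the name lets us pick a point in time to restore.
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    archive = backup_dir / f"{source_dir.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source_dir, arcname=source_dir.name)
    return archive


def prune_backups(backup_dir: Path, keep: int) -> list:
    """Delete all but the newest `keep` archives; return the removed paths."""
    # Assumes one source per backup_dir, so names sort chronologically.
    archives = sorted(backup_dir.glob("*.tar.gz"))
    removed = archives[:-keep] if keep > 0 else archives
    for old in removed:
        old.unlink()
    return removed
```

Run nightly (for example from cron), `create_backup` followed by `prune_backups(backup_dir, keep=14)` would keep roughly two weeks of restore points; the value 14 is an arbitrary example, not a recommendation from this page.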
How we get there:
Automation using Ansible:
- Infrastructure automation:
- Deliver compute, network, storage, and security (Infrastructure as a Service).
- We use a given version of Linux (Ubuntu) with a standard setup (fail2ban, firewall, users, key-based SSH access, sudo rights, etc.).
- Build automation:
- Deliver application components as immutable images (Continuous Integration).
- We build to a .deb file in our Nexus repository. If a web container (Tomcat) is required, we should build that in too.
- Application automation:
- Deliver complete environments (Continuous Delivery).
- Specific roles, for example the terminology server or MLDS.
- Includes backup to an external place (e.g. AWS) from day 1 of production.
- Provide SSL via a standard setup for Nginx. No need for the application to manage SSL.
- Used to feed back into the production cycle.
- Useful ideas will usually come from users rather than developers, as they know what will make their life easier and more productive.
- Used as a human-based monitoring system.
- For example, to report that a system is down, running slowly, or has any other problem.
Monitoring using Zabbix:
- We can monitor systems on almost any metric, from disk space to CPU usage to the number of queries a given database is receiving.
- This allows us to gather the metrics to then work out if a service needs more or fewer resources.
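As a rough illustration of turning gathered metrics into a sizing decision, the sketch below averages a series of CPU utilisation samples and compares the result with two thresholds. It does not use the Zabbix API; the function name `recommend_sizing` and the 80% and 20% thresholds are assumptions chosen for the example.

```python
from statistics import mean


def recommend_sizing(cpu_samples, high=80.0, low=20.0):
    """Suggest whether a service needs more, fewer, or the same resources.

    cpu_samples: CPU utilisation percentages collected over some period.
    high / low:  hypothetical thresholds, to be tuned per service.
    """
    if not cpu_samples:
        raise ValueError("at least one sample is required")
    average = mean(cpu_samples)
    if average > high:
        return "more"   # sustained high load: consider upsizing
    if average < low:
        return "fewer"  # sustained low load: consider downsizing
    return "keep"
```

Averaging over a period rather than reacting to a single reading keeps short spikes from triggering a resize; the same pattern applies to disk space or query counts.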
- Improve information flow, break down silos, improve speed.
- Build, measure, learn, repeat.
- Automate in an idempotent manner - repeatability. Instance A is the same as instance B, which allows for rebuilds, movement, and flexibility.
- Visibility - Get improvements up in front of users as fast as possible.
- Solidity - standards & monitoring to know of any problems & fix them before the users are aware.
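The idempotency principle in the list above can be illustrated with a small sketch: an operation that describes the desired end state, changes the system only if it differs, and reports whether it changed anything. Ansible modules such as `lineinfile` behave along these lines; the helper `ensure_line` below is a hypothetical stand-in for illustration, not Ansible code.

```python
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file; return True only if the file was changed."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # already in the desired state: do nothing
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True
```

Running it once or ten times leaves the file in the same state, which is what makes it safe to re-apply the same automation to instance A and instance B and expect identical results.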