At Qlector we are committed to developing and delivering high-quality software, taking into account best engineering practices such as those listed in the twelve-factor app methodology.
In the following post we describe how we introduced Continuous Delivery. By doing so, we reduce waste and development costs and increase quality. First we briefly describe the practice and its principles, and later we dive into the details of how we implemented it at Qlector. Throughout the post we point to relevant posts that were useful to us when thinking about problems and designing a pipeline.
Definitions, principles and metrics
CI & CD²: how do we define them?
The three practices are related and represent successively higher stages of build automation. Continuous Integration is the practice of building and running tests on each commit on an independent server. This server should guarantee a clean environment where code is built from scratch, so that no test fails or passes because of some inconsistent state on the machine where it runs. We may also need to build in different environments, to ensure the same code does not behave differently depending on the environment.
Continuous Delivery takes this further: for each commit we build a binary which can be deployed to production with the push of a button, making this decision a business one. Continuous Deployment means deploying the binary we get from every commit, no longer requiring human intervention.
Not every software project can introduce continuous deployment to the PROD environment, but introducing it in other stages greatly helps to better understand deployments and make them less painful, shorten the feedback loop and avoid investing time in tasks that can be automated. By deploying frequently in a no-risk environment, we put constant stress on the delivery and deployment process, thus reducing its fragility and turning it into a low-risk activity in production.
Continuous Delivery is based on 5 principles:
- build quality in: the later we find defects, the higher the cost. We aim for a short feedback cycle in order to find and fix issues as quickly as possible
- work in small batches: this gives quicker feedback and less surface to explore in case of issues, allowing us to come up with fixes shortly after they are detected
- computers perform repetitive tasks, people solve problems: people should invest their time in creative activities and automate wherever possible in order to have time to do so. This not only increases process quality (less human error over time), but also greatly motivates people, who can devote themselves to meaningful work
- relentlessly pursue continuous improvement: continuous delivery is not a one-time shot, it is an attitude: we can always improve the process and, by doing so, increase overall quality and productivity
- everyone is responsible: the goal is to build a better product and that requires teamwork. Better quality and delivery is not a matter of isolated teams, but extends to everyone in the company.
How to measure?
In every company, every initiative needs a success metric: some KPI that allows us to understand how we perform. How do we measure continuous delivery? The State of DevOps report and the Accelerate book highlight that in order to predict and improve the performance of a team, we only need to measure four key metrics: lead time, deployment frequency, mean time to restore (MTTR), and change fail percentage. These metrics were also highlighted as worth pursuing by the latest ThoughtWorks Technology Radar.
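As an illustration of how the four metrics could be derived from deployment records, here is a minimal Python sketch. The `Deployment` record, its field names and the choice of averaging are assumptions made for this example, not a description of how Qlector or the cited reports compute them.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical record of one deployment (field names are assumptions)."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached the target environment
    failed: bool = False                    # did this change cause a failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def four_key_metrics(deployments: List[Deployment], period_days: int) -> dict:
    """Compute the four metrics over a reporting period of `period_days` days."""
    if not deployments:
        raise ValueError("need at least one deployment")
    lead_times = [(d.deployed_at - d.committed_at).total_seconds() for d in deployments]
    failures = [d for d in deployments if d.failed]
    restore_times = [
        (d.restored_at - d.deployed_at).total_seconds()
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "lead_time_hours": mean(lead_times) / 3600,
        "deployments_per_day": len(deployments) / period_days,
        "mttr_hours": mean(restore_times) / 3600 if restore_times else None,
        "change_fail_percentage": 100 * len(failures) / len(deployments),
    }
```

Feeding it the deployments of, say, the last 30 days gives one number per metric that can be tracked over time.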
Designing the pipeline
When we committed ourselves to building a continuous delivery pipeline, we looked for principles so as to design and build it on a good foundation. These principles are: build packages once, deploy the same way everywhere, smoke test deployments and keep environments similar.
Enforcing immutability has many benefits and thus we paid attention to it across different stages:
- we use Docker to provision build slaves on demand: these are created for a single build and destroyed when finished, ensuring a clean environment. To ensure the right slaves are created, we label slave types and associate them with certain Docker images, and this way control the environment on which builds run.
- to make sure binaries are built once, are accessible and remain immutable, we use a binary repository management tool. Our choice was JFrog Artifactory.
- from immutable binaries to an immutable environment where the application runs there is a long way. We use Docker to achieve it: for every build we retrieve the latest binary, install it on a Docker image and release it. This way we can start any version anywhere, making it easy to deploy the latest version, roll back if an issue is found or reproduce a specific version with the same environment.
- Jenkins holds a lot of information. Over the last year we saw a great new feature developed that allows us to specify Jenkins Configuration as Code (JCasC). We gave it a try and coded all our plugins and pipelines using this feature. Although it is not mature yet, we find it very promising and agree it is the way to go.
What does our pipeline look like?
- Whenever a commit is pushed to GitHub, Jenkins starts a build for that project. A new slave is created with the right environment for it, the code is retrieved from the repository using GitHub Deploy Keys and built, and the tests are run.
- If all tests pass, the pipeline creates a binary, pushes it to JFrog Artifactory and kills the slave.
- Another slave picks the latest build from our binary repository manager and installs it on a predefined Docker image, releasing a new version to Docker Hub. Every new image gets two tags: the corresponding version and ‘latest’, making sure we can always retrieve the latest version without knowing its version number.
- Finally we issue the corresponding notifications on success or failure. Here we implemented different policies on how to notify build status. We usually notify only when the status changes, to avoid notification overload (and the consequent decrease of attention from receivers). Notifications are issued to a Slack channel everyone is subscribed to.
As a standard practice, important stats on time per pipeline stage and their status are displayed by the Jenkins Build Pipeline, and the current status of the pipelines can be visualized on our Jenkins Build Monitor.
War stories on building the pipeline
Things are easier said than done :) But we all proudly remember hard times, which let us display resourcefulness and tenacity in solving the pressing issues we face. Building the pipeline was no exception, and we would like to share our experience with some issues we faced when developing it.
Slaves provisioning with Docker
We use the Docker plugin for slave provisioning. Some issues we faced at first were how to make the Docker host accessible to Jenkins and then how to properly configure the slave to interface with the master. We decided to communicate over ssh and customized the Jenkins Docker ssh slave template for that. Best practice is not only to have a unique instance per build, but also to issue a new key pair to log into each of the slaves, so that no one can log into them except the master. We achieved this by requesting ssh-key injection when configuring Docker agent templates.
Docker in Docker (DinD) configuration
We explored using a DinD configuration, but finally decided to run Jenkins master on a dedicated server and only dockerize its slaves (except those building Docker images). We found a good summary of potential issues and links to further relevant resources in this post.
Our final architecture runs code building and tests, up to publishing the binary to the binary repository manager, inside Docker slaves. It then retrieves the binary on a non-dockerized slave and proceeds to build and publish the Docker image.
Smoke testing the image
After building our Docker images we make sure the app starts with its associated services as expected. Only after these checks pass do we push the image to the Docker registry, ensuring a new level of quality. Images we build are then removed from the local slave, to avoid accumulating unnecessary waste on disk.
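The gate described here (push only once the freshly built image has proven it starts) can be sketched in Python as follows. The function names and the idea of passing the health check and the push step as callables are assumptions for illustration; in a real pipeline `check` might request a health endpoint of the started container and `push` might invoke the registry push.

```python
import time
from typing import Callable


def wait_until_healthy(check: Callable[[], bool], attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll `check` until it returns True or the attempts run out."""
    for attempt in range(attempts):
        try:
            if check():
                return True
        except Exception:
            # Errors such as a refused connection are expected while the app starts.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


def gate_push(check: Callable[[], bool], push: Callable[[], None], **kwargs) -> bool:
    """Run `push` only if the smoke test passes; report whether it ran."""
    if wait_until_healthy(check, **kwargs):
        push()
        return True
    return False
```

Keeping the check and the push as injected callables makes the gate itself easy to test without Docker.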
Build notifications: implementation and policies
Notifications are a central feature of the pipeline: a proper message and configuration may well determine its success. Shortening the feedback loop means properly communicating what the pipeline was built for: whether the changes we introduce seem good or something needs to be fixed.
We decided to report build status through Slack. In our experience, a notification for each build may result in many irrelevant statuses that decrease developers' attention due to notification overload. An alternative policy is to notify only on bad builds or on status change, and these are the policies we implemented. We are grateful to Betterment for sharing their experience on this topic and decided to share a code snippet with policies and message formatting as well. Bonus? We provide a variety of emojis for each build status, so that messages are not always the same :)
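To make the policies concrete, here is a small Python sketch of the decision logic. It is not the snippet the authors shared; the policy names, the status strings and the rule for a first build are assumptions chosen for this example.

```python
from typing import Optional

SUCCESS, FAILURE = "SUCCESS", "FAILURE"


def should_notify(policy: str, previous: Optional[str], current: str) -> bool:
    """Decide whether a build result deserves a message.

    policy: "always", "failures_only" or "on_change" (hypothetical names).
    previous: status of the previous build, or None for the first build.
    """
    if policy == "always":
        return True
    if policy == "failures_only":
        return current == FAILURE
    if policy == "on_change":
        # First build: only speak up if it is already broken.
        if previous is None:
            return current == FAILURE
        return previous != current
    raise ValueError(f"unknown policy: {policy}")
```

With `on_change`, a long run of green builds stays silent, while both the build that breaks and the build that fixes it produce a message.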
Building context into Docker image profiles
Among continuous delivery pipeline patterns we find the build packages once pattern: by deploying the same code we tested, we eliminate packages as a source of failure. Since we may need to deploy containers from the same image in different contexts, each with its own configuration, we built in support for profiles: depending on the profile we load different configurations, ensuring the same build behaves as configured in each context.
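A profile mechanism of this kind might look like the following Python sketch. The environment variable name `APP_PROFILE`, the profile names and all settings are hypothetical; the point is only that one build reads its context at startup instead of being rebuilt per environment.

```python
import os
from typing import Mapping

# Hypothetical settings; real profiles would hold whatever differs per context.
DEFAULTS = {"log_level": "INFO", "db_host": "localhost", "debug": False}
PROFILES = {
    "dev": {"log_level": "DEBUG", "debug": True},
    "prod": {"db_host": "db.internal"},
}


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Pick a profile from APP_PROFILE (assumed name) and overlay it on the defaults."""
    profile = env.get("APP_PROFILE", "dev")
    if profile not in PROFILES:
        raise KeyError(f"unknown profile: {profile}")
    return {**DEFAULTS, **PROFILES[profile], "profile": profile}
```

The same image started with `APP_PROFILE=prod` or `APP_PROFILE=dev` would then pick up different settings without any change to the packaged code.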
How to trace changes across the whole pipeline? Versioning may help us …
How we handle versioning
An important issue in software development is versioning, whose purpose is to communicate changes in the binaries we deliver. Versioning can be a great challenge, since it has to convey meaningful information to humans while we should be able to delegate its creation to machines. How did we address this issue at Qlector?
Versioning can be addressed broadly, to convey information about
- the magnitude of the changes
- whether the build is a final release or meant as WIP
- the latest commit, providing traceability so that we can quickly bind a binary to its corresponding state in the codebase.
By providing a deterministic heuristic, we ensure the same version will be produced under the same circumstances.
Magnitude of changes
When speaking about the magnitude of change, we mean whether the changes we introduce in a new version break compatibility with the previous API, add new features and/or fix reported issues. This information is best conveyed following the SemVer convention, which due to its clarity has become a standard. SemVer also supports adding pre-release and commit tags, features we used to convey additional meaningful information.
Definitive vs WIP releases
It would be nice to know, just by reading a version, whether it corresponds to a definitive release (milestone) or to some work in progress. For this purpose we borrowed the concept of a SNAPSHOT release from Maven. In addition to the SemVer number, we add the SNAPSHOT tag to WIP releases.
Tracing the build back to a git commit
Similar to the SNAPSHOT tag, we may also add a short git commit tag: this way we can trace each build version back to the commit that generated it. By doing so, we have a reference to triage issues in the codebase when reported from a deployed environment.
When developing a pipeline, we need to provide some mechanism to automate versioning: something that considers the rules above and provides a proper version for the next release. To do so, we created a script that proposes a version based on the latest git tag version, the hash of the tagged commit and the latest commit hash:
- if the latest commit is the same as the tagged one, we return the tag as the version
- if the latest commit is not the same as the latest tagged one, we
  - increase by one the minor value of the version retrieved from the git tag
  - add the SNAPSHOT tag
- in all cases we add the short hash of the current commit, to provide traceability between the build version and the version control system.
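The rules above can be sketched as a small pure function in Python. This is not the authors' script: the function name, the exact layout of the result (`-SNAPSHOT` as SemVer pre-release, `+<short hash>` as build metadata), the 7-character short hash and the reset of the patch number on a minor bump are assumptions. In practice the inputs would come from git, for example the latest tag and the hashes of the tagged and current commits.

```python
import re

_SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def propose_version(tag: str, tag_commit: str, head_commit: str) -> str:
    """Propose a version from the latest git tag and the current commit.

    tag: latest tag, e.g. "1.4.2" (a leading "v" is tolerated).
    tag_commit / head_commit: full commit hashes of the tag and of HEAD.
    """
    match = _SEMVER.match(tag)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH tag: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    short = head_commit[:7]  # short commit hash for traceability (assumed length)
    if head_commit == tag_commit:
        # HEAD is the tagged commit: the tag itself is the version.
        return f"{major}.{minor}.{patch}+{short}"
    # Work in progress: bump minor, reset patch (assumption), mark as SNAPSHOT.
    return f"{major}.{minor + 1}.0-SNAPSHOT+{short}"
```

Because the function depends only on its three inputs, the same commit always yields the same version, which is the determinism the heuristic is meant to provide.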
Faster is better: speeding up the feedback loop
At Qlector we value and pay great attention to agility: the shorter the feedback loop, the quicker we develop features, identify issues and remediate them.
After setting up our continuous delivery pipeline, we saw two opportunities for speeding up:
- perform continuous deployment on non-risk environments and learn best practices to make it painless in production
- optimize our dev environment, so that local versions of services are created and means are provided to gain feedback on new code
Staying current with dev dependencies
Staying current with dependency versions is important for security reasons (avoiding vulnerabilities) and to minimize the upgrade gap (the greater the delay, the greater the gap of changes we need to adapt to in our code; incremental changes spare us big refactorings). Despite this, not all teams and projects regularly perform updates. Research on this topic found that some practices may help teams update dependencies regularly.
In our case we decided not to implement automated PRs for version changes. Instead, we have a regular job that tests the updates and notifies us whether our code builds without issues with them, and on that basis we update regularly.
Towards an agile dev environment
Even with a continuous delivery or deployment pipeline, we spend most of our time coding in a local environment, and anything we can do to speed up this process will likely have a great impact on productivity.
One of the principles of continuous delivery is to have the least possible difference between environments. This holds for development as well. With this in mind, we decided to create an image identical to the one deployed to production, which mounts the code from the repository, installs the required dependencies and watches for code changes. On a code change, it recompiles the sources, runs the tests and provides immediate feedback if something is not behaving as expected.
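The "watch for code changes" step can be approximated with a simple polling approach, sketched here in Python. This is only one possible technique (real setups often rely on the build tool's own watch mode or on file system events); the function names and the default `.py` suffix filter are assumptions.

```python
import os
from typing import Dict, Set


def snapshot(root: str, suffixes=(".py",)) -> Dict[str, float]:
    """Map each matching file under `root` to its last-modified time."""
    seen = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(tuple(suffixes)):
                path = os.path.join(folder, name)
                seen[path] = os.stat(path).st_mtime
    return seen


def changed(before: Dict[str, float], after: Dict[str, float]) -> Set[str]:
    """Files that were added, removed or modified between two snapshots."""
    return {
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    }
```

A watcher loop would take a snapshot every second or so and trigger recompilation and tests whenever `changed` returns a non-empty set.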
Once we migrated to this scheme, it also enabled us to reduce the time needed to set up the environment for newcomers: by issuing two commands they had everything up and running. It also helped us ensure things work as expected in production and that nothing works or breaks in the local environment due to code or configuration leftovers.
We found many opportunities for improvement, enforcing standard practices such as continuous compilation and testing and working towards a lean setup. This way we significantly reduced setup times and got quicker feedback on new code developed.
The extra mile: dockerized development environments!
Continuous delivery follows some principles inherent to lean: continuous improvement by removing all kinds of waste. In our case this means automating repetitive tasks and focusing only on tasks that drive value, those that require human creativity and skill. Can we build a dev environment to achieve this?
By following this lean principle, we identified features such a platform should satisfy:
- the best scenario we can aim for would be a zero knowledge setup for anyone working on the project: just by cloning the code, opening an editor or IDE and executing some script we should have everything ready to develop, view the app and get feedback. This should not take longer than a minute or two :)
- the environment should not impose constraints on developers: everyone should be able to work on the OS and IDE of their choice, removing unnecessary learning curves
- we should replicate the same environment as in production, to make sure no issues arise other than those that may happen in production under the same conditions. By doing so, we standardize the OS, dependencies (both OS and application packages), component locations, encodings as well as users and permissions
- we should prevent issues due to stale conditions on the developer side, such as stale packages, configurations or files. Sometimes these make things work locally in ways that are impossible to replicate on other machines
- make it configurable: the developer may decide whether all or only some modules and services should run, and how: as usual, with changes being watched and code recompiled, linted and/or tested on change? When starting the environment, shall we start from where we left off or, for example, recreate the database? Shall we recreate the container?
- provide configurations, certificates and credentials defaults, so that the developer does not need to worry about them until some specific change is required
- provide tools and means to ease debugging
We achieved this by using a hierarchy of Docker images that provides the same environment for development, CI and production and lets us start all services as defined in docker compose. In development we mirror the code from the git repository into the container and run the modules inside it, while the production image is created from the same base image but with the released binaries persisted into it. To avoid mismatches in docker-compose definitions, we provide a base definition and modular overrides that keep track of specific changes for DEV or PROD. These definitions are regenerated on each run, to ensure we never run on a stale setup. We prevent stale conditions in containers by providing means to recreate containers as well as associated volumes if the developer so requires. Application dependencies for development are not persisted into Docker images, but locked by specifying the required version number. The development environment caches downloaded dependencies, avoiding repeated downloads and ensuring quick startups.
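The idea of a base definition plus environment-specific overrides, regenerated on each run, can be illustrated with a recursive merge in Python. The service names, image names and settings are hypothetical, and this sketch simply replaces list values; Docker Compose applies its own merge rules when given several files, which may treat some list-valued options differently.

```python
from copy import deepcopy


def overlay(base: dict, override: dict) -> dict:
    """Return `base` with `override` merged in; nested mappings merge key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = overlay(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical definitions, shaped like the `services` section of a compose file.
BASE = {"services": {"app": {"image": "example/app-base", "ports": ["8080:8080"]}}}
DEV = {"services": {"app": {"volumes": ["./src:/app/src"], "command": "watch"}}}
PROD = {"services": {"app": {"image": "example/app:1.5.0", "restart": "always"}}}
```

Calling `overlay(BASE, DEV)` or `overlay(BASE, PROD)` at startup yields a fresh definition each time and leaves `BASE` untouched, so a stale generated file never survives between runs.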
Settings for the developer environment (which modules should run and whether we watch for code changes) as well as credentials and SSL certificates are generated with defaults, but can be overridden by developers at any time.
When watching for changes, we ensure required modules are continuously compiled and the developer may request additional checks through configuration (test and/or lint the code).
Debugging is one of the most important tasks in development. Time invested in making it easier, gaining visibility or just providing shortcuts to relevant directories saves developers time.
To provide console access, we added two commands that enable us to log into containers and attach tmux sessions with predefined window layouts over all running processes and databases. Panels are grouped based on interacting components so that we can concentrate on a given screen to observe behavior, diagnose issues and work on a solution.
Since debugging sometimes requires a separate set of tools that are not part of the running applications, we developed a specific container with them, that attaches to the running environment and provides utility scripts that may help on the task. Scripts we develop to assist diagnosing a given situation in a better way or shorter time are included in that Docker image, to boost team productivity.
Failures may occur at any step of the stack and it's important to have visibility across all stages. If a request is failing, we should know whether it is due to bad Nginx mappings or because a service is failing to respond for some reason. Our configuration provides the ability to make requests through the whole stack or to jump in at any stage to diagnose the issue.
There is light at the end of the pipeline!
Implementing a continuous delivery pipeline was a great journey, which still continues as we look for ways to improve our pipeline up to deployment. Some benefits we gained from it are a quick feedback loop while developing the product, simplified configurations across multiple environments, short setup times, measurable quality and actionable items from reports, as well as dockerized images that work in any environment, released for every working commit we push. Even if it may feel a bit scary at first, the small batches principle turns out to help us develop faster and better.
Did you have a similar experience or are you setting up your pipeline? Ping us, we will be glad to hear about your experience! Thinking about a new job? We are always looking for the best professionals!