CI/CD: Using GitLab and Ansible to deploy to Docker Swarm

How we deploy to Swarm from GitLab using Ansible

August 25, 2021

Yes, this is an article about Docker Swarm in 2021!

We have previously explained how we use GitLab CI and Ansible to deploy services.

In this post, we will show how we use the same setup (GitLab and Ansible) to build and deploy containers to Docker Swarm.

TL;DR

We use GitLab CI to build and store a Docker image matching a Git tag
We use Ansible (as a shell runner) to template our Docker Stack file
We use the docker_stack module to deploy to Swarm

Quick reminder

GitLab is a web-based Git repository manager with CI/CD pipeline features.
Ansible is an automation tool for provisioning, configuration management, and application deployment.
Docker is a program (and much more) that runs containers.
Docker Swarm is a container orchestration tool provided by Docker.

Workflow

GitLab CI is at the center of our CI/CD system.

For one of our projects that runs on a Docker Swarm cluster, the CI/CD pipeline looks like this:

Test & Build: build a Docker image from a Dockerfile. The image is stored on the GitLab Container Registry.
Deploy with Ansible to specific environments.

The pipelines are run on tags only.

Here is the complete workflow:

                build stage        deploy stage           swarm post deployment
            ┌─────────────────┐ ┌────────────────┐ ┌─────────────────────────────────┐ 

┌────────┐   ┌───────────────┐   ┌──────────────┐   ┌──────────────┐ ┌─> worker
│ GitLab │ ► │ GitLab Runner │ ► │   Ansible    │ ► │ Docker Swarm │ ──> worker
└────────┘   └───────────────┘   └──────────────┘   └──────────────┘ └─> worker(s)...
 private        build, push        stack deploy         manager
 instance     docker executor     shell executor

Step 1: build the image

For the first step, we use the following job in our .gitlab-ci.yml file:

Build:
  image: docker:20-git
  stage: build
  only:
    - tags
  script:
    - echo -n "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - >
      docker build
      --build-arg http_proxy=$http_proxy
      --build-arg https_proxy=$https_proxy
      --build-arg no_proxy=$no_proxy
      --tag $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_NAME
      .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_NAME

We use docker:20-git because our Dockerfile requires git at some point (composer dependencies are sometimes fetched with git). We also pin a version of the docker image because we like reproducibility: relying on latest would break the CI at some point.

The script has 3 commands:

docker login
docker build (with proxy build args): the image is tagged the same as the Git tag with $CI_COMMIT_REF_NAME
docker push to push the image to the GitLab Container Registry

At this point, the GitLab Container Registry contains an image tagged the same as the Git tag being deployed.
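
As a side note, the tags-only restriction can also be written with the newer rules keyword. The following is a minimal sketch, not our actual configuration: the stages list is assumed from the workflow described above, and the script is a placeholder for the three commands of the Build job.

```yaml
# Sketch only: stage order assumed from the workflow diagram.
stages:
  - build
  - deploy

Build:
  image: docker:20-git
  stage: build
  rules:
    # Run the job only for pipelines triggered by a Git tag.
    - if: $CI_COMMIT_TAG
  script:
    # Placeholder for the login, build and push commands shown above.
    - echo "docker login, docker build, docker push"
```

Note that only and rules cannot be combined in the same job, so a job uses one or the other.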

Step 2: deploy to swarm

On GitLab

To deploy the containers, we use the following jobs:

Deploy to production:
  stage: deploy
  script: *deploy
  when: manual
  only:
    - tags
  environment:
    name: production
  tags:
    - deployment
  variables:
    GIT_STRATEGY: none
    ANSIBLE_INVENTORY: prod
    ANSIBLE_SUBSET: all

Deploy to TH2:
  stage: deploy
  script: *deploy
  when: manual
  only:
    - tags
  environment:
    name: production-th2
  tags:
    - deployment
  variables:
    GIT_STRATEGY: none
    ANSIBLE_INVENTORY: prod
    ANSIBLE_SUBSET: th2

You can see we use two options here:

Deploy to environment “production”, or
Deploy to environment “TH2”.

Both jobs set GIT_STRATEGY to none because we do not need the source code anymore at this point: the Docker images are ready. Both jobs use a common script defined elsewhere, using a YAML-specific feature: anchors.

Here is the hidden .deploy key that holds the reusable script:

.deploy: &deploy
  - >
    cd /var/ansible &&
    sudo -E ansible-playbook ci_api.yml
    --diff
    --private-key="/var/ansible/ssh_keys/gitlab/gitlab-ci"
    --inventory="inventories/$ANSIBLE_INVENTORY"
    --limit="$ANSIBLE_SUBSET"
    -e "API_IMAGE=$CI_REGISTRY_IMAGE:$CI_COMMIT_REF_NAME"
    -e "GITLAB_USER_ID=$GITLAB_USER_ID"
    -e "GITLAB_USER_LOGIN=$GITLAB_USER_LOGIN"
    -e "GITLAB_USER_NAME=$GITLAB_USER_NAME"
    -e "CI_COMMIT_REF_NAME=$CI_COMMIT_REF_NAME"
    -e "CI_COMMIT_SHA=$CI_COMMIT_SHA"
    -e "CI_ENVIRONMENT_NAME=$CI_ENVIRONMENT_NAME"
    -e "CI_PIPELINE_ID=$CI_PIPELINE_ID"
    -e "CI_PROJECT_URL=$CI_PROJECT_URL"
    -e "CI_RUNNER_DESCRIPTION=$CI_RUNNER_DESCRIPTION"
    -e "CI_COMMIT_MESSAGE=$CI_COMMIT_MESSAGE"
    -e "CI_REGISTRY_USER=$CI_REGISTRY_USER"
    -e "CI_REGISTRY_PASSWORD=$CI_REGISTRY_PASSWORD"
    -e "CI_REGISTRY=$CI_REGISTRY"

This script is the link between GitLab and Ansible. Because we set the deployment tag on our jobs, only selected runners pick them up. They are configured as shell runners, on hosts where Ansible is available and ready.

The ANSIBLE_INVENTORY and ANSIBLE_SUBSET variables defined in each job are passed to the ansible-playbook options --inventory and --limit respectively.
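
To make the effect of --limit concrete, here is a sketch of what an inventory under inventories/prod could look like. The file name, host names and group membership are hypothetical; only the group names docker-swarm-managers and th2 come from this article.

```yaml
# inventories/prod/hosts.yml (hypothetical file name and hosts)
all:
  children:
    # Group targeted by the playbook's "hosts:" key.
    docker-swarm-managers:
      hosts:
        swarm-manager-1.example.com:
        swarm-manager-2.example.com:
    # Group selected with --limit="th2".
    th2:
      hosts:
        swarm-manager-2.example.com:
```

With such an inventory, --limit="all" would run the play on both managers, while --limit="th2" would restrict it to the managers that also belong to the th2 group.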

The deploy script passes a lot of variables to Ansible with -e, to be used in the Ansible tasks. Most of them are used to log what is being deployed and by whom. The two most important args are:

the image to deploy: API_IMAGE=$CI_REGISTRY_IMAGE:$CI_COMMIT_REF_NAME
the registry url and auth: CI_REGISTRY_USER, CI_REGISTRY_PASSWORD, CI_REGISTRY

All variables on the right side of the -e assignments are GitLab predefined variables.

The image to deploy is $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_NAME, which is exactly the image we pushed at the build stage using docker push.
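
The two deploy jobs differ only in their environment name and ANSIBLE_SUBSET. One possible way to reduce the duplication is GitLab's extends keyword. This is a sketch under assumptions, not what we run: the .deploy_job name is hypothetical, and the script is a placeholder for the ansible-playbook command of the .deploy script.

```yaml
# Hypothetical hidden job holding everything the deploy jobs share.
.deploy_job:
  stage: deploy
  when: manual
  only:
    - tags
  tags:
    - deployment
  variables:
    GIT_STRATEGY: none
    ANSIBLE_INVENTORY: prod
  script:
    # Placeholder for the ansible-playbook command shown above.
    - echo "run ansible-playbook with $ANSIBLE_INVENTORY / $ANSIBLE_SUBSET"

Deploy to production:
  extends: .deploy_job
  environment:
    name: production
  variables:
    ANSIBLE_SUBSET: all

Deploy to TH2:
  extends: .deploy_job
  environment:
    name: production-th2
  variables:
    ANSIBLE_SUBSET: th2
```

With extends, hashes such as variables are merged key by key, so each job only adds its own ANSIBLE_SUBSET on top of the shared values.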

Ansible

The playbook is quite simple:

- name: Deploy API
  hosts:
    - docker-swarm-managers
  roles:
    - callr/ci_api

First, we template our stack file to all manager nodes:

- name: Copy stack file
  template:
    src: stack/swarm-stack-api.yml
    dest: /opt/swarm-stack-api.yml
    mode: "0444"
  register: stack_file
  tags: stack

The interesting parts of the stack file are:

version: "3.7"
services:
  backend:
    image: "{{ API_IMAGE }}"
    deploy:
      mode: replicated
      replicas: 8

Notice how we are using the API_IMAGE variable passed from GitLab to Ansible. This is how a Git tag becomes a Docker image and then a Docker Stack service image.

Then, we task Ansible to authenticate with the GitLab Container Registry:

- name: Docker login
  docker_login:
    registry_url: "{{ CI_REGISTRY }}"
    username: "{{ CI_REGISTRY_USER }}"
    password: "{{ CI_REGISTRY_PASSWORD }}"

To authenticate, we use the GitLab CI/CD predefined variables, specifically the CI_REGISTRY_USER and CI_REGISTRY_PASSWORD. Those are only valid for the job.

Then, we can deploy our updated stack:

- name: Deploy stack
  docker_stack:
    state: present
    name: api
    prune: yes
    with_registry_auth: yes
    compose:
      - /opt/swarm-stack-api.yml
  run_once: yes
  when: stack_file.changed

If the stack file has changed, we run the docker_stack module, with:

prune: yes to remove the services not used anymore,
with_registry_auth: yes, because we want to send the registry authentication details to swarm agents,
run_once: yes because the stack deployment needs to happen on one manager node only.

We finish with a Slack notification, using the what/whom variables:


- name: Send slack notification
  slack:
    token: "{{ slack_token }}"
    channel: "#ops-prod"
    attachments:
      - text: "API deployed from GitLab CI\n"
        color: "#39932A"
        fields:
          - title: "Environment"
            value: "{{ CI_ENVIRONMENT_NAME }}"
            short: yes
          - title: "Git tag"
            value: "{{ CI_COMMIT_REF_NAME }}"
            short: yes
          - title: "Hosts"
            value: "{{ ansible_play_hosts|join(', ') }}"
            short: false
          - title: "Git commit hash"
            value: "{{ CI_COMMIT_SHA }}"
            short: false
          - title: "Deploying user"
            value: "{{ GITLAB_USER_NAME }} (login:{{ GITLAB_USER_LOGIN }})"
            short: yes
          - title: "CI Runner"
            value: "{{ CI_RUNNER_DESCRIPTION }}"
            short: yes
          - title: "Last commit message"
            value: "{{ CI_COMMIT_MESSAGE }}"
            short: false
          - title: "Pipeline"
            value: "{{ CI_PROJECT_URL }}/pipelines/{{ CI_PIPELINE_ID }}"
            short: false
  delegate_to: localhost
  run_once: yes
  when: stack_file.changed
  ignore_errors: yes

At this point, we have reached the end of the CI job, and the end of the “deploy” stage of GitLab CI.

                                             we are here
                                             -----------
                                                  ˅

                build stage        deploy stage           swarm post deployment
            ┌─────────────────┐ ┌────────────────┐ ┌─────────────────────────────────┐ 

┌────────┐   ┌───────────────┐   ┌──────────────┐   ┌──────────────┐ ┌─> worker
│ GitLab │ ► │ GitLab Runner │ ► │   Ansible    │ ► │ Docker Swarm │ ──> worker
└────────┘   └───────────────┘   └──────────────┘   └──────────────┘ └─> worker(s)...
 private        build, push        stack deploy         manager
 instance     docker executor     shell executor

Docker Swarm

However, the deployment is not done yet. We just gave the order to Swarm, but the deployment itself will take some time, depending on your deployment strategy.
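
As an illustration of what a deployment strategy looks like in a stack file, here is a sketch that adds an update_config section to the backend service. The values are assumptions chosen for the example, not our production settings.

```yaml
version: "3.7"
services:
  backend:
    image: "{{ API_IMAGE }}"
    deploy:
      mode: replicated
      replicas: 8
      update_config:
        # Update two tasks at a time, waiting 10s between batches.
        parallelism: 2
        delay: 10s
        # Start the new task before stopping the old one.
        order: start-first
        # Roll back to the previous spec if the update fails.
        failure_action: rollback
```

With 8 replicas and these values, a rollout proceeds in 4 batches, so the total duration depends mostly on how long each new task takes to start.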

The docker_stack module returns immediately: it does not wait for services to converge. It can, however, return the stack diff.
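
The stack diff can be captured with register and printed in the job log. The sketch below assumes the stack_spec_diff return value documented for the docker_stack module; the stack_result name is hypothetical, and the return value is worth checking against the module version you have installed.

```yaml
- name: Deploy stack
  docker_stack:
    state: present
    name: api
    prune: yes
    with_registry_auth: yes
    compose:
      - /opt/swarm-stack-api.yml
  # Keep the module result so the diff can be inspected afterwards.
  register: stack_result
  run_once: yes
  when: stack_file.changed

- name: Show what changed in the service specs
  debug:
    var: stack_result.stack_spec_diff
  run_once: yes
  when: stack_file.changed
```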

One important note: remember that we authenticated to the GitLab Container Registry with the job credentials (CI_REGISTRY_USER and CI_REGISTRY_PASSWORD), which are only valid for 5 minutes after the job is done. If the Swarm deployment takes longer than 5 minutes, it may fail, because nodes will no longer be able to fetch the image from the registry. There are several ways around this:

Use a deploy token
Use a personal access token
Extend the default token expiration timeout: Admin area > Settings > CI/CD > Container Registry > Authorization token duration (minutes)
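
For the deploy token option, a login task could look like the sketch below. It relies on several assumptions: a deploy token named gitlab-deploy-token with the read_registry scope exists on the project (GitLab then exposes CI_DEPLOY_USER and CI_DEPLOY_PASSWORD to jobs), and sudo -E really preserves the job environment so that the env lookup can read them on the runner.

```yaml
- name: Docker login with a deploy token (sketch)
  docker_login:
    registry_url: "{{ CI_REGISTRY }}"
    # Read from the runner's environment instead of -e arguments.
    username: "{{ lookup('env', 'CI_DEPLOY_USER') }}"
    password: "{{ lookup('env', 'CI_DEPLOY_PASSWORD') }}"
  # Keep the credentials out of the job log.
  no_log: true
```

Because a deploy token does not expire with the job, nodes can still pull the image when a rollout takes longer than the job token lifetime.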

Final Notes

We have been using this workflow for 2 years now, and it has been working great. We are still considering using docker_swarm_service to have better control, but we like having our dedicated stack file.

We will probably give Nomad a try in the coming months, and see how it compares to our Swarm cluster. We like simple things, so it might be a match. Time will tell!