On-Premise Kubernetes Self-Hosted CI/CD Using Jenkins and Argo CD

Our CI/CD Journey

One of the best things I did at work was adopting a fully automated CI pipeline using Jenkins about five years ago. Prior to that, all of our builds and deployments were done by hand. After manually doing builds for 15 years, I was over it! A build of our primary product could take up to two hours, was tedious, and it was possible to miss a step and make a mistake. This long build cycle also discouraged us from making builds very often, which meant each build that was made had more and more changes in it. That in turn took longer to QA, during which time more new changes ended up going into the next build, leading to a drawn-out release cycle. The worst release took about a year and a half!

Shortly after the start of a new web-service-based project, composed of multiple web services hosted across several different IIS servers, I decided it was time to stop doing manual builds and deploys. My primary guide was the excellent Continuous Delivery book by Jez Humble and David Farley. That led me down the path of staying close to the master branch and automating the build, test, and deployment cycle. Let the computer do all the tedious work. That’s what computers are for!

After loading Jenkins on my own PC I started the journey toward automating the entire build/test/deployment pipeline. Later we built a dedicated Jenkins server on Windows to do our builds. Eventually this was extended to build each new or changed pull request and deploy it to a separate URL for QA to test against before merging to master. That project’s Jenkins pipeline is still in use today, having done over 5000 master branch builds and 4000 pull request builds/preview deployments. It lets us see new changes on our dev servers within 20 minutes of being pushed to Git and has saved us a huge amount of time compared to doing that work by hand!

The Move to Kubernetes

As more of our new projects moved off Windows and onto Linux, we lost our ability to do PR/preview deployments because that code leveraged some Windows/IIS features. Over time I started to consider using Kubernetes to host our applications. The draw was to get all of an app’s external dependencies bundled up by the developer in the Docker image instead of IT having to craft a VM to match the developer’s requirements. I also wanted to simplify our disaster recovery (DR) scenario, which currently requires us to maintain duplicate VMs and deployments in a remote data center. Finally, I thought that Kubernetes namespaces might easily allow us to get PR/preview environments for Linux up and running.

However, getting Kubernetes up and running was painful for us. All of our code and apps are hosted internally on servers in our own data center, so we had to create our own Kubernetes cluster on premises. This would go on to cause some pain points, as our servers are on internal networks and usually don’t have TLS certificates (or if they do, they are self-signed). Also, we use self-hosted Bitbucket Server for our Git repositories.

After lots of trial and error manually loading Kubernetes, I finally settled on using Kubespray. Kubespray is essentially a set of Ansible playbooks that bootstrap a working Kubernetes cluster from fresh Linux nodes. With it we can create a five-node cluster with a single command in about 15 minutes! Easy, reproducible cluster creation is a good thing, especially since we know we will have multiple clusters running in different locations.
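For reference, a Kubespray cluster build boils down to something like the following. This is only a sketch: the inventory path and node list are placeholders, so check the Kubespray docs for the release you are using.

# Grab Kubespray and describe the nodes in an Ansible inventory
git clone https://github.com/kubernetes-sigs/kubespray.git
cd kubespray
cp -r inventory/sample inventory/mycluster
# edit inventory/mycluster/inventory.ini to list the IPs/roles of the five nodes

# One command later there is a working cluster
ansible-playbook -i inventory/mycluster/inventory.ini --become --become-user=root cluster.yml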

Once the cluster was running, I installed Helm, followed by the NGINX Ingress Helm chart, Rook with Ceph so our pods have persistent volumes available, and the Fluent Bit Helm chart to ship container logs to our Elasticsearch logging server.
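Roughly speaking, and assuming Helm 3 with today’s upstream chart locations (adjust the repos, chart names, and values for your own environment), those installs look like this:

# NGINX ingress controller
kubectl create namespace ingress-nginx
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install ingress-nginx ingress-nginx/ingress-nginx -n ingress-nginx

# Rook operator (the Ceph cluster itself is defined separately with a CephCluster resource)
kubectl create namespace rook-ceph
helm repo add rook-release https://charts.rook.io/release
helm install rook-ceph rook-release/rook-ceph -n rook-ceph

# Fluent Bit, with its output pointed at Elasticsearch via chart values
kubectl create namespace logging
helm repo add fluent https://fluent.github.io/helm-charts
helm install fluent-bit fluent/fluent-bit -n logging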

Beginning Our Kubernetes CI/CD Journey

Based on our prior work, I knew I still wanted a completely scripted CI/CD pipeline. However, Kubernetes is a different beast that would require us to change how we build and deploy our projects. One goal of our new Kubernetes-based deploys was to get the deployment configuration into Git instead of having it live mostly in Jenkins shell command lines or manually run kubectl commands that we would never be able to reproduce across clusters.

Jenkins

Since I already knew Jenkins, I started by using Jenkins with the Kubernetes and Docker plugins. I ended up configuring a new Jenkins server running on a Linux host with Docker. I decided to build our own custom image, using the Docker Hub Jenkins image as a base and adding the docker and kubectl command line utilities, and push it to our internal Nexus package repository. I then ran that custom Jenkins image with access to the host’s Docker domain socket so the Jenkins container can run Docker builds.
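The custom image amounts to something like this sketch; pick whatever docker and kubectl versions match your environment, and note the registry address is the same internal Nexus host used in the run command below:

# Hypothetical Dockerfile: stock Jenkins plus the docker and kubectl CLIs
cat > Dockerfile <<'EOF'
FROM jenkins/jenkins:lts
USER root
# Static docker client binary (the host's Docker daemon does the actual building)
RUN curl -fsSL https://download.docker.com/linux/static/stable/x86_64/docker-19.03.9.tgz \
      | tar -xzf - --strip-components=1 -C /usr/local/bin docker/docker
# kubectl for applying manifests to the cluster
RUN curl -fsSL -o /usr/local/bin/kubectl https://dl.k8s.io/release/v1.17.0/bin/linux/amd64/kubectl \
      && chmod +x /usr/local/bin/kubectl
USER jenkins
EOF

docker build -t xxx.xxx.xxx.xxx:8083/jenkins-docker-kube:2.207 .
docker push xxx.xxx.xxx.xxx:8083/jenkins-docker-kube:2.207

With that image pushed to Nexus, the Jenkins container itself is started with the Docker socket and a Jenkins home directory mounted from the host: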

docker run -p 80:8080 -p 50000:50000 --restart=unless-stopped -v /var/run/docker.sock:/var/run/docker.sock -v /var/jenkins_home:/var/jenkins_home -e JAVA_OPTS="-Duser.timezone=America/Los_Angeles" -d --name jenkins xxx.xxx.xxx.xxx:8083/jenkins-docker-kube:2.207

This was our first attempt at Kubernetes, so our general knowledge level was still pretty low at the time. I started with the Jenkinsfile scripting I already knew to build the image from our Dockerfile and push it:

withCredentials([usernamePassword(credentialsId: credentialsId,
    usernameVariable: 'USERNAME', passwordVariable: 'PASSWORD')]) {
  // Workaround for docker credentials
  docker.withRegistry(dockerReg, credentialsId) {
    def customImage = docker.build(
        "${dockerImageName}:${dockerTag}",
        "-f ${dockerFilename} .")
    // Workaround for docker credentials
    sh "docker login -u ${USERNAME} -p ${PASSWORD} ${dockerReg}"
    customImage.push()
  }
}

Then the script ran some kubectl command lines to apply the appropriate image to our dev Kubernetes cluster. But this required changing image tag numbers in the YAML files, which was painful. And I still didn’t have our PR/preview environments since the script was only deploying one project at a time. Many of our projects either call other projects’ services or are called by them. I didn’t have an easy way to get a preview PR container used while still maintaining a normal dev environment running the latest set of master images (thereby matching production).
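For illustration, that deploy step amounted to little more than the following; the image name, version prefix, and file paths here are placeholders:

# Bump the image tag in the deployment YAML, then apply it to the dev cluster
sed -i "s|image: nexus.example.com/my-service:.*|image: nexus.example.com/my-service:1.0.${BUILD_ID}|" k8s/deployment.yaml
kubectl --context dev-cluster apply -n dev -f k8s/deployment.yaml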

Jenkins X

Jenkins X is a highly opinionated, Kubernetes-specific version of Jenkins that promises to do just about everything I wanted. It has tooling to bring an existing non-Kubernetes project into Kubernetes, it will build projects in Kubernetes, and it will create preview environments from PRs automatically! It does GitOps, so your various staging and production environment configurations are stored in separate Git repos! Sounds perfect, right?

Everything in Jenkins X is controlled by its jx command line interface. Even the installation of Jenkins X into Kubernetes is performed by the jx install command. However, figuring out the correct options for our environment took some trial and error. In my opinion Jenkins X has tried to do much more than it should in an attempt to provide an out-of-the-box experience. It will even create a Kubernetes cluster from scratch for you, assuming you are using a major cloud provider like GKE. It will also spin up a copy of Nexus as an internal package repo for you. It will add Git hooks into your specific Git repo of choice. All of those things sound great for an easy out-of-the-box experience; however, when your setup is an on-premise Kubernetes cluster using a locally hosted Bitbucket Server without HTTPS, you start running into problems. This type of setup is not their “normal” happy-path use case of GitHub and GKE. As a result, documentation on how to configure Jenkins X for this type of setup is scattered.

After lots of trial and error I did get Jenkins X up and running, building a couple of projects into preview environments successfully. It worked pretty well. However, each preview environment still deployed just the one project, though I could use some custom kubectl command lines to insert the preview build into other dependent projects’ environments.

Then about a month later I updated Jenkins X to the latest build, and it broke our preview environments. After tons of searching on GitHub and Google I found nothing. Looking on their Slack channel, I found one other person with the same problem. The only help offered was a suggestion to move to GitHub instead of Bitbucket. Not helpful. I looked through the Jenkins X code but couldn’t find any Bitbucket-specific changes that would cause the problem. The best I can figure is that something somewhere in the code takes the Git URL for our server, breaks it apart, and then at some point reassembles it assuming https:// belongs on the front. That works great for GitHub, not so much for an HTTP-based local Bitbucket Server. Sure, I could change our server to use HTTPS with a self-signed cert and maybe add a CA to our machines, but I wasn’t interested in making a ton of changes to our existing working environment just to make this tool work. I rolled back to the prior Jenkins X version and that fixed the preview environments again. But being stuck on an old version is not how I want to run long term. Also, their automated PR cleanup Kubernetes cron jobs were constantly failing on my cluster and leaving a bunch of tasks behind in the cluster.

Another problem with Jenkins X is that eventually I would like to get our Windows IIS-based services running under Windows Kubernetes (if I can ever get Windows working reliably in Kubernetes). However, the entire build/deploy system for a Jenkins X project is based on custom Jenkins X images, which are Linux only. They don’t support Windows, and because their builds are done using their own base images, that effectively prevents us from using Jenkins X for Windows. Or I would have to wait for them to add Windows support, which might never happen.

After more research on their Slack channel I saw lots of recommendations for installing Jenkins X using jx boot, a newer installation method that doesn’t try to do all the cluster configuration work and focuses just on the CI/CD parts. I tried that, but at some point I ended up with similar Jenkins X errors as it tried to use HTTPS instead of the HTTP URLs I specified.

From my experience, if you follow the happy path of GitHub and GKE, Jenkins X might work great for you. But the pain and random problems I encountered in an on-premise Bitbucket Server environment were not encouraging.

Argo CD

Next I tried Argo CD, which takes a much different approach than Jenkins X. Argo CD definitely doesn’t try to do everything. Per the “CD” in its name, it focuses only on the continuous delivery portion of a Kubernetes deployment pipeline. It will not build your project for you, but it will make sure your Docker images get where they need to be. It subscribes to a GitOps pattern where your entire Kubernetes deployment environment is specified via declarative files in a Git repo, which becomes your single source of truth.

Argo CD is also very flexible regarding how you define your environment. You just point it to a directory in your Git repo that contains Helm charts, Ksonnet or Kustomize definitions, or standard Kubernetes YAML files, and it will make sure those get applied to your Kubernetes cluster. You can have the definitions applied to the Kubernetes cluster Argo CD is running in, or you can point it to an external Kubernetes cluster.

Argo CD is easy to install into any Kubernetes cluster. It keeps very little state of its own, and what it has is stored in Kubernetes itself (etcd), so it doesn’t require persistent volumes, which is nice. It also appears to use a standard Git client for pulling the declarative configuration out of Git, so it doesn’t matter what Git server you use; as long as you have a valid Git URL and credentials you are good to go. You can then download the Argo CD command line interface executable so you can manipulate the Argo CD configuration from the command line.

To get Argo CD up and running you just need to create a Kubernetes namespace and then download and apply a Kubernetes YAML file using kubectl, and you are ready to start configuring Argo CD for your project. Configuring a project environment comes down to telling Argo CD a name for the project; the Git repo, branch, and directory to get configuration from; and the Kubernetes cluster and namespace to apply that configuration to.
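For reference, the install plus a basic app definition look roughly like this; the repo URL, credentials, paths, and names here are placeholders for our internal values:

# Install Argo CD into its own namespace
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# Register the Bitbucket Server repo holding the declarative config (after argocd login)
argocd repo add http://bitbucket.example.com/scm/ops/gitops-config.git --username build-user --password "${BITBUCKET_TOKEN}"

# Point an Argo CD app at a directory in that repo and a target cluster/namespace
argocd app create my-service-dev \
  --repo http://bitbucket.example.com/scm/ops/gitops-config.git \
  --revision master \
  --path my-service \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace my-service-dev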

CI/CD Pipeline Using Normal Jenkins and Argo CD

So now it is time to tie all of this together and create a real CI/CD pipeline with preview environments. I am using a standard dockerized Jenkins server to do the Docker image builds and push them to an internal Docker image repo. Since we use Bitbucket, I installed the Bitbucket Branch Source plugin, and for each project I configured a Jenkins multibranch pipeline with the branch source pointing at our Bitbucket server and a branch name filter of master|PR-.*. This causes Jenkins to build on every master branch change and every new or changed pull request. Master branch builds get pushed up as a Docker image named project-name:version.BUILD_ID, and PR builds get pushed up as a Docker image named project-name:PR-#.BUILD_ID.
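In shell terms the tagging scheme works out to roughly this; the registry, project name, and version prefix are placeholders, while BRANCH_NAME and BUILD_ID come from Jenkins:

# BRANCH_NAME is either "master" or "PR-<number>" thanks to the branch filter
if [ "${BRANCH_NAME}" = "master" ]; then
  TAG="${VERSION}.${BUILD_ID}"        # e.g. 1.4.1057
else
  TAG="${BRANCH_NAME}.${BUILD_ID}"    # e.g. PR-42.17
fi
docker build -t "nexus.example.com/project-name:${TAG}" .
docker push "nexus.example.com/project-name:${TAG}"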

In order to automate preview environments for pull requests, I broke the Jenkinsfile into two stages. The first stage performs the Docker build and push, and the second stage deploys to Kubernetes using our separate GitOps repo. This requires one Jenkins pipeline to access two different Git repositories, so at the top of the Jenkinsfile we tell Jenkins not to check out the project Git repo by default using options { skipDefaultCheckout true }. Then the first stage’s steps are wrapped in a dir('build') {} block and the second stage’s in a dir('deploy') {} block to keep the two Git repositories separate. In the first stage we do a normal checkout scm to get the configured project repo. In the second stage we do a checkout with explicit parameters to check out the GitOps repo.

In that second Kubernetes deployment stage, the Jenkinsfile calls a custom shell script that first creates a new Kubernetes namespace named project-name-PR-# so we have a unique namespace per project/PR to deploy the application into. Then, in the GitOps repo, we make a new branch based on the project/PR name and use awk to modify the Docker image tag in the project’s YAML so the tag value is formatted as PR-#.BUILD_ID and matches the just-built Docker image tag. We also modify the Ingress domain name to include a subdomain containing the project/PR name (leveraging wildcard DNS entries pointing to our NGINX Ingress controller). The script then pushes that branch and change to the Git server. Finally, we call the Argo CD command line interface to create a new app, named after the project/PR namespace, pointing at the GitOps branch we just pushed. This causes the definition in that GitOps branch to get deployed to the project/PR namespace we created. At that point we have a copy of the service from our PR code branch deployed and running in Kubernetes under a unique Ingress DNS name we can manually call!
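Stripped down, that deploy script does something along these lines; this is a simplified sketch with placeholder names, hosts, and environment variables:

#!/bin/bash
# Run inside the checked-out GitOps repo (the dir('deploy') checkout above).
# PROJECT, BRANCH_NAME (e.g. PR-42), and BUILD_ID are provided by the Jenkins pipeline.
PR_NAME=$(echo "${PROJECT}-${BRANCH_NAME}" | tr '[:upper:]' '[:lower:]')  # namespace names must be lowercase
IMAGE_TAG="${BRANCH_NAME}.${BUILD_ID}"                                    # matches the image pushed in stage one

# 1. A namespace per pull request
kubectl create namespace "${PR_NAME}" || true

# 2. Branch the GitOps repo, then point it at the PR image and a PR-specific ingress host
git checkout -b "${PR_NAME}"
awk -v tag="${IMAGE_TAG}" '/^ *tag:/ { sub(/:.*/, ": " tag) } { print }' values.yaml > values.tmp && mv values.tmp values.yaml
sed -i "s/dev.example.com/${PR_NAME}.dev.example.com/" values.yaml
git commit -am "Preview environment for ${PR_NAME}"
git push -u origin "${PR_NAME}"

# 3. Tell Argo CD to deploy that branch into the PR namespace
argocd app create "${PR_NAME}" \
  --repo "${GITOPS_REPO_URL}" --revision "${PR_NAME}" --path . \
  --dest-server https://kubernetes.default.svc --dest-namespace "${PR_NAME}" \
  --sync-policy automated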

In our case, our overall application is made up of many different web services. Many of these web services call other web services, and sometimes a PR changes code in a web service that is only called by other web services. So, to call a web service that calls a changed web service and test the new behavior, we need to make sure that first web service routes to the new backend web service. In Kubernetes we could manually change service definitions to point to different namespaces, but to keep things simple we ended up creating an umbrella Helm chart that deploys each project’s Helm chart into the new namespace. That way each PR namespace has its own copy of every web service (our entire application) it might need to use.
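As a sketch, and assuming Helm 3 (where subchart dependencies live in Chart.yaml; with Helm 2 they would go in requirements.yaml), the umbrella chart is little more than a list of dependencies. The chart names here are placeholders:

# Chart.yaml of the hypothetical umbrella chart, with each service chart as a sibling directory
cat > umbrella/Chart.yaml <<'EOF'
apiVersion: v2
name: our-application
version: 0.1.0
dependencies:
  - name: web-service-a
    version: "0.1.0"
    repository: "file://../web-service-a"
  - name: web-service-b
    version: "0.1.0"
    repository: "file://../web-service-b"
EOF
helm dependency update umbrella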

The other problem is that currently only a couple of our web services are capable of being deployed and run on Kubernetes. To make the remaining services reachable from the services we are deploying to Kubernetes, I created a special “external” Helm chart that contains a set of Kubernetes Ingress definitions and ExternalName Services pointing at our existing non-Kubernetes development environment services. Then, as time goes on, we can move these external web services into Kubernetes at our own pace.
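Each entry in that chart renders to something like the following (the service and host names are placeholders); the ExternalName type makes in-cluster DNS lookups for the service resolve, via a CNAME, to the existing dev host:

# An ExternalName service that forwards callers to a service still running outside Kubernetes
cat <<'EOF' | kubectl apply -n my-service-pr-42 -f -
apiVersion: v1
kind: Service
metadata:
  name: legacy-billing-service
spec:
  type: ExternalName
  externalName: billing.dev.internal.example.com
EOF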

Then for production I just need a separate Git repo (or branch) with similar configuration in it. The only difference is that this configuration is specific to our production environment: production DNS names, replicaCount values, external service references, and so on. We just need to create a new Argo CD app pointing to this repo, our umbrella chart, and the production Kubernetes cluster. Then anytime we want to upgrade production, we just edit the Git repo, either through an additional prompted “Production Deploy” stage in the Jenkinsfile or by editing the repo by hand.
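Wiring up production is then one more Argo CD app along these lines. This is a sketch: the repo, path, and cluster address are placeholders, and the production cluster would first be registered with argocd cluster add:

argocd app create our-application-prod \
  --repo http://bitbucket.example.com/scm/ops/gitops-prod.git \
  --revision master \
  --path umbrella \
  --dest-server https://prod-cluster.example.com:6443 \
  --dest-namespace our-application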

What’s Next?

One thing we already know we need to address is cleaning up old preview environments. Currently, once we are done with a PR preview environment, we just manually delete the project/PR Kubernetes namespace, and all the deployments, services, ingresses, etc. get completely cleaned up for us by Kubernetes. At some point we will look at automating this piece.
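Today that cleanup is essentially the following (placeholder name); removing the matching Argo CD app record is an extra step I am assuming we would script alongside it:

# Tear down a finished preview environment
argocd app delete my-service-pr-42      # assumed companion step: drop the Argo CD app record
kubectl delete namespace my-service-pr-42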

The above setup appears to be doing pretty much everything we were looking for in a CI/CD system. We have only just started using it, so I am sure we will find other things we need to do differently as we move forward. The nice thing, though, is that we have very little code in each project that is specific to Jenkins or Argo CD. Everything these pipelines are doing we could do manually with a handful of command lines each time if we needed to.

Also, our applications aren’t dependent on any of these tools, giving us flexibility to change in the future. Our main deployment-related dependencies rely on standard Kubernetes and Helm YAML, not Jenkins- or Argo CD-specific code or scripts; those tools are just automating the process. Our application code and Docker images are not tied to any of this tooling or infrastructure, which leaves us plenty of flexibility since we never know what the future holds!