Building a CI/CD pipeline with Talend and Azure DevOps
DevOps is all the rage right now, and it is only the beginning.
In this blog, I’ll cover how to get started with Talend continuous integration, delivery and deployment (CI/CD) on Azure. The first part of the blog will briefly present some basic DevOps and CI/CD concepts. I will then show you the Talend CI/CD architecture and how it fits into the Azure ecosystem, with a hands-on example.
What is DevOps?
Before digging into the technical details, let me briefly explain what the term DevOps refers to. Basically, it’s a methodology that companies embrace to bring together software development (building the applications) and operations (deploying them to production), hence the term Dev + Ops. Typically, a DevOps team functions as the bridge between development and operations. It combines the needs of both teams in order to be more efficient. Through this philosophy, the entire lifecycle is controlled from design to production thanks to continuous integration, delivery and deployment. Adopting a DevOps strategy brings a wide range of benefits:
- Better agility -> fit better with the expectations
- Faster deployments -> better time to market, early feedback
- More automation -> fewer human errors
- Better testing and security -> more reliability
Figure 1: Software Development Lifecycle
How Does Talend CI/CD work?
That being said, let’s look at how Talend fits into the DevOps world.
Figure 2: Talend Continuous Integration, Continuous Delivery and Continuous Deployment
Everything starts with designing jobs and unit tests in Talend Studio. These jobs are kept under source control in a versioning system such as Git. Source control is very important in CI/CD as it guarantees effective collaboration and reproducibility in our continuous integration process.
The Maven builds follow the design portion with the help of the Talend CommandLine and CI Builder. Depending on the build type, it creates a Talend Artifact (basically jars and execution scripts) or a Docker image. The latter is a packaged version of the Talend artifact in a container image using Docker. The aggregation and automation of the two previous steps is called continuous integration.
Adding a publication step of any sort turns this into continuous delivery. Publishing means that we have a place to store and distribute our artifacts or Docker images. In the case of Talend artifact builds, you can publish to artifact repositories (such as Nexus or Artifactory) or to Talend Cloud. For Docker images, we use what are called Docker registries. The goal remains the same in both cases: these locations are used to distribute and deploy our jobs.
The continuous deployment part depends on where the jobs are published. If an artifact repository is used, you are most likely on-premises and must use a JobServer to execute your jobs. In the case of Talend Cloud, you can use your own Remote Engine or take advantage of Talend Cloud Engines. Containers can run anywhere Docker is available: on a standalone machine with a Docker daemon, or on container platforms and cloud provider container services (AWS Fargate, AWS ECS, Azure ACI, Kubernetes, OpenShift, etc.).
Talend CI/CD pipeline on Azure DevOps
Now that we have the basics, let’s get our hands dirty and build our first pipeline on Azure DevOps. In this blog post, we are only going to outline the continuous delivery of containerized jobs in Azure. The deployment part would necessitate another full article as there are several possibilities to tackle it.
Requirements
You should already be familiar with standard Talend project management through Talend Management Console and Talend Studio. Please refer to the Talend help documentation. You can also read this blog post where the project configuration is detailed.
- Azure DevOps Services
- Talend Platform license
- Talend 7.1.1
- Azure Repos for versioning your projects (or GitHub, the following applies to both) already set up in Talend Management Console
- Talend Cloud to manage your projects
Azure DevOps
Azure DevOps is a set of tools to manage CI/CD pipelines. According to the Azure DevOps product page, it comprises:
- Azure Boards: Deliver value to your users faster using proven agile tools to plan, track, and discuss work across your teams.
- Azure Pipelines: Build, test, and deploy with CI/CD that works with any language, platform, and cloud. Connect to GitHub or any other Git provider and deploy continuously.
- Azure Repos: Get unlimited, cloud-hosted private Git repos and collaborate to build better code with pull requests and advanced file management.
- Azure Test Plans: Test and ship with confidence using manual and exploratory testing tools.
- Azure Artifacts: Create, host, and share packages with your team, and add artifacts to your CI/CD pipelines with a single click.
As you can see, the scope is very large, from project management to Git or artifact hosting. However, Azure Pipelines is the one managing the whole CI/CD pipeline.
Talend CI/CD for containers with Azure DevOps
Figure 3: Talend CI/CD with Azure DevOps
In this example, we will focus on building container images in the Azure ecosystem. The diagram above shows what Talend CI/CD for containers looks like in Azure DevOps. We are going to use the in-house Git service called Azure Repos. You can use GitHub as well; both are well integrated with Azure Pipelines. Speaking of Azure Pipelines, it’s an equivalent of the well-known Jenkins, if you are more familiar with that tool. The goal here is to continuously build our jobs into Docker container images and push them to Azure Container Registry. To learn more about container registry authentication, please refer to my previous blog post.
Azure Pipelines allows you to manage a CI/CD pipeline, but it needs build agents to actually perform the builds. There are many Microsoft-hosted native agents such as a Maven agent (on-demand), but Talend builds need external components like the Talend CommandLine or a Docker daemon. That is why we need a custom build agent to run our builds successfully. Such an agent is called a self-hosted agent.
Let’s start:
1. Prepare permissions
Please follow this Azure documentation to prepare the permissions to set up a self-hosted agent. It shows you how to create a personal access token (PAT) that will allow you to connect your self-hosted agent to your Azure DevOps account.
2. Create the virtual machine (self-hosted agent)
Figure 4: Azure Virtual Machine settings
In your Azure portal, start by creating a virtual machine with CentOS 7.5. Once it has launched successfully, make sure you complete the following steps:
- Install OpenJDK8 and Maven 3
- Install Docker as a non-root user
- Install Git
- Pull OpenJDK docker image: docker pull openjdk:8-jre-slim
- Install the Talend CommandLine:
- Download Talend Studio 7.1.1 and unzip it on your machine (this will be your CommandLine folder)
- Copy your license to the root of your CommandLine folder
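Before moving on, it can help to confirm that the tools listed above are actually reachable from the account the agent will run under. The following is a minimal sketch and not part of the original setup: the helper name missing_tools is hypothetical, and it only checks that each command is on the PATH, not which version is installed.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every listed tool that is NOT on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Check the tools this guide relies on: Java, Maven, Docker and Git.
missing=$(missing_tools java mvn docker git)
if [ -n "$missing" ]; then
    # Left unquoted on purpose so the names print on a single line.
    echo "Missing tools:" $missing
else
    echo "All build prerequisites found"
fi
```

Running it again after each installation step gives a quick view of what is still missing on the virtual machine.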
3. Configure the Nexus for third-party libraries
If you plan to build jobs using third-party libraries, you will have to set up an artifact repository such as Nexus. It will host all the JAR files needed to complete the build of these jobs.
- Install Nexus 3
- Install a Nexus instance wherever it makes sense for you, either on the same machine or a different one.
- Create at least these three new repositories
- talend-custom-libs-release (maven2, hosted, release version policy and permissive)
- talend-custom-libs-snapshot (maven2, hosted, snapshot version policy and permissive)
- release (maven2, hosted, release version policy and permissive)
- Configure it in Talend Management Console:
Figure 5: Talend Management Console Nexus setup
Once Nexus is fully configured, restart your Studio. At startup, you should see the third-party libraries used in your project being uploaded to the Nexus server:
Figure 6: Talend Studio with Third-Party libraries being uploaded
You can check that these libraries are available in your Nexus web interface:
Figure 7: Nexus with Third-Party libraries
4. Install the Azure agent on the virtual machine
Let’s come back to your self-hosted agent. Now that all the requirements on this virtual machine are set up, we can bind the machine to what we call an agent pool. An agent pool is one or more agents that will execute your pipeline in Azure Pipelines. To configure your self-hosted agent, go to your Azure DevOps Organization Settings. Then follow the Azure documentation.
Figure 8: Azure DevOps Self-Hosted agent installation
Configuration on the virtual machine should look like this:
Figure 9: Azure DevOps Self-Hosted agent configuration
If dependencies are missing when you configure the Azure agent, the agent package includes a script to install them:
Run: “sudo ./bin/installdependencies.sh” in the agent folder
Once you have configured your agent, you can run it with the run.sh script (you can also register it with systemd or another service manager to launch it at startup).
Once launched, you can see it Online in your Agent Pools in Azure DevOps:
Figure 10: Azure DevOps Self-Hosted agent online
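To launch the agent at startup, the Linux agent package provides an svc.sh script (generated once config.sh has run) that registers the agent as a systemd service. The following is a minimal sketch, assuming your user is allowed to use sudo. The wrapper function install_agent_service and the folder in the usage line are hypothetical and only there for illustration.

```shell
#!/bin/sh
# Illustrative wrapper: register the configured agent as a systemd service and start it.
# svc.sh only exists once config.sh has completed, so check for it first.
install_agent_service() {
    agent_dir="$1"
    if [ ! -x "$agent_dir/svc.sh" ]; then
        echo "svc.sh not found in $agent_dir; run config.sh there first" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    (
        cd "$agent_dir" || exit 1
        sudo ./svc.sh install && sudo ./svc.sh start
    )
}

# Usage (the path is an assumption, adjust it to your agent folder):
#   install_agent_service "$HOME/myagent"
```

With the service installed, there is no need to keep a terminal open with run.sh, and the agent should come back online after a reboot of the virtual machine.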
Finally, connect to your self-hosted agent over SSH again and pre-configure your CommandLine:
- Run the Talend CommandLine once with “./commandline-linux.sh” to initialize the configuration, then exit it.
- Then modify the file “commandline-linux.sh” and replace the command with this:
- ./Talend-Studio-linux-gtk-x86_64 -nosplash -application org.talend.commandline.CommandLine -consoleLog -data workspace startServer -p 8002
- Replace the content of YOUR_PATH/configuration/maven_user_settings.xml with this file. (Please adapt the Talend CommandLine path to your own installation.)
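The launcher replacement described above can also be scripted. The sketch below is one possible way to do it, assuming a POSIX shell. The function name write_commandline_script is hypothetical, and the command written to the file is exactly the one quoted in the list above. A .bak copy of the existing launcher is kept.

```shell
#!/bin/sh
# Hypothetical helper: overwrite a CommandLine launcher with the server-mode command.
write_commandline_script() {
    target="$1"
    # Keep a backup of the launcher shipped with Talend Studio, if there is one.
    if [ -f "$target" ]; then
        cp "$target" "$target.bak"
    fi
    cat > "$target" <<'EOF'
#!/bin/sh
./Talend-Studio-linux-gtk-x86_64 -nosplash -application org.talend.commandline.CommandLine -consoleLog -data workspace startServer -p 8002
EOF
    chmod +x "$target"
}

# Usage, from inside your CommandLine folder:
#   write_commandline_script ./commandline-linux.sh
```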
5. Modify Project POM in Studio
- You need to modify the Project POM in your Studio: Settings -> Maven -> Build -> Project
- In the Docker Profile, look for the <autoPull></autoPull> tags, replace “once” with “false”, and push your changes to Git.
6. Create a Pipeline in Azure DevOps
Figure 11: Azure DevOps pipeline
It’s finally time for us to create a pipeline in Azure DevOps. Select your Azure Repos or GitHub source and the self-hosted agent pool you created previously. Then copy and paste this file, which is the YAML file describing the pipeline. Please change the variables and the Nexus URL accordingly.
The docker_password variable is not defined in the pipeline file. For security reasons, please define it as a secret variable instead.
As you can see, only one Maven command allows you to build and push your Docker image. Everything is taken care of by Maven and the Talend CommandLine.
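Before relying on the docker_password secret in the pipeline, it can be worth checking the registry credentials directly from the agent. The sketch below is an illustration under stated assumptions, not part of the original pipeline: the environment variable DOCKER_PASSWORD, the function name registry_login, and the registry and user names in the usage line are all hypothetical. It uses docker login with --password-stdin so that the password is not passed as a command-line argument.

```shell
#!/bin/sh
# Hypothetical helper: log in to a Docker registry with a password taken from
# the DOCKER_PASSWORD environment variable (an assumed name).
registry_login() {
    registry="$1"
    user="$2"
    if [ -z "${DOCKER_PASSWORD:-}" ]; then
        echo "DOCKER_PASSWORD is not set" >&2
        return 1
    fi
    # --password-stdin makes docker read the password from standard input.
    printf '%s' "$DOCKER_PASSWORD" | docker login "$registry" --username "$user" --password-stdin
}

# Usage, with DOCKER_PASSWORD exported beforehand (placeholder registry and user):
#   registry_login myregistry.azurecr.io myuser
```

If the login succeeds on the agent, the same credentials should work when Maven pushes the image during the build.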
7. Run the Pipeline in Azure DevOps
You can now run your pipeline! You should see that your agent pool is selected and be able to access the logs of your builds.
Conclusion
To conclude this article, let’s sum it all up. After a brief overall description of the DevOps and CI/CD concepts and how Talend can be structured around them, we have created a simple CI/CD pipeline with Azure DevOps. This pipeline allows us to continuously build our jobs as Docker images and push them to a Docker registry. Taking advantage of CI/CD and containers can help you overcome many challenges in your organization. It will help improve your agility, reproducibility and flexibility by letting you run your jobs anywhere.