Best Practices for Using Context Variables with Talend: Part 4
Last night it occurred to me that everything in the first three parts of this blog series (Part 1 / Part 2 / Part 3) had been oriented towards the Talend on-premise solution. Many developers I have worked with over the years are moving to Talend Cloud, and I have moved much of my own code to the cloud as well. In fact, to build a lot of the collateral for this series, I had to go back to my old on-premise environment. I then got to thinking, “would these best practices work in Talend Cloud?”
What's a Remote Engine?
Before I explain whether they will and, if so, how, I should point out that one of the benefits of Talend Cloud is that there is significantly less server admin work to carry out. If you run everything in the cloud, then all you need to worry about is building your jobs and configuring them to run.
I know that when I was a full-time developer, I always thought too much of my time was taken up dealing with server administration. This not only frustrated me, but it swallowed up time that could have been better spent doing what I was good at. Having the freedom to just develop is a massive benefit, but having everything in a cloud that is managed by another entity can sometimes limit flexibility… or so you might think. With everything running in the cloud, getting access to set operating system environment variables or create local properties files on the servers is likely to be a challenge, if indeed possible at all. Talend recognized that and has overcome it with the Remote Engine.
The Remote Engine is what bridges the gap between an entirely cloud-hosted solution and an on-premise solution. You can still have total control of your Remote Engine’s config while also allowing Talend to handle the Management Console. You can also have your data processed where you need it processed (none of the data goes from the Remote Engine to anywhere you don’t expressly send it), which means that your current on-premise code can likely be easily migrated without implications caused by moving it away from other on-premise tools you may have.
I have focused on the Remote Engine because it is what allows us to do pretty much exactly what was described in the previous blog posts, using Talend Cloud. When I tried it out, there were a few subtle changes that I had to make, and I will go through these as I explain how I got it working.
However, first I feel I owe an apology to non-Windows users…
Environment Variables on Systems other than Windows
I started my investigations into achieving this with Talend Cloud by setting up on a Mac. I am a new convert to the world of Macs, and I haven’t yet worked through all of the situations on my Mac that I have experienced on Windows. Environment variables were an area I thought might be interesting. As it turned out, it went from interesting to downright silly. I spent a couple of hours trying to figure out why my variables, which worked in all of my terminals, would not be picked up by my Talend Studio.
It turns out that .profile, .bashrc, .bash_profile, etc. are all useless when you want a GUI app to pick up your variables. You need to use a plist file. I won’t go into detail about this here; I’ll just point you to this useful link. This process solved my issue on my Mac. Once I had set up my environment variables this way, Talend Studio was able to see them and I could use this functionality as I could on Windows.
However, there is a part 2 to this apology. I must also apologise to those of you who may have tried to configure an on-premise Talend Runtime on Linux. I’m kind of hoping that this is a very small number, but I suspect that someone has already tried and is pulling their hair out (or will be in the future). The Talend Runtime is an Apache Karaf-based OSGi container that runs as a system service on Linux. As such, any environment variables set in .profile, .bashrc, .bash_profile, etc. will be ignored by anything that runs inside it.
The Remote Engine is based upon Apache Karaf as well. However, we can get around this VERY easily. When you install the Talend Runtime or the Remote Engine as a service on Linux, you will make use of a wrapper.conf file. For the Talend Runtime it will be called something like Talend-ESB-Container-wrapper.conf and for the Remote Engine, it will be called something like Talend-Remote-Engine-wrapper.conf. The file will be located in the installation’s etc folder (inside the install directory, not the system-wide /etc). All you need to do is stop the service, add a couple of lines near the beginning of the wrapper.conf file, and start the service again.
Look in the file to find some lines like these:
set.default.JAVA_HOME=${java.home}
set.default.KARAF_HOME=${karaf.home}
set.default.KARAF_BASE=${karaf.base}
set.default.KARAF_DATA=${karaf.data}
Then add lines like the following, with the settings you require for your variables:
set.default.FILEPATH=/home/Richard/Documents/env.txt
set.default.ENCRYPTIONKEY=12345678
These variables will be picked up in exactly the same way as system environment variables, by anything running inside the Talend Runtime or Remote Engine.
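If you want to confirm that the variables really are visible to the JVM that runs your jobs, a small diagnostic along these lines may help. This is only a sketch: the class name EnvCheck and its missing method are hypothetical and not part of Talend, and the variable names FILEPATH and ENCRYPTIONKEY are simply the ones used in the wrapper.conf example above. The same System.getenv call could, I believe, be dropped into a tJava component inside a test job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic helper (not part of Talend): reports which of the
// expected environment variables the current JVM cannot see.
final class EnvCheck {

    // Returns the names from 'required' that are absent or blank in 'env'.
    static List<String> missing(Map<String, String> env, List<String> required) {
        List<String> result = new ArrayList<>();
        for (String name : required) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Variable names taken from the wrapper.conf example; adjust to your own.
        List<String> expected = Arrays.asList("FILEPATH", "ENCRYPTIONKEY");
        List<String> missing = missing(System.getenv(), expected);
        if (missing.isEmpty()) {
            System.out.println("All expected variables are visible to this JVM.");
        } else {
            System.out.println("Not visible to this JVM: " + missing);
        }
    }
}
```

Run from a terminal, this tells you about your shell’s environment only. To learn anything about the Talend Runtime or Remote Engine, the check needs to execute inside the service itself, which is exactly the difference that caught me out.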
So, how is this done using the Remote Engine?
Once we have all of the possible environment variable issues resolved, it is extremely easy to get this working by using the Remote Engine. First, we need to install a Remote Engine. If you haven’t done this, there are instructions which can be followed here.
Once the Remote Engine is installed and the updates to the wrapper.conf (described above) are implemented, we can configure our first Task. I’ll assume that you have created a job for this (following the instructions in Part 3 of this blog series) and have uploaded it to the artifact repository. If so, you can follow the steps below to see this working in the Remote Engine.
1) Go to the Management Console and click on the “Operations” link in the left sidebar. Then click on the “View Tasks & Plans” button.
2) Click on the “Add” button and select “Task”.
3) Select the “Workspace”, “Artifact type” and “Artifact”. The job being set here is a test job that has been configured to use the Implicit Context Load.
4) Leave all of the context variables blank, because these will be set via the Implicit Context Load.
5) Select the “Runtime” and “Run type”. We are selecting the Remote Engine here. This is important. The “Run type” can be left as “Manual” or you can set this to be scheduled if you want.
6) Once we click “Go Live” the job will start (if we left the “Run type” as “Manual”). The next screen will show the job running on your Remote Engine.
7) If everything has been configured correctly, the next screen will show a success status.
Using the method described in this blog series, you can easily control your context variable usage across all of your environments, so long as you can add environment variables to your servers. If the Implicit Context Load settings are configured for your project, you needn’t ever think about which context is used. When you build a new job, it will automatically be set to use the Implicit Context Load, which will be controlled by the settings on the machines you use to run your jobs.
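To make the idea a little more concrete, here is a rough sketch of what happens conceptually when a job starts: an environment variable tells the job where its settings file lives, and the key/value pairs in that file become the context values. This is not Talend’s implementation of the Implicit Context Load. The class ContextFileSketch is hypothetical, and I am assuming a simple key=value file that java.util.Properties can read; the real file layout and field separator are whatever you configured in Part 3.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical illustration only: mimics "find the file via an environment
// variable, then read key/value pairs from it". Not Talend's own code.
final class ContextFileSketch {

    // Parses key=value text (assumed format) into a Properties object.
    static Properties parse(String content) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Reads the file whose path is held in the named environment variable.
    static Properties loadFromEnv(String variableName) {
        String path = System.getenv(variableName);
        if (path == null || path.isEmpty()) {
            throw new IllegalStateException(
                "Environment variable " + variableName + " is not set for this process");
        }
        try {
            byte[] bytes = Files.readAllBytes(Paths.get(path));
            return parse(new String(bytes, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // FILEPATH is the variable name from the wrapper.conf example.
        Properties context = loadFromEnv("FILEPATH");
        // Print only the keys, since the values may be sensitive.
        System.out.println("Loaded context keys: " + context.stringPropertyNames());
    }
}
```

The point of the design is that the job itself carries no environment-specific values. Moving it from one machine to another changes nothing in the job, only the variable and the file it points to.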