[TOS tutorial 07] Configuring Joins in tMap
In this tutorial, learn how to configure the join outputs in a tMap component.
This tutorial uses Talend Open Studio Data Integration version 6.
1. Configure the join model
- In the jointMap Job, double-click the tMap_1 component to open the tMap component wizard. Note: Clicking the tMap settings button displays a list of parameters for configuring your input or output flows. One of the settings available for input flows lets you change the Join Model from the default, Left Outer Join, to Inner Join.
- To change the Join Model property, click the default setting Left Outer Join, and then click [...] that appears next to Left Outer Join. In the Options window, click Inner Join, and then click OK. Note: When you change the default settings, a red dot with the number 1 on it appears on the tMap settings icon. This indicates that you have changed one parameter of the default tMap settings.
- Close the tMap wizard and run the Job.
In the Job Designer, observe that a total of 1682 rows of data from the left input are processed by the tMap component. However, only 142 rows appear in the output file. This is because the inner join only produced matches for 142 rows, resulting in the rejection of the other rows.
To confirm that the unmatched rows were dropped, open the moviesComplete output file. Every movie in the file now has a director name.
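The difference between the two join models can be sketched in plain Java. This is only an illustration of the join semantics, not the code that Talend generates for tMap. The class `JoinModelSketch`, the `Movie` type, the `join` method, and the sample rows are all hypothetical; only the field names movieID, title, and directorID come from the tutorial.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class JoinModelSketch {
    /** Hypothetical movie row; only the fields the join needs. */
    static final class Movie {
        final int movieID;
        final String title;
        final Integer directorID; // null when the movie has no director ID

        Movie(int movieID, String title, Integer directorID) {
            this.movieID = movieID;
            this.title = title;
            this.directorID = directorID;
        }
    }

    /**
     * Joins each movie to its director name.
     * innerJoin == false keeps unmatched movies with an empty director (Left Outer Join).
     * innerJoin == true drops unmatched movies (Inner Join).
     */
    static List<String> join(List<Movie> movies, Map<Integer, String> directors, boolean innerJoin) {
        List<String> output = new ArrayList<>();
        for (Movie movie : movies) {
            // Look up the director only when the movie carries a director ID.
            String director = movie.directorID == null ? null : directors.get(movie.directorID);
            if (director != null) {
                output.add(movie.title + ";" + director);
            } else if (!innerJoin) {
                output.add(movie.title + ";"); // row kept, director column left empty
            }
        }
        return output;
    }

    public static void main(String[] args) {
        Map<Integer, String> directors = new HashMap<>();
        directors.put(10, "Director A");

        List<Movie> movies = new ArrayList<>();
        movies.add(new Movie(1, "Matched movie", 10));
        movies.add(new Movie(2, "No director ID", null));
        movies.add(new Movie(3, "Unknown director ID", 99));

        System.out.println("Left Outer Join rows: " + join(movies, directors, false).size());
        System.out.println("Inner Join rows: " + join(movies, directors, true).size());
    }
}
```

With the three sample rows, the left outer join keeps all three movies, while the inner join keeps only the one whose directorID exists in the lookup. The same effect explains why the Job goes from 1682 input rows to 142 output rows.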
2. Create a new output in the tMap component to collect the inner join rejects only
- Open the tMap_1 component wizard and create a second output named joinRejects. A blank output table is created.
- To add the movieID, title, releaseYear, url, and directorID fields to the joinRejects output, select the five fields in the movies input and drop them on the joinRejects output.
- In the joinRejects output, click the tMap settings button.
- To change the Catch lookup inner join reject property, click the default setting false, and then click [...] that appears next to false. In the Options window, click true, and then click OK. Note: By changing the Catch lookup inner join reject property to true, you can catch all the lines of data that were rejected by the inner join in the new output.
- Add a tFileOutputDelimited component to the Job Designer and link the joinRejects output of the tMap_1 component to the tFileOutputDelimited_2.
- To configure the output component, in the Component view of the component, specify the path and name for the output file. Also include a header row in the output file, and then run the Job.
In the Job Designer, you can observe that out of 1682 rows of the input data, 142 rows appear in the joinedOutput output, and the 1540 rejected rows are collected in the joinRejects output.
You can also open the joinRejects output file to see all the movies that were rejected by the join. These are the movies that have no directorID in the movies file, plus the movies whose directorID does not appear in the directors file.
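The routing of rows between the two outputs can be sketched in plain Java as follows. This is a minimal illustration under assumed names, not the code Talend generates: the class `JoinRejectsSketch`, the `Movie` and `Outputs` types, and the sample rows are hypothetical, while joinedOutput and joinRejects mirror the output names used in this tutorial. Every input row lands in exactly one of the two lists, which is why the counts in the Job add up (142 + 1540 = 1682).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class JoinRejectsSketch {
    /** Hypothetical movie row; only the fields the join needs. */
    static final class Movie {
        final int movieID;
        final String title;
        final Integer directorID; // null when the movie has no director ID

        Movie(int movieID, String title, Integer directorID) {
            this.movieID = movieID;
            this.title = title;
            this.directorID = directorID;
        }
    }

    /** Holds the two output flows of the sketch. */
    static final class Outputs {
        final List<String> joinedOutput = new ArrayList<>();
        final List<Movie> joinRejects = new ArrayList<>();
    }

    /** Inner join that sends matched rows to joinedOutput and unmatched rows to joinRejects. */
    static Outputs innerJoinWithRejects(List<Movie> movies, Map<Integer, String> directors) {
        Outputs outputs = new Outputs();
        for (Movie movie : movies) {
            String director = movie.directorID == null ? null : directors.get(movie.directorID);
            if (director != null) {
                outputs.joinedOutput.add(movie.title + ";" + director);
            } else {
                // Rejected: no directorID, or a directorID missing from the lookup.
                outputs.joinRejects.add(movie);
            }
        }
        return outputs;
    }

    public static void main(String[] args) {
        Map<Integer, String> directors = new HashMap<>();
        directors.put(10, "Director A");

        List<Movie> movies = new ArrayList<>();
        movies.add(new Movie(1, "Matched movie", 10));
        movies.add(new Movie(2, "No director ID", null));
        movies.add(new Movie(3, "Unknown director ID", 99));

        Outputs outputs = innerJoinWithRejects(movies, directors);
        System.out.println("joinedOutput rows: " + outputs.joinedOutput.size());
        System.out.println("joinRejects rows: " + outputs.joinRejects.size());
    }
}
```

In the sample data, one movie matches and two are rejected, one for each of the two rejection reasons described above.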