Configuring a Join
This page describes the steps for creating any kind of join: inner, outer, fuzzy, or cross.
Follow these steps to join 2 datasets.
Click Data Prep from the top menu
Select the type of Data Preparation that is intended (Join in this example)
Select the 1st dataset that needs to be joined, then drag and drop it onto the canvas on the right side. Do the same with the 2nd dataset.
The 2 datasets now appear on the canvas.
Click the Operations tab in the left panel
Drag and drop the desired join type onto the canvas
Connect the 2 datasets to the join.
Click the join node to open the "Map Column" screen. On this screen, select the columns to be matched from the 2 datasets being joined, then click the "Map Column" button. The mapping is displayed at the bottom of the screen.
There are 2 exceptions:
This step isn't required for a cross join
For a fuzzy join, a matching condition (low/medium/high/extremely-high) must also be selected
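To make the difference between the join types concrete, here is a minimal Python sketch of what each one computes once two columns are mapped. It is not the product's implementation: the sample datasets, the function names, and the similarity cut-offs assigned to low/medium/high/extremely-high are all assumptions chosen for illustration, and the fuzzy match uses the standard library's difflib as a stand-in for whatever matching the product applies.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical sample datasets (not taken from the product).
customers = [
    {"cust_id": 1, "name": "Acme Corp"},
    {"cust_id": 2, "name": "Globex"},
    {"cust_id": 3, "name": "Initech"},
]
orders = [
    {"customer": 1, "buyer": "ACME Corporation", "amount": 250},
    {"customer": 3, "buyer": "Initech", "amount": 75},
    {"customer": 4, "buyer": "Umbrella", "amount": 40},
]

# Assumed similarity cut-offs per matching condition (illustrative only).
FUZZY_THRESHOLDS = {"low": 0.5, "medium": 0.65, "high": 0.8, "extremely-high": 0.95}


def merge(left_row, right_row):
    """Combine one row from each dataset into a single output row."""
    return {**left_row, **right_row}


def inner_join(left, right, left_col, right_col):
    """Keep only the row pairs whose mapped columns are equal."""
    return [merge(l, r) for l in left for r in right if l[left_col] == r[right_col]]


def left_outer_join(left, right, left_col, right_col):
    """Keep every left row; columns of an unmatched right side become None."""
    right_cols = list(right[0]) if right else []
    out = []
    for l in left:
        matches = [r for r in right if l[left_col] == r[right_col]]
        if matches:
            out.extend(merge(l, r) for r in matches)
        else:
            out.append(merge(l, dict.fromkeys(right_cols)))
    return out


def cross_join(left, right):
    """Pair every left row with every right row; no column mapping is needed."""
    return [merge(l, r) for l, r in product(left, right)]


def fuzzy_join(left, right, left_col, right_col, level="medium"):
    """Keep row pairs whose mapped text columns are similar enough for the level."""
    cutoff = FUZZY_THRESHOLDS[level]
    out = []
    for l, r in product(left, right):
        a, b = str(l[left_col]).lower(), str(r[right_col]).lower()
        if SequenceMatcher(None, a, b).ratio() >= cutoff:
            out.append(merge(l, r))
    return out


print(len(inner_join(customers, orders, "cust_id", "customer")))
print(len(cross_join(customers, orders)))
print(len(fuzzy_join(customers, orders, "name", "buyer", level="medium")))
```

In this sketch the cross join takes no mapped columns at all, which mirrors the first exception above, and a stricter matching condition lets fewer near-matches through the fuzzy join.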
Click either of the datasets to display 3 options, which include:
Select Where
Select Date & Column
Click "Where" to add a where condition. This step is optional and is needed only when a filter condition is required.
Click "Select Date & Column". Records can be selected based on either a date range or a time interval. The columns to include in the result set can also be selected from the dataset.
Repeat the above 3 steps for the 2nd dataset as well. Once the dates and columns are selected, click the Save button at the top right of the canvas. This prompts for the output parameters that determine the target dataset to which the join's result set is written.
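The "Where" and "Select Date & Column" options can be pictured as a filter followed by a projection. The sketch below shows that idea in plain Python under stated assumptions: the rows, column names, and helper functions are hypothetical, and the time interval is interpreted as "the last N days up to a reference date", which the product may define differently.

```python
from datetime import date, timedelta

# Hypothetical dataset rows (not taken from the product).
rows = [
    {"order_id": 1, "region": "EU", "amount": 250, "order_date": date(2024, 1, 5)},
    {"order_id": 2, "region": "US", "amount": 75, "order_date": date(2024, 2, 10)},
    {"order_id": 3, "region": "EU", "amount": 40, "order_date": date(2024, 3, 1)},
]


def apply_where(rows, predicate):
    """Optional filter: keep only the rows for which the condition holds."""
    return [row for row in rows if predicate(row)]


def select_by_date_range(rows, date_col, start, end):
    """Keep rows whose date falls within [start, end], both ends inclusive."""
    return [row for row in rows if start <= row[date_col] <= end]


def select_by_interval(rows, date_col, days, as_of):
    """Keep rows from the last `days` days up to and including `as_of`."""
    start = as_of - timedelta(days=days)
    return [row for row in rows if start <= row[date_col] <= as_of]


def select_columns(rows, columns):
    """Keep only the chosen columns in each row."""
    return [{col: row[col] for col in columns} for row in rows]


# Where condition, then date range, then column selection.
eu_rows = apply_where(rows, lambda row: row["region"] == "EU")
in_range = select_by_date_range(eu_rows, "order_date", date(2024, 1, 1), date(2024, 2, 29))
result = select_columns(in_range, ["order_id", "amount"])
print(result)
```

Applying the filter before the join keeps the joined result smaller, which is presumably why these options are set on each dataset rather than on the join node.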
Select the workspace and collection
Select an existing dataset in the collection, or right-click to create a new one
Select the directory format (Parquet/CSV)
Select the transfer mode (where applicable)
append
upsert
truncate and insert
drop & create
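The four transfer modes differ in what happens to rows already present in the target dataset. The following sketch illustrates the usual meaning of each mode using Python's built-in sqlite3 module. It is an analogy only: the table layout, the `write_result` function, and the mapping of each mode to SQL statements are assumptions, not the product's documented behavior.

```python
import sqlite3

MODES = ("append", "upsert", "truncate and insert", "drop & create")


def write_result(conn, rows, mode, table="target"):
    """Write (id, value) rows to a hypothetical target table in the given mode."""
    if mode not in MODES:
        raise ValueError(f"unknown transfer mode: {mode}")
    cur = conn.cursor()
    if mode == "drop & create":
        # Discard the existing table, including its definition.
        cur.execute(f"DROP TABLE IF EXISTS {table}")
    cur.execute(f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER PRIMARY KEY, value TEXT)")
    if mode == "truncate and insert":
        # Keep the table definition but remove every existing row.
        cur.execute(f"DELETE FROM {table}")
    if mode == "upsert":
        # Replace rows whose key already exists, insert the rest.
        sql = f"INSERT OR REPLACE INTO {table} (id, value) VALUES (?, ?)"
    else:
        # Plain insert: existing rows (if any remain) are left untouched.
        sql = f"INSERT INTO {table} (id, value) VALUES (?, ?)"
    cur.executemany(sql, rows)
    conn.commit()


def read_all(conn, table="target"):
    """Return all rows of the target table ordered by id."""
    return conn.execute(f"SELECT id, value FROM {table} ORDER BY id").fetchall()


conn = sqlite3.connect(":memory:")
write_result(conn, [(1, "a"), (2, "b")], "append")
write_result(conn, [(2, "B"), (3, "c")], "upsert")
print(read_all(conn))
```

Append and upsert preserve earlier results, while the other two modes start from an empty target on every run, so the right choice depends on whether the job is meant to accumulate history.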
Click Next
Enter a name for the job that is being executed
The job can be executed one time or scheduled as a recurring job that runs at the specified frequency.
Click Submit to execute the job
The submitted job can be monitored on the Job Monitoring screen
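The difference between a one-time job and a recurring one comes down to how many run times are planned from the start time. A minimal sketch, assuming a hypothetical `run_times` helper and a frequency expressed as a fixed time step (the product's scheduler options are not described here):

```python
from datetime import datetime, timedelta


def run_times(start, frequency=None, count=1):
    """Return the planned run times for a job.

    frequency=None models a one-time job; a timedelta models a recurring
    job, for which the first `count` runs are listed.
    """
    if frequency is None:
        return [start]
    return [start + i * frequency for i in range(count)]


start = datetime(2024, 1, 1, 9, 0)
one_time = run_times(start)
daily = run_times(start, frequency=timedelta(days=1), count=3)
print(one_time)
print(daily)
```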