Configuring a Join
This page describes the steps for creating any kind of join: inner, outer, fuzzy, or cross.
Implementation
Follow these steps to join 2 datasets.
Click Data Prep in the top menu.

Select the intended type of Data Preparation (Join in this example).

Select the 1st dataset to be joined, then drag and drop it onto the canvas on the right side. Do the same with the 2nd dataset.

The 2 datasets now appear on the canvas.

Click the Operations tab in the left panel.

Drag and drop the desired join type onto the canvas

Connect the 2 datasets to the join.
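For readers who want to see what the join types do to the rows, here is a minimal sketch in plain Python. It is an analogy only, not the product's implementation. The sample datasets, the customer_id key, and the helper function names are all hypothetical, and "outer" is assumed to mean a full outer join.

```python
from itertools import product

# Hypothetical sample datasets; names and columns are illustrative only.
orders = [
    {"customer_id": 1, "amount": 250},
    {"customer_id": 2, "amount": 120},
    {"customer_id": 4, "amount": 75},
]
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
    {"customer_id": 3, "name": "Cy"},
]

def inner_join(left, right, key):
    """Keep only row pairs whose key value appears in both datasets."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

def outer_join(left, right, key):
    """Keep matched pairs plus unmatched rows from either side (full outer).

    Unmatched rows simply lack the other side's columns in this sketch.
    """
    rows = inner_join(left, right, key)
    left_keys = {l[key] for l in left}
    right_keys = {r[key] for r in right}
    rows += [dict(l) for l in left if l[key] not in right_keys]
    rows += [dict(r) for r in right if r[key] not in left_keys]
    return rows

def cross_join(left, right):
    """Pair every row of one dataset with every row of the other; no key is used."""
    return [{"left": l, "right": r} for l, r in product(left, right)]

print(len(inner_join(orders, customers, "customer_id")))  # 2
print(len(outer_join(orders, customers, "customer_id")))  # 4
print(len(cross_join(orders, customers)))                 # 9
```

Because a cross join pairs every row with every row, its output grows as the product of the two dataset sizes, so it is usually reserved for small inputs.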

Click the join node to open the "Map Column" screen. In this screen, select the columns to be matched from the 2 datasets being joined, then click the "Map Column" button. The mapping is displayed at the bottom of the screen.
There are 2 exceptions:
This step isn't required for a cross join.
For a fuzzy join, a matching condition (low/medium/high/extremely-high) must also be selected.
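The idea behind a fuzzy join with a matching condition can be sketched as follows, using Python's standard difflib module. This is an illustration under stated assumptions, not the product's algorithm: the similarity measure, the numeric cutoffs assigned to each level, the fuzzy_join helper, and the sample data are all hypothetical.

```python
from difflib import SequenceMatcher

# Assumed cutoffs: the product's actual values for these levels are not
# documented on this page.
THRESHOLDS = {"low": 0.5, "medium": 0.7, "high": 0.85, "extremely-high": 0.95}

def fuzzy_join(left, right, left_col, right_col, level):
    """Pair rows whose mapped column values are similar enough for the level."""
    cutoff = THRESHOLDS[level]
    matches = []
    for l in left:
        for r in right:
            # Case-insensitive similarity score between 0.0 and 1.0.
            score = SequenceMatcher(
                None, l[left_col].lower(), r[right_col].lower()
            ).ratio()
            if score >= cutoff:
                matches.append({"left": l, "right": r, "score": round(score, 2)})
    return matches

# Hypothetical datasets whose mapped columns have different names.
vendors = [{"vendor": "Acme Corp"}, {"vendor": "Globex"}]
suppliers = [{"supplier_name": "ACME Corporation"}, {"supplier_name": "Initech"}]

loose = fuzzy_join(vendors, suppliers, "vendor", "supplier_name", "medium")
strict = fuzzy_join(vendors, suppliers, "vendor", "supplier_name", "extremely-high")
print(len(loose), len(strict))
```

A stricter level accepts fewer pairs: here "Acme Corp" and "ACME Corporation" are paired at the medium level but not at the extremely-high level.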

Click either of the datasets and it will display 3 options.
Select Where
Select Date & Column

Click "Where" to add a where condition. This step is optional and is needed only when a filter condition is required.

Click "Select Date & Column". Records can be selected based on either a date range or a time interval. The columns to include in the result set can also be selected from the dataset.
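The combined effect of the where condition and the date and column selection on one dataset can be sketched as below. This is a rough illustration only: the prepare helper, the field names, and the sample rows are hypothetical, and the date range is assumed to be inclusive at both ends.

```python
from datetime import date

# Hypothetical dataset rows; field names are illustrative only.
sales = [
    {"order_date": date(2024, 1, 5), "region": "EU", "amount": 250, "notes": "a"},
    {"order_date": date(2024, 2, 10), "region": "US", "amount": 120, "notes": "b"},
    {"order_date": date(2024, 3, 15), "region": "EU", "amount": 75, "notes": "c"},
]

def prepare(rows, where, start, end, date_col, columns):
    """Apply an optional filter, an inclusive date range, then keep chosen columns."""
    selected = []
    for row in rows:
        if where is not None and not where(row):
            continue  # fails the optional where condition
        if not (start <= row[date_col] <= end):
            continue  # outside the selected date range
        selected.append({col: row[col] for col in columns})
    return selected

result = prepare(
    sales,
    where=lambda r: r["region"] == "EU",  # optional; pass None to skip filtering
    start=date(2024, 1, 1),
    end=date(2024, 2, 29),
    date_col="order_date",
    columns=["order_date", "amount"],
)
print(result)
```

Only the first row survives: the second fails the where condition and the third falls outside the date range. The region and notes columns are dropped because they were not selected.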

Repeat the above 3 steps for the 2nd dataset. Once the dates and columns are selected, click the Save button at the top right of the canvas. This prompts for the output parameters that determine the target dataset to which the result set of the join is written.
Select the workspace and collection
Select an existing dataset in the collection, or create a new one by right-clicking.
Select the directory format (Parquet/CSV)
Select the transfer mode (where applicable)
append
upsert
truncate and insert
drop & create
Click Next
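The transfer modes listed above differ in how the join result is combined with rows already in the target dataset. The sketch below shows the usual meaning of these terms in data loading. It is an assumption about this product's behavior, not a description of it, and the write_result helper and the id key are hypothetical.

```python
def write_result(target, rows, mode, key=None):
    """Return the target dataset's rows after writing `rows` in the given mode."""
    if mode == "append":
        # Add the new rows after the existing ones; nothing is replaced.
        return target + rows
    if mode == "upsert":
        # Update rows whose key already exists, insert the rest.
        merged = {row[key]: row for row in target}
        merged.update({row[key]: row for row in rows})
        return list(merged.values())
    if mode in ("truncate and insert", "drop & create"):
        # Both leave only the new rows. A real system would also keep the
        # dataset's schema (truncate) or rebuild it (drop & create).
        return list(rows)
    raise ValueError(f"unknown transfer mode: {mode}")

existing = [{"id": 1, "total": 10}, {"id": 2, "total": 20}]
incoming = [{"id": 2, "total": 25}, {"id": 3, "total": 30}]

print(len(write_result(existing, incoming, "append")))               # 4
print(write_result(existing, incoming, "upsert", key="id"))
print(len(write_result(existing, incoming, "truncate and insert")))  # 2
```

In this sketch, append can leave two rows with the same id, while upsert keeps one row per id and takes the newer values.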

Specify a name for the job being executed.
The job can be executed one time or scheduled as a recurring job that runs at the specified frequency.
Click Submit to execute the job

The submitted job can be monitored on the Job Monitoring screen.

