Data Transformation Quickstart
Dataworkz allows users to transform data from any source (database, cloud storage, SaaS application, etc.) and write the transformed data to a destination of their choice (database, warehouse, cloud storage, etc.).
This guide walks through the steps needed to demonstrate the transformation capability of the Dataworkz platform. It shows how easily data from any source (a Google Spreadsheet in this case) can be transformed and written to any destination (Snowflake in this case).
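For orientation only, the UI flow in this guide automates the kind of pipeline sketched below. This is not a Dataworkz API; the libraries, credentials, sheet URL, and table name are all illustrative assumptions.

```python
# Conceptual sketch only -- Dataworkz performs these steps through its UI.
# Credentials, the sheet URL, and the table name below are hypothetical.
import gspread
import pandas as pd
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# 1. Read one tab of a Google Sheet into a DataFrame.
gc = gspread.service_account(filename="service_account.json")
sheet = gc.open_by_url("https://docs.google.com/spreadsheets/d/<sheet-id>")
df = pd.DataFrame(sheet.worksheet("Sheet1").get_all_records())

# 2. Transform (this guide's example converts a String column to uppercase).
# ... transformation goes here ...

# 3. Write the result to a Snowflake table.
conn = snowflake.connector.connect(
    user="<user>", password="<password>", account="<account>",
    warehouse="<warehouse>", database="<database>", schema="<schema>",
)
write_pandas(conn, df, "TRANSFORMED_DATA", auto_create_table=True)
```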
Every Dataworkz account includes a default connector that is preconfigured for a dedicated S3 bucket for that account. You can configure a connector for another bucket of your choice; see Connector.
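If you want to confirm that the bucket backing the default connector is reachable from your environment, a minimal check with boto3 might look like the following (the bucket name is hypothetical; substitute your own):

```python
# Quick reachability check for the account's dedicated bucket
# (bucket name hypothetical).
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
try:
    s3.head_bucket(Bucket="dataworkz-<account-name>")
    print("bucket reachable")
except ClientError as err:
    print("bucket not reachable:", err)
```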
Go to Configuration -> SaaS Applications -> Google Sheet
Click the + icon to add a new configuration
Enter a name for the configuration
Select the authentication option for your account (OAuth / Service Account). This guide uses OAuth.
This requires authorizing Dataworkz to access Google Sheets via Google OAuth
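Behind that authorization step is Google's standard OAuth consent flow. Purely for reference, the same flow in a standalone script looks roughly like the sketch below; the client-secrets file and the read-only Sheets scope are assumptions, not Dataworkz specifics.

```python
# Sketch of the standard Google OAuth consent flow; the client-secrets
# file and the read-only scope are illustrative assumptions.
from google_auth_oauthlib.flow import InstalledAppFlow

SCOPES = ["https://www.googleapis.com/auth/spreadsheets.readonly"]
flow = InstalledAppFlow.from_client_secrets_file("client_secret.json", SCOPES)
creds = flow.run_local_server(port=0)  # opens a browser window for consent
print(creds.token)
```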
Select the workspace
Provide the URL of the Google Sheet
From the dropdown, select the tab you want from the list of tabs in the spreadsheet (a sketch for verifying both values follows these steps)
Click Save
The newly created connector will show up in the list of Google Sheet configurations
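If you are unsure which URL or tab name to use in the configuration, you can verify both independently with gspread (the URL below is hypothetical):

```python
# Verify the sheet URL and its tab names outside Dataworkz (URL hypothetical).
import gspread

gc = gspread.oauth()  # runs the same Google OAuth consent flow
sheet = gc.open_by_url("https://docs.google.com/spreadsheets/d/<sheet-id>")
print([ws.title for ws in sheet.worksheets()])  # the tabs offered in the dropdown
```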
Click the Data Studio -> Dataset link from the top menu
Click the workspace selected in the configuration (default in this case) and drill down to the lowest leaf node (gsheet -> Dataset v0 in this example). This displays the data pulled from the Google Sheet
Click the Transform button at the top right of the screen
Click the burger menu at the top of any column (say, one of String type) and select the operation you intend to perform (say, converting a String to uppercase)
After applying the transformation function, you'll see the modified values for the column
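The uppercase operation behaves like a per-row string function. In pandas terms (column name hypothetical), the equivalent is:

```python
import pandas as pd

# Hypothetical column; mirrors the String -> Uppercase transform above.
df = pd.DataFrame({"city": ["austin", "Boston", None]})
df["city"] = df["city"].str.upper()
print(df["city"].tolist())  # ['AUSTIN', 'BOSTON', nan] -- missing values stay missing
```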
Transformed data can now be written to a different Dataset. To do this, click the “Execute Job” button at the top right of the screen; a pop-up appears.
From the dropdown, select the columns that should be written to the destination/target Dataset. Click “Next”
In the next screen, select the target workspace (default in this case) and the collection that should contain the target dataset (s3_<account-name>, which is available by default)
Choose a Dataset or create a new one (right-click the root S3 folder and select NewDataset), then enter a name for the new dataset. Click “Next”
Select the Job Frequency (the default “One Time” is fine for this use case). Click “Submit”
This submits the transformation job to the queue.
After 5-10 minutes (depending on the size of the data being processed), you should see the transformed data written to the target dataset.
To view the target dataset, click “Dataset” in the top menu, navigate to the target workspace (default in this case), and drill down to the target dataset (under s3_<account-name> in this case)
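To double-check the output from outside the UI, you can list the target dataset's objects in the account bucket; the bucket name and prefix below are hypothetical and depend on your account's S3 layout:

```python
# Bucket name and dataset prefix are hypothetical; match your account's layout.
import boto3

s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket="dataworkz-<account-name>", Prefix="<target-dataset>/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```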