Data Transformation Quickstart

Dataworkz allows users to transform data from any source (database, cloud storage, SaaS applications, etc.) and write the transformed data to a destination of their choice (database, warehouse, cloud storage, etc.).

This guide walks through the steps needed to demonstrate the transformation capability of the Dataworkz platform. It shows how data from a source (a Google Sheet in this case) can be transformed and written to a destination of your choice (the account's S3 bucket in this case).

Step 1: Configure an S3 Connector

Every Dataworkz account comes with a default connector preconfigured for a dedicated S3 bucket. To configure a connector for a different bucket of your choice, see AWS S3 Connector.
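If you do configure your own bucket, it can help to confirm up front that the bucket exists and your credentials can reach it. A minimal sketch with boto3; the bucket name and region below are placeholders for your own values:

```python
# Optional sanity check (outside Dataworkz): verify the bucket is reachable
# with the credentials you plan to give the connector.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3", region_name="us-east-1")  # placeholder region

try:
    s3.head_bucket(Bucket="my-dataworkz-bucket")  # placeholder bucket name
    print("Bucket is reachable with these credentials.")
except ClientError as err:
    # 403 -> credentials lack access; 404 -> bucket does not exist
    print("Bucket check failed:", err.response["Error"]["Code"])
```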

Step 2: Configure a Google Sheet Connector

  1. Go to Configuration -> SaaS Applications -> Google Sheet

  2. Click the + icon to add a new configuration

  3. Enter a name for the configuration

  4. Select the authentication option for your account (OAuth or Service Account). This guide uses OAuth.

  5. Complete the Google OAuth flow to authorize Dataworkz to access your Google Sheets

  6. Select the workspace

  7. Provide the URL of the Google Sheet

  8. From the dropdown, select the tab to read from the spreadsheet's list of tabs

  9. Click Save

The newly created connector will show up in the list of Google Sheet configurations.
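If you want to confirm the sheet URL and tab resolve with your Google credentials from outside Dataworkz, here is a minimal sketch using the gspread library, assuming you have OAuth client credentials set up for gspread; the URL and tab name are placeholders:

```python
# Optional check (outside Dataworkz): read the sheet tab you configured.
# Assumes gspread can find your OAuth client credentials (see gspread docs).
import gspread

gc = gspread.oauth()  # opens a browser consent flow, much like the connector's OAuth step

sheet = gc.open_by_url(
    "https://docs.google.com/spreadsheets/d/<your-spreadsheet-id>"  # placeholder
)
worksheet = sheet.worksheet("Sheet1")  # placeholder: the tab selected in step 8

rows = worksheet.get_all_records()  # list of dicts keyed by the header row
print(len(rows), "rows; first row:", rows[0] if rows else "empty")
```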

Step 3: Apply a Transformation to the Spreadsheet Data

  1. Click the Data Studio -> Dataset link in the top menu

  2. Click the workspace selected in the configuration (default in this case) and drill down to the lowest leaf node (gsheet -> Dataset v0 in this example). This displays the data pulled from the Google Sheet

  3. Click the Transform button at the top right of the screen

  4. Click the burger menu at the top of any column (say, one of String type) and select the operation you intend to perform (say, converting a String to uppercase)

  5. After applying the transformation function you will see the modified values for the column; conceptually this is the same operation as the pandas sketch below
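The column operation above is, at heart, a plain string transformation. A minimal pandas sketch on made-up sample data, purely as an illustration (Dataworkz applies the operation inside the platform):

```python
# Conceptual equivalent of the String -> Uppercase column operation,
# shown with pandas on made-up sample data.
import pandas as pd

df = pd.DataFrame({"city": ["boston", "austin", "denver"], "count": [3, 7, 5]})

# Uppercase one String column, leaving the others untouched
df["city"] = df["city"].str.upper()

print(df)
#      city  count
# 0  BOSTON      3
# 1  AUSTIN      7
# 2  DENVER      5
```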

The transformed data can now be written to a different dataset. Click the "Execute Job" button at the top right of the screen to open the job configuration dialog, then:

  1. Select the columns from the dropdown that need to be written to the destination/target dataset. Click "Next"

  2. On the next screen, select the target workspace (default in this case) and the collection that should contain the target dataset (s3_<account-name>, which is available by default)

  3. Choose an existing dataset or create a new one (right-click the root S3 folder and select New Dataset). Enter a name for the new dataset. Click "Next"

  4. Select the Job Frequency (the default "One Time" is fine for this use case). Click "Submit"

  5. This submits the transformation job to the queue

  6. After 5-10 minutes (depending on the size of the data being processed), you should see the transformed data written to the target dataset

  7. To view the target dataset, click "Dataset" in the top menu, navigate to the target workspace (default in this case), and drill down to the target dataset (under s3_<account-name> in this case). To confirm the output from outside the platform, see the sketch below
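If you also want to confirm that the job wrote output to S3 without using the UI, a minimal boto3 sketch that lists objects under the target location; the bucket name and prefix below are placeholders, and the actual layout under s3_<account-name> may differ:

```python
# Optional verification (outside Dataworkz): list objects in the account
# bucket to confirm the job wrote output. Bucket and prefix are placeholders.
import boto3

s3 = boto3.client("s3")

resp = s3.list_objects_v2(
    Bucket="my-dataworkz-bucket",   # placeholder: your account bucket
    Prefix="transformed-gsheet/",   # placeholder: your target dataset folder
)

for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"], obj["LastModified"])
```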
