
Data Transformation Quickstart


Dataworkz lets users transform data from any source (databases, cloud storage, SaaS applications, etc.) and write the transformed data to a destination of their choice (database, warehouse, cloud storage, etc.).

This guide walks through an example that demonstrates the transformation capability of the Dataworkz platform. It shows how data from a source (a Google Sheet in this case) can be transformed and written to a destination (Amazon S3 in this case).

Step 1: Configure an S3 Connector

Every Dataworkz account has a default connector that is preconfigured for a dedicated S3 bucket for that account. You can also configure a connector to another bucket of your choice; see Connector for the steps.
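Whichever bucket you choose must be reachable with the credentials you supply. If you want to sanity-check access before configuring the connector, here is a minimal sketch using boto3 (the bucket name is hypothetical; this is an optional check outside Dataworkz, not part of the product):

```python
import boto3
from botocore.exceptions import ClientError

# Hypothetical bucket name -- substitute the bucket you plan to attach.
BUCKET = "my-dataworkz-bucket"

s3 = boto3.client("s3")
try:
    # HeadBucket succeeds only if the bucket exists and these credentials can reach it.
    s3.head_bucket(Bucket=BUCKET)
    print(f"Bucket '{BUCKET}' is reachable.")
except ClientError as err:
    print(f"Cannot access '{BUCKET}': {err.response['Error']['Code']}")
```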

Step 2: Configure a Google Sheets Connector

  1. Go to Configuration -> SaaS Applications -> Google Sheets

  2. Click the + icon to add a new configuration

  3. Enter a name for the configuration

  4. Select the authentication option for your account (OAuth / Service Account). This guide uses OAuth.

  5. OAuth requires you to authorize Dataworkz to access Google Sheets through Google's consent flow

  6. Select the workspace

  7. Provide the URL of the Google Sheet (see the sketch after this list for the URL format)

  8. From the dropdown, select the tab to read from the list of tabs in the spreadsheet

  9. Click Save

The newly created connector will appear in the list of Google Sheets configurations.
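A note on step 7: a Google Sheets URL embeds both the spreadsheet ID (after /d/) and the selected tab's gid (after #gid=). If you ever need to pull those pieces out programmatically, a small sketch (the URL shown is hypothetical):

```python
import re

# Hypothetical URL -- replace with the one you paste into the connector.
url = "https://docs.google.com/spreadsheets/d/1AbCdEfGhIjKlMnOpQrStUv/edit#gid=123456789"

# The spreadsheet ID follows /d/; the gid identifies the tab within the spreadsheet.
spreadsheet_id = re.search(r"/d/([\w-]+)", url).group(1)
gid_match = re.search(r"[#?&]gid=(\d+)", url)
gid = gid_match.group(1) if gid_match else "0"  # gid=0 is the first tab

print(spreadsheet_id, gid)
```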

Step 3: Apply Transformations to the Spreadsheet Data

  1. Click Data Studio -> Dataset in the top menu

  2. Click the workspace selected in the configuration (default in this case) and drill down to the lowest leaf node (gsheet -> Dataset v0 in this example). This displays the data pulled from the Google Sheet

  3. Click the Transform button at the top right of the screen

  4. Click the burger menu at the top of any column (say one of String type) and select the operation you intend to perform (say converting a String to Uppercase)

  5. After applying the transformation function, you will see the modified values for the column (a code equivalent is sketched below)
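For readers who think in code, the Uppercase column operation above has a familiar one-line equivalent. A minimal pandas sketch with toy data standing in for the spreadsheet (this illustrates the effect only; it is not how Dataworkz executes the transformation):

```python
import pandas as pd

# Toy column standing in for the spreadsheet data pulled in Step 2.
df = pd.DataFrame({"city": ["austin", "boston", "chicago"]})

# The no-code Uppercase operation corresponds to a vectorized string transform.
df["city"] = df["city"].str.upper()

print(df["city"].tolist())  # ['AUSTIN', 'BOSTON', 'CHICAGO']
```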

  6. The transformed data can now be written to a different Dataset. Click the "Execute Job" button at the top right of the screen; this opens the job configuration dialog

  7. Select the columns from the dropdown that need to be written to the destination/target Dataset. Click "Next"

  8. In the next screen, select the target workspace (default in this case) and the collection that should contain the target dataset (s3_<account-name>, which is available by default)

  9. Choose an existing Dataset or create a new one (right-click the root S3 folder and select New Dataset). Enter a name for the new dataset. Click "Next"

  10. Select the Job Frequency (the default "One Time" is fine for this use case). Click "Submit"

  11. This submits the transformation job to the queue

  12. After 5-10 minutes (depending on the size of the data being processed), the transformed data should be written to the target dataset

  13. To view the target dataset, click "Dataset" in the top menu, navigate to the target workspace (default in this case), and drill down to the target dataset (under s3_<account-name> in this case)
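If you prefer to verify the output outside the UI, the target dataset written above is backed by objects in the account's S3 bucket. A hedged boto3 sketch for listing them (the bucket and prefix names are hypothetical; the actual object layout Dataworkz uses may differ):

```python
import boto3

# Hypothetical names -- substitute your account bucket and the dataset path you chose.
BUCKET = "s3-acme-account"
PREFIX = "default/my-transformed-dataset/"

s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```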
