Dataset



Dataworkz Datasets provide a common abstraction over many data sources, including relational databases such as MySQL or Postgres, cloud databases such as MongoDB or Snowflake, and a variety of other systems. Dataworkz imports the schema and metadata of each table or collection from the connected data source, and Dataworkz Agents can access this data via the Dataset Tool.

Prerequisite:

Follow the Discovery process to bring the relevant tables/collections into your Workspace.

Set up:

To set up a Dataset Tool:

  • Select AI Agents > Create a Tool > Dataset to create the Dataset Tool

  • Name: Provide a name for the tool

  • Description: Provide a description of what the tool does and when it should be used

  • Dataset: Select the Dataset from the Dataset Explorer in the tool

  • Filter Criteria: Provide filter criteria based on what you want the tool to achieve. For example, if the tool should return the list of orders for a customer, filter the data by customerId.

    • The dialect is simple SQL. The filter criteria form the WHERE clause of a SQL SELECT query.

    • Parameters to the filter query can be referenced by using the format ${parameter_name}

      • e.g. customer_id = '${customer_id}'

  • Input Parameters: Any filter parameters will automatically show up in the Input Parameters section

    • Adjust the type and description of each parameter. The description should include any additional information, such as the list of allowed values if the parameter can only take a few fixed values, or the expected format of the data.

  • Output Parameters: Select the projections, i.e. the columns/headers/fields that you want this tool to output

    • Providing a description helps the Agent chain tools correctly when this tool is used as part of a larger plan

  • Pro Tip: If you are using the same Dataset for multiple tools, edit the header descriptions in the Dataset's Catalog; you can then reuse those descriptions in all tools.
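The Filter Criteria field relies on `${parameter_name}` templating. The sketch below illustrates how such a template could resolve at call time; the `resolve_filter` function is a hypothetical stand-in for substitution that Dataworkz performs internally, shown only to clarify the format.

```python
import re

def resolve_filter(filter_criteria: str, params: dict) -> str:
    """Substitute ${name} placeholders in a filter template with the
    values the Agent supplies at call time. Illustrative only; this is
    not Dataworkz's actual implementation."""
    def sub(match):
        return str(params[match.group(1)])
    return re.sub(r"\$\{(\w+)\}", sub, filter_criteria)

# A filter as you might type it into the Filter Criteria field:
criteria = "customer_id = '${customer_id}' AND status = '${status}'"
print(resolve_filter(criteria, {"customer_id": "C-1042", "status": "shipped"}))
# customer_id = 'C-1042' AND status = 'shipped'
```

Because placeholders sit inside the quotes you write yourself, string parameters should be wrapped in quotes in the template, as in the `customer_id` example above.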

When to Use:

Use the Dataset tool when you need to access data such as operational databases or lookup tables. It currently supports relational databases, Snowflake, and MongoDB. The tool is limited to simple filtered views of the data, but this usually works well with Agents. You can create multiple Dataset tools, and the Agent can perform joins by chaining parameters across multiple tool calls over different Datasets in one scenario.
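The cross-Dataset chaining described above can be sketched as follows. All data and function names here are hypothetical stand-ins for two Dataset tools; in Dataworkz the Agent invokes the tools and passes parameters between them automatically.

```python
# Hypothetical sample data standing in for two Datasets.
CUSTOMERS = [{"customer_id": "C-1042", "email": "ada@example.com"}]
ORDERS = [
    {"order_id": "O-1", "customer_id": "C-1042", "status": "shipped"},
    {"order_id": "O-2", "customer_id": "C-7", "status": "pending"},
]

def customer_by_email(email):
    # Stand-in for a Dataset tool with filter criteria: email = '${email}'
    return [c for c in CUSTOMERS if c["email"] == email]

def orders_for_customer(customer_id):
    # Stand-in for a Dataset tool with filter criteria:
    # customer_id = '${customer_id}'
    return [o for o in ORDERS if o["customer_id"] == customer_id]

# The Agent chains the tools: an output parameter of the first call
# feeds an input parameter of the second, effectively joining
# customers and orders across two Datasets.
customer = customer_by_email("ada@example.com")[0]
orders = orders_for_customer(customer["customer_id"])
print([o["order_id"] for o in orders])  # ['O-1']
```

This is why descriptions on Output Parameters matter: the Agent uses them to decide which output of one tool can serve as the input of the next.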