Product Docs
  • What is Dataworkz?
  • Getting Started
    • What You Will Need (Prerequisites)
    • Create with Default Settings: RAG Quickstart
    • Custom Settings: RAG Quickstart
    • Data Transformation Quickstart
    • Create an Agent: Quickstart
  • Concepts
    • RAG Applications
      • Overview
      • Ingestion
      • Embedding Models
      • Vectorization
      • Retrieve
    • AI Agents
      • Introduction
      • Overview
      • Tools
        • Implementation
      • Type
      • Tools Repository
      • Tool Execution Framework
      • Agents
      • Scenarios
      • Agent Builder
    • Data Studio
      • No-code Transformations
      • Datasets
      • Dataflows
        • Single Dataflows:
        • Composite dataflows:
        • Benefits of Dataflows:
      • Discovery
        • How to: Discovery
      • Lineage
        • Features of Lineage:
        • Viewing a dataset's lineage:
      • Catalog
      • Monitoring
      • Statistics
  • Guides
    • RAG Applications
      • Configure LLM's
        • AWS Bedrock
      • Embedding Models
        • Privately Hosted Embedding Models
        • Amazon Bedrock Hosted Embedding Model
        • OpenAI Embedding Model
      • Connecting Your Data
        • Finding Your Data Storage: Collections
      • Unstructured Data Ingestion
        • Ingesting Unstructured Data
        • Unstructured File Ingestion
        • Html/Sharepoint Ingestion
      • Create Vector Embeddings
        • How to Build the Vector embeddings from Scratch
        • How do Modify Existing Chunking/Embedding Dataflows
      • Response History
      • Creating RAG Experiments with Dataworkz
      • Advanced RAG - Access Control for your data corpus
    • AI Agents
      • Concepts
      • Tools
        • Dataset
        • AI App
        • Rest API
        • LLM Tool
        • Relational DB
        • MongoDB
        • Snowflake
      • Agent Builder
      • Agents
      • Guidelines
    • Data Studio
      • Transformation Functions
        • Column Transformations
          • String Operations
            • Format Operations
            • String Calculation Operations
            • Remove Stop Words Operation
            • Fuzzy Match Operation
            • Masking Operations
            • 1-way Hash Operation
            • Copy Operation
            • Unnest Operation
            • Convert Operation
            • Vlookup Operation
          • Numeric Operations
            • Tiles Operation
            • Numeric Calculation Operations
            • Custom Calculation Operation
            • Numeric Encode Operation
            • Mask Operation
            • 1-way Hash Operation
            • Copy Operation
            • Convert Operation
            • VLookup Operation
          • Boolean Operations
            • Mask Operation
            • 1-way Hash Operation
            • Copy Operation
          • Date Operations
            • Date Format Operations
            • Date Calculation Operations
            • Mask Operation
            • 1-way Hash Operation
            • Copy Operation
            • Encode Operation
            • Convert Operation
          • Datetime/Timestamp Operations
            • Datetime Format Operations
            • Datetime Calculation Operations
            • Mask Operation
            • 1-way Hash Operation
            • Copy Operation
            • Encode Operation
            • Page 1
        • Dataset Transformations
          • Utility Functions
            • Area Under the Curve
            • Page Rank Utility Function
            • Transpose Utility Function
            • Semantic Search Template Utility Function
            • New Header Utility Function
            • Transform to JSON Utility Function
            • Text Utility Function
            • UI Utility Function
          • Window Functions
          • Case Statement
            • Editor Query
            • UI Query
          • Filter
            • Editor Query
            • UI Query
      • Data Prep
        • Joins
          • Configuring a Join
        • Union
          • Configuring a Union
      • Working with CSV files
      • Job Monitoring
    • Utility Features
      • IP safelist
      • Connect to data source(s)
        • Cloud Data Platforms
          • AWS S3
          • BigQuery
          • Google Cloud Storage
          • Azure
          • Snowflake
          • Redshift
          • Databricks
        • Databases
          • MySQL
          • Microsoft SQL Server
          • Oracle
          • MariaDB
          • Postgres
          • DB2
          • MongoDB
          • Couchbase
          • Aerospike
          • Pinecone
        • SaaS Applications
          • Google Ads
          • Google Analytics
          • Marketo
          • Zoom
          • JIRA
          • Salesforce
          • Zendesk
          • Hubspot
          • Outreach
          • Fullstory
          • Pendo
          • Box
          • Google Sheets
          • Slack
          • OneDrive / Sharepoint
          • ServiceNow
          • Stripe
      • Authentication
      • User Management
    • How To
      • Data Lake to Salesforce
      • Embed RAG into your App
  • API
    • Generate API Key in Dataworkz
    • RAG Apps API
    • Agents API
  • Open Source License Types
Powered by GitBook
On this page
  1. Getting Started

What You Will Need (Prerequisites)

This section gives a brief overview of what a new user to Dataworkz will need to bring to create AI Applications.

PreviousGetting StartedNextCreate with Default Settings: RAG Quickstart

Last updated 6 months ago

To get started with Dataworkz, you will need a few essential components and configurations. Here’s a breakdown of the requirements to ensure a smooth setup:

1. LLM API (Large Language Model API)

To begin processing data and leveraging Dataworkz’s capabilities, you'll need access to an LLM API. This could be from any of the following providers:

  • OpenAI

  • Amazon Bedrock

  • Azure OpenAI Service Or another compatible LLM of your choice.

Configuring the LLM will be integral to interacting with your data and utilizing the AI-powered features within Dataworkz, we do not provide any pre-configured language models.

2. Storage Setup

Dataworkz provides several storage options to handle your data. You’ll have access to the following by default:

  • S3 Storage: Dataworkz will offer S3-compatible storage for ingesting PDF files. Keep in mind that the file size for uploads from your local machine is capped at 1MB. If you plan to upload larger files, please ensure they meet this limitation or consider setting up your own external storage for large file ingestion.

  • Vector Storage (MongoDB): The free tier includes vector storage powered by MongoDB, which is used to store text chunks, embeddings, and other relevant metadata. This will help you store and query large amounts of data efficiently.

3. Free Tier Limitations

On the free tier, there are a couple of restrictions to be aware of:

  • You can crawl up to three HTML pages via the web crawling feature. This allows you to extract data from websites, but please note that this feature is limited to three crawls per month.

4. Important Notes on Storage and Data Privacy

While Dataworkz provides a "Default" workspace for you to store and manage data (including PDFs and vectors), we recommend against uploading proprietary or sensitive files into the default workspace until you’ve set up your own vector storage and object storage solution (such as S3, GCS, or another cloud storage provider). This will help ensure the privacy and security of your files and data.

Once you’ve configured your own storage and vector setup, you can migrate or upload your proprietary files and adjust the workspace settings accordingly.