What You Will Need (Prerequisites)
This section gives a brief overview of what new Dataworkz users need to have in place before creating Data Applications.
To get started with Dataworkz, you will need a few essential components and configurations. Here’s a breakdown of the requirements to ensure a smooth setup:
1. Storage Setup
Dataworkz gives you flexible options for storing and managing data, so you can tailor your setup to your use case. Whether you’re working with documents for RAG, structured enterprise databases, or graph data, you’ll find integrations already available in the platform.
Default Storage
When you first sign up, Dataworkz provides you with a default workspace that comes with two built-in storage options:
S3-compatible object storage: This is where you can upload files such as PDFs, Word docs, or PowerPoint decks. It’s commonly used for RAG ingestion pipelines. Note that direct uploads from your local machine are limited to 1 MB per file. For larger uploads, you’ll want to connect your own external storage.
MongoDB (Vector storage): By default, embeddings, text chunks, and metadata are stored in MongoDB. This acts as your vector database, allowing semantic search across ingested datasets.
These defaults are designed for quick experimentation, but most production projects will connect external enterprise-grade storage.
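Because direct uploads from a local machine are limited to 1 MB per file, it can help to check file sizes before you start. The following is a minimal local sketch, not part of any Dataworkz API: the function name `split_by_upload_limit` and the constant `MAX_DIRECT_UPLOAD_BYTES` are hypothetical, and the limit is assumed to mean 1,000,000 bytes (the more conservative reading of "1 MB").

```python
from pathlib import Path

# Assumption: "1 MB" is read as 1,000,000 bytes, the conservative interpretation.
MAX_DIRECT_UPLOAD_BYTES = 1_000_000


def split_by_upload_limit(paths, limit=MAX_DIRECT_UPLOAD_BYTES):
    """Split local files into (within_limit, over_limit) lists by size on disk."""
    within, over = [], []
    for path in map(Path, paths):
        # st_size is the file size in bytes as reported by the filesystem.
        if path.stat().st_size <= limit:
            within.append(path)
        else:
            over.append(path)
    return within, over
```

Calling `split_by_upload_limit(Path("docs").glob("*.pdf"))`, for example, returns the files that fit the direct-upload limit and the files that would need to go through your own external storage instead.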
Supported Storage Options
From the Databases panel in Dataworkz, you can configure a variety of storage systems. These are grouped by category:
Vector Databases: Pinecone, OpenSearch, Weaviate (coming soon).
NoSQL Databases: MongoDB, pgvector, Couchbase, Aerospike, DataStax.
Relational Databases: Oracle, Microsoft SQL Server, MySQL, MariaDB, DB2, Postgres, CockroachDB.
Graph Databases: Neo4j.
This range allows you to either stick with the defaults or integrate directly with your organization’s data infrastructure.

Data Privacy and Best Practices
While the default workspace is a convenient way to get started, we strongly recommend setting up your own dedicated vector and object storage (such as S3, GCS, or your enterprise DB of choice) before uploading sensitive or proprietary data.
Once external storage is configured, you can:
Migrate existing datasets out of the default workspace.
Adjust workspace and collection settings for stricter access control.
Ensure compliance with internal data governance policies.
This approach balances ease of use for prototyping with security for production workloads.

