What You Will Need
This section gives a brief overview of what a new Dataworkz user needs to bring in order to create AI Applications.
To get started with Dataworkz, you will need a few essential components and configurations. Here’s a breakdown of the requirements to ensure a smooth setup:
1. LLM API (Large Language Model API)
To begin processing data and leveraging Dataworkz’s capabilities, you'll need access to an LLM API. This could be from any of the following providers:
OpenAI
Amazon Bedrock
Azure OpenAI Service
Or another compatible LLM of your choice.
Dataworkz does not provide any pre-configured language models, so configuring your own LLM is required to interact with your data and use the AI-powered features within Dataworkz.
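Before configuring an LLM, it can help to confirm that the provider credentials are actually available on your machine. The following is a minimal sketch, not part of Dataworkz: the helper `missing_credentials` is hypothetical, and the environment variable names follow the conventions of the providers' own SDKs. How Dataworkz itself receives these credentials is not covered here and is not assumed.

```python
import os

# Environment variable names follow the providers' own SDK conventions.
# Assumption: you keep your credentials in these variables; Dataworkz may
# ask for them in a different way.
PROVIDER_ENV_VARS = {
    "OpenAI": ["OPENAI_API_KEY"],
    "Azure OpenAI Service": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"],
    "Amazon Bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
}


def missing_credentials(provider, env=None):
    """Return the variable names that are unset or empty for a provider."""
    env = os.environ if env is None else env
    return [name for name in PROVIDER_ENV_VARS[provider] if not env.get(name)]


if __name__ == "__main__":
    # Report, per provider, whether the expected variables are present.
    for provider in PROVIDER_ENV_VARS:
        missing = missing_credentials(provider)
        status = "ready" if not missing else "missing " + ", ".join(missing)
        print(f"{provider}: {status}")
```

A provider reported as "ready" only means the variables are set; it does not verify that the key is valid or that the account has access to a model.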
2. Storage Setup
Dataworkz provides several storage options to handle your data. You’ll have access to the following by default:
S3 Storage: Dataworkz offers S3-compatible storage for ingesting PDF files. Keep in mind that uploads from your local machine are capped at 1 MB per file. If your files are larger, either reduce them to fit within this limit or set up your own external storage for large file ingestion.
Vector Storage (MongoDB): The free tier includes vector storage powered by MongoDB, which is used to store text chunks, embeddings, and other relevant metadata. This will help you store and query large amounts of data efficiently.
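The local upload cap mentioned above can be checked before you start uploading. The sketch below is a hypothetical pre-flight helper, not a Dataworkz tool. It assumes the cap means 1,000,000 bytes, the stricter reading, since the exact byte count is not stated; adjust `MAX_UPLOAD_BYTES` if the limit turns out to be 1,048,576 bytes.

```python
from pathlib import Path

# Assumption: the upload cap is read as 1,000,000 bytes (the stricter value).
MAX_UPLOAD_BYTES = 1_000_000


def oversized_pdfs(folder, limit=MAX_UPLOAD_BYTES):
    """Return (path, size_in_bytes) for each PDF in folder larger than limit."""
    results = []
    # Note: the "*.pdf" pattern is case-sensitive on most Linux filesystems.
    for path in sorted(Path(folder).glob("*.pdf")):
        size = path.stat().st_size
        if size > limit:
            results.append((path, size))
    return results


if __name__ == "__main__":
    # List PDFs in the current directory that would exceed the cap.
    for path, size in oversized_pdfs("."):
        print(f"{path.name}: {size} bytes exceeds the {MAX_UPLOAD_BYTES}-byte cap")
```

Files reported by this check are candidates for splitting, compressing, or ingesting through your own external storage instead.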
3. Free Tier Limitations
On the free tier, there are restrictions to be aware of:
The web crawling feature lets you extract data from websites, but on the free tier it is limited to three HTML pages, with a maximum of three crawls per month.
4. Important Notes on Storage and Data Privacy
While Dataworkz provides a "Default" workspace for you to store and manage data (including PDFs and vectors), we recommend against uploading proprietary or sensitive files into the default workspace until you’ve set up your own vector storage and object storage solution (such as S3, GCS, or another cloud storage provider). This will help ensure the privacy and security of your files and data.
Once you’ve configured your own storage and vector setup, you can migrate or upload your proprietary files and adjust the workspace settings accordingly.