Datasets
A Dataset is a structured container for data ingested from an external or internal source, and the primary unit of work in Data Studio.
That structure supports transformation, analysis, and downstream publishing. In Dataworkz, Datasets can be created from external sources (such as APIs, relational databases, cloud object stores, and SaaS applications) as well as from internal sources such as user input and other Dataworkz objects.
Once a Dataset is created, you can apply transformations and manipulate its data using more than 50 built-in functions for cleaning, filtering, aggregation, masking, and enrichment. Datasets also support AI-powered transformation via the Transform with AI interface, which lets you describe operations in plain language.
What a Dataset Contains
Each Dataset in Dataworkz exposes the following attributes:
Name and version
The Dataset name and the current version number (e.g., accounts v0).
Certification status
Indicates whether the Dataset has been certified as trusted and production-ready.
Size
Total number of rows in the Dataset.
Headers
The number of columns, along with each column's name, data type, and optional PII flag.
Incoming datapipe
The data source or upstream Dataset feeding this Dataset.
Outgoing datapipe
The number of downstream consumers — Dataflows, RAG applications, or AI tools — that depend on this Dataset.
Quick Links
Direct access to Statistics, Monitoring, Catalog, and Lineage views.
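To make the attribute list above concrete, here is a minimal Python sketch that models those attributes as a plain data structure. It is an illustration only, not the Dataworkz API: the class names (Column, DatasetSummary), the field names, and the sample values (crm_postgres, weekly_report_dataflow, the row count) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    """One entry under "Headers" (hypothetical model)."""
    name: str
    data_type: str
    is_pii: bool = False  # optional PII flag


@dataclass
class DatasetSummary:
    """Hypothetical mirror of the attributes a Dataset exposes."""
    name: str
    version: int
    certified: bool                    # "Certification status"
    row_count: int                     # "Size"
    columns: List[Column]              # "Headers"
    incoming_datapipe: Optional[str]   # source or upstream Dataset
    outgoing_consumers: List[str] = field(default_factory=list)

    @property
    def label(self) -> str:
        # Name and version shown together, e.g. "accounts v0"
        return f"{self.name} v{self.version}"

    @property
    def outgoing_datapipe_count(self) -> int:
        # "Outgoing datapipe" is reported as a count of consumers
        return len(self.outgoing_consumers)

    def pii_columns(self) -> List[str]:
        # Names of the columns that carry a PII flag
        return [c.name for c in self.columns if c.is_pii]


# Sample values are invented for illustration.
accounts = DatasetSummary(
    name="accounts",
    version=0,
    certified=False,
    row_count=1200,
    columns=[
        Column("id", "integer"),
        Column("email", "string", is_pii=True),
    ],
    incoming_datapipe="crm_postgres",
    outgoing_consumers=["weekly_report_dataflow"],
)

print(accounts.label, accounts.pii_columns(), accounts.outgoing_datapipe_count)
```

Keeping the outgoing consumers as a list and deriving the count from it matches the description above, where the outgoing datapipe is shown as a number.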
Dataset Capabilities
Data source integration — Connect to cloud object stores, relational databases, SaaS applications, and NoSQL databases to create Datasets from live sources.
No-code transformation — Apply column-level and Dataset-level transformations using a visual interface without writing code.
AI-assisted transformation — Use Transform with AI to describe operations in plain language and have Dataworkz apply them automatically.
PII protection — Flag columns containing Personally Identifiable Information and apply masking or hashing transformations before the data is used downstream.
Collaboration — Share Datasets across teams via workspace-level access controls, enabling multiple users to build and analyse data-driven workflows together.
Lineage tracking — Every transformation applied to a Dataset is automatically recorded, providing a full audit trail from source to output.
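The PII protection capability above mentions masking and hashing. The sketch below shows, in plain Python, what those two operations typically do to a flagged column. It is not how Dataworkz implements them: the function names (mask_value, hash_value, protect_rows), the choice of SHA-256, the salt parameter, and the sample row are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, List


def mask_value(value: str, visible: int = 4, fill: str = "*") -> str:
    """Replace all but the last `visible` characters with `fill`."""
    if len(value) <= visible:
        return fill * len(value)
    return fill * (len(value) - visible) + value[-visible:]


def hash_value(value: str, salt: str = "") -> str:
    """Return the SHA-256 hex digest of the salted value."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def protect_rows(
    rows: List[Dict[str, object]],
    hash_columns: List[str],
    mask_columns: List[str],
    salt: str = "",
) -> List[Dict[str, object]]:
    """Return copies of `rows` with the flagged columns hashed or masked."""
    protected = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay unchanged
        for col in hash_columns:
            if new_row.get(col) is not None:
                new_row[col] = hash_value(str(new_row[col]), salt)
        for col in mask_columns:
            if new_row.get(col) is not None:
                new_row[col] = mask_value(str(new_row[col]))
        protected.append(new_row)
    return protected


# Hypothetical sample row; "email" and "card" stand in for PII-flagged columns.
rows = [{"id": 1, "email": "a@example.com", "card": "4111111111111111"}]
safe_rows = protect_rows(
    rows, hash_columns=["email"], mask_columns=["card"], salt="demo-salt"
)
print(safe_rows)
```

Hashing keeps values joinable, because the same input always yields the same digest, while masking keeps a recognisable fragment for display. An unsalted hash of a low-entropy value such as a phone number can be reversed by guessing, which is why the sketch accepts a salt.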

