Catalog
The schema-level reference for a Dataset — surfacing structural details, quality metrics, version history, and access information without opening the data itself.
The Dataset Catalog surfaces the structural, statistical, and governance information needed to understand what a dataset contains, who can access it, and how it has evolved over time, all without opening the data itself.
Accessing the Catalog
Navigate to Data Studio > Dataset.
Locate the dataset within its Workspace and Collection and open it.
Click Catalog in the Quick Links section at the top right of the dataset view.
The Catalog page opens, showing the dataset name in the header (e.g., Catalog - fin_bench_ingest) alongside key metadata and a set of tabbed sections.

Catalog Header
The top bar of every Catalog page displays a summary of the Dataset's key properties at a glance:

Status
The certification status of the Dataset (e.g., Uncertified).
Created date
The date the Dataset was first created (e.g., 12/08/2025).
Date range
The range of dates covered by the data in the Dataset (e.g., 12/08/2025 – 12/08/2025).
Format
The storage format of the Dataset (e.g., parquet).
Size
The total size of the Dataset on disk (e.g., 1.83 MB).
Version
The current version number of the Dataset.
Dictionary
The number of columns defined in the Dataset's data dictionary (e.g., 23 Columns).
View dataset
A shortcut to open the Dataset in the Table/List view.
Catalog Tabs
The Catalog is organised into six tabs:
Overview
A summary of the Dataset including its description, user comments, usage and view statistics, Collection properties, tags, and similar datasets.
Version
The version history of the Dataset, reflecting schema changes over time.
Access
The Workspace, Collection, and permission settings that control who can view or modify the Dataset.
Headers
The complete column-level schema: name, data type, format, description, PII flag, and key definitions for every column.
Quality
A quality assessment of the Dataset, including completeness and consistency indicators.
Retention Policy
The data retention rules configured for the Dataset.
Overview Tab
The Overview tab provides a human-readable summary of the Dataset, including:
Description — A free-text description of the Dataset's purpose (editable via the pencil icon). Example: Directory created for pre-processing.
User Comments — A rating and comment thread from users who have interacted with the Dataset. The rating is displayed as an average score out of 5.
Usage Stats — A time-series chart showing how frequently the Dataset has been used (executed or queried) over the selected period.
View Stats — A time-series chart showing how many times the Dataset has been viewed over the selected period.
Collection Properties — The underlying storage details:
Collection Name
s3_ashish_dataworkz_account
Path
s3a://dataworkz-genai-dev-lake
Storage Type
S3
Recent updates — The users who created and last modified the Dataset.
Tags — Labels assigned to the Dataset for discovery and categorisation (editable via the pencil icon).
Similar Datasets — Other datasets in the platform with similar schema or content (shown if available).
Headers Tab
The Headers tab is the primary schema reference for the Dataset. It lists every column with its full metadata definition.
Toggle between Header (column definitions) and Sample data (a preview of actual values) using the radio buttons at the top.
Column Definitions
Each row in the headers table describes one column:
Header
The column name as it appears in the Dataset.
Data Type
The data type of the column (e.g., Integer, String, Long).
Data Format
The format of the data within the column, if applicable (e.g., date formats).
Header Description
A human-readable description of the column's content.
PII Data
Whether the column contains Personally Identifiable Information (Y / N).
Unique Identifier
Whether the column serves as a unique identifier for records (Y / N).
Foreign key
Whether the column is a foreign key referencing another Dataset (Y / N).
Reference Table
The name of the referenced table if the column is a foreign key.
Semantics Tag
A semantic label assigned to the column, either automatically or manually (e.g., None).
Action
Edit (✎) or delete (🗑) the column definition.
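Example columns from fin_bench_ingest:
Header | Data Type | PII Data | Unique Identifier | Foreign key
page_no | Integer | N | N | N
source_file_name | String | N | N | N
source_text | String | N | N | N
source_creation_date | Long | N | N | N
total_pages | Integer | N | N | N
version | String | N | N | N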
Click Edit in the top-right of the headers table to modify column definitions in bulk.
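For readers who want to reason about column definitions outside the UI, the sketch below models one Headers row as a plain Python record and runs two basic sanity checks. It is only an illustration: the class name ColumnDefinition, its field names, and the validate helper are assumptions made for this example and are not part of any platform API. The sample rows are the fin_bench_ingest example columns listed above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ColumnDefinition:
    """One row of the Headers table (hypothetical field names)."""
    header: str
    data_type: str
    data_format: Optional[str] = None
    description: Optional[str] = None
    pii: bool = False
    unique_identifier: bool = False
    foreign_key: bool = False
    reference_table: Optional[str] = None
    semantics_tag: Optional[str] = None


# Example columns from fin_bench_ingest; every Y/N flag is N in the example.
COLUMNS = [
    ColumnDefinition("page_no", "Integer"),
    ColumnDefinition("source_file_name", "String"),
    ColumnDefinition("source_text", "String"),
    ColumnDefinition("source_creation_date", "Long"),
    ColumnDefinition("total_pages", "Integer"),
    ColumnDefinition("version", "String"),
]


def validate(columns: List[ColumnDefinition]) -> List[str]:
    """Return human-readable problems found in a list of definitions."""
    problems = []
    seen = set()
    for col in columns:
        if col.header in seen:
            problems.append(f"duplicate header: {col.header}")
        seen.add(col.header)
        # A foreign key should name the table it references, and vice versa.
        if col.foreign_key and not col.reference_table:
            problems.append(f"{col.header}: foreign key without a reference table")
        if col.reference_table and not col.foreign_key:
            problems.append(f"{col.header}: reference table set but not a foreign key")
    return problems


pii_columns = [c.header for c in COLUMNS if c.pii]
problems = validate(COLUMNS)
```

With the example columns, both pii_columns and problems come back empty, which matches the all-N flags shown for fin_bench_ingest.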
Version Tab
The Version tab shows the history of schema changes across all versions of the Dataset. Each version reflects a snapshot of the schema at the time it was created, allowing you to trace how columns and data types have evolved.
💡 Note: The Catalog reflects the schema at the current version. To compare schema changes across versions, use the Version tab within the Catalog view.
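To make the idea of tracing schema evolution concrete, here is a minimal sketch of comparing two schema snapshots. It assumes each snapshot can be represented as a mapping from column name to data type; the function name diff_schemas and both sample snapshots are invented for illustration and do not describe real versions of any dataset.

```python
def diff_schemas(old, new):
    """Compare two {column_name: data_type} snapshots.

    Returns the columns added, the columns removed, and the columns
    whose data type changed between the two snapshots.
    """
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    retyped = {
        name: (old[name], new[name])
        for name in sorted(set(old) & set(new))
        if old[name] != new[name]
    }
    return {"added": added, "removed": removed, "retyped": retyped}


# Hypothetical snapshots, made up for this example only.
v1 = {
    "page_no": "Integer",
    "source_text": "String",
    "source_creation_date": "String",
}
v2 = {
    "page_no": "Integer",
    "source_text": "String",
    "source_creation_date": "Long",
    "total_pages": "Integer",
}

changes = diff_schemas(v1, v2)
```

In this made-up case, changes reports total_pages as added and source_creation_date as retyped from String to Long.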
Access Tab
The Access tab shows the Workspace and Collection the Dataset belongs to, along with the permission settings that control which users or roles can view or modify it. Access is governed by the RBAC configured at the Workspace level.
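As a rough mental model of Workspace-level RBAC, the sketch below resolves a user's role in a Workspace and checks whether that role grants an action. The role names (viewer, editor), the permission names (view, modify), and the is_allowed helper are all assumptions chosen for illustration; the actual roles and permissions are whatever is configured on the platform.

```python
# Hypothetical role-to-permission mapping; real roles may differ.
ROLE_PERMISSIONS = {
    "viewer": {"view"},
    "editor": {"view", "modify"},
}


def is_allowed(assignments, user, workspace, action):
    """Return True if the user's role in the workspace grants the action.

    `assignments` maps (user, workspace) pairs to a role name.
    Users with no role in the workspace are denied everything.
    """
    role = assignments.get((user, workspace))
    if role is None:
        return False
    return action in ROLE_PERMISSIONS.get(role, set())


# Made-up assignments for the example.
assignments = {
    ("alice", "finance"): "editor",
    ("bob", "finance"): "viewer",
}

can_bob_modify = is_allowed(assignments, "bob", "finance", "modify")
can_alice_modify = is_allowed(assignments, "alice", "finance", "modify")
```

Because permissions hang off the Workspace assignment, every Dataset in that Workspace inherits the same answer in this model.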
Quality Tab
The Quality tab provides an automated assessment of the Dataset's data quality, including indicators for completeness (null/missing values) and consistency (data type conformance and pattern adherence) across columns.
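The three indicators named above can each be read as a simple ratio. The sketch below shows one plausible way to compute them for a single column; it is not the platform's scoring method, and the function names, the treatment of empty strings as missing, and the sample values are assumptions for illustration.

```python
import re


def _present(values):
    """Values that are neither None nor an empty string."""
    return [v for v in values if v is not None and v != ""]


def completeness(values):
    """Fraction of values that are present (not null/missing)."""
    if not values:
        return 1.0
    return len(_present(values)) / len(values)


def type_conformance(values, expected_type):
    """Fraction of present values that are instances of expected_type."""
    present = _present(values)
    if not present:
        return 1.0
    return sum(isinstance(v, expected_type) for v in present) / len(present)


def pattern_adherence(values, pattern):
    """Fraction of present values that are strings fully matching pattern."""
    present = _present(values)
    if not present:
        return 1.0
    regex = re.compile(pattern)
    matches = sum(
        1 for v in present if isinstance(v, str) and regex.fullmatch(v)
    )
    return matches / len(present)


# Made-up sample values for two columns.
page_no = [1, 2, None, "4"]
source_file_name = ["a.pdf", "b.pdf", "notes.txt", ""]

page_no_complete = completeness(page_no)            # 3 of 4 present
page_no_typed = type_conformance(page_no, int)      # 2 of 3 present are int
names_match = pattern_adherence(source_file_name, r".+\.pdf")  # 2 of 3 match
```

Conformance and adherence are measured over present values only, so a missing value lowers completeness without also being counted as a type or pattern failure.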
Retention Policy Tab
The Retention Policy tab displays any data retention rules configured for the Dataset — for example, how long data is retained before automatic archival or deletion.
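A retention rule of the kind described here boils down to a date comparison. The sketch below assumes a rule expressed as a number of days plus an action to take once that period has elapsed; the function name retention_action, its parameters, and the 30-day figure are hypothetical and do not reflect how rules are stored or evaluated by the platform.

```python
from datetime import date, timedelta


def retention_action(created, today, retain_days, action="archive"):
    """Return `action` once `retain_days` have elapsed since `created`,
    otherwise return "keep".
    """
    expires_on = created + timedelta(days=retain_days)
    return action if today >= expires_on else "keep"


# Data created on 8 December 2025 under a hypothetical 30-day rule
# expires on 7 January 2026.
created = date(2025, 12, 8)
before_expiry = retention_action(created, date(2026, 1, 6), 30)
on_expiry = retention_action(created, date(2026, 1, 7), 30)
deleted = retention_action(created, date(2026, 2, 1), 30, action="delete")
```

Here before_expiry is "keep", on_expiry is "archive", and deleted is "delete".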