Create Knowledge Graph
This document is a step-by-step guide to building a Knowledge Graph in Dataworkz.
Prerequisites
A GraphDB connector should have been configured.
A vector database (MongoDB, pgvector, Pinecone, or OpenSearch) should have been configured.
A SearchEngine connector should have been configured.
An embedding model should have been configured.
After logging in to the Dataworkz platform, click "Knowledge Graph" in the top menu.

Click the Add button and select "Create Knowledge Graph". Follow the three steps listed below to build a Knowledge Graph in Dataworkz.
Step-1 — Basic Configuration

Goal: tell Dataworkz which data to read and which entities to extract.
Specify a name for the knowledge graph.
Select the graph database instance to which the Knowledge Graph should be written.
Select from the list of preconfigured graph templates. A template comprises prompts selected from the prompt library.
Select the LLM that should be used for entity extraction.
Select the source dataset.
Select the target to which extracted data should be written:
Select the workspace.
Select the collection.
Either select an existing directory or create a new one.
Click Next.
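Conceptually, the template's prompts drive LLM-based entity extraction: the model reads each chunk of the source dataset and returns entities and relationships. A minimal sketch of that idea, using a hypothetical prompt format and triple schema (not the actual Dataworkz template API):

```python
import json

def build_extraction_prompt(text):
    """Ask the LLM for subject-relation-object triples as JSON."""
    return (
        "Extract entities and relationships from the text below. "
        'Respond with JSON: {"triples": [["subject", "relation", "object"], ...]}\n\n'
        f"Text: {text}"
    )

def parse_triples(llm_response):
    """Parse the LLM's JSON reply into (subject, relation, object) tuples."""
    return [tuple(t) for t in json.loads(llm_response)["triples"]]

# Example: a reply the LLM might return for
# "Ada Lovelace worked with Charles Babbage."
response = '{"triples": [["Ada Lovelace", "worked_with", "Charles Babbage"]]}'
print(parse_triples(response))  # [('Ada Lovelace', 'worked_with', 'Charles Babbage')]
```

Each extracted triple becomes an edge between two nodes in the resulting graph; the template's prompt library controls what entity and relation types the LLM is asked for.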
Step-2 — Advanced Options
Goal: configure where embeddings and lexical indexes are stored and which models are used.

Vector Database & Embeddings (required fields)
Workspace — select the workspace where vectors will live.
Collection — name the collection that will store your embeddings.
Directory — logical folder or namespace inside the collection. Either use an existing one or create a new one.
Embedding model — choose the embedding model to compute vector representations.
Recommendation: pick a model tuned for your domain or a general semantic model (higher-dimension models capture more nuance but cost more).
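To illustrate what the embedding model does: each chunk of text becomes a vector, and similarity between vectors approximates similarity in meaning. A toy sketch with hand-made 3-dimensional vectors (real models produce hundreds or thousands of dimensions, which is the cost/nuance trade-off noted above):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Toy vectors standing in for real embedding-model output.
query = [0.9, 0.1, 0.0]
doc_about_same_topic = [0.8, 0.2, 0.1]
doc_about_other_topic = [0.0, 0.1, 0.9]

print(cosine_similarity(query, doc_about_same_topic))   # close to 1
print(cosine_similarity(query, doc_about_other_topic))  # close to 0
```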
Full Text Search Index
Search Engine — pick the search engine for lexical (keyword) search.
Lexical Search Storage — select which index or storage to use for keyword lookups.
Why configure both: semantic search (vector) + lexical search (keyword) gives best results — vectors handle meaning; lexical handles exact matches, filters, and facets.
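One common way to merge the two ranked result lists is reciprocal rank fusion (RRF); whether Dataworkz uses RRF or another fusion method is not specified here, so treat this as an illustration of the principle only:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked result lists into one.

    Each document's score is the sum of 1 / (k + rank) over every
    list it appears in, so items ranked highly anywhere float up,
    and items found by both searches get a boost.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_meaning", "doc_both", "doc_a"]   # semantic ranking
lexical_hits = ["doc_exact", "doc_both", "doc_b"]    # keyword ranking
print(reciprocal_rank_fusion([vector_hits, lexical_hits]))
```

Here `doc_both` wins because both the vector and the lexical index ranked it, which is exactly the behavior that makes configuring both stores worthwhile.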
Step-3 — Schedule the job
Goal: schedule the graph-creation job.

Task Scheduling
Recurring job — turn it on and choose the frequency at which the job should run.
Advanced settings — choose an appropriate "Degree of Parallelism" and "No. of cores" for Spark to parallelize processing of the data.
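These advanced settings map onto standard Spark parallelism concepts. As a rough illustration only (the exact flags Dataworkz sets internally are not documented here, and the script name below is invented), more cores and a higher degree of parallelism mean more partitions of the source data are processed concurrently:

```shell
# Illustrative spark-submit flags, NOT Dataworkz's actual invocation:
#   spark.executor.cores      ~ "No. of cores" per executor
#   spark.default.parallelism ~ "Degree of Parallelism" (target partition count)
spark-submit \
  --conf spark.executor.cores=4 \
  --conf spark.default.parallelism=16 \
  extract_entities.py
```

A reasonable starting point is a parallelism value of two to four partitions per available core; very high values add scheduling overhead on small datasets.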
Task Summary
Review the details
Click " Create Task" if everything looks okay
Monitor the Knowledge Graph Creation

The submitted job will appear in the Knowledge Graph list, alongside any existing graphs.
What to expect in the resulting Knowledge Graph
Nodes: extracted entities (people, orgs, products, documents, events, custom types).
Edges: inferred relationships (authored_by, located_in, uses, depends_on, mentions).
Metadata: source, date, confidence score, provenance (which document/chunk created the entity).
Capabilities unlocked: semantic search, multi-hop reasoning, graph queries, dashboards.
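The node/edge/metadata structure above can be sketched as plain Python data, along with the kind of multi-hop lookup it enables (the entity names and relations here are invented examples, not output from a real job):

```python
# Invented example data in the node/edge/metadata shape described above.
nodes = {
    "alice":  {"type": "person",  "source": "doc-1", "confidence": 0.92},
    "acme":   {"type": "org",     "source": "doc-1", "confidence": 0.88},
    "widget": {"type": "product", "source": "doc-2", "confidence": 0.95},
}
edges = [
    ("alice", "works_at", "acme"),
    ("acme", "produces", "widget"),
]

def neighbors(node, relation=None):
    """Follow outgoing edges from a node, optionally filtered by relation."""
    return [dst for src, rel, dst in edges
            if src == node and (relation is None or rel == relation)]

# Multi-hop query: which products are linked to Alice via her employer?
hop1 = neighbors("alice", "works_at")                          # ['acme']
hop2 = [p for org in hop1 for p in neighbors(org, "produces")]
print(hop2)  # ['widget']
```

In practice these traversals run as graph queries against the configured graph database rather than in application code, but the two-hop pattern is the same.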