Create Knowledge Graph

This document is a step-by-step guide to building a Knowledge Graph.

Prerequisites

  1. GraphDB Connector should have been configured.

  2. VectorDB (MongoDB/pgvector/Pinecone/OpenSearch) should have been configured.

  3. SearchEngine Connector should have been configured.

  4. Embedding Model should have been configured.

After logging in to the Dataworkz platform, click "Knowledge Graph" in the top menu.

Click the Add button and select "Create Knowledge Graph". Follow the three steps listed below to build a Knowledge Graph in Dataworkz.


Step-1 — Basic Configuration

Goal: tell Dataworkz what data to read and which entities to extract.

  1. Specify a name for the Knowledge Graph.

  2. Select the graph database instance to which the Knowledge Graph should be written.

  3. Select from the list of preconfigured graph templates. A template comprises prompts selected from the prompt library.

  4. Select the LLM that should be used for entity extraction.

  5. Select the source dataset.

  6. Select the target to which extracted data should be written:

    1. Select the workspace.

    2. Select the collection.

    3. Either select an existing directory or create a new one.

  7. Click Next.
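The Step-1 selections amount to a handful of named choices. As an illustrative sketch only (the field names and values below are hypothetical, not the actual Dataworkz configuration schema):

```python
# Hypothetical sketch of the Step-1 selections; field names are
# illustrative, not the actual Dataworkz configuration schema.
from dataclasses import dataclass

@dataclass
class KnowledgeGraphConfig:
    name: str              # knowledge graph name
    graph_db: str          # target graph database instance
    template: str          # preconfigured graph template (prompt set)
    extraction_llm: str    # LLM used for entity extraction
    source_dataset: str    # dataset to read from
    target_workspace: str  # where extracted data is written
    target_collection: str
    target_directory: str

config = KnowledgeGraphConfig(
    name="product-kg",
    graph_db="graphdb-prod",
    template="default-entity-template",
    extraction_llm="gpt-4o",
    source_dataset="support-tickets",
    target_workspace="analytics",
    target_collection="extracted-entities",
    target_directory="run-001",
)
```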


Step-2 — Advanced Options

Goal: configure where embeddings and lexical indexes are stored and which models are used.

Vector Database & Embeddings (required fields)

  1. Workspace — select the workspace where vectors will live.

  2. Collection — name the collection that will store your embeddings.

  3. Directory — logical folder or namespace inside the collection. Either use an existing one or create a new one.

  4. Embedding model — choose the embedding model to compute vector representations.

    • Recommendation: pick a model tuned for your domain or a general semantic model (higher-dimension models capture more nuance but cost more).
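To see why the embedding model matters, consider how vectors are compared at query time. A minimal cosine-similarity sketch with toy 4-dimensional vectors (a real embedding model produces hundreds or thousands of dimensions, which is what captures more nuance):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" standing in for real model output.
query = [0.9, 0.1, 0.0, 0.2]
doc_a = [0.8, 0.2, 0.1, 0.3]  # semantically close to the query
doc_b = [0.0, 0.9, 0.8, 0.1]  # semantically distant

# The closer document scores higher, so it ranks first at retrieval time.
assert cosine_similarity(query, doc_a) > cosine_similarity(query, doc_b)
```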

Full Text Search Index

  1. Search Engine — pick the search engine for lexical (keyword) search.

  2. Lexical Search Storage — select which index or storage to use for keyword lookups.

Why configure both: semantic search (vector) + lexical search (keyword) gives the best results — vectors handle meaning; lexical handles exact matches, filters, and facets.
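One common way to combine the two layers is a weighted blend of scores. A toy sketch with a hypothetical blending function (this is not how Dataworkz ranks internally, just an illustration of the idea):

```python
def hybrid_score(semantic_score, keyword_hit, alpha=0.7):
    # Blend a vector-similarity score (meaning) with a binary
    # exact-keyword match; alpha weights the semantic side.
    return alpha * semantic_score + (1 - alpha) * (1.0 if keyword_hit else 0.0)

# A document containing the exact keyword can outrank one that is
# only semantically close.
close_but_no_keyword = hybrid_score(0.85, keyword_hit=False)  # 0.595
farther_with_keyword = hybrid_score(0.70, keyword_hit=True)   # 0.79
assert farther_with_keyword > close_but_no_keyword
```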


Step-3 — Schedule the job

Goal: schedule the graph creation job.

  1. Task Scheduling

    • Recurring job - turn it on and choose the frequency at which the job should run.

    • Advanced settings - choose an appropriate "Degree of Parallelism" and "No. of cores" for Spark to parallelize processing of the data.

  2. Task Summary

    • Review the details

    • Click "Create Task" if everything looks okay.
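Conceptually, a recurring job is just a series of run times spaced at the chosen frequency. A stdlib-only sketch (hypothetical helper, not the Dataworkz scheduler):

```python
from datetime import datetime, timedelta

def next_runs(start, frequency, count):
    # Return the next `count` run times for a recurring job that
    # fires every `frequency`, starting at `start`.
    return [start + frequency * i for i in range(count)]

# Three daily runs at 02:00: Jan 1, Jan 2, Jan 3.
runs = next_runs(datetime(2024, 1, 1, 2, 0), timedelta(hours=24), 3)
```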


Monitor the Knowledge Graph Creation

The submitted job will appear in the list of Knowledge Graphs, alongside any existing ones.


What to expect in the resulting Knowledge Graph

  • Nodes: extracted entities (people, orgs, products, documents, events, custom types).

  • Edges: inferred relationships (authored_by, located_in, uses, depends_on, mentions).

  • Metadata: source, date, confidence score, provenance (which document/chunk created the entity).

  • Capabilities unlocked: semantic search, multi-hop reasoning, graph queries, dashboards.
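The resulting structure can be pictured as plain node and edge records. A toy in-memory sketch (entity names and field names are illustrative, not the Dataworkz storage schema):

```python
# Toy representation of extracted graph data; field names are
# illustrative, not the actual storage schema.
nodes = {
    "n1": {"type": "person", "name": "Ada Lovelace"},
    "n2": {"type": "document", "name": "Notes on the Analytical Engine"},
}

edges = [
    {
        "source": "n2",
        "target": "n1",
        "relation": "authored_by",
        # Provenance metadata: which document/chunk produced this
        # edge, and how confident the extraction was.
        "metadata": {"source_chunk": "doc-7#chunk-3", "confidence": 0.92},
    },
]

# A one-hop graph query: who authored node n2?
authors = [
    nodes[e["target"]]["name"]
    for e in edges
    if e["source"] == "n2" and e["relation"] == "authored_by"
]
assert authors == ["Ada Lovelace"]
```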
