Build Q&A using RAG

Building a RAG Q&A System with Dataworkz: A Step-by-Step Guide

Customer self-service is becoming increasingly important, and a robust Q&A system is a key part of delivering a seamless experience. At Dataworkz, we take this a step further by building a Retrieval Augmented Generation (RAG) Q&A system for our customers. This guide walks you through building such a system by crawling the MongoDB documentation and assembling a RAG-based Q&A system on top of it.

Step 1: Pre-processing Data

Access Pre-processing Settings:

  • Click on the gear icon in the top right, navigate to the AI Applications section, and select Pre-processing.

Create a Pre-processing Job:

  • Press the '+' button and name your pre-processing job.

Configuring Sources

  • Website URL (HTML):

    • For URL links, select the file type html.

    • Add the URLs of interest, then check the sub-crawling and JavaScript-enabled boxes.
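
To give a feel for what the sub-crawling option implies, here is a minimal sketch of link discovery on a single page: it collects the links found in the HTML and keeps only those on the same host as the starting URL. This is an illustration using the Python standard library, not the Dataworkz crawler; the names LinkCollector and sub_crawl_links, the sample HTML, and the docs.example.com URL are all assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def sub_crawl_links(page_url, html):
    """Return same-host absolute links found in html, in first-seen order."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    seen = []
    for href in collector.hrefs:
        # Resolve relative links and drop the fragment part.
        absolute = urljoin(page_url, href).split("#")[0]
        if urlparse(absolute).netloc == host and absolute not in seen:
            seen.append(absolute)
    return seen


# Hypothetical page content, used only for illustration.
sample_html = (
    '<a href="/docs/atlas/">Atlas</a> '
    '<a href="https://other.example.org/x">Other</a> '
    '<a href="/docs/atlas/#top">Top</a>'
)
links = sub_crawl_links("https://docs.example.com/docs/", sample_html)
print(links)
```

A real crawler would also fetch each discovered page, respect crawl limits, and render JavaScript where the corresponding box is checked; none of that is shown here.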

Submit Pre-processing:

  • Once finished, choose the pre-processing target location and click Submit to run pre-processing.

  • Verify the saved data in the dataset tab.

Step 2: Building the Q&A System

Access Q&A Settings:

  • Click on the gear icon, navigate to AI Applications, and select Q&A.

Create a Q&A System:

  • Click on the '+' button and give the Q&A system a name.

Configure Source:

  • Start by configuring the source using the datasets created during pre-processing.

  • Add question (text) and external link (PDF or URL) columns based on the source files.

Choose Embedding Model:

  • Select the embedding model for your Q&A system.

Configure Vector Storage (Using MongoDB):

  • Name your vector storage and choose the cosine similarity metric.

  • Set the threshold, delimiter, chunk size, and overlap.

  • Save the configuration.
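
The delimiter, chunk size, and overlap settings control how source text is cut into pieces before embedding. The sketch below shows one common way these three settings interact, assuming sizes are counted in characters; Dataworkz may count tokens or split differently, and chunk_text is a hypothetical helper written for this example.

```python
def chunk_text(text, delimiter="\n\n", chunk_size=200, overlap=50):
    """Split text on delimiter, then cut each piece into overlapping windows.

    chunk_size and overlap are measured in characters here (an assumption);
    a real system may count tokens instead.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for piece in text.split(delimiter):
        piece = piece.strip()
        if not piece:
            continue
        for start in range(0, len(piece), step):
            chunks.append(piece[start:start + chunk_size])
            # Stop once this window reaches the end of the piece.
            if start + chunk_size >= len(piece):
                break
    return chunks


doc = "abcdefghij\n\nklmno"
chunks = chunk_text(doc, delimiter="\n\n", chunk_size=6, overlap=2)
print(chunks)  # ['abcdef', 'efghij', 'klmno']
```

A larger overlap repeats more text between neighboring chunks, which can help when an answer spans a chunk boundary but increases the number of vectors stored.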

Choose Language Models:

  • Select language models (e.g., Llama 2, ChatGPT, Dolly).

  • Provide a custom prompt for the system.
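
A custom prompt typically tells the model to answer from the retrieved passages only. The following sketch shows one possible shape for such a prompt and how retrieved text might be slotted into it. The template wording, the build_prompt helper, and the sample passages are assumptions for illustration, not the prompt format Dataworkz uses internally.

```python
# Hypothetical template; adjust the instructions to suit your use case.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below. "
    "If the context does not contain the answer, say so.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(question, passages):
    """Fill the template with numbered passages and the user question."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return PROMPT_TEMPLATE.format(context=context, question=question)


# Made-up passages standing in for retrieved chunks.
prompt = build_prompt(
    "How do I create an index?",
    ["Indexes support efficient queries.", "Use createIndex() to add one."],
)
print(prompt)
```

Numbering the passages makes it easier to ask the model to cite which passage supported its answer.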

Save and Submit:

  • Save and submit to create the Q&A system. MongoDB will then index the data automatically.

Step 3: Using the Q&A System

  • Testing the System:

    • Ask a question in the chat box, and the system will return an answer along with the location of the data.

  • Reviewing Results:

    • Check the source links to see where the generated information came from.
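
Behind each answer, the system looks up stored chunks that are similar to the question and reports where they came from. The sketch below illustrates that lookup with the cosine similarity metric and a threshold, as configured in Step 2. It uses tiny hand-written vectors in place of real embeddings and plain Python in place of MongoDB vector search; retrieve, the chunk layout, and the docs.example.com links are assumptions for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, chunks, threshold=0.7, top_k=3):
    """Return (score, source) pairs at or above threshold, best first."""
    scored = [(cosine_similarity(query_vec, c["embedding"]), c) for c in chunks]
    kept = [(s, c) for s, c in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [(round(s, 3), c["source"]) for s, c in kept[:top_k]]


# Toy two-dimensional "embeddings"; real ones have hundreds of dimensions.
chunks = [
    {"embedding": [1.0, 0.0], "source": "https://docs.example.com/indexes"},
    {"embedding": [0.6, 0.8], "source": "https://docs.example.com/queries"},
    {"embedding": [0.0, 1.0], "source": "https://docs.example.com/backup"},
]
results = retrieve([1.0, 0.0], chunks, threshold=0.5)
for score, source in results:
    print(score, source)
```

Raising the threshold returns fewer but closer matches; lowering it returns more sources at the risk of including loosely related ones.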

Step 4: Testing and Optimization

  • Test Results:

    • Test the system by changing prompts and formatting metadata.

  • Formatting Metadata:

    • Experiment with different ways to make RAG sources more relevant, such as versioning for the MongoDB documentation.

  • Adding Data from Other Sources:

    • Enhance the system by adding data from Google Sheets, Docs, or other internal sources.
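
As one way to picture the versioning idea, the sketch below tags each chunk with a version taken from its source URL and then keeps only the chunks for a requested version. The URL pattern (a "/v6.0/" style path segment), the field names, and the helper functions are assumptions for illustration; adapt them to however your sources actually encode versions.

```python
import re

# Assumed convention: the version appears in the URL path as /v<number>/.
VERSION_PATTERN = re.compile(r"/v(\d+(?:\.\d+)*)/")


def tag_version(chunk, default="latest"):
    """Return a copy of chunk with a 'version' metadata field from its URL."""
    match = VERSION_PATTERN.search(chunk["source"])
    return {**chunk, "version": match.group(1) if match else default}


def filter_by_version(chunks, version):
    """Keep only chunks whose version metadata matches exactly."""
    return [c for c in chunks if c["version"] == version]


# Made-up chunks standing in for pre-processed documentation pages.
raw_chunks = [
    {"text": "Index notes for 6.0", "source": "https://docs.example.com/v6.0/indexes"},
    {"text": "Index notes for 7.0", "source": "https://docs.example.com/v7.0/indexes"},
    {"text": "General overview", "source": "https://docs.example.com/overview"},
]
tagged = [tag_version(c) for c in raw_chunks]
v7_only = filter_by_version(tagged, "7.0")
print([c["text"] for c in v7_only])
```

Filtering on version metadata before retrieval keeps answers from mixing instructions that apply to different releases.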

By following these steps, you can create a RAG Q&A system that answers queries effectively and keeps its answers grounded in, and traceable to, your data sources. Dataworkz helps you provide a top-notch self-service experience for any user.
