From a2fbbba9f7f889a1f84f8642cf5c75feb57e8668 Mon Sep 17 00:00:00 2001 From: Ryan Lin <123346659+jinhonglin-ryan@users.noreply.github.com> Date: Fri, 25 Apr 2025 03:12:35 -0400 Subject: [PATCH] feat: add tutorial using Milvus and Docling for RAG pipeline (#1449) * feat: add milvus rag with docling tutorial Signed-off-by: Ryan Lin * chore: run pre-commit Signed-off-by: Ryan Lin * feat: add RAG with Milvus example to mkdocs Signed-off-by: Ryan Lin --------- Signed-off-by: Ryan Lin --- docs/examples/rag_milvus.ipynb | 551 +++++++++++++++++++++++++++++++++ mkdocs.yml | 1 + 2 files changed, 552 insertions(+) create mode 100644 docs/examples/rag_milvus.ipynb diff --git a/docs/examples/rag_milvus.ipynb b/docs/examples/rag_milvus.ipynb new file mode 100644 index 0000000..6366810 --- /dev/null +++ b/docs/examples/rag_milvus.ipynb @@ -0,0 +1,551 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# RAG with Milvus\n", + "\n", + "| Step | Tech | Execution |\n", + "| --- | --- | --- |\n", + "| Embedding | OpenAI (text-embedding-3-small) | 🌐 Remote |\n", + "| Vector store | Milvus | πŸ’» Local |\n", + "| Gen AI | OpenAI (gpt-4o) | 🌐 Remote |\n", + "\n", + "\n", + "## A recipe πŸ§‘β€πŸ³ πŸ₯ πŸ’š\n", + "\n", + "This is a code recipe that uses [Milvus](https://milvus.io/), the world's most advanced open-source vector database, to perform RAG over documents parsed by [Docling](https://docling-project.github.io/docling/).\n", + "\n", + "In this notebook, we accomplish the following:\n", + "* Parse documents using Docling's document conversion capabilities\n", + "* Perform hierarchical chunking of the documents using Docling\n", + "* Generate text embeddings with OpenAI\n", + "* Perform RAG using Milvus\n", + "\n", + "Note: For best results, please use **GPU acceleration** to run this notebook. Here are two options:\n", + "1. **Locally on a MacBook with an Apple Silicon chip.** Converting all documents in the notebook takes ~2 minutes on a MacBook M2 due to Docling's usage of MPS accelerators.\n", + "2. **Run this notebook on Google Colab.** Converting all documents in the notebook takes ~8 minutes on a Google Colab T4 GPU.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Preparation\n", + "\n", + "### Dependencies and Environment\n", + "\n", + "To start, install the required dependencies by running the following command:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "! pip install --upgrade pymilvus docling openai torch" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> If you are using Google Colab, you may need to **restart the runtime** to enable the newly installed dependencies (click on the \"Runtime\" menu at the top of the screen, and select \"Restart session\" from the dropdown menu)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### GPU Checking" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Part of what makes Docling so remarkable is that it can run on commodity hardware. This means that this notebook can be run on a local machine with GPU acceleration. If you're using a MacBook with an Apple Silicon chip, Docling integrates seamlessly with Metal Performance Shaders (MPS). 
MPS provides out-of-the-box GPU acceleration for macOS, seamlessly integrating with PyTorch and TensorFlow, offering energy-efficient performance on Apple Silicon, and broad compatibility with all Metal-supported GPUs.\n", + "\n", + "The code below checks to see if a GPU is available, either via CUDA or MPS." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "MPS GPU is enabled.\n" + ] + } + ], + "source": [ + "import torch\n", + "\n", + "# Check if GPU or MPS is available\n", + "if torch.cuda.is_available():\n", + " device = torch.device(\"cuda\")\n", + " print(f\"CUDA GPU is enabled: {torch.cuda.get_device_name(0)}\")\n", + "elif torch.backends.mps.is_available():\n", + " device = torch.device(\"mps\")\n", + " print(\"MPS GPU is enabled.\")\n", + "else:\n", + " raise OSError(\n", + " \"No GPU or MPS device found. Please check your environment and ensure GPU or MPS support is configured.\"\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Setting Up API Keys\n", + "\n", + "We will use OpenAI as the LLM in this example. You should prepare the [OPENAI_API_KEY](https://platform.openai.com/docs/quickstart) as an environment variable." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "os.environ[\"OPENAI_API_KEY\"] = \"sk-***********\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Prepare the LLM and Embedding Model\n", + "\n", + "We initialize the OpenAI client to prepare the embedding model.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "from openai import OpenAI\n", + "\n", + "openai_client = OpenAI()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Define a function to generate text embeddings using OpenAI client. We use the [text-embedding-3-small](https://platform.openai.com/docs/guides/embeddings) model as an example." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "def emb_text(text):\n", + " return (\n", + " openai_client.embeddings.create(input=text, model=\"text-embedding-3-small\")\n", + " .data[0]\n", + " .embedding\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Generate a test embedding and print its dimension and first few elements." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1536\n", + "[0.009889289736747742, -0.005578675772994757, 0.00683477520942688, -0.03805781528353691, -0.01824733428657055, -0.04121600463986397, -0.007636285852640867, 0.03225184231996536, 0.018949154764413834, 9.352207416668534e-05]\n" + ] + } + ], + "source": [ + "test_embedding = emb_text(\"This is a test\")\n", + "embedding_dim = len(test_embedding)\n", + "print(embedding_dim)\n", + "print(test_embedding[:10])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Process Data Using Docling\n", + "\n", + "Docling can parse various document formats into a unified representation (Docling Document), which can then be exported to different output formats. 
For a full list of supported input and output formats, please refer to [the official documentation](https://docling-project.github.io/docling/usage/supported_formats/).\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this tutorial, we will use a Markdown file ([source](https://milvus.io/docs/overview.md)) as the input. We will process the document using a **HierarchicalChunker** provided by Docling to generate structured, hierarchical chunks suitable for downstream RAG tasks." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "from docling_core.transforms.chunker import HierarchicalChunker\n", + "\n", + "from docling.document_converter import DocumentConverter\n", + "\n", + "converter = DocumentConverter()\n", + "chunker = HierarchicalChunker()\n", + "\n", + "# Convert the input file to Docling Document\n", + "source = \"https://milvus.io/docs/overview.md\"\n", + "doc = converter.convert(source).document\n", + "\n", + "# Perform hierarchical chunking\n", + "texts = [chunk.text for chunk in chunker.chunk(doc)]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Load Data into Milvus\n", + "\n", + "### Create the collection\n", + "\n", + "With data in hand, we can create a `MilvusClient` instance and insert the data into a Milvus collection. " + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from pymilvus import MilvusClient\n", + "\n", + "milvus_client = MilvusClient(uri=\"./milvus_demo.db\")\n", + "collection_name = \"my_rag_collection\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> Regarding the arguments of `MilvusClient`:\n", + "> - Setting the `uri` as a local file, e.g. `./milvus.db`, is the most convenient method, as it automatically utilizes [Milvus Lite](https://milvus.io/docs/milvus_lite.md) to store all data in this file.\n", + "> - If you have a large amount of data, you can set up a more performant Milvus server on [Docker or Kubernetes](https://milvus.io/docs/quickstart.md). In this setup, please use the server URI, e.g. `http://localhost:19530`, as your `uri`.\n", + "> - If you want to use [Zilliz Cloud](https://zilliz.com/cloud), the fully managed cloud service for Milvus, adjust the `uri` and `token`, which correspond to the [Public Endpoint and API key](https://docs.zilliz.com/docs/on-zilliz-cloud-console#free-cluster-details) in Zilliz Cloud." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Check if the collection already exists and drop it if it does." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "if milvus_client.has_collection(collection_name):\n", + " milvus_client.drop_collection(collection_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Create a new collection with the specified parameters.\n", + "\n", + "If we don’t specify any field information, Milvus will automatically create a default `id` field as the primary key, and a `vector` field to store the vector data. A reserved JSON field is used to store non-schema-defined fields and their values."
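+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Optional aside (not part of the original recipe): the next cell is a hypothetical sketch of how the same collection could be declared with an explicit schema and index via the `pymilvus` client. The `id`, `vector`, and `text` field names mirror the fields used elsewhere in this notebook, and the `create_collection` call is left commented out so that the simpler quick-setup call in the cell after it remains the one that actually runs." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from pymilvus import DataType\n", + "\n", + "# Hypothetical sketch: declare an explicit schema instead of the quick setup below.\n", + "schema = milvus_client.create_schema(auto_id=False, enable_dynamic_field=True)\n", + "schema.add_field(field_name=\"id\", datatype=DataType.INT64, is_primary=True)\n", + "schema.add_field(field_name=\"vector\", datatype=DataType.FLOAT_VECTOR, dim=embedding_dim)\n", + "schema.add_field(field_name=\"text\", datatype=DataType.VARCHAR, max_length=65535)\n", + "\n", + "# An inner-product (IP) index on the vector field, matching the metric used later.\n", + "index_params = milvus_client.prepare_index_params()\n", + "index_params.add_index(field_name=\"vector\", metric_type=\"IP\")\n", + "\n", + "# Uncomment to use this variant in place of the quick-setup cell that follows:\n", + "# milvus_client.create_collection(\n", + "#     collection_name=collection_name, schema=schema, index_params=index_params\n", + "# )"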
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "milvus_client.create_collection(\n", + " collection_name=collection_name,\n", + " dimension=embedding_dim,\n", + " metric_type=\"IP\", # Inner product distance\n", + " consistency_level=\"Strong\", # Supported values are (`\"Strong\"`, `\"Session\"`, `\"Bounded\"`, `\"Eventually\"`). See https://milvus.io/docs/consistency.md#Consistency-Level for more details.\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Insert data" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Processing chunks: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 38/38 [00:14<00:00, 2.59it/s]\n" + ] + }, + { + "data": { + "text/plain": [ + "{'insert_count': 38, 'ids': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37], 'cost': 0}" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from tqdm import tqdm\n", + "\n", + "data = []\n", + "\n", + "for i, chunk in enumerate(tqdm(texts, desc=\"Processing chunks\")):\n", + " embedding = emb_text(chunk)\n", + " data.append({\"id\": i, \"vector\": embedding, \"text\": chunk})\n", + "\n", + "milvus_client.insert(collection_name=collection_name, data=data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Build RAG" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Retrieve data for a query\n", + "\n", + "Let’s specify a query question about the website we just scraped." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "question = (\n", + " \"What are the three deployment modes of Milvus, and what are their differences?\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Search for the question in the collection and retrieve the semantic top-3 matches." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "search_res = milvus_client.search(\n", + " collection_name=collection_name,\n", + " data=[emb_text(question)],\n", + " limit=3,\n", + " search_params={\"metric_type\": \"IP\", \"params\": {}},\n", + " output_fields=[\"text\"],\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let’s take a look at the search results of the query\n" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[\n", + " [\n", + " \"Milvus offers three deployment modes, covering a wide range of data scales\\u2014from local prototyping in Jupyter Notebooks to massive Kubernetes clusters managing tens of billions of vectors:\",\n", + " 0.6503315567970276\n", + " ],\n", + " [\n", + " \"Milvus Lite is a Python library that can be easily integrated into your applications. As a lightweight version of Milvus, it\\u2019s ideal for quick prototyping in Jupyter Notebooks or running on edge devices with limited resources. Learn more.\\nMilvus Standalone is a single-machine server deployment, with all components bundled into a single Docker image for convenient deployment. 
Learn more.\\nMilvus Distributed can be deployed on Kubernetes clusters, featuring a cloud-native architecture designed for billion-scale or even larger scenarios. This architecture ensures redundancy in critical components. Learn more.\",\n", + " 0.6281915903091431\n", + " ],\n", + " [\n", + " \"What is Milvus?\\nUnstructured Data, Embeddings, and Milvus\\nWhat Makes Milvus so Fast\\uff1f\\nWhat Makes Milvus so Scalable\\nTypes of Searches Supported by Milvus\\nComprehensive Feature Set\",\n", + " 0.6117826700210571\n", + " ]\n", + "]\n" + ] + } + ], + "source": [ + "import json\n", + "\n", + "retrieved_lines_with_distances = [\n", + " (res[\"entity\"][\"text\"], res[\"distance\"]) for res in search_res[0]\n", + "]\n", + "print(json.dumps(retrieved_lines_with_distances, indent=4))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Use LLM to get a RAG response\n", + "\n", + "Convert the retrieved documents into a string format.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "context = \"\\n\".join(\n", + " [line_with_distance[0] for line_with_distance in retrieved_lines_with_distances]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Define the system and user prompts for the Language Model. This prompt is assembled with the retrieved documents from Milvus.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "SYSTEM_PROMPT = \"\"\"\n", + "Human: You are an AI assistant. You are able to find answers to the questions from the contextual passage snippets provided.\n", + "\"\"\"\n", + "USER_PROMPT = f\"\"\"\n", + "Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.\n", + "<context>\n", + "{context}\n", + "</context>\n", + "<question>\n", + "{question}\n", + "</question>\n", + "\"\"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Use OpenAI's `gpt-4o` chat model to generate a response based on the prompts." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The three deployment modes of Milvus are:\n", + "\n", + "1. **Milvus Lite**: This is a Python library that integrates easily into your applications. It's a lightweight version ideal for quick prototyping in Jupyter Notebooks or for running on edge devices with limited resources.\n", + "\n", + "2. **Milvus Standalone**: This mode is a single-machine server deployment where all components are bundled into a single Docker image, making it convenient to deploy.\n", + "\n", + "3. **Milvus Distributed**: This mode is designed for deployment on Kubernetes clusters. 
It features a cloud-native architecture suited for managing scenarios at a billion-scale or larger, ensuring redundancy in critical components.\n" + ] + } + ], + "source": [ + "response = openai_client.chat.completions.create(\n", + " model=\"gpt-4o\",\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n", + " {\"role\": \"user\", \"content\": USER_PROMPT},\n", + " ],\n", + ")\n", + "print(response.choices[0].message.content)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "base", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/mkdocs.yml b/mkdocs.yml index dd842d6..cff7b4c 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -101,6 +101,7 @@ nav: - "Figure enrichment": examples/develop_picture_enrichment.py - "Formula enrichment": examples/develop_formula_understanding.py - πŸ—‚οΈ More examples: + - examples/rag_milvus.ipynb - examples/rag_weaviate.ipynb - RAG with Granite [β†—]: https://github.com/ibm-granite-community/granite-snack-cookbook/blob/main/recipes/RAG/Granite_Docling_RAG.ipynb - examples/rag_azuresearch.ipynb