From 1b6c10a897a0c2379fa4c3fcb14e7fce486fe8de Mon Sep 17 00:00:00 2001
From: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
Date: Mon, 14 Apr 2025 15:26:07 -0400
Subject: [PATCH] docs: hybrid search feature (#7573)

* initial-page-and-some-overview
* steps
* remove-file
* feat: enhance Astra DB hybrid search documentation
* numbering
* clarify hybrid search
* dataframe-link
* Apply suggestions from code review

Co-authored-by: KimberlyFields <46325568+KimberlyFields@users.noreply.github.com>
Co-authored-by: April I. Murphy <36110273+aimurphy@users.noreply.github.com>
Co-authored-by: Sarah Edwards

* code-review
* collection-and-string-not-list
* Apply suggestions from code review
* Apply suggestions from code review

Co-authored-by: April I. Murphy <36110273+aimurphy@users.noreply.github.com>

---------

Co-authored-by: KimberlyFields <46325568+KimberlyFields@users.noreply.github.com>
Co-authored-by: April I. Murphy <36110273+aimurphy@users.noreply.github.com>
Co-authored-by: Sarah Edwards
---
 .../Components/components-vector-stores.md | 50 +++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/docs/docs/Components/components-vector-stores.md b/docs/docs/Components/components-vector-stores.md
index 2d9a7c8dc..055a652d9 100644
--- a/docs/docs/Components/components-vector-stores.md
+++ b/docs/docs/Components/components-vector-stores.md
@@ -3,6 +3,8 @@ title: Vector stores
 slug: /components-vector-stores
 ---
 
+import Icon from "@site/src/components/icon";
+
 # Vector store components in Langflow
 
 Vector databases store vector data, which backs AI workloads like chatbots and Retrieval Augmented Generation.
@@ -78,6 +80,54 @@ For an example of using the **Astra DB Vector Store** component with an embeddin
 
 For more information, see the [Astra DB Serverless documentation](https://docs.datastax.com/en/astra-db-serverless/databases/embedding-generation.html).
 
+### Hybrid search
+
+The **Astra DB** component includes **hybrid search**, which is enabled by default.
+
+The component fields related to hybrid search are **Search Query**, **Lexical Terms**, and **Reranker**.
+
+* **Search Query** finds results by vector similarity.
+* **Lexical Terms** is a comma-separated string of keywords, like `features, data, attributes, characteristics`.
+* **Reranker** is the re-ranker model used in the hybrid search.
+The re-ranker model is `nvidia/llama-3.2-nv.reranker`.
+
+[Hybrid search](https://docs.datastax.com/en/astra-db-serverless/databases/hybrid-search.html) performs a vector similarity search and a lexical search, compares the results of both searches, and then returns the most relevant results overall.
+
+To use **Hybrid search** in the **Astra DB** component, do the following:
+
+1. Click **New Flow** > **RAG** > **Hybrid Search RAG**.
+2. In the **OpenAI** model component, add your **OpenAI API key**.
+3. In the **Astra DB** vector store component, add your **Astra DB Application Token**.
+4. In the **Database** field, select your database.
+5. In the **Collection** field, select the collection you want to search.
+You must enable support for hybrid search when you create the collection.
+6. In the **Playground**, enter a question about your data, such as `What are the features of my data?`
+Your query is sent to two components: an **OpenAI** model component and the **Astra DB** vector database component.
+The **OpenAI** component contains a prompt for creating the lexical query from your input:
+```text
+You are a database query planner that takes a user's requests, and then converts to a search against the subject matter in question.
+You should convert the query into:
+1. A list of keywords to use against a Lucene text analyzer index, no more than 4. Strictly unigrams.
+2. A question to use as the basis for a QA embedding engine.
+Avoid common keywords associated with the user's subject matter.
+```
+7. To view the keywords and questions the **OpenAI** component generates from your collection, in the **OpenAI** component, click .
+```
+1. Keywords: features, data, attributes, characteristics
+2. Question: What characteristics can be identified in my data?
+```
+8. To view the [DataFrame](/concepts-objects#dataframe-object) generated from the **OpenAI** component's response, in the **Structured Output** component, click .
+The DataFrame is passed to a **Parser** component, which parses the contents of the **Keywords** column into a string.
+
+    This string of comma-separated words is passed to the **Lexical Terms** port of the **Astra DB** component.
+    Note that the **Search Query** port of the **Astra DB** component is connected to the **Chat Input** component from step 6.
+    This **Search Query** is vectorized, and both the **Search Query** and **Lexical Terms** content are sent to the reranker at the `find_and_rerank` endpoint.
+
+    The reranker compares the vector search results against the string of terms from the lexical search.
+    The highest-ranked results of your hybrid search are returned to the **Playground**.
+
+For more information, see the [DataStax documentation](https://docs.datastax.com/en/astra-db-serverless/databases/hybrid-search.html).
+
 ## AstraDB Graph vector store
 
 This component implements a Vector Store using AstraDB with graph capabilities.