docs: hybrid search feature (#7573)

* initial-page-and-some-overview

* steps

* remove-file

* feat: enhance Astra DB hybrid search documentation

* numbering

* clarify hybrid search

* dataframe-link

* Apply suggestions from code review

Co-authored-by: KimberlyFields <46325568+KimberlyFields@users.noreply.github.com>
Co-authored-by: April I. Murphy <36110273+aimurphy@users.noreply.github.com>
Co-authored-by: Sarah Edwards <skedwards88@gmail.com>

* code-review

* collection-and-string-not-list

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: April I. Murphy <36110273+aimurphy@users.noreply.github.com>

---------

Co-authored-by: KimberlyFields <46325568+KimberlyFields@users.noreply.github.com>
Co-authored-by: April I. Murphy <36110273+aimurphy@users.noreply.github.com>
Co-authored-by: Sarah Edwards <skedwards88@gmail.com>
Mendon Kissling 2025-04-14 15:26:07 -04:00 committed by GitHub
commit 1b6c10a897
GPG key ID: B5690EEEBB952194


@@ -3,6 +3,8 @@ title: Vector stores
slug: /components-vector-stores
---
import Icon from "@site/src/components/icon";
# Vector store components in Langflow
Vector databases store vector data, which backs AI workloads like chatbots and Retrieval Augmented Generation.
@@ -78,6 +80,54 @@ For an example of using the **Astra DB Vector Store** component with an embeddin
For more information, see the [Astra DB Serverless documentation](https://docs.datastax.com/en/astra-db-serverless/databases/embedding-generation.html).
### Hybrid search
The **Astra DB** component includes **hybrid search**, which is enabled by default.
The component fields related to hybrid search are **Search Query**, **Lexical Terms**, and **Reranker**.
* **Search Query** finds results by vector similarity.
* **Lexical Terms** is a comma-separated string of keywords, like `features, data, attributes, characteristics`.
* **Reranker** is the re-ranker model used in the hybrid search.
The re-ranker model is `nvidia/llama-3.2-nv.reranker`.
[Hybrid search](https://docs.datastax.com/en/astra-db-serverless/databases/hybrid-search.html) performs a vector similarity search and a lexical search, compares the results of both searches, and then returns the most relevant results overall.
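To make the two-search idea concrete, the following is a minimal sketch of hybrid ranking in plain Python. It is not how Astra DB implements hybrid search: the toy documents, the two-dimensional vectors, and the helper names `vector_search`, `lexical_search`, and `fuse` are all hypothetical, and reciprocal rank fusion stands in for the re-ranker model, which is a different technique.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query_vec, docs):
    """Rank document ids by cosine similarity to the query vector."""
    return sorted(docs, key=lambda d: cosine(query_vec, docs[d]["vector"]), reverse=True)


def lexical_search(terms, docs):
    """Rank document ids by how many comma-separated terms appear in the text."""
    wanted = {t.strip().lower() for t in terms.split(",") if t.strip()}

    def hits(doc_id):
        return len(wanted & set(docs[doc_id]["text"].lower().split()))

    return sorted(docs, key=hits, reverse=True)


def fuse(rankings, k=60):
    """Reciprocal rank fusion (a stand-in for a re-ranker model)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy collection; real embeddings have many more dimensions.
docs = {
    "a": {"text": "product features and pricing", "vector": [0.9, 0.1]},
    "b": {"text": "data attributes and characteristics", "vector": [0.2, 0.8]},
    "c": {"text": "company history", "vector": [0.7, 0.3]},
}

by_vector = vector_search([1.0, 0.0], docs)
by_terms = lexical_search("features, data, attributes, characteristics", docs)
combined = fuse([by_vector, by_terms])
```

In this sketch, a document that ranks well in both searches ends up ahead of a document that ranks well in only one.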
To use **Hybrid search** in the **Astra DB** component, do the following:
1. Click **New Flow** > **RAG** > **Hybrid Search RAG**.
2. In the **OpenAI** model component, add your **OpenAI API key**.
3. In the **Astra DB** vector store component, add your **Astra DB Application Token**.
4. In the **Database** field, select your database.
5. In the **Collection** field, select the collection you want to search.
You must enable support for hybrid search when you create the collection.
6. In the **Playground**, enter a question about your data, such as `What are the features of my data?`
Your query is sent to two components: an **OpenAI** model component and the **Astra DB** vector database component.
The **OpenAI** component contains a prompt for creating the lexical query from your input:
```text
You are a database query planner that takes a user's requests, and then converts to a search against the subject matter in question.
You should convert the query into:
1. A list of keywords to use against a Lucene text analyzer index, no more than 4. Strictly unigrams.
2. A question to use as the basis for a QA embedding engine.
Avoid common keywords associated with the user's subject matter.
```
7. To view the keywords and questions the **OpenAI** component generates from your input, in the **OpenAI** component, click <Icon name="TextSearch" aria-label="Inspect icon" />.
```text
1. Keywords: features, data, attributes, characteristics
2. Question: What characteristics can be identified in my data?
```
8. To view the [DataFrame](/concepts-objects#dataframe-object) generated from the **OpenAI** component's response, in the **Structured Output** component, click <Icon name="TextSearch" aria-label="Inspect icon" />.
The DataFrame is passed to a **Parser** component, which parses the contents of the **Keywords** column into a string.
This string of comma-separated words is passed to the **Lexical Terms** port of the **Astra DB** component.
Note that the **Search Query** port of the **Astra DB** component is connected to the **Chat Input** component from step 6.
This **Search Query** is vectorized, and both the **Search Query** and **Lexical Terms** content are sent to the reranker at the `find_and_rerank` endpoint.
The reranker compares the vector search results against the string of terms from the lexical search.
The highest-ranked results of your hybrid search are returned to the **Playground**.
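The hand-off between components can be approximated outside Langflow with a short Python sketch. It assumes the model's response follows the `Keywords:` / `Question:` layout shown in step 7. The function names and the `search_query` and `lexical_terms` dictionary keys are hypothetical. In the flow itself, the **Structured Output** and **Parser** components do this work.

```python
import re


def parse_planner_output(text):
    """Extract keywords and the question from 'Keywords: ...' / 'Question: ...' lines."""
    keywords, question = [], ""
    for line in text.splitlines():
        # Drop an optional list number such as "1. " before the label.
        line = re.sub(r"^\s*\d+\.\s*", "", line).strip()
        label, _, value = line.partition(":")
        if label.lower() == "keywords":
            keywords = [k.strip() for k in value.split(",") if k.strip()]
        elif label.lower() == "question":
            question = value.strip()
    return keywords, question


def build_hybrid_inputs(chat_input, keywords):
    """Pair the raw chat input with a comma-separated keyword string (not a list)."""
    return {"search_query": chat_input, "lexical_terms": ", ".join(keywords)}


planner_output = """1. Keywords: features, data, attributes, characteristics
2. Question: What characteristics can be identified in my data?"""

keywords, question = parse_planner_output(planner_output)
inputs = build_hybrid_inputs("What are the features of my data?", keywords)
```

As in the flow described above, the search query here is the original chat input rather than the generated question, and the keywords are joined into a single string rather than passed as a list.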
For more information, see the [DataStax documentation](https://docs.datastax.com/en/astra-db-serverless/databases/hybrid-search.html).
## AstraDB Graph vector store
This component implements a Vector Store using AstraDB with graph capabilities.