diff --git a/docs/docs/Components/components-embedding-models.md b/docs/docs/Components/components-embedding-models.md
index 5dfa36d18..abc270871 100644
--- a/docs/docs/Components/components-embedding-models.md
+++ b/docs/docs/Components/components-embedding-models.md
@@ -239,6 +239,51 @@ There are two embeddings models in this flow that you can replace with **Hugging
 4. In the **Hugging Face** components, set the **Inference Endpoint** field to the URL of your local inference model. The **API Key** field is not required for local inference.
 5. Run the flow. The local inference models generate embeddings for the input text.
+## IBM watsonx embeddings
+
+This component generates embeddings using [IBM watsonx.ai](https://www.ibm.com/watsonx) foundation models.
+
+To use the **IBM watsonx.ai** embeddings component, replace any embeddings component in a flow with the IBM watsonx.ai component.
+
+An example document processing flow looks like the following:
+
+![IBM watsonx embeddings model loading a chroma-db with split text](/img/component-watsonx-embeddings-chroma.png)
+
+This flow loads a PDF file from local storage and splits the text into chunks.
+
+The **IBM watsonx** embeddings component converts the text chunks into embeddings, which are then stored in a Chroma DB vector store.
+
+The values for **API endpoint**, **Project ID**, **API key**, and **Model Name** are found in your IBM watsonx.ai deployment.
+For more information, see the [LangChain documentation](https://python.langchain.com/docs/integrations/text_embedding/ibm_watsonx/).
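These connection values map one-to-one onto the component's input fields. As an illustrative sketch (every endpoint, project, and key value below is a placeholder, not a real credential), the settings you collect from your watsonx.ai deployment look like:

```python
# Illustrative connection settings for the IBM watsonx embeddings component.
# All values are placeholders; copy the real ones from your watsonx.ai deployment.
watsonx_settings = {
    "url": "https://us-south.ml.cloud.ibm.com",       # watsonx API Endpoint (region-specific)
    "project_id": "<your-project-id>",                # watsonx project id
    "api_key": "<your-api-key>",                      # IBM Cloud API key
    "model_name": "ibm/slate-125m-english-rtrvr-v2",  # one of the default embedding models
    "truncate_input_tokens": 200,                     # component default
    "input_text": True,                               # include the original text in the output (default)
}

# Basic sanity checks before pasting the values into the component fields.
assert watsonx_settings["url"].startswith("https://")
assert watsonx_settings["truncate_input_tokens"] > 0
```

Enter these values in the matching component fields; the component then fetches the models available to that project from your instance.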
+
+### Default models
+
+The component supports the following default models and vector dimensions:
+
+- `sentence-transformers/all-minilm-l12-v2`: 384-dimensional embeddings
+- `ibm/slate-125m-english-rtrvr-v2`: 768-dimensional embeddings
+- `ibm/slate-30m-english-rtrvr-v2`: 768-dimensional embeddings
+- `intfloat/multilingual-e5-large`: 1024-dimensional embeddings
+
+The component automatically fetches and updates the list of available models from your watsonx.ai instance when you provide your API endpoint and credentials.
+
+### Inputs
+
+| Name | Display Name | Info |
+|------|--------------|------|
+| url | watsonx API Endpoint | The base URL of the API. |
+| project_id | watsonx project id | The project ID for your watsonx.ai instance. |
+| api_key | API Key | The API key to use for the model. |
+| model_name | Model Name | The name of the embedding model to use. |
+| truncate_input_tokens | Truncate Input Tokens | The maximum number of tokens to process. Default: `200`. |
+| input_text | Include the original text in the output | Determines if the original text is included in the output. Default: `True`. |
+
+### Outputs
+
+| Name | Display Name | Info |
+|------|--------------|------|
+| embeddings | Embeddings | An instance for generating embeddings using watsonx.ai. |
+
 
 ## LM Studio Embeddings
 
 This component generates embeddings using [LM Studio](https://lmstudio.ai/docs) models.
@@ -258,7 +303,6 @@ This component generates embeddings using [LM Studio](https://lmstudio.ai/docs)
 |------|--------------|------|
 | embeddings | Embeddings | The generated embeddings |
 
-
 ## MistralAI
 
 This component generates embeddings using [MistralAI](https://docs.mistral.ai/) models.
diff --git a/docs/static/img/component-watsonx-embeddings-chroma.png b/docs/static/img/component-watsonx-embeddings-chroma.png
new file mode 100644
index 000000000..a117eb73c
Binary files /dev/null and b/docs/static/img/component-watsonx-embeddings-chroma.png differ