Merge branch 'zustand/io/migration' of github.com:logspace-ai/langflow into zustand/io/migration

2024-04-04 00:25:02 -03:00 · 2024-04-04 00:25:02 -03:00 · 25873b0190
commit 25873b0190
parent c3967b6e92 5df26325e8
8 changed files with 167 additions and 82 deletions
--- a/docs/docs/components/embeddings.mdx
+++ b/docs/docs/components/embeddings.mdx
@ -2,19 +2,11 @@ import Admonition from "@theme/Admonition";

 # Embeddings

-<Admonition type="caution" icon="🚧" title="ZONE UNDER CONSTRUCTION">
-  <p>
-    We appreciate your understanding as we polish our documentation – it may
-    contain some rough edges. Share your feedback or report issues to help us
-    improve! 🛠️📝
-  </p>
-</Admonition>
-
 Embeddings are vector representations of text that capture the semantic meaning of the text. They are created using text embedding models and allow us to think about the text in a vector space, enabling us to perform tasks like semantic search, where we look for pieces of text that are most similar in the vector space.

 ---

-### BedrockEmbeddings
+### Amazon Bedrock Embeddings

 Used to load [Amazon Bedrock’s](https://aws.amazon.com/bedrock/) embedding models.

@ -30,7 +22,7 @@ Used to load [Amazon Bedrocks’s](https://aws.amazon.com/bedrock/) embedding mo

 ---

-### CohereEmbeddings
+### Cohere Embeddings

 Used to load [Cohere’s](https://cohere.com/) embedding models.

@ -44,57 +36,93 @@ Used to load [Cohere’s](https://cohere.com/) embedding models.

 ---

-### HuggingFaceEmbeddings
+### Azure OpenAI Embeddings
+
+Generate embeddings using Azure OpenAI models.
+
+**Params**
+
+- **Azure Endpoint:** Your Azure endpoint, including the resource. Example: `https://example-resource.azure.openai.com/`
+- **Deployment Name:** The name of the deployment.
+- **API Version:** The API version to use. (Options: 2022-12-01, 2023-03-15-preview, 2023-05-15, 2023-06-01-preview, 2023-07-01-preview, 2023-08-01-preview)
+- **API Key:** The API key to access the Azure OpenAI service.
+
+---
+
+### Hugging Face API Embeddings
+
+Generate embeddings using Hugging Face Inference API models.
+
+**Params**
+
+- **API Key:** API key for accessing the Hugging Face Inference API. (Type: str)
+- **API URL:** URL of the Hugging Face Inference API. (Default: http://localhost:8080)
+- **Model Name:** Name of the model to use. (Default: BAAI/bge-large-en-v1.5)
+- **Cache Folder:** Folder path to cache Hugging Face models. (Advanced)
+- **Encode Kwargs:** Additional arguments for the encoding process. (Type: dict, Advanced)
+- **Model Kwargs:** Additional arguments for the model. (Type: dict, Advanced)
+- **Multi Process:** Whether to use multiple processes. (Default: False, Advanced)
+
+---
+
+### Hugging Face Embeddings

 Used to load [HuggingFace’s](https://huggingface.co) embedding models.

 **Params**

- **cache_folder:** Used to specify the folder where the embeddings will be cached. When embeddings are computed for a text, they can be stored in the cache folder so that they can be reused later without the need to recompute them. This can improve the performance of the application by avoiding redundant computations.
-
- **encode_kwargs:** Used to pass additional keyword arguments to the encoding method of the underlying HuggingFace model. These keyword arguments can be used to customize the encoding process, such as specifying the maximum length of the input sequence or enabling truncation or padding.
-
- **model_kwargs:** Used to customize the behavior of the model, such as specifying the model architecture, the tokenizer, or any other model-specific configuration options. By using `model_kwargs`, the user can configure the HuggingFace model according to specific needs and preferences.
-
- **model_name:** Used to specify the name or identifier of the HuggingFace model that will be used for generating embeddings. It allows users to choose a specific pre-trained model from the Hugging Face model hub — defaults to `sentence-transformers/all-mpnet-base-v2`.
+- **Cache Folder:** Folder path to cache HuggingFace models.
+- **Encode Kwargs:** Additional arguments for the encoding process. (Type: dict)
+- **Model Kwargs:** Additional arguments for the model. (Type: dict)
+- **Model Name:** Name of the HuggingFace model to use. (Default: sentence-transformers/all-mpnet-base-v2)
+- **Multi Process:** Whether to use multiple processes. (Default: False)

 ---

-### OpenAIEmbeddings
+### Ollama Embeddings
+
+Generate embeddings using Ollama models.
+
+**Params**
+
+- **Ollama Model:** Name of the Ollama model to use. (Default: llama2)
+- **Ollama Base URL:** Base URL of the Ollama API. (Default: http://localhost:11434)
+- **Model Temperature:** Temperature parameter for the model. (Type: float)
+
+---
+
+### OpenAI Embeddings

 Used to load [OpenAI’s](https://openai.com/) embedding models.

 **Params**

- **chunk_size:** Determines the maximum size of each chunk of text that is processed for embedding. If any of the incoming text chunks exceeds `chunk_size` characters, it will be split into multiple chunks of size `chunk_size` or less before being embedded — defaults to `1000`.
-
- **deployment:** Used to specify the deployment name or identifier of the text embedding model. It allows the user to choose a specific deployment of the model to use for embedding. When the deployment is provided, this can be useful when the user has multiple deployments of the same model with different configurations or versions — defaults to `text-embedding-ada-002`.
-
- **embedding_ctx_length:** This parameter determines the maximum context length for the text embedding model. It specifies the number of tokens that the model considers when generating embeddings for a piece of text — defaults to `8191` (this means that the model will consider up to 8191 tokens when generating embeddings).
-
- **max_retries:** Determines the maximum number of times to retry a request if the model provider returns an error from their API — defaults to `6`.
-
- **model:** Defines which pre-trained text embedding model to use — defaults to `text-embedding-ada-002`.
-
- **openai_api_base:** Refers to the base URL for the Azure OpenAI resource. It is used to configure the API to connect to the Azure OpenAI service. The base URL can be found in the Azure portal under the user Azure OpenAI resource.
-
- **openai_api_key:** Is used to authenticate and authorize access to the OpenAI service.
-
- **openai_api_type:** Is used to specify the type of OpenAI API being used, either the regular OpenAI API or the Azure OpenAI API. This parameter allows the `OpenAIEmbeddings` class to connect to the appropriate API service.
-
- **openai_api_version:** Is used to specify the version of the OpenAI API being used. This parameter allows the `OpenAIEmbeddings` class to connect to the appropriate version of the OpenAI API service.
-
- **openai_organization:** Is used to specify the organization associated with the OpenAI API key. If not provided, the default organization associated with the API key will be used.
-
- **openai_proxy:** Proxy enables better budgeting and cost management for making OpenAI API calls, including more transparency into pricing.
-
- **request_timeout:** Used to specify the maximum amount of time, in milliseconds, to wait for a response from the OpenAI API when generating embeddings for a given text.
-
- **tiktoken_model_name:** Used to count the number of tokens in documents to constrain them to be under a certain limit. By default, when set to None, this will be the same as the embedding model name.
+- **OpenAI API Key:** The API key to use for accessing the OpenAI API. (Type: str)
+- **Default Headers:** Default headers for the HTTP requests. (Type: Dict[str, str], Optional)
+- **Default Query:** Default query parameters for the HTTP requests. (Type: NestedDict, Optional)
+- **Allowed Special:** Special tokens allowed for processing. (Type: List[str], Default: [])
+- **Disallowed Special:** Special tokens disallowed for processing. (Type: List[str], Default: ["all"])
+- **Chunk Size:** Chunk size for processing. (Type: int, Default: 1000)
+- **Client:** HTTP client for making requests. (Type: Any, Optional)
+- **Deployment:** Deployment name for the model. (Type: str, Default: "text-embedding-3-small")
+- **Embedding Context Length:** Length of embedding context. (Type: int, Default: 8191)
+- **Max Retries:** Maximum number of retries for failed requests. (Type: int, Default: 6)
+- **Model:** Name of the model to use. (Type: str, Default: "text-embedding-3-small")
+- **Model Kwargs:** Additional keyword arguments for the model. (Type: NestedDict, Optional)
+- **OpenAI API Base:** Base URL of the OpenAI API. (Type: str, Optional)
+- **OpenAI API Type:** Type of the OpenAI API. (Type: str, Optional)
+- **OpenAI API Version:** Version of the OpenAI API. (Type: str, Optional)
+- **OpenAI Organization:** Organization associated with the API key. (Type: str, Optional)
+- **OpenAI Proxy:** Proxy server for the requests. (Type: str, Optional)
+- **Request Timeout:** Timeout for the HTTP requests. (Type: float, Optional)
+- **Show Progress Bar:** Whether to show a progress bar for processing. (Type: bool, Default: False)
+- **Skip Empty:** Whether to skip empty inputs. (Type: bool, Default: False)
+- **TikToken Enable:** Whether to enable TikToken. (Type: bool, Default: True)
+- **TikToken Model Name:** Name of the TikToken model. (Type: str, Optional)

 ---

-### VertexAIEmbeddings
+### VertexAI Embeddings

 Wrapper around [Google Vertex AI](https://cloud.google.com/vertex-ai) [Embeddings API](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings).

@ -113,11 +141,3 @@ Vertex AI is a cloud computing platform offered by Google Cloud Platform (GCP).
 - **top_p:** Tokens are selected from most probable to least until the sum of their probabilities equals the top-p value – defaults to `0.95`.
 - **tuned_model_name:** The name of a tuned model. If provided, model_name is ignored.
 - **verbose:** This parameter is used to control the level of detail in the output of the chain. When set to True, it will print out some internal states of the chain while it is being run, which can help debug and understand the chain's behavior. If set to False, it will suppress the verbose output – defaults to `False`.
-
-### OllamaEmbeddings
-
-Used to load [Ollama’s](https://ollama.ai/) embedding models. Wrapper around LangChain's [Ollama API](https://python.langchain.com/docs/integrations/text_embedding/ollama).
-
- **model** The name of the Ollama model to use – defaults to `llama2`.
- **base_url** The base URL for the Ollama API – defaults to `http://localhost:11434`.
- **temperature** Tunes the degree of randomness in text generations. Should be a non-negative value – defaults to `0`.
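
Note on the embeddings page changed above: its introduction describes semantic search as looking for the pieces of text that are most similar in the vector space. A minimal sketch of that idea follows, assuming cosine similarity as the similarity measure (the page does not name one). The three-dimensional vectors are hand-written, hypothetical stand-ins for real model output, and `cosine_similarity` and `search` are illustrative names, not part of Langflow.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, corpus_vectors):
    """Return the corpus texts ordered from most to least similar to the query."""
    return sorted(
        corpus_vectors,
        key=lambda text: cosine_similarity(query_vector, corpus_vectors[text]),
        reverse=True,
    )


# Hypothetical embeddings: in practice these would come from an embedding model.
corpus = {
    "cats purr": [0.9, 0.1, 0.0],
    "dogs bark": [0.7, 0.3, 0.1],
    "stock prices fell": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.0]  # hypothetical embedding of a pet-related query

ranked = search(query, corpus)
print(ranked)
```

With these toy vectors the two pet-related texts rank ahead of the finance text, because their vectors point in nearly the same direction as the query vector.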
--- a/docs/docs/index.mdx
+++ b/docs/docs/index.mdx
@ -1,11 +1,11 @@
-# 👋 Welcome to Langflow
-
-Langflow is an easy way to build from simple to complex AI applications. It is a low-code platform that allows you to integrate AI into everything you do.
-
 import ThemedImage from "@theme/ThemedImage";
 import useBaseUrl from "@docusaurus/useBaseUrl";
 import ZoomableImage from "/src/theme/ZoomableImage.js";

+# 👋 Welcome to Langflow
+
+Langflow is an easy way to build from simple to complex AI applications. It is a low-code platform that allows you to integrate AI into everything you do.
+
 {" "}

 {" "}
--- a/docs/docs/migration/global-variables.mdx
+++ b/docs/docs/migration/global-variables.mdx
@ -0,0 +1,65 @@
+import ZoomableImage from "/src/theme/ZoomableImage.js";
+import Admonition from "@theme/Admonition";
+
+# Global Variables
+
+Global Variables are a really useful feature of Langflow.
+They allow you to define reusable variables that can be accessed from any Text field in your project.
+
+The first thing you need to do is find a **Text field** in a Component, so let's talk about what a Text field is.
+
+## Text Fields
+
+Text fields are the fields in a Component where you can write text but that do not let you open a Text Area.
+
+The easiest way to find fields that are Text fields, though, is to look for fields that have a 🌐 button.
+
+<ZoomableImage
+  alt="Docusaurus themed image"
+  sources={{
+    light: "img/ollama-gv.png",
+    dark: "img/ollama-gv.png",
+  }}
+  style={{ width: "50%" }}
+/>
+
+## Creating a Global Variable
+
+To create a Global Variable, click the 🌐 button in a Text field. This opens a dropdown that lists your currently available variables, with **+ Add New Variable** at the end.
+
+<ZoomableImage
+  alt="Docusaurus themed image"
+  sources={{
+    light: "img/add-new-variable.png",
+    dark: "img/add-new-variable.png",
+  }}
+  style={{ width: "60%" }}
+/>
+
+Click on **+ Add New Variable** and a window will open where you can define your new Global Variable.
+
+In it, you can define the **Name** of the variable, the optional **Type** of the variable, and the **Value** of the variable.
+
+The **Name** is the name that you will use to refer to the variable in your Text fields.
+
+The **Type** is optional for now but will be used in the future to allow for more advanced features.
+
+The **Value** is the value that the variable will have.
+{/* say that all variables are encrypted */}
+
+<Admonition type="warning">
+  All Global Variables are encrypted and cannot be accessed by anyone but you.
+</Admonition>
+
+<ZoomableImage
+  alt="Docusaurus themed image"
+  sources={{
+    light: "img/create-variable-window.png",
+    dark: "img/create-variable-window.png",
+  }}
+  style={{ width: "60%" }}
+/>
+
+After you have defined your variable, click on **Save Variable** and your variable will be created.
+
+After that, once you click on the 🌐 button in a Text field, you will see your new variable in the dropdown.
--- a/docs/sidebars.js
+++ b/docs/sidebars.js
@ -23,29 +23,24 @@ module.exports = {
        "whats-new/migrating-to-one-point-zero",
      ],
    },
+
    {
      type: "category",
-      label: " Step-by-Step Guides",
-      collapsed: false,
-      items: ["guides/langfuse_integration"],
-    },
-    {
-      type: "category",
-      label: "Migration Guides",
+      label: "Migration Guides",
      collapsed: false,
      items: [
        // "migration/flow-of-data",
        "migration/inputs-and-outputs",
        // "migration/supported-frameworks",
-        // "migration/sidebar-and-interaction-panel",
-        // "migration/new-categories-and-components",
-        // "migration/text-and-record",
+        "migration/sidebar-and-interaction-panel",
+        "migration/new-categories-and-components",
+        "migration/text-and-record",
        // "migration/custom-component",
        "migration/compatibility",
-        // "migration/multiple-flows",
-        // "migration/component-status-and-data-passing",
+        "migration/multiple-flows",
+        "migration/component-status-and-data-passing",
        // "migration/connecting-output-components",
-        // "migration/renaming-and-editing-components",
+        "migration/renaming-and-editing-components",
        // "migration/passing-tweaks-and-inputs",
        "migration/global-variables",
        // "migration/experimental-components",
@ -68,6 +63,12 @@ module.exports = {
        "guidelines/custom-component",
      ],
    },
+    {
+      type: "category",
+      label: "Step-by-Step Guides",
+      collapsed: false,
+      items: ["guides/langfuse_integration"],
+    },
    {
      type: "category",
      label: "Core Components",
--- a/docs/static/img/add-new-variable.png
+++ b/docs/static/img/add-new-variable.png
--- a/docs/static/img/create-variable-window.png
+++ b/docs/static/img/create-variable-window.png
--- a/docs/static/img/ollama-gv.png
+++ b/docs/static/img/ollama-gv.png
--- a/src/backend/base/langflow/components/helpers/SplitText.py
+++ b/src/backend/base/langflow/components/helpers/SplitText.py
@ -1,14 +1,11 @@
 from typing import Optional

-from langchain.text_splitter import (
-    RecursiveCharacterTextSplitter,
-    CharacterTextSplitter,
-)
+from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
 from langchain_core.documents import Document

+from langflow.field_typing import Text
 from langflow.interface.custom.custom_component import CustomComponent
 from langflow.schema import Record
-from langflow.field_typing import Text
 from langflow.utils.util import unescape_string


@ -18,10 +15,10 @@ class SplitTextComponent(CustomComponent):

    def build_config(self):
        return {
-            "texts": {
-                "display_name": "Texts",
+            "inputs": {
+                "display_name": "Inputs",
                "info": "Texts to split.",
-                "input_types": ["Text"],
+                "input_types": ["Record", "Text"],
            },
            "separators": {
                "display_name": "Separators",
@ -48,7 +45,7 @@ class SplitTextComponent(CustomComponent):

    def build(
        self,
-        texts: list[Text],
+        inputs: list[Text],
        separators: Optional[list[str]] = [" "],
        chunk_size: Optional[int] = 1000,
        chunk_overlap: Optional[int] = 200,
@ -77,9 +74,11 @@ class SplitTextComponent(CustomComponent):
            )

        documents = []
-        for _text in texts:
-            # documents.append(_input.to_lc_document())
-            documents.append(Document(page_content=_text))
+        for _input in inputs:
+            if isinstance(_input, Record):
+                documents.append(_input.to_lc_document())
+            else:
+                documents.append(Document(page_content=_input))

        records = self.to_records(splitter.split_documents(documents))
        self.status = records
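
Note on the `SplitText.py` hunk above: the new loop accepts a mix of `Record` and plain-text inputs and normalizes both to documents before splitting. The sketch below isolates that dispatch so it can be run without Langflow or LangChain installed. The `Document` and `Record` classes are simplified stand-ins, and their fields (`text`, `data`, `metadata`) are assumptions for illustration rather than the real class definitions. `to_documents` is a hypothetical helper name.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in for langchain_core.documents.Document (assumed shape)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Record:
    """Stand-in for langflow.schema.Record (assumed shape)."""
    text: str
    data: dict = field(default_factory=dict)

    def to_lc_document(self):
        # Carry the record's data along as document metadata.
        return Document(page_content=self.text, metadata=dict(self.data))


def to_documents(inputs):
    """Normalize a mixed list of Records and strings into Documents."""
    documents = []
    for _input in inputs:
        if isinstance(_input, Record):
            documents.append(_input.to_lc_document())
        else:
            # Plain text carries no metadata of its own.
            documents.append(Document(page_content=_input))
    return documents


docs = to_documents([Record("first chunk", {"source": "a.txt"}), "second chunk"])
for doc in docs:
    print(doc.page_content, doc.metadata)
```

In this sketch, the practical difference between the two branches is that a `Record` can bring metadata with it, while a bare string yields a document with empty metadata.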