diff --git a/docs/docs/Components/components-processing.md b/docs/docs/Components/components-processing.md
index 353705484..bf51acb1f 100644
--- a/docs/docs/Components/components-processing.md
+++ b/docs/docs/Components/components-processing.md
@@ -13,26 +13,7 @@ The **Split Text** processing component in this flow splits the incoming [Data](
 
 The component offers control over chunk size, overlap, and separator, which affect context and granularity in vector store retrieval results.
 
-![](/img/vector-store-document-ingestion.png)
-
-## Alter metadata
-
-This component modifies metadata of input objects. It can add new metadata, update existing metadata, and remove specified metadata fields. The component works with both [Message](/concepts-objects#message-object) and [Data](/concepts-objects#data-object) objects, and can also create a new Data object from user-provided text.
-
-### Inputs
-
-| Name | Display Name | Info |
-|------|--------------|------|
-| input_value | Input | Objects to which Metadata should be added |
-| text_in | User Text | Text input; the value will be in the 'text' attribute of the [Data](/concepts-objects#data-object) object. Empty text entries are ignored. |
-| metadata | Metadata | Metadata to add to each object |
-| remove_fields | Fields to Remove | Metadata Fields to Remove |
-
-### Outputs
-
-| Name | Display Name | Info |
-|------|--------------|------|
-| data | Data | List of Input objects, each with added metadata |
+![A vector store ingesting documents](/img/vector-store-document-ingestion.png)
 
 ## Combine data
 
@@ -60,6 +41,25 @@ The component iterates through the input list of data objects, merging them into
 
 This component concatenates two text sources into a single text chunk using a specified delimiter.
 
+1. To use this component in a flow, connect two components that output [Messages](/concepts-objects#message-object) to the **Combine Text** component's **First Text** and **Second Text** inputs.
+This example uses two **Text Input** components.
+
+![Combine text component](/img/component-combine-text.png)
+
+2. In the **Combine Text** component, in the **Text** fields of both **Text Input** components, enter some text to combine.
+3. In the **Combine Text** component, enter an optional **Delimiter** value.
+The delimiter character separates the combined texts.
+This example uses `\n\n **end first text** \n\n **start second text** \n\n` to label the texts and create newlines between them.
+4. Connect a **Chat Output** component to view the text combination.
+5. Click **Playground**, and then click **Run Flow**.
+The combined text appears in the **Playground**.
+```text
+This is the first text. Let's combine text!
+end first text
+start second text
+Here's the second part. We'll see how combining text works.
+```
+
 ### Inputs
 
 | Name | Display Name | Info |
@@ -74,95 +74,6 @@ This component concatenates two text sources into a single text chunk using a sp
 |------|--------------|------|
 |message |Message |A [Message](/concepts-objects#message-object) object containing the combined text.
 
-
-## Create data
-
-:::important
-This component is in **Legacy**, which means it is no longer in active development as of Langflow version 1.1.3.
-:::
-
-This component dynamically creates a [Data](/concepts-objects#data-object) object with a specified number of fields.
-
-### Inputs
-| Name | Display Name | Info |
-|------|--------------|------|
-| number_of_fields | Number of Fields | The number of fields to be added to the record. |
-| text_key | Text Key | Key that identifies the field to be used as the text content. |
-| text_key_validator | Text Key Validator | If enabled, checks if the given `Text Key` is present in the given `Data`. |
-
-### Outputs
-
-| Name | Display Name | Info |
-|------|--------------|------|
-| data | Data | A [Data](/concepts-objects#data-object) object created with the specified fields and text key. |
-
-## Data to DataFrame
-
-This component converts one or multiple [Data](/concepts-objects#data-object) objects into a [DataFrame](/concepts-objects#dataframe-object). Each Data object corresponds to one row in the resulting DataFrame. Fields from the `.data` attribute become columns, and the `.text` field (if present) is placed in a 'text' column.
-
-1. To use this component in a flow, connect a component that outputs [Data](/concepts-objects#data-object) to the **Data to Dataframe** component's input.
-This example connects a **Webhook** component to convert `text` and `data` into a DataFrame.
-2. To view the flow's output, connect a **Chat Output** component to the **Data to Dataframe** component.
-
-![A webhook and data to dataframe](/img/component-data-to-dataframe.png)
-
-3. Send a POST request to the **Webhook** containing your JSON data.
-Replace `YOUR_FLOW_ID` with your flow ID.
-This example uses the default Langflow server address.
-```text
-curl -X POST "http://127.0.0.1:7860/api/v1/webhook/YOUR_FLOW_ID" \
--H 'Content-Type: application/json' \
--d '{
-    "text": "Alex Cruz - Employee Profile",
-    "data": {
-        "Name": "Alex Cruz",
-        "Role": "Developer",
-        "Department": "Engineering"
-    }
-}'
-```
-
-4. In the **Playground**, view the output of your flow.
-The **Data to DataFrame** component converts the webhook request into a `DataFrame`, with `text` and `data` fields as columns.
-```text
-| text | data |
-|:-----------------------------|:------------------------------------------------------------------------|
-| Alex Cruz - Employee Profile | {'Name': 'Alex Cruz', 'Role': 'Developer', 'Department': 'Engineering'} |
-```
-
-5. Send another employee data object.
-```text
-curl -X POST "http://127.0.0.1:7860/api/v1/webhook/YOUR_FLOW_ID" \
--H 'Content-Type: application/json' \
--d '{
-    "text": "Kalani Smith - Employee Profile",
-    "data": {
-        "Name": "Kalani Smith",
-        "Role": "Designer",
-        "Department": "Design"
-    }
-}'
-```
-
-6. In the **Playground**, this request is also converted to `DataFrame`.
-```text
-| text | data |
-|:--------------------------------|:---------------------------------------------------------------------|
-| Kalani Smith - Employee Profile | {'Name': 'Kalani Smith', 'Role': 'Designer', 'Department': 'Design'} |
-```
-
-### Inputs
-
-| Name | Display Name | Info |
-|------|--------------|------|
-| data_list | Data or Data List | One or multiple Data objects to transform into a DataFrame. |
-
-### Outputs
-
-| Name | Display Name | Info |
-|------|--------------|------|
-| dataframe | DataFrame | A DataFrame built from each Data object's fields plus a 'text' column. |
-
 ## DataFrame operations
 
 This component performs operations on [DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) rows and columns.
@@ -246,33 +157,72 @@ This component can perform the following operations on Pandas [DataFrame](https:
 | output | DataFrame | The resulting DataFrame after the operation. |
 
 
-## Data to message
+## Data to DataFrame
 
-:::important
-This component is in **Legacy**, which means it is no longer in active development as of Langflow version 1.3.
-Instead, use the [Parser](#parser) component.
-:::
+This component converts one or multiple [Data](/concepts-objects#data-object) objects into a [DataFrame](/concepts-objects#dataframe-object). Each Data object corresponds to one row in the resulting DataFrame. Fields from the `.data` attribute become columns, and the `.text` field (if present) is placed in a 'text' column.
 
-:::important
-Prior to Langflow version 1.1.3, this component was named **Parse Data**.
-:::
+1. To use this component in a flow, connect a component that outputs [Data](/concepts-objects#data-object) to the **Data to Dataframe** component's input.
+This example connects a **Webhook** component to convert `text` and `data` into a DataFrame.
+2. To view the flow's output, connect a **Chat Output** component to the **Data to Dataframe** component.
 
-The ParseData component converts data objects into plain text using a specified template.
-This component transforms structured data into human-readable text formats, allowing for customizable output through the use of templates.
+![A webhook and data to dataframe](/img/component-data-to-dataframe.png)
+
+3. Send a POST request to the **Webhook** containing your JSON data.
+Replace `YOUR_FLOW_ID` with your flow ID.
+This example uses the default Langflow server address.
+```text
+curl -X POST "http://127.0.0.1:7860/api/v1/webhook/YOUR_FLOW_ID" \
+-H 'Content-Type: application/json' \
+-d '{
+    "text": "Alex Cruz - Employee Profile",
+    "data": {
+        "Name": "Alex Cruz",
+        "Role": "Developer",
+        "Department": "Engineering"
+    }
+}'
+```
+
+4. In the **Playground**, view the output of your flow.
+The **Data to DataFrame** component converts the webhook request into a `DataFrame`, with `text` and `data` fields as columns.
+```text
+| text | data |
+|:-----------------------------|:------------------------------------------------------------------------|
+| Alex Cruz - Employee Profile | {'Name': 'Alex Cruz', 'Role': 'Developer', 'Department': 'Engineering'} |
+```
+
+5. Send another employee data object.
+```text
+curl -X POST "http://127.0.0.1:7860/api/v1/webhook/YOUR_FLOW_ID" \
+-H 'Content-Type: application/json' \
+-d '{
+    "text": "Kalani Smith - Employee Profile",
+    "data": {
+        "Name": "Kalani Smith",
+        "Role": "Designer",
+        "Department": "Design"
+    }
+}'
+```
+
+6. In the **Playground**, this request is also converted to `DataFrame`.
+```text
+| text | data |
+|:--------------------------------|:---------------------------------------------------------------------|
+| Kalani Smith - Employee Profile | {'Name': 'Kalani Smith', 'Role': 'Designer', 'Department': 'Design'} |
+```
 
 ### Inputs
 
 | Name | Display Name | Info |
 |------|--------------|------|
-| data | Data | The data to convert to text. |
-| template | Template | The template to use for formatting the data. It can contain the keys `{text}`, `{data}`, or any other key in the data. |
-| sep | Separator | The separator to use between multiple data items. |
+| data_list | Data or Data List | One or multiple Data objects to transform into a DataFrame. |
 
 ### Outputs
 
 | Name | Display Name | Info |
 |------|--------------|------|
-| text | Text | The resulting formatted text string as a [Message](/concepts-objects#message-object) object. |
+| dataframe | DataFrame | A DataFrame built from each Data object's fields plus a 'text' column. |
 
 ## Filter data
 
@@ -317,24 +267,6 @@ The Filter values component filters a list of data items based on a specified ke
 |------|--------------|------|
 | filtered_data | Filtered data | The resulting list of filtered data items. |
 
-## JSON cleaner
-
-The JSON cleaner component cleans JSON strings to ensure they are fully compliant with the JSON specification.
-
-### Inputs
-
-| Name | Display Name | Info |
-|------|--------------|------|
-| json_str | JSON String | The JSON string to be cleaned. This can be a raw, potentially malformed JSON string produced by language models or other sources that may not fully comply with JSON specifications. |
-| remove_control_chars | Remove Control Characters | If set to True, this option removes control characters (ASCII characters 0-31 and 127) from the JSON string. This can help eliminate invisible characters that might cause parsing issues or make the JSON invalid. |
-| normalize_unicode | Normalize Unicode | When enabled, this option normalizes Unicode characters in the JSON string to their canonical composition form (NFC). This ensures consistent representation of Unicode characters across different systems and prevents potential issues with character encoding. |
-| validate_json | Validate JSON | If set to True, this option attempts to parse the JSON string to ensure it is well-formed before applying the final repair operation. It raises a ValueError if the JSON is invalid, allowing for early detection of major structural issues in the JSON. |
-
-### Outputs
-
-| Name | Display Name | Info |
-|------|--------------|------|
-| output | Cleaned JSON String | The resulting cleaned, repaired, and validated JSON string that fully complies with the JSON specification. |
 
 ## Lambda filter
 
@@ -385,7 +317,6 @@ This component routes requests to the most appropriate LLM based on OpenRouter m
 | output | Output | The response from the selected model |
 | selected_model | Selected Model | Name of the chosen model |
 
-
 ## Message to data
 
 This component converts [Message](/concepts-objects#message-object) objects to [Data](/concepts-objects#data-object) objects.
@@ -456,173 +387,56 @@ For an additional example of using the **Parser** component to format a DataFram
 |------|--------------|------|
 | parsed_text | Parsed Text | The resulting formatted text as a [Message](/concepts-objects#message-object) object. |
 
-## Parse DataFrame
-
-:::important
-This component is in **Legacy**, which means it is no longer in active development as of Langflow version 1.3.
-Instead, use the [Parser](#parser) component.
-:::
-
-This component converts DataFrames into plain text using templates.
-
-### Inputs
-
-| Name | Display Name | Info |
-|------|--------------|------|
-| df | DataFrame | The DataFrame to convert to text rows. |
-| template | Template | Template for formatting (use `{column_name}` placeholders). |
-| sep | Separator | String to join rows in output. |
-
-### Outputs
-
-| Name | Display Name | Info |
-|------|--------------|------|
-| text | Text | All rows combined into single text. |
-
-## Parse JSON
-
-:::important
-This component is in **Legacy**, which means it is no longer in active development as of Langflow version 1.1.3.
-:::
-
-This component converts and extracts JSON fields using JQ queries.
-
-### Inputs
-
-| Name | Display Name | Info |
-|------|--------------|------|
-| input_value | Input | Data object to filter ([Message](/concepts-objects#message-object) or [Data](/concepts-objects#data-object)). |
-| query | JQ Query | JQ Query to filter the data |
-
-### Outputs
-
-| Name | Display Name | Info |
-|------|--------------|------|
-| filtered_data | Filtered Data | Filtered data as list of [Data](/concepts-objects#data-object) objects. |
-
-## Regex extractor
-
-This component extracts patterns from text using regular expressions. It can be used to find and extract specific patterns or information from text data.
-
-To use this component in a flow:
-
-1. Connect the **Regex Extractor** to a **URL** component and a **Chat Output** component.
-
-![Regex extractor connected to url component](/img/component-url-regex.png)
-
-2. In the **Regex Extractor** tool, enter a pattern to extract text from the **URL** component's raw output.
-This example extracts the first paragraph from the "In the News" section of `https://en.wikipedia.org/wiki/Main_Page`:
-```
-In the news\s*\n(.*?)(?=\n\n)
-```
-
-Result:
-```
-Peruvian writer and Nobel Prize in Literature laureate Mario Vargas Llosa (pictured) dies at the age of 89.
-```
-
-## Save to File
-
-This component saves [DataFrames, Data, or Messages](/concepts-objects) to various file formats.
-
-1. To use this component in a flow, connect a component that outputs [DataFrames, Data, or Messages](/concepts-objects) to the **Save to File** component's input.
-The following example connects a **Webhook** component to two **Save to File** components to demonstrate the different outputs.
-
-![Two Save-to File components connected to a webhook](/img/component-save-to-file.png)
-
-2. In the **Save to File** component's **Input Type** field, select the expected input type.
-This example expects **Data** from the **Webhook**.
-3. In the **File Format** field, select the file type for your saved file.
-This example uses `.md` in one **Save to File** component, and `.xlsx` in another.
-4. In the **File Path** field, enter the path for your saved file.
-This example uses `./output/employees.xlsx` and `./output/employees.md` to save the files in a directory relative to where Langflow is running.
-The component accepts both relative and absolute paths, and creates any necessary directories if they don't exist.
-:::tip
-If you enter a format in the `file_path` that is not accepted, the component appends the proper format to the file.
-For example, if the selected `file_format` is `csv`, and you enter `file_path` as `./output/test.txt`, the file will be saved as `./output/test.txt.csv` so the file is not corrupted.
-:::
-5. Send a POST request to the **Webhook** containing your JSON data.
-Replace `YOUR_FLOW_ID` with your flow ID.
-This example uses the default Langflow server address.
-```text
-curl -X POST "http://127.0.0.1:7860/api/v1/webhook/YOUR_FLOW_ID" \
--H 'Content-Type: application/json' \
--d '{
-    "Name": ["Alex Cruz", "Kalani Smith", "Noam Johnson"],
-    "Role": ["Developer", "Designer", "Manager"],
-    "Department": ["Engineering", "Design", "Management"]
-}'
-```
-6. In your local filesystem, open the `outputs` directory.
-You should see two files created from the data you've sent: one in `.xlsx` for structured spreadsheets, and one in Markdown.
-```text
-| Name | Role | Department |
-|:-------------|:----------|:-------------|
-| Alex Cruz | Developer | Engineering |
-| Kalani Smith | Designer | Design |
-| Noam Johnson | Manager | Management |
-```
-
-### File input format options
-
-For `DataFrame` and `Data` inputs, the component can create:
- - `csv`
- - `excel`
- - `json`
- - `markdown`
- - `pdf`
-
-For `Message` inputs, the component can create:
- - `txt`
- - `json`
- - `markdown`
- - `pdf`
-
-### Inputs
-
-| Name | Display Name | Info |
-|------|--------------|------|
-| input_text | Input Text | The text to analyze and extract patterns from. |
-| pattern | Regex Pattern | The regular expression pattern to match in the text. |
-| input_type | Input Type | Select the type of input to save.|
-| df | DataFrame | The DataFrame to save. |
-| data | Data | The Data object to save. |
-| message | Message | The Message to save. |
-| file_format | File Format | Select the file format to save the input. |
-| file_path | File Path | The full file path including filename and extension. |
-
-### Outputs
-
-| Name | Display Name | Info |
-|------|--------------|------|
-| data | Data | List of extracted matches as [Data](/concepts-objects#data-object) objects. |
-| text | Message | The extracted matches formatted as a [Message](/concepts-objects#message-object) object. |
-| confirmation | Confirmation | Confirmation message after saving the file. |
-
-## Select data
-
-:::important
-This component is in **Legacy**, which means it is no longer in active development as of Langflow version 1.1.3.
-:::
-
-This component selects a single [Data](/concepts-objects#data-object) item from a list.
-
-### Inputs
-
-| Name | Display Name | Info |
-|------|--------------|------|
-| data_list | Data List | List of data to select from |
-| data_index | Data Index | Index of the data to select |
-
-### Outputs
-
-| Name | Display Name | Info |
-|------|--------------|------|
-| selected_data | Selected Data | The selected [Data](/concepts-objects#data-object) object. |
-
 ## Split text
 
-This component splits text into chunks based on specified criteria.
+This component splits text into chunks based on specified criteria. It's ideal for chunking data to be tokenized and embedded into vector databases.
+
+The **Split Text** component outputs **Chunks** or **DataFrame**.
+The **Chunks** output returns a list of individual text chunks.
+The **DataFrame** output returns a structured data format, with additional `text` and `metadata` columns applied.
+
+1. To use this component in a flow, connect a component that outputs [Data or DataFrame](/concepts-objects) to the **Split Text** component's **Data** port.
+This example uses the **URL** component, which is fetching JSON placeholder data.
+
+![Split text component and chroma-db](/img/component-split-text.png)
+
+2. In the **Split Text** component, define your data splitting parameters.
+
+This example splits incoming JSON data at the separator `},`, so each chunk contains one JSON object.
+
+The order of precedence is **Separator**, then **Chunk Size**, and then **Chunk Overlap**.
+If any segment after separator splitting is longer than `chunk_size`, it is split again to fit within `chunk_size`.
+
+After `chunk_size`, **Chunk Overlap** is applied between chunks to maintain context.
+
+3. Connect a **Chat Output** component to the **Split Text** component's **DataFrame** output to view its output.
+4. Click **Playground**, and then click **Run Flow**.
+The output contains a table of JSON objects split at `},`.
+```text
+{
+"userId": 1,
+"id": 1,
+"title": "Introduction to Artificial Intelligence",
+"body": "Learn the basics of Artificial Intelligence and its applications in various industries.",
+"link": "https://example.com/article1",
+"comment_count": 8
+},
+{
+"userId": 2,
+"id": 2,
+"title": "Web Development with React",
+"body": "Build modern web applications using React.js and explore its powerful features.",
+"link": "https://example.com/article2",
+"comment_count": 12
+},
+```
+5. Clear the **Separator** field, and then run the flow again.
+Instead of JSON objects, the output contains 50-character lines of text with 10 characters of overlap.
+```text
+First chunk: "title": "Introduction to Artificial Intelligence""
+Second chunk: "elligence", "body": "Learn the basics of Artif"
+Third chunk: "s of Artificial Intelligence and its applications"
+```
 
 ### Inputs
 
@@ -659,3 +473,140 @@ This component dynamically updates or appends data with specified fields.
 | Name | Display Name | Info |
 |------|--------------|------|
 | data | Data | Updated [Data](/concepts-objects#data-object) objects. |
+
+## Legacy components
+
+**Legacy** components are available to use but no longer supported.
+
+### Alter metadata
+
+This component modifies metadata of input objects. It can add new metadata, update existing metadata, and remove specified metadata fields. The component works with both [Message](/concepts-objects#message-object) and [Data](/concepts-objects#data-object) objects, and can also create a new Data object from user-provided text.
+
+#### Inputs
+
+| Name | Display Name | Info |
+|------|--------------|------|
+| input_value | Input | Objects to which Metadata should be added |
+| text_in | User Text | Text input; the value will be in the 'text' attribute of the [Data](/concepts-objects#data-object) object. Empty text entries are ignored. |
+| metadata | Metadata | Metadata to add to each object |
+| remove_fields | Fields to Remove | Metadata fields to remove |
+
+#### Outputs
+
+| Name | Display Name | Info |
+|------|--------------|------|
+| data | Data | List of Input objects, each with added metadata |
+
+### Create data
+
+:::important
+This component is in **Legacy**, which means it is no longer in active development as of Langflow version 1.1.3.
+:::
+
+This component dynamically creates a [Data](/concepts-objects#data-object) object with a specified number of fields.
+
+#### Inputs
+| Name | Display Name | Info |
+|------|--------------|------|
+| number_of_fields | Number of Fields | The number of fields to be added to the record. |
+| text_key | Text Key | Key that identifies the field to be used as the text content. |
+| text_key_validator | Text Key Validator | If enabled, checks if the given `Text Key` is present in the given `Data`. |
+
+#### Outputs
+
+| Name | Display Name | Info |
+|------|--------------|------|
+| data | Data | A [Data](/concepts-objects#data-object) object created with the specified fields and text key. |
+
+### Data to message
+
+:::important
+This component is in **Legacy**, which means it is no longer in active development as of Langflow version 1.3.
+Instead, use the [Parser](#parser) component.
+:::
+
+:::important
+Prior to Langflow version 1.1.3, this component was named **Parse Data**.
+:::
+
+The ParseData component converts data objects into plain text using a specified template.
+This component transforms structured data into human-readable text formats, allowing for customizable output through the use of templates.
+
+#### Inputs
+
+| Name | Display Name | Info |
+|------|--------------|------|
+| data | Data | The data to convert to text. |
+| template | Template | The template to use for formatting the data. It can contain the keys `{text}`, `{data}`, or any other key in the data. |
+| sep | Separator | The separator to use between multiple data items. |
+
+#### Outputs
+
+| Name | Display Name | Info |
+|------|--------------|------|
+| text | Text | The resulting formatted text string as a [Message](/concepts-objects#message-object) object. |
+
+### Parse DataFrame
+
+:::important
+This component is in **Legacy**, which means it is no longer in active development as of Langflow version 1.3.
+Instead, use the [Parser](#parser) component.
+:::
+
+This component converts DataFrames into plain text using templates.
+
+#### Inputs
+
+| Name | Display Name | Info |
+|------|--------------|------|
+| df | DataFrame | The DataFrame to convert to text rows. |
+| template | Template | Template for formatting (use `{column_name}` placeholders). |
+| sep | Separator | String to join rows in output. |
+
+#### Outputs
+
+| Name | Display Name | Info |
+|------|--------------|------|
+| text | Text | All rows combined into single text. |
+
+### Parse JSON
+
+:::important
+This component is in **Legacy**, which means it is no longer in active development as of Langflow version 1.1.3.
+:::
+
+This component converts and extracts JSON fields using JQ queries.
+
+#### Inputs
+
+| Name | Display Name | Info |
+|------|--------------|------|
+| input_value | Input | Data object to filter ([Message](/concepts-objects#message-object) or [Data](/concepts-objects#data-object)). |
+| query | JQ Query | JQ Query to filter the data |
+
+#### Outputs
+
+| Name | Display Name | Info |
+|------|--------------|------|
+| filtered_data | Filtered Data | Filtered data as list of [Data](/concepts-objects#data-object) objects. |
+
+### Select data
+
+:::important
+This component is in **Legacy**, which means it is no longer in active development as of Langflow version 1.1.3.
+:::
+
+This component selects a single [Data](/concepts-objects#data-object) item from a list.
+
+#### Inputs
+
+| Name | Display Name | Info |
+|------|--------------|------|
+| data_list | Data List | List of data to select from |
+| data_index | Data Index | Index of the data to select |
+
+#### Outputs
+
+| Name | Display Name | Info |
+|------|--------------|------|
+| selected_data | Selected Data | The selected [Data](/concepts-objects#data-object) object. |
diff --git a/docs/static/img/component-combine-text.png b/docs/static/img/component-combine-text.png
new file mode 100644
index 000000000..cc92c2113
Binary files /dev/null and b/docs/static/img/component-combine-text.png differ
diff --git a/docs/static/img/component-split-text.png b/docs/static/img/component-split-text.png
new file mode 100644
index 000000000..c70943028
Binary files /dev/null and b/docs/static/img/component-split-text.png differ