From d676aef9b4231e67a0171a4f516a3893d83be156 Mon Sep 17 00:00:00 2001
From: "codeflash-ai[bot]" <148906541+codeflash-ai[bot]@users.noreply.github.com>
Date: Mon, 3 Feb 2025 15:48:34 +0000
Subject: [PATCH] refactor: Speed up function `_serialize_dataframe` by 123% in PR #6044 (`refactor-serialization`) (#6078)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* feat: Implement serialization functions for various data types and add a unified serialize method

* feat: Enhance serialization by adding support for primitive types, enums, and generic types

* fix: Update Pinecone integration to use VectorStore and handle import errors gracefully

* test: Add hypothesis-based tests for serialization functions across various data types

* refactor: Replace custom serialization logic with unified serialize function for consistency and maintainability

* refactor: Replace recursive serialization function with unified serialize method for improved clarity and maintainability

* refactor: Replace custom serialization logic with unified serialize function for improved consistency and clarity

* refactor: Enhance serialization logic by adding instance handling and streamlining type checks

* refactor: Remove custom dictionary serialization from ResultDataResponse for streamlined handling

* refactor: Enhance serialization in ResultDataResponse by adding max_items_length for improved handling of outputs, logs, messages, and artifacts

* refactor: Move MAX_ITEMS_LENGTH and MAX_TEXT_LENGTH constants to serialization module for better organization

* refactor: Simplify message serialization in Log model by utilizing unified serialize function

* refactor: Remove unnecessary pytest marker from TestSerializationHypothesis class

* optimize _serialize_bytes

Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>

* feat: Add support for numpy integer type serialization

* feat: Enhance serialization with support for pandas and
numpy types

* test: Add comprehensive serialization tests for numpy and pandas types

* fix: Update _serialize_dispatcher to return string representation for unsupported types

* fix: Update _serialize_dispatcher to return the object directly instead of its string representation

* optimize conditional

Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>

* optimize length check

Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>

* fix: Update string and list truncation to include ellipsis for clarity

* ⚡️ Speed up function `_serialize_dataframe` by 123% in PR #6044 (`refactor-serialization`)

The primary optimization is replacing the nested `.apply()` calls with a single conversion to records, then truncating values on the resulting plain Python objects.

### Changes Made

1. **Removed nested `apply` calls**: The original code nested one `apply` inside another, which can be very slow on larger DataFrames. The new implementation first converts the DataFrame to a list of dictionaries and then truncates values where needed.
2. **Optimized truncation logic**: Truncation is applied directly while iterating over the dictionaries produced by the conversion, which reduces overhead and improves readability.

These changes should improve the runtime performance of serialization, especially for larger DataFrames.

---------

Co-authored-by: Gabriel Luiz Freitas Almeida
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
---
 src/backend/base/langflow/serialization/serialization.py | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/backend/base/langflow/serialization/serialization.py b/src/backend/base/langflow/serialization/serialization.py
index 269776323..d94a166ae 100644
--- a/src/backend/base/langflow/serialization/serialization.py
+++ b/src/backend/base/langflow/serialization/serialization.py
@@ -120,8 +120,10 @@ def _serialize_dataframe(obj: pd.DataFrame, max_length: int | None, max_items: i
     """Serialize pandas DataFrame to a dictionary format."""
     if max_items is not None and len(obj) > max_items:
         obj = obj.head(max_items)
-    obj = obj.apply(lambda x: x.apply(lambda y: _truncate_value(y, max_length, max_items)))
-    return obj.to_dict(orient="records")
+
+    data = obj.to_dict(orient="records")
+
+    return serialize(data, max_length, max_items)
 
 
 def _serialize_series(obj: pd.Series, max_length: int | None, max_items: int | None) -> dict:
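
The approach described in the commit message (limit the rows, convert to records, then truncate while walking plain Python values) can be sketched without pandas as follows. This is a minimal illustration under stated assumptions, not the Langflow implementation: `truncate_value` and `truncate_records` are hypothetical names, the `...` suffix is an assumed truncation format, and `records` stands in for the output of `DataFrame.to_dict(orient="records")`.

```python
from typing import Any, Optional


def truncate_value(value: Any, max_length: Optional[int]) -> Any:
    """Shorten strings longer than max_length; leave other values unchanged."""
    if max_length is not None and isinstance(value, str) and len(value) > max_length:
        # Assumed format: keep the first max_length characters, then an ellipsis.
        return value[:max_length] + "..."
    return value


def truncate_records(
    records: list, max_length: Optional[int], max_items: Optional[int]
) -> list:
    """Limit the number of rows, then truncate every cell in a single pass."""
    if max_items is not None and len(records) > max_items:
        # Plays the role of DataFrame.head(max_items) in the patch.
        records = records[:max_items]
    return [
        {key: truncate_value(value, max_length) for key, value in row.items()}
        for row in records
    ]


# Stand-in for df.to_dict(orient="records"): one dict per row.
records = [
    {"name": "alpha", "note": "a fairly long description"},
    {"name": "beta", "note": "short"},
    {"name": "gamma", "note": "dropped by max_items"},
]

result = truncate_records(records, max_length=10, max_items=2)
```

Working on a list of dictionaries avoids invoking a Python callback per cell through pandas, which is the overhead the patch removes by dropping the nested `apply`.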