From d676aef9b4231e67a0171a4f516a3893d83be156 Mon Sep 17 00:00:00 2001
From: "codeflash-ai[bot]"
 <148906541+codeflash-ai[bot]@users.noreply.github.com>
Date: Mon, 3 Feb 2025 15:48:34 +0000
Subject: [PATCH] refactor: Speed up function `_serialize_dataframe` by 123% in
 PR #6044 (`refactor-serialization`) (#6078)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* feat: Implement serialization functions for various data types and add a unified serialize method

* feat: Enhance serialization by adding support for primitive types, enums, and generic types

* fix: Update Pinecone integration to use VectorStore and handle import errors gracefully

* test: Add hypothesis-based tests for serialization functions across various data types

* refactor: Replace custom serialization logic with unified serialize function for consistency and maintainability

* refactor: Replace recursive serialization function with unified serialize method for improved clarity and maintainability

* refactor: Replace custom serialization logic with unified serialize function for improved consistency and clarity

* refactor: Enhance serialization logic by adding instance handling and streamlining type checks

* refactor: Remove custom dictionary serialization from ResultDataResponse for streamlined handling

* refactor: Enhance serialization in ResultDataResponse by adding max_items_length for improved handling of outputs, logs, messages, and artifacts

* refactor: Move MAX_ITEMS_LENGTH and MAX_TEXT_LENGTH constants to serialization module for better organization

* refactor: Simplify message serialization in Log model by utilizing unified serialize function

* refactor: Remove unnecessary pytest marker from TestSerializationHypothesis class

* optimize _serialize_bytes

Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>

* feat: Add support for numpy integer type serialization

* feat: Enhance serialization with support for pandas and numpy types

* test: Add comprehensive serialization tests for numpy and pandas types

* fix: Update _serialize_dispatcher to return string representation for unsupported types

* fix: Update _serialize_dispatcher to return the object directly instead of its string representation

* optimize conditional

Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>

* optimize length check

Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>

* fix: Update string and list truncation to include ellipsis for clarity

* ⚡️ Speed up function `_serialize_dataframe` by 123% in PR #6044 (`refactor-serialization`)
The primary optimization is removing the nested `.apply()` calls: the DataFrame is converted to a list of records first, and truncation is then handled by the unified `serialize` function.


### Changes Made
1. **Removed nested `apply` calls**: The original code called `apply` on every column and then again on every value in that column, which can be very slow on larger DataFrames. The new implementation first converts the DataFrame to a list of dictionaries with `to_dict(orient="records")`.
2. **Delegated truncation to `serialize`**: The resulting records are passed to the unified `serialize` function with the same `max_length` and `max_items`, so truncation runs over plain Python objects instead of going through pandas. This reduces overhead and improves readability.

These changes should enhance the runtime performance of the serialization process, especially for larger DataFrames.
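
As a rough illustration of the second step, here is a minimal sketch of truncating values over a list of row records. The helpers `truncate_value` and `truncate_records` are hypothetical stand-ins for `_truncate_value` and `serialize`, not their actual implementations, and the `records` list is hand-written in the shape that `to_dict(orient="records")` returns (one dict per row).

```python
from __future__ import annotations

from typing import Any


def truncate_value(value: Any, max_length: int | None) -> Any:
    """Hypothetical stand-in for `_truncate_value`: shorten long strings only."""
    if max_length is not None and isinstance(value, str) and len(value) > max_length:
        return value[:max_length] + "..."
    return value


def truncate_records(
    records: list[dict[str, Any]], max_length: int | None
) -> list[dict[str, Any]]:
    """Hypothetical stand-in for `serialize` applied to a list of row dicts."""
    return [
        {key: truncate_value(val, max_length) for key, val in row.items()}
        for row in records
    ]


# Same shape as the output of `DataFrame.to_dict(orient="records")`.
records = [
    {"id": 1, "text": "short"},
    {"id": 2, "text": "a much longer piece of text"},
]

result = truncate_records(records, max_length=10)
```

Working on plain dicts like this avoids one pandas call per column and per value, which is where the nested `apply` version spent its time.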

---------

Co-authored-by: Gabriel Luiz Freitas Almeida <gabriel@langflow.org>
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
---
 src/backend/base/langflow/serialization/serialization.py | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/backend/base/langflow/serialization/serialization.py b/src/backend/base/langflow/serialization/serialization.py
index 269776323..d94a166ae 100644
--- a/src/backend/base/langflow/serialization/serialization.py
+++ b/src/backend/base/langflow/serialization/serialization.py
@@ -120,8 +120,10 @@ def _serialize_dataframe(obj: pd.DataFrame, max_length: int | None, max_items: i
     """Serialize pandas DataFrame to a dictionary format."""
     if max_items is not None and len(obj) > max_items:
         obj = obj.head(max_items)
-    obj = obj.apply(lambda x: x.apply(lambda y: _truncate_value(y, max_length, max_items)))
-    return obj.to_dict(orient="records")
+
+    data = obj.to_dict(orient="records")
+
+    return serialize(data, max_length, max_items)
 
 
 def _serialize_series(obj: pd.Series, max_length: int | None, max_items: int | None) -> dict: