refactor: Speed up function _serialize_dataframe by 123% in PR #6044 (refactor-serialization) (#6078)

* feat: Implement serialization functions for various data types and add a unified serialize method

* feat: Enhance serialization by adding support for primitive types, enums, and generic types

* fix: Update Pinecone integration to use VectorStore and handle import errors gracefully

* test: Add hypothesis-based tests for serialization functions across various data types

* refactor: Replace custom serialization logic with unified serialize function for consistency and maintainability

* refactor: Replace recursive serialization function with unified serialize method for improved clarity and maintainability

* refactor: Replace custom serialization logic with unified serialize function for improved consistency and clarity

* refactor: Enhance serialization logic by adding instance handling and streamlining type checks

* refactor: Remove custom dictionary serialization from ResultDataResponse for streamlined handling

* refactor: Enhance serialization in ResultDataResponse by adding max_items_length for improved handling of outputs, logs, messages, and artifacts

* refactor: Move MAX_ITEMS_LENGTH and MAX_TEXT_LENGTH constants to serialization module for better organization

* refactor: Simplify message serialization in Log model by utilizing unified serialize function

* refactor: Remove unnecessary pytest marker from TestSerializationHypothesis class

* optimize _serialize_bytes

Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>

* feat: Add support for numpy integer type serialization

* feat: Enhance serialization with support for pandas and numpy types

* test: Add comprehensive serialization tests for numpy and pandas types

* fix: Update _serialize_dispatcher to return string representation for unsupported types

* fix: Update _serialize_dispatcher to return the object directly instead of its string representation

* optmize conditional

Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>

* optimize length check

Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>

* fix: Update string and list truncation to include ellipsis for clarity

* ️ Speed up function `_serialize_dataframe` by 123% in PR #6044 (`refactor-serialization`)
Certainly! Here's a more efficient version of the given program. The primary optimization performed here is removing the redundant `.apply()` call and directly truncating values in a more performant way.



### Changes Made.
1. **Removed redundant `apply` calls**: In the original code, there were nested `apply` calls which can be very slow on larger DataFrames. The new implementation converts the DataFrame to a list of dictionaries first and then truncates the values if needed.
2. **Optimized truncation logic**: Applied truncation directly while iterating over the dictionary after conversion from a DataFrame. This reduces overhead and improves readability.

These changes should enhance the runtime performance of the serialization process, especially for larger DataFrames.

---------

Co-authored-by: Gabriel Luiz Freitas Almeida <gabriel@langflow.org>
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
This commit is contained in:
codeflash-ai[bot] 2025-02-03 15:48:34 +00:00 committed by GitHub
commit d676aef9b4
No known key found for this signature in database
GPG key ID: B5690EEEBB952194

View file

@ -120,8 +120,10 @@ def _serialize_dataframe(obj: pd.DataFrame, max_length: int | None, max_items: i
"""Serialize pandas DataFrame to a dictionary format."""
if max_items is not None and len(obj) > max_items:
obj = obj.head(max_items)
obj = obj.apply(lambda x: x.apply(lambda y: _truncate_value(y, max_length, max_items)))
return obj.to_dict(orient="records")
data = obj.to_dict(orient="records")
return serialize(data, max_length, max_items)
def _serialize_series(obj: pd.Series, max_length: int | None, max_items: int | None) -> dict: