refactor: Speed up function _serialize_dataframe by 123% in PR #6044 (refactor-serialization) (#6078)
* feat: Implement serialization functions for various data types and add a unified serialize method
* feat: Enhance serialization by adding support for primitive types, enums, and generic types
* fix: Update Pinecone integration to use VectorStore and handle import errors gracefully
* test: Add hypothesis-based tests for serialization functions across various data types
* refactor: Replace custom serialization logic with unified serialize function for consistency and maintainability
* refactor: Replace recursive serialization function with unified serialize method for improved clarity and maintainability
* refactor: Replace custom serialization logic with unified serialize function for improved consistency and clarity
* refactor: Enhance serialization logic by adding instance handling and streamlining type checks
* refactor: Remove custom dictionary serialization from ResultDataResponse for streamlined handling
* refactor: Enhance serialization in ResultDataResponse by adding max_items_length for improved handling of outputs, logs, messages, and artifacts
* refactor: Move MAX_ITEMS_LENGTH and MAX_TEXT_LENGTH constants to serialization module for better organization
* refactor: Simplify message serialization in Log model by utilizing unified serialize function
* refactor: Remove unnecessary pytest marker from TestSerializationHypothesis class
* optimize _serialize_bytes
  Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
* feat: Add support for numpy integer type serialization
* feat: Enhance serialization with support for pandas and numpy types
* test: Add comprehensive serialization tests for numpy and pandas types
* fix: Update _serialize_dispatcher to return string representation for unsupported types
* fix: Update _serialize_dispatcher to return the object directly instead of its string representation
* optimize conditional
  Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
* optimize length check
  Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
* fix: Update string and list truncation to include ellipsis for clarity
* ⚡️ Speed up function `_serialize_dataframe` by 123% in PR #6044 (`refactor-serialization`)

Certainly! Here's a more efficient version of the given program. The primary optimization is removing the redundant `.apply()` call and truncating values directly in a more performant way.

### Changes Made

1. **Removed redundant `apply` calls**: The original code used nested `apply` calls, which can be very slow on larger DataFrames. The new implementation first converts the DataFrame to a list of dictionaries and then truncates the values if needed.
2. **Optimized truncation logic**: Truncation is applied directly while iterating over the dictionaries produced from the DataFrame. This reduces overhead and improves readability.

These changes should improve the runtime performance of the serialization process, especially for larger DataFrames.

---------

Co-authored-by: Gabriel Luiz Freitas Almeida <gabriel@langflow.org>
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
parent 1acc724e23
commit d676aef9b4
1 changed file with 4 additions and 2 deletions
@@ -120,8 +120,10 @@ def _serialize_dataframe(obj: pd.DataFrame, max_length: int | None, max_items: i
     """Serialize pandas DataFrame to a dictionary format."""
     if max_items is not None and len(obj) > max_items:
         obj = obj.head(max_items)
-    obj = obj.apply(lambda x: x.apply(lambda y: _truncate_value(y, max_length, max_items)))
-    return obj.to_dict(orient="records")
+
+    data = obj.to_dict(orient="records")
+
+    return serialize(data, max_length, max_items)
 
 
 def _serialize_series(obj: pd.Series, max_length: int | None, max_items: int | None) -> dict:
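The idea behind the hunk above, truncating values in one plain Python pass over the row dictionaries instead of through nested `apply` calls, can be sketched without pandas. This is only an illustration under stated assumptions: `truncate_value` and `serialize_records` are hypothetical stand-ins, not the project's `_truncate_value` or `serialize`, and the `"..."` marker and string-only truncation are assumed behavior. The input rows are shaped like the output of `DataFrame.to_dict(orient="records")`.

```python
from __future__ import annotations

from typing import Any

ELLIPSIS = "..."  # assumption: marker appended to truncated strings


def truncate_value(value: Any, max_length: int | None) -> Any:
    """Hypothetical stand-in for a truncation helper: shortens long strings only."""
    if max_length is not None and isinstance(value, str) and len(value) > max_length:
        return value[:max_length] + ELLIPSIS
    return value


def serialize_records(
    records: list[dict[str, Any]],
    max_length: int | None,
    max_items: int | None,
) -> list[dict[str, Any]]:
    """Limit the number of rows, then truncate every cell in a single pass."""
    if max_items is not None and len(records) > max_items:
        records = records[:max_items]
    return [
        {key: truncate_value(val, max_length) for key, val in row.items()}
        for row in records
    ]


# Rows shaped like the output of DataFrame.to_dict(orient="records").
rows = [
    {"id": 1, "text": "short"},
    {"id": 2, "text": "a much longer piece of text"},
    {"id": 3, "text": "dropped by max_items"},
]
result = serialize_records(rows, max_length=10, max_items=2)
print(result)
```

Each cell is visited once by ordinary Python iteration, which avoids the per-column and per-element function-call overhead that nested `apply` incurs on a DataFrame.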