refactor: (codeflash) Speed up method JSONCleaner._remove_control_characters by 1,491% (#5322)

* Speed up method `JSONCleaner._remove_control_characters` by 1,491%
To optimize the function `_remove_control_characters`, we can use the `translate` method with a translation table to remove control characters. This method is generally faster than using regular expressions for character replacement/removal tasks.

The optimized version is the change to `_remove_control_characters` and `__init__` shown in the diff below.

By precompiling the translation table in the `__init__` method, we're reducing the repeated overhead of creating this table every time `_remove_control_characters` is called. Using `str.translate` with this precompiled table significantly improves the performance compared to using a regular expression substitution.
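As a minimal sketch of the comparison described above, the snippet below checks that a precomputed `str.translate` table removes the same characters as the original regular expression, and sets up a small `timeit` harness. The function names, the sample string, and the iteration count are illustrative assumptions, not part of the commit, and no particular timing is implied.

```python
import re
import timeit

# The characters matched by the original pattern [\x00-\x1F\x7F]:
# the C0 control range plus DEL.
CONTROL_CHARS = "".join(chr(i) for i in range(32)) + chr(127)

# Built once. The third argument to str.maketrans lists characters to delete.
TRANSLATION_TABLE = str.maketrans("", "", CONTROL_CHARS)


def remove_with_regex(s: str) -> str:
    """Original approach: regular expression substitution."""
    return re.sub(r"[\x00-\x1F\x7F]", "", s)


def remove_with_translate(s: str) -> str:
    """Optimized approach: str.translate with a precomputed table."""
    return s.translate(TRANSLATION_TABLE)


# Hypothetical input containing NUL, tab, newline, and DEL.
sample = 'a\x00b\tc\nd\x7fe {"key": "value"}'
cleaned = remove_with_translate(sample)

# Illustrative timing harness; results depend on input and interpreter.
long_sample = sample * 50
regex_time = timeit.timeit(lambda: remove_with_regex(long_sample), number=2000)
translate_time = timeit.timeit(lambda: remove_with_translate(long_sample), number=2000)
print(f"regex: {regex_time:.4f}s  translate: {translate_time:.4f}s")
```

Both approaches also strip tabs and newlines, since those fall inside the `\x00-\x1F` range; the space character (`\x20`) is preserved.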

* add super()

---------

Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
Saurabh Misra 2024-12-18 13:53:59 -08:00 committed by GitHub
commit 214829ab45

@@ -1,5 +1,4 @@
 import json
-import re
 import unicodedata
 from langflow.custom import Component
@@ -83,7 +82,7 @@ class JSONCleaner(Component):
     def _remove_control_characters(self, s: str) -> str:
         """Remove control characters from the string."""
-        return re.sub(r"[\x00-\x1F\x7F]", "", s)
+        return s.translate(self.translation_table)
     def _normalize_unicode(self, s: str) -> str:
         """Normalize Unicode characters in the string."""
@@ -97,3 +96,8 @@ class JSONCleaner(Component):
             msg = f"Invalid JSON string: {e}"
             raise ValueError(msg) from e
         return s
+    def __init__(self):
+        # Create a translation table that maps control characters to None
+        super().__init__()
+        self.translation_table = str.maketrans("", "", "".join(chr(i) for i in range(32)) + chr(127))
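The pattern in the diff above, building the table once in `__init__` after calling `super().__init__()`, can be sketched in isolation as follows. `BaseComponent` and `Cleaner` are hypothetical stand-ins used only for illustration; they are not langflow's `Component` or the real `JSONCleaner`, whose constructors may accept other arguments.

```python
class BaseComponent:
    """Hypothetical stand-in for a framework base class (not langflow's Component)."""

    def __init__(self) -> None:
        # Base-class setup that a subclass would skip if it did not call super().__init__().
        self.initialized = True


class Cleaner(BaseComponent):
    """Hypothetical subclass mirroring the structure of the change."""

    def __init__(self) -> None:
        # Run the base initializer first so inherited state is set up.
        super().__init__()
        # Build the deletion table once per instance: C0 controls plus DEL.
        self.translation_table = str.maketrans(
            "", "", "".join(chr(i) for i in range(32)) + chr(127)
        )

    def _remove_control_characters(self, s: str) -> str:
        """Remove control characters using the precomputed table."""
        return s.translate(self.translation_table)


cleaner = Cleaner()
# Carriage return, newline, and ESC are removed; printable characters remain.
result = cleaner._remove_control_characters("line1\r\nline2\x1b[0m")
```

Since the table never varies between instances, a class-level or module-level constant would be another reasonable place for it; the sketch keeps it in `__init__` to match the commit.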