feat: add StructuredOutput component (#4024)

* Add utility functions to build Pydantic models from schema definitions

* Add unit tests for build_model_from_schema function in test_base_model.py

- Implement various test cases to validate the functionality of build_model_from_schema.
- Test cases cover scenarios such as handling valid and empty schemas, managing unknown field types, and processing schemas with missing optional keys.
- Ensure proper handling of nested list and dict types, and verify the function's efficiency with large schemas.
- Confirm that the function raises exceptions for invalid input and handles duplicate field names correctly.
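
The type-resolution behavior these tests exercise can be sketched in plain Python. This mirrors the type mapping that `build_model_from_schema` relies on in this PR's `helpers/base_model.py`; it is an illustrative sketch, not the langflow code itself:

```python
from typing import Any

# Mirrors the schema-string-to-Python-type mapping used by the PR's helper.
TYPE_MAPPING: dict[str, Any] = {
    "str": str,
    "int": int,
    "float": float,
    "bool": bool,
    "boolean": bool,
    "list": list[Any],
    "dict": dict[str, Any],
    "number": float,
    "text": str,
}


def get_type_annotation(type_str: str, *, multiple: bool) -> Any:
    # Unknown type strings raise ValueError rather than silently defaulting.
    try:
        base_type = TYPE_MAPPING[type_str]
    except KeyError as e:
        raise ValueError(f"Invalid type: {type_str}") from e
    # "multiple" wraps the base type in a list annotation.
    return list[base_type] if multiple else base_type


print(get_type_annotation("str", multiple=False))  # <class 'str'>
print(get_type_annotation("int", multiple=True))   # list[int]
```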

* Refactor tests in `test_base_model.py` to improve type handling and error checking

* Refactor output schema handling to use TableInput and build_model_from_schema

* Update OpenAI model components and hierarchical crew setup

- Refactor `OpenAIModelComponent` to use `TableInput` for `output_schema` and integrate `build_model_from_schema`.
- Modify `HierarchicalCrewComponent` to use unpacking for base inputs.
- Ensure consistent import statements across JSON files.
- Improve error handling and logging for vector store operations.

* Add chat result model with message building and execution logic

- Implement `build_messages_and_runnable` to construct message lists and configure runnable models.
- Add `get_chat_result` to execute language models with input messages, supporting streaming and custom configurations.
- Handle exceptions with optional custom error messages.

* Add "table" to DIRECT_TYPES in constants.py

* Add support for DataFrame input validation in TableInput class

* Add StructuredOutputComponent for generating structured outputs from language models

* Enhance structured output component with improved input descriptions and schema naming

* Convert DataFrame to list of dictionaries in TableInput validation
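
For readers unfamiliar with the pandas idiom, `DataFrame.to_dict(orient="records")` turns column-oriented data into one dict per row. A stdlib sketch of the same transformation (illustrative only; `TableInput` calls the pandas method directly):

```python
# Column-oriented data, as a DataFrame would hold it.
columns = {"name": ["field1", "field2"], "type": ["str", "int"]}

# zip(*columns.values()) yields one tuple per row; pairing each with the
# column names produces the "records" form.
records = [dict(zip(columns, row)) for row in zip(*columns.values())]
print(records)
# [{'name': 'field1', 'type': 'str'}, {'name': 'field2', 'type': 'int'}]
```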

* Remove pandas dependency and refactor schema handling in structured_output.py

* Remove 'default' field from structured output schema and update field initialization

* Add 'number' and 'text' types to type mapping and remove default value from field creation

* Enhance error handling in structured output building process

* Improve error message for non-BaseModel output in structured_output.py

* Add unit tests for StructuredOutputComponent in helpers module

- Implement various test cases to ensure correct functionality of StructuredOutputComponent.
- Test successful structured output generation, handling of unsupported language models, and correct output model building.
- Validate handling of multiple outputs, empty and invalid output schemas, and nested schemas.
- Include tests for large input values and invalid language model configurations.

* Update description for StructuredOutputComponent to clarify functionality

* Add default values and error handling for structured output in helpers

* Remove unused 'method' parameter from 'with_structured_output' in MockLanguageModel

* refactor: rename test_base_model.py to test_base_model_from_schema.py

Rename the test_base_model.py file to test_base_model_from_schema.py to better reflect its purpose of testing the build_model_from_schema function. This change improves code clarity and maintainability.

* Add type ignore comments to suppress type checking errors

* Add Generic typing to StructuredOutputComponent and fix method call

* Revert "Refactor output schema handling to use TableInput and build_model_from_schema"

This reverts commit 2e84a8608689bcfb519dc589d3eeef852784f3e4.

* Deprecate JSON mode in OpenAIModel output schema documentation

* Remove unused Generic import and add type ignore comment in StructuredOutputComponent

* Refactor OpenAI model components and deprecate output schema

- Refactored `OpenAIModelComponent` to use `operator.ior` and `functools.reduce` for converting `output_schema` to a dictionary.
- Deprecated the `output_schema` field, updating its info to reflect the deprecation.
- Simplified the `_docs_to_data` method in `SplitTextComponent` for better readability.
- Updated import statements and removed unused imports across multiple JSON files.

* Add specific type ignore comments and update exception types in backend code

Commit 2be7c56939, authored by Gabriel Luiz Freitas Almeida on 2024-10-15 18:41:42 -03:00, committed by GitHub.
19 changed files with 693 additions and 57 deletions

View file

@@ -0,0 +1,76 @@
import warnings

from langchain_core.messages import BaseMessage, HumanMessage, SystemMessage

from langflow.field_typing.constants import LanguageModel
from langflow.schema.message import Message


def build_messages_and_runnable(
    input_value: str | Message, system_message: str | None, original_runnable: LanguageModel
) -> tuple[list[BaseMessage], LanguageModel]:
    messages: list[BaseMessage] = []
    system_message_added = False
    runnable = original_runnable
    if input_value:
        if isinstance(input_value, Message):
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")
                if "prompt" in input_value:
                    prompt = input_value.load_lc_prompt()
                    if system_message:
                        prompt.messages = [
                            SystemMessage(content=system_message),
                            *prompt.messages,  # type: ignore[has-type]
                        ]
                        system_message_added = True
                    runnable = prompt | runnable
                else:
                    messages.append(input_value.to_lc_message())
        else:
            messages.append(HumanMessage(content=input_value))

    if system_message and not system_message_added:
        messages.insert(0, SystemMessage(content=system_message))

    return messages, runnable


def get_chat_result(
    runnable: LanguageModel,
    input_value: str | Message,
    system_message: str | None = None,
    config: dict | None = None,
    *,
    stream: bool = False,
):
    if not input_value and not system_message:
        msg = "The message you want to send to the model is empty."
        raise ValueError(msg)

    messages, runnable = build_messages_and_runnable(
        input_value=input_value, system_message=system_message, original_runnable=runnable
    )

    inputs: list | dict = messages or {}
    try:
        if config and config.get("output_parser") is not None:
            runnable = runnable | config["output_parser"]
        if config:
            runnable = runnable.with_config(
                {
                    "run_name": config.get("display_name", ""),
                    "project_name": config.get("get_project_name", lambda: "")(),
                    "callbacks": config.get("get_langchain_callbacks", list)(),
                }
            )
        if stream:
            return runnable.stream(inputs)
        message = runnable.invoke(inputs)
        return message.content if hasattr(message, "content") else message
    except Exception as e:
        if config and config.get("_get_exception_message") and (message := config["_get_exception_message"](e)):
            raise ValueError(message) from e
        raise

View file

@@ -0,0 +1,111 @@
from typing import cast

from pydantic import BaseModel, Field, create_model

from langflow.base.models.chat_result import get_chat_result
from langflow.custom import Component
from langflow.field_typing.constants import LanguageModel
from langflow.helpers.base_model import build_model_from_schema
from langflow.io import BoolInput, HandleInput, MessageTextInput, Output, StrInput, TableInput
from langflow.schema.data import Data


class StructuredOutputComponent(Component):
    display_name = "Structured Output"
    description = (
        "Transforms LLM responses into **structured data formats**. Ideal for extracting specific information "
        "or creating consistent outputs."
    )

    inputs = [
        HandleInput(
            name="llm",
            display_name="Language Model",
            info="The language model to use to generate the structured output.",
            input_types=["LanguageModel"],
        ),
        MessageTextInput(name="input_value", display_name="Input message"),
        StrInput(
            name="schema_name",
            display_name="Schema Name",
            info="Provide a name for the output data schema.",
        ),
        TableInput(
            name="output_schema",
            display_name="Output Schema",
            info="Define the structure and data types for the model's output.",
            table_schema=[
                {
                    "name": "name",
                    "display_name": "Name",
                    "type": "str",
                    "description": "Specify the name of the output field.",
                },
                {
                    "name": "description",
                    "display_name": "Description",
                    "type": "str",
                    "description": "Describe the purpose of the output field.",
                },
                {
                    "name": "type",
                    "display_name": "Type",
                    "type": "str",
                    "description": (
                        "Indicate the data type of the output field (e.g., str, int, float, bool, list, dict)."
                    ),
                    "default": "text",
                },
                {
                    "name": "multiple",
                    "display_name": "Multiple",
                    "type": "boolean",
                    "description": "Set to True if this output field should be a list of the specified type.",
                    "default": "False",
                },
            ],
        ),
        BoolInput(
            name="multiple",
            display_name="Generate Multiple",
            info="Set to True if the model should generate a list of outputs instead of a single output.",
        ),
    ]

    outputs = [
        Output(name="structured_output", display_name="Structured Output", method="build_structured_output"),
    ]

    def build_structured_output(self) -> Data:
        if not hasattr(self.llm, "with_structured_output"):
            msg = "Language model does not support structured output."
            raise TypeError(msg)
        if not self.output_schema:
            msg = "Output schema cannot be empty"
            raise ValueError(msg)
        _output_model = build_model_from_schema(self.output_schema)
        if self.multiple:
            output_model = create_model(
                self.schema_name,
                objects=(list[_output_model], Field(description=f"A list of {self.schema_name}.")),  # type: ignore[valid-type]
            )
        else:
            output_model = _output_model
        try:
            llm_with_structured_output = cast(LanguageModel, self.llm).with_structured_output(schema=output_model)  # type: ignore[valid-type, attr-defined]
        except NotImplementedError as exc:
            msg = f"{self.llm.__class__.__name__} does not support structured output."
            raise TypeError(msg) from exc
        config_dict = {
            "run_name": self.display_name,
            "project_name": self.get_project_name(),
            "callbacks": self.get_langchain_callbacks(),
        }
        output = get_chat_result(runnable=llm_with_structured_output, input_value=self.input_value, config=config_dict)
        if isinstance(output, BaseModel):
            output_dict = output.model_dump()
        else:
            msg = f"Output should be a Pydantic BaseModel, got {type(output)} ({output})"
            raise TypeError(msg)
        return Data(data=output_dict)

View file

@@ -8,15 +8,7 @@ from langflow.base.models.model import LCModelComponent
 from langflow.base.models.openai_constants import OPENAI_MODEL_NAMES
 from langflow.field_typing import LanguageModel
 from langflow.field_typing.range_spec import RangeSpec
-from langflow.inputs import (
-    BoolInput,
-    DictInput,
-    DropdownInput,
-    FloatInput,
-    IntInput,
-    SecretStrInput,
-    StrInput,
-)
+from langflow.inputs import BoolInput, DictInput, DropdownInput, FloatInput, IntInput, SecretStrInput, StrInput
 from langflow.inputs.inputs import HandleInput
@@ -49,7 +41,7 @@ class OpenAIModelComponent(LCModelComponent):
             advanced=True,
             info="The schema for the Output of the model. "
             "You must pass the word JSON in the prompt. "
-            "If left blank, JSON mode will be disabled.",
+            "If left blank, JSON mode will be disabled. [DEPRECATED]",
         ),
         DropdownInput(
             name="model_name",

View file

@@ -1,6 +1,71 @@
from typing import Any, TypedDict

from pydantic import BaseModel as PydanticBaseModel
from pydantic import ConfigDict, Field, create_model

TRUE_VALUES = ["true", "1", "t", "y", "yes"]


class SchemaField(TypedDict):
    name: str
    type: str
    description: str
    multiple: bool


class BaseModel(PydanticBaseModel):
    model_config = ConfigDict(populate_by_name=True)


def _get_type_annotation(type_str: str, *, multiple: bool) -> type:
    type_mapping = {
        "str": str,
        "int": int,
        "float": float,
        "bool": bool,
        "boolean": bool,
        "list": list[Any],
        "dict": dict[str, Any],
        "number": float,
        "text": str,
    }
    try:
        base_type = type_mapping[type_str]
    except KeyError as e:
        msg = f"Invalid type: {type_str}"
        raise ValueError(msg) from e
    if multiple:
        return list[base_type]  # type: ignore[valid-type]
    return base_type  # type: ignore[return-value]


def build_model_from_schema(schema: list[SchemaField]) -> type[PydanticBaseModel]:
    fields = {}
    for field in schema:
        field_name = field["name"]
        field_type_str = field["type"]
        description = field.get("description", "")
        multiple = field.get("multiple", False)
        multiple = coalesce_bool(multiple)
        field_type_annotation = _get_type_annotation(field_type_str, multiple=multiple)
        fields[field_name] = (field_type_annotation, Field(description=description))
    return create_model("OutputModel", **fields)


def coalesce_bool(value: Any) -> bool:
    """Coalesces the given value into a boolean.

    Args:
        value (Any): The value to be coalesced.

    Returns:
        bool: The coalesced boolean value.
    """
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        return value.lower() in TRUE_VALUES
    if isinstance(value, int):
        return bool(value)
    return False

(Nine file diffs suppressed because one or more lines are too long.)

View file

@@ -2,6 +2,7 @@ import warnings
 from collections.abc import AsyncIterator, Iterator
 from typing import Any, get_args

+from pandas import DataFrame
 from pydantic import Field, field_validator

 from langflow.inputs.validators import CoalesceBool
@@ -34,6 +35,8 @@ class TableInput(BaseInputMixin, MetadataTraceMixin, TableMixin, ListableInputMi
     @classmethod
     def validate_value(cls, v: Any, _info):
         # Check if value is a list of dicts
+        if isinstance(v, DataFrame):
+            v = v.to_dict(orient="records")
         if not isinstance(v, list):
             msg = f"TableInput value must be a list of dictionaries or Data. Value '{v}' is not a list."
             raise ValueError(msg)  # noqa: TRY004

View file

@@ -2,7 +2,7 @@ from enum import Enum

 from pydantic import BaseModel, ConfigDict, Field, field_validator, model_validator

-VALID_TYPES = ["date", "number", "text", "json", "integer", "int", "float", "str", "string"]
+VALID_TYPES = ["date", "number", "text", "json", "integer", "int", "float", "str", "string", "boolean"]


 class FormatterType(str, Enum):
@@ -10,6 +10,7 @@ class FormatterType(str, Enum):
     text = "text"
     number = "number"
     json = "json"
+    boolean = "boolean"


 class Column(BaseModel):

View file

@@ -52,17 +52,7 @@ def python_function(text: str) -> str:

 PYTHON_BASIC_TYPES = [str, bool, int, float, tuple, list, dict, set]

-DIRECT_TYPES = [
-    "str",
-    "bool",
-    "dict",
-    "int",
-    "float",
-    "Any",
-    "prompt",
-    "code",
-    "NestedDict",
-]
+DIRECT_TYPES = ["str", "bool", "dict", "int", "float", "Any", "prompt", "code", "NestedDict", "table"]

 LOADERS_INFO: list[dict[str, Any]] = [

View file

@@ -0,0 +1,240 @@
from unittest.mock import MagicMock, patch

import pytest
from pydantic import BaseModel

from langflow.components.helpers.structured_output import StructuredOutputComponent
from langflow.schema.data import Data


@pytest.fixture
def client():
    pass


class TestStructuredOutputComponent:
    # Ensure that the structured output is successfully generated with the correct
    # BaseModel instance returned by the mocked get_chat_result
    def test_successful_structured_output_generation_with_patch_with_config(self):
        class MockLanguageModel:
            def with_structured_output(self, schema):
                return self

            def with_config(self, config):
                return self

            def invoke(self, inputs):
                return self

        def mock_get_chat_result(runnable, input_value, config):
            class MockBaseModel(BaseModel):
                def model_dump(self):
                    return {"field": "value"}

            return MockBaseModel()

        component = StructuredOutputComponent(
            llm=MockLanguageModel(),
            input_value="Test input",
            schema_name="TestSchema",
            output_schema=[{"name": "field", "type": "str", "description": "A test field"}],
            multiple=False,
        )

        with patch("langflow.components.helpers.structured_output.get_chat_result", mock_get_chat_result):
            result = component.build_structured_output()
            assert isinstance(result, Data)
            assert result.data == {"field": "value"}

    # Raises TypeError when the language model does not support structured output
    def test_raises_value_error_for_unsupported_language_model(self):
        # Mocking an incompatible language model
        class MockLanguageModel:
            pass

        # Creating an instance of StructuredOutputComponent
        component = StructuredOutputComponent(
            llm=MockLanguageModel(),
            input_value="Test input",
            schema_name="TestSchema",
            output_schema=[{"name": "field", "type": "str", "description": "A test field"}],
            multiple=False,
        )

        with pytest.raises(TypeError, match="Language model does not support structured output."):
            component.build_structured_output()

    # Correctly builds the output model from the provided schema
    def test_correctly_builds_output_model(self):
        from langflow.helpers.base_model import build_model_from_schema
        from langflow.inputs.inputs import TableInput

        # Setup
        component = StructuredOutputComponent()
        schema = [
            {
                "name": "name",
                "display_name": "Name",
                "type": "str",
                "description": "Specify the name of the output field.",
            },
            {
                "name": "description",
                "display_name": "Description",
                "type": "str",
                "description": "Describe the purpose of the output field.",
            },
            {
                "name": "type",
                "display_name": "Type",
                "type": "str",
                "description": (
                    "Indicate the data type of the output field (e.g., str, int, float, bool, list, dict)."
                ),
            },
            {
                "name": "multiple",
                "display_name": "Multiple",
                "type": "boolean",
                "description": "Set to True if this output field should be a list of the specified type.",
            },
        ]
        component.output_schema = TableInput(name="output_schema", display_name="Output Schema", table_schema=schema)

        # Assertion
        output_model = build_model_from_schema(schema)
        assert isinstance(output_model, type)

    # Properly handles multiple outputs when 'multiple' is set to True
    def test_handles_multiple_outputs(self):
        from langflow.helpers.base_model import build_model_from_schema
        from langflow.inputs.inputs import TableInput

        # Setup
        component = StructuredOutputComponent()
        schema = [
            {
                "name": "name",
                "display_name": "Name",
                "type": "str",
                "description": "Specify the name of the output field.",
            },
            {
                "name": "description",
                "display_name": "Description",
                "type": "str",
                "description": "Describe the purpose of the output field.",
            },
            {
                "name": "type",
                "display_name": "Type",
                "type": "str",
                "description": (
                    "Indicate the data type of the output field (e.g., str, int, float, bool, list, dict)."
                ),
            },
            {
                "name": "multiple",
                "display_name": "Multiple",
                "type": "boolean",
                "description": "Set to True if this output field should be a list of the specified type.",
            },
        ]
        component.output_schema = TableInput(name="output_schema", display_name="Output Schema", table_schema=schema)
        component.multiple = True

        # Assertion
        output_model = build_model_from_schema(schema)
        assert isinstance(output_model, type)

    def test_empty_output_schema(self):
        component = StructuredOutputComponent(
            llm=MagicMock(),
            input_value="Test input",
            schema_name="EmptySchema",
            output_schema=[],
            multiple=False,
        )

        with pytest.raises(ValueError, match="Output schema cannot be empty"):
            component.build_structured_output()

    def test_invalid_output_schema_type(self):
        component = StructuredOutputComponent(
            llm=MagicMock(),
            input_value="Test input",
            schema_name="InvalidSchema",
            output_schema=[{"name": "field", "type": "invalid_type", "description": "Invalid field"}],
            multiple=False,
        )

        with pytest.raises(ValueError, match="Invalid type: invalid_type"):
            component.build_structured_output()

    @patch("langflow.components.helpers.structured_output.get_chat_result")
    def test_nested_output_schema(self, mock_get_chat_result):
        class ChildModel(BaseModel):
            child: str = "value"

        class ParentModel(BaseModel):
            parent: ChildModel = ChildModel()

        mock_llm = MagicMock()
        mock_llm.with_structured_output.return_value = mock_llm
        mock_get_chat_result.return_value = ParentModel(parent=ChildModel(child="value"))

        component = StructuredOutputComponent(
            llm=mock_llm,
            input_value="Test input",
            schema_name="NestedSchema",
            output_schema=[
                {
                    "name": "parent",
                    "type": "dict",
                    "description": "Parent field",
                    "fields": [{"name": "child", "type": "str", "description": "Child field"}],
                }
            ],
            multiple=False,
        )

        result = component.build_structured_output()
        assert isinstance(result, Data)
        assert result.data == {"parent": {"child": "value"}}

    @patch("langflow.components.helpers.structured_output.get_chat_result")
    def test_large_input_value(self, mock_get_chat_result):
        large_input = "Test input " * 1000

        class MockBaseModel(BaseModel):
            field: str = "value"

        mock_get_chat_result.return_value = MockBaseModel(field="value")

        component = StructuredOutputComponent(
            llm=MagicMock(),
            input_value=large_input,
            schema_name="LargeInputSchema",
            output_schema=[{"name": "field", "type": "str", "description": "A test field"}],
            multiple=False,
        )

        result = component.build_structured_output()
        assert isinstance(result, Data)
        assert result.data == {"field": "value"}
        mock_get_chat_result.assert_called_once()

    def test_invalid_llm_config(self):
        component = StructuredOutputComponent(
            llm="invalid_llm",  # Not a proper LLM instance
            input_value="Test input",
            schema_name="InvalidLLMSchema",
            output_schema=[{"name": "field", "type": "str", "description": "A test field"}],
            multiple=False,
        )

        with pytest.raises(TypeError, match="Language model does not support structured output."):
            component.build_structured_output()

View file

@@ -0,0 +1,160 @@
# Generated by qodo Gen
from typing import Any

import pytest
from pydantic import BaseModel
from pydantic_core import PydanticUndefined

from langflow.helpers.base_model import build_model_from_schema


class TestBuildModelFromSchema:
    # Successfully creates a Pydantic model from a valid schema
    def test_create_model_from_valid_schema(self):
        schema = [
            {"name": "field1", "type": "str", "default": "default_value", "description": "A string field"},
            {"name": "field2", "type": "int", "default": 0, "description": "An integer field"},
            {"name": "field3", "type": "bool", "default": False, "description": "A boolean field"},
        ]
        model = build_model_from_schema(schema)
        instance = model(field1="test", field2=123, field3=True)
        assert instance.field1 == "test"
        assert instance.field2 == 123
        assert instance.field3 is True

    # Handles empty schema gracefully without errors
    def test_handle_empty_schema(self):
        schema = []
        model = build_model_from_schema(schema)
        instance = model()
        assert instance is not None

    # Ensure the model created from schema has the expected attributes by checking on an instance
    def test_handles_multiple_fields_fixed_with_instance_check(self):
        schema = [
            {"name": "field1", "type": "str", "default": "default_value1"},
            {"name": "field2", "type": "int", "default": 42},
            {"name": "field3", "type": "list", "default": [1, 2, 3]},
            {"name": "field4", "type": "dict", "default": {"key": "value"}},
        ]
        model = build_model_from_schema(schema)
        model_instance = model(field1="test", field2=123, field3=[1, 2, 3], field4={"key": "value"})
        assert issubclass(model, BaseModel)
        assert hasattr(model_instance, "field1")
        assert hasattr(model_instance, "field2")
        assert hasattr(model_instance, "field3")
        assert hasattr(model_instance, "field4")

    # Correctly accesses descriptions using the recommended fix
    def test_correctly_accesses_descriptions_recommended_fix(self):
        schema = [
            {"name": "field1", "type": "str", "default": "default_value1", "description": "Description for field1"},
            {"name": "field2", "type": "int", "default": 42, "description": "Description for field2"},
            {"name": "field3", "type": "list", "default": [1, 2, 3], "description": "Description for field3"},
            {"name": "field4", "type": "dict", "default": {"key": "value"}, "description": "Description for field4"},
        ]
        model = build_model_from_schema(schema)
        assert model.model_fields["field1"].description == "Description for field1"
        assert model.model_fields["field2"].description == "Description for field2"
        assert model.model_fields["field3"].description == "Description for field3"
        assert model.model_fields["field4"].description == "Description for field4"

    # Supports both single and multiple type annotations
    def test_supports_single_and_multiple_type_annotations(self):
        schema = [
            {"name": "field1", "type": "str", "default": "default_value1", "description": "Description 1"},
            {"name": "field2", "type": "list", "default": [1, 2, 3], "description": "Description 2", "multiple": True},
            {"name": "field3", "type": "int", "default": 100, "description": "Description 3"},
        ]
        model_type = build_model_from_schema(schema)
        assert issubclass(model_type, BaseModel)

    # Raises ValueError for unknown field types instead of defaulting to Any
    def test_manages_unknown_field_types(self):
        schema = [
            {"name": "field1", "type": "str", "default": "default_value1"},
            {"name": "field2", "type": "unknown_type", "default": "default_value2"},
        ]
        with pytest.raises(ValueError):
            build_model_from_schema(schema)

    # Confirms that the function raises a specific exception for invalid input
    def test_raises_error_for_invalid_input_different_exception_with_specific_exception(self):
        with pytest.raises(ValueError):
            schema = [{"name": "field1", "type": "invalid_type", "default": "default_value"}]
            build_model_from_schema(schema)

    # Processes schemas with missing optional keys like description or multiple
    def test_process_schema_missing_optional_keys_updated(self):
        schema = [
            {"name": "field1", "type": "str", "default": "default_value1"},
            {"name": "field2", "type": "int", "default": 0, "description": "Field 2 description"},
            {"name": "field3", "type": "list", "default": [], "multiple": True},
            {"name": "field4", "type": "dict", "default": {}, "description": "Field 4 description", "multiple": True},
        ]
        result_model = build_model_from_schema(schema)
        assert result_model.__annotations__["field1"] == str  # noqa: E721
        assert result_model.model_fields["field1"].description == ""
        assert result_model.__annotations__["field2"] == int  # noqa: E721
        assert result_model.model_fields["field2"].description == "Field 2 description"
        assert result_model.__annotations__["field3"] == list[list[Any]]
        assert result_model.model_fields["field3"].description == ""
        assert result_model.__annotations__["field4"] == list[dict[str, Any]]
        assert result_model.model_fields["field4"].description == "Field 4 description"

    # Deals with schemas containing fields with None as default values
    def test_schema_fields_with_none_default(self):
        schema = [
            {"name": "field1", "type": "str", "default": None, "description": "Field 1 description"},
            {"name": "field2", "type": "int", "default": None, "description": "Field 2 description"},
            {"name": "field3", "type": "list", "default": None, "description": "Field 3 description", "multiple": True},
        ]
        model = build_model_from_schema(schema)
        assert model.model_fields["field1"].default is PydanticUndefined
        assert model.model_fields["field2"].default is PydanticUndefined
        assert model.model_fields["field3"].default is PydanticUndefined

    # Checks for proper handling of nested list and dict types
    def test_nested_list_and_dict_types_handling(self):
        schema = [
            {"name": "field1", "type": "list", "default": [], "description": "list field", "multiple": True},
            {"name": "field2", "type": "dict", "default": {}, "description": "Dict field"},
        ]
        model_type = build_model_from_schema(schema)
        assert issubclass(model_type, BaseModel)

    # Verifies that the function can handle large schemas efficiently
    def test_handle_large_schemas_efficiently(self):
        schema = [
            {"name": "field1", "type": "str", "default": "default_value1", "description": "Description 1"},
            {"name": "field2", "type": "int", "default": 100, "description": "Description 2"},
            {"name": "field3", "type": "list", "default": [1, 2, 3], "description": "Description 3", "multiple": True},
            {"name": "field4", "type": "dict", "default": {"key": "value"}, "description": "Description 4"},
        ]
        model_type = build_model_from_schema(schema)
        assert issubclass(model_type, BaseModel)

    # Ensures that the function returns a valid Pydantic model class
    def test_returns_valid_model_class(self):
        schema = [
            {"name": "field1", "type": "str", "default": "default_value1", "description": "Description for field1"},
            {"name": "field2", "type": "int", "default": 42, "description": "Description for field2", "multiple": True},
        ]
        model_class = build_model_from_schema(schema)
        assert issubclass(model_class, BaseModel)

    # Validates that the last occurrence of a duplicate field name defines the type in the schema
    def test_no_duplicate_field_names_fixed_fixed(self):
        schema = [
            {"name": "field1", "type": "str", "default": "default_value1"},
            {"name": "field2", "type": "int", "default": 0},
            {"name": "field1", "type": "float", "default": 0.0},  # Duplicate field name
        ]
        model = build_model_from_schema(schema)
        assert model.__annotations__["field1"] == float  # noqa: E721
        assert model.__annotations__["field2"] == int  # noqa: E721