
import Admonition from "@theme/Admonition";

# Large Language Models (LLMs)

Large Language Model (LLM) components are foundational to Langflow. They provide a uniform interface for interacting with LLMs from various providers, including OpenAI, Cohere, and Hugging Face. Langflow uses LLMs extensively across its chains and agents to generate text from prompts or other inputs.

---

## Anthropic

This is a wrapper for Anthropic's large language models. Learn more at [Anthropic](https://www.anthropic.com).

- **anthropic_api_key:** This key authenticates and authorizes access to the Anthropic API.
- **anthropic_api_url:** This URL connects to the Anthropic API.
- **temperature:** This parameter adjusts the randomness level in text generation. Set this to a non-negative number.
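
To make the effect of `temperature` concrete, the sketch below rescales a set of token scores before converting them to probabilities. It is a generic illustration of temperature scaling, not Anthropic's implementation: the function name, the example scores, and the choice to treat `0` as greedy selection are all assumptions made for this example.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, rescaled by temperature."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Assumption for this sketch: 0 means greedy selection,
        # so all probability goes to the highest-scoring token.
        best = logits.index(max(logits))
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.2)  # sharper distribution
hot = softmax_with_temperature(logits, 2.0)   # flatter distribution
print(cold)
print(hot)
```

Lower values concentrate probability on the most likely token, while higher values spread it across more candidates, which is why higher temperatures produce more varied text.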

---

## ChatAnthropic

This is a wrapper for Anthropic's large language model designed for chat-based interactions. Learn more at [Anthropic](https://www.anthropic.com).

- **anthropic_api_key:** This key authenticates and authorizes access to the Anthropic API.
- **anthropic_api_url:** This URL connects to the Anthropic API.
- **temperature:** This parameter adjusts the randomness level in text generation. Set this to a non-negative number.

---

## CTransformers

`CTransformers` provides access to Transformer models implemented in C/C++ using the [GGML](https://github.com/ggerganov/ggml) library.

<Admonition type="info">
  Ensure the `ctransformers` Python package is installed. Discover more about
  installation, supported models, and usage
  [here](https://github.com/marella/ctransformers).
</Admonition>

- **config:** Configuration options for the Transformer model. See the default settings and available options at [config](https://github.com/marella/ctransformers#config).

```json
{
  "top_k": 40,
  "top_p": 0.95,
  "temperature": 0.8,
  "repetition_penalty": 1.1,
  "last_n_tokens": 64,
  "seed": -1,
  "max_new_tokens": 256,
  "stop": null,
  "stream": false,
  "reset": true,
  "batch_size": 8,
  "threads": -1,
  "context_length": -1,
  "gpu_layers": 0
}
```

- **model**: The file path, directory, or Hugging Face Hub model repository name.
- **model_file**: The specific model file name within the repository or directory.
- **model_type**: The type of transformer model used. For further information, visit [ctransformers](https://github.com/marella/ctransformers).
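
When only a few settings need to change, it can help to start from the defaults in the config block above and override selected keys. The following is a minimal sketch in plain Python, assuming the config is handled as a dictionary. The `build_config` helper is hypothetical and is not part of `ctransformers` or Langflow.

```python
import json

# Defaults copied from the config block above.
DEFAULT_CONFIG = {
    "top_k": 40,
    "top_p": 0.95,
    "temperature": 0.8,
    "repetition_penalty": 1.1,
    "last_n_tokens": 64,
    "seed": -1,
    "max_new_tokens": 256,
    "stop": None,
    "stream": False,
    "reset": True,
    "batch_size": 8,
    "threads": -1,
    "context_length": -1,
    "gpu_layers": 0,
}


def build_config(overrides=None):
    """Return the defaults with overrides applied, rejecting unknown keys."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_CONFIG)
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return {**DEFAULT_CONFIG, **overrides}


config = build_config({"temperature": 0.2, "max_new_tokens": 512})
print(json.dumps(config, indent=2))  # JSON suitable for the config field
```

Rejecting unknown keys is a design choice for this sketch: it surfaces typos such as `max_tokens` in place of `max_new_tokens` early instead of silently ignoring them.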

## ChatOpenAI Component

This component interfaces with [OpenAI's](https://openai.com) large language models, supporting a variety of tasks such as chatbots, generative question-answering, and summarization.

- **max_tokens**: The maximum number of tokens to generate for each completion. Set to `-1` to generate as many tokens as possible, based on the model's context size. The default is `256`.
- **model_kwargs**: A dictionary of additional model parameters that are not exposed as explicit fields on this component.
- **model_name**: Specifies the OpenAI chat model in use.
- **openai_api_base**: The base URL for accessing the OpenAI API.
- **openai_api_key**: The API key required for authentication with the OpenAI API.
- **temperature**: Adjusts the randomness level of the text generation. This should be a non-negative number, defaulting to `0.7`.
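
The `-1` setting for `max_tokens` described above amounts to using whatever room is left in the context window after the prompt. The sketch below illustrates that arithmetic only. The `resolve_max_tokens` helper and the context size of 4096 are hypothetical and are not taken from the OpenAI or Langflow code.

```python
def resolve_max_tokens(max_tokens, context_size, prompt_tokens):
    """Return the completion budget implied by max_tokens.

    -1 means "as many tokens as the context window still allows".
    """
    if max_tokens == -1:
        remaining = context_size - prompt_tokens
        if remaining <= 0:
            raise ValueError("prompt already fills the context window")
        return remaining
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive or -1")
    return max_tokens


# Hypothetical numbers: a 4096-token context and a 1000-token prompt.
print(resolve_max_tokens(-1, context_size=4096, prompt_tokens=1000))
print(resolve_max_tokens(256, context_size=4096, prompt_tokens=1000))
```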

## Cohere Component

A wrapper for accessing [Cohere's](https://cohere.com) large language models.

- **cohere_api_key**: The API key needed for Cohere service authentication.
- **max_tokens**: The limit on the number of tokens to generate per request, defaulting to `256`.
- **temperature**: Adjusts the randomness level in text generations. This should be a non-negative number, defaulting to `0.75`.

## HuggingFaceHub Component

A component facilitating access to models hosted on the [HuggingFace Hub](https://www.huggingface.co/models).

- **huggingfacehub_api_token**: The token required for API authentication.
- **model_kwargs**: Parameters passed to the model.
- **repo_id**: Specifies the model repository, defaulting to `gpt2`.
- **task**: The specific task to execute with the model, returning either `generated_text` or `summary_text`.
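
The `task` setting determines which field of the response carries the output text. The sketch below shows one way to read that field, assuming responses arrive as a list of dictionaries. The task names in the mapping, the `extract_text` helper, and the mocked responses are assumptions for illustration. No request is sent to the Hub.

```python
# Assumed mapping from task name to the response field that holds the text.
TASK_OUTPUT_FIELD = {
    "text-generation": "generated_text",
    "text2text-generation": "generated_text",
    "summarization": "summary_text",
}


def extract_text(task, response):
    """Pull the output text out of a response for the given task."""
    field = TASK_OUTPUT_FIELD.get(task)
    if field is None:
        raise ValueError(f"unsupported task: {task!r}")
    return response[0][field]


# Mocked responses standing in for real API output.
generation = [{"generated_text": "Langflow is a visual builder."}]
summary = [{"summary_text": "A short summary."}]

print(extract_text("text-generation", generation))
print(extract_text("summarization", summary))
```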

## LlamaCpp Component

This component provides access to `llama.cpp` models.

- **echo**: Whether to echo the input prompt, defaulting to `False`.
- **f16_kv**: Indicates if half-precision should be used for the key/value cache, defaulting to `True`.
- **last_n_tokens_size**: The lookback size for applying repeat penalties, defaulting to `64`.
- **logits_all**: Whether to return logits for all tokens or just the last one, defaulting to `False`.
- **logprobs**: The number of log probabilities to return. If set to `None`, no probabilities are returned.
- **lora_base**: The path to the base Llama LoRA model.
- **lora_path**: The specific path to the Llama LoRA model. If set to `None`, no LoRA model is loaded.
- **max_tokens**: The maximum number of tokens to generate in one session, defaulting to `256`.
- **model_path**: The file path to the Llama model.
- **n_batch**: The number of tokens processed in parallel, defaulting to `8`.
- **n_ctx**: The context window size for tokens, defaulting to `512`.
- **repeat_penalty**: The penalty applied to repeated tokens, defaulting to `1.1`.
- **seed**: The seed for random number generation. If set to `-1`, a random seed is used.
- **stop**: A list of stop strings that terminate generation when encountered.
- **streaming**: Indicates whether to stream results token by token, defaulting to `True`.
- **suffix**: A suffix appended to generated text. If set to `None`, no suffix is appended.
- **tags**: Tags added to the execution trace for monitoring.
- **temperature**: The sampling temperature, defaulting to `0.8`.
- **top_k**: The number of highest-probability tokens considered for top-k sampling, defaulting to `40`.
- **top_p**: The cumulative probability threshold for top-p sampling, defaulting to `0.95`.
- **use_mlock**: Forces the system to retain the model in RAM, defaulting to `False`.
- **use_mmap**: Whether to memory-map the model file when loading it, defaulting to `True`.
- **verbose**: Whether to print detailed output about internal state to aid debugging, defaulting to `False`.
- **vocab_only**: Loads only the vocabulary without model weights, defaulting to `False`.
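
The `top_k` and `top_p` settings both narrow the set of candidate tokens before one is sampled. The sketch below is a simplified illustration of that filtering on a toy probability table. It is not the `llama.cpp` sampler, and the `filter_top_k_top_p` helper and the example probabilities are hypothetical.

```python
def filter_top_k_top_p(probs, top_k=40, top_p=0.95):
    """Keep the top_k most likely tokens, then the smallest prefix of them
    whose cumulative probability reaches top_p, and renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Hypothetical next-token probabilities.
probs = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}
kept = filter_top_k_top_p(probs, top_k=3, top_p=0.75)
print(kept)
```

With these numbers, `top_k=3` drops `zebra`, and `top_p=0.75` then stops after `the` and `a` because their combined probability already exceeds the threshold.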

## VertexAI Component

This component provides access to [Google Vertex AI](https://cloud.google.com/vertex-ai) large language models.

- **credentials**: Custom credentials used for API interactions.
- **location**: The location used for API calls, defaulting to `us-central1`.
- **max_output_tokens**: Limits the output tokens per prompt, defaulting to `128`.
- **model_name**: The name of the Vertex AI model in use, defaulting to `text-bison`.
- **project**: The default Google Cloud Platform project for API calls.
- **request_parallelism**: The level of request parallelism for VertexAI model interactions, defaulting to `5`.
- **temperature**: Adjusts the randomness level in text generations, defaulting to `0`.
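- **top_k**: The number of highest-probability tokens considered at each sampling step.
- **top_p**: The cumulative probability threshold for top-p sampling. Tokens are considered from most to least likely until their probabilities sum to this value, defaulting to `0.95`.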
- **tuned_model_name**: Specifies a tuned model name, which overrides the default model name if provided.
- **verbose**: Whether to print detailed output to aid debugging, defaulting to `False`.

---