Merge branch 'main' into release

Commit 9fd026aa54 · 5 changed files with 317 additions and 5 deletions
# Agents

Agents are components that use reasoning to make decisions and take actions, designed to autonomously perform tasks or provide services with some degree of “freedom” (or agency). They combine the power of LLM chaining processes with access to external tools, such as APIs, to interact with applications and accomplish tasks.
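As a toy illustration of that loop — plain Python, not Langflow or LangChain code, with invented tool names and an invented word-overlap scoring rule standing in for LLM reasoning — an agent can pick a tool purely from its textual description and run it:

```python
# Illustrative sketch only: a toy "agent" that chooses a tool from its
# textual description, runs it, and returns the observation.

TOOLS = {
    "calculator": {
        "description": "evaluate arithmetic and math expressions",
        "run": lambda q: str(sum(int(n) for n in q.split() if n.isdigit())),
    },
    "search": {
        "description": "search the web for facts and current events",
        "run": lambda q: f"top result for: {q}",
    },
}

def pick_tool(task: str) -> str:
    """Pick the tool whose description shares the most words with the task."""
    words = set(task.lower().split())
    return max(TOOLS, key=lambda name: len(words & set(TOOLS[name]["description"].split())))

def run_agent(task: str) -> str:
    """One step of the loop: choose a tool, execute it, return its output."""
    tool = pick_tool(task)
    return TOOLS[tool]["run"](task)
```

A real agent replaces the word-overlap heuristic with an LLM call and iterates (reason, act, observe) until the task is done.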
---

### AgentInitializer

The `AgentInitializer` component is a quick way to construct a zero-shot agent from a language model (LLM) and tools.

**Params**

- **LLM:** Language Model to use in the `AgentInitializer`.
- **Memory:** Used to add memory functionality to an agent. It allows the agent to store and retrieve information from previous conversations.
- **Tools:** Tools that the agent will have access to.
- **Agent:** The type of agent to be instantiated. Currently supported: `zero-shot-react-description`, `react-docstore`, `self-ask-with-search`, `conversational-react-description`, and `openai-functions`.
---

### CSVAgent

A `CSVAgent` is an agent designed to interact with CSV (Comma-Separated Values) files. CSV files are a common format for storing tabular data, where each row represents a record and each column represents a field. The CSV agent can perform various tasks, such as reading and writing CSV files, processing the data, and generating tables. It can extract information from the CSV file, manipulate the data, and perform operations like filtering, sorting, and aggregating.

**Params**

- **LLM:** Language Model to use in the `CSVAgent`.
- **path:** The file path to the CSV data.
---

### JSONAgent

The `JSONAgent` deals with JSON (JavaScript Object Notation) data. Similar to the `CSVAgent`, it works with a language model (LLM) and a toolkit designed for JSON manipulation. This agent can iteratively explore a JSON blob to find the information needed to answer the user's question. It can list keys, get values, and navigate through the structure of the JSON object.

**Params**

- **LLM:** Language Model to use in the `JSONAgent`.
- **Toolkit:** Toolkit that the agent will have access to.
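The iterative exploration described above can be sketched in plain Python (this is not the agent's actual toolkit; the sample data and helper names are invented): list the keys available at a path, then fetch a value, one step at a time.

```python
# Hypothetical sketch of iterative JSON exploration: the two primitive
# operations the agent repeats until it finds its answer.

def get_value(blob, path):
    """Follow a tuple of keys down into a nested JSON-like structure."""
    node = blob
    for key in path:
        node = node[key]
    return node

def list_keys(blob, path=()):
    """List the keys available at a given path, or [] for non-objects."""
    node = get_value(blob, path)
    return sorted(node) if isinstance(node, dict) else []

data = {"user": {"name": "Ada", "roles": ["admin"]}, "active": True}
```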
---

### SQLAgent

A `SQLAgent` is an agent designed to interact with SQL databases. It is capable of performing various tasks, such as querying the database, retrieving data, and executing SQL statements. The agent can provide information about the structure of the database, including the tables and their schemas, and can perform operations like inserting, updating, and deleting data. The SQL agent is a helpful tool for managing and working with SQL databases efficiently.

**Params**

- **LLM:** Language Model to use in the `SQLAgent`.
- **database_uri:** A string representing the connection URI for the SQL database.
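The kinds of operations described above — inspecting the schema, then querying data — look like this when run directly against an in-memory SQLite database (the table and rows are invented for illustration; the agent would generate and execute statements like these itself):

```python
import sqlite3

# Set up a throwaway database (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("Ada",), ("Grace",)])

# Inspect the database structure (the schema the agent can report).
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# Query and retrieve data.
names = [row[0] for row in conn.execute("SELECT name FROM users ORDER BY id")]
```

For a file-backed version of the same setup, a SQLAlchemy-style `database_uri` would look like `sqlite:///path/to/database.db`.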
---

### VectorStoreAgent

The `VectorStoreAgent` is designed to work with a vector store, a data structure used for storing and querying vector-based representations of data. The `VectorStoreAgent` can query the vector store to find relevant information based on user inputs.

**Params**

- **LLM:** Language Model to use in the `VectorStoreAgent`.
- **Vector Store Info:** `VectorStoreInfo` to use in the `VectorStoreAgent`.
---

### VectorStoreRouterAgent

The `VectorStoreRouterAgent` is a custom agent that takes a vector store router as input. It is typically used when there’s a need to retrieve information from multiple vector stores. These can be connected through a `VectorStoreRouterToolkit` and sent over to the `VectorStoreRouterAgent`. An agent configured with multiple vector stores can route queries to the appropriate store based on the context.

**Params**

- **LLM:** Language Model to use in the `VectorStoreRouterAgent`.
- **Vector Store Router Toolkit:** `VectorStoreRouterToolkit` to use in the `VectorStoreRouterAgent`.
---

### ZeroShotAgent

The `ZeroShotAgent` is an agent that uses the ReAct framework to determine which tool to use based solely on each tool's description. It can be configured with any number of tools and requires a description for each. It is designed to be the most general-purpose action agent, using an `LLMChain` to determine which actions to take and in what order.

**Params**

- **Allowed Tools:** Tools that the agent will have access to.
- **LLM Chain:** LLM Chain to be used by the agent.
import ThemedImage from "@theme/ThemedImage";
import useBaseUrl from "@docusaurus/useBaseUrl";
import ZoomableImage from "/src/theme/ZoomableImage.js";

# Chains

Chains, in the context of language models, refer to a series of calls made to a language model, allowing the output of one call to be used as the input for another. Different types of chains allow for different levels of complexity. Chains are useful for creating pipelines and executing specific scenarios.
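The idea reduces to function composition: each step's output becomes the next step's input. A minimal sketch in plain Python (real chains wire LLM calls together the same way; the string-processing steps here are just stand-ins):

```python
# A chain as plain function composition: run each step on the previous
# step's output and return the final value.

def chain(*steps):
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run

# Example pipeline: three small "calls" composed into one chain.
slugify = chain(str.strip, str.lower, lambda s: s.replace(" ", "-"))
```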
---

### CombineDocsChain

The `CombineDocsChain` incorporates methods to combine or aggregate loaded documents for question-answering functionality.

:::info

Works as a proxy of LangChain’s [documents](https://python.langchain.com/docs/modules/chains/document/) chains generated by the `load_qa_chain` function.

:::

**Params**

- **LLM:** Language Model to use in the chain.
- **chain_type:** The chain type to be used. Each one applies a different “combination strategy”:
  - **stuff**: The stuff [documents](https://python.langchain.com/docs/modules/chains/document/stuff) chain (“stuff” as in “to stuff” or “to fill”) is the most straightforward of the document chains. It takes a list of documents, inserts them all into a prompt, and passes that prompt to an LLM. This chain is well-suited for applications where documents are small and only a few are passed in for most calls.
  - **map_reduce**: The map-reduce [documents](https://python.langchain.com/docs/modules/chains/document/map_reduce) chain first applies an LLM chain to each document individually (the Map step), treating the chain output as a new document. It then passes all the new documents to a separate combine-documents chain to get a single output (the Reduce step). It can optionally first compress, or collapse, the mapped documents to make sure they fit in the combine-documents chain (which will often pass them to an LLM). This compression step is performed recursively if necessary.
  - **map_rerank**: The map re-rank [documents](https://python.langchain.com/docs/modules/chains/document/map_rerank) chain runs an initial prompt on each document that not only tries to complete a task but also gives a score for how certain it is in its answer. The highest-scoring response is returned.
  - **refine**: The refine [documents](https://python.langchain.com/docs/modules/chains/document/refine) chain constructs a response by looping over the input documents and iteratively updating its answer. For each document, it passes all non-document inputs, the current document, and the latest intermediate answer to an LLM chain to get a new answer.

    Since the refine chain only passes a single document to the LLM at a time, it is well-suited for tasks that require analyzing more documents than can fit in the model's context. The obvious tradeoff is that this chain makes far more LLM calls than, for example, the stuff documents chain. Certain tasks are also difficult to accomplish iteratively; for example, the refine chain can perform poorly when documents frequently cross-reference one another or when a task requires detailed information from many documents.
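The contrast between the single-call and iterative strategies can be shown with a toy sketch, where a fake "LLM" that merely wraps its prompt stands in for a real model call (this is not the chains' actual implementation):

```python
# Toy contrast of two combination strategies with a fake "LLM".

def fake_llm(prompt: str) -> str:
    return f"answer({prompt})"

def stuff(docs, question):
    """Stuff: insert every document into one prompt and make a single call."""
    return fake_llm(question + " | " + " ".join(docs))

def refine(docs, question):
    """Refine: one call per document, each updating the previous answer."""
    answer = ""
    for doc in docs:
        answer = fake_llm(f"{question} | doc={doc} | so_far={answer}")
    return answer
```

With `n` documents, `stuff` makes one model call while `refine` makes `n` — the tradeoff described above.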
---

### ConversationChain

The `ConversationChain` is a straightforward chain for interactive conversations with a language model, making it ideal for chatbots or virtual assistants. It allows for dynamic conversations, question-answering, and complex dialogues.

**Params**

- **LLM:** Language Model to use in the chain.
- **Memory:** Default memory store.
- **input_key:** Used to specify the key under which the user input will be stored in the conversation memory. It allows you to provide the user's input to the chain for processing and generating a response.
- **output_key:** Used to specify the key under which the generated response will be stored in the conversation memory. It allows you to retrieve the response using the specified key.
- **verbose:** Controls the level of detail in the output of the chain. When set to `True`, the chain prints some of its internal state while running, which can be helpful for debugging and understanding its behavior. When set to `False`, verbose output is suppressed — defaults to `False`.
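How `input_key` and `output_key` shape what is written to memory can be sketched with a fake echo model standing in for the LLM (class and key names here are illustrative, not Langflow's implementation):

```python
# Sketch: each conversational turn is stored in memory under the
# configured input_key and output_key.

class ToyConversation:
    def __init__(self, input_key="input", output_key="response"):
        self.input_key = input_key
        self.output_key = output_key
        self.memory = []  # one dict per conversational turn

    def run(self, user_input: str) -> str:
        reply = f"echo: {user_input}"  # stand-in for the LLM call
        self.memory.append({self.input_key: user_input, self.output_key: reply})
        return reply
```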
---

### ConversationalRetrievalChain

The `ConversationalRetrievalChain` extracts information and provides answers by combining document search and question-answering abilities.

:::info

A retriever is a component that finds documents based on a query. It doesn't store the documents themselves, but it returns the ones that match the query.

:::

**Params**

- **LLM:** Language Model to use in the chain.
- **Memory:** Default memory store.
- **Retriever:** The retriever used to fetch relevant documents.
- **chain_type:** The chain type to be used. Each one applies a different “combination strategy”:
  - **stuff**: The stuff [documents](https://python.langchain.com/docs/modules/chains/document/stuff) chain (“stuff” as in “to stuff” or “to fill”) is the most straightforward of the document chains. It takes a list of documents, inserts them all into a prompt, and passes that prompt to an LLM. This chain is well-suited for applications where documents are small and only a few are passed in for most calls.
  - **map_reduce**: The map-reduce [documents](https://python.langchain.com/docs/modules/chains/document/map_reduce) chain first applies an LLM chain to each document individually (the Map step), treating the chain output as a new document. It then passes all the new documents to a separate combine-documents chain to get a single output (the Reduce step). It can optionally first compress, or collapse, the mapped documents to make sure they fit in the combine-documents chain (which will often pass them to an LLM). This compression step is performed recursively if necessary.
  - **map_rerank**: The map re-rank [documents](https://python.langchain.com/docs/modules/chains/document/map_rerank) chain runs an initial prompt on each document that not only tries to complete a task but also gives a score for how certain it is in its answer. The highest-scoring response is returned.
  - **refine**: The refine [documents](https://python.langchain.com/docs/modules/chains/document/refine) chain constructs a response by looping over the input documents and iteratively updating its answer. For each document, it passes all non-document inputs, the current document, and the latest intermediate answer to an LLM chain to get a new answer.

    Since the refine chain only passes a single document to the LLM at a time, it is well-suited for tasks that require analyzing more documents than can fit in the model's context. The obvious tradeoff is that this chain makes far more LLM calls than, for example, the stuff documents chain. Certain tasks are also difficult to accomplish iteratively; for example, the refine chain can perform poorly when documents frequently cross-reference one another or when a task requires detailed information from many documents.

- **return_source_documents:** Used to specify whether or not to include the source documents that were used to answer the question in the output. When set to `True`, source documents will be included in the output along with the generated answer. This can be useful for providing additional context or references to the user — defaults to `True`.
- **verbose:** Whether or not to run in verbose mode. In verbose mode, intermediate logs will be printed to the console — defaults to `False`.
---

### LLMChain

The `LLMChain` is a straightforward chain that adds functionality around language models. It combines a prompt template with a language model: input variables are used to format the prompt template, the formatted prompt is sent to the language model, and the generated output is returned as the result of the `LLMChain`.

**Params**

- **LLM:** Language Model to use in the chain.
- **Memory:** Default memory store.
- **Prompt**: Prompt template object to use in the chain.
- **output_key:** Used to specify which key in the LLM output dictionary should be returned as the final output. By default, the `LLMChain` returns both the input and output key values — defaults to `text`.
- **verbose:** Whether or not to run in verbose mode. In verbose mode, intermediate logs will be printed to the console — defaults to `False`.
---

### LLMMathChain

The `LLMMathChain` combines a language model (LLM) and a math calculation component. It allows the user to input math problems and get the corresponding solutions.

The `LLMMathChain` works by using the language model, via an `LLMChain`, to understand the input math problem and generate a math expression. It then passes this expression to the math component, which evaluates it and returns the result.

**Params**

- **LLM:** Language Model to use in the chain.
- **LLMChain:** LLM Chain to use in the chain.
- **Memory:** Default memory store.
- **input_key:** Used to specify the input value for the mathematical calculation. It allows you to provide the specific values or variables to use in the calculation — defaults to `question`.
- **output_key:** Used to specify the key under which the output of the mathematical calculation will be stored. It allows you to retrieve the result of the calculation using the specified key — defaults to `answer`.
- **verbose:** Whether or not to run in verbose mode. In verbose mode, intermediate logs will be printed to the console — defaults to `False`.
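The "math component" half of this chain can be sketched as a safe arithmetic evaluator over the expression the LLM step would produce — stdlib only, and an assumption about the design rather than the chain's actual implementation:

```python
import ast
import operator

# Safely evaluate an arithmetic expression such as "3 * (4 + 5)" by walking
# its AST, supporting only a whitelist of operators (no eval()).

OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def evaluate(expression: str):
    def walk(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp):
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval").body)
```

Whitelisting AST nodes avoids the injection risk of evaluating model-generated text with `eval`.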
---

### RetrievalQA

`RetrievalQA` is a chain used to find relevant documents or information to answer a given query. The retriever is responsible for returning the relevant documents based on the query, and the QA component then extracts the answer from those documents. The retrieval QA system combines the capabilities of both the retriever and the QA component to provide accurate and relevant answers to user queries.

:::info

A retriever is a component that finds documents based on a query. It doesn't store the documents themselves, but it returns the ones that match the query.

:::

**Params**

- **Combine Documents Chain:** Chain used to combine the documents.
- **Memory:** Default memory store.
- **Retriever:** The retriever used to fetch relevant documents.
- **input_key:** Used to specify the key in the input data that contains the question. The question is retrieved from the input data under this key and passed to the question-answering model — defaults to `query`.
- **output_key:** Used to specify the key in the output data where the generated answer will be stored. The answer can be retrieved from the output data under this key — defaults to `result`.
- **return_source_documents:** Used to specify whether or not to include the source documents that were used to answer the question in the output. When set to `True`, source documents will be included in the output along with the generated answer. This can be useful for providing additional context or references to the user — defaults to `True`.
- **verbose:** Whether or not to run in verbose mode. In verbose mode, intermediate logs will be printed to the console — defaults to `False`.
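A toy end-to-end sketch of the retrieve-then-answer flow (word-overlap ranking stands in for a vector store, and returning the best match stands in for the LLM answer step; the documents and function names are invented):

```python
# Toy retrieval QA: rank documents by word overlap with the query, then
# return the best match plus its sources, mirroring the output keys above.

DOCS = [
    "Paris is the capital of France.",
    "The Nile is a river in Africa.",
]

def retrieve(query, k=1):
    """Return the k documents sharing the most words with the query."""
    words = set(query.lower().split())
    ranked = sorted(
        DOCS,
        key=lambda d: len(words & set(d.lower().rstrip(".").split())),
        reverse=True,
    )
    return ranked[:k]

def retrieval_qa(query):
    sources = retrieve(query)
    return {"result": sources[0], "source_documents": sources}
```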
---

### SQLDatabaseChain

The `SQLDatabaseChain` finds answers to questions using a SQL database. It uses the language model to understand the question and generate the corresponding SQL code, then passes that code to the SQL database component, which executes the query on the database and returns the result.

**Params**

- **Db:** SQL Database to connect to.
- **LLM:** Language Model to use in the chain.
- **Prompt:** Prompt template to translate natural language to SQL.
# Embeddings

Embeddings are vector representations of text that capture its semantic meaning. They are created using text embedding models and let us reason about text in a vector space, enabling tasks like semantic search, where we look for the pieces of text that are most similar in that space.
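A toy picture of "most similar in the vector space": tiny hand-made vectors stand in for real model embeddings (real embeddings have hundreds or thousands of dimensions), and cosine similarity ranks them.

```python
import math

# Hand-made 3-dimensional "embeddings" for illustration only.
VECTORS = {
    "cat": [1.0, 0.9, 0.0],
    "kitten": [0.9, 1.0, 0.1],
    "car": [0.0, 0.1, 1.0],
}

def cosine(a, b):
    """Cosine similarity: dot product over the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def most_similar(word):
    """Semantic search over the toy vocabulary."""
    return max(
        (other for other in VECTORS if other != word),
        key=lambda other: cosine(VECTORS[word], VECTORS[other]),
    )
```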
---

### CohereEmbeddings

Used to load [Cohere’s](https://cohere.com/) embedding models.

**Params**

- **cohere_api_key:** Holds the API key required to authenticate with the Cohere service.
- **model:** The language model used for embedding text documents and performing queries — defaults to `embed-english-v2.0`.
- **truncate:** Used to specify whether or not to truncate the input text. Truncation is useful when dealing with long texts that exceed the model's maximum input length. By truncating the text, the user can ensure that it fits within the model's constraints.
---

### HuggingFaceEmbeddings

Used to load [HuggingFace’s](https://huggingface.co) embedding models.

**Params**

- **cache_folder:** Used to specify the folder where the embeddings will be cached. When embeddings are computed for a text, they can be stored in the cache folder so that they can be reused later without the need to recompute them. This can improve the performance of the application by avoiding redundant computations.
- **encode_kwargs:** Used to pass additional keyword arguments to the encoding method of the underlying HuggingFace model. These keyword arguments can customize the encoding process, such as specifying the maximum length of the input sequence or enabling truncation or padding.
- **model_kwargs:** Used to customize the behavior of the model, such as specifying the model architecture, the tokenizer, or any other model-specific configuration options. By using `model_kwargs`, the user can configure the HuggingFace model according to specific needs and preferences.
- **model_name:** Used to specify the name or identifier of the HuggingFace model that will be used for generating embeddings. It allows users to choose a specific pre-trained model from the Hugging Face model hub — defaults to `sentence-transformers/all-mpnet-base-v2`.
---

### OpenAIEmbeddings

Used to load [OpenAI’s](https://openai.com/) embedding models.

**Params**

- **chunk_size:** Determines the maximum size of each chunk of text that is processed for embedding. If any of the incoming text chunks exceeds `chunk_size` characters, it will be split into multiple chunks of size `chunk_size` or less before being embedded — defaults to `1000`.
- **deployment:** Used to specify the deployment name or identifier of the text embedding model. It allows the user to choose a specific deployment of the model to use for embedding, which is useful when there are multiple deployments of the same model with different configurations or versions — defaults to `text-embedding-ada-002`.
- **embedding_ctx_length:** Determines the maximum context length for the text embedding model, i.e. the number of tokens the model considers when generating embeddings for a piece of text — defaults to `8191`.
- **max_retries:** Determines the maximum number of times to retry a request if the model provider returns an error from their API — defaults to `6`.
- **model:** Defines which pre-trained text embedding model to use — defaults to `text-embedding-ada-002`.
- **openai_api_base:** The base URL for the Azure OpenAI resource, used to configure the connection to the Azure OpenAI service. The base URL can be found in the Azure portal under the user's Azure OpenAI resource.
- **openai_api_key:** Used to authenticate and authorize access to the OpenAI service.
- **openai_api_type:** Used to specify the type of OpenAI API being used, either the regular OpenAI API or the Azure OpenAI API. This parameter allows the `OpenAIEmbeddings` class to connect to the appropriate API service.
- **openai_api_version:** Used to specify the version of the OpenAI API being used, allowing the `OpenAIEmbeddings` class to connect to the appropriate version of the API service.
- **openai_organization:** Used to specify the organization associated with the OpenAI API key. If not provided, the default organization associated with the API key will be used.
- **openai_proxy:** An optional proxy URL through which requests to the OpenAI API are routed.
- **request_timeout:** Used to specify the maximum amount of time to wait for a response from the OpenAI API when generating embeddings for a given text.
- **tiktoken_model_name:** The model name used to count the number of tokens in documents, in order to constrain them to be under a certain limit. By default, when set to `None`, this will be the same as the embedding model name.
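The `chunk_size` behavior described above can be sketched in plain Python (a sketch of the splitting idea, not the class's actual implementation, which batches by tokens rather than raw characters in some versions):

```python
# Split text into pieces of at most chunk_size characters before each
# piece is embedded.

def split_for_embedding(text, chunk_size=1000):
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```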
# Prompts

A prompt refers to the input given to a language model. It is constructed from multiple components and can be parametrized using prompt templates. A prompt template is a reproducible way to generate prompts that allows for easy customization through input variables.
---

### PromptTemplate

The `PromptTemplate` component allows users to create prompts and define variables that provide control over instructing the model. The template can take in a set of variables from the end user and generates the prompt once the conversation is initiated.

:::info
Once a variable is defined in the prompt template, it becomes a component input of its own. Check out [Prompt Customization](../guidelines/prompt-customization.mdx) to learn more.
:::

**Params**

- **template:** Template used to format an individual request.
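The core idea reduces to named variables in a string, filled in with end-user values at run time — plain `str.format` here, not Langflow code, with an invented template:

```python
# A template with named variables; formatting it with concrete values
# produces the final prompt sent to the model.

template = "Answer as a {persona}. Question: {question}"

def build_prompt(**variables):
    return template.format(**variables)
```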
The commit also registers the new component pages in the docs sidebar:

```js
// Docs sidebar configuration (excerpt)
        "guidelines/chat-interface",
      ],
    },
    {
      type: "category",
      label: "Component Reference",
      collapsed: false,
      items: [
        "components/agents",
        "components/chains",
        "components/embeddings",
        "components/llms",
        "components/loaders",
        "components/memories",
        "components/prompts",
        "components/text-splitters",
        "components/toolkits",
        "components/tools",
        "components/vector-stores",
        "components/wrappers",
      ],
    },
    {
      type: "category",
      label: "Step-by-Step Guides",
```