Llama 2-Chat with 7 billion (7B) parameters, served in 16-bit (FP16) format with the vLLM serving engine. Please use in accordance with the Llama 2 license terms.
The Llama 2 chat model was fine-tuned for chat using a specific prompt structure that relies on the following special tokens:
- <s> - the beginning of the entire sequence.
- <<SYS>> - the beginning of the system message.
- <</SYS>> - the end of the system message.
- [INST] - the beginning of the instructions.
- [/INST] - the end of the instructions.
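Putting these tokens together, a single-turn prompt and a generation call with vLLM might look like the following minimal sketch. The Hugging Face model id and the sampling settings are illustrative assumptions, not part of this deployment:

```python
from vllm import LLM, SamplingParams

# Build a single-turn prompt in the Llama 2 chat format described above.
# The tokenizer adds the <s> (BOS) token automatically, so it is omitted here.
system_msg = "You are a helpful, respectful and honest assistant."
user_msg = "Explain what a context window is in one sentence."
prompt = f"[INST] <<SYS>>\n{system_msg}\n<</SYS>>\n\n{user_msg} [/INST]"

# dtype="float16" matches the 16-bit format this deployment uses.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf", dtype="float16")
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```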
Llama-2 is a family of pre-trained and fine-tuned Large Language Models (LLMs) developed and released by the research team at Meta AI. Llama-2 builds upon the success of Llama-1 and incorporates several improvements to enhance its performance and safety. These models are designed to excel in complex reasoning tasks across various domains, making them suitable for research and commercial use. Llama-2 is trained on a large corpus of publicly available data and fine-tuned to align with human preferences, ensuring usability and safety. The models are optimized for dialogue use cases and are available in a range of parameter sizes, including 7B, 13B, and 70B.
Llama 2-Chat is a family of fine-tuned Llama-2 models that are optimized for dialogue use cases. These models are specifically designed to generate human-like responses to natural language input, making them suitable for chatbot and conversational AI applications.
Llama2-7B-chat is a 7-billion-parameter model pretrained on a large corpus of text that includes conversational data, such as chat logs and social media posts. This allows the model to learn the patterns and structures of natural language dialogue and to generate coherent and contextually appropriate responses to user input.
In addition to the standard Llama-2 fine-tuning, the Llama 2-Chat models are tuned against safety and helpfulness criteria to ensure that they generate appropriate and useful responses. This includes measures to prevent the models from generating offensive or harmful content and to ensure that they provide accurate and relevant information to users.
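For multi-turn conversations, earlier exchanges are serialized into the same prompt format: each completed turn is closed with </s> and the next one opened with <s>, while the current user turn is left open for the model to complete. A small sketch of this assembly, using a hypothetical build_chat_prompt helper:

```python
def build_chat_prompt(system_msg, history, user_msg):
    # history: list of (user, assistant) pairs from earlier turns.
    users = [u for u, _ in history] + [user_msg]
    # The system message is folded into the first user turn.
    users[0] = f"<<SYS>>\n{system_msg}\n<</SYS>>\n\n{users[0]}"
    answers = [a for _, a in history]
    prompt = ""
    for user, assistant in zip(users, answers):
        # Completed turns are closed with </s>, and the next is opened with <s>.
        prompt += f"[INST] {user} [/INST] {assistant} </s><s>"
    # The final turn stays open for the model; the leading <s> (BOS) of the
    # whole sequence is added by the tokenizer, so it is omitted here.
    prompt += f"[INST] {users[-1]} [/INST]"
    return prompt

history = [("Hi, who are you?", "I am Llama 2, a helpful assistant.")]
prompt = build_chat_prompt("You are a concise assistant.",
                           history,
                           "What did I just ask you?")
```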
The context window length in the Llama-2 model is 4096 tokens. This is an expansion from the 2048-token context window used in the previous version of the model, Llama-1. The longer context window enables the model to process more information, which is particularly useful for supporting longer histories in chat applications, various summarization tasks, and understanding longer documents.
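As a rough sketch of how this limit shows up in practice with vLLM (the model id and the 256-token generation budget are assumptions):

```python
from vllm import LLM

# max_model_len caps prompt plus generated tokens at Llama 2's 4096-token window.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf", max_model_len=4096)
tokenizer = llm.get_tokenizer()

def fits_in_window(prompt: str, max_new_tokens: int = 256) -> bool:
    # Leave room for the tokens the model will generate.
    n_prompt_tokens = len(tokenizer.encode(prompt))
    return n_prompt_tokens + max_new_tokens <= 4096
```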
The dataset used to train Llama-2 is a large-scale, diverse corpus of text that was collected from various sources, including web pages, books, and articles. The corpus contains over 2 trillion tokens, making it one of the largest datasets used to train a language model to date.
The corpus was also filtered to ensure that it was diverse and representative of different domains and genres of text. This was done to prevent the model from overfitting to a specific domain or genre of text and to ensure that it could generalize well to new and unseen text.
To further improve the quality of the dataset, the text was also cleaned and normalized to remove spelling errors, punctuation errors, and other inconsistencies. This was done to ensure that the model could learn from high-quality and consistent text and to prevent it from learning from noisy or incorrect text.
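Meta has not published the exact preprocessing pipeline, but a generic cleaning pass of the kind described above might look like this illustrative sketch:

```python
import re
import unicodedata

def clean_text(raw: str) -> str:
    # Illustrative only: Meta's actual pipeline is not public.
    text = unicodedata.normalize("NFKC", raw)               # normalize unicode forms
    text = re.sub(r"[\x00-\x08\x0b-\x1f\x7f]", " ", text)   # strip control characters
    text = re.sub(r"\s+", " ", text).strip()                # collapse whitespace
    return text
```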
Overall, the dataset used to train Llama-2 is a large, diverse corpus that was carefully curated and preprocessed so the model could learn from high-quality, representative text. Its size and breadth also allowed the model to capture the complex patterns and structures of natural language.
The evaluation of Llama-2 was conducted on three main aspects: pretraining, fine-tuning, and safety.
- Pretraining evaluation: the model was evaluated on a held-out set of data, and its performance was compared to models with different context window lengths. The results showed that Llama-2's longer context window (4096 tokens) outperformed Llama-1's shorter window (2048 tokens) on long-context benchmarks (a minimal perplexity sketch follows this list). The pretraining evaluation also included the MMLU benchmark, which measures knowledge and reasoning across a broad range of academic and professional subjects. The Llama-2 models outperformed many other open-source models on this benchmark.
- Fine-tuning evaluation: the model was fine-tuned on several datasets, including conversational and question-answering datasets. The results showed that the fine-tuned Llama-2 model outperformed other state-of-the-art open models on several benchmarks, including the Persona-Chat and CoQA datasets. The fine-tuning evaluation also covered standard NLP benchmarks, such as question answering (e.g., SQuAD) and code generation, on which the Llama-2 models achieved strong results among open models.
- Safety evaluation: the model was evaluated for toxicity, truthfulness, and bias. The results showed relatively low truthfulness scores for the pretrained models, but these scores increased after instruction fine-tuning. The model also showed low toxicity and bias scores, indicating that it is relatively safe to use in production. Overall, the evaluation results suggest that Llama-2 is a high-performing language model that is safe to use in production, with some limitations and potential risks that should be taken into account.
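As referenced in the pretraining bullet above, evaluating a causal language model on held-out data typically comes down to measuring perplexity. A minimal sketch with the Hugging Face transformers API (the checkpoint id is an assumption; the paper's own evaluation harness is not public):

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # assumed HF checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model.eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # labels=ids makes the model return the average cross-entropy loss.
        loss = model(ids, labels=ids).loss
    return math.exp(loss.item())
```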
The Llama 2-Chat models were also assessed in a human evaluation that measures the helpfulness of the model's responses relative to other open-source and closed-source models. On this benchmark, the Llama 2-Chat models outperform many other models.
- Limited proficiency in non-English languages: Llama models, including Llama-2, were primarily trained on English-language data. While some proficiency has been observed in other languages, the model's performance in languages other than English remains fragile and should be used with caution.
- Risk of generating harmful or biased content: Llama models, like other large language models, were trained on publicly available online datasets, which may contain harmful, offensive, or biased content. While efforts have been made to mitigate these issues through fine-tuning, some issues may remain, particularly for languages other than English, for which suitable publicly available datasets were scarce.
- Model Type ID: Text To Text
- Description: Llama 2-Chat is a fine-tuned large language model (LLM) optimized for dialogue use cases.
- Last Updated: Jan 04, 2024
- Use Case: