Llama 3.3 (70B) is a multilingual instruction-tuned LLM optimized for dialogue, trained on 15T+ tokens, supporting 8 languages, and incorporating strong safety measures
Llama-3.3-70B-Instruct is a multilingual large language model (LLM) developed by Meta. It is a pretrained and instruction-tuned generative model optimized for multilingual dialogue and text generation. The model achieves state-of-the-art performance across multiple industry benchmarks, surpassing many open-source and proprietary chat models.
Llama-3.3-70B-Instruct Model Details
Model Developer: Meta
Architecture: Auto-regressive transformer model, aligned with supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF)
```python
from clarifai.client.model import Model

prompt = "What's the future of AI?"
inference_params = dict(temperature=0.7, max_tokens=200, top_k=50, top_p=0.95)

# Model Predict
model_prediction = Model(
    "https://clarifai.com/meta/Llama-3/models/llama-3_3-70b-instruct"
).predict_by_bytes(prompt.encode(), input_type="text", inference_params=inference_params)

print(model_prediction.outputs[0].data.text.raw)
```
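The `inference_params` above control sampling. As an illustration only (the hosted API applies these parameters server-side; this sketch is not part of the SDK), here is roughly what temperature scaling, top-k, and top-p (nucleus) filtering do to a toy logit vector:

```python
import numpy as np

def sample_top_k_top_p(logits, temperature=0.7, top_k=50, top_p=0.95, rng=None):
    """Illustrative decoding step: temperature scaling, then top-k, then top-p filtering."""
    rng = rng or np.random.default_rng(0)
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # stable softmax
    probs /= probs.sum()
    # keep only the top_k most likely tokens
    order = np.argsort(probs)[::-1][:top_k]
    p = probs[order]
    # keep the smallest prefix whose cumulative mass reaches top_p
    cut = np.searchsorted(np.cumsum(p), top_p) + 1
    order, p = order[:cut], p[:cut]
    p /= p.sum()
    return order[rng.choice(len(order), p=p)]

logits = np.array([2.0, 1.0, 0.5, -1.0])
token = sample_top_k_top_p(logits)  # index of the sampled token
```

Lower `temperature` and `top_p` make outputs more deterministic; higher values increase diversity.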
Use Cases
Llama-3.3-70B-Instruct is designed for both commercial and research applications. Some primary use cases include:
Conversational AI: Enhanced chatbot interactions across multiple languages
Content Generation: Generating high-quality text for various domains
Code Generation: Supporting developers in writing and debugging code
Multilingual Assistance: Providing language-specific responses for different regions
Synthetic Data Generation: Facilitating model distillation and fine-tuning
Knowledge-based Question Answering: Answering domain-specific and general knowledge questions
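For conversational use against the raw model (rather than a hosted endpoint that applies chat templating for you), prompts follow the Llama 3 instruct format with special header tokens. A minimal sketch of that template, using an illustrative helper name:

```python
def llama3_chat_prompt(system: str, user: str) -> str:
    """Build a Llama 3 instruct-format prompt (special-token layout per Meta's docs)."""
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = llama3_chat_prompt("You are a helpful assistant.", "What's the future of AI?")
```

The model then generates the assistant turn and emits `<|eot_id|>` when finished.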
Out-of-Scope Uses
Applications violating laws, trade compliance regulations, or the Acceptable Use Policy
Deployment in unsupported languages without additional fine-tuning
Evaluation and Benchmark Results
Llama-3.3-70B-Instruct demonstrates significant improvements over earlier Llama models on key benchmarks:
| Category | Benchmark | # Shots | Metric | Llama 3.1 8B Instruct | Llama 3.1 70B Instruct | Llama 3.3 70B Instruct | Llama 3.1 405B Instruct |
|---|---|---|---|---|---|---|---|
| General | MMLU (CoT) | 0 | macro_avg/acc | 73.0 | 86.0 | 86.0 | 88.6 |
| General | MMLU Pro (CoT) | 5 | macro_avg/acc | 48.3 | 66.4 | 68.9 | 73.3 |
| Steerability | IFEval | - | - | 80.4 | 87.5 | 92.1 | 88.6 |
| Reasoning | GPQA Diamond (CoT) | 0 | acc | 31.8 | 48.0 | 50.5 | 49.0 |
| Code | HumanEval | 0 | pass@1 | 72.6 | 80.5 | 88.4 | 89.0 |
| Code | MBPP EvalPlus (base) | 0 | pass@1 | 72.8 | 86.0 | 87.6 | 88.6 |
| Math | MATH (CoT) | 0 | sympy_intersection_score | 51.9 | 68.0 | 77.0 | 73.8 |
| Tool Use | BFCL v2 | 0 | overall_ast_summary/macro_avg/valid | 65.4 | 77.5 | 77.3 | 81.1 |
| Multilingual | MGSM | 0 | em | 68.9 | 86.9 | 91.1 | 91.6 |
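The pass@1 metric used for HumanEval and MBPP is conventionally computed with the unbiased pass@k estimator (n generated samples, c of them passing the tests); a short sketch of that calculation:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn from n generations (c correct) passes."""
    if n - c < k:
        return 1.0  # fewer failures than draws: some draw must succeed
    return 1.0 - comb(n - c, k) / comb(n, k)

# For k=1 this reduces to the plain success rate c/n:
print(pass_at_k(10, 7, 1))  # → 0.7
```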
Dataset
Llama-3.3-70B-Instruct was trained on a new mix of publicly available online data.
Pretraining Data: ~15 trillion tokens from publicly available sources
Fine-tuning Data: Over 25 million synthetically generated instruction examples
Data Freshness: Training data cutoff in December 2023
Advantages
State-of-the-art performance on multilingual benchmarks
Extended context length of 128k tokens for improved long-form reasoning
Advanced instruction tuning using RLHF for better alignment with human intent
Improved multilingual capabilities in 8 languages
Optimized for dialogue and task-specific prompting
Efficient inference with GQA for scalable deployments
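Grouped-query attention (GQA) cuts the key/value cache by letting several query heads share one key/value head. A minimal NumPy sketch of the idea (not Meta's implementation; head counts are illustrative):

```python
import numpy as np

def gqa(q, k, v, n_kv_heads):
    """Grouped-query attention: n_heads query heads share n_kv_heads K/V heads."""
    n_heads, seq_len, d = q.shape
    group = n_heads // n_kv_heads
    # broadcast each K/V head across its group of query heads
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v

q = np.random.randn(8, 4, 16)  # 8 query heads
k = np.random.randn(2, 4, 16)  # only 2 K/V heads stored
v = np.random.randn(2, 4, 16)
out = gqa(q, k, v, n_kv_heads=2)  # shape (8, 4, 16)
```

With 8 query heads and 2 K/V heads, the K/V cache is 4x smaller than full multi-head attention at the same query-head count.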
Limitations
Limited to 8 officially supported languages, though it may generate text in other languages with varying quality
Potential for hallucination, especially on topics beyond its training data
Not designed for real-time updating, as it is a static model with a fixed knowledge cutoff
Requires external safeguards when integrated into production systems to mitigate risks
Biases in training data may lead to unintended outputs, requiring careful evaluation before deployment
ID:
Model Type ID: Text To Text
Input Type: text
Output Type: text
Description: Llama 3.3 (70B) is a multilingual instruction-tuned LLM optimized for dialogue, trained on 15T+ tokens, supporting 8 languages, and incorporating strong safety measures