Phi-2 is a 2.7 billion-parameter LLM that achieves state-of-the-art performance among models of comparable size on QA, chat, and code tasks. Thanks to a focus on high-quality training data, it also demonstrates improved behavior with respect to toxicity and bias.
Phi-2 is a language model developed to deliver high performance across a variety of natural language processing tasks while remaining far smaller than most models with comparable capabilities.
Phi-2 Model
Key Insights Behind Phi-2
Phi-2 challenges conventional language-model scaling laws by focusing on high-quality training data and employing innovative knowledge-transfer techniques. The training data includes "textbook-quality" synthetic datasets for commonsense reasoning and general knowledge. With 2.7 billion parameters, the model outperforms its predecessor, Phi-1.5 (1.3 billion parameters), on various benchmarks.
Training Details
Transformer-based model with a next-word prediction objective
Trained on 1.4T tokens from synthetic and web datasets
Phi-2 was not aligned through reinforcement learning from human feedback (RLHF) or instruction fine-tuning, yet it exhibits improved behavior in terms of toxicity and bias compared to existing open-source models.
Run Phi-2 with an API
Running the API with Clarifai's Python SDK
You can run the Phi-2 Model API using Clarifai’s Python SDK.
Export your PAT as an environment variable. Then, import and initialize the API Client.
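For example, here is a minimal sketch of setting the token (assuming the SDK reads the `CLARIFAI_PAT` environment variable; the value below is a placeholder):

```python
import os

# Placeholder token: replace with your own Clarifai Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"
```

With the token set, you can import the client and run a prediction: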
```python
from clarifai.client.model import Model

prompt = "What's the future of AI?"

inference_params = dict(temperature=0.7, max_tokens=200, top_k=50, top_p=0.95)

# Model Predict
model_prediction = Model(
    "https://clarifai.com/microsoft/text-generation/models/phi-2"
).predict_by_bytes(prompt.encode(), input_type="text", inference_params=inference_params)

print(model_prediction.outputs[0].data.text.raw)
```
You can also call the Phi-2 API using other Clarifai client libraries, such as Java, cURL, NodeJS, and PHP.
Aliases: Phi-2, phi-2, phi2
Prompt Format
QA Format:
You can provide the prompt as a standalone question as follows:
Write a detailed analogy between mathematics and a lighthouse.
where the model generates the text after “.”. To encourage the model to write more concise answers, you can also try the following QA format.
Instruct: Write a detailed analogy between mathematics and a lighthouse.
Output:
where the model generates the text after "Output:".
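For illustration, a minimal sketch of sending the Instruct/Output format through the same `predict_by_bytes` call shown earlier (the question is just an example; the `inference_params` from above can be passed as well):

```python
from clarifai.client.model import Model

# QA-style prompt: the model completes the text after "Output:"
prompt = (
    "Instruct: Write a detailed analogy between mathematics and a lighthouse.\n"
    "Output:"
)

model_prediction = Model(
    "https://clarifai.com/microsoft/text-generation/models/phi-2"
).predict_by_bytes(prompt.encode(), input_type="text")

print(model_prediction.outputs[0].data.text.raw)
```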
Chat Format:
Alice: I don't know why, I'm struggling to maintain focus while studying. Any suggestions?
Bob: Well, have you tried creating a study schedule and sticking to it?
Alice: Yes, I have, but it doesn't seem to help much.
Bob: Hmm, maybe you should try studying in a quiet environment, like the library.
Alice: ...
where the model generates the text after the first "Bob:".
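A similar sketch for the chat format, where the prompt ends at the first "Bob:" and the model generates his reply (dialogue taken from the example above):

```python
from clarifai.client.model import Model

# Chat-style prompt: the model continues the conversation after "Bob:"
prompt = (
    "Alice: I don't know why, I'm struggling to maintain focus while studying. "
    "Any suggestions?\n"
    "Bob:"
)

model_prediction = Model(
    "https://clarifai.com/microsoft/text-generation/models/phi-2"
).predict_by_bytes(prompt.encode(), input_type="text")

print(model_prediction.outputs[0].data.text.raw)
```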
Use Cases
Phi-2 is best suited for QA, chat, and code prompts. It is intended for tasks where the model-generated text/code serves as a starting point rather than a definitive solution.
Evaluation
Phi-2 excels in academic benchmarks, surpassing larger models like Mistral and Llama-2 in various categories. It also matches or outperforms Google Gemini Nano 2, despite being smaller in size. Evaluation covers Big Bench Hard (BBH), commonsense reasoning, language understanding, math, and coding.
Benchmark Performance (Averaged)
| Model | Size | BBH | Commonsense | Language Understanding | Math | Coding |
|---|---|---|---|---|---|---|
| Llama-2 | 7B | 40.0 | 62.2 | 56.7 | 16.5 | 21.0 |
| Mistral | 7B | 57.2 | 66.4 | 63.7 | 46.4 | 39.4 |
| Phi-2 | 2.7B | 59.2 | 68.8 | 62.0 | 61.1 | 53.7 |
Phi-2 vs. Gemini Nano 2
| Model | Size | BBH | BoolQ | MBPP | MMLU |
|---|---|---|---|---|---|
| Gemini Nano 2 | 3.2B | 42.4 | 79.3 | 27.2 | 55.8 |
| Phi-2 | 2.7B | 59.3 | 83.3 | 59.1 | 56.7 |
Dataset
Phi-2 was trained on a mixture of synthetic and web datasets totaling 1.4 trillion tokens. The training corpus includes carefully selected data to enhance common sense reasoning and general knowledge.
Advantages
Achieves high performance with 2.7 billion parameters
Faster training convergence through knowledge transfer from Phi-1.5
Demonstrates better behavior in terms of toxicity and bias compared to existing open-source models
Limitations
May generate inaccurate code and facts
Limited scope for code outside Python and common packages
Unreliable responses to complex or nuanced instructions
Primarily designed for standard English; may struggle with informal language or other languages
Potential societal biases despite efforts in training data safety
Possibility of producing harmful content if explicitly prompted
ID: phi-2
Model Type ID: Text To Text
Input Type: text
Output Type: text