codestral-22b-instruct

Codestral-22B-v0.1 is an advanced generative LLM designed for versatile and efficient code generation across 80+ programming languages.

Input

Prompt: The text prompt sent to the model.

  • max_tokens: The maximum number of tokens to generate. Shorter token lengths provide faster performance.
  • temperature: A decimal number that determines the degree of randomness in the response.
  • top_p: An alternative to sampling with temperature, where the model considers only the tokens comprising the top_p probability mass.
  • System prompt: Sets the behavior and context for the AI assistant in a conversation, such as modifying its personality.
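To make the top_p setting concrete, here is a minimal pure-Python sketch of the token-filtering step in nucleus (top-p) sampling. This is illustrative only — not Clarifai's or Mistral's actual sampler:

```python
def top_p_tokens(probs, top_p):
    """Return the smallest set of highest-probability tokens whose
    cumulative probability reaches top_p; the model then samples
    only from within this 'nucleus' of tokens."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append(token)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept

# With top_p=0.8, the low-probability tail token "c" is excluded.
print(top_p_tokens({"a": 0.6, "b": 0.3, "c": 0.1}, 0.8))  # ['a', 'b']
```

Lower top_p values restrict sampling to fewer, more likely tokens (more deterministic output); values near 1.0 allow almost the full vocabulary.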

Output

Submitting a prompt returns the model's generated text response.

Notes

  • Model wrapped from the MistralAI API.
  • This is the instruct endpoint of Codestral, which answers questions about code snippets and generates code based on specific instructions.
  • License: Codestral-22B-v0.1 is released under the MNPL-0.1 license.

Introduction

Codestral is an open-weight generative AI model specifically designed for code generation tasks. This advanced model assists developers in writing and interacting with code through a shared instruction endpoint. Fluent in both code and English, Codestral is an invaluable tool for developing sophisticated AI applications tailored for software developers.

Codestral LLM

Codestral-22B-v0.1, the current version of the model, is trained on a diverse dataset encompassing over 80 programming languages. This includes popular languages such as Python, Java, C, C++, JavaScript, and Bash, along with more niche languages like Swift and Fortran. The model supports two main query modes:

  • Instruction-based: Answering questions about code snippets, writing documentation, explaining code, and generating code based on specific instructions.
  • Fill-in-the-Middle (FIM): Completing code between a given prefix and suffix, as used in IDE-style autocompletion. This endpoint exposes the instruction-based mode.

Run Codestral Instruct with an API

Running the API with Clarifai's Python SDK

You can run the Codestral-Instruct Model API using Clarifai’s Python SDK.

Export your PAT as an environment variable. Then, import and initialize the API Client.

Find your PAT in your security settings.

export CLARIFAI_PAT={your personal access token}

from clarifai.client.model import Model

# A code-oriented prompt suits this model better than general questions
prompt = "Write a Python function that checks whether a string is a palindrome."

inference_params = dict(temperature=0.2, max_tokens=100, top_p=0.95, top_k=40)

# Model Predict
model_prediction = Model(
    "https://clarifai.com/mistralai/completion/models/codestral-22b-instruct"
).predict_by_bytes(prompt.encode(), input_type="text", inference_params=inference_params)

print(model_prediction.outputs[0].data.text.raw)

You can also run the Codestral-Instruct API using other Clarifai client libraries such as Java, cURL, NodeJS, and PHP.

Aliases: Codestral, codestral

Use Cases

Codestral is designed to save developers time and effort by:

  • Completing coding functions: Automating the writing of functions based on partial inputs.
  • Writing tests: Generating test cases for existing code.
  • Documenting code: Generating comprehensive documentation for codebases, enhancing readability and maintainability.
  • Refactoring code: Suggesting optimizations and refactorings, reducing technical debt.
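As a concrete illustration of the test-writing use case, prompting the model with a small function and the instruction "write unit tests for this function" might yield output along these lines. This is a hand-written sketch of typical output, not an actual model response:

```python
def fizzbuzz(n):
    """Example function a developer might ask Codestral to cover with tests."""
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)

# Sketch of the kind of test suite the model can generate on request.
def test_fizzbuzz():
    assert fizzbuzz(9) == "Fizz"        # multiple of 3
    assert fizzbuzz(10) == "Buzz"       # multiple of 5
    assert fizzbuzz(30) == "FizzBuzz"   # multiple of both
    assert fizzbuzz(7) == "7"           # everything else
    print("all tests passed")

test_fizzbuzz()
```

Generated tests should still be reviewed: the model covers the cases it infers from the code, which may not include every edge case that matters to your application.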

Evaluation and Benchmark Results

Performance Highlights

Codestral, as a 22-billion-parameter model, sets a new standard in the performance/latency space for code generation. It features a large context window of 32k tokens, outperforming other models with context windows of 4k, 8k, or 16k. The model is evaluated using several benchmarks:

  1. HumanEval (pass@1): Assesses Python code generation accuracy.
  2. MBPP (sanitised pass@1): Evaluates Python code generation accuracy.
  3. CruxEval: Evaluates Python output prediction.
  4. RepoBench EM: Measures long-range repository-level code completion capabilities.
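The pass@1 figures above come from the pass@k family of metrics introduced with HumanEval: generate n candidate solutions per problem, count how many pass the unit tests, and estimate the probability that at least one of k drawn samples is correct. A minimal sketch of the standard unbiased estimator:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n generated samples, c of which pass
    the unit tests, evaluated at a sampling budget of k."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 3 of 10 samples correct, pass@1 is the fraction correct.
print(pass_at_k(10, 3, 1))  # ≈ 0.3
```

At k=1 the estimator reduces to c/n, the plain fraction of generations that pass, which is what "pass@1" reports in the benchmark tables.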

Detailed Benchmarks

  • Context Window: Codestral features a context window of 32k tokens, outperforming competitors with context windows of 4k, 8k, or 16k tokens in long-range evaluations such as RepoBench.
  • SQL Benchmark: SQL code generation performance, assessed using the Spider benchmark.
  • Language Coverage: HumanEval pass@1 across six additional languages (C++, Bash, Java, PHP, TypeScript, C#), with average performance metrics provided.

Dataset

Codestral is trained on a comprehensive dataset of over 80 programming languages. This diverse training set ensures the model's proficiency in a wide range of coding environments and projects, making it a versatile tool for developers.

Advantages

  • Broad Language Support: With fluency in over 80 programming languages, Codestral is adaptable to various coding tasks.
  • Time and Effort Savings: Automates code completion, test generation, and bug reduction, enhancing developer productivity.
  • High Performance: Superior performance in benchmarks and a large context window make Codestral a leader in code generation.

Limitations

  • Niche Language Performance: While proficient in many languages, performance in less common languages may not match that of more popular ones.
  • Potential for Errors: While Codestral reduces the risk of errors, it is not infallible and may occasionally produce incorrect or suboptimal code, necessitating human oversight.
Model Details

  • ID: codestral-22b-instruct
  • Model Type ID: Text To Text
  • Input Type: text
  • Output Type: text
  • Last Updated: Oct 17, 2024
  • Privacy: PUBLIC