QwQ is the reasoning model of the Qwen series, designed for enhanced problem-solving and downstream task performance. QwQ-32B competes with top reasoning models like DeepSeek-R1 and o1-mini.

Model is trained and ready for deployment

QwQ-32B-AWQ

Process an OpenAI-compatible request and send it to the appropriate OpenAI endpoint.

Args:
    msg: JSON string containing the request parameters including 'openai_endpoint'

Returns:
    JSON string containing the response or error

openai_transport

return

Process an OpenAI-compatible request and return a streaming response iterator.
This method is used when stream=True and returns an iterator of strings directly,
without converting to a list or JSON serializing. Supports chat completions and responses endpoints.

Args:
    msg: The request as a JSON string.

Returns:
    Iterator[str]: An iterator yielding text chunks from the streaming response.

openai_stream_transport

prompt

images

audios

videos

chat_history

audio

video

image

tools

tool_choice

The system-level prompt used to define the assistant's behavior.

system_prompt

The maximum number of tokens to generate. Shorter token lengths will provide faster performance.

max_tokens

A decimal number that determines the degree of randomness in the response.

temperature

An alternative to sampling with temperature, where the model considers the results of the tokens with top_p probability mass.

top_p

The level of reasoning effort to apply to the response. Currently supported values are low, medium, and high. 

reasoning_effort

predict

generate

QwQ is the reasoning model of the Qwen series, designed for enhanced problem-solving and downstream task performance. QwQ-32B competes with top reasoning