Available Models
The Morpheus Inference Marketplace provides access to a variety of open-source AI models. Models are hosted by providers in the decentralized marketplace, and availability may vary based on provider activity.
Pricing: The Morpheus Inference API is currently FREE during the Open Beta program. Billing infrastructure will be implemented soon, with free inference available until 1/31/26.
Large Language Models (LLMs)
Flagship Models
These are the most capable models available for complex tasks.

| Model | Context Window | Capabilities | Best For |
|---|---|---|---|
| qwen3-coder-480b-a35b-instruct | 256K | Code, Function Calling | Code generation, programming |
| qwen3-coder-480b-a35b-instruct:web | 256K | Code, Function Calling, Web | Code with web search |
| hermes-3-llama-3.1-405b | 128K | Web Search | General purpose, instruction following |
| hermes-3-llama-3.1-405b:web | 128K | Web Search | General purpose with web |
| gpt-oss-120b | 128K | Function Calling | GPT-style responses |
| gpt-oss-120b:web | 128K | Function Calling, Web | GPT-style with web search |
Reasoning Models
Models optimized for step-by-step thinking and complex problem solving.

| Model | Context Window | Capabilities | Best For |
|---|---|---|---|
| kimi-k2-thinking | 256K | Code, Function Calling, Reasoning | Deep reasoning, math, logic, coding |
| kimi-k2-thinking:web | 256K | Code, Function Calling, Reasoning, Web | Deep reasoning with web search |
| glm-4.7-thinking | 198K | Function Calling, Reasoning | Extended thinking, analysis |
| glm-4.7-thinking:web | 198K | Function Calling, Reasoning, Web | Extended thinking with web |
| qwen3-235b | 128K | Function Calling | Complex reasoning, long documents |
| qwen3-235b:web | 128K | Function Calling, Web | Complex reasoning with web |
Mid-Size Models
Balanced performance and speed for most use cases.

| Model | Context Window | Capabilities | Best For |
|---|---|---|---|
| llama-3.3-70b | 128K | Function Calling | General purpose, reliable |
| llama-3.3-70b:web | 128K | Function Calling, Web | General purpose with web |
| qwen3-next-80b | 256K | Function Calling | Next-gen reasoning, long context |
| qwen3-next-80b:web | 256K | Function Calling, Web | Next-gen with web search |
| mistral-31-24b | 128K | Function Calling, Vision | Fast, efficient, image analysis |
| mistral-31-24b:web | 128K | Function Calling, Vision, Web | Fast with web search |
| glm-4.6 | 198K | Function Calling, Reasoning | General purpose, long context |
| glm-4.6:web | 198K | Function Calling, Reasoning, Web | General purpose with web |
| glm-4.7 | 198K | Function Calling, Reasoning | Improved GLM, largest context |
| glm-4.7:web | 198K | Function Calling, Reasoning, Web | Improved GLM with web |
| venice-uncensored | 32K | — | Uncensored, creative, roleplay |
| venice-uncensored:web | 32K | Web | Uncensored with web search |
| hermes-4-14b | 128K | — | Efficient instruction following |
Fast Models
Optimized for speed and low latency.

| Model | Context Window | Capabilities | Best For |
|---|---|---|---|
| llama-3.2-3b | 128K | Function Calling | Fastest responses, simple tasks |
| llama-3.2-3b:web | 128K | Function Calling, Web | Fast with web search |
| qwen3-4b | 32K | Function Calling, Reasoning | Lightweight, mobile, low-latency |
| qwen3-4b:web | 32K | Function Calling, Reasoning, Web | Lightweight with web |
Embeddings Models
For vector embeddings and semantic search.

| Model | Best For |
|---|---|
| text-embedding-bge-m3 | Text embeddings, RAG, semantic search |
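As an illustration, an embeddings request body can be built in the common OpenAI-compatible shape. Whether Morpheus expects exactly this schema is an assumption; check the API reference before relying on it.

```python
# Sketch of an embeddings request body in the common OpenAI-compatible
# shape -- the exact schema Morpheus expects is an assumption.
def embeddings_request(texts: list[str]) -> dict:
    """Build a request body for the text-embedding-bge-m3 model."""
    return {"model": "text-embedding-bge-m3", "input": texts}

body = embeddings_request(["hello world", "semantic search"])
```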
Audio Models
Text-to-Speech
| Model | Best For |
|---|---|
tts-kokoro | Natural-sounding voice synthesis |
Speech-to-Text
| Model | Best For |
|---|---|
whisper-v3-large-turbo | Transcription, audio processing |
Model Capabilities
Function Calling
Models with function calling can invoke tools and APIs. Use the tools parameter in your chat completion request to define available functions.
Supported models: Most models except venice-uncensored and hermes-3-llama-3.1-405b
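A minimal sketch of a request body using the tools parameter, in the widely used OpenAI tools format (whether Morpheus expects exactly this shape is an assumption; the get_weather tool is hypothetical):

```python
# Sketch of a function-calling request body. The tool schema follows
# the widely used OpenAI tools format; the get_weather tool below is
# a hypothetical example, not a built-in.
def weather_tool_request(city: str) -> dict:
    """Build a chat request that exposes one callable tool to the model."""
    return {
        "model": "llama-3.3-70b",
        "messages": [
            {"role": "user", "content": f"What's the weather in {city}?"}
        ],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool name
                    "description": "Get current weather for a city",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }
```

If the model decides to call the tool, the response contains a tool call rather than plain text; your code runs the function and sends the result back in a follow-up message.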
Reasoning
Reasoning models support extended thinking and step-by-step problem solving. They're optimized for complex math, logic, and analytical tasks.
Supported models: kimi-k2-thinking, glm-4.7-thinking, glm-4.6, glm-4.7, qwen3-4b
Vision
Vision-capable models can analyze images passed in the messages array.
Supported models: mistral-31-24b
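A sketch of passing an image in the messages array, using the widely adopted OpenAI content-parts format (a text part plus an image_url part). Whether Morpheus accepts exactly this shape is an assumption; check the API reference.

```python
# Sketch of a vision request in the common OpenAI content-parts format
# (text part + image_url part). The exact schema Morpheus accepts is
# an assumption.
def image_question(image_url: str, question: str) -> dict:
    """Build a chat request asking mistral-31-24b about an image."""
    return {
        "model": "mistral-31-24b",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
```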
Web Search
Models with the :web suffix can search the internet for current information.
Supported models: All models have a :web variant
Code Optimization
Models specifically optimized for code generation and programming tasks.
Supported models: qwen3-coder-480b-a35b-instruct, kimi-k2-thinking
Model Naming Convention
Models with the :web suffix have web search capabilities enabled, allowing them to access current information from the internet.
| Suffix | Meaning |
|---|---|
| (none) | Base model without web access |
| :web | Model with web search capabilities |
Using Models
Specify the model ID in the model field of your API requests.
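A minimal Python sketch of a chat completion request, assuming an OpenAI-compatible /chat/completions endpoint and Bearer-token auth (both assumptions, and the base URL below is a placeholder, not the real endpoint; check the API reference for the actual values):

```python
# Minimal chat completion sketch using only the standard library.
# ASSUMPTIONS: OpenAI-compatible /chat/completions endpoint, Bearer
# auth, and a placeholder base URL -- substitute your real values.
import json
import urllib.request


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(base_url: str, api_key: str, model: str, prompt: str) -> str:
    """POST the payload and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Placeholder base URL and key -- replace with your own.
    print(chat("https://example-gateway/api/v1", "YOUR_API_KEY",
               "llama-3.3-70b", "Hello!"))
```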
List Active Models
Query the API to see currently available models.
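A sketch of that query, assuming an OpenAI-style GET /models endpoint that returns a data array of model objects (an assumption; the base URL is a placeholder):

```python
# Sketch of listing active models. ASSUMPTIONS: OpenAI-style
# GET /models response shape, Bearer auth, placeholder base URL.
import json
import urllib.request


def model_ids(models_response: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style list-models response."""
    return [m["id"] for m in models_response.get("data", [])]


def list_models(base_url: str, api_key: str) -> list[str]:
    """Fetch and return the IDs of currently available models."""
    req = urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
    )
    with urllib.request.urlopen(req) as resp:
        return model_ids(json.load(resp))


if __name__ == "__main__":
    # Placeholder base URL and key -- replace with your own.
    for mid in list_models("https://example-gateway/api/v1", "YOUR_API_KEY"):
        print(mid)
```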
Model Selection Guide
Best for coding
- qwen3-coder-480b-a35b-instruct - Top choice for code generation (256K context)
- kimi-k2-thinking - Best for complex algorithmic problems with reasoning
- llama-3.3-70b - Good balance of speed and quality
Best for long documents
- qwen3-next-80b - 256K context window
- qwen3-coder-480b-a35b-instruct - 256K context window
- glm-4.7 - 198K context, excellent at document analysis
- kimi-k2-thinking - 256K context with reasoning
Best for speed
- qwen3-4b - Fastest, 32K context
- llama-3.2-3b - Very fast, 128K context
- mistral-31-24b - Good speed with vision support
Best for reasoning
- kimi-k2-thinking - Deep reasoning chains, 256K context
- glm-4.7-thinking - Extended thinking mode, 198K context
- qwen3-235b - Complex analysis, 128K context
Best for uncensored/creative
- venice-uncensored - Minimal content restrictions, roleplay

