Available Models
The Morpheus Inference Marketplace provides access to a variety of open-source AI models. Models are hosted by providers in the decentralized marketplace, and availability may vary based on provider activity.

Large Language Models (LLMs)
Flagship Models
These are the most capable models available for complex tasks.

| Model | Context Window | Capabilities | Best For |
|---|---|---|---|
| minimax-m2.5 | 1M | Code, Function Calling, Reasoning | AI agents, autonomous workflows, multi-step tool orchestration |
| qwen3-coder-480b-a35b-instruct | 256K | Code, Function Calling | Code generation, programming |
| hermes-3-llama-3.1-405b | 128K | — | General purpose, instruction following |
| gpt-oss-120b | 128K | Function Calling | GPT-style responses |
Reasoning Models
Models optimized for step-by-step thinking and complex problem solving.

| Model | Context Window | Capabilities | Best For |
|---|---|---|---|
| glm-5 | 200K | Code, Function Calling, Reasoning | Agentic engineering, complex systems, long-horizon tasks |
| kimi-k2.5 | 256K | Code, Function Calling, Reasoning, Vision | Math, visual reasoning, parallel agent workflows |
| kimi-k2-thinking | 256K | Code, Function Calling, Reasoning | Deep reasoning, math, logic, coding |
| glm-4.7-thinking | 198K | Function Calling, Reasoning | Extended thinking, analysis |
| qwen3-235b | 128K | Function Calling | Complex reasoning, long documents |
Mid-Size Models
Balanced performance and speed for most use cases.

| Model | Context Window | Capabilities | Best For |
|---|---|---|---|
| llama-3.3-70b | 128K | Function Calling | General purpose, reliable |
| qwen3-next-80b | 256K | Function Calling | Next-gen reasoning, long context |
| mistral-31-24b | 128K | Function Calling, Vision | Fast, efficient, image analysis |
| venice-uncensored | 32K | — | Uncensored, creative, roleplay |
Fast Models
Optimized for speed and low latency.

| Model | Context Window | Capabilities | Best For |
|---|---|---|---|
| glm-4.7-flash | 200K | Function Calling, Reasoning | Agentic coding, tool-use workflows, local deployment |
| llama-3.2-3b | 128K | Function Calling | Fastest responses, simple tasks |
Embeddings Models
For vector embeddings and semantic search.

| Model | Best For |
|---|---|
| text-embedding-bge-m3 | Text embeddings, RAG, semantic search |
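As a sketch, an embeddings request can be built in the OpenAI-compatible style. The `API_BASE` and `API_KEY` values below are placeholders, not addresses from this document:

```python
import json

# Placeholder endpoint and key -- substitute your actual gateway URL and credentials.
API_BASE = "https://api.example.com/v1"
API_KEY = "your-api-key"

def build_embeddings_request(texts):
    """Build an OpenAI-style embeddings payload for text-embedding-bge-m3."""
    return {"model": "text-embedding-bge-m3", "input": texts}

payload = build_embeddings_request(["What is the Morpheus marketplace?"])
print(json.dumps(payload, indent=2))
# Send with an HTTP POST to f"{API_BASE}/embeddings" using an
# "Authorization: Bearer <API_KEY>" header.
```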
Audio Models
Text-to-Speech
| Model | Best For |
|---|---|
| tts-kokoro | Natural-sounding voice synthesis |
Model Capabilities
Function Calling
Models with function calling can invoke tools and APIs. Use the `tools` parameter in your chat completion request to define available functions.

Supported models: most models, except venice-uncensored and hermes-3-llama-3.1-405b
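A minimal sketch of defining a tool, assuming the OpenAI-compatible function-calling schema; the `get_weather` tool shown is hypothetical:

```python
import json

def build_tool_call_request(model, user_message):
    """Build a chat-completion payload with one (hypothetical) tool defined."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical example tool
                    "description": "Look up current weather for a city",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }

payload = build_tool_call_request("llama-3.3-70b", "What's the weather in Oslo?")
print(json.dumps(payload, indent=2))
```

When the model decides to use the tool, the response contains a tool call with JSON arguments; your code executes the function and sends the result back in a follow-up message.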
Reasoning
Reasoning models support extended thinking and step-by-step problem solving. They’re optimized for complex math, logic, and analytical tasks.

Supported models: glm-5, kimi-k2.5, kimi-k2-thinking, glm-4.7-thinking, glm-4.7-flash, glm-4.7, minimax-m2.5
Vision
Vision-capable models can analyze images passed in the messages array.

Supported models: mistral-31-24b, kimi-k2.5
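A minimal sketch of attaching an image, assuming the OpenAI-compatible content-part format for the messages array:

```python
import json

def build_vision_request(model, image_url, question):
    """Build a chat payload with an image attached, OpenAI content-part style."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_vision_request(
    "mistral-31-24b", "https://example.com/chart.png", "Describe this chart."
)
print(json.dumps(payload, indent=2))
```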
Web Search
Any model can be upgraded with real-time web search by appending `:web` to the model name. See the Web Search section below for details.

Available for: all models
Code Optimization
Models specifically optimized for code generation and programming tasks.

Supported models: qwen3-coder-480b-a35b-instruct, kimi-k2-thinking, glm-5, minimax-m2.5

Web Search with :web
Every model listed above can be enhanced with real-time web search — you don’t need a separate model for it. Simply append `:web` to any model name, and the model will search the internet for current information before generating its response.

The model tables above list only base model names. To use any model with web search, add `:web` to the end: for example, llama-3.3-70b becomes llama-3.3-70b:web. This works for every model in the Morpheus marketplace.

| Base Model | With Web Search | What Changes |
|---|---|---|
| glm-5 | glm-5:web | Adds real-time internet search to responses |
| kimi-k2.5 | kimi-k2.5:web | Combines deep reasoning with current web data |
| glm-4.7-flash | glm-4.7-flash:web | Adds web search to fast, efficient responses |
| any model | model-name:web | Same pattern — works for all models |
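The suffix can be applied mechanically; a small helper makes the pattern explicit:

```python
def with_web_search(model):
    """Return the web-search-enabled variant of a model name."""
    return model if model.endswith(":web") else f"{model}:web"

print(with_web_search("llama-3.3-70b"))  # llama-3.3-70b:web
```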
Using Models
Specify the model ID in your API requests.
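A minimal Python sketch, assuming an OpenAI-compatible /chat/completions endpoint; `API_BASE` and `API_KEY` are placeholders for your actual gateway URL and key:

```python
import json
import urllib.request

API_BASE = "https://api.example.com/v1"  # placeholder -- your gateway URL
API_KEY = "your-api-key"                 # placeholder

def build_chat_request(model, prompt):
    """Build a minimal OpenAI-style chat-completion payload."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(model, prompt):
    """POST the payload and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example (requires a live endpoint):
# chat("llama-3.3-70b", "Hello!")["choices"][0]["message"]["content"]
```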
List Active Models
Query the API to see currently available models.
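A sketch of that query, assuming an OpenAI-style GET /models response shape (`{"data": [{"id": ...}, ...]}`); the endpoint and key below are placeholders:

```python
import json
import urllib.request

API_BASE = "https://api.example.com/v1"  # placeholder -- your gateway URL
API_KEY = "your-api-key"                 # placeholder

def extract_model_ids(response):
    """Pull model IDs out of an OpenAI-style GET /models response body."""
    return [m["id"] for m in response.get("data", [])]

def list_models():
    """Fetch the currently active model IDs from the marketplace."""
    req = urllib.request.Request(
        f"{API_BASE}/models",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_model_ids(json.loads(resp.read()))

# list_models()  # requires a live endpoint
```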
Model Selection Guide

Best for coding

- qwen3-coder-480b-a35b-instruct - Top choice for code generation (256K context)
- minimax-m2.5 - SOTA agentic coding, full-stack development (1M context)
- glm-5 - Agentic engineering, multi-file systems design (200K context)
- kimi-k2-thinking - Best for complex algorithmic problems with reasoning
- llama-3.3-70b - Good balance of speed and quality
Best for long documents

- minimax-m2.5 - 1M context window
- qwen3-next-80b - 256K context window
- kimi-k2.5 - 256K context with multimodal reasoning
- qwen3-coder-480b-a35b-instruct - 256K context window
- glm-5 - 200K context, excellent at document analysis
- kimi-k2-thinking - 256K context with reasoning
Best for speed

- glm-4.7-flash - 30B MoE (3B active), 200K context, runs on consumer GPUs
- llama-3.2-3b - Very fast, 128K context
- mistral-31-24b - Good speed with vision support
Best for reasoning

- kimi-k2.5 - Top math/logic benchmarks (AIME 96%), multimodal, 256K context
- glm-5 - Agentic engineering, systems reasoning, 200K context
- kimi-k2-thinking - Deep reasoning chains, 256K context
- glm-4.7-thinking - Extended thinking mode, 198K context
- qwen3-235b - Complex analysis, 128K context
Best for AI agents

- minimax-m2.5 - Purpose-built for agents, 80.2% SWE-Bench, multi-step tool orchestration
- glm-5 - Long-horizon agentic tasks, #1 open-source on Vending Bench 2
- kimi-k2.5 - Agent Swarm with up to 100 parallel sub-agents
- glm-4.7-flash - Lightweight agentic coding, efficient tool-use workflows
Best for uncensored/creative

- venice-uncensored - Minimal content restrictions, roleplay
Next Steps
Quickstart
Get started making your first API call.
Chat Completions
Full API reference for chat completions.
Embeddings
Create embeddings for semantic search.
Text-to-Speech
Generate speech from text.

