
Available Models

The Morpheus Inference Marketplace provides access to a variety of open-source AI models. Models are hosted by providers in the decentralized marketplace, and availability may vary based on provider activity.

Large Language Models (LLMs)

Flagship Models

These are the most capable models available for complex tasks.
| Model | Context Window | Capabilities | Best For |
|---|---|---|---|
| minimax-m2.5 | 1M | Code, Function Calling, Reasoning | AI agents, autonomous workflows, multi-step tool orchestration |
| qwen3-coder-480b-a35b-instruct | 256K | Code, Function Calling | Code generation, programming |
| hermes-3-llama-3.1-405b | 128K | | General purpose, instruction following |
| gpt-oss-120b | 128K | Function Calling | GPT-style responses |

Reasoning Models

Models optimized for step-by-step thinking and complex problem solving.
| Model | Context Window | Capabilities | Best For |
|---|---|---|---|
| glm-5 | 200K | Code, Function Calling, Reasoning | Agentic engineering, complex systems, long-horizon tasks |
| kimi-k2.5 | 256K | Code, Function Calling, Reasoning, Vision | Math, visual reasoning, parallel agent workflows |
| kimi-k2-thinking | 256K | Code, Function Calling, Reasoning | Deep reasoning, math, logic, coding |
| glm-4.7-thinking | 198K | Function Calling, Reasoning | Extended thinking, analysis |
| qwen3-235b | 128K | Function Calling | Complex reasoning, long documents |

Mid-Size Models

Balanced performance and speed for most use cases.
| Model | Context Window | Capabilities | Best For |
|---|---|---|---|
| llama-3.3-70b | 128K | Function Calling | General purpose, reliable |
| qwen3-next-80b | 256K | Function Calling | Next-gen reasoning, long context |
| mistral-31-24b | 128K | Function Calling, Vision | Fast, efficient, image analysis |
| venice-uncensored | 32K | | Uncensored, creative, roleplay |

Fast Models

Optimized for speed and low latency.
| Model | Context Window | Capabilities | Best For |
|---|---|---|---|
| glm-4.7-flash | 200K | Function Calling, Reasoning | Agentic coding, tool-use workflows, local deployment |
| llama-3.2-3b | 128K | Function Calling | Fastest responses, simple tasks |

Embeddings Models

For vector embeddings and semantic search.
| Model | Best For |
|---|---|
| text-embedding-bge-m3 | Text embeddings, RAG, semantic search |
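A minimal embeddings request, sketched on the assumption that the marketplace follows the OpenAI-compatible convention used by its other endpoints (the `/embeddings` path and `input` parameter are assumptions; verify against the Embeddings reference linked under Next Steps):

```shell
# Assumed OpenAI-compatible embeddings endpoint; confirm the path
# and parameters against the official Embeddings reference.
curl https://api.mor.org/api/v1/embeddings \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-bge-m3",
    "input": "Decentralized inference marketplaces route requests to providers."
  }'
```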

Audio Models

Text-to-Speech

| Model | Best For |
|---|---|
| tts-kokoro | Natural-sounding voice synthesis |
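A hypothetical text-to-speech request, assuming the marketplace mirrors the OpenAI-style `/audio/speech` endpoint (the path, `input` parameter, and output format here are assumptions; see the Text-to-Speech reference linked under Next Steps for the actual interface):

```shell
# Assumed OpenAI-compatible speech endpoint; the path and output
# format are guesses, so check the Text-to-Speech reference first.
curl https://api.mor.org/api/v1/audio/speech \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-kokoro",
    "input": "Hello from the Morpheus marketplace."
  }' \
  --output speech.mp3
```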

Model Capabilities

Function Calling

Models with function calling can invoke tools and APIs. Use the tools parameter in your chat completion request to define available functions.
Supported models: most models except venice-uncensored and hermes-3-llama-3.1-405b

Reasoning

Reasoning models support extended thinking and step-by-step problem solving. They're optimized for complex math, logic, and analytical tasks.
Supported models: glm-5, kimi-k2.5, kimi-k2-thinking, glm-4.7-thinking, glm-4.7-flash, glm-4.7, minimax-m2.5

Vision

Vision-capable models can analyze images passed in the messages array.
Supported models: mistral-31-24b, kimi-k2.5

Code

Models specifically optimized for code generation and programming tasks.
Supported models: qwen3-coder-480b-a35b-instruct, kimi-k2-thinking, glm-5, minimax-m2.5
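A function-calling request can be sketched like this, assuming the tools parameter follows the OpenAI-compatible function schema used elsewhere in this API (the get_weather function is a made-up example, not a real tool):

```shell
# get_weather is a hypothetical function for illustration; define the
# tools your own application actually exposes.
curl https://api.mor.org/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [
      {"role": "user", "content": "What is the weather in Paris?"}
    ],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'
```

If the model decides to call the function, the response contains a tool call rather than plain text, and your application executes the function and sends the result back in a follow-up message.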

Web Search with :web

Every model listed above can be enhanced with real-time web search capabilities — you don’t need a separate model for it. Simply append :web to any model name, and the model will search the internet for current information before generating its response.
The model tables above only list base model names. To use any model with web search, just add :web to the end. For example, llama-3.3-70b becomes llama-3.3-70b:web. This works universally across every model in the Morpheus marketplace.
| Base Model | With Web Search | What Changes |
|---|---|---|
| glm-5 | glm-5:web | Adds real-time internet search to responses |
| kimi-k2.5 | kimi-k2.5:web | Combines deep reasoning with current web data |
| glm-4.7-flash | glm-4.7-flash:web | Adds web search to fast, efficient responses |
| any model | model-name:web | Same pattern, works for all models |
Verify exact model names — including :web variants — by querying the /models endpoint. The :web suffix is universal, but the base model name must match exactly what the API returns.
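For example, the chat completion request from the Using Models section below needs only the suffix added to enable web search:

```shell
# Identical to a normal chat completion; only the :web suffix on the
# model name changes, enabling real-time web search.
curl https://api.mor.org/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b:web",
    "messages": [
      {"role": "user", "content": "What happened in AI news this week?"}
    ]
  }'
```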

Using Models

Specify the model ID in your API requests:
curl https://api.mor.org/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

List Active Models

Query the API to see currently available models:
curl https://api.mor.org/api/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
Model availability depends on active providers in the Morpheus Inference Marketplace. The API automatically routes your request to the highest-rated provider for your selected model.
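To pull just the model IDs from that response, you can pipe it through jq. This sketch assumes the endpoint returns the OpenAI-style list format (`{"data": [{"id": ...}, ...]}`); check the actual payload before relying on that shape:

```shell
# Assumes an OpenAI-style list response with a top-level "data" array;
# inspect the raw JSON first if this prints nothing.
curl -s https://api.mor.org/api/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY" | jq -r '.data[].id'
```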

Model Selection Guide

For Coding

  • qwen3-coder-480b-a35b-instruct - Top choice for code generation (256K context)
  • minimax-m2.5 - SOTA agentic coding, full-stack development (1M context)
  • glm-5 - Agentic engineering, multi-file systems design (200K context)
  • kimi-k2-thinking - Best for complex algorithmic problems with reasoning
  • llama-3.3-70b - Good balance of speed and quality

For Long Context

  • minimax-m2.5 - 1M context window
  • qwen3-next-80b - 256K context window
  • kimi-k2.5 - 256K context with multimodal reasoning
  • qwen3-coder-480b-a35b-instruct - 256K context window
  • glm-5 - 200K context, excellent at document analysis
  • kimi-k2-thinking - 256K context with reasoning

For Speed

  • glm-4.7-flash - 30B MoE (3B active), 200K context, runs on consumer GPUs
  • llama-3.2-3b - Very fast, 128K context
  • mistral-31-24b - Good speed with vision support

For Reasoning

  • kimi-k2.5 - Top math/logic benchmarks (AIME 96%), multimodal, 256K context
  • glm-5 - Agentic engineering, systems reasoning, 200K context
  • kimi-k2-thinking - Deep reasoning chains, 256K context
  • glm-4.7-thinking - Extended thinking mode, 198K context
  • qwen3-235b - Complex analysis, 128K context

For Agents

  • minimax-m2.5 - Purpose-built for agents, 80.2% SWE-Bench, multi-step tool orchestration
  • glm-5 - Long-horizon agentic tasks, #1 open-source on Vending Bench 2
  • kimi-k2.5 - Agent Swarm with up to 100 parallel sub-agents
  • glm-4.7-flash - Lightweight agentic coding, efficient tool-use workflows

For Creative & Uncensored

  • venice-uncensored - Minimal content restrictions, roleplay

Next Steps

Quickstart

Get started making your first API call.

Chat Completions

Full API reference for chat completions.

Embeddings

Create embeddings for semantic search.

Text-to-Speech

Generate speech from text.