Available Models

The Morpheus Inference Marketplace provides access to a variety of open-source AI models. Models are hosted by providers in the decentralized marketplace, and availability may vary based on provider activity.
Pricing: The Morpheus Inference API is free during the Open Beta program. Billing infrastructure is coming soon; inference remains free until 1/31/26.

Large Language Models (LLMs)

Flagship Models

These are the most capable models available for complex tasks.
| Model | Context Window | Capabilities | Best For |
|---|---|---|---|
| qwen3-coder-480b-a35b-instruct | 256K | Code, Function Calling | Code generation, programming |
| qwen3-coder-480b-a35b-instruct:web | 256K | Code, Function Calling, Web | Code with web search |
| hermes-3-llama-3.1-405b | 128K | Web Search | General purpose, instruction following |
| hermes-3-llama-3.1-405b:web | 128K | Web Search | General purpose with web |
| gpt-oss-120b | 128K | Function Calling | GPT-style responses |
| gpt-oss-120b:web | 128K | Function Calling, Web | GPT-style with web search |

Reasoning Models

Models optimized for step-by-step thinking and complex problem solving.
| Model | Context Window | Capabilities | Best For |
|---|---|---|---|
| kimi-k2-thinking | 256K | Code, Function Calling, Reasoning | Deep reasoning, math, logic, coding |
| kimi-k2-thinking:web | 256K | Code, Function Calling, Reasoning, Web | Deep reasoning with web search |
| glm-4.7-thinking | 198K | Function Calling, Reasoning | Extended thinking, analysis |
| glm-4.7-thinking:web | 198K | Function Calling, Reasoning, Web | Extended thinking with web |
| qwen3-235b | 128K | Function Calling | Complex reasoning, long documents |
| qwen3-235b:web | 128K | Function Calling, Web | Complex reasoning with web |

Mid-Size Models

Balanced performance and speed for most use cases.
| Model | Context Window | Capabilities | Best For |
|---|---|---|---|
| llama-3.3-70b | 128K | Function Calling | General purpose, reliable |
| llama-3.3-70b:web | 128K | Function Calling, Web | General purpose with web |
| qwen3-next-80b | 256K | Function Calling | Next-gen reasoning, long context |
| qwen3-next-80b:web | 256K | Function Calling, Web | Next-gen with web search |
| mistral-31-24b | 128K | Function Calling, Vision | Fast, efficient, image analysis |
| mistral-31-24b:web | 128K | Function Calling, Vision, Web | Fast with web search |
| glm-4.6 | 198K | Function Calling, Reasoning | General purpose, long context |
| glm-4.6:web | 198K | Function Calling, Reasoning, Web | General purpose with web |
| glm-4.7 | 198K | Function Calling, Reasoning | Improved GLM, largest context |
| glm-4.7:web | 198K | Function Calling, Reasoning, Web | Improved GLM with web |
| venice-uncensored | 32K | | Uncensored, creative, roleplay |
| venice-uncensored:web | 32K | Web | Uncensored with web search |
| hermes-4-14b | 128K | | Efficient instruction following |

Fast Models

Optimized for speed and low latency.
| Model | Context Window | Capabilities | Best For |
|---|---|---|---|
| llama-3.2-3b | 128K | Function Calling | Fastest responses, simple tasks |
| llama-3.2-3b:web | 128K | Function Calling, Web | Fast with web search |
| qwen3-4b | 32K | Function Calling, Reasoning | Lightweight, mobile, low-latency |
| qwen3-4b:web | 32K | Function Calling, Reasoning, Web | Lightweight with web |

Embeddings Models

For vector embeddings and semantic search.
| Model | Best For |
|---|---|
| text-embedding-bge-m3 | Text embeddings, RAG, semantic search |

Audio Models

Text-to-Speech

| Model | Best For |
|---|---|
| tts-kokoro | Natural-sounding voice synthesis |

Speech-to-Text

| Model | Best For |
|---|---|
| whisper-v3-large-turbo | Transcription, audio processing |

Model Capabilities

Function Calling

Models with function calling can invoke tools and APIs. Use the tools parameter in your chat completion request to define available functions. Supported models: most models except venice-uncensored and hermes-3-llama-3.1-405b.

Reasoning

Reasoning models support extended thinking and step-by-step problem solving. They're optimized for complex math, logic, and analytical tasks. Supported models: kimi-k2-thinking, glm-4.7-thinking, glm-4.6, glm-4.7, qwen3-4b.

Vision

Vision-capable models can analyze images passed in the messages array. Supported models: mistral-31-24b.

Code

Models specifically optimized for code generation and programming tasks. Supported models: qwen3-coder-480b-a35b-instruct, kimi-k2-thinking.
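As a sketch of what a function-calling request body might look like (this assumes the API follows the widely used OpenAI-compatible tools schema; the get_weather tool below is a hypothetical example, not part of this API):

```python
# Hypothetical tool definition for a function-calling request.
# Assumes the Morpheus API accepts the OpenAI-compatible "tools" schema.

def build_tool_call_request(model: str, user_message: str) -> dict:
    """Build a chat completion body that advertises one callable tool."""
    weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical example function
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [weather_tool],
    }

body = build_tool_call_request("llama-3.3-70b", "What's the weather in Paris?")
```

When the model decides to call the tool, the response should contain a tool call with arguments for your code to execute; verify the exact response shape against the API's chat completion reference.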

Model Naming Convention

Models with the :web suffix have web search capabilities enabled, allowing them to access current information from the internet.
| Suffix | Meaning |
|---|---|
| (none) | Base model without web access |
| :web | Model with web search capabilities |
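The suffix convention is mechanical enough to wrap in a small helper; a minimal sketch, assuming every base model you use has a :web counterpart (check the models endpoint for actual availability):

```python
def with_web_search(model_id: str) -> str:
    """Return the web-enabled variant of a model ID by appending ':web'.

    Assumes the ':web' counterpart exists for the given base model;
    verify against the /api/v1/models listing before relying on it.
    """
    return model_id if model_id.endswith(":web") else f"{model_id}:web"
```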

Using Models

Specify the model ID in your API requests:
curl https://api.mor.org/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
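The same request can be assembled from Python; a minimal sketch using only the standard library (YOUR_API_KEY is a placeholder, and the request is built but not sent here):

```python
import json
import urllib.request

API_URL = "https://api.mor.org/api/v1/chat/completions"

def build_chat_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Assemble the same POST request shown in the curl example above."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "YOUR_API_KEY",
    "llama-3.3-70b",
    [{"role": "user", "content": "Hello!"}],
)
# To actually send it:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```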

List Active Models

Query the API to see currently available models:
curl https://api.mor.org/api/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
Model availability depends on active providers in the Morpheus Inference Marketplace. The API automatically routes your request to the highest-rated provider for your selected model.
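If the models endpoint returns the common OpenAI-style list shape (an assumption; inspect the real payload first), you can filter the listing in a few lines, for example to find the currently active web-enabled variants:

```python
def web_enabled_models(models_response: dict) -> list:
    """Extract web-search-capable model IDs from a /models response.

    Assumes the OpenAI-style shape {"data": [{"id": ...}, ...]};
    verify against the actual API output.
    """
    return [m["id"] for m in models_response.get("data", [])
            if m["id"].endswith(":web")]

# Illustrative sample response, not real API output:
sample = {"data": [{"id": "llama-3.3-70b"}, {"id": "llama-3.3-70b:web"}]}
```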

Model Selection Guide

For coding:
  • qwen3-coder-480b-a35b-instruct - Top choice for code generation (256K context)
  • kimi-k2-thinking - Best for complex algorithmic problems with reasoning
  • llama-3.3-70b - Good balance of speed and quality

For long context:
  • qwen3-next-80b - 256K context window
  • qwen3-coder-480b-a35b-instruct - 256K context window
  • glm-4.7 - 198K context, excellent at document analysis
  • kimi-k2-thinking - 256K context with reasoning

For speed:
  • qwen3-4b - Fastest, 32K context
  • llama-3.2-3b - Very fast, 128K context
  • mistral-31-24b - Good speed with vision support

For reasoning:
  • kimi-k2-thinking - Deep reasoning chains, 256K context
  • glm-4.7-thinking - Extended thinking mode, 198K context
  • qwen3-235b - Complex analysis, 128K context

For uncensored content:
  • venice-uncensored - Minimal content restrictions, roleplay
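One way to encode these recommendations in application code is a simple lookup table; the task names below are illustrative, and the fallback choice is an assumption based on the guide's general-purpose pick:

```python
# Recommended defaults drawn from the selection guide above.
RECOMMENDED_MODELS = {
    "coding": "qwen3-coder-480b-a35b-instruct",
    "long_context": "qwen3-next-80b",
    "speed": "qwen3-4b",
    "reasoning": "kimi-k2-thinking",
    "uncensored": "venice-uncensored",
}

def pick_model(task: str) -> str:
    """Return a recommended model ID, falling back to a general-purpose model."""
    return RECOMMENDED_MODELS.get(task, "llama-3.3-70b")
```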

Next Steps