Available Models
The Morpheus Inference Marketplace provides access to a variety of open-source AI models. Models are hosted by providers in the decentralized marketplace, and availability may vary based on provider activity.

Large Language Models (LLMs)
Flagship Models
These are the most capable models available for complex tasks.

| Model | Context Window | Capabilities | Best For |
|---|---|---|---|
| minimax-m2.5 | 1M | Code, Function Calling, Reasoning | AI agents, autonomous workflows, multi-step tool orchestration |
| qwen3-coder-480b-a35b-instruct | 256K | Code, Function Calling | Code generation, programming |
| hermes-3-llama-3.1-405b | 128K | — | General purpose, instruction following |
| gpt-oss-120b | 128K | Function Calling | GPT-style responses |
Reasoning Models
Models optimized for step-by-step thinking and complex problem solving.

| Model | Context Window | Capabilities | Best For |
|---|---|---|---|
| glm-5 | 200K | Code, Function Calling, Reasoning | Agentic engineering, complex systems, long-horizon tasks |
| kimi-k2.5 | 256K | Code, Function Calling, Reasoning, Vision | Math, visual reasoning, parallel agent workflows |
| kimi-k2-thinking | 256K | Code, Function Calling, Reasoning | Deep reasoning, math, logic, coding |
| glm-4.7-thinking | 198K | Function Calling, Reasoning | Extended thinking, analysis |
| qwen3-235b | 128K | Function Calling | Complex reasoning, long documents |
Mid-Size Models
Balanced performance and speed for most use cases.

| Model | Context Window | Capabilities | Best For |
|---|---|---|---|
| llama-3.3-70b | 128K | Function Calling | General purpose, reliable |
| qwen3-next-80b | 256K | Function Calling | Next-gen reasoning, long context |
| mistral-31-24b | 128K | Function Calling, Vision | Fast, efficient, image analysis |
| venice-uncensored | 32K | — | Uncensored, creative, roleplay |
Fast Models
Optimized for speed and low latency.

| Model | Context Window | Capabilities | Best For |
|---|---|---|---|
| glm-4.7-flash | 200K | Function Calling, Reasoning | Agentic coding, tool-use workflows, local deployment |
| llama-3.2-3b | 128K | Function Calling | Fastest responses, simple tasks |
Embeddings Models
For vector embeddings and semantic search.

| Model | Best For |
|---|---|
| text-embedding-bge-m3 | Text embeddings, RAG, semantic search |
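As a sketch, an embeddings request can be built in the OpenAI-compatible style. The `API_BASE` and `API_KEY` values below are placeholders, not addresses from this document:

```python
import json

# Placeholder endpoint and key -- substitute your actual gateway URL and credentials.
API_BASE = "https://api.example.com/v1"
API_KEY = "your-api-key"

def build_embeddings_request(texts):
    """Build an OpenAI-style embeddings payload for text-embedding-bge-m3."""
    return {"model": "text-embedding-bge-m3", "input": texts}

payload = build_embeddings_request(["What is the Morpheus marketplace?"])
print(json.dumps(payload, indent=2))
# Send with an HTTP POST to f"{API_BASE}/embeddings" using an
# "Authorization: Bearer <API_KEY>" header.
```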
Audio Models
Text-to-Speech
| Model | Best For |
|---|---|
| tts-kokoro | Natural-sounding voice synthesis |
Model Capabilities
Function Calling
Models with function calling can invoke tools and APIs. Use the `tools` parameter in your chat completion request to define available functions.

Supported models: most models, except venice-uncensored and hermes-3-llama-3.1-405b
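A minimal sketch of defining a tool, assuming the OpenAI-compatible function-calling schema; the `get_weather` tool shown is hypothetical:

```python
import json

def build_tool_call_request(model, user_message):
    """Build a chat-completion payload with one (hypothetical) tool defined."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical example tool
                    "description": "Look up current weather for a city",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }

payload = build_tool_call_request("llama-3.3-70b", "What's the weather in Oslo?")
print(json.dumps(payload, indent=2))
```

When the model decides to use the tool, the response contains a tool call with JSON arguments; your code executes the function and sends the result back in a follow-up message.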
Reasoning
Reasoning models support extended thinking and step-by-step problem solving. They’re optimized for complex math, logic, and analytical tasks.

Supported models: glm-5, kimi-k2.5, kimi-k2-thinking, glm-4.7-thinking, glm-4.7-flash, glm-4.7, minimax-m2.5
Vision
Vision-capable models can analyze images passed in the messages array.

Supported models: mistral-31-24b, kimi-k2.5
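A minimal sketch of attaching an image, assuming the OpenAI-compatible content-part format for the messages array:

```python
import json

def build_vision_request(model, image_url, question):
    """Build a chat payload with an image attached, OpenAI content-part style."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_vision_request(
    "mistral-31-24b", "https://example.com/chart.png", "Describe this chart."
)
print(json.dumps(payload, indent=2))
```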
Web Search
Any model can be upgraded with real-time web search by appending `:web` to the model name. See the Web Search section below for details.

Available for: all models
Code Optimization
Models specifically optimized for code generation and programming tasks.

Supported models: qwen3-coder-480b-a35b-instruct, kimi-k2-thinking, glm-5, minimax-m2.5

Web Search with :web
Every model listed above can be enhanced with real-time web search — you don’t need a separate model for it. Simply append `:web` to any model name, and the model will search the internet for current information before generating its response.

The model tables above list only base model names. To use any model with web search, add `:web` to the end: for example, llama-3.3-70b becomes llama-3.3-70b:web. This works for every model in the Morpheus marketplace.

| Base Model | With Web Search | What Changes |
|---|---|---|
| glm-5 | glm-5:web | Adds real-time internet search to responses |
| kimi-k2.5 | kimi-k2.5:web | Combines deep reasoning with current web data |
| glm-4.7-flash | glm-4.7-flash:web | Adds web search to fast, efficient responses |
| any model | model-name:web | Same pattern — works for all models |
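The suffix can be applied mechanically; a small helper makes the pattern explicit:

```python
def with_web_search(model):
    """Return the web-search-enabled variant of a model name."""
    return model if model.endswith(":web") else f"{model}:web"

print(with_web_search("llama-3.3-70b"))  # llama-3.3-70b:web
```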
Using Models
Specify the model ID in your API requests.
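A minimal Python sketch, assuming an OpenAI-compatible /chat/completions endpoint; `API_BASE` and `API_KEY` are placeholders for your actual gateway URL and key:

```python
import json
import urllib.request

API_BASE = "https://api.example.com/v1"  # placeholder -- your gateway URL
API_KEY = "your-api-key"                 # placeholder

def build_chat_request(model, prompt):
    """Build a minimal OpenAI-style chat-completion payload."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(model, prompt):
    """POST the payload and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example (requires a live endpoint):
# chat("llama-3.3-70b", "Hello!")["choices"][0]["message"]["content"]
```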
List Active Models
Query the API to see currently available models.
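A sketch of that query, assuming an OpenAI-style GET /models response shape (`{"data": [{"id": ...}, ...]}`); the endpoint and key below are placeholders:

```python
import json
import urllib.request

API_BASE = "https://api.example.com/v1"  # placeholder -- your gateway URL
API_KEY = "your-api-key"                 # placeholder

def extract_model_ids(response):
    """Pull model IDs out of an OpenAI-style GET /models response body."""
    return [m["id"] for m in response.get("data", [])]

def list_models():
    """Fetch the currently active model IDs from the marketplace."""
    req = urllib.request.Request(
        f"{API_BASE}/models",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_model_ids(json.loads(resp.read()))

# list_models()  # requires a live endpoint
```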
Model Selection Guide

Best for coding

- qwen3-coder-480b-a35b-instruct - Top choice for code generation (256K context)
- minimax-m2.5 - SOTA agentic coding, full-stack development (1M context)
- glm-5 - Agentic engineering, multi-file systems design (200K context)
- kimi-k2-thinking - Best for complex algorithmic problems with reasoning
- llama-3.3-70b - Good balance of speed and quality
Best for long documents

- minimax-m2.5 - 1M context window
- qwen3-next-80b - 256K context window
- kimi-k2.5 - 256K context with multimodal reasoning
- qwen3-coder-480b-a35b-instruct - 256K context window
- glm-5 - 200K context, excellent at document analysis
- kimi-k2-thinking - 256K context with reasoning
Best for speed

- glm-4.7-flash - 30B MoE (3B active), 200K context, runs on consumer GPUs
- llama-3.2-3b - Very fast, 128K context
- mistral-31-24b - Good speed with vision support
Best for reasoning

- kimi-k2.5 - Top math/logic benchmarks (AIME 96%), multimodal, 256K context
- glm-5 - Agentic engineering, systems reasoning, 200K context
- kimi-k2-thinking - Deep reasoning chains, 256K context
- glm-4.7-thinking - Extended thinking mode, 198K context
- qwen3-235b - Complex analysis, 128K context
Best for AI agents

- minimax-m2.5 - Purpose-built for agents, 80.2% SWE-Bench, multi-step tool orchestration
- glm-5 - Long-horizon agentic tasks, #1 open-source on Vending Bench 2
- kimi-k2.5 - Agent Swarm with up to 100 parallel sub-agents
- glm-4.7-flash - Lightweight agentic coding, efficient tool-use workflows
Best for uncensored/creative

- venice-uncensored - Minimal content restrictions, roleplay
Next Steps
Quickstart
Get started making your first API call.
Chat Completions
Full API reference for chat completions.
Embeddings
Create embeddings for semantic search.
Text-to-Speech
Generate speech from text.

