Vision (Image Analysis)

Vision-capable models can analyze images alongside text, enabling use cases like image description, visual reasoning, document extraction, and more. Images are passed as part of the messages array using the OpenAI-compatible multimodal format.
The Morpheus Inference API is fully OpenAI-compatible — vision works exactly like OpenAI’s multimodal API. If you’ve used GPT-4 Vision before, you already know how to use this.

Supported Models

| Model | Context Window | Best For |
|---|---|---|
| kimi-k2.5 | 256K | Visual reasoning, math from images, complex multimodal tasks |
| mistral-31-24b | 128K | Fast image analysis, efficient visual processing |

How It Works

Instead of sending a plain text string as the message content, you send an array of content parts — mixing text and images in a single message:
{
  "role": "user",
  "content": [
    {"type": "text", "text": "What do you see in this image?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
  ]
}
Images can be provided as:
  • URL — A direct link to an image (https://...)
  • Base64 — Inline image data (data:image/jpeg;base64,...)
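Both forms plug into the same `image_url` content part. As a minimal sketch (the helper name `image_part` is ours, not part of the API), the only difference between the two is the string you put in `url`:

```python
def image_part(source: str) -> dict:
    """Wrap an image source (an https URL or a base64 data URI) as an
    OpenAI-style image_url content part."""
    return {"type": "image_url", "image_url": {"url": source}}

# A hosted image:
url_part = image_part("https://example.com/image.jpg")

# Inline base64 data uses exactly the same shape:
inline_part = image_part("data:image/jpeg;base64,/9j/4AAQ...")
```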

Basic Example

Send an image URL for analysis:
curl https://api.mor.org/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2.5",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What do you see in this image? Describe it in detail."},
          {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Cat03.jpg/1200px-Cat03.jpg"}}
        ]
      }
    ]
  }'

Using Base64 Images

For local images or when you want to avoid external URLs, encode the image as base64:
import base64
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.mor.org/api/v1"
)

# Read and encode a local image
with open("image.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{image_data}"
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)

Multiple Images

You can send multiple images in a single message for comparison or multi-image analysis:
response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Compare these two images. What are the differences?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/image1.jpg"}
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/image2.jpg"}
                }
            ]
        }
    ]
)
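If you build multi-image requests often, the message construction can be factored into a small helper. This is a sketch; `multi_image_message` is our own name, not part of the client library:

```python
def multi_image_message(prompt: str, image_urls: list[str]) -> dict:
    """Build a single user message with one text part followed by N image parts."""
    content = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": content}

msg = multi_image_message(
    "Compare these two images. What are the differences?",
    ["https://example.com/image1.jpg", "https://example.com/image2.jpg"],
)
# Then: client.chat.completions.create(model="kimi-k2.5", messages=[msg])
```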

Use Cases

Image Description

Ask the model to describe what it sees in an image — useful for accessibility, content moderation, or cataloging.

response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one paragraph."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
        ]
    }]
)
Document Extraction

Extract structured data from photos of documents, receipts, or invoices.
response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract all line items, totals, and the date from this receipt. Return as JSON."},
            {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
        ]
    }]
)
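When you ask for JSON, the reply arrives as plain text in `response.choices[0].message.content`, and models sometimes wrap it in a Markdown code fence. A small hedged sketch (the helper name `parse_json_reply` is ours) for turning that text into a Python object:

```python
import json
import re

def parse_json_reply(text: str):
    """Parse JSON from a model reply, tolerating an optional ```json fence."""
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload)

# e.g. receipt = parse_json_reply(response.choices[0].message.content)
```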
Visual Math and Diagrams

kimi-k2.5 excels at solving math problems from images and interpreting diagrams.
response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Solve the math problem shown in this image. Show your work step by step."},
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
        ]
    }]
)
Code from Screenshots

Have the model read and explain code from screenshots.
response = client.chat.completions.create(
    model="mistral-31-24b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Read the code in this screenshot. Explain what it does and suggest improvements."},
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
        ]
    }]
)

Tips

  • Choosing a model: Use kimi-k2.5 for complex visual reasoning, math, and multi-image analysis. Use mistral-31-24b when you need faster responses for simpler image tasks.
  • Supported formats: JPEG, PNG, GIF, and WebP images are supported. For base64, include the appropriate MIME type in the data URI (e.g., data:image/png;base64,...).
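Rather than hardcoding the MIME type when encoding local files, you can guess it from the file extension with Python's standard-library `mimetypes` module. A sketch (the helper name `file_to_data_uri` is ours):

```python
import base64
import mimetypes

SUPPORTED = ("image/jpeg", "image/png", "image/gif", "image/webp")

def file_to_data_uri(path: str) -> str:
    """Read a local image and return a data URI, guessing the MIME type
    from the file extension and rejecting unsupported formats."""
    mime, _ = mimetypes.guess_type(path)
    if mime not in SUPPORTED:
        raise ValueError(f"Unsupported image type for {path}: {mime}")
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```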

Next Steps