Vision-capable models can analyze images alongside text, enabling use cases like image description, visual reasoning, document extraction, and more. Images are passed as part of the messages array using the OpenAI-compatible multimodal format.
The Morpheus Inference API is fully OpenAI-compatible, so vision requests use the same format as OpenAI's multimodal API. If you have used GPT-4 Vision before, the request shape will already be familiar.
Instead of sending a plain text string as the message content, you send an array of content parts, mixing text and images in a single message:
{
  "role": "user",
  "content": [
    {"type": "text", "text": "What do you see in this image?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
  ]
}
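When a request carries several images, it can help to assemble the content parts programmatically. The sketch below builds the message shape shown above from a prompt and a list of image URLs. The helper name `build_vision_message` is hypothetical and not part of any SDK; it only produces a plain dictionary.

```python
from typing import Any


def build_vision_message(prompt: str, image_urls: list[str]) -> dict[str, Any]:
    """Build one user message: a text part followed by one part per image.

    Hypothetical helper for illustration; it returns a plain dict in the
    multimodal content-parts format described above.
    """
    parts: list[dict[str, Any]] = [{"type": "text", "text": prompt}]
    for url in image_urls:
        # Each image is its own content part; the URL may be https:// or a data URI.
        parts.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": parts}


message = build_vision_message(
    "What do you see in this image?",
    ["https://example.com/image.jpg"],
)
```

The resulting dictionary can be placed directly in the `messages` list of a chat completion request.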
Images can be provided as:
URL — A direct link to an image (https://...)
Base64 — Inline image data (data:image/jpeg;base64,...)
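For the Base64 option, the image bytes need to be encoded and wrapped in a data URI. A minimal sketch using only the Python standard library follows; the function name `to_data_uri` is an assumption made for illustration, and the sample bytes are a placeholder rather than a real image.

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URI usable in an image_url part.

    Hypothetical helper; mime_type should match the actual image format.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder bytes (the PNG file signature); in practice, read the bytes
# from a file, e.g. Path("photo.png").read_bytes().
sample = to_data_uri(b"\x89PNG\r\n\x1a\n", "image/png")
```

The returned string goes in the `url` field of an `image_url` content part, in place of an `https://` link.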
Ask the model to describe what it sees in an image — useful for accessibility, content moderation, or cataloging.
response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one paragraph."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
        ]
    }]
)
Document & Receipt Extraction
Extract structured data from photos of documents, receipts, or invoices.
response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract all line items, totals, and the date from this receipt. Return as JSON."},
            {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
        ]
    }]
)
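Because the prompt asks for JSON, the reply text usually needs to be parsed before use. With the OpenAI Python SDK, the reply text is available at `response.choices[0].message.content`. Models sometimes wrap JSON in a Markdown code fence, so the sketch below strips one if present before parsing. The helper name `parse_json_reply` is hypothetical, and fence-wrapped output is an assumption about model behavior rather than something this API guarantees.

```python
import json
import re
from typing import Any

# Three backticks, built programmatically so this example can itself
# live inside a fenced code block.
FENCE = "`" * 3
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, flags=re.DOTALL)


def parse_json_reply(reply_text: str) -> Any:
    """Parse JSON from a model reply, tolerating an optional code fence.

    Hypothetical helper; raises json.JSONDecodeError if the text is not JSON.
    """
    text = reply_text.strip()
    fenced = FENCED_BLOCK.search(text)
    if fenced:
        # Keep only what is inside the first fenced block.
        text = fenced.group(1).strip()
    return json.loads(text)


# Usage with a real response object (not executed here):
# data = parse_json_reply(response.choices[0].message.content)
```

Parsing failures raise an exception rather than returning partial data, which makes it easier to retry or fall back when the model does not return valid JSON.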
Math & Diagram Reasoning
kimi-k2.5 excels at solving math problems from images and interpreting diagrams.
response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Solve the math problem shown in this image. Show your work step by step."},
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
        ]
    }]
)
Code Screenshot Analysis
Have the model read and explain code from screenshots.
response = client.chat.completions.create(
    model="mistral-31-24b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Read the code in this screenshot. Explain what it does and suggest improvements."},
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
        ]
    }]
)
Choosing a model: Use kimi-k2.5 for complex visual reasoning, math, and multi-image analysis. Use mistral-31-24b when you need faster responses for simpler image tasks.
Supported formats: JPEG, PNG, GIF, and WebP images are supported. For base64, include the appropriate MIME type in the data URI (e.g., data:image/png;base64,...).
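One way to get the MIME type right is to derive it from the file extension and reject anything outside the supported formats before sending a request. The sketch below assumes the extension reflects the real image format; the mapping and the function name `mime_type_for` are illustrative and not part of the API.

```python
from pathlib import Path

# Extensions for the four supported formats listed above.
SUPPORTED_MIME_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def mime_type_for(path: str) -> str:
    """Return the MIME type for a supported image file, based on its extension.

    Hypothetical helper; raises ValueError for unsupported extensions.
    """
    suffix = Path(path).suffix.lower()
    try:
        return SUPPORTED_MIME_TYPES[suffix]
    except KeyError:
        raise ValueError(f"Unsupported image format: {suffix or path!r}") from None
```

For example, `mime_type_for("receipt.JPG")` yields `image/jpeg`, which can then be used as the MIME type in the data URI.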