Integrate Morpheus Inference API with OpenAI Python SDK
Learn how to integrate the Morpheus Inference API with OpenAI’s official Python SDK. This guide covers basic chat completions, streaming responses, tool calling, and async operations.
Overview
The Morpheus Inference API is fully OpenAI-compatible. Simply point the official OpenAI Python SDK to the Morpheus base URL and start building.
Base URL: https://api.mor.org/api/v1
Prerequisites
Before you begin, ensure you have:
Python 3.8+ installed on your system
A Morpheus API key from app.mor.org
Basic knowledge of Python and async/await patterns
Familiarity with REST APIs
Create a Morpheus API Key
Visit app.mor.org and sign in to create your API key.
Navigate to the API Keys section
Click “Create API Key” and provide a name
Copy your API key immediately (it won’t be shown again)
Store your API key securely. Never commit it to version control or expose it in publicly accessible code.
Install the OpenAI Python SDK
Install the official OpenAI Python library with pip install openai. Verify the installation by running pip show openai to see the installed version.
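You can also confirm the installed SDK version from Python itself:

import openai

print(openai.__version__)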
Configure Environment Variables
Create a .env file in your project root or set environment variables:

MORPHEUS_API_KEY=your_api_key_here
For better security, use environment variables instead of hardcoding API keys:

import os
from dotenv import load_dotenv

load_dotenv()
api_key = os.getenv("MORPHEUS_API_KEY")
Never commit your API key to version control. Add .env to your .gitignore file.
Basic Integration
Setting Up the Client
Configure the OpenAI client to use the Morpheus Inference API by setting a custom base_url:
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("MORPHEUS_API_KEY"),
    base_url="https://api.mor.org/api/v1"
)
The only difference from using the standard OpenAI client is the base_url parameter. All other functionality remains the same.
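If you prefer to keep configuration out of your code entirely, the SDK also reads its settings from the standard OPENAI_API_KEY and OPENAI_BASE_URL environment variables, so a bare client works as well (assuming those variables hold your Morpheus key and the base URL above):

from openai import OpenAI

# Assumes these are exported in your shell or .env file:
#   OPENAI_API_KEY=<your Morpheus API key>
#   OPENAI_BASE_URL=https://api.mor.org/api/v1
client = OpenAI()  # key and base URL are picked up from the environment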
Available Models
Query the available models using the Morpheus API:
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("MORPHEUS_API_KEY"),
    base_url="https://api.mor.org/api/v1"
)

# List all available models
models = client.models.list()
for model in models.data:
    print(f"Model: {model.id}")
Popular models available through Morpheus:
llama-3.3-70b:web - Meta’s Llama 3.3 with web search capabilities
llama-3.3-70b - Meta’s Llama 3.3 base model
qwen3-235b:web - Qwen 3 with web search capabilities
qwen3-235b - Qwen 3 base model
Model availability may vary based on provider availability in the Morpheus marketplace. The API automatically routes to the highest-rated provider for your selected model. The :web suffix indicates models optimized for web browsing tasks.
Text Generation
Basic Chat Completions
Use the chat.completions.create() method for standard, non-streaming text generation:
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("MORPHEUS_API_KEY"),
    base_url="https://api.mor.org/api/v1"
)

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    temperature=0.7,
    max_tokens=1024
)

print(response.choices[0].message.content)
Streaming Responses
For real-time output, enable streaming to receive tokens as they’re generated:
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("MORPHEUS_API_KEY"),
    base_url="https://api.mor.org/api/v1"
)

stream = client.chat.completions.create(
    model="llama-3.3-70b:web",
    messages=[
        {"role": "user", "content": "Write a short story about artificial intelligence."}
    ],
    stream=True,
    temperature=0.8
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

print()  # New line after streaming completes
Streaming provides a better user experience by showing output immediately rather than waiting for the entire response.
Asynchronous Operations
Async Client Setup
Use the AsyncOpenAI client for concurrent operations and async/await patterns:
import asyncio
import os
from openai import AsyncOpenAI

async def main():
    client = AsyncOpenAI(
        api_key=os.getenv("MORPHEUS_API_KEY"),
        base_url="https://api.mor.org/api/v1"
    )

    response = await client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "user", "content": "What is the capital of France?"}
        ]
    )

    print(response.choices[0].message.content)

asyncio.run(main())
Async Streaming
Combine async operations with streaming for efficient, concurrent request handling:
import asyncio
import os
from openai import AsyncOpenAI

async def stream_chat(client, prompt):
    stream = await client.chat.completions.create(
        model="llama-3.3-70b:web",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    )

    print(f"\nPrompt: {prompt}")
    print("Response: ", end="")

    async for chunk in stream:
        if chunk.choices[0].delta.content is not None:
            print(chunk.choices[0].delta.content, end="", flush=True)

    print("\n")

async def main():
    client = AsyncOpenAI(
        api_key=os.getenv("MORPHEUS_API_KEY"),
        base_url="https://api.mor.org/api/v1"
    )

    # Process multiple streams concurrently
    await asyncio.gather(
        stream_chat(client, "Explain Python generators"),
        stream_chat(client, "What is machine learning?"),
        stream_chat(client, "Describe blockchain technology")
    )

asyncio.run(main())
Async operations are ideal for handling multiple concurrent requests efficiently, making your application more responsive.
Tool Calling
Enable your AI models to execute functions and interact with external systems through tool calling.
Define tools using JSON schemas to specify available functions:
import json
import os
from openai import OpenAI

def get_weather(location: str, unit: str = "celsius") -> dict:
    """
    Get the current weather for a location.

    Args:
        location: City name or location
        unit: Temperature unit (celsius or fahrenheit)

    Returns:
        Weather information dictionary
    """
    # In a real application, call a weather API here
    return {
        "location": location,
        "temperature": 22,
        "unit": unit,
        "condition": "sunny"
    }

def calculate(expression: str) -> dict:
    """
    Evaluate a mathematical expression.

    Args:
        expression: Mathematical expression to evaluate

    Returns:
        Calculation result
    """
    try:
        # Note: eval() is used here only for demonstration; never pass
        # untrusted input to eval() in production code.
        result = eval(expression)
        return {"result": result, "expression": expression}
    except Exception as e:
        return {"error": str(e)}

# Define tool schemas
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name, e.g. San Francisco"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["location"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Perform a mathematical calculation",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "Mathematical expression to evaluate, e.g. '2 + 2'"
                    }
                },
                "required": ["expression"]
            }
        }
    }
]

# Map function names to implementations
available_functions = {
    "get_weather": get_weather,
    "calculate": calculate
}
Integrate tools with chat completions to enable function calling:
import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("MORPHEUS_API_KEY"),
    base_url="https://api.mor.org/api/v1"
)

messages = [
    {"role": "user", "content": "What's the weather like in Tokyo and calculate 15 * 23"}
]

# Initial request with tools
response = client.chat.completions.create(
    model="llama-3.3-70b:web",
    messages=messages,
    tools=tools,
    tool_choice="auto"
)

response_message = response.choices[0].message
messages.append(response_message)

# Process tool calls
if response_message.tool_calls:
    for tool_call in response_message.tool_calls:
        function_name = tool_call.function.name
        function_args = json.loads(tool_call.function.arguments)

        print(f"Calling function: {function_name}")
        print(f"Arguments: {function_args}")

        # Execute the function
        function_to_call = available_functions[function_name]
        function_response = function_to_call(**function_args)

        # Add function response to messages
        messages.append({
            "tool_call_id": tool_call.id,
            "role": "tool",
            "name": function_name,
            "content": json.dumps(function_response)
        })

    # Get final response with tool results
    final_response = client.chat.completions.create(
        model="llama-3.3-70b:web",
        messages=messages
    )

    print("\nFinal Response:")
    print(final_response.choices[0].message.content)
else:
    print(response_message.content)
Here’s a complete example with a reusable tool handler and error handling:
import json
import os
from openai import OpenAI
from typing import Dict, Any, Callable

class ToolHandler:
    def __init__(self, client: OpenAI):
        self.client = client
        self.functions: Dict[str, Callable] = {}
        self.tools = []

    def register_function(self, func: Callable, schema: dict):
        """Register a function and its schema for tool calling."""
        self.functions[func.__name__] = func
        self.tools.append({
            "type": "function",
            "function": schema
        })

    def execute_tool_call(self, tool_call) -> dict:
        """Execute a single tool call."""
        function_name = tool_call.function.name
        function_args = json.loads(tool_call.function.arguments)

        if function_name not in self.functions:
            return {"error": f"Function {function_name} not found"}

        try:
            result = self.functions[function_name](**function_args)
            return result
        except Exception as e:
            return {"error": str(e)}

    def chat_with_tools(self, messages: list, model: str = "llama-3.3-70b:web",
                        max_iterations: int = 5) -> str:
        """
        Handle chat completions with automatic tool calling.

        Args:
            messages: List of message dictionaries
            model: Model to use
            max_iterations: Maximum number of tool calling iterations

        Returns:
            Final assistant response
        """
        for iteration in range(max_iterations):
            response = self.client.chat.completions.create(
                model=model,
                messages=messages,
                tools=self.tools if self.tools else None,
                tool_choice="auto" if self.tools else None
            )

            response_message = response.choices[0].message
            messages.append(response_message)

            # Check if we're done
            if not response_message.tool_calls:
                return response_message.content

            # Process tool calls
            for tool_call in response_message.tool_calls:
                print(f"[Tool Call {iteration + 1}] {tool_call.function.name}")
                result = self.execute_tool_call(tool_call)

                messages.append({
                    "tool_call_id": tool_call.id,
                    "role": "tool",
                    "name": tool_call.function.name,
                    "content": json.dumps(result)
                })

        return "Max iterations reached without completion"

# Initialize client and handler
client = OpenAI(
    api_key=os.getenv("MORPHEUS_API_KEY"),
    base_url="https://api.mor.org/api/v1"
)

handler = ToolHandler(client)

# Define and register functions
def search_web(query: str) -> dict:
    """Search the web for information."""
    return {
        "query": query,
        "results": [
            {"title": "Example Result", "snippet": "This is a sample search result."}
        ]
    }

handler.register_function(search_web, {
    "name": "search_web",
    "description": "Search the web for current information",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The search query"
            }
        },
        "required": ["query"]
    }
})

# Use the handler
messages = [
    {"role": "user", "content": "Search for recent AI developments"}
]

response = handler.chat_with_tools(messages)
print(f"\nFinal Response:\n{response}")
Always provide clear, detailed descriptions for your tools and parameters. This helps the model understand when and how to use each function.
Advanced Configuration
Custom Timeouts and Retries
Configure timeouts and retry behavior for production applications:
import os
import httpx
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("MORPHEUS_API_KEY"),
    base_url="https://api.mor.org/api/v1",
    timeout=httpx.Timeout(
        connect=5.0,   # Connection timeout
        read=60.0,     # Read timeout
        write=10.0,    # Write timeout
        pool=60.0      # Pool timeout
    ),
    max_retries=3
)

# Override the timeout for specific requests
response = client.with_options(timeout=30.0).chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Quick question"}]
)
Token Usage Tracking
Monitor token consumption and costs:
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("MORPHEUS_API_KEY"),
    base_url="https://api.mor.org/api/v1"
)

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "Explain neural networks"}
    ]
)

usage = response.usage
print(f"Prompt tokens: {usage.prompt_tokens}")
print(f"Completion tokens: {usage.completion_tokens}")
print(f"Total tokens: {usage.total_tokens}")

# Log usage to a database or analytics service
def log_usage(model: str, usage_data: dict):
    """Log token usage for monitoring."""
    print(f"Model: {model}")
    print(f"Usage: {usage_data}")
    # Add your logging logic here

log_usage(response.model, {
    "prompt_tokens": usage.prompt_tokens,
    "completion_tokens": usage.completion_tokens,
    "total_tokens": usage.total_tokens
})
Error Handling
Implement robust error handling for production deployments:
import os
from openai import OpenAI, APIError, APITimeoutError, RateLimitError

client = OpenAI(
    api_key=os.getenv("MORPHEUS_API_KEY"),
    base_url="https://api.mor.org/api/v1",
    max_retries=2
)

def safe_chat_completion(messages: list, model: str = "llama-3.3-70b") -> str:
    """
    Make a chat completion with comprehensive error handling.

    Args:
        messages: List of message dictionaries
        model: Model to use

    Returns:
        Response text or error message
    """
    try:
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            timeout=30.0
        )
        return response.choices[0].message.content

    except APITimeoutError:
        return "Request timed out. Please try again."
    except RateLimitError:
        return "Rate limit exceeded. Please wait before making more requests."
    except APIError as e:
        # Not every APIError carries a status code (e.g. connection errors)
        status = getattr(e, "status_code", "unknown")
        print(f"API Error: {status} - {e.message}")
        return f"An API error occurred: {e.message}"
    except Exception as e:
        print(f"Unexpected error: {str(e)}")
        return "An unexpected error occurred. Please try again."

# Use the safe function
messages = [
    {"role": "user", "content": "Tell me about Python decorators"}
]

result = safe_chat_completion(messages)
print(result)
Context Manager Pattern
Use context managers for automatic resource cleanup:
import os
from openai import OpenAI

def process_queries(queries: list):
    """Process multiple queries with automatic cleanup."""
    with OpenAI(
        api_key=os.getenv("MORPHEUS_API_KEY"),
        base_url="https://api.mor.org/api/v1"
    ) as client:
        for query in queries:
            response = client.chat.completions.create(
                model="llama-3.3-70b",
                messages=[{"role": "user", "content": query}]
            )
            print(f"Q: {query}")
            print(f"A: {response.choices[0].message.content}\n")
    # HTTP client is automatically closed here

queries = [
    "What is async/await in Python?",
    "Explain list comprehensions",
    "What are Python decorators?"
]

process_queries(queries)
Troubleshooting
Connection errors or timeouts
Cause: Network issues, firewall restrictions, or server unavailability.
Solution:
Check your internet connection
Verify the base URL is correct: https://api.mor.org/api/v1
Increase timeout values for slower connections
Ensure your firewall allows HTTPS connections
client = OpenAI(
    api_key=os.getenv("MORPHEUS_API_KEY"),
    base_url="https://api.mor.org/api/v1",
    timeout=60.0,   # Increase timeout
    max_retries=3   # Enable retries
)
Authentication errors (401 Unauthorized)
Cause: Invalid or missing API key.
Solution:
Verify your API key is correct
Ensure the API key is properly loaded from environment variables
Check that the key hasn’t been deleted from your Morpheus account
import os

# Debug API key loading
api_key = os.getenv("MORPHEUS_API_KEY")
print(f"API key loaded: {api_key is not None}")
print(f"API key length: {len(api_key) if api_key else 0}")

if not api_key:
    raise ValueError("MORPHEUS_API_KEY environment variable not set")
Tool calls fail or return unexpected results
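Cause: Typically the model emits malformed arguments, requests a function that was never registered, or ignores the tools when their descriptions are too vague.
Solution:
Give every tool a clear description and a precise parameter schema
Parse the tool call arguments defensively before executing anything
Verify the requested function name against your registered functions

A minimal sketch of defensive handling, reusing the available_functions mapping from the tool calling examples above:

import json

for tool_call in (response_message.tool_calls or []):
    name = tool_call.function.name
    # Skip functions the model invented or that were never registered
    if name not in available_functions:
        print(f"Model requested unknown function: {name}")
        continue
    try:
        args = json.loads(tool_call.function.arguments)
    except json.JSONDecodeError as e:
        print(f"Could not parse arguments for {name}: {e}")
        continue
    result = available_functions[name](**args)
    print(f"{name} returned: {result}")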
Streaming stops prematurely
Cause: Network interruption, timeout, or model completion.
Solution:
Check the finish_reason in the response
Implement error handling for streams
Use appropriate timeout values
try:
    stream = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[{"role": "user", "content": "Long task"}],
        stream=True,
        timeout=120.0  # Longer timeout for streaming
    )

    for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

        # Check finish reason
        if chunk.choices[0].finish_reason:
            print(f"\nFinish reason: {chunk.choices[0].finish_reason}")

except Exception as e:
    print(f"Stream error: {str(e)}")
Model not found or unavailable
Cause: Requested model is not available or misspelled.
Solution:
List available models first
Use exact model names including suffixes (:web)
Check model availability in the marketplace
# List available models
models = client.models.list()
available_models = [model.id for model in models.data]
print("Available models:", available_models)

# Verify the model exists before using it
desired_model = "llama-3.3-70b:web"
if desired_model in available_models:
    response = client.chat.completions.create(
        model=desired_model,
        messages=[{"role": "user", "content": "Hello"}]
    )
else:
    print(f"Model {desired_model} not available. Using default.")
Async operations not working
Cause: Incorrect async/await usage or event loop issues.
Solution:
Use AsyncOpenAI instead of OpenAI
Properly await all async operations
Run async functions with asyncio.run()
import asyncio
import os
from openai import AsyncOpenAI

async def correct_async_usage():
    client = AsyncOpenAI(
        api_key=os.getenv("MORPHEUS_API_KEY"),
        base_url="https://api.mor.org/api/v1"
    )

    # Await the response
    response = await client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[{"role": "user", "content": "Hello"}]
    )

    return response.choices[0].message.content

# Run the async function
result = asyncio.run(correct_async_usage())
print(result)
Best Practices
Use environment variables: Always store API keys in environment variables; never hardcode them in your source code.
Implement retry logic: Use the built-in max_retries parameter, or implement custom retry logic for production applications (see the sketch after this list).
Monitor token usage: Track token consumption to understand your application's resource needs and optimize prompts.
Handle errors gracefully: Implement comprehensive error handling to provide a good user experience when API calls fail.
Use async for concurrency: Leverage AsyncOpenAI for applications that need to handle multiple concurrent requests.
Validate tool schemas: Test tool calling implementations thoroughly and provide clear descriptions for reliable function execution.
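For the retry item above, here is a minimal sketch of custom retry logic with exponential backoff; the chat_with_backoff helper is illustrative, not part of the SDK:

import time
from openai import OpenAI, APIError, RateLimitError

def chat_with_backoff(client: OpenAI, messages: list,
                      model: str = "llama-3.3-70b", attempts: int = 4) -> str:
    """Retry transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            response = client.chat.completions.create(model=model, messages=messages)
            return response.choices[0].message.content
        except (RateLimitError, APIError) as e:
            if attempt == attempts - 1:
                raise  # Give up after the final attempt
            wait = 2 ** attempt  # 1s, 2s, 4s, ...
            print(f"Request failed ({type(e).__name__}); retrying in {wait}s")
            time.sleep(wait)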
Example Applications
Command-Line Chat Application
A simple command-line chat interface:
import os
from openai import OpenAI

def main():
    client = OpenAI(
        api_key=os.getenv("MORPHEUS_API_KEY"),
        base_url="https://api.mor.org/api/v1"
    )

    messages = [
        {"role": "system", "content": "You are a helpful assistant."}
    ]

    print("Morpheus Chat (type 'quit' to exit)")
    print("-" * 50)

    while True:
        user_input = input("\nYou: ").strip()

        if user_input.lower() in ['quit', 'exit', 'q']:
            print("Goodbye!")
            break

        if not user_input:
            continue

        messages.append({"role": "user", "content": user_input})

        try:
            stream = client.chat.completions.create(
                model="llama-3.3-70b:web",
                messages=messages,
                stream=True,
                temperature=0.7
            )

            print("\nAssistant: ", end="", flush=True)
            full_response = ""

            for chunk in stream:
                if chunk.choices[0].delta.content:
                    content = chunk.choices[0].delta.content
                    print(content, end="", flush=True)
                    full_response += content

            print()
            messages.append({"role": "assistant", "content": full_response})

        except Exception as e:
            print(f"\nError: {str(e)}")

if __name__ == "__main__":
    main()
Batch Processing Script
Process multiple prompts efficiently:
import os
import asyncio
from openai import AsyncOpenAI
from typing import List, Dict

async def process_batch(prompts: List[str], model: str = "llama-3.3-70b") -> List[Dict]:
    """
    Process multiple prompts concurrently.

    Args:
        prompts: List of user prompts
        model: Model to use

    Returns:
        List of response dictionaries
    """
    client = AsyncOpenAI(
        api_key=os.getenv("MORPHEUS_API_KEY"),
        base_url="https://api.mor.org/api/v1"
    )

    async def process_single(prompt: str) -> Dict:
        try:
            response = await client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                timeout=30.0
            )
            return {
                "prompt": prompt,
                "response": response.choices[0].message.content,
                "tokens": response.usage.total_tokens,
                "success": True
            }
        except Exception as e:
            return {
                "prompt": prompt,
                "error": str(e),
                "success": False
            }

    # Process all prompts concurrently
    tasks = [process_single(prompt) for prompt in prompts]
    results = await asyncio.gather(*tasks)
    return results

# Example usage
async def main():
    prompts = [
        "What is Python?",
        "Explain machine learning",
        "What are REST APIs?",
        "Describe cloud computing",
        "What is Docker?"
    ]

    print(f"Processing {len(prompts)} prompts...")
    results = await process_batch(prompts)

    # Display results
    for i, result in enumerate(results, 1):
        print(f"\n{'=' * 60}")
        print(f"Prompt {i}: {result['prompt']}")
        print(f"{'=' * 60}")

        if result['success']:
            print(f"Response: {result['response']}")
            print(f"Tokens used: {result['tokens']}")
        else:
            print(f"Error: {result['error']}")

if __name__ == "__main__":
    asyncio.run(main())
Summary
You’ve successfully integrated the Morpheus Inference API with OpenAI’s Python SDK! Key takeaways:
OpenAI Compatibility : Morpheus works seamlessly with the official OpenAI Python SDK by using a custom base_url
Flexible Deployment : Use synchronous or asynchronous clients based on your application needs
Streaming Support : Real-time streaming responses work identically to OpenAI’s API
Tool Calling : Define and execute custom functions with JSON schema-based tool definitions
The combination of Morpheus’s free, decentralized AI inference and the OpenAI Python SDK’s robust features enables you to build powerful AI applications without infrastructure costs or vendor lock-in.