MoonDream Troubleshooting
Common issues, solutions, and debugging strategies for the MoonDream Vision service.
Service Startup Issues
"Module 'torch' not found"
Error:
ModuleNotFoundError: No module named 'torch'
Solutions:
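# Install PyTorch into the same environment that runs main.py
# (python -m pip guarantees pip belongs to that interpreter)
# Optionally remove a partial or mismatched install first
python -m pip uninstall torch torchvision torchaudio
# CPU-only build (on macOS the default PyPI wheels are enough: python -m pip install torch torchvision torchaudio)
python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# Or for CUDA 11.8 support (pick the index URL that matches your CUDA driver)
python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118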
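A frequent cause of this error is installing PyTorch into one environment and starting main.py from another. The minimal sketch below (the function name `torch_install_report` is hypothetical, standard library only) prints which interpreter is running and whether it can see `torch`, without importing it:

```python
import importlib.util
import sys


def torch_install_report() -> dict:
    """Report the running interpreter and whether it can import torch.

    find_spec() locates the package without importing it, so the check is
    cheap and does not trigger a broken install's import-time errors.
    """
    spec = importlib.util.find_spec("torch")
    return {
        "python": sys.executable,
        "torch_found": spec is not None,
        "torch_location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in torch_install_report().items():
        print(f"{key}: {value}")
```

Run it with the same command you use to start the service; if `torch_found` is False, install with that interpreter's `python -m pip`.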
"CUDA out of memory"
Error:
RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB
Solutions:
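- Reduce workers:
# moondream/.env
MOONDREAM_WORKERS=1
- Use CPU mode (hides all GPUs from PyTorch; slower, but not limited by GPU memory):
export CUDA_VISIBLE_DEVICES=""
python main.py
- Free GPU memory held by other processes:
# List processes using the GPU, then stop the ones you don't need
nvidia-smi
kill -9 <process_id>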
"MPS not available on Intel Macs"
Error:
RuntimeError: MPS is not available
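Solution: MoonDream will automatically fall back to CPU. For better performance, consider:
- Running on Apple Silicon (M1/M2/M3), where MPS is available
- Matching PyTorch's CPU thread count to your physical cores (torch.set_num_threads())
- Reducing model precision (benchmark first; float16 is often slower than float32 on CPU)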
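The CUDA, then MPS, then CPU fallback described above can be sketched as a small pure function. This is an illustration under the assumption that the service picks devices in that order; `choose_device` is a hypothetical name, not the service's actual code. Taking the availability flags as arguments keeps it testable on machines without any GPU:

```python
def choose_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred inference device: CUDA, then Apple MPS, then CPU.

    Passing availability in as booleans (instead of querying torch here)
    lets the fallback order be tested without PyTorch or a GPU.
    """
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```

With PyTorch installed, call it as `choose_device(torch.cuda.is_available(), torch.backends.mps.is_available())`; on an Intel Mac without MPS support this returns `"cpu"` instead of raising.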
Port Already in Use
Error:
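OSError: [Errno 48] Address already in use
(On Linux the same error is reported as [Errno 98].)
Solutions:
# Find the process using port 20200
lsof -i :20200
# Stop it (try a plain kill first; use kill -9 only if it does not exit)
kill <process_id>
# Or change the port in main.py, e.g.
# uvicorn.run(app, port=20201)  # app = the service's application object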
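If you would rather pick a port programmatically than hard-code 20201, a minimal standard-library sketch could probe for one. `port_is_free` and `find_free_port` are hypothetical helpers; wiring the result into `uvicorn.run` is an assumption about how main.py is structured:

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:  # EADDRINUSE: Errno 48 on macOS, 98 on Linux
            return False
    return True


def find_free_port(start: int = 20200, attempts: int = 20, host: str = "127.0.0.1") -> int:
    """Return the first bindable port in [start, start + attempts)."""
    for port in range(start, min(start + attempts, 65536)):
        if port_is_free(port, host):
            return port
    raise RuntimeError(f"no free port found starting at {start}")
```

The check is advisory: another process can still grab the port between the probe and the server's own bind, so the service should still handle the error at startup.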
Model Loading Issues
"Model download failed"
Error:
OSError: Couldn't reach server
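Solutions:
- Check your internet connection and that huggingface.co is reachable from this machine.
- Use a local model cache:
export HF_HOME=/path/to/cache
# TRANSFORMERS_CACHE is deprecated in recent transformers releases; prefer HF_HOME
export TRANSFORMERS_CACHE=/path/to/cache
- Manual download:
# Fetch the model into the standard cache with the Hugging Face CLI
huggingface-cli download vikhyatk/moondream2 --revision 2025-06-21
# Or clone the repository (requires git-lfs; run `git lfs install` once)
git clone https://huggingface.co/vikhyatk/moondream2
# A plain clone does not match the hub cache layout, so don't copy it into
# ~/.cache/huggingface/hub/; pass its path to from_pretrained instead:
# AutoModelForCausalLM.from_pretrained("/path/to/moondream2", trust_remote_code=True)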
"Revision not found"
Error:
OSError: Revision '2025-06-21' not found
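Solution: Switch to a revision that exists on the Hub:
# List available branches and tags:
# python -c "from huggingface_hub import list_repo_refs; print(list_repo_refs('vikhyatk/moondream2'))"
# In main.py
model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="main",  # latest; pin a dated tag instead if you need reproducible results
    trust_remote_code=True,
    device_map=device_map,  # defined earlier in main.py
)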
Slow Model Loading
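Issue: Model takes more than 5 minutes to load
Solutions:
- Use faster storage:
# Keep the model cache on an SSD
export HF_HOME=/ssd/cache
- Pre-download the model (e.g. at deploy time) so startup only reads from the local cache:
from transformers import AutoModelForCausalLM
# Downloads into the cache; use the same revision main.py loads
AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="2025-06-21",
    trust_remote_code=True,
)
- Split the model across GPU and CPU memory (requires the accelerate package and a model that supports device_map="auto"). This helps mainly when GPU memory is the bottleneck; offloading to CPU can slow inference:
model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="2025-06-21",
    trust_remote_code=True,
    device_map="auto",  # let accelerate place layers across devices
    max_memory={0: "4GB", "cpu": "8GB"},  # per-device memory caps
)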
API Request Issues
422 Validation Errors
Error:
{
"detail": [
{
"loc": ["body", "obj"],
"msg": "field required",
"type": "value_error.missing"
}
]
}
Solutions:
- Check required parameters:
  - /v1/query requires question and init_image
  - /v1/detect requires obj and init_image
  - /v1/point requires obj and init_image
  - /v1/caption requires init_image
- Verify parameter names:
# Correct
-F "obj=login button" \
-F "question=What do you see?"
# Incorrect
-F "object=login button" \
-F "q=What do you see?"
- Check the image file:
# Ensure the image file exists and is readable
ls -la screenshot.png
file screenshot.png
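The parameter rules above can also be checked on the client before a request is sent, which turns a 422 round trip into an immediate, readable message. A minimal sketch: the endpoint table comes from this guide, while `check_request` and the typo map are hypothetical helpers covering only the misspellings listed here:

```python
from typing import Dict, List, Set

# Required multipart fields per endpoint, as listed in this guide.
REQUIRED_FIELDS: Dict[str, Set[str]] = {
    "/v1/query": {"question", "init_image"},
    "/v1/detect": {"obj", "init_image"},
    "/v1/point": {"obj", "init_image"},
    "/v1/caption": {"init_image"},
}

# Misspellings from the "Incorrect" examples, mapped to the expected name.
KNOWN_TYPOS: Dict[str, str] = {"object": "obj", "q": "question"}


def check_request(endpoint: str, fields: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    if endpoint not in REQUIRED_FIELDS:
        return [f"unknown endpoint {endpoint!r}"]
    problems = []
    for name in sorted(fields):
        if name in KNOWN_TYPOS:
            problems.append(f"rename {name!r} to {KNOWN_TYPOS[name]!r}")
    for name in sorted(REQUIRED_FIELDS[endpoint] - fields):
        problems.append(f"missing required field {name!r}")
    return problems
```

For example, `check_request("/v1/detect", {"object", "init_image"})` reports both the rename and the missing `obj` field, matching the 422 response shown above.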
500 Internal Server Errors
Error:
{
"detail": "Model inference error: ..."
}
Solutions:
- Check model loading:
curl http://localhost:20200/
# Should show successful model loading
- Monitor worker status:
curl http://localhost:20200/health
# Check model_stats for errors
- Restart the service:
pkill -f "python main.py"
sleep 2
cd moondream && python main.py
Image Processing Errors
Issue: API returns errors related to image processing
Common causes:
- Unsupported image format:
# Check image format
file image.png
# Should be PNG, JPEG, WebP, BMP, or TIFF
- Corrupted image file:
# Test image integrity (ImageMagick)
convert image.png -resize 1x1 /dev/null 2>&1 || echo "Corrupted"
- Image too large:
# Check file size
ls -lh image.png
# Should be under 10MB
Solutions:
# Downscale large images (">" only shrinks, never enlarges; aspect ratio is preserved)
convert input.png -resize '1920x1080>' output.png
# Convert an unsupported format to PNG (e.g. the first frame of a GIF)
convert 'input.gif[0]' output.png
# Reduce file size by re-encoding as JPEG
convert input.png -quality 85 output.jpg
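The format and size checks above can be sketched as a client-side preflight using only the standard library. This reads the file's leading "magic" bytes rather than trusting the extension; it assumes the 10MB guideline means 10 MiB, and `preflight`/`sniff_format` are hypothetical names. It does not detect truncated or corrupted image data, which still needs a decode test such as the ImageMagick command:

```python
from pathlib import Path
from typing import List, Optional

MAX_BYTES = 10 * 1024 * 1024  # the 10MB guideline, read as 10 MiB (assumption)

# Leading bytes of the formats this guide lists as supported (WebP handled separately).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"BM", "BMP"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
]


def sniff_format(header: bytes) -> Optional[str]:
    """Identify the image format from its first 12 bytes, or return None."""
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "WEBP"
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def preflight(path: str) -> List[str]:
    """Return problems that would likely make the API reject this image."""
    p = Path(path)
    if not p.is_file():
        return [f"{path}: not found"]
    problems = []
    size = p.stat().st_size
    if size > MAX_BYTES:
        problems.append(f"{path}: {size / 1024 / 1024:.1f} MiB exceeds the 10MB limit")
    with p.open("rb") as fh:
        header = fh.read(12)
    if sniff_format(header) is None:
        problems.append(f"{path}: not PNG/JPEG/WebP/BMP/TIFF")
    return problems
```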
Performance Issues
Slow Response Times
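Issue: Requests taking more than 5 seconds
Diagnosis:
# Check MoonDream logs
tail -f moondream.log
# Monitor system resources (Linux; pgrep -d, joins multiple PIDs with commas, as top -p expects)
top -p $(pgrep -d, -f "python main.py")
nvidia-smi # For GPU systems
Solutions:
- Scale workers (in moondream/.env; each extra worker needs more memory):
MOONDREAM_WORKERS=2 # Increase from 1
- Optimize image size (resizeImage stands for your client's own resize helper):
const optimizedImage = await resizeImage(screenshot, 1024);
- Enable caching (getCachedCoordinates stands for your client's own cache lookup):
// Use coordinate cache for repeated elements
const cached = await getCachedCoordinates(flowId, deviceId, element);
if (cached) return cached;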
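The coordinate cache idea can be sketched in Python as a small time-limited cache keyed by `(flow_id, device_id, element)`. `TTLCache` is a hypothetical helper, not part of MoonDream; the expiry is there because UI layouts change, so stale coordinates should eventually force a fresh detection:

```python
import time
from typing import Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """Tiny time-based cache for detection results such as element coordinates."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self._store: Dict[Hashable, Tuple[float, object]] = {}

    def get(self, key: Hashable) -> Optional[object]:
        """Return the cached value, or None if it is missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at > self.ttl:
            del self._store[key]  # expired: force a fresh detection
            return None
        return value

    def put(self, key: Hashable, value: object) -> None:
        self._store[key] = (self.clock(), value)
```

Usage: look up `cache.get((flow_id, device_id, element))` before calling `/v1/point`, and `put` the returned coordinates after a successful call. The TTL is a trade-off between fewer model calls and the risk of clicking where an element used to be.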
High Memory Usage
Issue: Service consuming too much RAM
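Monitoring:
# Check memory usage of the service process(es) (RSS in KB)
ps -o pid,rss,%mem,command -p $(pgrep -d, -f "python main.py")
# Monitor over time ("[m]ain.py" keeps grep from matching itself)
watch -n 5 'ps aux | grep "[m]ain.py"'
Solutions:
- Reduce workers:
MOONDREAM_WORKERS=1
- Limit GPU memory fragmentation (affects GPU memory only, not system RAM; set it before PyTorch makes its first CUDA allocation, safest at the very top of main.py):
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"
import torch  # import after setting the variable
- Use smaller batch sizes (if main.py exposes a batch-size setting):
MAX_BATCH_SIZE = 1 # Reduce from default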
GPU Memory Issues
Issue: CUDA out of memory during inference
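Solutions:
- Reduce batch size:
BATCH_SIZE = 1
- Run inference without autograd. Gradient checkpointing (model.gradient_checkpointing_enable()) only saves memory during training; for inference, wrap model calls so no activations are kept for a backward pass:
with torch.inference_mode():
    answer = model.query(image, "What do you see?")["answer"]  # moondream2 query API
- Clear the cache between requests (returns cached, unused blocks to the driver; memory still referenced by live tensors is not freed):
torch.cuda.empty_cache()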
Network and Connectivity Issues
Connection Refused
Error:
ConnectionError: Connection refused
Solutions:
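- Check if the service is running:
curl http://localhost:20200/
# Should return service status
- Verify port configuration:
# Linux
netstat -tlnp | grep :20200
# macOS (or Linux without netstat)
lsof -iTCP:20200 -sTCP:LISTEN
# Should show a Python process listening on port 20200
- Check if the port is taken by another process:
lsof -i :20200
# Stop any conflicting processes
- Test local connectivity:
# Test if the service responds locally
curl http://127.0.0.1:20200/
curl http://localhost:20200/
# If 127.0.0.1 works but localhost does not, localhost may resolve to IPv6 (::1)
# while the service listens on IPv4 only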
Timeout Errors
Issue: Requests timing out
Solutions:
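- Check service load:
curl http://localhost:20200/health
# Check if all workers are responding
- Monitor system resources:
# Check CPU/memory usage (Linux; one batch-mode snapshot)
top -b -n 1 -p $(pgrep -d, -f "python main.py")
# Check GPU usage (if applicable)
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv
- Scale workers:
# Set MOONDREAM_WORKERS=2 in moondream/.env by editing the existing line
# (appending a second line leaves two conflicting values; add the line by hand if it is absent)
sed -i.bak 's/^MOONDREAM_WORKERS=.*/MOONDREAM_WORKERS=2/' moondream/.env
- Restart the service so the new setting takes effect:
pkill -f "python main.py"
sleep 2
cd moondream && python main.py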
Integration Issues
External Application Can't Connect
Error: Connection refused or Service unavailable
Solutions:
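- Verify the service is running:
curl http://localhost:20200/
# Should return status information
- Check network accessibility:
# From the service host
curl http://127.0.0.1:20200/
curl http://localhost:20200/
# From the client machine, using the service host's IP address
curl http://<host_ip>:20200/
# If only the local requests succeed, the service is probably bound to 127.0.0.1.
# To accept remote connections, bind to all interfaces in main.py, e.g.
# uvicorn.run(app, host="0.0.0.0", port=20200)
- Firewall configuration:
# Linux (ufw)
sudo ufw allow 20200/tcp
sudo ufw status
# macOS: check System Settings > Network > Firewall
# (System Preferences > Security & Privacy > Firewall on older versions)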
API Compatibility Issues
Issue: API requests failing due to parameter mismatches
Solutions:
- Verify parameter names:
# Correct parameter names
-F "obj=button" # not "object"
-F "question=What?" # correct
-F "init_image=@file" # correct
- Check the content type:
# Requests must be multipart/form-data; curl's -F sets this header
# (including the required boundary) automatically, so don't set it by hand
curl -X POST "http://localhost:20200/v1/detect" \
-F "obj=button" \
-F "init_image=@image.png"
- Validate the response format:
# All endpoints return JSON; pipe through jq to confirm it parses
curl -s -X POST "http://localhost:20200/v1/point" \
-F "obj=button" \
-F "init_image=@image.png" | jq .
Load Balancing Issues
Issue: Uneven load distribution or worker failures
Solutions:
- Check worker health:
curl http://localhost:20200/health
# Verify all workers have recent last_used timestamps
- Monitor worker statistics:
curl -s http://localhost:20200/ | jq '.load_balancer_stats.model_stats'
# Check request counts and error rates
- Restart problematic workers:
# Restarting the service restarts all workers
pkill -f "python main.py"
sleep 3
cd moondream && python main.py
- Adjust worker count:
# In moondream/.env
MOONDREAM_WORKERS=1 # Reduce for troubleshooting
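The worker checks above can be automated once the stats payload is parsed. This guide only names `model_stats`, `last_used`, request counts and error rates, so the sketch assumes a hypothetical shape: a dict keyed by worker ID, with a Unix-timestamp `last_used` and `requests`/`errors` counters. Adjust the field names to whatever `/health` actually returns; `flag_workers` is a hypothetical helper:

```python
from typing import Dict, List, Mapping


def flag_workers(
    model_stats: Mapping[str, Mapping[str, float]],
    now: float,
    max_idle_s: float = 300.0,
    max_error_rate: float = 0.1,
) -> Dict[str, List[str]]:
    """Return {worker: [reasons]} for workers that look idle or error-prone.

    Assumes each entry has a Unix-timestamp "last_used" plus "requests" and
    "errors" counters (hypothetical field names; adapt to the real payload).
    """
    flagged: Dict[str, List[str]] = {}
    for worker, stats in model_stats.items():
        reasons = []
        last_used = stats.get("last_used")
        if last_used is None or now - float(last_used) > max_idle_s:
            reasons.append("idle")
        requests = stats.get("requests", 0)
        errors = stats.get("errors", 0)
        if requests and errors / requests > max_error_rate:
            reasons.append("errors")
        if reasons:
            flagged[worker] = reasons
    return flagged
```

Pass it the parsed `model_stats` and `time.time()`. "Idle" is only meaningful under steady traffic: with a round-robin balancer every healthy worker should see requests, but during quiet periods all workers will look idle.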
Logging and Debugging
Enable Debug Logging
# In main.py
import logging
logging.basicConfig(
level=logging.DEBUG,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
handlers=[
logging.FileHandler('moondream_debug.log'),
logging.StreamHandler()
]
)
Request Tracing
// Add request IDs for tracing
// crypto.randomUUID() is built into browsers and Node.js 19+; detectObject is your client's detection call
const requestId = crypto.randomUUID();
const startTime = Date.now();
console.log(`[${requestId}] Starting detection for: ${element}`);
const result = await detectObject(screenshot, element);
console.log(`[${requestId}] Detection result:`, {
  found: result.found,
  duration: Date.now() - startTime
});
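For a Python client, the same request-ID tracing can be done with the standard `logging` module. A minimal sketch: `traced_detect` is a hypothetical wrapper, and `detect` stands in for whatever function actually calls the MoonDream API and returns a dict with a `found` key:

```python
import logging
import time
import uuid
from typing import Any, Callable, Dict, Tuple

logger = logging.getLogger("moondream.client")


def traced_detect(
    element: str, detect: Callable[[str], Dict[str, Any]]
) -> Tuple[str, Dict[str, Any]]:
    """Run detect(element), logging start, result and duration under one request ID."""
    request_id = uuid.uuid4().hex[:8]  # short random ID, enough to correlate log lines
    start = time.perf_counter()
    logger.info("[%s] Starting detection for: %s", request_id, element)
    try:
        result = detect(element)
    except Exception:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.exception("[%s] Detection failed after %.0f ms", request_id, elapsed_ms)
        raise
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.info(
        "[%s] Detection result: found=%s duration=%.0f ms",
        request_id,
        result.get("found"),
        elapsed_ms,
    )
    return request_id, result
```

`time.perf_counter()` is used instead of wall-clock time so durations are not skewed by clock adjustments.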
Performance Profiling
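import cProfile
import pstats
# Profile a single request. cProfile.run() evaluates its string in __main__,
# so names defined elsewhere may not resolve; a Profile object avoids that.
# detect_object and image stand for the service's own function and input.
profiler = cProfile.Profile()
profiler.enable()
detect_object(image, "button")
profiler.disable()
profiler.dump_stats("profile_output.prof")
# Analyze results: top 10 entries by cumulative time
p = pstats.Stats("profile_output.prof")
p.sort_stats("cumulative").print_stats(10)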
Advanced Debugging
Model Inspection
import torch
from PIL import Image
# Check model properties
print("Model device:", model.device)
print("Model dtype:", model.dtype)
print("Model parameters:", sum(p.numel() for p in model.parameters()))
# Smoke-test the model on a known-good image
# (moondream2's caption API; test.png is any small local image)
with torch.inference_mode():
    result = model.caption(Image.open("test.png"), length="short")
print("Caption:", result["caption"])
Memory Analysis
import torch
import gc
# Check memory usage
print("GPU memory allocated:", torch.cuda.memory_allocated() / 1024**2, "MB")
print("GPU memory reserved:", torch.cuda.memory_reserved() / 1024**2, "MB")
# Force garbage collection
gc.collect()
torch.cuda.empty_cache()
print("After cleanup:")
print("GPU memory allocated:", torch.cuda.memory_allocated() / 1024**2, "MB")
Network Debugging
# Test network connectivity (-v shows request and response headers)
curl -v http://localhost:20200/v1/point \
-F "obj=test" \
-F "init_image=@test.png"
# Check response times: first create curl-format.txt
cat > curl-format.txt <<'EOF'
time_namelookup: %{time_namelookup}\n
time_connect: %{time_connect}\n
time_appconnect: %{time_appconnect}\n
time_pretransfer: %{time_pretransfer}\n
time_redirect: %{time_redirect}\n
time_starttransfer: %{time_starttransfer}\n
time_total: %{time_total}\n
EOF
# Then pass it to -w
curl -w "@curl-format.txt" -o /dev/null -s http://localhost:20200/
Common Error Patterns
Error: "Model inference error"
Cause: Model processing failed during inference
Solution:
- Check image format:
file image.png # Should be PNG/JPEG/WebP/BMP/TIFF
- Verify image integrity:
convert image.png -resize 1x1 /dev/null 2>&1 || echo "Corrupted"
- Check model health:
curl http://localhost:20200/health
# Look for error counts in model_stats
- Restart the service:
pkill -f "python main.py"
sleep 2
cd moondream && python main.py
Error: "422 Validation Error"
Cause: Missing or invalid request parameters
Solution:
- Check required parameters:
# /v1/query needs: question + init_image
# /v1/detect needs: obj + init_image
# /v1/point needs: obj + init_image
# /v1/caption needs: init_image
- Verify parameter names:
-F "obj=button" # correct
-F "object=button" # incorrect
- Ensure multipart/form-data (curl -F sends it automatically):
curl -X POST "http://localhost:20200/v1/detect" \
-F "obj=button" \
-F "init_image=@image.png"
Error: "413 Payload Too Large"
Cause: Image file exceeds size limits
Solution:
- Check file size:
ls -lh image.png # Should be < 10MB
- Downscale large images (">" only shrinks, never enlarges):
convert input.png -resize '1920x1080>' output.png
- Compress images (re-encode as JPEG):
convert input.png -quality 80 output.jpg
Error: "Connection reset by peer"
Cause: Service crashed or worker died during request
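Solution:
- Check service status:
curl http://localhost:20200/ || echo "Service down"
- Monitor logs:
tail -f moondream.log
- Implement retry logic:
# Retry with exponential backoff (--fail makes HTTP error responses count as failures)
for i in 1 2 4 8; do
  curl --fail http://localhost:20200/v1/detect -F "obj=button" -F "init_image=@img.png" && break
  sleep $i
done
- Scale workers down:
# Temporarily reduce load by editing the existing line (restart the service afterwards);
# "echo ... > .env" would overwrite every other setting in the file
sed -i.bak 's/^MOONDREAM_WORKERS=.*/MOONDREAM_WORKERS=1/' moondream/.env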
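The backoff loop above translates to a small Python helper for clients written in Python. A minimal sketch: `retry` is a hypothetical name, it retries only on connection-level errors by default, and it adds a little random jitter (not in the shell loop) so several clients don't retry in lockstep:

```python
import random
import time
from typing import Callable, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    call: Callable[[], T],
    delays: Sequence[float] = (1, 2, 4, 8),
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying after each delay when it raises one of retry_on.

    Makes len(delays) + 1 attempts in total; the last attempt's exception
    propagates. Each delay gets up to 10% random jitter.
    """
    for delay in delays:
        try:
            return call()
        except retry_on:
            sleep(delay * (1 + random.random() * 0.1))
    return call()  # final attempt: let any exception reach the caller
```

`ConnectionResetError` ("Connection reset by peer") and `ConnectionRefusedError` are subclasses of `ConnectionError`, so they are retried by default. If the request goes through `urllib`, pass `retry_on=(urllib.error.URLError,)`, because urllib wraps socket errors in `URLError`.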
Recovery Procedures
Service Restart
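# Graceful restart: SIGTERM lets uvicorn finish in-flight requests before exiting
kill -TERM $(pgrep -f "python main.py")
sleep 5
cd moondream && python main.py
# Force restart (SIGKILL; use only if the graceful stop hangs)
pkill -9 -f "python main.py"
cd moondream && python main.py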
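If the service is launched from a Python supervisor script, the same "terminate, wait, then kill" sequence can be expressed with `subprocess`. A minimal sketch: `stop_gracefully` is a hypothetical helper, and launching the service with `subprocess.Popen([sys.executable, "main.py"], cwd="moondream")` is an assumption about your setup:

```python
import subprocess


def stop_gracefully(proc: subprocess.Popen, grace_seconds: float = 5.0) -> int:
    """Send SIGTERM, wait up to grace_seconds, then SIGKILL if still running.

    Returns the exit code (on POSIX, a negative signal number such as -15
    for SIGTERM or -9 for SIGKILL).
    """
    if proc.poll() is not None:  # already exited
        return proc.returncode
    proc.terminate()  # SIGTERM on POSIX: allows a clean shutdown
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL: cannot be caught, the equivalent of pkill -9
        return proc.wait()
```

Waiting on the `Popen` object avoids the fixed `sleep 5`: the helper returns as soon as the process exits and only escalates when the grace period runs out.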
Model Recovery
Issue: Model becomes unresponsive or returns errors
Solutions:
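- Check model status:
curl http://localhost:20200/health
# Verify model_stats show recent activity
- Restart model workers:
pkill -f "python main.py"
sleep 3
cd moondream && python main.py
- Clear the GPU cache:
# torch.cuda.empty_cache() only affects the process that calls it, so it must run
# inside the service (e.g. in an error handler); running it from a separate
# Python REPL does not free the service's memory
import torch
torch.cuda.empty_cache()
- Reduce worker count:
# Temporarily reduce load by editing the existing line (restart the service afterwards)
sed -i.bak 's/^MOONDREAM_WORKERS=.*/MOONDREAM_WORKERS=1/' moondream/.env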
Prevention Best Practices
Service Monitoring
Health Check Script:
#!/bin/bash
# moondream_health_check.sh
MOONDREAM_URL="http://localhost:20200"
# Check basic connectivity (--max-time avoids hanging on a stuck service)
if ! curl -s --max-time 10 "$MOONDREAM_URL/" > /dev/null; then
  echo "$(date): MoonDream service is down"
  # Send alert or restart service
  exit 1
fi
# Check model health. Allow optional whitespace around the colon: compact JSON
# (such as FastAPI's default responses) has none
HEALTH=$(curl -s --max-time 10 "$MOONDREAM_URL/health")
if echo "$HEALTH" | grep -Eq '"status" *: *"healthy"'; then
  echo "$(date): MoonDream service healthy"
else
  echo "$(date): MoonDream health check failed: $HEALTH"
  exit 1
fi
Automated Monitoring:
# Add to crontab for regular checks
*/5 * * * * /path/to/moondream_health_check.sh
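Where a Python monitor is preferred over shell, the same check can parse the JSON instead of pattern-matching text, so it does not depend on how the response is spaced. A minimal standard-library sketch; `check_health` is a hypothetical name, and it assumes `/health` returns an object with a top-level `"status"` field, as the script above expects:

```python
import json
import urllib.request
from typing import Tuple


def check_health(base_url: str = "http://localhost:20200", timeout: float = 5.0) -> Tuple[bool, str]:
    """Return (healthy, detail) for the service's /health endpoint."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            body = resp.read().decode("utf-8")
    except OSError as exc:  # URLError/HTTPError and socket timeouts are OSError subclasses
        return False, f"unreachable: {exc}"
    try:
        status = json.loads(body).get("status")
    except (ValueError, AttributeError):  # invalid JSON, or JSON that isn't an object
        return False, f"unexpected body: {body[:200]}"
    return status == "healthy", f"status={status!r}"
```

A cron wrapper can call it and exit non-zero when the first element is False, mirroring the shell script's exit codes.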
Resource Monitoring
System Resource Checks:
#!/bin/bash
# monitor_resources.sh (Linux: uses GNU ps options)
# Check memory usage: sum of %MEM across processes named python or python3
MEM_USAGE=$(ps --no-headers -o pmem -C python,python3 | awk '{sum+=$1} END {print sum+0}')
# Compare with awk to avoid depending on bc
if awk -v m="$MEM_USAGE" 'BEGIN {exit !(m > 80)}'; then
  echo "High memory usage: ${MEM_USAGE}%"
fi
# Check GPU usage (if applicable); with several GPUs, report the busiest one
if command -v nvidia-smi &> /dev/null; then
  GPU_USAGE=$(nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits | sort -n | tail -1)
  if [ "$GPU_USAGE" -gt 90 ]; then
    echo "High GPU usage: ${GPU_USAGE}%"
  fi
fi
# Check disk space for model cache (-P keeps each filesystem on one line)
CACHE_DIR="${HF_HOME:-$HOME/.cache/huggingface}"
DISK_USAGE=$(df -P "$CACHE_DIR" | tail -1 | awk '{print $5}' | sed 's/%//')
if [ "$DISK_USAGE" -gt 90 ]; then
  echo "Low disk space: ${DISK_USAGE}% used"
fi
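The disk-space part of the script can be sketched portably in Python with `shutil.disk_usage`, which also works on macOS. `cache_dir` and `disk_used_percent` are hypothetical helpers; the cache path logic mirrors the script's `${HF_HOME:-$HOME/.cache/huggingface}`:

```python
import os
import shutil
from pathlib import Path


def cache_dir() -> Path:
    """$HF_HOME if set and non-empty, else ~/.cache/huggingface."""
    return Path(os.environ.get("HF_HOME") or Path.home() / ".cache" / "huggingface")


def disk_used_percent(path: Path) -> float:
    """Percent used on the filesystem holding `path`, computed the way df does."""
    # Walk up to the nearest existing directory so a missing cache dir doesn't raise.
    probe = path
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent
    usage = shutil.disk_usage(probe)
    # df reports used / (used + available); dividing by total would hide reserved blocks.
    return 100.0 * usage.used / (usage.used + usage.free)
```

Alert when `disk_used_percent(cache_dir())` exceeds 90, matching the shell threshold.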
Log Management
Log Rotation Setup:
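# /etc/logrotate.d/moondream
/var/log/moondream/*.log {
    daily
    rotate 30
    compress
    missingok
    notifempty
    # Python's logging.FileHandler keeps writing to its open file and does not
    # reopen it on a signal, so copy and truncate in place instead of moving it
    # (a few lines written during the copy may be lost)
    copytruncate
}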
Backup and Recovery
Model Cache Backup:
#!/bin/bash
# backup_model_cache.sh
CACHE_DIR="${HF_HOME:-$HOME/.cache/huggingface}"
BACKUP_DIR="/backup/moondream"
# Create backup
rsync -av --delete "$CACHE_DIR/" "$BACKUP_DIR/"
# Verify backup integrity
if [ $? -eq 0 ]; then
echo "Model cache backup completed successfully"
else
echo "Model cache backup failed"
exit 1
fi
This comprehensive troubleshooting guide covers the most common issues you'll encounter with MoonDream Vision. Regular monitoring, proper logging, and following the best practices outlined here will help maintain a stable and performant vision service. 🔧