MoonDream Troubleshooting

Common issues, solutions, and debugging strategies for the MoonDream Vision service.

Service Startup Issues

"Module 'torch' not found"

Error:

ModuleNotFoundError: No module named 'torch'

Solutions:

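# Reinstall PyTorch with CPU-only wheels (use the same Python that runs main.py)
python -m pip uninstall -y torch torchvision torchaudio
python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# Or, for NVIDIA GPUs, the CUDA 11.8 build
python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# On macOS, the default PyPI wheels already include MPS support
python -m pip install torch torchvision torchaudio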

"CUDA out of memory"

Error:

RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB

Solutions:

Reduce workers:
```
# moondream/.env
MOONDREAM_WORKERS=1
```

Use CPU mode:

export CUDA_VISIBLE_DEVICES=""
python main.py

Clear GPU memory:

# Kill other GPU processes
nvidia-smi
kill -9 <process_id>

"MPS not available on Intel Macs"

Error:

RuntimeError: MPS is not available

Solution: MoonDream automatically falls back to CPU. For better performance, consider:

- Upgrading to Apple Silicon (M1/M2/M3)
- Tuning CPU threading (for example, torch.set_num_threads() set to the number of physical cores)
- Reducing model precision
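A minimal sketch of a CUDA, then MPS, then CPU fallback, assuming the service picks its device roughly this way. This guide does not show main.py's actual device logic, and the name pick_device is hypothetical:

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu", preferring an accelerator when one is usable."""
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is no accelerator to use.
        return "cpu"
    # With CUDA_VISIBLE_DEVICES="" no GPU is visible, so this check fails and CUDA is skipped.
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps exists on PyTorch 1.12+; is_available() is False without a usable Metal GPU.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print("Selected device:", pick_device())
```

Running it after export CUDA_VISIBLE_DEVICES="" is a quick way to confirm that the CUDA path is skipped on a GPU host.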

Port Already in Use

Error (Errno 48 on macOS; Linux reports Errno 98):

OSError: [Errno 48] Address already in use

Solutions:

# Find process using port 20200
lsof -i :20200

# Kill the process
kill -9 <process_id>

# Or change port in main.py
# Modify uvicorn.run(port=20201)
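Before editing main.py, it can help to confirm from Python whether 20200 is really taken and which nearby port is free. A stdlib-only sketch; port_in_use and first_free_port are hypothetical helper names:

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns an error code instead of raising; 0 means a listener answered.
        return sock.connect_ex((host, port)) == 0


def first_free_port(start: int = 20200, attempts: int = 20, host: str = "127.0.0.1") -> int:
    """Scan upward from `start` and return the first port with no listener."""
    end = min(start + attempts, 65536)  # valid TCP ports stop at 65535
    for port in range(start, end):
        if not port_in_use(port, host):
            return port
    raise RuntimeError(f"no free port in {start}-{end - 1}")


print("20200 in use:", port_in_use(20200))
print("first free port from 20200:", first_free_port())
```

A port reported free can still be claimed by another process before uvicorn binds it, so treat the result as a hint rather than a reservation.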

Model Loading Issues

"Model download failed"

Error:

OSError: Couldn't reach server

Solutions:

Check internet connection

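Use a local model cache:

export HF_HOME=/path/to/cache
export TRANSFORMERS_CACHE=/path/to/cache  # deprecated in recent transformers releases in favor of HF_HOME
export HF_HUB_OFFLINE=1                   # once the model is cached, load it without contacting the Hub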

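Manual download:

# Option 1: fill the Hugging Face cache with the official CLI (ships with huggingface_hub)
huggingface-cli download vikhyatk/moondream2 --revision 2025-06-21

# Option 2: clone the repository (requires git-lfs) and load it from its local path
git lfs install
git clone https://huggingface.co/vikhyatk/moondream2
# Pass the clone's path to from_pretrained() instead of "vikhyatk/moondream2";
# copying it into ~/.cache/huggingface/hub/ does not work because the cache uses its own layout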

"Revision not found"

Error:

OSError: Revision '2025-06-21' not found

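Solution: Switch to a revision that exists. The model's "Files and versions" page on Hugging Face lists the available branches, and main is the default branch. Because trust_remote_code=True runs the repository's Python code, tracking main picks up code changes without review; pin a dated revision again once you have found one that exists.

# In main.py
from transformers import AutoModelForCausalLM

device_map = {"": "cpu"}  # placeholder: keep the device_map main.py already computes, e.g. {"": "cuda"}

model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="main",  # Use 'main' instead of a specific date
    trust_remote_code=True,
    device_map=device_map,
)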
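To see which revisions actually exist before pinning one, the huggingface_hub library's list_repo_refs returns a repository's branches and tags. The sketch below pairs it with a fallback rule that is an assumption of this example, not MoonDream's own policy: keep the preferred revision if present, otherwise take the newest date-named one, otherwise main. The helper names are hypothetical:

```python
import re
from typing import Iterable, List

# Revisions in this guide are date-named ("2025-06-21"); ISO dates sort correctly as strings.
DATE_NAME = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def choose_revision(available: Iterable[str], preferred: str) -> str:
    """Keep `preferred` if it exists, else the newest dated revision, else "main"."""
    names = set(available)
    if preferred in names:
        return preferred
    dated = sorted(name for name in names if DATE_NAME.match(name))
    return dated[-1] if dated else "main"


def list_revisions(repo_id: str = "vikhyatk/moondream2") -> List[str]:
    """Branch and tag names on the Hub (network call; needs huggingface_hub)."""
    from huggingface_hub import list_repo_refs  # lazy import keeps choose_revision usable offline

    refs = list_repo_refs(repo_id)
    return [ref.name for ref in refs.branches] + [ref.name for ref in refs.tags]
```

For example, choose_revision(list_revisions(), preferred="2025-06-21") returns the pinned date when it exists, and the result can be passed as revision= to from_pretrained().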

Slow Model Loading

Issue: Model takes >5 minutes to load

Solutions:

Use faster storage:

# Keep the model cache on an SSD (move the existing cache there first, or the model is downloaded again)
export HF_HOME=/ssd/cache

Pre-download model:

from transformers import AutoModelForCausalLM

# Pre-download (and cache) the same revision main.py loads
AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="2025-06-21",
    trust_remote_code=True,  # MoonDream ships custom model code
)

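Shard the model across GPU and CPU memory (needs the accelerate package; whether device_map="auto" works depends on the model's custom code):

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="2025-06-21",
    trust_remote_code=True,
    device_map="auto",  # fill GPU 0 up to its limit, then place remaining layers on CPU
    max_memory={0: "4GB", "cpu": "8GB"},
)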

API Request Issues

422 Validation Errors

Error:

{
  "detail": [
    {
      "loc": ["body", "obj"],
      "msg": "field required",
      "type": "value_error.missing"
    }
  ]
}

Solutions:

Check required parameters:
- /v1/query requires question and init_image
- /v1/detect requires obj and init_image
- /v1/point requires obj and init_image
- /v1/caption requires init_image

Verify parameter names:

# Correct
-F "obj=login button" \
-F "question=What do you see?"

# Incorrect
-F "object=login button" \
-F "q=What do you see?"

Check image file:

# Ensure image file exists and is readable
ls -la screenshot.png
file screenshot.png
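The required fields listed above can also be checked client-side before a request is sent, which turns a 422 into a clear local message. A minimal sketch; the table mirrors this guide, the alias map covers only the misspellings shown above, and check_fields is a hypothetical name:

```python
from typing import Dict, FrozenSet, Iterable, List

# Required multipart fields per endpoint, as listed in this guide.
REQUIRED_FIELDS: Dict[str, FrozenSet[str]] = {
    "/v1/query": frozenset({"question", "init_image"}),
    "/v1/detect": frozenset({"obj", "init_image"}),
    "/v1/point": frozenset({"obj", "init_image"}),
    "/v1/caption": frozenset({"init_image"}),
}

# Misspellings shown in this guide, mapped to the field name the API expects.
ALIASES: Dict[str, str] = {"object": "obj", "q": "question"}


def check_fields(endpoint: str, fields: Iterable[str]) -> List[str]:
    """Return problems with a planned request; an empty list means it looks valid."""
    if endpoint not in REQUIRED_FIELDS:
        return [f"unknown endpoint {endpoint}"]
    sent = set(fields)
    problems = [f"rename '{name}' to '{ALIASES[name]}'" for name in sorted(sent & set(ALIASES))]
    problems += [f"missing required field '{name}'" for name in sorted(REQUIRED_FIELDS[endpoint] - sent)]
    return problems


print(check_fields("/v1/detect", ["object", "init_image"]))
```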

500 Internal Server Errors

Error:

{
  "detail": "Model inference error: ..."
}

Solutions:

Check model loading:

curl http://localhost:20200/
# Should show successful model loading

Monitor worker status:

curl http://localhost:20200/health
# Check model_stats for errors

Restart service:

pkill -f "python main.py"
sleep 2
cd moondream && python main.py

Image Processing Errors

Issue: API returns errors related to image processing

Common causes:

Unsupported image format:

# Check image format
file image.png
# Should be: PNG, JPEG, WebP, BMP, or TIFF

Corrupted image file:

# Test image integrity
identify -regard-warnings image.png > /dev/null 2>&1 || echo "Corrupted"

Image too large:

# Check file size
ls -lh image.png
# Should be under 10MB

Solutions:

# Resize large images ('>' only shrinks images bigger than 1920x1080 and keeps the aspect ratio)
convert input.png -resize '1920x1080>' output.png

# Convert to supported format
convert input.bmp output.png

# Optimize file size
convert input.png -quality 85 output.jpg
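For clients written in Python, the format and size checks above can run before upload with the standard library alone. This sketch sniffs file signatures (magic bytes) the way file does; it cannot detect truncated or corrupted pixel data, which still needs a decoder such as ImageMagick. The 10MB default mirrors this guide, and sniff_format and check_image are hypothetical names:

```python
import os
from typing import List, Optional

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # the 10MB guideline used in this guide


def sniff_format(header: bytes) -> Optional[str]:
    """Identify PNG, JPEG, WebP, BMP, or TIFF from a file's first 12 bytes."""
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    if header.startswith(b"\xff\xd8\xff"):
        return "JPEG"
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "WebP"
    if header.startswith(b"BM"):
        return "BMP"
    if header[:4] in (b"II*\x00", b"MM\x00*"):
        return "TIFF"
    return None


def check_image(path: str, max_bytes: int = MAX_UPLOAD_BYTES) -> List[str]:
    """Return problems found by cheap pre-upload checks; empty means the file passed."""
    problems = []
    size = os.path.getsize(path)
    if size > max_bytes:
        problems.append(f"file is {size} bytes; limit is {max_bytes} bytes")
    with open(path, "rb") as fh:
        if sniff_format(fh.read(12)) is None:
            problems.append("not a PNG, JPEG, WebP, BMP, or TIFF file")
    return problems
```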

Performance Issues

Slow Response Times

Issue: Requests taking >5 seconds

Diagnosis:

# Check MoonDream logs
tail -f moondream.log

# Monitor system resources
top -p "$(pgrep -d, -f 'python main.py')"  # Linux; on macOS: top -pid <process_id>
nvidia-smi  # For GPU systems

Solutions:

Scale workers (each additional worker adds memory use, so check headroom first):
```
MOONDREAM_WORKERS=2  # Increase from 1
```

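Optimize image size:

// resizeImage(image, maxSide) is a hypothetical helper in your client code
const optimizedImage = await resizeImage(screenshot, 1024);

Enable caching:

// Use coordinate cache for repeated elements (getCachedCoordinates and detectObject are your app's own helpers)
async function locateElement(flowId, deviceId, element, screenshot) {
  const cached = await getCachedCoordinates(flowId, deviceId, element);
  if (cached) return cached;
  return detectObject(screenshot, element); // cache miss: ask MoonDream
}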
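A Python take on the caching idea, under different assumptions than the snippet above: this hypothetical CoordinateCache keys entries on a SHA-256 hash of the screenshot bytes plus the element description, rather than on flow and device IDs, and evicts the least recently used entry when full. It only hits for byte-identical screenshots, so it suits repeated captures of a static screen:

```python
import hashlib
from collections import OrderedDict
from typing import Optional, Tuple

Point = Tuple[float, float]


class CoordinateCache:
    """Small LRU cache of element coordinates keyed by (screenshot hash, element)."""

    def __init__(self, max_entries: int = 512) -> None:
        self.max_entries = max_entries
        self._data: "OrderedDict[Tuple[str, str], Point]" = OrderedDict()

    @staticmethod
    def _key(image_bytes: bytes, element: str) -> Tuple[str, str]:
        return hashlib.sha256(image_bytes).hexdigest(), element

    def get(self, image_bytes: bytes, element: str) -> Optional[Point]:
        key = self._key(image_bytes, element)
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, image_bytes: bytes, element: str, point: Point) -> None:
        key = self._key(image_bytes, element)
        self._data[key] = point
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry
```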

High Memory Usage

Issue: Service consuming too much RAM

Monitoring:

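# Check memory usage (RSS in KB and %MEM) of the service process
ps -o pid,rss,%mem,command -p "$(pgrep -d, -f 'python main.py')"

# Monitor over time ("[p]ython" keeps grep from matching itself)
watch -n 5 'ps aux | grep "[p]ython"'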

Solutions:

Reduce workers:
```
MOONDREAM_WORKERS=1
```

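Limit CUDA allocator fragmentation (affects GPU memory, not system RAM; set it before torch touches the GPU, ideally before importing torch):

import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"
import torch  # imported after the setting so the CUDA allocator picks it up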

Use smaller batch sizes (if your deployment batches requests; the setting name depends on your main.py):

MAX_BATCH_SIZE = 1  # Reduce from default

GPU Memory Issues

Issue: CUDA out of memory during inference

Solutions:

Reduce batch size:
```
BATCH_SIZE = 1
```
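Run inference without gradient tracking (gradient checkpointing only saves memory during training backward passes, so it does not help inference):
```
with torch.inference_mode():
    ...  # existing inference call
```
Clear cached, unused GPU memory between requests (memory still referenced by live tensors is not released):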
```
torch.cuda.empty_cache()
```
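When out-of-memory errors are intermittent, one option is to free cached memory and retry the failing call once instead of failing the request. A sketch, assuming inference errors surface as RuntimeError with "out of memory" in the message (PyTorch's torch.cuda.OutOfMemoryError subclasses RuntimeError); run_with_oom_retry and free_cuda_cache are hypothetical names:

```python
import gc
from typing import Callable, TypeVar

T = TypeVar("T")


def free_cuda_cache() -> None:
    """Collect unreferenced objects, then return cached CUDA blocks to the driver."""
    gc.collect()
    try:
        import torch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def run_with_oom_retry(fn: Callable[[], T], retries: int = 1,
                       cleanup: Callable[[], None] = free_cuda_cache) -> T:
    """Call fn(); after an out-of-memory error, clean up and retry up to `retries` times."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except RuntimeError as exc:  # torch.cuda.OutOfMemoryError subclasses RuntimeError
            if "out of memory" not in str(exc).lower() or attempt == retries:
                raise
        # Clean up after leaving the except block, so the traceback (and the
        # tensors its frames reference) has already been released.
        cleanup()
    raise AssertionError("unreachable")
```

Wrap the existing inference call in a lambda to use it. If the retry also fails, the request genuinely needs more memory than is free, and reducing workers or switching to CPU mode applies.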

Network and Connectivity Issues

Connection Refused

Error:

ConnectionError: Connection refused

Solutions:

Check if service is running:

curl http://localhost:20200/
# Should return service status

Verify port configuration:

netstat -tlnp | grep :20200  # Linux; run with sudo to see process names
# Should show a Python process listening on port 20200
# macOS: lsof -nP -iTCP:20200 -sTCP:LISTEN

Check if port is in use:

lsof -i :20200
# Kill any conflicting processes

Test local connectivity:

# Test if service responds locally
curl http://127.0.0.1:20200/
curl http://localhost:20200/
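The same reachability checks can be scripted from Python with only the standard library, for example in an integration test that waits for the service. A sketch that returns the decoded JSON from / or /health, or None when the service does not answer; probe is a hypothetical name, and proxy environment variables are bypassed because the service is local:

```python
import http.client
import json
import urllib.request
from typing import Any, Optional

# Ignore HTTP(S)_PROXY settings: the service runs locally.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url: str, timeout: float = 3.0) -> Optional[Any]:
    """GET `url` and return its decoded JSON body, or None if there is no usable answer."""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, http.client.HTTPException, ValueError):
        # OSError covers URLError/HTTPError, refused connections, and timeouts;
        # ValueError covers a reply that is not valid JSON.
        return None
```

For example, probe("http://localhost:20200/health") returns the health document when the service is up and None when the connection is refused.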

Timeout Errors

Issue: Requests timing out

Solutions:

Check service load:

curl http://localhost:20200/health
# Check if all workers are responding

Monitor system resources:

# Check CPU/memory usage
top -p "$(pgrep -d, -f 'python main.py')" -n 1  # Linux; on macOS: top -pid <process_id> -l 1

# Check GPU usage (if applicable)
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv

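Scale workers (only helps when the resource checks above show spare CPU/GPU capacity):

# Set MOONDREAM_WORKERS=2 in moondream/.env without appending a duplicate key, then restart
grep -q '^MOONDREAM_WORKERS=' moondream/.env \
  && sed -i.bak 's/^MOONDREAM_WORKERS=.*/MOONDREAM_WORKERS=2/' moondream/.env \
  || echo "MOONDREAM_WORKERS=2" >> moondream/.env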

Restart service:

pkill -f "python main.py"
sleep 2
cd moondream && python main.py

Integration Issues

External Application Can't Connect

Error: Connection refused or Service unavailable

Solutions:

Verify service is running:

curl http://localhost:20200/
# Should return status information

Check network accessibility:

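# Test from this machine
curl http://127.0.0.1:20200/
curl http://localhost:20200/
# Test from the machine running the external application (replace <host-ip>);
# this only works if the service binds to 0.0.0.0 rather than 127.0.0.1
curl http://<host-ip>:20200/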

Firewall configuration:

# Linux (ufw)
sudo ufw allow 20200/tcp
sudo ufw status

# macOS: System Settings > Network > Firewall (System Preferences > Security & Privacy > Firewall on older versions)
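External clients on other machines or containers can only connect if uvicorn listens on a non-loopback address. This guide does not show how main.py calls uvicorn.run, so the sketch below is hypothetical: it reads the bind address from MOONDREAM_HOST and MOONDREAM_PORT (variable names invented for this example), defaulting to loopback on port 20200:

```python
import os
from typing import Mapping, Tuple


def server_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Read the bind address from hypothetical MOONDREAM_HOST / MOONDREAM_PORT variables."""
    host = env.get("MOONDREAM_HOST", "127.0.0.1")  # loopback: reachable from this machine only
    port = int(env.get("MOONDREAM_PORT", "20200"))
    return host, port


def serve(app: object) -> None:
    """Start uvicorn on the configured address (uvicorn is imported lazily)."""
    import uvicorn

    host, port = server_settings()
    uvicorn.run(app, host=host, port=port)
```

With main.py calling serve(app), MOONDREAM_HOST=0.0.0.0 python main.py would listen on all interfaces. That exposes the API to the whole network, so pair it with the firewall rules above.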

API Compatibility Issues

Issue: API requests failing due to parameter mismatches

Solutions:

Verify parameter names:

# Correct parameter names
-F "obj=button"          # not "object"
-F "question=What?"      # correct
-F "init_image=@file"    # correct

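Check content type:

# curl -F already sends multipart/form-data with the boundary parameter, so no Content-Type header is needed.
# In HTTP libraries such as fetch or requests, don't set it by hand either: a manual header lacks the boundary.
curl -X POST "http://localhost:20200/v1/detect" \
  -F "obj=button" \
  -F "init_image=@image.png"

Validate response format:

# All endpoints return JSON; a bare GET sends no form data, so POST a real request and let jq parse the reply
curl -s http://localhost:20200/v1/point \
  -F "obj=button" \
  -F "init_image=@image.png" | jq .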
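For Python clients without the requests package, the multipart body that curl -F builds can be assembled with the standard library. A sketch; encode_multipart and post_form are hypothetical names, field names and filenames are not escaped, and the returned Content-Type already carries the boundary that a hand-written multipart/form-data header would be missing:

```python
import json
import mimetypes
import urllib.request
import uuid
from typing import Any, Dict, Tuple


def encode_multipart(fields: Dict[str, str],
                     files: Dict[str, Tuple[str, bytes]]) -> Tuple[bytes, str]:
    """Build a multipart/form-data body; returns (body, content_type_with_boundary)."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [f"--{boundary}".encode(),
                  f'Content-Disposition: form-data; name="{name}"'.encode(),
                  b"", value.encode("utf-8")]
    for name, (filename, data) in files.items():
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        lines += [f"--{boundary}".encode(),
                  f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode(),
                  f"Content-Type: {ctype}".encode(),
                  b"", data]
    lines += [f"--{boundary}--".encode(), b""]  # closing delimiter plus trailing CRLF
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def post_form(url: str, fields: Dict[str, str],
              files: Dict[str, Tuple[str, bytes]], timeout: float = 60.0) -> Any:
    """POST the form and decode the JSON reply."""
    body, content_type = encode_multipart(fields, files)
    request = urllib.request.Request(url, data=body, method="POST",
                                     headers={"Content-Type": content_type,
                                              "Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: post_form("http://localhost:20200/v1/detect", {"obj": "button"}, {"init_image": ("image.png", open("image.png", "rb").read())}).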

Load Balancing Issues

Issue: Uneven load distribution or worker failures

Solutions:

Check worker health:

curl http://localhost:20200/health
# Verify all workers have recent last_used timestamps

Monitor worker statistics:

curl http://localhost:20200/ | jq '.load_balancer_stats.model_stats'
# Check request counts and error rates

Restart problematic workers:

# Kill and restart the service
pkill -f "python main.py"
sleep 3
cd moondream && python main.py

Adjust worker count:

# In moondream/.env
MOONDREAM_WORKERS=1  # Reduce for troubleshooting

Logging and Debugging

Enable Debug Logging

# In main.py
import logging
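logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('moondream_debug.log'),
        logging.StreamHandler()
    ],
    force=True,  # Python 3.8+: without it, basicConfig does nothing if the root logger already has handlers
)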

Request Tracing

// Add request IDs for tracing (Node.js ES module; detectObject is your own client wrapper)
import { randomUUID } from "node:crypto";

const requestId = randomUUID();
const startTime = Date.now();
console.log(`[${requestId}] Starting detection for: ${element}`);

const result = await detectObject(screenshot, element);

console.log(`[${requestId}] Detection result:`, {
  found: result.found,
  duration: Date.now() - startTime
});
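The same tracing pattern on the Python side (inside the service or a Python client) fits in a small context manager. A sketch using the standard logging, time, and uuid modules; the logger name moondream.trace, the 8-character IDs, and the name traced are arbitrary choices for this example:

```python
import logging
import time
import uuid
from contextlib import contextmanager
from typing import Iterator

log = logging.getLogger("moondream.trace")


@contextmanager
def traced(operation: str) -> Iterator[str]:
    """Log the start and end of `operation` with a short request ID; yields the ID."""
    request_id = uuid.uuid4().hex[:8]
    start = time.perf_counter()
    log.info("[%s] start %s", request_id, operation)
    try:
        yield request_id
    except Exception:
        elapsed_ms = (time.perf_counter() - start) * 1000
        log.exception("[%s] %s failed after %.0f ms", request_id, operation, elapsed_ms)
        raise
    elapsed_ms = (time.perf_counter() - start) * 1000
    log.info("[%s] %s done in %.0f ms", request_id, operation, elapsed_ms)
```

Usage: with traced("detect login button") as request_id: followed by the request code. The messages appear once logging is configured at INFO or lower, as in the debug logging setup above.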

Performance Profiling

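import cProfile
import pstats

# Profile a single request; detect_object and image must already exist in __main__ (your own code)
cProfile.run('detect_object(image, "button")', 'profile_output.prof')
# Or profile the whole service from the shell: python -m cProfile -o profile_output.prof main.py

# Analyze results: top 10 entries by cumulative time
p = pstats.Stats('profile_output.prof')
p.sort_stats('cumulative').print_stats(10)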

Advanced Debugging

Model Inspection

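# Check model properties (`model` is the instance loaded in main.py)
import torch

print("Model device:", model.device)
print("Model dtype:", model.dtype)
print("Model parameters:", sum(p.numel() for p in model.parameters()))
# With a device_map, layers can sit on several devices; list every device in use
print("Parameter devices:", {str(p.device) for p in model.parameters()})

# Test model with simple input; test_input is a placeholder built with main.py's preprocessing
with torch.no_grad():
    test_output = model(test_input)
    # Model outputs are often structured objects rather than tensors, so inspect the type first
    print("Model output type:", type(test_output))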

Memory Analysis

import torch
import gc

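# Check memory usage (these calls require a CUDA device)
if torch.cuda.is_available():
    print("GPU memory allocated:", torch.cuda.memory_allocated() / 1024**2, "MB")
    print("GPU memory reserved:", torch.cuda.memory_reserved() / 1024**2, "MB")

    # Force garbage collection, then release cached blocks back to the driver
    gc.collect()
    torch.cuda.empty_cache()

    # empty_cache() lowers "reserved"; "allocated" only falls once tensors are unreferenced
    print("After cleanup:")
    print("GPU memory allocated:", torch.cuda.memory_allocated() / 1024**2, "MB")
    print("GPU memory reserved:", torch.cuda.memory_reserved() / 1024**2, "MB")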

Network Debugging

# Test network connectivity (verbose request and response headers)
curl -v http://localhost:20200/v1/point \
  -F "obj=test" \
  -F "init_image=@test.png"

# Check response times: write the format file, then point -w at it
cat > curl-format.txt <<'EOF'
time_namelookup:    %{time_namelookup}s\n
time_connect:       %{time_connect}s\n
time_appconnect:    %{time_appconnect}s\n
time_pretransfer:   %{time_pretransfer}s\n
time_redirect:      %{time_redirect}s\n
time_starttransfer: %{time_starttransfer}s\n
time_total:         %{time_total}s\n
EOF
curl -w "@curl-format.txt" -o /dev/null -s http://localhost:20200/

Common Error Patterns

Error: "Model inference error"

Cause: Model processing failed during inference

Solution:

Check image format:

file image.png  # Should be PNG/JPEG/WebP/BMP/TIFF

Verify image integrity:

identify -regard-warnings image.png > /dev/null 2>&1 || echo "Corrupted"

Check model health:

curl http://localhost:20200/health
# Look for error counts in model_stats

Restart service:

pkill -f "python main.py"
sleep 2
cd moondream && python main.py

Error: "422 Validation Error"

Cause: Missing or invalid request parameters

Solution:

Check required parameters:

# /v1/query needs: question + init_image
# /v1/detect needs: obj + init_image
# /v1/point needs: obj + init_image
# /v1/caption needs: init_image

Verify parameter names:

-F "obj=button"    # correct
-F "object=button" # incorrect

Ensure multipart/form-data:

curl -X POST "http://localhost:20200/v1/detect" \
  -F "obj=button" \
  -F "init_image=@image.png"

Error: "413 Payload Too Large"

Cause: Image file exceeds size limits

Solution:

Check file size:
```
ls -lh image.png  # Should be < 10MB
```

Resize large images:

convert input.png -resize '1920x1080>' output.png  # '>' only shrinks larger images, keeping the aspect ratio

Compress images:

convert input.png -quality 80 output.jpg

Error: "Connection reset by peer"

Cause: Service crashed or worker died during request

Solution:

Check service status:

curl http://localhost:20200/ || echo "Service down"

Monitor logs:
```
tail -f moondream.log
```

Implement retry logic:

# With exponential backoff (--fail makes curl exit non-zero on HTTP errors such as 500)
for delay in 1 2 4 8; do
  curl --fail -sS http://localhost:20200/v1/detect -F "obj=button" -F "init_image=@img.png" && break
  sleep "$delay"
done
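A Python equivalent of the retry loop above, using the same 1, 2, 4, 8 second schedule plus a little random jitter so several clients do not retry in lockstep. It is a sketch: retry and backoff_delays are hypothetical names, and by default it retries only on ConnectionError (which includes connection resets) and TimeoutError. urllib reports connection failures as urllib.error.URLError, so pass that in retry_on when wrapping urllib calls:

```python
import random
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_delays(base: float = 1.0, factor: float = 2.0, attempts: int = 4,
                   cap: float = 30.0) -> Iterator[float]:
    """Yield base, base*factor, base*factor**2, ... capped at `cap` (1, 2, 4, 8 by default)."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def retry(fn: Callable[[], T],
          retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
          delays: Optional[Iterable[float]] = None,
          jitter: float = 0.1,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(), sleeping through `delays` between failed attempts; the last error propagates."""
    for delay in (backoff_delays() if delays is None else delays):
        try:
            return fn()
        except retry_on:
            # Spread retries by +/- `jitter` so many clients don't retry in lockstep.
            sleep(delay * (1 + random.uniform(-jitter, jitter)))
    return fn()  # final attempt: any exception reaches the caller
```

Unlike the shell loop, it does not sleep after the final failure: with the default four delays it makes up to five attempts and then re-raises the last error.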

Scale workers down:

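# Temporarily reduce load: edit the existing line (a plain ">" redirect would wipe the rest of .env), then restart
grep -q '^MOONDREAM_WORKERS=' moondream/.env \
  && sed -i.bak 's/^MOONDREAM_WORKERS=.*/MOONDREAM_WORKERS=1/' moondream/.env \
  || echo "MOONDREAM_WORKERS=1" >> moondream/.env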

Recovery Procedures

Service Restart

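# Graceful restart: send SIGTERM and give in-flight requests time to finish
kill -TERM $(pgrep -f "python main.py")
sleep 5
cd moondream && python main.py

# Force restart: SIGKILL, for a process that ignores SIGTERM
pkill -KILL -f "python main.py"
sleep 2
cd moondream && python main.py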

Model Recovery

Issue: Model becomes unresponsive or returns errors

Solutions:

Check model status:

curl http://localhost:20200/health
# Verify model_stats show recent activity

Restart model workers:

pkill -f "python main.py"
sleep 3
cd moondream && python main.py

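Clear GPU cache (this only works inside the service process, for example between requests; running it from a separate Python REPL frees nothing the service holds, so restart the service instead):

import torch
torch.cuda.empty_cache()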

Reduce worker count:

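# Temporarily reduce load: edit the existing line (a plain ">" redirect would wipe the rest of .env), then restart
grep -q '^MOONDREAM_WORKERS=' moondream/.env \
  && sed -i.bak 's/^MOONDREAM_WORKERS=.*/MOONDREAM_WORKERS=1/' moondream/.env \
  || echo "MOONDREAM_WORKERS=1" >> moondream/.env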

Prevention Best Practices

Service Monitoring

Health Check Script:

#!/bin/bash
# moondream_health_check.sh

MOONDREAM_URL="http://localhost:20200"

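# Check basic connectivity (-f fails on HTTP errors; --max-time stops a hung request)
if ! curl -sf --max-time 5 "$MOONDREAM_URL/" > /dev/null; then
  echo "$(date): MoonDream service is down"
  # Send alert or restart service
  exit 1
fi

# Check model health; compact JSON (as FastAPI returns) has no space after the colon, so allow either
HEALTH=$(curl -s --max-time 5 "$MOONDREAM_URL/health")
if echo "$HEALTH" | grep -Eq '"status" *: *"healthy"'; then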
  echo "$(date): MoonDream service healthy"
else
  echo "$(date): MoonDream health check failed: $HEALTH"
  exit 1
fi

Automated Monitoring:

# Add to crontab for regular checks
*/5 * * * * /path/to/moondream_health_check.sh

Resource Monitoring

System Resource Checks:

#!/bin/bash
# monitor_resources.sh

# Check memory usage (%MEM summed over python processes; Linux procps ps)
MEM_USAGE=$(ps --no-headers -o pmem -C python,python3 | awk '{sum+=$1} END {print sum+0}')
if (( $(echo "$MEM_USAGE > 80" | bc -l) )); then
  echo "High memory usage: ${MEM_USAGE}%"
fi

# Check GPU usage (if applicable); take the busiest GPU on multi-GPU hosts
if command -v nvidia-smi &> /dev/null; then
  GPU_USAGE=$(nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits | sort -n | tail -1)
  if [ "$GPU_USAGE" -gt 90 ]; then
    echo "High GPU usage: ${GPU_USAGE}%"
  fi
fi

# Check disk space for model cache (-P keeps df output on one line per filesystem)
CACHE_DIR="${HF_HOME:-$HOME/.cache/huggingface}"
DISK_USAGE=$(df -P "$CACHE_DIR" | awk 'NR==2 {print $5}' | tr -d '%')
if [ "$DISK_USAGE" -gt 90 ]; then
  echo "Low disk space: ${DISK_USAGE}% used"
fi
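The ps -C flags above are Linux-specific. For the disk check alone, a portable Python version built on shutil.disk_usage works on Linux and macOS. It resolves the cache directory with the same HF_HOME rule as the script, walks up to the nearest existing parent if the directory is missing, and computes used space the way df's Use% column does (up to rounding). The function names are hypothetical:

```python
import os
import shutil
from pathlib import Path
from typing import Mapping


def hf_cache_dir(env: Mapping[str, str] = os.environ) -> Path:
    """Resolve the cache directory like the script: $HF_HOME, else ~/.cache/huggingface."""
    return Path(env.get("HF_HOME") or Path.home() / ".cache" / "huggingface")


def disk_used_percent(path: Path) -> float:
    """Percent used of the filesystem holding `path`, computed like df's Use% column."""
    path = Path(path)
    # Walk up to the nearest existing directory so a missing cache dir still works.
    while not path.exists() and path != path.parent:
        path = path.parent
    usage = shutil.disk_usage(path)
    denominator = usage.used + usage.free  # df divides by used + available, not total
    return 100.0 * usage.used / denominator if denominator else 0.0


if __name__ == "__main__":
    pct = disk_used_percent(hf_cache_dir())
    if pct > 90:
        print(f"Low disk space: {pct:.0f}% used")
```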

Log Management

Log Rotation Setup:

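# /etc/logrotate.d/moondream
/var/log/moondream/*.log {
  daily
  rotate 30
  compress
  missingok
  notifempty
  # logging.FileHandler keeps the file open and never reopens it, so copy and truncate
  # in place instead of renaming and reloading the service
  # (alternatively, use logging.handlers.WatchedFileHandler in main.py)
  copytruncate
}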

Backup and Recovery

Model Cache Backup:

#!/bin/bash
# backup_model_cache.sh

CACHE_DIR="${HF_HOME:-$HOME/.cache/huggingface}"
BACKUP_DIR="/backup/moondream"

mkdir -p "$BACKUP_DIR"

# Mirror the cache (--delete drops backup files no longer in the cache);
# rsync's exit status tells whether the copy completed, not whether the files are intact
if rsync -av --delete "$CACHE_DIR/" "$BACKUP_DIR/"; then
  echo "Model cache backup completed successfully"
else
  echo "Model cache backup failed"
  exit 1
fi

This comprehensive troubleshooting guide covers the most common issues you'll encounter with MoonDream Vision. Regular monitoring, proper logging, and following the best practices outlined here will help maintain a stable and performant vision service. 🔧
