MoonDream Troubleshooting

Common issues, solutions, and debugging strategies for MoonDream Vision service.

Service Startup Issues

"Module 'torch' not found"

Error:

ModuleNotFoundError: No module named 'torch'

Solutions:

# Remove any existing install, then install the CPU-only build
pip uninstall -y torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# Or for CUDA support (CUDA 11.8 build)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

"CUDA out of memory"

Error:

RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB

Solutions:

  1. Reduce workers:

    # moondream/.env
    MOONDREAM_WORKERS=1
  2. Use CPU mode:

    export CUDA_VISIBLE_DEVICES=""
    python main.py
  3. Clear GPU memory:

    # Kill other GPU processes
    nvidia-smi
    kill -9 <process_id>

"MPS not available on Intel Macs"

Error:

RuntimeError: MPS is not available

Solution: MoonDream will automatically fall back to CPU. For better performance, consider:

  • Upgrading to Apple Silicon (M1/M2/M3)
  • Using CPU optimization flags
  • Reducing model precision
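
The fallback order described above (GPU if present, otherwise CPU) can be sketched as a small helper. This is an illustrative sketch, not MoonDream's actual startup code: `pick_device` and `detect_device` are hypothetical names, and the only PyTorch calls used are `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred device name: CUDA first, then MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch for available backends; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds may not expose torch.backends.mps at all.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend and mps_backend.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print("Selected device:", detect_device())
```

Keeping the decision in a pure function makes the fallback order easy to test without a GPU.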

Port Already in Use

Error:

OSError: [Errno 48] Address already in use

(Errno 48 is the macOS code; Linux reports the same condition as [Errno 98].)

Solutions:

# Find process using port 20200
lsof -i :20200

# Kill the process
kill -9 <process_id>

# Or change port in main.py
# Modify uvicorn.run(port=20201)
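
Before changing the port, it can help to confirm from Python whether anything is listening on it. The sketch below uses only the standard `socket` module; `port_in_use` and `first_free_port` are hypothetical helper names, and the check only detects listeners reachable on `127.0.0.1`.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def first_free_port(start: int, attempts: int = 10) -> int:
    """Return the first port in [start, start + attempts) with no listener."""
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    raise RuntimeError(f"No free port found in {start}-{start + attempts - 1}")


if __name__ == "__main__":
    print("Port 20200 in use:", port_in_use(20200))
    print("First free port from 20200:", first_free_port(20200))
```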

Model Loading Issues

"Model download failed"

Error:

OSError: Couldn't reach server

Solutions:

  1. Check internet connection

  2. Use local model cache:

    export HF_HOME=/path/to/cache
    export TRANSFORMERS_CACHE=/path/to/cache
  3. Manual download:

    # Download model manually (requires git-lfs)
    git lfs install
    git clone https://huggingface.co/vikhyatk/moondream2
    # Pass the path of the cloned directory to from_pretrained() in place of the repo ID

"Revision not found"

Error:

OSError: Revision '2025-06-21' not found

Solution: Update to the latest available revision:

# In main.py, check available revisions
model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="main",  # Use 'main' instead of specific date
    trust_remote_code=True,
    device_map=device_map
)

Slow Model Loading

Issue: Model takes >5 minutes to load

Solutions:

  1. Use faster storage:

    # Move model to SSD
    export HF_HOME=/ssd/cache
  2. Pre-download model:

    from transformers import AutoModelForCausalLM

    # Pre-download model (populates the local cache for later runs)
    AutoModelForCausalLM.from_pretrained(
        "vikhyatk/moondream2",
        revision="2025-06-21",
        trust_remote_code=True
    )
  3. Use model sharding:

    model = AutoModelForCausalLM.from_pretrained(
        "vikhyatk/moondream2",
        device_map="auto",  # Automatic sharding
        max_memory={0: "4GB", "cpu": "8GB"}
    )

API Request Issues

422 Validation Errors

Error:

{
  "detail": [
    {
      "loc": ["body", "obj"],
      "msg": "field required",
      "type": "value_error.missing"
    }
  ]
}

Solutions:

  1. Check required parameters:

    • /v1/query requires question and init_image
    • /v1/detect requires obj and init_image
    • /v1/point requires obj and init_image
    • /v1/caption requires init_image
  2. Verify parameter names:

    # Correct
    -F "obj=login button" \
    -F "question=What do you see?"

    # Incorrect
    -F "object=login button" \
    -F "q=What do you see?"
  3. Check image file:

    # Ensure image file exists and is readable
    ls -la screenshot.png
    file screenshot.png

500 Internal Server Errors

Error:

{
  "detail": "Model inference error: ..."
}

Solutions:

  1. Check model loading:

    curl http://localhost:20200/
    # Should show successful model loading
  2. Monitor worker status:

    curl http://localhost:20200/health
    # Check model_stats for errors
  3. Restart service:

    pkill -f "python main.py"
    sleep 2
    cd moondream && python main.py

Image Processing Errors

Issue: API returns errors related to image processing

Common causes:

  1. Unsupported image format:

    # Check image format
    file image.png
    # Should be: PNG, JPEG, WebP, BMP, or TIFF
  2. Corrupted image file:

    # Test image integrity
    convert image.png -resize 1x1 /dev/null 2>&1 || echo "Corrupted"
  3. Image too large:

    # Check file size
    ls -lh image.png
    # Should be under 10MB

Solutions:

# Resize large images
convert input.png -resize 1920x1080 output.png

# Convert to supported format
convert input.bmp output.png

# Optimize file size
convert input.png -quality 85 output.jpg
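
The format and size checks above can also be run from Python without ImageMagick. The sketch below inspects only the leading "magic" bytes of the file, so it catches wrong or truncated headers but does not prove the whole image decodes. `sniff_format` and `check_image` are hypothetical helper names, and the 10 MB limit is taken from the text above.

```python
import os

MAX_BYTES = 10 * 1024 * 1024  # 10 MB limit mentioned above


def sniff_format(header: bytes):
    """Guess the image format from its leading bytes, or return None."""
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    if header.startswith(b"\xff\xd8\xff"):
        return "JPEG"
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "WebP"
    if header.startswith(b"BM"):
        return "BMP"
    if header[:4] in (b"II*\x00", b"MM\x00*"):
        return "TIFF"
    return None


def check_image(path: str) -> str:
    """Return a short verdict on whether the file looks acceptable."""
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        fmt = sniff_format(fh.read(12))
    if fmt is None:
        return "unsupported or corrupted"
    if size > MAX_BYTES:
        return f"{fmt}, too large ({size} bytes)"
    return f"{fmt}, ok"
```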

Performance Issues

Slow Response Times

Issue: Requests taking >5 seconds

Diagnosis:

# Check MoonDream logs
tail -f moondream.log

# Monitor system resources
top -p "$(pgrep -d, -f 'python main.py')"  # top expects a comma-separated PID list
nvidia-smi # For GPU systems

Solutions:

  1. Scale workers:

    MOONDREAM_WORKERS=2  # Increase from 1
  2. Optimize image size:

    const optimizedImage = await resizeImage(screenshot, 1024);
  3. Enable caching:

    // Use coordinate cache for repeated elements
    const cached = await getCachedCoordinates(flowId, deviceId, element);
    if (cached) return cached;

High Memory Usage

Issue: Service consuming too much RAM

Monitoring:

# Check memory usage (the [p] keeps grep from matching itself)
ps aux | grep '[p]ython main.py'

# Monitor over time
watch -n 5 "ps aux | grep '[p]ython main.py'"

Solutions:

  1. Reduce workers:

    MOONDREAM_WORKERS=1
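  2. Enable memory optimization (this tunes the CUDA allocator, so it reduces GPU memory fragmentation rather than system RAM; set it before CUDA is initialized):

    import os
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"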
  3. Use smaller batch sizes:

    MAX_BATCH_SIZE = 1  # Reduce from default

GPU Memory Issues

Issue: CUDA out of memory during inference

Solutions:

  1. Reduce batch size:

    BATCH_SIZE = 1
  2. Disable gradient tracking (gradient checkpointing only saves memory during training, not inference):

    with torch.inference_mode():
        ...  # run the model call here
  3. Clear cache between requests:

    torch.cuda.empty_cache()

Network and Connectivity Issues

Connection Refused

Error:

ConnectionError: Connection refused

Solutions:

  1. Check if service is running:

    curl http://localhost:20200/
    # Should return service status
  2. Verify port configuration:

    netstat -tlnp | grep :20200
    # Should show Python process listening on port 20200
  3. Check if port is in use:

    lsof -i :20200
    # Kill any conflicting processes
  4. Test local connectivity:

    # Test if service responds locally
    curl http://127.0.0.1:20200/
    curl http://localhost:20200/

Timeout Errors

Issue: Requests timing out

Solutions:

  1. Check service load:

    curl http://localhost:20200/health
    # Check if all workers are responding
  2. Monitor system resources:

    # Check CPU/memory usage
    top -p "$(pgrep -d, -f 'python main.py')" -n 1

    # Check GPU usage (if applicable)
    nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv
  3. Scale workers:

    # Increase MOONDREAM_WORKERS in .env
    echo "MOONDREAM_WORKERS=2" >> moondream/.env
  4. Restart service:

    pkill -f "python main.py"
    sleep 2
    cd moondream && python main.py

Integration Issues

External Application Can't Connect

Error: Connection refused or Service unavailable

Solutions:

  1. Verify service is running:

    curl http://localhost:20200/
    # Should return status information
  2. Check network accessibility:

    # Test from different interfaces
    curl http://127.0.0.1:20200/
    curl http://localhost:20200/
    curl http://<host-ip>:20200/ # From another machine, if bound to all interfaces (0.0.0.0)
  3. Firewall configuration:

    # Linux
    sudo ufw allow 20200
    sudo ufw status

    # macOS - check System Preferences > Security & Privacy

API Compatibility Issues

Issue: API requests failing due to parameter mismatches

Solutions:

  1. Verify parameter names:

    # Correct parameter names
    -F "obj=button" # not "object"
    -F "question=What?" # correct
    -F "init_image=@file" # correct
  2. Check content type:

    # Must use multipart/form-data (curl -F sets the header and boundary automatically)
    curl -X POST "http://localhost:20200/v1/detect" \
      -F "obj=button" \
      -F "init_image=@image.png"
  3. Validate response format:

    # All endpoints return JSON
    curl -H "Accept: application/json" http://localhost:20200/v1/point \
      -F "obj=button" \
      -F "init_image=@image.png"

Load Balancing Issues

Issue: Uneven load distribution or worker failures

Solutions:

  1. Check worker health:

    curl http://localhost:20200/health
    # Verify all workers have recent last_used timestamps
  2. Monitor worker statistics:

    curl http://localhost:20200/ | jq '.load_balancer_stats.model_stats'
    # Check request counts and error rates
  3. Restart problematic workers:

    # Kill and restart the service
    pkill -f "python main.py"
    sleep 3
    cd moondream && python main.py
  4. Adjust worker count:

    # In moondream/.env
    MOONDREAM_WORKERS=1 # Reduce for troubleshooting

Logging and Debugging

Enable Debug Logging

# In main.py
import logging
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('moondream_debug.log'),
        logging.StreamHandler()
    ]
)

Request Tracing

// Add request IDs for tracing
const requestId = generateId();
const startTime = Date.now();
console.log(`[${requestId}] Starting detection for: ${element}`);

const result = await detectObject(screenshot, element);

console.log(`[${requestId}] Detection result:`, {
  found: result.found,
  duration: Date.now() - startTime
});
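
The same tracing idea can be applied on the Python side of the service. This is a minimal sketch using only the standard library; `traced` and the logger name `moondream.trace` are hypothetical names, and the sleep stands in for the real detection call.

```python
import logging
import time
import uuid
from contextlib import contextmanager

logger = logging.getLogger("moondream.trace")


@contextmanager
def traced(operation: str):
    """Log the start and end of an operation with a shared request ID."""
    request_id = uuid.uuid4().hex[:8]
    start = time.perf_counter()
    logger.info("[%s] Starting %s", request_id, operation)
    try:
        yield request_id
    finally:
        # Runs on success and on error, so failed requests are timed too.
        duration_ms = (time.perf_counter() - start) * 1000
        logger.info("[%s] Finished %s in %.1f ms", request_id, operation, duration_ms)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    with traced("detection for: login button") as request_id:
        time.sleep(0.05)  # stand-in for the real detection call
```

Passing the yielded `request_id` into downstream log calls lets one request be followed across log lines.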

Performance Profiling

import cProfile

# Profile a single request
cProfile.run('detect_object(image, "button")', 'profile_output.prof')

# Analyze results
import pstats
p = pstats.Stats('profile_output.prof')
p.sort_stats('cumulative').print_stats(10)

Advanced Debugging

Model Inspection

import torch

# Check model properties
print("Model device:", model.device)
print("Model dtype:", model.dtype)
print("Model parameters:", sum(p.numel() for p in model.parameters()))

# Test model with simple input
with torch.no_grad():
    test_output = model(test_input)
print("Model output shape:", test_output.shape)

Memory Analysis

import torch
import gc

# Check memory usage
print("GPU memory allocated:", torch.cuda.memory_allocated() / 1024**2, "MB")
print("GPU memory reserved:", torch.cuda.memory_reserved() / 1024**2, "MB")

# Force garbage collection
gc.collect()
torch.cuda.empty_cache()

print("After cleanup:")
print("GPU memory allocated:", torch.cuda.memory_allocated() / 1024**2, "MB")

Network Debugging

# Test network connectivity
curl -v http://localhost:20200/v1/point \
-F "object=test" \
-F "init_image=@test.png"

# Check response times
curl -w "@curl-format.txt" -o /dev/null -s http://localhost:20200/

# curl-format.txt
# time_namelookup: %{time_namelookup}\n
# time_connect: %{time_connect}\n
# time_appconnect: %{time_appconnect}\n
# time_pretransfer: %{time_pretransfer}\n
# time_redirect: %{time_redirect}\n
# time_starttransfer: %{time_starttransfer}\n
# time_total: %{time_total}\n

Common Error Patterns

Error: "Model inference error"

Cause: Model processing failed during inference

Solution:

  1. Check image format:

    file image.png  # Should be PNG/JPEG/WebP/BMP/TIFF
  2. Verify image integrity:

    convert image.png -resize 1x1 /dev/null 2>&1 || echo "Corrupted"
  3. Check model health:

    curl http://localhost:20200/health
    # Look for error counts in model_stats
  4. Restart service:

    pkill -f "python main.py"
    sleep 2
    cd moondream && python main.py

Error: "422 Validation Error"

Cause: Missing or invalid request parameters

Solution:

  1. Check required parameters:

    # /v1/query needs: question + init_image
    # /v1/detect needs: obj + init_image
    # /v1/point needs: obj + init_image
    # /v1/caption needs: init_image
  2. Verify parameter names:

    -F "obj=button"    # correct
    -F "object=button" # incorrect
  3. Ensure multipart/form-data:

    curl -X POST "http://localhost:20200/v1/detect" \
    -F "obj=button" \
    -F "init_image=@image.png"

Error: "413 Payload Too Large"

Cause: Image file exceeds size limits

Solution:

  1. Check file size:

    ls -lh image.png  # Should be < 10MB
  2. Resize large images:

    convert input.png -resize 1920x1080 output.png
  3. Compress images:

    convert input.png -quality 80 output.jpg

Error: "Connection reset by peer"

Cause: Service crashed or worker died during request

Solution:

  1. Check service status:

    curl http://localhost:20200/ || echo "Service down"
  2. Monitor logs:

    tail -f moondream.log
  3. Implement retry logic:

    # With exponential backoff
    for i in 1 2 4 8; do
      curl http://localhost:20200/v1/detect -F "obj=button" -F "init_image=@img.png" && break
      sleep $i
    done
  4. Scale workers down:

    # Temporarily reduce load: set this in moondream/.env
    # (editing the existing line avoids overwriting other settings in the file)
    MOONDREAM_WORKERS=1
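
For clients written in Python, the retry step above can be expressed as a reusable helper. This is a sketch under stated assumptions: `retry_with_backoff` and `ping_service` are hypothetical names, the delays mirror the 1, 2, 4, 8 second sequence used in the shell loop, and the root endpoint is used as the probe because this guide uses it for status checks.

```python
import time
from urllib.request import urlopen


def retry_with_backoff(func, delays=(1, 2, 4, 8), sleep=time.sleep):
    """Call func() up to len(delays) times, sleeping delays[i] after failure i.

    The exception from the final attempt is re-raised without a trailing sleep.
    """
    if not delays:
        raise ValueError("delays must contain at least one value")
    for attempt, delay in enumerate(delays, start=1):
        try:
            return func()
        except Exception:  # broad on purpose for this sketch
            if attempt == len(delays):
                raise
            sleep(delay)


def ping_service(url: str = "http://localhost:20200/") -> int:
    """Return the HTTP status of the service root; raises if unreachable."""
    with urlopen(url, timeout=5) as resp:
        return resp.status
```

A call such as `retry_with_backoff(ping_service)` would then wait up to 1 + 2 + 4 seconds between four attempts before giving up. The injectable `sleep` argument exists so the helper can be tested without real delays.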

Recovery Procedures

Service Restart

# Graceful restart
kill -TERM $(pgrep -f "python main.py")
sleep 5
python main.py

# Force restart
pkill -f "python main.py"
python main.py

Model Recovery

Issue: Model becomes unresponsive or returns errors

Solutions:

  1. Check model status:

    curl http://localhost:20200/health
    # Verify model_stats show recent activity
  2. Restart model workers:

    pkill -f "python main.py"
    sleep 3
    cd moondream && python main.py
  3. Clear GPU cache:

    # In Python REPL
    import torch
    torch.cuda.empty_cache()
  4. Reduce worker count:

    # Temporarily reduce load: set this in moondream/.env
    # (editing the existing line avoids overwriting other settings in the file)
    MOONDREAM_WORKERS=1

Prevention Best Practices

Service Monitoring

Health Check Script:

#!/bin/bash
# moondream_health_check.sh

MOONDREAM_URL="http://localhost:20200"

# Check basic connectivity
if ! curl -s "$MOONDREAM_URL/" > /dev/null; then
    echo "$(date): MoonDream service is down"
    # Send alert or restart service
    exit 1
fi

# Check model health (tolerate optional whitespace after the colon in the JSON)
HEALTH=$(curl -s "$MOONDREAM_URL/health")
if echo "$HEALTH" | grep -Eq '"status":[[:space:]]*"healthy"'; then
    echo "$(date): MoonDream service healthy"
else
    echo "$(date): MoonDream health check failed: $HEALTH"
    exit 1
fi

Automated Monitoring:

# Add to crontab for regular checks
*/5 * * * * /path/to/moondream_health_check.sh
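
Matching JSON with grep is sensitive to formatting details such as whitespace, so parsing the response is a more robust option when Python is available. The sketch below assumes, as the script above does, that the `/health` body is a JSON object with a top-level `status` field; `is_healthy` is a hypothetical helper name.

```python
import json


def is_healthy(body: str) -> bool:
    """Return True if the /health response body reports status 'healthy'."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


if __name__ == "__main__":
    # Both spellings are the same JSON, so both are accepted.
    print(is_healthy('{"status":"healthy"}'))
    print(is_healthy('{"status": "healthy"}'))
```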

Resource Monitoring

System Resource Checks:

#!/bin/bash
# monitor_resources.sh

# Check memory usage (sum of %MEM across processes named "python")
MEM_USAGE=$(ps --no-headers -o pmem -C python | awk '{sum+=$1} END {print sum+0}')
if (( $(echo "$MEM_USAGE > 80" | bc -l) )); then
    echo "High memory usage: ${MEM_USAGE}%"
fi

# Check GPU usage (if applicable); on multi-GPU hosts use the busiest GPU
if command -v nvidia-smi &> /dev/null; then
    GPU_USAGE=$(nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits | sort -nr | head -n 1)
    if [ "$GPU_USAGE" -gt 90 ]; then
        echo "High GPU usage: ${GPU_USAGE}%"
    fi
fi

# Check disk space for model cache
CACHE_DIR="${HF_HOME:-$HOME/.cache/huggingface}"
DISK_USAGE=$(df -P "$CACHE_DIR" | tail -1 | awk '{print $5}' | sed 's/%//')
if [ "$DISK_USAGE" -gt 90 ]; then
    echo "Low disk space: ${DISK_USAGE}% used"
fi
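
The disk-space check can also be done portably from Python with `shutil.disk_usage`. This is a sketch: `existing_ancestor` and `percent_used` are hypothetical helpers, the 90% threshold and cache path default are taken from the script above, and the percentage is computed as used/total, which can differ slightly from the figure `df` prints.

```python
import os
import shutil


def existing_ancestor(path: str) -> str:
    """Walk up from path until an existing directory or file is found."""
    current = os.path.abspath(path)
    while not os.path.exists(current):
        parent = os.path.dirname(current)
        if parent == current:
            break
        current = parent
    return current


def percent_used(path: str) -> float:
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(existing_ancestor(path))
    if usage.total == 0:
        return 0.0
    return usage.used / usage.total * 100


if __name__ == "__main__":
    cache_dir = os.environ.get("HF_HOME", os.path.expanduser("~/.cache/huggingface"))
    used = percent_used(cache_dir)
    if used > 90:
        print(f"Low disk space: {used:.0f}% used")
    else:
        print(f"Disk usage OK: {used:.0f}% used")
```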

Log Management

Log Rotation Setup:

# /etc/logrotate.d/moondream
/var/log/moondream/*.log {
    daily
    rotate 30
    compress
    missingok
    notifempty
    create 644 moondream moondream
    postrotate
        systemctl reload moondream || true
    endscript
}
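
Where logrotate is not available (for example on macOS or in a container), a similar policy can be approximated inside the service with the standard library's `TimedRotatingFileHandler`. This is a sketch, not MoonDream's logging setup: `build_rotating_logger` is a hypothetical helper, it mirrors the daily rotation and 30-file retention above, and unlike logrotate it does not compress old files.

```python
import logging
from logging.handlers import TimedRotatingFileHandler


def build_rotating_logger(log_path: str, name: str = "moondream") -> logging.Logger:
    """Return a logger that rotates its file at midnight and keeps 30 backups."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = TimedRotatingFileHandler(log_path, when="midnight", backupCount=30)
    handler.setFormatter(
        logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    )
    # Note: calling this twice with the same name attaches a second handler.
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_rotating_logger("moondream.log")
    log.info("Rotating log handler configured")
```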

Backup and Recovery

Model Cache Backup:

#!/bin/bash
# backup_model_cache.sh

CACHE_DIR="${HF_HOME:-$HOME/.cache/huggingface}"
BACKUP_DIR="/backup/moondream"

# Create backup and check that rsync exited successfully
if rsync -av --delete "$CACHE_DIR/" "$BACKUP_DIR/"; then
    echo "Model cache backup completed successfully"
else
    echo "Model cache backup failed"
    exit 1
fi

This comprehensive troubleshooting guide covers the most common issues you'll encounter with MoonDream Vision. Regular monitoring, proper logging, and following the best practices outlined here will help maintain a stable and performant vision service. 🔧