MoonDream Setup Guide

Complete installation and configuration guide for the MoonDream Vision API in the FYI Automation Tool.

Prerequisites

System Requirements

Hardware

  • RAM: Minimum 8GB, Recommended 16GB+
  • Storage: 5GB free space for models
  • GPU: Optional but recommended (NVIDIA CUDA or Apple MPS)

Software

  • Python: 3.13 or newer
  • pip: Latest version
  • Git: For cloning repositories
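
To confirm the interpreter meets the stated minimum before creating the virtual environment, a small check along these lines may help. It is a minimal sketch: the helper name `meets_minimum` is illustrative and not part of the project.

```python
import sys

# Minimum version stated in the Software requirements above.
MIN_PYTHON = (3, 13)


def meets_minimum(version, minimum=MIN_PYTHON):
    """Return True if a (major, minor, ...) version tuple satisfies the minimum."""
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    current = sys.version_info
    status = "OK" if meets_minimum(current) else "too old"
    print(f"Python {current.major}.{current.minor}: {status}")
```

Run it with the same `python` executable you intend to use for `python -m venv .venv`.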

Operating System

  • macOS: 12.0+ (MPS acceleration on Apple Silicon)
  • Linux: Ubuntu 20.04+, CentOS 7+, or equivalent
  • Windows: 10+ with WSL2 (recommended)

Installation

1. Clone and Setup

# Navigate to project directory
cd /path/to/fyi-automation-tool

# Go to MoonDream directory
cd moondream

# Create virtual environment
python -m venv .venv

# Activate virtual environment
source .venv/bin/activate # macOS/Linux
# OR
.venv\Scripts\activate # Windows

2. Install Dependencies

# Install Python packages
pip install -r requirements.txt

# Verify installation
pip list | grep -E "(torch|transformers|fastapi)"
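
The `grep` pipeline above is not available in a plain Windows shell. As a cross-platform alternative, the same check can be sketched with the standard library's `importlib.metadata`. The function name `installed_versions` is illustrative, and the package list simply mirrors the grep pattern.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Same three packages the grep pattern above looks for.
    for name, ver in installed_versions(["torch", "transformers", "fastapi"]).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```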

3. Configure Environment

# Copy sample configuration
cp sample.env .env

# Edit configuration (optional)
nano .env

Configuration Options:

# Number of model instances (workers)
MOONDREAM_WORKERS=1
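
This guide does not show how `main.py` reads `MOONDREAM_WORKERS`. Purely as an illustration of a defensive way to parse such a setting, the sketch below falls back to 1 when the variable is unset and rejects non-integer or non-positive values. The function `read_worker_count` is hypothetical.

```python
import os


def read_worker_count(env=os.environ, default=1):
    """Parse MOONDREAM_WORKERS from a mapping, falling back to a default when unset."""
    raw = env.get("MOONDREAM_WORKERS", "").strip()
    if not raw:
        return default
    try:
        count = int(raw)
    except ValueError:
        raise ValueError(f"MOONDREAM_WORKERS must be an integer, got {raw!r}") from None
    if count < 1:
        raise ValueError(f"MOONDREAM_WORKERS must be at least 1, got {count}")
    return count


if __name__ == "__main__":
    print("Workers:", read_worker_count())
```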

4. Start MoonDream Service

# Start the server
python main.py

# Expected output:
# ✓ MPS (Metal Performance Shaders) detected - using macOS GPU acceleration
# Loading 1 Moondream2 model instances on mps...
# Model instance 0 loaded successfully on mps!
# Starting MoonDream API server on http://localhost:20200

Hardware Acceleration Setup

macOS (Apple Silicon)

MoonDream automatically detects and uses MPS (Metal Performance Shaders):

# Verify MPS availability
python -c "import torch; print('MPS available:', torch.backends.mps.is_available())"

Expected Output:

MPS available: True

NVIDIA GPU (CUDA)

For NVIDIA GPUs, ensure CUDA is properly installed:

# Check CUDA availability
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"

# Check GPU info
python -c "import torch; print('GPU:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None')"

Expected Output:

CUDA available: True
GPU: NVIDIA GeForce RTX 3080

CPU-Only Setup

If no GPU is available, MoonDream will automatically fall back to CPU:

# Verify CPU setup
python -c "import torch; print('CPU threads:', torch.get_num_threads())"

Expected Output (the thread count varies by machine):

CPU threads: 8

When the server starts without a GPU, it also logs:

⚠ No GPU acceleration available - using CPU (this will be slower)
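
The three cases above amount to a device fallback chain. The sketch below shows one way such a selection could be written. The CUDA, then MPS, then CPU priority order and the function names are assumptions, not taken from `main.py`. The selection logic is kept separate from the torch calls so it can be checked without a GPU.

```python
def select_device(cuda_available, mps_available):
    """Pick a device string; the CUDA > MPS > CPU order is an assumed priority."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device():
    """Query torch for available backends; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older torch builds may not expose torch.backends.mps at all.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps is not None and mps.is_available())
    return select_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print("Selected device:", detect_device())
```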

FYI Automation Tool Integration

1. Configure Backend

Edit server/.env:

# Vision Service Configuration
MOONDREAM_URL=http://localhost:20200
MOONDREAM_MAX_VARIATIONS=2
# Optional, leave empty for local setup
MOONDREAM_API_KEY=

Use Hosted MoonDream (Quickstart)

Use the managed endpoint in production or to get started fast:

# Hosted production endpoint
MOONDREAM_URL=https://moondream.fyi.ai:9443/

2. Verify Connection

# Test MoonDream service health
curl http://localhost:20200/

# Expected response:
{
"status": "healthy",
"model": "moondream2",
"workers": 1,
"device": "mps",
"requests_processed": 0
}
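
For scripted checks (for example in a deployment pipeline), the same health probe can be sketched in Python with only the standard library. It assumes the response shape shown above; the helper names `is_healthy` and `fetch_health` are illustrative.

```python
import json
import urllib.request


def is_healthy(payload):
    """Return True when the health payload reports status "healthy"."""
    return payload.get("status") == "healthy"


def fetch_health(base_url="http://localhost:20200", timeout=5.0):
    """GET the root endpoint and decode its JSON body."""
    url = base_url.rstrip("/") + "/"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        payload = fetch_health()
    except (OSError, ValueError) as exc:
        # OSError covers connection failures; ValueError covers invalid JSON.
        print("MoonDream unreachable or returned invalid JSON:", exc)
    else:
        print("healthy" if is_healthy(payload) else f"unexpected payload: {payload}")
```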

3. Test Vision Integration

# Test FYI backend vision service
curl http://localhost:3001/api/automation/vision/info

# Expected response:
{
"status": "ready",
"model": "moondream",
"endpoint": "http://localhost:20200",
"capabilities": ["object_detection", "text_recognition", "ui_element_detection"]
}

Advanced Configuration

Multi-Worker Setup

For high-throughput scenarios, run multiple model instances:

# Edit moondream/.env
MOONDREAM_WORKERS=4

Considerations:

  • Each worker loads a separate model instance
  • Memory usage: ~2GB per worker
  • Best for concurrent request handling
  • Monitor system resources
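
As a rough sizing aid, the ~2GB per worker figure can be turned into an upper bound on the worker count. This is a back-of-the-envelope sketch: the 4GB reserved for the operating system and other processes is an assumption, and actual usage depends on device and precision.

```python
def max_workers_for_memory(total_gb, per_worker_gb=2.0, reserved_gb=4.0):
    """Estimate how many workers fit in memory after reserving headroom.

    per_worker_gb comes from the ~2GB figure above; reserved_gb is an assumed
    allowance for the OS and other processes.
    """
    usable_gb = total_gb - reserved_gb
    return max(0, int(usable_gb // per_worker_gb))


if __name__ == "__main__":
    for ram_gb in (8, 16, 32):
        print(f"{ram_gb}GB RAM -> up to {max_workers_for_memory(ram_gb)} workers")
```

Treat the result as a ceiling, then confirm with real measurements under load.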

Production Deployment

Systemd Service (Linux)

# Create service file
sudo nano /etc/systemd/system/moondream.service

Service file contents:

[Unit]
Description=MoonDream Vision API
After=network.target

[Service]
Type=simple
User=your-user
WorkingDirectory=/path/to/fyi-automation-tool/moondream
ExecStart=/path/to/fyi-automation-tool/moondream/.venv/bin/python main.py
Restart=always

[Install]
WantedBy=multi-user.target

# Enable and start service
sudo systemctl enable moondream
sudo systemctl start moondream
sudo systemctl status moondream

Windows Service

# Install NSSM (Non-Sucking Service Manager)
# Download from https://nssm.cc/

# Create service
nssm install MoonDream "C:\path\to\fyi-automation-tool\moondream\.venv\Scripts\python.exe"
nssm set MoonDream AppParameters "main.py"
nssm set MoonDream AppDirectory "C:\path\to\fyi-automation-tool\moondream"

# Start service
nssm start MoonDream

Networking Configuration

Local Development

  • Default: http://localhost:20200
  • LAN Access: Bind the server to 0.0.0.0, then connect from other machines via http://<host-ip>:20200
  • Custom Port: Modify main.py port configuration

Production Setup

# In main.py, modify uvicorn configuration
import uvicorn

if __name__ == "__main__":
    uvicorn.run(
        "main:app",
        host="0.0.0.0",  # Listen on all interfaces to allow external connections
        port=20200,
        workers=1,
    )

Alternatively, use the hosted service:

https://moondream.fyi.ai:9443/

Firewall Configuration

Linux (ufw)

sudo ufw allow 20200
sudo ufw status

macOS

# Allow incoming connections in System Settings > Network > Firewall
# (System Preferences > Security & Privacy > Firewall on macOS 12)
# Or use pfctl for advanced configuration

Performance Tuning

Memory Optimization

# In main.py, add memory optimization
# Set this before the first CUDA allocation (safest: before importing torch)
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

Model Loading Optimization

# Configure model loading parameters
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="2025-06-21",
    trust_remote_code=True,
    device_map=device_map,      # Set elsewhere in main.py for the detected device
    torch_dtype=torch.float16,  # Use half precision
    low_cpu_mem_usage=True,     # Reduce peak CPU memory while loading
)

Request Batching

# Configure batch processing
MAX_BATCH_SIZE = 4
TIMEOUT_SECONDS = 30
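
The guide does not show how `main.py` applies these constants. As an illustration of what a batch size cap typically means, the sketch below splits a queue of pending items into groups of at most `MAX_BATCH_SIZE`. The `batched` helper is hypothetical, and `TIMEOUT_SECONDS` is left out because its exact role is not documented here.

```python
from itertools import islice

MAX_BATCH_SIZE = 4  # same value as in the configuration above


def batched(items, size=MAX_BATCH_SIZE):
    """Yield consecutive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


if __name__ == "__main__":
    pending = [f"request-{i}" for i in range(10)]
    for batch in batched(pending):
        print(len(batch), batch)
```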

Monitoring and Logging

Enable Detailed Logging

# In main.py
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('moondream.log'),
        logging.StreamHandler(),
    ],
)

Health Monitoring

# Monitor service health
curl http://localhost:20200/

# Check system resources (Linux; -d, joins multiple PIDs with commas for top -p)
top -p $(pgrep -d, -f "python main.py")

# Monitor GPU usage (if applicable)
nvidia-smi

Log Analysis

# View recent logs
tail -f moondream.log

# Search for errors
grep "ERROR" moondream.log

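# Performance analysis
# Assumes the timing value is the 2nd whitespace-separated field; with the
# timestamped log format configured above, adjust $2 to the field holding it
grep "inference_time" moondream.log | awk '{sum+=$2} END {if (NR) print "Avg:", sum/NR}'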
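
Because the position of the timing value depends on the log format, a regex-based parser can be less fragile than a fixed field index. The sketch below assumes log lines contain `inference_time=<seconds>` (or `inference_time: <seconds>`). That message format is an assumption, since the guide does not show an actual log line.

```python
import re

# Assumed message format: "inference_time=0.42" or "inference_time: 0.42".
INFERENCE_RE = re.compile(r"inference_time[=:]\s*([0-9]*\.?[0-9]+)")


def average_inference_time(lines):
    """Return the mean inference time found in the lines, or None if none match."""
    values = [float(m.group(1)) for line in lines if (m := INFERENCE_RE.search(line))]
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    try:
        with open("moondream.log", encoding="utf-8") as log_file:
            avg = average_inference_time(log_file)
    except FileNotFoundError:
        print("moondream.log not found")
    else:
        print("Avg:", avg if avg is not None else "no inference_time entries")
```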

Troubleshooting Setup Issues

Model Loading Failures

Issue: CUDA out of memory

# Solution: Reduce workers or use CPU
MOONDREAM_WORKERS=1
# Or force CPU usage
export CUDA_VISIBLE_DEVICES=""

Issue: MPS not available on older macOS

# Check macOS version
sw_vers

# Solution: Update macOS or use CPU mode

Port Conflicts

Issue: Port 20200 already in use

# Find process using port
lsof -i :20200

# Stop the conflicting process (escalate to kill -9 only if it ignores SIGTERM)
kill <PID>

# Or change port in main.py
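
On systems without `lsof`, a quick way to see whether something is already listening on the port is to attempt a TCP connection. This is a minimal sketch. The helper `port_in_use` is illustrative, and it only detects listeners reachable on the given host address.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    port = 20200  # default MoonDream port from this guide
    state = "in use" if port_in_use(port) else "free"
    print(f"Port {port} is {state}")
```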

Import Errors

Issue: ModuleNotFoundError

# Reinstall dependencies (run with the virtual environment activated)
pip uninstall -y -r requirements.txt
pip install -r requirements.txt

# Check Python path
python -c "import sys; print(sys.path)"

Performance Issues

Issue: Slow response times

# Check hardware utilization
top # CPU
nvidia-smi # GPU

# Optimize settings: reduce workers if memory is the bottleneck
MOONDREAM_WORKERS=1
# Use smaller batch sizes

Verification Checklist

  • Python 3.13+ installed
  • Virtual environment activated
  • Dependencies installed successfully
  • MoonDream service starts without errors
  • Expected device in use (MPS or CUDA, or CPU fallback)
  • Health endpoint responds correctly
  • FYI backend can connect to MoonDream
  • Vision service tests pass
  • Automation flows work with computer vision

Next Steps

Once MoonDream is set up and running:

  1. Test the API: Verify all endpoints work correctly
  2. Integrate with FYI: Learn how vision features work in automation
  3. Optimize Performance: Fine-tune for your hardware
  4. Monitor Usage: Set up logging and monitoring