MoonDream Setup Guide
Complete installation and configuration guide for MoonDream Vision API in the FYI Automation Tool.
- Quickstart: Use Hosted MoonDream
Prerequisites
System Requirements
Hardware
- RAM: Minimum 8GB, Recommended 16GB+
- Storage: 5GB free space for models
- GPU: Optional but recommended (NVIDIA CUDA or Apple MPS)
Software
- Python: 3.13 or newer
- pip: Latest version
- Git: For cloning repositories
Operating System
- macOS: 12.0+ (MPS acceleration on Apple Silicon)
- Linux: Ubuntu 20.04+, CentOS 7+, or equivalent
- Windows: 10+ with WSL2 (recommended)
Installation
1. Clone and Setup
# Navigate to project directory
cd /path/to/fyi-automation-tool
# Go to MoonDream directory
cd moondream
# Create virtual environment
python -m venv .venv
# Activate virtual environment
source .venv/bin/activate # macOS/Linux
# OR
.venv\Scripts\activate # Windows
2. Install Dependencies
# Install Python packages
pip install -r requirements.txt
# Verify installation
pip list | grep -E "(torch|transformers|fastapi)"
3. Configure Environment
# Copy sample configuration
cp sample.env .env
# Edit configuration (optional)
nano .env
Configuration Options:
# Number of model instances (workers)
MOONDREAM_WORKERS=1
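How main.py consumes this setting is not shown in this guide. As a minimal sketch, assuming the value is read from the process environment and should fall back to a single worker when missing or invalid (the helper name `read_worker_count` is hypothetical):

```python
import os


def read_worker_count(env=None, default=1):
    """Return a positive worker count from MOONDREAM_WORKERS, or `default`."""
    env = os.environ if env is None else env
    raw = env.get("MOONDREAM_WORKERS", "")
    try:
        count = int(raw)
    except ValueError:
        # Missing, empty, or non-numeric values fall back to the default.
        return default
    # Zero or negative counts would load no model, so treat them as invalid.
    return count if count >= 1 else default
```

Passing a plain dict as `env` makes the helper easy to exercise without touching the real environment.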
4. Start MoonDream Service
# Start the server
python main.py
# Expected output:
# ✓ MPS (Metal Performance Shaders) detected - using macOS GPU acceleration
# Loading 1 Moondream2 model instances on mps...
# Model instance 0 loaded successfully on mps!
# Starting MoonDream API server on http://localhost:20200
Hardware Acceleration Setup
macOS (Apple Silicon)
MoonDream automatically detects and uses MPS (Metal Performance Shaders):
# Verify MPS availability
python -c "import torch; print('MPS available:', torch.backends.mps.is_available())"
Expected Output:
MPS available: True
NVIDIA GPU (CUDA)
For NVIDIA GPUs, ensure CUDA is properly installed:
# Check CUDA availability
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
# Check GPU info
python -c "import torch; print('GPU:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None')"
Expected Output:
CUDA available: True
GPU: NVIDIA GeForce RTX 3080
CPU-Only Setup
If no GPU is available, MoonDream will automatically fall back to CPU:
# Verify CPU setup
python -c "import torch; print('CPU threads:', torch.get_num_threads())"
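Expected Output (the thread count varies by machine):
CPU threads: 8
When the server starts without a GPU, its startup log reports the fallback:
⚠ No GPU acceleration available - using CPU (this will be slower)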
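The detection order used by main.py is not shown in this guide. The sketch below assumes a preference of CUDA, then MPS, then CPU, which matches the fallback behaviour described above; the function names are hypothetical.

```python
def pick_device(cuda_available, mps_available):
    """Prefer CUDA, then MPS, then CPU (assumed order, not confirmed by main.py)."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device():
    """Ask PyTorch which backends are available; report CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return pick_device(
        torch.cuda.is_available(),
        torch.backends.mps.is_available(),
    )
```

Keeping the decision in `pick_device` separate from the PyTorch queries lets the ordering be checked on machines without a GPU.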
FYI Automation Tool Integration
1. Configure Backend
Edit server/.env:
# Vision Service Configuration
MOONDREAM_URL=http://localhost:20200
MOONDREAM_MAX_VARIATIONS=2
# Optional, leave empty for local setup
MOONDREAM_API_KEY=
Use Hosted MoonDream (Quickstart)
Use the managed endpoint in production or to get started fast:
# Hosted production endpoint
MOONDREAM_URL=https://moondream.fyi.ai:9443/
2. Verify Connection
# Test MoonDream service health
curl http://localhost:20200/
# Expected response:
{
  "status": "healthy",
  "model": "moondream2",
  "workers": 1,
  "device": "mps",
  "requests_processed": 0
}
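The same check can be scripted, for example as a readiness probe before starting automation flows. This is a minimal sketch using only the standard library; the function names are hypothetical, and the payload shape is taken from the expected response above.

```python
import json
import urllib.error
import urllib.request


def is_healthy(payload):
    """True when the health payload reports status "healthy"."""
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def check_health(url="http://localhost:20200/", timeout=5.0):
    """Fetch the health endpoint; return False if unreachable or unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON body all count as unhealthy.
        return False
    return is_healthy(payload)
```

Calling `check_health()` with no arguments probes the local default; pass the hosted URL to probe the managed endpoint instead.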
3. Test Vision Integration
# Test FYI backend vision service
curl http://localhost:3001/api/automation/vision/info
# Expected response:
{
  "status": "ready",
  "model": "moondream",
  "endpoint": "http://localhost:20200",
  "capabilities": ["object_detection", "text_recognition", "ui_element_detection"]
}
Advanced Configuration
Multi-Worker Setup
For high-throughput scenarios, run multiple model instances:
# Edit moondream/.env
MOONDREAM_WORKERS=4
Considerations:
- Each worker loads a separate model instance
- Memory usage: ~2GB per worker
- Best for concurrent request handling
- Monitor system resources
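The ~2GB-per-worker figure gives a rough upper bound on the worker count for a given machine. A back-of-the-envelope sketch follows; the 4GB reserve for the operating system and other processes is an assumption, and the helper name is hypothetical.

```python
def max_workers_for_memory(total_gb, per_worker_gb=2.0, reserve_gb=4.0):
    """Largest worker count whose models fit after reserving memory for the OS.

    Always returns at least 1, even when a single worker may not fit.
    """
    usable_gb = total_gb - reserve_gb
    if usable_gb < per_worker_gb:
        return 1
    return int(usable_gb // per_worker_gb)
```

With these assumptions, a 16GB machine allows up to 6 workers and an 8GB machine up to 2. Treat the result as a ceiling and confirm it against actual memory usage.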
Production Deployment
Systemd Service (Linux)
# Create service file
sudo nano /etc/systemd/system/moondream.service
[Unit]
Description=MoonDream Vision API
After=network.target
[Service]
Type=simple
User=your-user
WorkingDirectory=/path/to/fyi-automation-tool/moondream
ExecStart=/path/to/fyi-automation-tool/moondream/.venv/bin/python main.py
Restart=always
[Install]
WantedBy=multi-user.target
# Enable and start service
sudo systemctl enable moondream
sudo systemctl start moondream
sudo systemctl status moondream
Windows Service
# Install NSSM (Non-Sucking Service Manager)
# Download from https://nssm.cc/
# Create service
nssm install MoonDream "C:\path\to\fyi-automation-tool\moondream\.venv\Scripts\python.exe"
nssm set MoonDream AppParameters "main.py"
nssm set MoonDream AppDirectory "C:\path\to\fyi-automation-tool\moondream"
# Start service
nssm start MoonDream
Networking Configuration
Local Development
- Default: http://localhost:20200
- LAN Access: http://0.0.0.0:20200 (bind address; other machines connect using the host's IP address)
- Custom Port: Modify main.py port configuration
Production Setup
# In main.py, modify uvicorn configuration
import uvicorn

if __name__ == "__main__":
    uvicorn.run(
        "main:app",
        host="0.0.0.0",  # Listen on all interfaces (allow external connections)
        port=20200,
        workers=1,
    )
Alternatively, use the hosted service:
https://moondream.fyi.ai:9443/
Firewall Configuration
Linux (ufw)
sudo ufw allow 20200
sudo ufw status
macOS
# Allow incoming connections in System Settings > Network > Firewall
# (macOS 12: System Preferences > Security & Privacy > Firewall)
# Or use pfctl for advanced configuration
Performance Tuning
Memory Optimization
# In main.py, add memory optimization
# Set this before importing torch so the CUDA allocator picks it up
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"
Model Loading Optimization
# Configure model loading parameters
import torch
from transformers import AutoModelForCausalLM

# device_map is assumed to be defined earlier in main.py
model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="2025-06-21",
    trust_remote_code=True,
    device_map=device_map,
    torch_dtype=torch.float16,  # Use half precision
    low_cpu_mem_usage=True,  # Reduce peak CPU RAM while loading
)
Request Batching
# Configure batch processing
MAX_BATCH_SIZE = 4
TIMEOUT_SECONDS = 30
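How main.py applies these constants is not shown in this guide. As one possible reading, the sketch below splits a queue of pending items into groups of at most MAX_BATCH_SIZE; the helper name is hypothetical, and TIMEOUT_SECONDS is left out because its exact role is not documented here.

```python
MAX_BATCH_SIZE = 4


def make_batches(items, batch_size=MAX_BATCH_SIZE):
    """Split `items` into consecutive lists of at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
```

Ten queued items would be processed as batches of 4, 4, and 2.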
Monitoring and Logging
Enable Detailed Logging
# In main.py
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('moondream.log'),  # Persist logs to a file
        logging.StreamHandler(),  # Also echo logs to the console
    ],
)
Health Monitoring
# Monitor service health
curl http://localhost:20200/
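# Check system resources (Linux; on macOS use: top -pid <PID>)
# -d, joins multiple PIDs with commas, which is the form top -p expects
top -p "$(pgrep -d, -f 'python main.py')"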
# Monitor GPU usage (if applicable)
nvidia-smi
Log Analysis
# View recent logs
tail -f moondream.log
# Search for errors
grep "ERROR" moondream.log
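# Performance analysis
# $2 must be the field that holds the inference time; adjust it to your log format
# (with the asctime-based format, $2 is the clock time, not the measurement)
grep "inference_time" moondream.log | awk '{sum+=$2} END {if (NR) print "Avg:", sum/NR}'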
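For a log format where field positions shift between lines, extracting the value by name is more robust than counting fields. The sketch below assumes each relevant line contains `inference_time=<seconds>` (or `inference_time: <seconds>`); that format and the function name are assumptions, not something this guide documents.

```python
import re

# Assumed line shape: "... inference_time=0.42 ..." with the value in seconds.
_INFERENCE_RE = re.compile(r"inference_time[=:]\s*([0-9]*\.?[0-9]+)")


def average_inference_time(lines):
    """Mean of all inference_time values found in `lines`, or None if none match."""
    values = []
    for line in lines:
        match = _INFERENCE_RE.search(line)
        if match:
            values.append(float(match.group(1)))
    return sum(values) / len(values) if values else None
```

To run it against the log file, pass an open file handle: `average_inference_time(open("moondream.log"))`.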
Troubleshooting Setup Issues
Model Loading Failures
Issue: CUDA out of memory
# Solution: Reduce workers or use CPU
MOONDREAM_WORKERS=1
# Or force CPU usage
export CUDA_VISIBLE_DEVICES=""
Issue: MPS not available on older macOS
# Check macOS version
sw_vers
# Solution: Update macOS or use CPU mode
Port Conflicts
Issue: Port 20200 already in use
# Find process using port
lsof -i :20200
# Stop the conflicting process (add -9 only if it ignores the default SIGTERM)
kill <PID>
# Or change port in main.py
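Where lsof is unavailable (for example on Windows outside WSL2), the port can be probed from Python instead. A minimal sketch using the standard library; the function name is hypothetical.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """True if something is accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0
```

`port_in_use(20200)` returning True before the server is started indicates that another process already holds the port.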
Import Errors
Issue: ModuleNotFoundError
# Reinstall dependencies (-y skips the per-package confirmation prompt)
pip uninstall -y -r requirements.txt
pip install -r requirements.txt
# Check Python path
python -c "import sys; print(sys.path)"
Performance Issues
Issue: Slow response times
# Check hardware utilization
top # CPU
nvidia-smi # GPU
# Optimize settings
MOONDREAM_WORKERS=1 # Reduce workers
# Use smaller batch sizes
Verification Checklist
- Python 3.13+ installed
- Virtual environment activated
- Dependencies installed successfully
- MoonDream service starts without errors
- Expected device selected (MPS or CUDA, or CPU fallback)
- Health endpoint responds correctly
- FYI backend can connect to MoonDream
- Vision service tests pass
- Automation flows work with computer vision
Next Steps
Once MoonDream is set up and running:
- Test the API: Verify all endpoints work correctly
- Integrate with FYI: Learn how vision features work in automation
- Optimize Performance: Fine-tune for your hardware
- Monitor Usage: Set up logging and monitoring