MoonDream Vision Overview

MoonDream is a tiny, fast vision-language model that provides the computer vision capabilities of the FYI Automation Tool. It enables automated UI element detection, screen analysis, and natural language queries about visual content.

What is MoonDream?

MoonDream is a compact vision-language model that can:

Answer questions about images
Detect objects and UI elements
Point to specific locations in images
Run efficiently on a range of hardware (CPU, NVIDIA GPUs via CUDA, Apple GPUs via MPS)

Integration with FYI Automation Tool

MoonDream powers the computer vision capabilities of the FYI Automation Tool in two areas:

🤖 AI Automation Features

UI Element Detection: Automatically finds buttons, inputs, and interactive elements
Screen Analysis: Understands what's visible on device screens
Natural Language Commands: Interprets commands like "tap the login button"
Coordinate Caching: Speeds up repeated operations by remembering element locations

📱 Device Automation

Screenshot Analysis: Processes device screenshots in real-time
Gesture Targeting: Provides precise coordinates for touch interactions
Validation: Verifies screen state and element presence
Error Recovery: Detects when UI elements are missing or changed

Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Android Device│────│  FYI Backend     │────│  MoonDream API  │
│   (Screenshot)  │    │  (Vision Service)│    │  (Computer      │
│                 │    │                  │    │   Vision)       │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                              │
                              ▼
                       ┌─────────────────┐
                       │   Coordinate    │
                       │   Cache DB      │
                       │   (Performance) │
                       └─────────────────┘
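The lookup path in the diagram can be sketched as a cache-first flow: the backend checks the coordinate cache before calling the vision service, and stores any new result. This is a minimal illustration, not the FYI backend's actual code. `CoordinateCache`, `locate`, the `screen_id` key, and the injected `detect` callable are all hypothetical names, and an in-memory dict stands in for the cache DB.

```python
from typing import Callable, Dict, Optional, Tuple

Point = Tuple[int, int]


class CoordinateCache:
    """In-memory stand-in for the coordinate cache DB (hypothetical)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], Point] = {}

    def get(self, screen_id: str, label: str) -> Optional[Point]:
        return self._store.get((screen_id, label))

    def put(self, screen_id: str, label: str, point: Point) -> None:
        self._store[(screen_id, label)] = point


def locate(
    cache: CoordinateCache,
    detect: Callable[[str, bytes], Optional[Point]],
    screen_id: str,
    label: str,
    screenshot: bytes,
) -> Optional[Point]:
    """Return cached coordinates if present, otherwise ask the vision service."""
    cached = cache.get(screen_id, label)
    if cached is not None:
        return cached
    point = detect(label, screenshot)
    if point is not None:
        # Only successful detections are cached, so a miss is retried next time.
        cache.put(screen_id, label, point)
    return point
```

Injecting `detect` as a callable keeps the sketch independent of any particular HTTP client. A real cache would also need an invalidation rule for when the UI changes.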

Key Features

🚀 Performance

Tiny Model: About 1.8 GB, small enough to run on CPU as well as on CUDA or MPS hardware
Fast Inference: Sub-second responses
Hardware Acceleration: Supports MPS (macOS), CUDA (NVIDIA), and CPU
Load Balancing: Multiple model instances for concurrent requests
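One simple way to spread concurrent requests across several model instances is round-robin dispatch. The sketch below only illustrates the idea; the document does not say how the service actually balances load. `RoundRobinPool` and the worker names are hypothetical.

```python
import itertools
import threading
from typing import Iterator, Sequence


class RoundRobinPool:
    """Hand out workers in rotation; safe to call from multiple threads."""

    def __init__(self, workers: Sequence[str]) -> None:
        if not workers:
            raise ValueError("at least one worker is required")
        self._cycle: Iterator[str] = itertools.cycle(list(workers))
        self._lock = threading.Lock()

    def next_worker(self) -> str:
        # The lock keeps two threads from advancing the iterator at once.
        with self._lock:
            return next(self._cycle)
```

With two workers, successive calls to `next_worker()` alternate between them, so no single instance receives every request.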

🎯 Accuracy

UI Element Detection: High precision for mobile app elements
Context Awareness: Understands app layouts and hierarchies
Variation Handling: Adapts to different themes and orientations
Confidence Scoring: Provides reliability metrics

🔧 Integration

REST API: Simple HTTP endpoints
Multi-format Support: Handles various image formats
Error Handling: Robust failure recovery
Monitoring: Health checks and performance metrics

Use Cases

Mobile App Testing

// Detect the login button in a screenshot
const result = await detectObject("login_button", screenshot);

// Example result: coordinates for automation, plus a confidence score
// {
//   found: true,
//   coordinates: { x: 540, y: 1200 },
//   confidence: 0.95
// }

Screen Validation

// Verify expected elements are present
const validation = await validateScreen(screenshot, "welcome message");

if (validation.found) {
  console.log("Screen state is correct");
}

Natural Language Commands

// Convert natural language to coordinates
const coords = await processCommand("tap the search icon", screenshot);

// Use coordinates for device automation
await device.tap(coords.x, coords.y);

Quick Start

1. Start MoonDream Service

cd moondream
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt
python main.py

2. Configure FYI Tool

# In server/.env
MOONDREAM_URL=http://localhost:20200
MOONDREAM_MAX_VARIATIONS=2

3. Test Integration

# Health check
curl http://localhost:20200/

# Test detection
curl -X POST "http://localhost:20200/v1/point" \
  -F "object=button" \
  -F "init_image=@screenshot.png"
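For callers that prefer Python to curl, the same detection request can be assembled with the standard library alone. The sketch builds the multipart body by hand and returns an unsent `urllib.request.Request`. The field names `object` and `init_image` come from the curl command above. The helper name `build_point_request` and the `image/png` content type are assumptions.

```python
import uuid
from urllib.request import Request


def build_point_request(base_url: str, label: str, image: bytes,
                        filename: str = "screenshot.png") -> Request:
    """Build (but do not send) a multipart POST to /v1/point."""
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    file_header = (
        'Content-Disposition: form-data; name="init_image"; '
        f'filename="{filename}"'
    ).encode("utf-8")
    # Each part is: delimiter, headers, blank line, value.
    body = b"\r\n".join([
        delimiter,
        b'Content-Disposition: form-data; name="object"',
        b"",
        label.encode("utf-8"),
        delimiter,
        file_header,
        b"Content-Type: image/png",
        b"",
        image,
        delimiter + b"--",  # closing delimiter
        b"",
    ])
    return Request(
        url=base_url.rstrip("/") + "/v1/point",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Passing the returned object to `urllib.request.urlopen` sends it. The response format is documented in the API Reference rather than assumed here.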

API Endpoints

Endpoint	Method	Description
`/`	GET	Health check with statistics
`/v1/query`	POST	Visual question answering
`/v1/detect`	POST	Object detection
`/v1/point`	POST	Point detection for UI elements

Configuration Options

Environment Variables

# MoonDream Service (moondream/.env)
MOONDREAM_WORKERS=1          # Number of model instances

# FYI Backend (server/.env)
MOONDREAM_URL=http://localhost:20200    # API endpoint
MOONDREAM_MAX_VARIATIONS=2              # Detection variations
MOONDREAM_API_KEY=                     # Optional API key
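A client that reads these variables might validate them once at startup. The sketch below is written in Python for illustration only, since the FYI backend's own loader is not shown here. `MoonDreamConfig` and `load_config` are hypothetical names, and the fallback defaults simply mirror the sample values above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MoonDreamConfig:
    url: str
    max_variations: int
    api_key: Optional[str]


def load_config(env: Mapping[str, str] = os.environ) -> MoonDreamConfig:
    """Read the MOONDREAM_* variables, falling back to the sample values."""
    raw = env.get("MOONDREAM_MAX_VARIATIONS", "2")
    try:
        max_variations = int(raw)
    except ValueError:
        raise ValueError(
            f"MOONDREAM_MAX_VARIATIONS must be an integer, got {raw!r}"
        ) from None
    if max_variations < 1:
        raise ValueError("MOONDREAM_MAX_VARIATIONS must be at least 1")
    return MoonDreamConfig(
        url=env.get("MOONDREAM_URL", "http://localhost:20200").rstrip("/"),
        max_variations=max_variations,
        # An empty MOONDREAM_API_KEY= line means "no key".
        api_key=env.get("MOONDREAM_API_KEY") or None,
    )
```

Failing fast on a malformed value surfaces configuration mistakes at startup instead of at the first detection request.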

Hardware Optimization

macOS (MPS)

Automatically detected
Metal Performance Shaders acceleration
Best performance on Apple Silicon

NVIDIA GPU (CUDA)

Automatic CUDA detection
Parallel processing on GPU cores
Maximum throughput for batch processing

CPU Fallback

Works on any system
Slower but reliable
Good for development/testing
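The automatic detection described above can be approximated with PyTorch's availability checks. This is a sketch under assumptions rather than the service's actual logic. The preference order (CUDA, then MPS, then CPU) is assumed, and `pick_device` and `detect_device` are hypothetical helper names.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Choose a device name; assumed preference is CUDA, then MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch for accelerators, falling back to CPU if it is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps is absent on very old PyTorch builds.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_available = mps_backend is not None and mps_backend.is_available()
    return pick_device(torch.cuda.is_available(), bool(mps_available))
```

Keeping the decision in a pure function makes the preference order easy to test without any GPU present.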

Best Practices

Performance Optimization

Load Balancing: Use multiple workers for concurrent requests
Caching: Enable coordinate caching for repeated elements
Image Size: Resize large screenshots before processing
Batch Processing: Group similar detection requests
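Resizing a screenshot before detection means the returned coordinates refer to the smaller image, so they have to be scaled back before tapping. The sketch below shows that arithmetic under the assumption that the service returns pixel coordinates in the image it received. `fit_within`, `to_original`, and the 1200-pixel limit used in the example are hypothetical.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_side: int) -> Tuple[int, int, float]:
    """Return (new_width, new_height, scale) so the longest side fits max_side."""
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height, 1.0  # already small enough, never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale)), scale


def to_original(x: float, y: float, scale: float) -> Tuple[int, int]:
    """Map a point found in the resized image back to device coordinates."""
    return round(x / scale), round(y / scale)
```

For a 1080x2400 screenshot limited to 1200 pixels, the scale is 0.5, so a point detected at (270, 600) in the resized image corresponds to (540, 1200) on the device.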

Accuracy Improvement

Clear Screenshots: Ensure good image quality
Descriptive Labels: Use specific element descriptions
Context Awareness: Include surrounding elements in queries
Confidence Thresholds: Adjust based on use case requirements
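A confidence threshold can be applied as a small gate between detection and action. The sketch assumes a result shaped like the example under Mobile App Testing (`found`, `coordinates`, `confidence`). The `accept` helper and the threshold values in the note are illustrative, not recommended settings.

```python
from typing import Any, Mapping, Optional, Tuple


def accept(result: Mapping[str, Any], threshold: float) -> Optional[Tuple[int, int]]:
    """Return (x, y) only when the detection is present and confident enough."""
    if not result.get("found"):
        return None
    # A missing confidence is treated as 0.0, so it never passes the gate.
    if float(result.get("confidence", 0.0)) < threshold:
        return None
    coords = result["coordinates"]
    return int(coords["x"]), int(coords["y"])
```

A stricter threshold (say 0.9) may suit irreversible actions such as confirming a purchase, while a looser one (say 0.6) may be acceptable for read-only checks.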

Error Handling

Fallback Strategies: Implement coordinate fallbacks
Retry Logic: Handle temporary service unavailability
Validation: Always validate detection results
Monitoring: Track detection success rates
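Retry logic for temporary unavailability is commonly written as a bounded loop with exponential backoff. The sketch below is one such loop, with `with_retries` as a hypothetical helper. Which exceptions count as transient depends on the HTTP client in use, so the `ConnectionError` and `TimeoutError` pair here is an assumption.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying transient failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so tests can record delays instead of waiting. Once retries are exhausted, the caller can fall back to cached or fixed coordinates.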

Troubleshooting

Common Issues

Service Won't Start

# Check Python version
python --version  # Should be 3.13+

# Verify dependencies
pip list | grep torch

# Check GPU availability (CUDA first, then Apple MPS)
python -c "import torch; print(torch.cuda.is_available())"
python -c "import torch; print(torch.backends.mps.is_available())"

Low Accuracy

Use more descriptive element names
Ensure screenshots are clear and well-lit
Try different variations of the same query
Adjust confidence thresholds
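Trying several phrasings of the same query can be automated. The sketch below tries a bounded number of labels and keeps the most confident hit. The document does not say how the backend interprets `MOONDREAM_MAX_VARIATIONS`, so treating it as a cap on the number of phrasings is an assumption. `detect_with_variations` and the injected `detect` callable are hypothetical.

```python
from typing import Any, Callable, Mapping, Optional, Sequence

Result = Mapping[str, Any]


def detect_with_variations(
    labels: Sequence[str],
    detect: Callable[[str], Result],
    max_variations: int = 2,
    threshold: float = 0.8,
) -> Optional[Result]:
    """Try up to max_variations labels; return the most confident detection."""
    if max_variations < 1:
        raise ValueError("max_variations must be at least 1")
    best: Optional[Result] = None
    for label in labels[:max_variations]:
        result = detect(label)
        if not result.get("found"):
            continue
        if best is None or result["confidence"] > best["confidence"]:
            best = result
        if result["confidence"] >= threshold:
            break  # confident enough, skip the remaining phrasings
    return best
```

Ordering the labels from most to least specific keeps the common case to a single request.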

Slow Performance

Reduce number of concurrent requests
Use smaller images when possible
Enable GPU acceleration
Add more worker instances

Memory Issues

Reduce MOONDREAM_WORKERS
Use CPU mode if GPU memory is limited
Monitor memory usage with system tools
Restart service periodically

Next Steps

Setup Guide: Complete installation and configuration
API Reference: Detailed endpoint documentation
Integration: How MoonDream works with FYI Automation Tool
Optimization: Performance tuning and best practices
Troubleshooting: Common issues and solutions
