Skip to main content

MoonDream Integration

How MoonDream Vision integrates with the FYI Automation Tool for intelligent mobile app testing and automation.

Architecture Overview

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│ User Command │────│ FYI Backend │────│ MoonDream API │
│ "tap login" │ │ │ │ │
└─────────────────┘ └──────────────────┘ └─────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Automation Core │ │ Vision Service │ │ Point Detection│
│ │ │ │ │ │
└─────────────────┘ └──────────────────┘ └─────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Coordinate │ │ ADB Commands │ │ Touch Events │
│ Cache │────│ │────│ │
└─────────────────┘ └──────────────────┘ └─────────────────┘

Integration Points

1. Vision Service (visionService.ts)

The vision service is the main integration point that handles all MoonDream API communication.

Key Functions

detectObject

// Primary function for UI element detection
const result = await detectObject(screenshotPath, "login button");

// Returns coordinates and confidence
{
found: true,
coordinates: { x: 540, y: 1200 },
confidence: 0.95,
screenshot: "/path/to/screenshot.png"
}

validateVisionService

// Health check for MoonDream connectivity
const health = await validateVisionService();

// Returns service status
{
available: true,
endpoint: "http://localhost:20200",
latency: 150
}

2. Automation Routes

MoonDream is integrated into several API endpoints:

AI Automation (/api/automation/*)

// Natural language command processing
POST /api/automation/execute
// "tap the login button" → MoonDream detection → ADB tap

POST /api/automation/detect
// Direct UI element detection

POST /api/automation/screenshot-detect
// Capture + detect in one request

Coordinate Cache (/api/coordinate-cache/*)

// Cache management for detected coordinates
POST /api/coordinate-cache/store
// Store MoonDream results for performance

GET /api/coordinate-cache/check
// Retrieve cached coordinates

Workflow Examples

Basic UI Interaction

  1. User Command: "tap the login button"
  2. Screenshot Capture: Take device screenshot via ADB
  3. MoonDream Detection: Find "login button" coordinates
  4. Coordinate Caching: Store result for future use
  5. ADB Execution: Send touch command to device
  6. Result Validation: Confirm action success

Code Flow

// 1. Capture screenshot
const screenshot = await takeScreenshot(deviceId);

// 2. Detect element with MoonDream
const detection = await detectObject(screenshot, "login button");

// 3. Cache coordinates for performance
if (detection.found) {
await storeCoordinateCache(flowId, deviceId, "login button", detection);
}

// 4. Execute ADB command
await adbTap(deviceId, detection.coordinates.x, detection.coordinates.y);

Configuration Integration

Environment Variables

# server/.env
MOONDREAM_URL=http://localhost:20200 # MoonDream API endpoint
MOONDREAM_MAX_VARIATIONS=2 # Detection attempts
MOONDREAM_API_KEY= # Optional authentication

# moondream/.env
MOONDREAM_WORKERS=1 # Model instances

Service Dependencies

// services registered in dependency injection
const visionService = new VisionService();
const automationService = new AutomationService(visionService);
const coordinateCache = new CoordinateCacheService();

Error Handling

MoonDream Service Unavailable

try {
const result = await detectObject(screenshot, element);
} catch (error) {
if (error.code === 'MOONDREAM_UNAVAILABLE') {
// Fallback to cached coordinates
const cached = await getCachedCoordinates(flowId, deviceId, element);
if (cached) {
return executeWithCachedCoordinates(cached);
}
// Or skip vision-dependent steps
return { skipped: true, reason: 'vision_unavailable' };
}
}

Low Confidence Detections

const detection = await detectObject(screenshot, element);

if (detection.confidence < 0.7) {
// Try alternative phrasings
const alternatives = [
`${element} icon`,
`the ${element}`,
element.replace('button', 'btn')
];

for (const alt of alternatives) {
const altDetection = await detectObject(screenshot, alt);
if (altDetection.confidence > detection.confidence) {
detection = altDetection;
}
}
}

Network Timeouts

const result = await Promise.race([
detectObject(screenshot, element),
new Promise((_, reject) =>
setTimeout(() => reject(new Error('timeout')), 30000)
)
]);

Performance Optimization

Coordinate Caching Strategy

// Check cache first
let coordinates = await getCachedCoordinates(flowId, deviceId, element);

if (!coordinates) {
// Fallback to MoonDream detection
const detection = await detectObject(screenshot, element);

if (detection.found) {
// Cache for future use
await storeCoordinateCache(flowId, deviceId, element, detection);
coordinates = detection.coordinates;
}
}

Batch Processing

// Detect multiple elements in one request
const elements = ["login button", "username field", "password field"];
const detections = await Promise.all(
elements.map(element => detectObject(screenshot, element))
);

// Process results
const foundElements = detections.filter(d => d.found);
const missingElements = detections.filter(d => !d.found);

Connection Pooling

// Reuse HTTP connections for better performance
const agent = new https.Agent({
keepAlive: true,
maxSockets: 10,
maxFreeSockets: 5
});

const response = await fetch(MOONDREAM_URL, {
agent,
// ... other options
});

Advanced Features

Multi-Modal Detection

// Combine visual and textual detection
const visualResult = await detectObject(screenshot, "login button");
const textResult = await ocrDetect(screenshot, "Login");

if (visualResult.found && textResult.found) {
// Use more precise visual coordinates
coordinates = visualResult.coordinates;
} else if (textResult.found) {
// Fallback to OCR coordinates
coordinates = textResult.coordinates;
}

Learning and Adaptation

// Track detection success/failure patterns
await trackDetectionMetrics(element, detection.confidence, detection.found);

// Use historical data to improve future detections
const successRate = await getElementSuccessRate(element);
if (successRate < 0.5) {
// Try alternative detection methods
const altResult = await alternativeDetection(screenshot, element);
}

Testing Integration

Unit Tests

describe('Vision Service Integration', () => {
test('detects login button correctly', async () => {
const mockScreenshot = '/path/to/test/screenshot.png';
const result = await detectObject(mockScreenshot, 'login button');

expect(result.found).toBe(true);
expect(result.coordinates.x).toBeGreaterThan(0);
expect(result.coordinates.y).toBeGreaterThan(0);
expect(result.confidence).toBeGreaterThan(0.8);
});

test('handles MoonDream unavailability', async () => {
// Mock MoonDream service down
mockMoonDreamUnavailable();

const result = await detectObject(screenshot, 'button');

expect(result.found).toBe(false);
expect(result.error).toContain('unavailable');
});
});

Integration Tests

describe('End-to-End Automation', () => {
test('complete login flow with vision', async () => {
// Setup device and app
const deviceId = await setupTestDevice();

// Execute natural language command
const result = await executeCommand(
deviceId,
'tap the login button, type test@example.com in username, type password123 in password, tap login'
);

expect(result.success).toBe(true);
expect(result.stepsExecuted).toBeGreaterThan(3);

// Verify login success
const loginCheck = await checkScreen(deviceId, 'Welcome back');
expect(loginCheck.found).toBe(true);
});
});

Monitoring and Observability

Vision Service Metrics

// Track MoonDream performance
const metrics = {
requestsTotal: 0,
requestsSuccessful: 0,
averageResponseTime: 0,
cacheHitRate: 0,
errorsByType: {}
};

// Prometheus metrics
const visionRequests = new Counter({
name: 'vision_requests_total',
help: 'Total vision service requests',
labelNames: ['method', 'status']
});

const visionDuration = new Histogram({
name: 'vision_request_duration_seconds',
help: 'Vision request duration',
buckets: [0.1, 0.5, 1, 2, 5, 10]
});

Logging

// Structured logging for vision operations
logger.info('Vision detection completed', {
element: targetElement,
confidence: result.confidence,
coordinates: result.coordinates,
processingTime: Date.now() - startTime,
cacheUsed: cachedResult !== null,
workerId: moonDreamWorkerId
});

Alerting

// Alert on vision service degradation
if (visionService.health.latency > 5000) {
alert('MoonDream service response time degraded');
}

if (visionService.health.errorRate > 0.1) {
alert('High error rate in vision service');
}

Best Practices

Development

  • Mock Services: Use mocked MoonDream responses for unit testing
  • Fallback Logic: Always implement fallbacks for vision failures
  • Confidence Thresholds: Tune confidence levels per element type
  • Caching Strategy: Cache successful detections, invalidate on failures

Production

  • Health Monitoring: Regularly check MoonDream service health
  • Load Balancing: Distribute requests across multiple MoonDream workers
  • Rate Limiting: Implement rate limits to prevent service overload
  • Error Recovery: Automatic retry with exponential backoff

Performance

  • Image Optimization: Compress and resize images before sending
  • Batch Requests: Group multiple detections when possible
  • Connection Reuse: Use HTTP keep-alive for better performance
  • Async Processing: Don't block on vision operations when possible

Troubleshooting Integration Issues

Common Problems

MoonDream Connection Failed

// Check service health
const health = await fetch('http://localhost:20200/');
if (!health.ok) {
console.error('MoonDream service is down');
// Start MoonDream service or use fallback
}

Coordinate Detection Inaccurate

// Debug detection results
const debugResult = await detectObject(screenshot, element, { debug: true });
console.log('Detection details:', debugResult);

// Try alternative element descriptions
const alternatives = generateAlternatives(element);
for (const alt of alternatives) {
const result = await detectObject(screenshot, alt);
if (result.confidence > bestConfidence) {
bestResult = result;
}
}

Cache Invalidation Issues

// Clear problematic cache entries
await invalidateCoordinateCache(flowId, deviceId, element);

// Force fresh detection
const freshResult = await detectObject(screenshot, element, { skipCache: true });

Performance Degradation

// Monitor MoonDream performance
const metrics = await getVisionMetrics();
if (metrics.averageResponseTime > 2000) {
// Scale up MoonDream workers
// Or implement request queuing
}

Future Enhancements

Planned Features

  • Real-time Streaming: WebSocket-based real-time detection
  • Model Fine-tuning: Custom models for specific apps
  • Multi-language Support: UI detection in different languages
  • Gesture Recognition: Detect complex touch gestures
  • Layout Analysis: Understand app screen layouts
  • Accessibility Support: Work with screen readers and accessibility features