
Scalability

Computer Agents is designed to scale seamlessly from individual developers to enterprise workloads. This document explains our scaling architecture and performance characteristics.

Auto-Scaling Architecture

Horizontal Scaling

Our compute layer automatically scales based on demand:

```
Load Increases                  Load Decreases
      │                               │
      ▼                               ▼
┌─────────────┐                ┌─────────────┐
│  CPU > 70%  │                │  CPU < 30%  │
│  for 60s    │                │  for 120s   │
└──────┬──────┘                └──────┬──────┘
       │                              │
       ▼                              ▼
┌─────────────┐                ┌─────────────┐
│     Add     │                │   Remove    │
│  Instance   │                │  Instance   │
└─────────────┘                └─────────────┘
```

Scaling Parameters

| Parameter | Value | Description |
| --- | --- | --- |
| Min instances | 1 | Always-on baseline |
| Max instances | Dynamic | Scales with demand |
| Scale-up trigger | 70% CPU | For 60 seconds |
| Scale-down trigger | 30% CPU | For 120 seconds |
| Cooldown | 60s up, 120s down | Prevents thrashing |
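The trigger rules above can be sketched as a small decision function. This is an illustrative model only, not the platform's actual autoscaler: the thresholds and sustain windows come from the table, while the function and type names are invented for the example, and the cooldown periods are not modeled.

```typescript
// Illustrative sketch of the documented scale-up/scale-down rules.
type ScalingDecision = 'scale-up' | 'scale-down' | 'hold';

interface CpuSample {
  timestampMs: number; // when the sample was taken
  cpuPercent: number;  // CPU utilization, 0-100
}

const SCALE_UP_CPU = 70;              // CPU > 70%...
const SCALE_UP_WINDOW_MS = 60_000;    // ...sustained for 60 seconds
const SCALE_DOWN_CPU = 30;            // CPU < 30%...
const SCALE_DOWN_WINDOW_MS = 120_000; // ...sustained for 120 seconds

function decide(samples: CpuSample[], nowMs: number): ScalingDecision {
  const inWindow = (windowMs: number) =>
    samples.filter((s) => nowMs - s.timestampMs <= windowMs);

  // Scale up only if every sample in the last 60s is above the threshold.
  const up = inWindow(SCALE_UP_WINDOW_MS);
  if (up.length > 0 && up.every((s) => s.cpuPercent > SCALE_UP_CPU)) {
    return 'scale-up';
  }

  // Scale down only if every sample in the last 120s is below the threshold.
  const down = inWindow(SCALE_DOWN_WINDOW_MS);
  if (down.length > 0 && down.every((s) => s.cpuPercent < SCALE_DOWN_CPU)) {
    return 'scale-down';
  }

  return 'hold';
}
```

Requiring the condition to hold for the full window (rather than reacting to a single sample) is what prevents a brief CPU spike from triggering a scaling event.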

Instance Startup Time

| Phase | Duration |
| --- | --- |
| VM provisioning | ~30 seconds |
| OS boot | ~15 seconds |
| Service startup | ~15 seconds |
| Health check pass | ~20 seconds |
| **Total** | **~80 seconds** |

Traffic is only routed to new instances after they pass health checks, ensuring no requests hit unready servers.

Performance Characteristics

Latency Profile

| Operation | P50 | P95 | P99 |
| --- | --- | --- | --- |
| API request (non-execution) | 50ms | 150ms | 300ms |
| Execution start (warm) | 100ms | 300ms | 500ms |
| Execution start (cold) | 3s | 5s | 8s |
| Database query | 5ms | 20ms | 50ms |

Throughput

| Metric | Value |
| --- | --- |
| Requests per instance | ~500 RPS |
| Concurrent executions per instance | 10-20 |
| Total platform capacity | Scales with instances |

Container Pool Optimization

Warm Container Benefits

Warm containers dramatically reduce execution latency:

```
Request Flow - Cold Start:

[Request] → [Start Container] → [Mount Storage] → [Execute] → [Response]
               ~3-5 seconds        ~1 second       varies

Request Flow - Warm Container:

[Request] → [Execute] → [Response]
              ~100ms
```

Pool Configuration

| Setting | Value | Rationale |
| --- | --- | --- |
| Idle timeout | 15 minutes | Balance cost vs latency |
| Max containers/instance | 10 | Memory constraints |
| Max lifetime | 24 hours | Prevent stale state |
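To make the pool settings concrete, here is a minimal sketch of how a warm-container pool could apply them. The eviction constants mirror the table above; the `WarmPool` class, its methods, and the `Container` shape are hypothetical and not part of any real API.

```typescript
// Hypothetical warm-container pool applying the documented settings.
interface Container {
  environmentId: string;
  createdAtMs: number;  // for the 24-hour max-lifetime rule
  lastUsedAtMs: number; // for the 15-minute idle-timeout rule
}

const IDLE_TIMEOUT_MS = 15 * 60 * 1000;      // 15 minutes
const MAX_LIFETIME_MS = 24 * 60 * 60 * 1000; // 24 hours
const MAX_CONTAINERS = 10;                   // per instance

class WarmPool {
  private containers: Container[] = [];

  // Return a warm container for the environment, or undefined (cold start).
  acquire(environmentId: string, nowMs: number): Container | undefined {
    this.evict(nowMs);
    const c = this.containers.find((x) => x.environmentId === environmentId);
    if (c) c.lastUsedAtMs = nowMs;
    return c;
  }

  // Register a freshly started container, dropping the least-recently-used
  // entry if the pool is at capacity.
  add(environmentId: string, nowMs: number): void {
    this.evict(nowMs);
    if (this.containers.length >= MAX_CONTAINERS) {
      this.containers.sort((a, b) => a.lastUsedAtMs - b.lastUsedAtMs);
      this.containers.shift(); // evict LRU
    }
    this.containers.push({ environmentId, createdAtMs: nowMs, lastUsedAtMs: nowMs });
  }

  // Remove containers that are idle too long or too old.
  private evict(nowMs: number): void {
    this.containers = this.containers.filter(
      (c) =>
        nowMs - c.lastUsedAtMs < IDLE_TIMEOUT_MS &&
        nowMs - c.createdAtMs < MAX_LIFETIME_MS
    );
  }
}
```

A request that finds a pooled container skips the cold-start path entirely, which is where the ~100ms warm latency in the diagram above comes from.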

Cache Hit Rate

Typical warm container hit rates:

| Usage Pattern | Hit Rate |
| --- | --- |
| Repeated tasks (same env) | 90%+ |
| Varied environments | 50-70% |
| First request of day | 0% |

Database Scaling

Connection Management

| Configuration | Value |
| --- | --- |
| Max connections per instance | Optimized per tier |
| Connection timeout | 30 seconds |
| Idle timeout | 10 minutes |
| Auto-reconnect | Enabled |
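The connection timeout and auto-reconnect behaviors in the table can be expressed generically as a timeout-bounded connect attempt wrapped in a bounded retry loop. This is an illustrative sketch, not the platform's actual driver code: `connect` stands in for any function that opens a connection, and the helper names are invented.

```typescript
// Bound a single connect attempt by a timeout (documented default: 30s).
async function connectWithTimeout<T>(
  connect: () => Promise<T>,
  timeoutMs: number
): Promise<T> {
  let timer: ReturnType<typeof setTimeout>;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error('connection timeout')), timeoutMs);
  });
  try {
    return await Promise.race([connect(), timeout]);
  } finally {
    clearTimeout(timer!);
  }
}

// Auto-reconnect: retry failed attempts up to a fixed number of times.
async function connectWithRetry<T>(
  connect: () => Promise<T>,
  timeoutMs: number,
  maxAttempts = 3
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await connectWithTimeout(connect, timeoutMs);
    } catch (err) {
      lastError = err; // try again on the next iteration
    }
  }
  throw lastError;
}
```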

Query Performance

We optimize database performance through:

  • Indexes on frequently queried columns
  • Connection pooling to reduce overhead
  • Query optimization for common patterns
  • Read replicas available for scaling reads

Database Scaling Options

| Tier | Max Connections | Use Case |
| --- | --- | --- |
| db-f1-micro | 25 | Development |
| db-g1-small | 100 | Small production |
| db-custom | 500+ | High-volume production |

Storage Scaling

Cloud Storage Performance

| Operation | Latency |
| --- | --- |
| Object read | ~50ms |
| Object write | ~100ms |
| Directory listing | ~100-200ms |

Scaling Characteristics

  • Unlimited capacity - Pay for what you use
  • High throughput - Parallel operations supported
  • Global distribution - Multi-region replication available

gcsfuse Performance

| Scenario | Performance |
| --- | --- |
| Sequential reads | Near-native speed |
| Random reads | Good, slight overhead |
| Writes | Good, eventual consistency |
| Large files | Streaming supported |

Rate Limits

Current Limits

| Limit | Value | Scope |
| --- | --- | --- |
| Global requests | 1,000 / 15 min | Per IP |
| Executions | 30 / 15 min | Per API key |
| File uploads | 100 / 15 min | Per API key |
| Max file size | 100 MB | Per file |
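Because these limits are fixed windows (per 15 minutes), a batch client can pace itself before the server ever returns 429. Below is a minimal client-side sketch of that idea; the `FixedWindowLimiter` class is illustrative and not part of the SDK, and the server may count windows differently than this local approximation.

```typescript
// Client-side pacing for a fixed-window limit such as
// "30 executions per 15 minutes per API key".
class FixedWindowLimiter {
  private windowStartMs = 0;
  private count = 0;

  constructor(
    private readonly limit: number,   // e.g. 30 executions
    private readonly windowMs: number // e.g. 15 * 60 * 1000
  ) {}

  // Returns true if a request may be sent now; false means wait for
  // the current window to expire.
  tryAcquire(nowMs: number): boolean {
    if (nowMs - this.windowStartMs >= this.windowMs) {
      this.windowStartMs = nowMs; // start a fresh window
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count++;
    return true;
  }
}
```

Pacing locally avoids burning retry budget, but the 429 handling shown below is still needed, since other clients sharing the same API key count against the same server-side window.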

Rate Limit Responses

When limits are exceeded:

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 60

{
  "error": "Rate limit exceeded",
  "message": "Too many requests. Retry after 60 seconds.",
  "retryAfter": 60
}
```

Handling Rate Limits

```typescript
import { ComputerAgentsClient, ApiClientError } from 'computer-agents';

const client = new ComputerAgentsClient(/* your configuration */);

async function executeWithRetry(task: string, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await client.run(task, { environmentId: 'env_xxx' });
    } catch (error) {
      if (error instanceof ApiClientError && error.status === 429) {
        // Honor the server's Retry-After hint before the next attempt
        const retryAfter = error.retryAfter || 60;
        await new Promise((r) => setTimeout(r, retryAfter * 1000));
        continue;
      }
      throw error;
    }
  }
  throw new Error('Max retries exceeded');
}
```

High Availability

Redundancy

| Component | Redundancy Level |
| --- | --- |
| Load Balancer | Global, multi-region |
| Compute | Auto-scaling, self-healing |
| Database | High availability replicas |
| Storage | Multi-region replication |

Failure Scenarios

| Failure | Impact | Recovery |
| --- | --- | --- |
| Single instance | None (traffic rerouted) | ~2 min auto-heal |
| Multiple instances | Reduced capacity | Auto-scale up |
| Database primary | Brief interruption | ~60s HA failover |
| Zone outage | Potential interruption | Manual intervention |

Availability Target

| Metric | Target |
| --- | --- |
| Monthly uptime | 99.9% |
| Allowed downtime | ~43 minutes/month |
| Planned maintenance | Announced 7 days ahead |

Capacity Planning

Per-User Limits

| Resource | Default Limit |
| --- | --- |
| Environments | 50 |
| Threads | 1,000 |
| Agents | 100 |
| Schedules | 50 |
| Storage | 10 GB |

Enterprise Limits

Contact us for increased limits:

  • Higher rate limits
  • Dedicated capacity
  • Custom retention policies
  • SLA guarantees

Performance Optimization Tips

1. Reuse Environments

```typescript
// Good: Reuse environment
const env = await client.environments.getDefault();
await client.run(task1, { environmentId: env.id });
await client.run(task2, { environmentId: env.id }); // Warm container!

// Less optimal: New environment each time
await client.run(task1, {
  environmentId: (await client.environments.create({})).id,
});
```
2. Use Threads for Related Tasks

```typescript
// Good: Related tasks in same thread
const r1 = await client.run('Create project structure', { environmentId });
const r2 = await client.run('Add tests', { threadId: r1.threadId }); // Context preserved!

// Less optimal: Separate threads
const r1 = await client.run('Create project structure', { environmentId });
const r2 = await client.run('Add tests to the project structure I just created', {
  environmentId,
});
```

3. Handle Errors Gracefully

```typescript
try {
  await client.run(task, options);
} catch (error) {
  if (error.status === 429) {
    // Rate limited - back off and retry
  } else if (error.status === 402) {
    // Budget exhausted - notify user
  } else if (error.status >= 500) {
    // Server error - retry with exponential backoff
  }
}
```
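The 5xx branch above mentions exponential backoff. Here is a hedged sketch of backoff with full jitter; `run` stands in for any retryable call, and the helper name and defaults are illustrative, not SDK APIs.

```typescript
// Retry a call whose failures carry an HTTP-style `status`, backing off
// exponentially with full jitter between attempts.
async function retryWithBackoff<T>(
  run: () => Promise<T>,
  maxRetries = 5,
  baseDelayMs = 500
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await run();
    } catch (error: any) {
      const retryable = typeof error?.status === 'number' && error.status >= 500;
      if (!retryable || attempt >= maxRetries) throw error;
      // Full jitter: sleep a random duration up to baseDelay * 2^attempt,
      // so simultaneous clients do not retry in lockstep.
      const cap = baseDelayMs * 2 ** attempt;
      const delay = Math.random() * cap;
      await new Promise((r) => setTimeout(r, delay));
    }
  }
}
```

Jitter matters here because after an outage many clients see the same failure at the same time; randomizing the delay spreads the retry load instead of producing synchronized waves.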

4. Monitor Your Usage

```typescript
// Check budget before batch operations
const budget = await client.budget.get();
if (budget.balance < estimatedCost) {
  console.log('Insufficient budget for operation');
  return;
}

// Proceed with operations
for (const task of tasks) {
  await client.run(task, options);
}
```

Monitoring Your Usage

Available Metrics

| Endpoint | Information |
| --- | --- |
| `GET /v1/budget` | Current balance and limits |
| `GET /v1/costs/summary` | Usage by period |
| `GET /v1/billing/records` | Detailed transaction history |
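If you prefer raw HTTP over the SDK, the budget endpoint from the table can be called directly. This is a sketch under stated assumptions: the `/v1/budget` path comes from the table and the `balance` field appears in the budget example earlier, but the base URL, bearer-token auth scheme, and full response shape are assumptions, not documented here.

```typescript
// Minimal shape of a fetch-like function, so the helper can be used with
// globalThis.fetch or any compatible mock.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> }
) => Promise<{ ok: boolean; status: number; json(): Promise<any> }>;

// Fetch the current budget balance from GET /v1/budget.
async function getBudgetBalance(
  baseUrl: string,
  apiKey: string,
  fetchImpl: FetchLike
): Promise<number> {
  const res = await fetchImpl(`${baseUrl}/v1/budget`, {
    headers: { Authorization: `Bearer ${apiKey}` }, // assumed auth scheme
  });
  if (!res.ok) throw new Error(`budget request failed: ${res.status}`);
  const body = await res.json();
  return body.balance;
}
```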

Usage Dashboard

The web dashboard provides:

  • Real-time usage graphs
  • Cost breakdown by environment
  • Execution history and performance
  • Budget alerts configuration

Future Scaling Improvements

We’re continuously improving scalability:

  • Regional expansion - Additional regions for lower latency
  • Larger instances - For compute-intensive workloads
  • GPU support - For ML workloads
  • Dedicated capacity - Reserved instances for enterprise

Subscribe to our changelog for updates on scaling improvements.
