# Scalability
Computer Agents is designed to scale seamlessly from individual developers to enterprise workloads. This document explains our scaling architecture and performance characteristics.
## Auto-Scaling Architecture
### Horizontal Scaling
Our compute layer automatically scales based on demand:
```
 Load Increases                  Load Decreases
       │                               │
       ▼                               ▼
┌─────────────┐                 ┌─────────────┐
│  CPU > 70%  │                 │  CPU < 30%  │
│  for 60s    │                 │  for 120s   │
└──────┬──────┘                 └──────┬──────┘
       │                               │
       ▼                               ▼
┌─────────────┐                 ┌─────────────┐
│     Add     │                 │   Remove    │
│  Instance   │                 │  Instance   │
└─────────────┘                 └─────────────┘
```

### Scaling Parameters
| Parameter | Value | Description |
|---|---|---|
| Min instances | 1 | Always-on baseline |
| Max instances | Dynamic | Scales with demand |
| Scale-up trigger | 70% CPU | For 60 seconds |
| Scale-down trigger | 30% CPU | For 120 seconds |
| Cooldown | 60s up, 120s down | Prevents thrashing |
### Instance Startup Time
| Phase | Duration |
|---|---|
| VM provisioning | ~30 seconds |
| OS boot | ~15 seconds |
| Service startup | ~15 seconds |
| Health check pass | ~20 seconds |
| Total | ~80 seconds |
Traffic is only routed to new instances after they pass health checks, ensuring no requests hit unready servers.
## Performance Characteristics
### Latency Profile
| Operation | P50 | P95 | P99 |
|---|---|---|---|
| API request (non-execution) | 50ms | 150ms | 300ms |
| Execution start (warm) | 100ms | 300ms | 500ms |
| Execution start (cold) | 3s | 5s | 8s |
| Database query | 5ms | 20ms | 50ms |
### Throughput
| Metric | Value |
|---|---|
| Requests per instance | ~500 RPS |
| Concurrent executions per instance | 10-20 |
| Total platform capacity | Scales with instances |
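These figures vary with region and load. If you want to compare them against what you observe, you can sample a lightweight endpoint client-side and compute percentiles yourself. A minimal sketch, assuming a configured `ComputerAgentsClient` and using `client.budget.get()` as a representative non-execution call:

```typescript
import { ComputerAgentsClient } from 'computer-agents';

const client = new ComputerAgentsClient(/* your configuration */);

// Sample a non-execution endpoint and report P50/P95/P99 latencies.
async function measureApiLatency(samples = 100): Promise<void> {
  const times: number[] = [];
  for (let i = 0; i < samples; i++) {
    const start = Date.now();
    await client.budget.get(); // any lightweight, non-execution call works here
    times.push(Date.now() - start);
  }
  times.sort((a, b) => a - b);
  const pct = (p: number) => times[Math.ceil((p / 100) * times.length) - 1];
  console.log(`P50=${pct(50)}ms P95=${pct(95)}ms P99=${pct(99)}ms`);
}
```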
## Container Pool Optimization
### Warm Container Benefits
Warm containers dramatically reduce execution latency:
```
Request Flow - Cold Start:
[Request] → [Start Container] → [Mount Storage] → [Execute] → [Response]
               ~3-5 seconds        ~1 second        varies

Request Flow - Warm Container:
[Request] → [Execute] → [Response]
              ~100ms
```

### Pool Configuration
| Setting | Value | Rationale |
|---|---|---|
| Idle timeout | 15 minutes | Balance cost vs latency |
| Max containers/instance | 10 | Memory constraints |
| Max lifetime | 24 hours | Prevent stale state |
### Cache Hit Rate
Typical warm container hit rates:
| Usage Pattern | Hit Rate |
|---|---|
| Repeated tasks (same env) | 90%+ |
| Varied environments | 50-70% |
| First request of day | 0% |
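You can observe this difference directly by timing two identical runs against the same environment; a minimal sketch, reusing the configured `client` from the latency example above:

```typescript
// Time two back-to-back executions in the same environment. The first may
// cold-start a container; the second should hit the warm pool. Timings are
// end-to-end, so the gap between them reflects container startup cost.
async function compareColdVsWarm(environmentId: string): Promise<void> {
  for (const label of ['first run (possibly cold)', 'second run (warm)']) {
    const start = Date.now();
    await client.run('echo hello', { environmentId });
    console.log(`${label}: ${Date.now() - start}ms`);
  }
}
```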
## Database Scaling
### Connection Management
| Configuration | Value |
|---|---|
| Max connections per instance | Optimized per tier |
| Connection timeout | 30 seconds |
| Idle timeout | 10 minutes |
| Auto-reconnect | Enabled |
### Query Performance
We optimize database performance through:
- Indexes on frequently queried columns
- Connection pooling to reduce overhead
- Query optimization for common patterns
- Read replicas available for scaling reads
### Database Scaling Options
| Tier | Max Connections | Use Case |
|---|---|---|
| db-f1-micro | 25 | Development |
| db-g1-small | 100 | Small production |
| db-custom | 500+ | High-volume production |
## Storage Scaling
### Cloud Storage Performance
| Operation | Latency |
|---|---|
| Object read | ~50ms |
| Object write | ~100ms |
| Directory listing | ~100-200ms |
### Scaling Characteristics
- **Unlimited capacity**: pay for what you use
- **High throughput**: parallel operations supported
- **Global distribution**: multi-region replication available
### gcsfuse Performance
| Scenario | Performance |
|---|---|
| Sequential reads | Near-native speed |
| Random reads | Good, slight overhead |
| Writes | Good, eventual consistency |
| Large files | Streaming supported |
## Rate Limits
### Current Limits
| Limit | Value | Scope |
|---|---|---|
| Global requests | 1,000 / 15 min | Per IP |
| Executions | 30 / 15 min | Per API key |
| File uploads | 100 / 15 min | Per API key |
| Max file size | 100 MB | Per file |
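If a batch job risks hitting the 30-executions-per-15-minutes limit, it can pace itself proactively instead of waiting for a 429. A minimal sketch, assuming a configured `client`:

```typescript
// 30 executions per 15 minutes works out to one execution every 30 seconds.
const MIN_INTERVAL_MS = (15 * 60 * 1000) / 30;

async function runPaced(tasks: string[], environmentId: string): Promise<void> {
  for (const task of tasks) {
    const start = Date.now();
    await client.run(task, { environmentId });
    const elapsed = Date.now() - start;
    // Sleep off the remainder of the interval before the next execution.
    if (elapsed < MIN_INTERVAL_MS) {
      await new Promise(r => setTimeout(r, MIN_INTERVAL_MS - elapsed));
    }
  }
}
```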
### Rate Limit Responses
When limits are exceeded:
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 60

{
  "error": "Rate limit exceeded",
  "message": "Too many requests. Retry after 60 seconds.",
  "retryAfter": 60
}
```

### Handling Rate Limits
```typescript
import { ComputerAgentsClient, ApiClientError } from 'computer-agents';

const client = new ComputerAgentsClient(/* your configuration */);

async function executeWithRetry(task: string, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await client.run(task, { environmentId: 'env_xxx' });
    } catch (error) {
      if (error instanceof ApiClientError && error.status === 429) {
        // Honor the server's Retry-After hint, defaulting to 60 seconds
        const retryAfter = error.retryAfter || 60;
        await new Promise(r => setTimeout(r, retryAfter * 1000));
        continue;
      }
      throw error; // non-rate-limit errors propagate immediately
    }
  }
  throw new Error('Max retries exceeded');
}
```
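For example, `await executeWithRetry('Run the nightly report')` retries only on 429 responses and surfaces every other error immediately.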
## High Availability
### Redundancy
| Component | Redundancy Level |
|---|---|
| Load Balancer | Global, multi-region |
| Compute | Auto-scaling, self-healing |
| Database | High availability replicas |
| Storage | Multi-region replication |
### Failure Scenarios
| Failure | Impact | Recovery |
|---|---|---|
| Single instance | None (traffic rerouted) | ~2 min auto-heal |
| Multiple instances | Reduced capacity | Auto-scale up |
| Database primary | Brief interruption | ~60s HA failover |
| Zone outage | Potential interruption | Manual intervention |
### Availability Target
| Metric | Target |
|---|---|
| Monthly uptime | 99.9% |
| Allowed downtime | ~43 minutes/month |
| Planned maintenance | Announced 7 days ahead |
## Capacity Planning
### Per-User Limits
| Resource | Default Limit |
|---|---|
| Environments | 50 |
| Threads | 1,000 |
| Agents | 100 |
| Schedules | 50 |
| Storage | 10 GB |
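If a workflow creates environments dynamically, it is worth checking the count against the limit before creating more. A minimal sketch; `client.environments.list()` is a hypothetical method used for illustration and may differ in the actual SDK:

```typescript
const ENVIRONMENT_LIMIT = 50; // default per-user limit from the table above

async function createEnvironmentIfRoom() {
  // Hypothetical: list() may be named or paginated differently in the SDK.
  const environments = await client.environments.list();
  if (environments.length >= ENVIRONMENT_LIMIT) {
    throw new Error('Environment limit reached; delete unused environments first.');
  }
  return client.environments.create({});
}
```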
### Enterprise Limits
Contact us for increased limits:
- Higher rate limits
- Dedicated capacity
- Custom retention policies
- SLA guarantees
## Performance Optimization Tips
### 1. Reuse Environments
```typescript
// Good: Reuse environment
const env = await client.environments.getDefault();
await client.run(task1, { environmentId: env.id });
await client.run(task2, { environmentId: env.id }); // Warm container!

// Less optimal: New environment each time
await client.run(task1, { environmentId: (await client.environments.create({})).id });
```

### 2. Use Threads for Related Tasks
```typescript
// Good: Related tasks in same thread
const r1 = await client.run('Create project structure', { environmentId });
const r2 = await client.run('Add tests', { threadId: r1.threadId }); // Context preserved!

// Less optimal: Separate threads
const r3 = await client.run('Create project structure', { environmentId });
const r4 = await client.run('Add tests to the project structure I just created', { environmentId });
```

### 3. Handle Errors Gracefully
```typescript
try {
  await client.run(task, options);
} catch (error) {
  if (error instanceof ApiClientError) {
    if (error.status === 429) {
      // Rate limited - back off and retry
    } else if (error.status === 402) {
      // Budget exhausted - notify user
    } else if (error.status >= 500) {
      // Server error - retry with exponential backoff
    }
  } else {
    throw error;
  }
}
```
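For the 5xx branch, a small exponential backoff helper is enough; a minimal sketch, not part of the SDK, reusing `ApiClientError` from the retry example above:

```typescript
// Retry server-side failures with exponentially growing delays (1s, 2s, 4s, ...).
async function withBackoff<T>(fn: () => Promise<T>, maxRetries = 4): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const retriable = error instanceof ApiClientError && error.status >= 500;
      if (!retriable || attempt >= maxRetries) throw error;
      await new Promise(r => setTimeout(r, 1000 * 2 ** attempt));
    }
  }
}

// Usage: await withBackoff(() => client.run(task, options));
```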
### 4. Monitor Your Usage
```typescript
// Check budget before batch operations
const budget = await client.budget.get();
if (budget.balance < estimatedCost) {
  console.log('Insufficient budget for operation');
  return;
}

// Proceed with operations
for (const task of tasks) {
  await client.run(task, options);
}
```

## Monitoring Your Usage
### Available Metrics
| Endpoint | Information |
|---|---|
| `GET /v1/budget` | Current balance and limits |
| `GET /v1/costs/summary` | Usage by period |
| `GET /v1/billing/records` | Detailed transaction history |
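A pre-flight check against these endpoints might look like the following; `client.budget.get()` appears elsewhere in these docs, while `client.costs.summary()` is a hypothetical wrapper for `GET /v1/costs/summary` and may differ in the actual SDK:

```typescript
// Print the current balance and recent usage before starting a batch job.
async function printUsageSnapshot(): Promise<void> {
  const budget = await client.budget.get();
  console.log(`Balance: ${budget.balance}`);

  // Hypothetical SDK wrapper for GET /v1/costs/summary.
  const summary = await client.costs.summary();
  console.log('Usage by period:', summary);
}
```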
### Usage Dashboard
The web dashboard provides:
- Real-time usage graphs
- Cost breakdown by environment
- Execution history and performance
- Budget alerts configuration
## Future Scaling Improvements
We’re continuously improving scalability:
- **Regional expansion**: additional regions for lower latency
- **Larger instances**: for compute-intensive workloads
- **GPU support**: for ML workloads
- **Dedicated capacity**: reserved instances for enterprise
Subscribe to our changelog for updates on scaling improvements.