# Scalability
Computer Agents is designed to scale seamlessly from individual developers to enterprise workloads. This document explains our scaling architecture and performance characteristics.
## Auto-Scaling Architecture
### Horizontal Scaling
Our compute layer automatically scales based on demand:
```
 Load Increases                  Load Decreases
       │                               │
       ▼                               ▼
┌─────────────┐                 ┌─────────────┐
│  CPU > 70%  │                 │  CPU < 30%  │
│  for 60s    │                 │  for 120s   │
└──────┬──────┘                 └──────┬──────┘
       │                               │
       ▼                               ▼
┌─────────────┐                 ┌─────────────┐
│     Add     │                 │   Remove    │
│  Instance   │                 │  Instance   │
└─────────────┘                 └─────────────┘
```

### Scaling Parameters
| Parameter | Value | Description |
|---|---|---|
| Min instances | 1 | Always-on baseline |
| Max instances | Dynamic | Scales with demand |
| Scale-up trigger | 70% CPU | For 60 seconds |
| Scale-down trigger | 30% CPU | For 120 seconds |
| Cooldown | 60s up, 120s down | Prevents thrashing |
### Instance Startup Time
| Phase | Duration |
|---|---|
| VM provisioning | ~30 seconds |
| OS boot | ~15 seconds |
| Service startup | ~15 seconds |
| Health check pass | ~20 seconds |
| Total | ~80 seconds |
Traffic is only routed to new instances after they pass health checks, ensuring no requests hit unready servers.
## Performance Characteristics
### Latency Profile
| Operation | P50 | P95 | P99 |
|---|---|---|---|
| API request (non-execution) | 50ms | 150ms | 300ms |
| Execution start (warm) | 100ms | 300ms | 500ms |
| Execution start (cold) | 3s | 5s | 8s |
| Database query | 5ms | 20ms | 50ms |
### Throughput
| Metric | Value |
|---|---|
| Requests per instance | ~500 RPS |
| Concurrent executions per instance | 10-20 |
| Total platform capacity | Scales with instances |
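These figures vary with region and load. If you want to compare them against what you observe, you can sample a lightweight endpoint client-side and compute percentiles yourself. A minimal sketch, assuming a configured `ComputerAgentsClient` and using `client.budget.get()` as a representative non-execution call:

```typescript
import { ComputerAgentsClient } from 'computer-agents';

const client = new ComputerAgentsClient(/* your configuration */);

// Sample a non-execution endpoint and report P50/P95/P99 latencies.
async function measureApiLatency(samples = 100): Promise<void> {
  const times: number[] = [];
  for (let i = 0; i < samples; i++) {
    const start = Date.now();
    await client.budget.get(); // any lightweight, non-execution call works here
    times.push(Date.now() - start);
  }
  times.sort((a, b) => a - b);
  const pct = (p: number) => times[Math.ceil((p / 100) * times.length) - 1];
  console.log(`P50=${pct(50)}ms P95=${pct(95)}ms P99=${pct(99)}ms`);
}
```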
## Container Pool Optimization
### Warm Container Benefits
Warm containers dramatically reduce execution latency:
```
Request Flow - Cold Start:
[Request] → [Start Container] → [Mount Storage] → [Execute] → [Response]
               ~3-5 seconds        ~1 second        varies

Request Flow - Warm Container:
[Request] → [Execute] → [Response]
              ~100ms
```

### Pool Configuration
| Setting | Value | Rationale |
|---|---|---|
| Idle timeout | 15 minutes | Balance cost vs latency |
| Max containers/instance | 10 | Memory constraints |
| Max lifetime | 24 hours | Prevent stale state |
### Cache Hit Rate
Typical warm container hit rates:
| Usage Pattern | Hit Rate |
|---|---|
| Repeated tasks (same env) | 90%+ |
| Varied environments | 50-70% |
| First request of day | 0% |
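You can observe this difference directly by timing two identical runs against the same environment; a minimal sketch, reusing the configured `client` from the latency example above:

```typescript
// Time two back-to-back executions in the same environment. The first may
// cold-start a container; the second should hit the warm pool. Timings are
// end-to-end, so the gap between them reflects container startup cost.
async function compareColdVsWarm(environmentId: string): Promise<void> {
  for (const label of ['first run (possibly cold)', 'second run (warm)']) {
    const start = Date.now();
    await client.run('echo hello', { environmentId });
    console.log(`${label}: ${Date.now() - start}ms`);
  }
}
```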
## Database Scaling
### Connection Management
| Configuration | Value |
|---|---|
| Max connections per instance | Optimized per tier |
| Connection timeout | 30 seconds |
| Idle timeout | 10 minutes |
| Auto-reconnect | Enabled |
### Query Performance
We optimize database performance through:
- Indexes on frequently queried columns
- Connection pooling to reduce overhead
- Query optimization for common patterns
- Read replicas available for scaling reads
### Database Scaling Options
| Tier | Max Connections | Use Case |
|---|---|---|
| db-f1-micro | 25 | Development |
| db-g1-small | 100 | Small production |
| db-custom | 500+ | High-volume production |
## Storage Scaling
### Cloud Storage Performance
| Operation | Latency |
|---|---|
| Object read | ~50ms |
| Object write | ~100ms |
| Directory listing | ~100-200ms |
### Scaling Characteristics
- **Unlimited capacity**: pay for what you use
- **High throughput**: parallel operations supported
- **Global distribution**: multi-region replication available
### gcsfuse Performance
| Scenario | Performance |
|---|---|
| Sequential reads | Near-native speed |
| Random reads | Good, slight overhead |
| Writes | Good, eventual consistency |
| Large files | Streaming supported |
## Rate Limits
### Current Limits
| Limit | Value | Scope |
|---|---|---|
| Global requests | 1,000 / 15 min | Per IP |
| Executions | 30 / 15 min | Per API key |
| File uploads | 100 / 15 min | Per API key |
| Max file size | 100 MB | Per file |
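If a batch job risks hitting the 30-executions-per-15-minutes limit, it can pace itself proactively instead of waiting for a 429. A minimal sketch, assuming a configured `client`:

```typescript
// 30 executions per 15 minutes works out to one execution every 30 seconds.
const MIN_INTERVAL_MS = (15 * 60 * 1000) / 30;

async function runPaced(tasks: string[], environmentId: string): Promise<void> {
  for (const task of tasks) {
    const start = Date.now();
    await client.run(task, { environmentId });
    const elapsed = Date.now() - start;
    // Sleep off the remainder of the interval before the next execution.
    if (elapsed < MIN_INTERVAL_MS) {
      await new Promise(r => setTimeout(r, MIN_INTERVAL_MS - elapsed));
    }
  }
}
```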
### Rate Limit Responses
When limits are exceeded:
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 60

{
  "error": "Rate limit exceeded",
  "message": "Too many requests. Retry after 60 seconds.",
  "retryAfter": 60
}
```

### Handling Rate Limits
```typescript
import { ComputerAgentsClient, ApiClientError } from 'computer-agents';

const client = new ComputerAgentsClient(/* your configuration */);

async function executeWithRetry(task: string, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await client.run(task, { environmentId: 'env_xxx' });
    } catch (error) {
      if (error instanceof ApiClientError && error.status === 429) {
        // Honor the server's Retry-After hint, defaulting to 60 seconds
        const retryAfter = error.retryAfter || 60;
        await new Promise(r => setTimeout(r, retryAfter * 1000));
        continue;
      }
      throw error; // non-rate-limit errors propagate immediately
    }
  }
  throw new Error('Max retries exceeded');
}
```
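For example, `await executeWithRetry('Run the nightly report')` retries only on 429 responses and surfaces every other error immediately.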
## High Availability
### Redundancy
| Component | Redundancy Level |
|---|---|
| Load Balancer | Global, multi-region |
| Compute | Auto-scaling, self-healing |
| Database | High availability replicas |
| Storage | Multi-region replication |
### Failure Scenarios
| Failure | Impact | Recovery |
|---|---|---|
| Single instance | None (traffic rerouted) | ~2 min auto-heal |
| Multiple instances | Reduced capacity | Auto-scale up |
| Database primary | Brief interruption | ~60s HA failover |
| Zone outage | Potential interruption | Manual intervention |
### Availability Target
| Metric | Target |
|---|---|
| Monthly uptime | 99.9% |
| Allowed downtime | ~43 minutes/month |
| Planned maintenance | Announced 7 days ahead |
## Capacity Planning
### Per-User Limits
| Resource | Default Limit |
|---|---|
| Environments | 50 |
| Threads | 1,000 |
| Agents | 100 |
| Schedules | 50 |
| Storage | 10 GB |
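If a workflow creates environments dynamically, it is worth checking the count against the limit before creating more. A minimal sketch; `client.environments.list()` is a hypothetical method used for illustration and may differ in the actual SDK:

```typescript
const ENVIRONMENT_LIMIT = 50; // default per-user limit from the table above

async function createEnvironmentIfRoom() {
  // Hypothetical: list() may be named or paginated differently in the SDK.
  const environments = await client.environments.list();
  if (environments.length >= ENVIRONMENT_LIMIT) {
    throw new Error('Environment limit reached; delete unused environments first.');
  }
  return client.environments.create({});
}
```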
### Enterprise Limits
Contact us for increased limits:
- Higher rate limits
- Dedicated capacity
- Custom retention policies
- SLA guarantees
## Performance Optimization Tips
### 1. Reuse Environments
```typescript
// Good: Reuse environment
const env = await client.environments.getDefault();
await client.run(task1, { environmentId: env.id });
await client.run(task2, { environmentId: env.id }); // Warm container!

// Less optimal: New environment each time
await client.run(task1, { environmentId: (await client.environments.create({})).id });
```

### 2. Use Threads for Related Tasks
```typescript
// Good: Related tasks in same thread
const r1 = await client.run('Create project structure', { environmentId });
const r2 = await client.run('Add tests', { threadId: r1.threadId }); // Context preserved!

// Less optimal: Separate threads
const r3 = await client.run('Create project structure', { environmentId });
const r4 = await client.run('Add tests to the project structure I just created', { environmentId });
```

### 3. Handle Errors Gracefully
```typescript
try {
  await client.run(task, options);
} catch (error) {
  if (error instanceof ApiClientError) {
    if (error.status === 429) {
      // Rate limited - back off and retry
    } else if (error.status === 402) {
      // Budget exhausted - notify user
    } else if (error.status >= 500) {
      // Server error - retry with exponential backoff
    }
  } else {
    throw error;
  }
}
```
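For the 5xx branch, a small exponential backoff helper is enough; a minimal sketch, not part of the SDK, reusing `ApiClientError` from the retry example above:

```typescript
// Retry server-side failures with exponentially growing delays (1s, 2s, 4s, ...).
async function withBackoff<T>(fn: () => Promise<T>, maxRetries = 4): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const retriable = error instanceof ApiClientError && error.status >= 500;
      if (!retriable || attempt >= maxRetries) throw error;
      await new Promise(r => setTimeout(r, 1000 * 2 ** attempt));
    }
  }
}

// Usage: await withBackoff(() => client.run(task, options));
```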
### 4. Monitor Your Usage
```typescript
// Check budget before batch operations
const budget = await client.budget.get();
if (budget.balance < estimatedCost) {
  console.log('Insufficient budget for operation');
  return;
}

// Proceed with operations
for (const task of tasks) {
  await client.run(task, options);
}
```

## Monitoring Your Usage
### Available Metrics
| Endpoint | Information |
|---|---|
| `GET /v1/budget` | Current balance and limits |
| `GET /v1/costs/summary` | Usage by period |
| `GET /v1/billing/records` | Detailed transaction history |
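A pre-flight check against these endpoints might look like the following; `client.budget.get()` appears elsewhere in these docs, while `client.costs.summary()` is a hypothetical wrapper for `GET /v1/costs/summary` and may differ in the actual SDK:

```typescript
// Print the current balance and recent usage before starting a batch job.
async function printUsageSnapshot(): Promise<void> {
  const budget = await client.budget.get();
  console.log(`Balance: ${budget.balance}`);

  // Hypothetical SDK wrapper for GET /v1/costs/summary.
  const summary = await client.costs.summary();
  console.log('Usage by period:', summary);
}
```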
### Usage Dashboard
The web dashboard provides:
- Real-time usage graphs
- Cost breakdown by environment
- Execution history and performance
- Budget alerts configuration
## Future Scaling Improvements
We’re continuously improving scalability:
- **Regional expansion**: additional regions for lower latency
- **Larger instances**: for compute-intensive workloads
- **GPU support**: for ML workloads
- **Dedicated capacity**: reserved instances for enterprise
Subscribe to our changelog for updates on scaling improvements.