Rate Limits

All API endpoints are rate-limited to ensure fair usage. Every endpoint shares a general per-user limit, and execution endpoints have additional limits that vary by subscription tier.

Rate Limit Headers

Every API response includes rate limit headers:

Header	Description
`X-RateLimit-Minute-Limit`	Maximum requests per minute
`X-RateLimit-Minute-Remaining`	Requests remaining in current minute
`X-RateLimit-Minute-Reset`	Unix timestamp when the minute window resets

Execution endpoints (running apps/workflows) also include hourly limits:

Header	Description
`X-RateLimit-Hour-Limit`	Maximum executions per hour
`X-RateLimit-Hour-Remaining`	Executions remaining in current hour
`X-RateLimit-Hour-Reset`	Unix timestamp when the hour window resets
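
As a rough illustration, the headers in the two tables above can be collected into one structure per window. This Python sketch assumes the response headers are available as a plain string-to-string mapping; `RateWindow` and `parse_rate_limit_headers` are hypothetical helper names, not part of the API.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass(frozen=True)
class RateWindow:
    limit: int       # maximum requests (or executions) in the window
    remaining: int   # requests left in the current window
    reset: int       # Unix timestamp when the window resets


def parse_rate_limit_headers(headers: Mapping[str, str]) -> Dict[str, RateWindow]:
    """Return a RateWindow for each window ("minute", "hour") present in headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    windows: Dict[str, RateWindow] = {}
    for window in ("minute", "hour"):
        prefix = f"x-ratelimit-{window}-"
        try:
            windows[window] = RateWindow(
                limit=int(lowered[prefix + "limit"]),
                remaining=int(lowered[prefix + "remaining"]),
                reset=int(lowered[prefix + "reset"]),
            )
        except KeyError:
            # The hourly headers only appear on execution endpoints.
            continue
    return windows
```

On a non-execution endpoint the result would contain only the "minute" entry, since the hourly headers are sent by execution endpoints only.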

General Rate Limit

All endpoints share a general rate limit of 120 requests per minute per user, regardless of subscription tier.

Execution Rate Limits

Run endpoints (POST /api/v1/apps/:id and POST /api/v1/runs) have additional per-tier limits:

Tier	Executions per Minute	Executions per Hour
Starter	10	100
Plus	20	200
Max	30	300
Ultra	90	900
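
In every tier the hourly limit is ten times the per-minute limit, so running at the full per-minute rate would use up the hourly allowance in about ten minutes. The arithmetic can be sketched in Python as follows; the function names are hypothetical, and the figures assume the limits are counted exactly as the table states.

```python
# Per-tier execution limits copied from the table above: (per minute, per hour).
EXECUTION_LIMITS = {
    "Starter": (10, 100),
    "Plus": (20, 200),
    "Max": (30, 300),
    "Ultra": (90, 900),
}


def sustained_interval_seconds(tier: str) -> float:
    """Average spacing between runs that stays within the hourly limit."""
    _, per_hour = EXECUTION_LIMITS[tier]
    return 3600 / per_hour


def minutes_until_hourly_exhausted(tier: str) -> float:
    """Minutes of running at the full per-minute limit before the hourly limit is used up."""
    per_minute, per_hour = EXECUTION_LIMITS[tier]
    return per_hour / per_minute
```

For the Starter tier this gives an average of one run every 36 seconds for sustained workloads, and 4 seconds for Ultra.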

Handling Rate Limits

When a request is rate-limited, the API returns 429 Too Many Requests with a Retry-After header indicating how many seconds to wait. The response body has this shape:

{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Rate limit exceeded",
    "retry_after": 30
  }
}

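Best practice: Check the X-RateLimit-Minute-Remaining header on each response before sending the next request. When it reaches zero, wait until the X-RateLimit-Minute-Reset timestamp before retrying. On execution endpoints, apply the same check to the X-RateLimit-Hour-Remaining and X-RateLimit-Hour-Reset headers.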
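
One way to turn this guidance into code is a small function that decides how long to pause after each response. The Python sketch below is a minimal illustration, not an official client: `seconds_to_wait` is a hypothetical name, the headers are assumed to arrive as a plain mapping, and Retry-After is assumed to hold a number of seconds as described above.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(status: int, headers: Mapping[str, str],
                    now: Optional[float] = None) -> float:
    """Return how long to pause before the next request (0.0 if no pause is needed)."""
    now = time.time() if now is None else now
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    if status == 429:
        # Prefer the server's explicit instruction when it is present.
        retry_after = lowered.get("retry-after")
        if retry_after is not None:
            return max(0.0, float(retry_after))

    # Otherwise pause only if a window is exhausted, until that window resets.
    waits = [0.0]
    for window in ("minute", "hour"):
        remaining = lowered.get(f"x-ratelimit-{window}-remaining")
        reset = lowered.get(f"x-ratelimit-{window}-reset")
        if remaining is not None and reset is not None and int(remaining) <= 0:
            waits.append(max(0.0, float(reset) - now))
    return max(waits)
```

A caller would typically pass the status code and headers of the latest response and sleep for the returned number of seconds before sending the next request. The `now` parameter exists only to make the function easy to test with a fixed clock.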

GPU Quota

In addition to rate limits, execution endpoints enforce GPU usage quotas. When a quota is exceeded, the API returns 429 with the error code GPU_QUOTA_EXCEEDED.
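
Because both conditions return 429, a client may want to read the error code before deciding what to do. The Python sketch below assumes the GPU quota response uses the same `error.code` body shape as the RATE_LIMITED example on this page, which is not confirmed here. `classify_429` and `should_retry_after_wait` are hypothetical names, and treating a quota error as not retryable on a short timer is a design assumption, since this page does not say when a GPU quota resets.

```python
import json


def classify_429(body: str) -> str:
    """Return the error code from a 429 response body, or "UNKNOWN" if unreadable."""
    try:
        payload = json.loads(body)
        code = payload["error"]["code"]
    except (ValueError, KeyError, TypeError):
        # Not JSON, or not the assumed {"error": {"code": ...}} shape.
        return "UNKNOWN"
    return code if isinstance(code, str) else "UNKNOWN"


def should_retry_after_wait(body: str) -> bool:
    """Retry only ordinary rate limiting; a quota error is surfaced to the caller."""
    return classify_429(body) == "RATE_LIMITED"
```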
