
Load Testing

Comprehensive k6 load testing suite for the BEEF platform, testing service limits across Sirloin, Round, Brain, and Cinder APIs.

Overview

This suite includes 11 load test scenarios designed to find performance limits of various platform services:

Scenario   Service        Target QPS   Description
01         Cinder         50           Onboarding nudity detection (30-60s response)
02         Cinder         800          Generation nudity detection (30-60s response)
03         Sirloin/Brain  800          Image generation submission + polling
04         Sirloin/Brain  100          Video generation submission + polling
05         Sirloin        10,000       Explore list (pagination)
06         Sirloin        1,000        Explore search queries
07         Sirloin        1,000        Explore filtering
08         Round          1,000        Text embeddings generation
09         Round          100          Face detection (base64 images)
10         Hive           50           Celebrity recognition (standalone, not in run-all)
11         Hive           50           Content moderation (standalone, not in run-all)

Test Families

This page covers two different test surfaces:

  • load-tests/: k6 load and performance scenarios for service-limit discovery.
  • tests/: Playwright API and e2e tests configured by tests/playwright.config.ts.

See [Testing](/operations/testing/) for the per-service unit, lint, and typecheck command matrix.

Playwright API And E2E Tests

Playwright tests live under tests/ and are configured by tests/playwright.config.ts. They exercise API and browser flows rather than sustained load. Install dependencies from the tests directory and use the package scripts there:

Terminal window
cd tests
npm install
npx playwright install
npm run test:api

Use Playwright when validating request/response behavior, user journeys, browser compatibility, traces, screenshots, or API regression coverage. Use k6 load tests when measuring throughput, latency, saturation, or service-limit behavior.

Prerequisites

Required Software

  • k6 - Load testing tool

    Terminal window
    # macOS
    brew install k6
    # Linux
    sudo gpg -k
    sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg \
    --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
    echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | \
    sudo tee /etc/apt/sources.list.d/k6.list
    sudo apt-get update
    sudo apt-get install k6
  • Node.js 18+ - For utility scripts

    Terminal window
    node --version # Should be 18 or higher
  • ts-proto - For proto code generation

    Terminal window
    npm install -g ts-proto

Required Access

  • API Keys: Brain API key, Cinder API key, Hive API keys (scenarios 10 and 11 only)
  • R2 Credentials: Access to Cloudflare R2 bucket (or S3-compatible storage)
  • Service Access: Network access to staging/dev environment
  • Test Data: Valid user IDs and character IDs from target environment

Setup

1. Install Dependencies

Terminal window
cd load-tests
npm install

2. Generate Proto Code

From the repository root:

Terminal window
make generate-proto

This generates gRPC client code in load-tests/generated/ for both Sirloin and Round services.

3. Prepare Dataset

Download and prepare ~1,000 images for testing:

Terminal window
npm run setup

This script offers three options:

  • Option 1: CelebA-HQ (manual download required)
  • Option 2: LFW dataset (auto-download)
  • Option 3: Use your own images

Images will be placed in data/images/.

4. Upload Images to R2

First, configure your environment (see Configuration section below), then:

Terminal window
npm run upload-images

This uploads all images from data/images/ to your configured R2 bucket under load-test-images/YYYY-MM-DD/.
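The date-partitioned key layout above can be sketched as a small helper (the function name is hypothetical; the path shape is taken from the upload prefix shown):

```javascript
// Sketch: build the R2 object key used for uploaded test images (helper name is hypothetical).
function r2Key(filename, date = new Date()) {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD (UTC)
  return `load-test-images/${day}/${filename}`;
}

// Example with a fixed date:
console.log(r2Key('face-0001.jpg', new Date('2024-01-02T12:00:00Z')));
// → load-test-images/2024-01-02/face-0001.jpg
```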

5. Generate Presigned URLs

Terminal window
npm run generate-urls

This creates data/presigned-urls.json with presigned URLs for all uploaded images. Valid for 7 days.

IMPORTANT: This file is gitignored and should never be committed. Regenerate weekly for ongoing testing.

Configuration

Create .env File

Terminal window
cp .env.example .env

Required Configuration

Edit .env and fill in all values:

Terminal window
# Target Environment
SIRLOIN_GRPC_HOST=staging.example.com:9920
ROUND_GRPC_HOST=staging.example.com:8080
BRAIN_API_KEY=your_brain_api_key
CINDER_API_KEY=your_cinder_api_key
# R2 Storage
R2_BUCKET_NAME=your-bucket-name
R2_ENDPOINT=https://account-id.r2.cloudflarestorage.com
R2_ACCESS_KEY=your_access_key
R2_SECRET_KEY=your_secret_key
R2_PUBLIC_URL=https://your-r2-public-domain.com
# Test Data (from staging environment)
TEST_USER_IDS=user_2abc123,user_2def456
TEST_CHARACTER_IDS=uuid1,uuid2,uuid3
# Load Test Settings (optional, defaults shown)
RAMP_UP_DURATION=2m
SUSTAIN_DURATION=15m
RAMP_DOWN_DURATION=2m
# Hive API (scenarios 10 and 11 only)
HIVE_CELEBRITY_API_KEY=your_hive_api_key
HIVE_MODERATION_API_KEY=your_hive_moderation_api_key
# HIVE_CELEBRITY_QPS=50 # Target QPS (default 50)
# HIVE_MODERATION_QPS=50 # Target QPS (default 50)

Validate Configuration

Terminal window
npm run validate-config
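The exact checks performed by validate-config are not shown here; a minimal sketch of the idea, using required variable names from the .env example above:

```javascript
// Sketch: report which required .env variables are unset or blank
// (variable list taken from the example above; the real script may check more).
function missingVars(env, required = [
  'SIRLOIN_GRPC_HOST', 'ROUND_GRPC_HOST',
  'BRAIN_API_KEY', 'CINDER_API_KEY',
  'R2_BUCKET_NAME', 'R2_ENDPOINT', 'R2_ACCESS_KEY', 'R2_SECRET_KEY',
]) {
  return required.filter((name) => !env[name] || env[name].trim() === '');
}
```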

Running Tests

Run All Scenarios

Terminal window
npm test
# or
node run-all.js

This runs the 9 run-all scenarios (01-09) sequentially and generates an aggregate report. Hive scenarios 10 and 11 are standalone and must be run individually.

Run Single Scenario

Terminal window
k6 run scenarios/01-cinder-onboarding.js
k6 run scenarios/05-explore-list.js
k6 run scenarios/10-hive-celebrity.js # Hive API - requires HIVE_CELEBRITY_API_KEY
k6 run scenarios/11-hive-moderation.js # Hive API - requires HIVE_MODERATION_API_KEY
# etc.

Run Selected Scenarios

Terminal window
node run-all.js --scenarios 01,02,05
# Runs only scenarios 01, 02, and 05
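The --scenarios flag presumably filters the scenario list before execution; a sketch of that selection (run-all.js internals assumed, not verified):

```javascript
// Sketch: filter scenarios by the comma-separated --scenarios value (internals assumed).
function selectScenarios(all, flag) {
  if (!flag) return all; // no flag: run everything
  const wanted = new Set(flag.split(','));
  return all.filter((s) => wanted.has(s.id));
}

const all = [{ id: '01' }, { id: '02' }, { id: '05' }, { id: '06' }];
console.log(selectScenarios(all, '01,02,05').map((s) => s.id)); // [ '01', '02', '05' ]
```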

Custom Duration

Override test duration via environment variables:

Terminal window
SUSTAIN_DURATION=5m k6 run scenarios/01-cinder-onboarding.js

Or for all tests:

Terminal window
SUSTAIN_DURATION=5m node run-all.js

Interpreting Results

During Test Execution

k6 displays real-time metrics:

running (15m00s), 2250/2250 VUs, 180000 complete and 0 interrupted iterations
✓ cinder: status is 200
✗ cinder: response has body
  • VUs: Virtual users (concurrent simulated users)
  • Iterations: Completed requests
  • ✓/✗: Check pass/fail status

After Test Completion

Each scenario outputs a summary:

========================================
Scenario 01: Cinder Onboarding
========================================
Target QPS: 50
Virtual Users: 2250
Total Requests: 45000
Avg Response Time: 45.23s
p95 Response Time: 58.12s
p99 Response Time: 63.45s
Error Rate: 0.12%
Nudity Detected: 2.34%
========================================

Aggregate Report

After running all scenarios:

Scenario            Status   Requests   p95      Errors
────────────────────────────────────────────────────────
Cinder Onboarding   ✓ PASS   45,000     58.12s   0.12%
Cinder Generation   ✓ PASS   720,000    62.34s   0.23%
...

Results are saved to:

  • Individual: results/01-cinder-onboarding.json, etc.
  • Aggregate: results/aggregate-report.json

Success Criteria

A scenario passes if:

  • ✅ Target QPS sustained during 15-minute steady state
  • ✅ Error rate < 1%
  • ✅ p95 latency within defined thresholds
  • ✅ No client-side crashes

A scenario fails if:

  • ❌ Error rate ≥ 1%
  • ❌ p95 latency exceeds threshold
  • ❌ k6 errors or crashes
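The criteria above can be collapsed into a single pass/fail predicate per scenario summary; a sketch (field names are hypothetical, not the actual report schema):

```javascript
// Sketch: pass/fail predicate over one scenario summary (field names are hypothetical).
function passes(summary, p95LimitMs) {
  return summary.errorRate < 0.01     // error rate < 1%
      && summary.p95Ms <= p95LimitMs  // p95 latency within threshold
      && !summary.crashed;            // no k6 errors or crashes
}
```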

Test Scenarios Deep Dive

01-02: Cinder Nudity Detection

Purpose: Test Cinder API capacity for synchronous nudity detection

Key Metrics:

  • Response time (30-60s expected)
  • Nudity detection rate
  • HTTP error rate

Notes:

  • Each request uses a unique presigned URL
  • Response time includes full ML processing
  • Workflow types: user_onboarding vs generation.response

03-04: Image/Video Generation

Purpose: Test Sirloin GenerateMedia submission capacity

Key Metrics:

  • Submission latency (<2s expected)
  • gRPC error rate

Notes:

  • Tests submission rate, not completion rate
  • Actual generation takes 60-120s (images) or 600-1200s (videos)
  • Polls for 10s to catch fast completions
  • Most generations won’t complete during the test; this is expected
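The bounded 10-second poll can be sketched as a loop with a deadline (function names hypothetical; a real k6 scenario would use k6's sleep rather than an injected clock):

```javascript
// Sketch: poll a status function until done or a 10s budget is spent.
// The clock is injected so the logic is testable without real waiting.
function pollBriefly(checkStatus, clock, budgetMs = 10000, intervalMs = 1000) {
  const start = clock.now();
  while (clock.now() - start < budgetMs) {
    if (checkStatus() === 'completed') return true; // fast completion caught
    clock.sleep(intervalMs);                        // k6 would call sleep(1)
  }
  return false; // still generating: expected for most requests
}
```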

05-07: Explore Endpoints

Purpose: Test Sirloin read-only endpoints for discovery/search

Key Metrics:

  • Response time (p95 < 200-500ms)
  • Results returned
  • gRPC error rate

Notes:

  • High QPS (1k-10k)
  • Read-only, no side effects
  • Tests pagination, search, and filtering

10: Hive Celebrity Recognition

Purpose: Test Hive API capacity for celebrity recognition (standalone scenario, not in run-all)

Key Metrics:

  • Response time (p95 < 35s)
  • HTTP error rate (< 1%)

Notes:

  • Requires HIVE_CELEBRITY_API_KEY (source .env before k6 or use -e HIVE_CELEBRITY_API_KEY=xxx)
  • Requires data/presigned-urls.json (run npm run setup, upload-images, generate-urls)
  • Override QPS: HIVE_CELEBRITY_QPS=25

11: Hive Content Moderation

Purpose: Test Hive API capacity for content moderation (standalone scenario, not in run-all)

Key Metrics:

  • Response time (p95 < 35s)
  • HTTP error rate (< 1%)

Notes:

  • Requires HIVE_MODERATION_API_KEY (source .env before k6 or use -e HIVE_MODERATION_API_KEY=xxx)
  • Requires data/presigned-urls.json (run npm run setup, upload-images, generate-urls)
  • Override QPS: HIVE_MODERATION_QPS=25
  • Run: pnpm run test:hive-moderation

08-09: Round Service

Purpose: Test Round inference endpoints

Key Metrics:

  • Inference time (p95 < 500ms-2s)
  • Embedding dimensions / faces detected
  • gRPC error rate

Notes:

  • Embeddings: Varying text lengths (10-1000 chars)
  • Face Detection: Base64-encoded images (~15MB max)
  • Direct gRPC to Round (not via Sirloin)
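Because base64 inflates payloads by roughly 4/3, it is worth estimating the encoded size before approaching the ~15MB limit; a quick sketch:

```javascript
// Sketch: base64 encodes every 3 raw bytes as 4 output characters (padded).
function base64Size(rawBytes) {
  return 4 * Math.ceil(rawBytes / 3);
}

// An ~11MB raw image stays under the ~15MB limit once encoded:
console.log(base64Size(11 * 1024 * 1024)); // 15379116
```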

Troubleshooting

gRPC Connection Errors

WARN[0001] Request Failed error="rpc error: code = Unavailable"

Solutions:

  • Check SIRLOIN_GRPC_HOST / ROUND_GRPC_HOST in .env
  • Verify network connectivity: telnet staging.example.com 9920
  • Check firewall rules
  • Ensure services are running

Authentication Failures

error="rpc error: code = Unauthenticated desc = invalid token"

Solutions:

  • Verify BRAIN_API_KEY / CINDER_API_KEY in .env
  • Check API key expiry
  • Ensure API keys are for correct environment

Presigned URL Errors

Error: No presigned URLs available

Solutions:

  • Run npm run generate-urls
  • Check URLs haven’t expired (7-day limit)
  • Verify data/presigned-urls.json exists
  • Regenerate if older than 7 days
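A staleness check against the 7-day window is easy to script; a sketch (the generation timestamp and where it is stored are assumptions, not the actual JSON schema):

```javascript
// Sketch: has a presigned-URL batch passed its 7-day validity window?
// (generatedAtIso is an assumed field; check your presigned-urls.json for the real one.)
function urlsExpired(generatedAtIso, nowMs = Date.now()) {
  const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;
  return nowMs - Date.parse(generatedAtIso) > SEVEN_DAYS_MS;
}
```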

Image Loading Errors

Failed to load images: ENOENT: no such file or directory

Solutions:

  • Run npm run setup to download images
  • Manually place images in data/images/
  • Verify at least 100 images exist

Out of Memory (OOM)

FATAL: JavaScript heap out of memory

Solutions:

  • Reduce SUSTAIN_DURATION (try 5m instead of 15m)
  • Run scenarios individually instead of all at once
  • Reduce target QPS via environment variables
  • Increase Node.js memory: NODE_OPTIONS="--max-old-space-size=4096" node run-all.js

High Error Rates

If error rate > 1%:

  • Check service logs for backend errors
  • Reduce QPS to find sustainable rate
  • Verify test data (user IDs, character IDs) are valid
  • Check database connection limits
  • Monitor CPU/memory on backend services

Advanced Usage

Custom QPS Targets

Override any scenario’s target QPS:

Terminal window
CINDER_ONBOARDING_QPS=25 k6 run scenarios/01-cinder-onboarding.js
IMAGE_GENERATION_QPS=400 k6 run scenarios/03-image-generation.js

Custom Timeouts

Terminal window
IMAGE_GENERATION_TIMEOUT=300 k6 run scenarios/03-image-generation.js

Debug Mode

Terminal window
DEBUG=true k6 run scenarios/01-cinder-onboarding.js

Cloud Execution

Run tests from k6 Cloud for distributed load:

Terminal window
k6 cloud scenarios/05-explore-list.js

Maintenance

Weekly Tasks

  • Regenerate presigned URLs: npm run generate-urls (URLs expire after 7 days)

As Needed

  • Update test data: Edit data/test-data.json with new search queries, prompts, etc.
  • Refresh dataset: Re-run npm run setup if images change
  • Update proto code: Run make generate-proto from repo root after proto changes

Architecture

Directory Structure

load-tests/
├── data/                    # Test data (gitignored)
│   ├── images/              # Local image dataset
│   ├── presigned-urls.json  # Generated presigned URLs
│   └── test-data.json       # Search queries, prompts, etc.
├── generated/               # Generated proto code (gitignored)
│   ├── sirloin/             # Sirloin gRPC types
│   └── round/               # Round gRPC types
├── results/                 # Test results (gitignored)
├── scenarios/               # k6 test scenarios
├── scripts/                 # Setup scripts
├── utils/                   # Shared utilities
├── .env                     # Environment config (gitignored)
├── .env.example             # Example configuration
├── package.json             # Dependencies
└── run-all.js               # Master test runner

Load Profile

All scenarios use gradual ramp-up:

VUs
 |         ┌─────────────────┐
 |        /                   \
 |       /                     \
 |      /                       \
 |_____/                         \_____
 └───────────────────────────────────── Time
    2m           15m             2m
 ramp-up       sustain       ramp-down

This prevents sudden traffic spikes and provides more realistic load patterns.
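In k6 terms, this trapezoid corresponds to a three-stage profile driven by the duration variables from Configuration; a sketch (the peak VU count is illustrative, and each scenario sets its own):

```javascript
// Sketch: the ramp-up / sustain / ramp-down profile as k6-style stages.
function rampStages(env = {}, peakVUs = 100) {
  return [
    { duration: env.RAMP_UP_DURATION   || '2m',  target: peakVUs }, // ramp-up
    { duration: env.SUSTAIN_DURATION   || '15m', target: peakVUs }, // sustain
    { duration: env.RAMP_DOWN_DURATION || '2m',  target: 0 },       // ramp-down
  ];
}
```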

Contributing

When adding new scenarios:

  1. Create scenario file in scenarios/
  2. Define thresholds in utils/metrics.js
  3. Add to scenarios array in run-all.js
  4. Update README with scenario description
  5. Test individually before adding to suite

Support

For issues or questions:

  • Check troubleshooting section above
  • Review k6 documentation: https://k6.io/docs/
  • Check service logs for backend errors
  • Verify configuration with npm run validate-config