Media Generation

Purpose

Document the user-facing generation flow from Brisket through Sirloin, Brain, Round, and Flank.

Participants

  • Brisket initiates generation and displays credit pricing.
  • Sirloin validates usage, deducts credits at request time, and stores user-facing media state.
  • Brain handles async generation records, queues, provider integrations, and calls to Round.
  • Round serves internal model inference.
  • Flank participates when media.workflow_name is set.

Sequence

sequenceDiagram
    participant Brisket
    participant Sirloin
    participant Brain
    participant Round
    participant Flank
    Brisket->>Sirloin: Create media/generation request
    Sirloin->>Sirloin: Validate usage and deduct credits
    alt workflow_name set
        Sirloin->>Flank: Execute workflow
    else default path
        Sirloin->>Brain: Send media to Brain
        Brain->>Round: Inference when needed
    end
    Brain-->>Sirloin: Generation status/result
    Sirloin-->>Brisket: User-facing media state
  • apps/brisket/src/hooks/use-credits.tsx
  • apps/brisket/src/lib/constants.ts
  • apps/sirloin/internal/app/services/media/
  • apps/brain/src/modules/
  • apps/flank/server/engine/

Image Moderation

Brain can moderate input or source images before generation when an image is present, and it moderates generated outputs before they become available. Pre-generation moderation currently uses visual checks only. Post-generation moderation also runs demographics age scoring and can run celebrity checks for NSFW generations.

A flagged single output fails the generation. For carousel output, flagged panels can be dropped and the request fails only if no safe panels remain. See Image Moderation And Age Scoring for checkpoint rules, age-scoring thresholds, and provider fallback behavior.
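The single-output and carousel rules above can be sketched as follows. This is a minimal illustration, not Brain's implementation: `ModeratedPanel`, `ModerationOutcome`, and `applyOutputModeration` are hypothetical names, and the real moderation result types are not shown in this document.

```typescript
// Hypothetical shapes for illustration only.
interface ModeratedPanel {
  path: string;
  flagged: boolean;
}

type ModerationOutcome =
  | { status: "ok"; panels: string[] }
  | { status: "failed" };

// Single output: a flagged output fails the generation.
// Carousel: flagged panels are dropped, and the request fails
// only when no safe panel remains.
function applyOutputModeration(
  panels: ModeratedPanel[],
  isCarousel: boolean,
): ModerationOutcome {
  if (!isCarousel) {
    const only = panels[0];
    if (only === undefined || only.flagged) {
      return { status: "failed" };
    }
    return { status: "ok", panels: [only.path] };
  }
  const safe = panels.filter((p) => !p.flagged).map((p) => p.path);
  return safe.length > 0 ? { status: "ok", panels: safe } : { status: "failed" };
}
```

With this sketch, a three-panel carousel with one flagged panel returns the two safe panels, while a carousel with every panel flagged fails.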

State Transitions

Sirloin deducts credits when the request is accepted. Brain records and processes async generation. Flank routes workflow-backed media when media.workflow_name is set. Re-edits with source_id can be free.
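A rough sketch of the acceptance step described above, assuming an in-memory balance table. Sirloin's actual service code is not shown here; `MediaRequest`, `AcceptResult`, and `acceptMediaRequest` are hypothetical names, and the cost is passed in by the caller because the pricing rules (including free re-edits) live elsewhere.

```typescript
// Hypothetical request shape for illustration only.
interface MediaRequest {
  userId: string;
  cost: number; // credit cost already computed by the caller
  workflowName?: string; // stands in for media.workflow_name
}

type AcceptResult =
  | { accepted: true; route: "flank" | "brain" }
  | { accepted: false; reason: "insufficient_credits" };

// Deducts credits when the request is accepted, before any async work starts,
// then picks the route: Flank when a workflow name is set, Brain otherwise.
function acceptMediaRequest(
  balances: Record<string, number>,
  req: MediaRequest,
): AcceptResult {
  const balance = balances[req.userId] ?? 0;
  if (balance < req.cost) {
    // Nothing is deducted and nothing is queued.
    return { accepted: false, reason: "insufficient_credits" };
  }
  balances[req.userId] = balance - req.cost;
  return { accepted: true, route: req.workflowName ? "flank" : "brain" };
}
```

The point of the sketch is the ordering: the deduction happens in the same step that accepts the request, so a later generation failure does not change when credits were charged.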

Reference To Video

REFERENCE_TO_VIDEO is a Kitsune media generation path for creating a video from a required video prompt and character reference imagery, optionally guided by a reference video and pose image. Fennec exposes it in the media generator with image_path, video_path, video_prompt, video_provider, and a text-to-video model selection. image_path is an optional pose image and video_path is an optional source video.

Brain supports this generation type with WaveSpeed and Atlas providers. Before scheduling provider work, Brain resolves the character’s latest body and frontal-face onboarding images, preferring NSFW variants when the flow is NSFW and both NSFW body and face references exist. It upscales the body and face references through Fal-backed UpscaleImage commands, storing each result under a global deterministic media/reference-upscales/{hash}.{output_format} cache key. This reference-to-video flow currently requests jpg output. Cache hits are marked as Cached in generation events. Brain then sends those upscaled images, the selected pose image when present, and the selected reference video when present to the provider text-to-video endpoint. If the request has a shorter video duration than the selected reference video, Brain first stores or reuses a deterministic media/reference-video-cuts/{hash}.mp4 trim and sends that shorter reference.
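Two of the decisions in this paragraph, which reference variants to use and whether the reference video needs trimming, can be sketched as below. The names `CharacterReferences`, `pickReferences`, and `needsReferenceTrim` are hypothetical, and falling back to the standard references when the NSFW pair is incomplete is an assumption read from "preferring", not confirmed behavior.

```typescript
// Hypothetical shape: paths to the character's latest onboarding images.
interface CharacterReferences {
  body?: string;
  face?: string;
  nsfwBody?: string;
  nsfwFace?: string;
}

// Prefers the NSFW pair only when the flow is NSFW and BOTH NSFW references exist.
// Returns null when no complete body + face pair is available.
function pickReferences(
  refs: CharacterReferences,
  isNsfw: boolean,
): { body: string; face: string } | null {
  if (isNsfw && refs.nsfwBody && refs.nsfwFace) {
    return { body: refs.nsfwBody, face: refs.nsfwFace };
  }
  if (refs.body && refs.face) {
    return { body: refs.body, face: refs.face };
  }
  return null;
}

// A trimmed reference is needed only when the requested duration is shorter
// than the selected reference video.
function needsReferenceTrim(requestedSeconds: number, referenceSeconds: number): boolean {
  return requestedSeconds < referenceSeconds;
}
```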

When a media example was originally generated by REFERENCE_TO_VIDEO, replaying that example uses the same REFERENCE_TO_VIDEO flow. The original pose image and source video are reused only when the source generation metadata has sourceImagePath or sourceVideoPath; examples created from character references alone replay without sending a reference video.

Successful output is stored as the canonical video path and a compressed 480p variant. Brain also tries to extract the first video frame with ffmpeg for the preview image; if extraction fails, it falls back to the selected pose image or the upscaled body reference.
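The preview fallback order can be sketched as below. `resolvePreview` is a hypothetical helper, and frame extraction is modeled as a callback that throws on failure rather than as a real ffmpeg invocation.

```typescript
// Tries the extracted first frame, then the pose image, then the upscaled body reference.
// extractFirstFrame stands in for the ffmpeg step and is assumed to throw on failure.
function resolvePreview(
  extractFirstFrame: () => string,
  poseImage: string | undefined,
  upscaledBody: string,
): string {
  try {
    return extractFirstFrame();
  } catch {
    // Extraction failed: use the selected pose image when one was provided,
    // otherwise the upscaled body reference.
    return poseImage ?? upscaledBody;
  }
}
```

Because the upscaled body reference always exists by the time a provider has been scheduled, this chain always yields a preview path.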

Provider Routing

Brain routes image and video generation to external inference providers (Atlas, WaveSpeed, RunPod, FAL). Each generation type has an executor. Image and image-sequence executors manage an ordered fallback chain of providers: if the first provider fails to schedule, the executor tries the next.
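A minimal sketch of an ordered fallback chain, assuming a provider signals a scheduling failure by throwing. `ProviderAdapter` and `scheduleWithFallback` are hypothetical names, and scheduling is shown synchronously for brevity even though real provider calls are presumably asynchronous.

```typescript
// Hypothetical adapter shape: schedule returns a provider task id or throws.
interface ProviderAdapter {
  name: string;
  schedule(jobId: string): string;
}

// Walks the chain in order and returns the first provider that schedules.
// Throws only when every provider in the chain failed.
function scheduleWithFallback(
  chain: ProviderAdapter[],
  jobId: string,
): { provider: string; taskId: string } {
  const errors: string[] = [];
  for (const provider of chain) {
    try {
      return { provider: provider.name, taskId: provider.schedule(jobId) };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${provider.name}: ${message}`);
    }
  }
  throw new Error(`all providers failed to schedule: ${errors.join("; ")}`);
}
```

Collecting the per-provider errors keeps the final failure message useful when the whole chain is exhausted.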

For image and image-sequence generation, the fallback chain order is configurable at runtime via the inference-traffic-split application setting. This enables progressive rollout of new providers by controlling what percentage of requests try each ordering. Video executors use the provider specified in the job data without fallback chains.
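One way a percentage-based traffic split could select a chain ordering is sketched below. The actual shape of the inference-traffic-split setting is not documented here, so `SplitEntry`, its fields, and `pickOrdering` are assumptions for illustration.

```typescript
// Hypothetical setting shape: each entry gives a provider ordering and the
// percentage of requests that should try it. Percentages are assumed to sum to 100.
interface SplitEntry {
  percent: number;
  order: string[];
}

// roll is expected in [0, 100), for example Math.random() * 100.
function pickOrdering(split: SplitEntry[], roll: number): string[] {
  let cumulative = 0;
  for (const entry of split) {
    cumulative += entry.percent;
    if (roll < cumulative) {
      return entry.order;
    }
  }
  // Percentages summed to less than the roll: fall back to the last entry.
  const last = split[split.length - 1];
  return last ? last.order : [];
}
```

With a 90/10 split, raising the second entry's percentage over time shifts more requests to the new ordering without a deploy, which is the progressive-rollout use described above.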

See Brain Inference Providers for the full provider inventory, adapter architecture, and traffic split configuration.

Invariants

  • Credit deductions happen at request time, not completion time.
  • NSFW pricing and full-access credit behavior must follow the media credit-cost standard.
  • Brain owns generation records; Sirloin owns user-facing media state.
  • REFERENCE_TO_VIDEO requires video_prompt, video_model, and video_provider set to WAVESPEED or ATLAS; image_path and video_path are optional input references.
  • REFERENCE_TO_VIDEO requires a character body reference and frontal-face reference image before provider scheduling starts.
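The REFERENCE_TO_VIDEO request invariant above can be expressed as a small validator. This is a sketch only: `ReferenceToVideoRequest` and `validateReferenceToVideo` are hypothetical names, and the field names simply mirror the ones used in this document.

```typescript
// Hypothetical request shape using the field names from this document.
interface ReferenceToVideoRequest {
  video_prompt?: string;
  video_model?: string;
  video_provider?: string;
  image_path?: string; // optional pose image
  video_path?: string; // optional source video
}

const REFERENCE_TO_VIDEO_PROVIDERS = ["WAVESPEED", "ATLAS"];

// Returns a list of violations; an empty list means the request is valid.
function validateReferenceToVideo(req: ReferenceToVideoRequest): string[] {
  const errors: string[] = [];
  if (!req.video_prompt) errors.push("video_prompt is required");
  if (!req.video_model) errors.push("video_model is required");
  if (!req.video_provider) {
    errors.push("video_provider is required");
  } else if (REFERENCE_TO_VIDEO_PROVIDERS.indexOf(req.video_provider) === -1) {
    errors.push("video_provider must be WAVESPEED or ATLAS");
  }
  // image_path and video_path are optional, so they are not checked here.
  return errors;
}
```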

Error Paths

Insufficient credits stop the request before async work starts. Provider or queue failures surface through Brain status and logs. Flank-backed failures must preserve media state and execution logs for inspection.

When Brain blocks or fails a generation for moderation, it records a normalized media failure reason on the generation metadata and Sirloin persists it on media.media.failure_reason. ListMedia exposes that value as Media.media_failure_reason so Brisket can show stable user-facing failed-media messages for underage, celebrity, nudity, illegal, offensive, or generic moderation failures.
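On the client side, mapping the exposed failure reason to a stable message might look like the sketch below. The reason strings and message texts are placeholders: the actual normalized values and Brisket copy are not listed in this document.

```typescript
// Placeholder reason values; the real normalized strings may differ.
type MediaFailureReason =
  | "underage"
  | "celebrity"
  | "nudity"
  | "illegal"
  | "offensive"
  | "moderation";

// Placeholder message texts for illustration only.
const FAILURE_MESSAGES: Record<MediaFailureReason, string> = {
  underage: "Blocked: subject may be underage.",
  celebrity: "Blocked: resembles a public figure.",
  nudity: "Blocked: nudity detected.",
  illegal: "Blocked: illegal content detected.",
  offensive: "Blocked: offensive content detected.",
  moderation: "Blocked by content moderation.",
};

const GENERIC_FAILURE_MESSAGE = "Generation failed.";

// Unknown or missing reasons fall back to a generic failure message so the
// UI stays stable if new reasons are added upstream.
function failedMediaMessage(reason: string | null | undefined): string {
  if (reason && Object.prototype.hasOwnProperty.call(FAILURE_MESSAGES, reason)) {
    return FAILURE_MESSAGES[reason as MediaFailureReason];
  }
  return GENERIC_FAILURE_MESSAGE;
}
```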

Tests And Verification

  • cd apps/sirloin && make run-tests
  • cd apps/brain && pnpm test
  • cd apps/flank && pnpm test