Media Generation
Purpose
Document the user-facing generation flow from Brisket through Sirloin, Brain, Round, and Flank.
Participants
- Brisket initiates generation and displays credit pricing.
- Sirloin validates usage, deducts credits at request time, and stores user-facing media state.
- Brain handles async generation records, queues, provider integrations, and calls to Round.
- Round serves internal model inference.
- Flank participates when media.workflow_name is set.
Sequence
sequenceDiagram
    participant Brisket
    participant Sirloin
    participant Brain
    participant Round
    participant Flank
    Brisket->>Sirloin: Create media/generation request
    Sirloin->>Sirloin: Validate usage and deduct credits
    alt workflow_name set
        Sirloin->>Flank: Execute workflow
    else default path
        Sirloin->>Brain: Send media to brain
        Brain->>Round: Inference when needed
    end
    Brain-->>Sirloin: Generation status/result
    Sirloin-->>Brisket: User-facing media state
Source Links
- apps/brisket/src/hooks/use-credits.tsx
- apps/brisket/src/lib/constants.ts
- apps/sirloin/internal/app/services/media/
- apps/brain/src/modules/
- apps/flank/server/engine/
Image Moderation
Brain can moderate input or source images before generation when an image is present, and it moderates generated outputs before they become available. Pre-generation moderation uses visual checks only today. Post-generation moderation also runs demographics age scoring and can run celebrity checks for NSFW generations.
A flagged single output fails the generation. For carousel output, flagged panels can be dropped and the request fails only if no safe panels remain. See Image Moderation And Age Scoring for checkpoint rules, age-scoring thresholds, and provider fallback behavior.
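The single-output and carousel rules above can be sketched as follows. This is a minimal illustration, not Brain's implementation: the `ModeratedOutput` and `ModerationOutcome` shapes and the `resolveModeration` name are hypothetical.

```typescript
// Hypothetical shapes: field names are illustrative, not Brain's actual types.
interface ModeratedOutput {
  path: string;
  flagged: boolean;
}

type ModerationOutcome =
  | { status: "ok"; outputs: ModeratedOutput[] }
  | { status: "failed" };

// Applies the rule described above:
// - single output: a flagged output fails the generation
// - carousel: flagged panels are dropped; fail only when no safe panel remains
function resolveModeration(
  outputs: ModeratedOutput[],
  isCarousel: boolean,
): ModerationOutcome {
  if (!isCarousel) {
    const clean = outputs.length > 0 && outputs.every((o) => !o.flagged);
    return clean ? { status: "ok", outputs } : { status: "failed" };
  }
  const safe = outputs.filter((o) => !o.flagged);
  return safe.length > 0 ? { status: "ok", outputs: safe } : { status: "failed" };
}
```

Returning a discriminated union keeps the "failed" case from carrying outputs by accident.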
State Transitions
Sirloin deducts credits when the request is accepted. Brain records and processes async generation. Flank routes workflow-backed media when media.workflow_name is set. Re-edits with source_id can be free.
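As a rough sketch of the accept-then-route step, assuming the credit cost has already been computed for the request: only `workflow_name` comes from this document; `MediaRequest`, `acceptRequest`, and the result shapes are hypothetical, and Sirloin's real service is not written in TypeScript.

```typescript
// Hypothetical request shape; only workflow_name is taken from the document.
interface MediaRequest {
  cost: number;           // credits charged for this request (assumed precomputed)
  workflow_name?: string; // when set, the request is workflow-backed
}

type Route = "flank" | "brain";

interface Accepted { accepted: true; balance: number; route: Route }
interface Rejected { accepted: false; balance: number; reason: "insufficient_credits" }

// Deducts credits at acceptance time, then picks the downstream route.
// No async work is started when the balance is too low.
function acceptRequest(balance: number, req: MediaRequest): Accepted | Rejected {
  if (balance < req.cost) {
    return { accepted: false, balance, reason: "insufficient_credits" };
  }
  const route: Route = req.workflow_name ? "flank" : "brain";
  return { accepted: true, balance: balance - req.cost, route };
}
```

Under this model a free re-edit would simply arrive with a `cost` of 0; the conditions under which a re-edit is free are not modeled here.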
Reference To Video
REFERENCE_TO_VIDEO is a Kitsune media generation path for creating a video
from a required video prompt and character reference imagery, optionally guided
by a reference video and pose image. Fennec exposes it in the media generator
with image_path, video_path, video_prompt, video_provider, and a
text-to-video model selection. image_path is a pose image and video_path is
an optional source video.
Brain supports this generation type with WaveSpeed and Atlas providers.
Before scheduling provider work, Brain resolves the character’s latest body and
frontal-face onboarding images, preferring NSFW variants when the flow is NSFW
and both NSFW body and face references exist. It upscales the body and face
references through Fal-backed UpscaleImage commands, storing each result under
a global deterministic media/reference-upscales/{hash}.{output_format} cache
key. This reference-to-video flow currently requests jpg output. Cache hits
are marked as Cached in generation events. Brain then sends those upscaled
images, the selected pose image when present, and the selected reference video
when present to the provider text-to-video endpoint. If the request has a
shorter video duration than the selected reference video, Brain first stores or
reuses a deterministic media/reference-video-cuts/{hash}.mp4 trim and sends
that shorter reference.
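The cache key layouts and the trim condition above can be sketched as below. The key templates come from the text; the function names are hypothetical, and because the text does not say how `{hash}` is derived, the hash is passed in rather than computed.

```typescript
// Cache key for an upscaled reference image.
// How `hash` is derived is not specified in this document, so it is an input.
function referenceUpscaleKey(hash: string, outputFormat: string): string {
  return `media/reference-upscales/${hash}.${outputFormat}`;
}

// Cache key for a trimmed reference video.
function referenceVideoCutKey(hash: string): string {
  return `media/reference-video-cuts/${hash}.mp4`;
}

// A trim is needed only when the requested duration is shorter than
// the selected reference video's duration.
function needsReferenceTrim(requestedSeconds: number, referenceSeconds: number): boolean {
  return requestedSeconds < referenceSeconds;
}
```

Because both keys are deterministic, a repeated request with the same inputs can reuse the stored object instead of upscaling or trimming again.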
When a media example was originally generated by REFERENCE_TO_VIDEO, replaying
that example uses the same REFERENCE_TO_VIDEO flow. The original pose image
and source video are reused only when the source generation metadata has
sourceImagePath or sourceVideoPath; examples created from character
references alone replay without sending a reference video.
Successful output is stored as the canonical video path and a compressed 480p variant. Brain also tries to extract the first video frame with ffmpeg for the preview image; if extraction fails, it falls back to the selected pose image or the upscaled body reference.
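The preview fallback can be sketched as a simple preference order. This assumes the pose image is preferred over the upscaled body reference when both are available, which the text implies but does not state outright; `pickPreviewImage` and its parameters are hypothetical names.

```typescript
// Picks the preview image path.
// Assumed order: extracted first frame, then pose image, then upscaled body reference.
// A null argument means that source is unavailable (e.g. frame extraction failed).
function pickPreviewImage(
  extractedFramePath: string | null,
  poseImagePath: string | null,
  upscaledBodyPath: string,
): string {
  return extractedFramePath ?? poseImagePath ?? upscaledBodyPath;
}
```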
Provider Routing
Brain routes image and video generation to external inference providers (Atlas, WaveSpeed, RunPod, FAL). Each generation type has an executor.
For image and image-sequence generation, the executor manages an ordered fallback chain of providers: if the first provider fails to schedule, the executor tries the next. The chain order is configurable at runtime via the inference-traffic-split application setting, which enables progressive rollout of new providers by controlling what percentage of requests try each ordering. Video executors use the provider specified in the job data, with no fallback chain.
See Brain Inference Providers for the full provider inventory, adapter architecture, and traffic split configuration.
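A minimal sketch of the fallback chain and traffic split for image executors follows. The shape of the inference-traffic-split setting is not given in this document, so `TrafficSplitEntry` is an assumption, as are the function names; `trySchedule` stands in for a provider adapter call and is synchronous here only to keep the sketch short.

```typescript
type Provider = "ATLAS" | "WAVESPEED" | "RUNPOD" | "FAL";

// Hypothetical setting shape: each entry pairs a provider ordering with the
// percentage of requests that should use it. Percentages are assumed to sum to 100.
interface TrafficSplitEntry {
  percent: number;
  order: Provider[];
}

// Picks an ordering for one request. `roll` is a number in [0, 100),
// for example Math.random() * 100; it is a parameter so the choice is testable.
function pickOrdering(split: TrafficSplitEntry[], roll: number): Provider[] {
  let cumulative = 0;
  for (const entry of split) {
    cumulative += entry.percent;
    if (roll < cumulative) return entry.order;
  }
  // If percentages under-sum, fall back to the last configured ordering.
  return split.length > 0 ? split[split.length - 1].order : [];
}

// Tries each provider in order and returns the first one that schedules,
// or null when every provider in the chain fails.
function scheduleWithFallback(
  order: Provider[],
  trySchedule: (provider: Provider) => boolean,
): Provider | null {
  for (const provider of order) {
    if (trySchedule(provider)) return provider;
  }
  return null;
}
```

With a split such as 90% `["ATLAS", "FAL"]` and 10% `["FAL", "ATLAS"]`, a new provider ordering can be exposed to a small share of requests and widened over time.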
Invariants
- Credit deductions happen at request time, not completion time.
- NSFW pricing and full-access credit behavior must follow the media credit-cost standard.
- Brain owns generation records; Sirloin owns user-facing media state.
- REFERENCE_TO_VIDEO requires video_prompt, video_model, and video_provider set to WAVESPEED or ATLAS; image_path and video_path are optional input references.
- REFERENCE_TO_VIDEO requires a character body reference and frontal-face reference image before provider scheduling starts.
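The REFERENCE_TO_VIDEO invariants can be expressed as a validation sketch. The request field names come from this document; the `CharacterReferences` shape, the error strings, and `validateReferenceToVideo` itself are hypothetical.

```typescript
interface ReferenceToVideoRequest {
  video_prompt?: string;
  video_model?: string;
  video_provider?: string;
  image_path?: string; // optional pose image
  video_path?: string; // optional source video
}

// Hypothetical: the character references resolved before scheduling.
interface CharacterReferences {
  bodyImagePath?: string;
  frontalFaceImagePath?: string;
}

const ALLOWED_PROVIDERS = ["WAVESPEED", "ATLAS"];

// Returns a list of violated invariants; an empty list means the request
// may proceed to provider scheduling.
function validateReferenceToVideo(
  req: ReferenceToVideoRequest,
  refs: CharacterReferences,
): string[] {
  const errors: string[] = [];
  if (!req.video_prompt) errors.push("video_prompt is required");
  if (!req.video_model) errors.push("video_model is required");
  if (!req.video_provider || ALLOWED_PROVIDERS.indexOf(req.video_provider) === -1) {
    errors.push("video_provider must be WAVESPEED or ATLAS");
  }
  if (!refs.bodyImagePath) errors.push("character body reference is required");
  if (!refs.frontalFaceImagePath) errors.push("character frontal-face reference is required");
  return errors;
}
```

Note that `image_path` and `video_path` are never checked, since both are optional.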
Error Paths
Insufficient credits stop the request before async work starts. Provider or queue failures surface through Brain status and logs. Flank-backed failures must preserve media state and execution logs for inspection.
When Brain blocks or fails a generation for moderation, it records a normalized media failure reason on the generation metadata and Sirloin persists it on media.media.failure_reason. ListMedia exposes that value as Media.media_failure_reason so Brisket can show stable user-facing failed-media messages for underage, celebrity, nudity, illegal, offensive, or generic moderation failures.
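On the Brisket side, mapping the persisted reason to a stable message might look like the sketch below. The document lists the failure categories but not the exact strings stored in media.media.failure_reason, so the `MediaFailureReason` values and the message copy here are placeholders, not the product's actual values.

```typescript
// Hypothetical reason values; the real stored strings are not given in this document.
type MediaFailureReason =
  | "underage"
  | "celebrity"
  | "nudity"
  | "illegal"
  | "offensive"
  | "generic";

// Placeholder copy for illustration only.
const FAILURE_MESSAGES: Record<MediaFailureReason, string> = {
  underage: "Blocked: subject appears underage.",
  celebrity: "Blocked: celebrity likeness detected.",
  nudity: "Blocked: nudity detected.",
  illegal: "Blocked: illegal content detected.",
  offensive: "Blocked: offensive content detected.",
  generic: "Blocked by moderation.",
};

// Maps Media.media_failure_reason to a message, falling back to the generic
// message for missing or unrecognized values.
function failureMessage(reason: string | null | undefined): string {
  if (reason && Object.prototype.hasOwnProperty.call(FAILURE_MESSAGES, reason)) {
    return FAILURE_MESSAGES[reason as MediaFailureReason];
  }
  return FAILURE_MESSAGES.generic;
}
```

Falling back to the generic message keeps the UI stable if Brain later adds a reason that the client does not yet know.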
Tests And Verification
- cd apps/sirloin && make run-tests
- cd apps/brain && pnpm test
- cd apps/flank && pnpm test