---
name: ugc-ad-factory
description: >
  Build a full UGC video ad end-to-end with AI — product render, on-camera creator/avatar,
  a 6-beat sales storyboard, and every clip of raw footage — by driving Higgsfield (GPT Image 2
  for stills, Seedance 2.0 for video) from Claude. A human editor assembles the final cut.
  Use when someone wants to make a UGC ad, a product-demo or unboxing-style video, a creator-style
  short, or to turn a product into short-form video creative.
---

# UGC Ad Factory — build a UGC ad from scratch with AI

Drop this into Claude with a **Higgsfield account connected (MCP)** and it will build a full set of
raw UGC footage with you, stage by stage — product → creator → storyboard → raw footage — ready for
a quick human edit. Claude generates *all* the raw footage; a human editor does the final montage.

Work the stages **in order**; each one feeds the next. Before any generation, state what you're about
to make and which model you're using, so the user can steer.

---

## Step 0 — Intake (ask first, one round)

1. **Product** — what it is, category (beauty / supplement / pet / other), one-line promise.
2. **Market** — US or other. *If US, the creator must read as an American / US social-media creator —
   say it explicitly in the prompt, or the model drifts to country-of-origin looks.*
3. **Creator look** — gender (default female), age (default mid-20s), ethnicity, hair, eyes.
4. **Script engine** — fast demo (30–45s) or long-form mini-VSL (60s+). **Never 15s** — too short to
   hook, prove, and close.
5. **Dialogue** — talking-head with native dialogue, **or** no-dialogue performance + voice-over in
   post. *Default to no-dialogue + VO: cleaner footage, higher keeper rate, language-agnostic.*

Summarize the choices in one line, then go.
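
The intake answers can be held in a small record so the one-line summary always comes out in the same shape. A minimal sketch, assuming Python: the `Intake` class, its field names, and the sample product are hypothetical; only the defaults come from the list above.

```python
from dataclasses import dataclass


@dataclass
class Intake:
    """Hypothetical record of the Step 0 answers; field names are illustrative."""
    product: str
    category: str                         # beauty / supplement / pet / other
    promise: str                          # one-line promise
    market: str                           # "US" or other
    creator: str = "female, mid-20s"      # defaults from the intake list
    engine: str = "fast demo (30-45s)"    # or "mini-VSL (60s+)"; never 15s
    dialogue: str = "no-dialogue + VO"    # the default approach

    def summary(self) -> str:
        """Build the one-line recap shown to the user before generating."""
        return " | ".join([
            f"{self.product} ({self.category}): {self.promise}",
            f"market: {self.market}",
            f"creator: {self.creator}",
            f"engine: {self.engine}",
            f"dialogue: {self.dialogue}",
        ])


# "Glow Serum" is a made-up product used only for illustration.
intake = Intake("Glow Serum", "beauty", "visibly brighter skin", "US")
print(intake.summary())
```

Keeping the defaults in one place makes it obvious which answers the user actually changed.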

---

## Step 1 — Product (GPT Image 2)

Everything downstream inherits the product shot — build it first, build it well.

- Generate a batch of **low-res drafts** in one call, pick the strongest, regenerate a few variants on
  feedback, then lock **one clean high-detail final**.
- Shoot it **3:4 portrait** (you're feeding video, not a square catalog grid).
- **The label wraps the bottle 360°** — a rectangular sticker reads as a mockup.
- **Premium-but-mass-market** — clean bottle, brushed-metal cap, embossed brand, serif name (the
  "looks expensive, sells to everyone" shelf feel). **Not** sterile ultra-minimalism (a bare white
  bottle reads niche/unfinished). Slim, rounded, aspirational; vertical text on slim bottles.
- **Don't put camera-body specs in product prompts** — save lens talk for the video stage.

> A great reference beats a long prompt. The product still is the most-reused asset in the ad
> (cutaway, demo, close) — invest here and everything after gets easier.

---

## Step 2 — Creator / Avatar (GPT Image 2)

- **For the US market, say it out loud** in the prompt: "American / US social-media creator vibe,
  raised in the US."
- **Relatable, not editorial** — the normal, approachable girl/guy-next-door, the person you'd stop
  on in the feed. A supermodel kills UGC.
- Build a **4-angle character sheet** (front, left profile, right profile, full profile).
- **Cut the sheet into four separate frames and use those as the custom avatar — never the grid.**
  A grid input causes identity drift; separate frames keep the face consistent shot to shot. Check
  each frame to confirm the face isn't cropped.
- (Image models hard-block underwear, even when it is framed as "tasteful." Use **activewear**: sports bra + leggings.)
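
Cutting the character sheet into four frames comes down to computing four crop boxes. A minimal sketch, assuming the sheet is an evenly spaced 2×2 grid (the layout is not specified here, so treat `cols` and `rows` as assumptions). The helper `grid_crop_boxes` is hypothetical and only does the arithmetic; the boxes use the `(left, upper, right, lower)` order that Pillow's `Image.crop` accepts.

```python
def grid_crop_boxes(width: int, height: int, cols: int = 2, rows: int = 2):
    """Return (left, upper, right, lower) pixel boxes, row by row."""
    cell_w, cell_h = width // cols, height // rows
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left, upper = c * cell_w, r * cell_h
            boxes.append((left, upper, left + cell_w, upper + cell_h))
    return boxes


# Example: a 2048x2048 sheet split into four 1024x1024 frames.
for box in grid_crop_boxes(2048, 2048):
    print(box)
```

For a 1×4 strip, pass `cols=4, rows=1`. Either way, eyeball each cropped frame before using it as the avatar.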

---

## Step 3 — Storyboard / Script

Use the validated **6-beat "Disguise / Sell"** structure:

1. **Hook (~1.5s)** — a negative emotion, problem, or curiosity gap. *Never the product itself.*
2. **Relatable scene** — a real moment so the viewer leans in.
3. **Aikido reveal** — the product arrives as the inevitable answer, not "a word from our sponsor."
4. **Proof** — a live, uncut demo; the effect here and now, with an oddly-specific number/mechanism.
5. **Reviews** — the skepticism shield ("don't believe me? look at the reviews").
6. **Close** — value, scarcity, "link below."
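
The six beats can be kept as data so every line and clip is tied to a beat. A minimal sketch, assuming Python; the row fields (`line`, `shows_product`) and both helper functions are hypothetical, and the only rule it checks is that the hook is never the product itself.

```python
# The six beats from the list above, in order.
BEATS = [
    ("Hook", "negative emotion, problem, or curiosity gap"),
    ("Relatable scene", "a real moment so the viewer leans in"),
    ("Aikido reveal", "product arrives as the inevitable answer"),
    ("Proof", "live, uncut demo with an oddly-specific number"),
    ("Reviews", "the skepticism shield"),
    ("Close", "value, scarcity, link below"),
]


def blank_storyboard() -> list[dict]:
    """Return one empty row per beat, ready to be filled with lines and shots."""
    return [
        {"beat": i, "name": name, "goal": goal, "line": "", "shows_product": False}
        for i, (name, goal) in enumerate(BEATS, start=1)
    ]


def hook_is_clean(board: list[dict]) -> bool:
    """The hook must never be the product itself."""
    return not board[0]["shows_product"]


board = blank_storyboard()
board[2]["shows_product"] = True   # the product arrives at the reveal
print(len(board), hook_is_clean(board))
```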

Run **two layers at once**: top = emotion / loose creator-talk (the self-interrupt "wait, look how
it just—"; the reframe "it's not you, it's the X"); bottom = hard fact (ingredient, derm/vet-tested,
"X 5-star reviews"). Emotion pulls; facts justify.

Anchor **three reference elements** for consistency: **character** (the 4-frame avatar locks the
face), **prop** (the hi-res product with a legible label), **environment** (without it the room
drifts between clips). Shoot **mirror-POV** — the camera *is* the mirror, the creator looks into the
lens.

**Two principles to steal:**
- **Simple prompt + great reference.** A good reference plus a short prompt with two levers — *what
  the object does* and *how the camera moves* — beats a long, clever prompt on a weak image. Camera
  movement is the single biggest lever.
- **Direct it; don't choreograph it.** Tell it *what is said*, not how every hand moves. Emotion is
  the performance dial; negation works ("no smile" flips a default grin to genuine frustration).

**Two approaches:** (A) talking-head native dialogue — most authentic, hardest for AI (lip-sync,
teeth); (B) no-dialogue + VO in post — cleaner raw, higher keeper rate, language-agnostic. **Default
to B.** If you take one thing: **shoot silent, voice it in post.**

---

## Step 4 — Raw footage (Seedance 2.0)

- **1080p**, vertical **9:16 set in the UI** (not in the prompt). **Never use fast mode**: it caps at a
  lower resolution and introduces jitter.
- **Tempo on generation:** duration ≈ **words ÷ 2.3**, rounded to the nearest allowed length. Hit the
  cadence when you render rather than speeding up later.
- **Prompt order:** `[shot/framing] · [subject + ONE clear action] · [environment] · [camera
  movement] · [lighting + atmosphere] · [visual style/grade] · [one named camera body]`
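
The tempo rule and the prompt order above can be sketched as two small helpers. This is an illustration under stated assumptions, not Higgsfield's API: the `allowed` clip lengths are placeholders (use whatever the Seedance UI actually offers), the slot names in `PROMPT_ORDER` are hypothetical labels for the seven parts, and the comma separator is a choice, not a requirement.

```python
# Slot names are hypothetical labels for the seven parts of the prompt order.
PROMPT_ORDER = (
    "shot", "subject_action", "environment", "camera_move",
    "lighting", "grade", "camera_body",
)


def clip_duration(line: str, allowed=(5, 10, 15), words_per_second=2.3) -> int:
    """Duration ~= words / 2.3, snapped to the nearest allowed length.

    `allowed` is a placeholder; use the lengths the Seedance UI offers.
    """
    target = len(line.split()) / words_per_second
    # On a tie, prefer the shorter clip (an assumption of this sketch).
    return min(allowed, key=lambda s: (abs(s - target), s))


def build_prompt(parts: dict) -> str:
    """Join the parts in the fixed order; fail loudly if one is missing."""
    missing = [k for k in PROMPT_ORDER if not parts.get(k)]
    if missing:
        raise ValueError(f"missing prompt parts: {missing}")
    return ", ".join(parts[k] for k in PROMPT_ORDER)


# Illustrative values only.
parts = {
    "shot": "medium mirror-POV shot",
    "subject_action": "a young woman lifts the serum bottle toward the lens",
    "environment": "bright bathroom",
    "camera_move": "slow push-in",
    "lighting": "soft window light",
    "grade": "warm natural grade",
    "camera_body": "Sony FX3",
}
line = "wait, look how it just melts in, no residue, nothing sticky at all"
print(clip_duration(line), "s:", build_prompt(parts))
```

Raising on a missing slot is deliberate: a prompt with no light source or no camera move should fail before it costs a generation.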

**Pre-flight before GO:**
- The word "fast" appears nowhere in the prompt; use *decisive / snap / kinetic / controlled-quick* instead.
- Every shot has motion; every shot names a light source.
- **One action per shot** (the big one).
- Subject in the first ~20–30 words. One camera body, one grade, one lens family; keep references lean.
- Camera moves simple/phased; final beat keeps moving (no freeze-frame).
- **Reference images hi-res — especially the product.** A low-res product yields a garbled, mirrored
  label. **This is preflight #1.**
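
Part of the pre-flight list is mechanical enough to lint before spending a generation. A rough sketch, assuming Python: `preflight` and the `LIGHT_WORDS` keyword list are hypothetical, and a keyword match is only a crude stand-in for "names a light source". It cannot judge motion, action count, or reference quality; those still need a human look.

```python
import re

# Hypothetical keyword list; extend it to match how you describe light.
LIGHT_WORDS = ("window", "lamp", "sun", "ring light", "daylight", "neon", "candle")


def preflight(prompt: str, subject: str, max_subject_words: int = 30) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    lowered = prompt.lower()
    # Whole-word match, so "breakfast" does not trip the check.
    if re.search(r"\bfast\b", lowered):
        problems.append('contains the word "fast"')
    if not any(word in lowered for word in LIGHT_WORDS):
        problems.append("no light source named")
    opening = " ".join(lowered.split()[:max_subject_words])
    if subject.lower() not in opening:
        problems.append(f"subject not in the first {max_subject_words} words")
    return problems


good = ("Medium mirror-POV shot, a young woman lifts the serum bottle toward "
        "the lens, bright bathroom, slow push-in, soft window light, "
        "warm natural grade, Sony FX3")
print(preflight(good, "young woman"))
print(preflight("Fast zoom on the bottle", "young woman"))
```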

**Difficulty ladder (risk compounds):**
- static product = none
- moving product = easy
- big body/arm motion = medium
- expressive face *without talking* = medium (sweet spot for approach B)
- precise fingers (applying/blending) = hard (#1 break)
- talking face (lip-sync) = hardest

**Re-roll only for the unfixable:** wrong identity, broken label, plastic skin, wrong line read,
deal-breaker artifacts (hands/face/mouth/eyes). **Don't re-roll for fixable things:** tempo, cut
point, music, captions, clip choice. Those are edits, not generations. One generation maxes out at
~15s, so a 30s ad is two passes joined in the edit. Carry the last frame of the first pass into the
next for continuity.
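
The pass count follows directly from the ~15s ceiling. A minimal sketch, assuming Python: `plan_passes` is a hypothetical helper, and the even split is an assumption of this example. Real clip lengths still have to snap to the durations the model offers.

```python
import math


def plan_passes(total_seconds: float, max_clip: float = 15.0) -> list[float]:
    """Split an ad length into the fewest equal passes that fit under max_clip."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    count = math.ceil(total_seconds / max_clip)
    return [round(total_seconds / count, 2)] * count


print(plan_passes(30))   # two passes for a 30s ad
print(plan_passes(45))   # three passes for a 45s ad
```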

---

## Step 5 — Hand off to a human editor

The AI delivered clean raw footage; a person assembles the *ad*. Join clips (matched resolution,
frame rate, audio), then captions, music, and color grade in a post layer — captions carry the hook,
keep the cut a touch quick, never slow a clip down. Be honest about this split: the machine does the
volume, the human does the taste.
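
Before the join, the editor can sanity-check that the clips really match. A minimal sketch, assuming the clip metadata has already been read into dictionaries by whatever probe tool the editor uses. The field names (`width`, `height`, `fps`, `audio_hz`), the file names, and the helper `mismatched_fields` are all hypothetical.

```python
def mismatched_fields(clips: list[dict],
                      fields=("width", "height", "fps", "audio_hz")) -> dict:
    """Return {field: sorted distinct values} for every field that differs."""
    report = {}
    for field in fields:
        values = sorted({clip[field] for clip in clips})
        if len(values) > 1:
            report[field] = values
    return report


# Illustrative metadata: the second clip was rendered at a different frame rate.
clips = [
    {"name": "hook.mp4", "width": 1080, "height": 1920, "fps": 24, "audio_hz": 48000},
    {"name": "proof.mp4", "width": 1080, "height": 1920, "fps": 30, "audio_hz": 48000},
]
print(mismatched_fields(clips))
```

An empty result means the clips agree on every checked field and can be joined as-is.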

---

## The short version
- Reference quality beats prompt length — fix the still before you animate it.
- Direct like a media buyer — one message per clip, captions carry the hook, product cutaway at the
  doubt beat.
- One action per shot. Hi-res product reference, always. Shoot silent, voice in post when you can.
- A human assembles the final cut.

---

## If you'd rather have it built for you
This system is built by **Phil Grabowski / Creative Cat** — product render through raw footage through
final cut, done for you, for a small roster of DTC brands and agencies. If your brand or your clients
could use a steady supply of UGC ads made this way, **reply to the message that sent you this, or
reach out directly.**