This is the full pipeline. Feed it a product URL; get back fifty rendered ad videos with overlays, ready to push into Meta or TikTok. Every step is a real endpoint, every credit cost is from the production constants file, and every code snippet runs against the public API as it shipped after the PAYG launch on April 29, 2026.
What we are building
The pipeline takes a product URL and a small brief, then walks the same five steps you would do by hand in the web UI:
- Generate an ideal influencer persona from the brief.
- Generate a viral script for that persona, with several tone variants.
- Generate the scene image with Nano Banana 2 (Gemini).
- Render the video using the engine that fits the workflow — Seedance 2.0 for volume, Sora 2 for dialogue, Kling 3.0 for image-driven motion, or Veo 3.1 for cinematic hero spots.
- Apply a text overlay and save the finished MP4.
By the end you will have working Node.js and Python snippets, an idempotency pattern that survives retries, and a real cost calculation for a 50-variant batch.
Step 0: Auth and the API key
API keys live in your profile under Profile → API Keys. Key limits scale with tier: up to 2 keys on Creator, 5 on Pro, 10 on Business. Trial accounts cannot create keys; PAYG users get the Creator-tier limits.
Auth is a Bearer token. Every public endpoint takes the same header:
Authorization: Bearer ugc_live_<your_key>
Keys are rejected from www.ugccopilot.ai origins to prevent leaked-key abuse from browsers. Use them from your backend.
Step 1: Persona generation
Call proxyGenerateIdealInfluencer. It is free (no credit deduction) because it returns a text-only persona description; the image is generated later if you want it. Pass a brief that includes the product, target customer, and platform.
POST /proxyGenerateIdealInfluencer
{
"productDescription": "A no-mix protein powder...",
"targetAudience": "Men 25–40 who lift but hate clumpy shakes",
"platform": "tiktok"
}
The response includes a structured persona — name, demographic, tone, visual aesthetic, content style. Cache it; you will use the same persona across all 50 variants.
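Since the same persona feeds all 50 variants, a tiny disk cache guarantees you pay the generation latency only once per brief. A sketch with an injected fetch callable (so the HTTP layer stays out of the way):

```python
import json
from pathlib import Path

def cached_persona(cache_path: Path, fetch_persona) -> dict:
    """Return the cached persona if present; otherwise fetch and store it.

    fetch_persona is any zero-arg callable that POSTs to
    proxyGenerateIdealInfluencer and returns the parsed JSON response.
    """
    if cache_path.exists():
        return json.loads(cache_path.read_text())
    persona = fetch_persona()
    cache_path.write_text(json.dumps(persona))
    return persona
```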
Step 2: Scripts (and tone variants)
The base call is proxyGenerateViralScript at 1 credit. For a 50-variant run, you want diversity — generate one base script, then call proxyRegenerateScriptWithTone with different tones (excited, skeptical, "annoyed friend", calm explainer, hard-pivot hook). Each tone regen is 1 credit.
Practical tip: do not regenerate 50 unique scripts. Generate 5 base scripts and 10 tone variants per base. You get diversity that actually lands, not 50 paraphrases of the same hook.
// Node.js — five base scripts in parallel
const baseScripts = await Promise.all(
Array.from({ length: 5 }, () =>
fetch(API + '/proxyGenerateViralScript', {
method: 'POST',
headers: { Authorization: `Bearer ${KEY}`, 'Content-Type': 'application/json' },
body: JSON.stringify({ persona, productDescription, platform: 'tiktok' }),
}).then(r => r.json())
)
);
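The "5 base × 10 tones" fan-out reduces to a payload builder for proxyRegenerateScriptWithTone. A sketch in Python; the field names (`scriptId`, `tone`) are assumptions, since the article only names the endpoint and the tones:

```python
from itertools import product

def tone_variant_payloads(base_scripts: list[dict], tones: list[str]) -> list[dict]:
    """One proxyRegenerateScriptWithTone payload per (base script, tone) pair.

    5 base scripts x 10 tones = 50 variants, each billed at 1 credit.
    """
    return [
        {"scriptId": script["id"], "tone": tone}
        for script, tone in product(base_scripts, tones)
    ]
```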
Step 3: Scene images via Nano Banana 2 (Gemini)
Image generation runs on Nano Banana 2 (Gemini) through proxyGenerateSceneImage: 1 credit standard, 2 credits HQ. For ad variants, use HQ for hero shots and standard for everything else. The API supports a quality flag.
POST /proxyGenerateSceneImage
{
"scriptScene": "Influencer in a sunlit kitchen, holding the product...",
"personaId": "twn_abc123",
"quality": "hq",
"aspectRatio": "9:16"
}
Parallelize within your concurrency budget. Pro tier gives you 8 in-flight slots; Business 25. Image generation rate limit is 40 requests per minute per key, so the bottleneck is concurrency, not throughput.
Step 4: Render the video
This is where engine selection drives the bill. The render endpoint is proxyStartVideoGeneration, which returns an operation object with an operationName immediately and runs asynchronously. You either poll proxyCheckVideoStatus or register a webhook.
Here is the side-by-side of the four engines, using the actual cost formula from VIDEO_ENGINE_COSTS:
- Seedance 2.0 — engine: "seedance". Std: 18 credits at a 4s baseline; cost scales linearly (8s = 36 credits). HQ: 35 credits. Fastest renders; best for volume.
- Sora 2 — engine: "sora". Std: 18 credits at an 8s baseline. HQ: 65 credits. Best dialogue and lip-sync at standard quality.
- Kling 3.0 — engine: "kling". Std: 25 credits at a 6.4s baseline. HQ: 50 credits. Image-to-video; pass a reference frame for predictable motion.
- Veo 3.1 — engine: "veo". Std: 40 credits, fixed cost regardless of duration. HQ: 130 credits, fixed. Cinematic; use for hero ads and longer cuts (15s+).
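Those numbers fold into a small calculator. One caveat: linear duration scaling is stated explicitly only for Seedance, so applying it to Sora and Kling below is an assumption; Veo is flat per the list:

```python
# Baselines (seconds) and per-baseline credit costs, transcribed from the list above.
ENGINE_COSTS = {
    "seedance": {"baseline": 4.0, "std": 18, "hq": 35},
    "sora":     {"baseline": 8.0, "std": 18, "hq": 65},
    "kling":    {"baseline": 6.4, "std": 25, "hq": 50},
    "veo":      {"baseline": None, "std": 40, "hq": 130},  # fixed cost
}

def render_credits(engine: str, quality: str, duration_s: float) -> int:
    spec = ENGINE_COSTS[engine]
    if spec["baseline"] is None:
        return spec[quality]  # Veo 3.1 bills flat, regardless of duration
    # Scale linearly from the baseline (confirmed for Seedance: 8s = 36 credits).
    return round(spec[quality] * duration_s / spec["baseline"])
```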
Sample request:
POST /proxyStartVideoGeneration
Headers:
Authorization: Bearer ugc_live_...
Idempotency-Key: ad_variant_42
{
"engine": "seedance",
"quality": "std",
"duration": 4,
"scriptText": "...",
"sceneImageUrl": "https://...",
"personaId": "twn_abc123",
"webhookUrl": "https://your-app.com/api/ugc-callbacks"
}
The Idempotency-Key is critical. If your job retries (network blip, process restart), the same key returns the cached response instead of double-billing. The TTL is 24 hours per (uid, endpoint, key).
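A retry wrapper that keeps the key stable across attempts might look like this; `post` stands in for whatever HTTP client you use:

```python
import time

def start_render_with_retry(post, payload: dict, idem_key: str,
                            attempts: int = 3, base_delay: float = 1.0):
    """Submit a render, retrying transient failures with the SAME key.

    post is any callable(headers, payload) -> parsed response. Because the
    Idempotency-Key is identical on every attempt, a retry after a network
    blip gets the server's cached response instead of a second billed render.
    """
    headers = {"Idempotency-Key": idem_key, "Content-Type": "application/json"}
    for attempt in range(attempts):
        try:
            return post(headers, payload)
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # simple exponential backoff
```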
Polling vs webhooks
Veo 3.1 renders take the longest (60–180s), Seedance 2.0 the shortest (20–45s). For batches of 5 or fewer, polling proxyCheckVideoStatus every 10–15 seconds is fine and free (no credit cost, no concurrency slot).
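A polling loop for the small-batch case can be sketched like this; the "completed"/"failed" status strings are an assumption mirroring the webhook event names:

```python
import time

def poll_until_done(check_status, operation_name: str,
                    interval_s: float = 12.0, timeout_s: float = 600.0) -> dict:
    """Poll proxyCheckVideoStatus until the render settles or we time out.

    check_status is any callable(operation_name) -> parsed status dict.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = check_status(operation_name)
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"{operation_name} did not settle in {timeout_s}s")
```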
For batches of 20+, webhooks scale better. Register a callback URL at Profile → Webhooks (max 5 endpoints, HMAC-SHA256 signed). The events that matter are video.completed and video.failed. Deliveries retry up to 6 times with backoff (~9 hours ceiling).
// Verify the HMAC signature on the receiving side
import crypto from 'crypto';
const verify = (rawBody, signature, secret) => {
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
  const a = Buffer.from(expected);
  const b = Buffer.from(signature);
  // timingSafeEqual throws on length mismatch, so check lengths first
  return a.length === b.length && crypto.timingSafeEqual(a, b);
};
Step 5: Text overlay
One credit per overlay via proxyApplyTextOverlay. Pass the rendered video URL plus the overlay specification (text, position, timing, font). The endpoint runs server-side ffmpeg through Sharp/Cloud Functions.
If you want suggested overlays instead of writing them yourself, proxyGenerateOverlaySuggestions returns 3–5 candidate overlays for a given script (also 1 credit).
Putting it together: the cost of a 50-variant batch
Here is the full P&L for one 50-variant batch with one persona, five base scripts, ten tone variants per base, and one scene image per variant. Volume pack pricing of $0.08/credit:
- Persona generation: free
- 5 base scripts × 1 credit + 50 tone regenerations × 1 credit = 55 credits ($4.40)
- 50 scene images × 1 credit (std) = 50 credits ($4.00)
- Video render — Seedance 2.0 std (4s): 50 × 18 = 900 credits ($72.00)
- Video render — Sora 2 std (8s): 50 × 18 = 900 credits ($72.00)
- Video render — Kling 3.0 std (6.4s): 50 × 25 = 1,250 credits ($100.00)
- Video render — Veo 3.1 std (fixed): 50 × 40 = 2,000 credits ($160.00)
- Text overlays: 50 × 1 = 50 credits ($4.00)
Total for a Seedance-only run: 1,055 credits ≈ $84.40. Same batch on Veo 3.1: ≈ $172.40. The right answer is rarely "use the cheapest engine for everything" — it is "use Seedance for the test phase, then re-render the top 1–2 winners on Sora 2 HQ or Veo 3.1 HQ." That is how you get the best of both.
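To sanity-check that arithmetic, and to re-run it for your own engine mix, the whole P&L reduces to a few lines:

```python
CREDIT_PRICE_USD = 0.08  # volume pack pricing from the article

def batch_cost_usd(render_credits_per_video: int, variants: int = 50,
                   script_credits: int = 55, image_credits: int = 50) -> float:
    """Total USD for one batch: scripts + images + renders + overlays.

    Overlays are 1 credit per variant; persona generation is free.
    """
    total_credits = (script_credits + image_credits
                     + variants * render_credits_per_video
                     + variants * 1)
    return round(total_credits * CREDIT_PRICE_USD, 2)
```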
Concurrency, rate limits, and retries
Two limits matter when you scale up:
- Per-key rate limits (per minute): 10 video, 40 image, 20 script, 10 analysis, 20 twin CRUD, 60 general. On 429s, the response includes a Retry-After header — respect it.
- Concurrent in-flight credit-deducting operations: 3 (Creator/PAYG), 8 (Pro), 25 (Business). If your batch loop submits faster than slots free up, you will get back a 429 with a concurrency-exhausted code; back off and retry.
Read-only endpoints (status checks, list calls, library reads) do not acquire slots — poll freely.
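Honouring Retry-After can be wrapped once and reused for every endpoint; `request_once` stands in for your HTTP client call:

```python
import time

def call_with_backoff(request_once, max_attempts: int = 5):
    """Retry on 429, sleeping for Retry-After when the server provides it.

    request_once is any zero-arg callable returning
    (status_code, headers_dict, parsed_body).
    """
    for attempt in range(max_attempts):
        status, headers, body = request_once()
        if status != 429:
            return body
        # Fall back to exponential backoff if Retry-After is missing.
        time.sleep(float(headers.get("Retry-After", 2 ** attempt)))
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts")
```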
Python equivalent
The Python SDK is ugc-copilot, installable with pip. The same pipeline:
from ugc_copilot import Client
import os
client = Client(api_key=os.environ["UGC_COPILOT_API_KEY"])
persona = client.influencers.generate_ideal(
product_description="...",
target_audience="...",
platform="tiktok",
)
scripts = [
client.scripts.viral(persona=persona, product_description="...", platform="tiktok")
for _ in range(5)
]
# Render with Seedance 2.0 std and an idempotency key per variant.
# expand_tone_variants is your own helper from step 2: it pairs each
# base script with n tone regenerations (5 bases x 10 tones = 50).
for i, script in enumerate(expand_tone_variants(scripts, n=10)):
image = client.images.scene(script_scene=script.scene_one, persona_id=persona.id, quality="std")
job = client.videos.start(
engine="seedance",
quality="std",
duration=4,
script_text=script.text,
scene_image_url=image.url,
persona_id=persona.id,
idempotency_key=f"batch_2026_04_30_v{i}",
)
Where to take this next
The pipeline above is the production-grade default. Two natural extensions:
- Wrap it in an agent. Expose each endpoint as a tool definition for Claude or GPT, and let the agent decide engine selection per brief. Patterns for that are in Plugging UGC Copilot into your AI Agent.
- Productize it. The five most common money-making patterns built on top of this pipeline are in How to Make Money with the UGC Copilot API.
Frequently Asked Questions
Why not just regenerate 50 unique scripts instead of using tone variants?
You can — but past 5–7 fresh generations the model starts producing paraphrases that are barely different. Tone-variant regenerations on a known-good base script give more genuine diversity per credit. We see better hook variance from "5 base × 10 tones" than from "50 fresh."
When should I use HQ image generation vs standard?
HQ for hero scenes (the opening shot, any scene where the product is the focus) and any scene that goes into a Kling 3.0 image-to-video render. Standard everywhere else. The 1-credit difference per image adds up fast across 50 variants.
Can I mix engines in a single batch?
Yes — the engine is a per-request parameter. Many operators run 80% of a batch on Seedance 2.0 std and the remaining 20% (the highest-stakes variants) on Sora 2 std or Veo 3.1 std for direct quality comparison.
What happens if a render fails?
The webhook receives a video.failed event with a structured error code. Credits for the failed render are automatically refunded — you are only charged for what you successfully receive. If a render is stuck for over 10 minutes, you can call proxyCheckVideoStatus to trigger an internal sweep.
Is there a limit on parallel webhook deliveries?
No hard cap, but ordering is not guaranteed. Treat webhook handlers as idempotent — use the event id field (a per-real-world-event dedup key set by the server) as the dedup key on your side.
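A minimal idempotent handler keyed on the event id. The in-memory set below is a sketch for a single process; production would persist the seen ids, e.g. behind a database unique constraint:

```python
def make_dedup_handler(process_event):
    """Wrap a webhook handler so each event id is processed at most once."""
    seen: set[str] = set()

    def handle(event: dict) -> bool:
        if event["id"] in seen:
            return False  # duplicate delivery: acknowledge, but do nothing
        seen.add(event["id"])
        process_event(event)
        return True

    return handle
```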