A reference video is the source clip that drives AI video analysis or motion transfer in clone-video workflows. The reference is uploaded by the user and tells the model what to copy — structural patterns (hooks, pacing, scene composition) for video analysis, or actual motion (gestures, camera path) for Motion Control.
In UGC Copilot's clone-video mode, the reference video has specific constraints:
- Format: MP4, WebM, or MOV
- Length: ≤60 seconds for deep analysis; ≤30 seconds of usable motion for Kling 2.6 Motion Control in video orientation; ≤10 seconds for image orientation
- Hosting: must be on Firebase Storage (the reference URL is validated server-side as an SSRF guard against arbitrary external URLs)
- Subject: a single dominant subject works best; multi-person reference clips confuse motion transfer
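These constraints can be checked before submission. Below is a minimal sketch of such a pre-flight validator; the function name `validate_reference`, the host allowlist, and the mode keys are assumptions for illustration, not UGC Copilot's actual API.

```python
from urllib.parse import urlparse

# Hypothetical limits mirroring the constraints listed above.
ALLOWED_EXTENSIONS = {".mp4", ".webm", ".mov"}
ALLOWED_HOSTS = {"firebasestorage.googleapis.com"}  # assumed default Firebase Storage host
MAX_SECONDS = {"analysis": 60, "motion_video": 30, "motion_image": 10}

def validate_reference(url: str, duration_s: float, mode: str) -> list[str]:
    """Return a list of constraint violations; an empty list means the reference is usable."""
    problems = []
    parsed = urlparse(url)
    # SSRF guard: only accept HTTPS URLs on the known storage host.
    if parsed.scheme != "https" or parsed.hostname not in ALLOWED_HOSTS:
        problems.append("reference must be hosted on Firebase Storage (SSRF guard)")
    # Format check on the URL path (query strings are excluded by urlparse).
    if not any(parsed.path.lower().endswith(ext) for ext in ALLOWED_EXTENSIONS):
        problems.append("format must be MP4, WebM, or MOV")
    # Per-mode duration cap.
    limit = MAX_SECONDS[mode]
    if duration_s > limit:
        problems.append(f"{mode} references are capped at {limit}s")
    return problems
```

Returning a list of violations rather than raising on the first failure lets a UI surface every problem with the clip at once.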
The reference clip does most of the work in a Motion Control render. Reference quality bounds output quality — clean, well-lit, motion-forward references produce clean clones; chaotic or static references produce poor ones.
## Picking a usable reference
The reference selection step is the single highest-leverage decision in a clone-video workflow. A practitioner's checklist:
- Single creator on screen, well-lit, vertically framed
- Motion-forward content (dance, gesture, walking shot, product demonstration)
- Trim to lead with the most representative motion segment
- Match output aspect ratio (9:16 in, 9:16 out)
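The aspect-ratio item in the checklist is easy to verify from a clip's pixel dimensions (e.g. as reported by a probe tool). A minimal sketch, assuming a tolerance-based comparison; `matches_aspect` is a hypothetical helper, not part of any real API:

```python
def matches_aspect(width: int, height: int,
                   target_w: int = 9, target_h: int = 16,
                   tol: float = 0.02) -> bool:
    """True when the clip's width:height ratio is within `tol` of the target (9:16 by default)."""
    return abs(width / height - target_w / target_h) <= tol

# A 1080x1920 vertical clip matches 9:16 exactly; a 1920x1080 landscape clip does not.
```

The small tolerance absorbs rounding from odd resolutions (e.g. 720×1280 vs 608×1080) without letting a landscape clip pass as vertical.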
What ruins a reference: multiple people in frame, heavy text overlay obscuring the subject, drastic lighting changes, long static segments, mismatched aspect ratio.