Hi Apple Developer Forums and Final Cut Pro team,
I am developing a Final Cut Pro Workflow Extension focused on speech-to-text / subtitle recognition and generation. The extension runs inside Final Cut Pro, analyzes clips, generates accurate subtitles (often hundreds of individual elements), and allows users to drag the generated subtitles directly back into the FCP timeline as a clean, editable Compound Clip or Storyline.
We are implementing this drag-and-drop functionality using the official pasteboard mechanism (com.apple.finalcutpro.xml and versioned types such as com.apple.finalcutpro.xml.v1-10).
While the high-level documentation is helpful:
Supporting Drag and Drop for Data Sent to Final Cut Pro
FCPXML Reference
Designing Workflow Extensions
There is still no detailed public specification for the exact internal XML structure that Final Cut Pro expects for a drag operation to reliably result in a usable Compound Clip (or direct Storyline insertion), especially when dealing with large numbers of subtitle titles.
After extensive systematic testing (multiple rounds over several weeks, with full experiment logs), we have observed the following:
Short subtitle sequences work with many different structures (various combinations of , , , inline titles, etc.).
Long subtitle lists (800+ individual elements) only succeed reliably when the outer structure uses a specific shell: root= containing + (often combined with a mainflow sequence and inner / layers).
In all working cases, the dropped result appears as a “fake” / nested Compound Clip that requires 2–3 Break Apart (unpack) operations before the real editable Storyline with individual titles is revealed.
Almost all other structures — pure as root, as root, wrapper layers, direct inline titles without the clip + gap + storyline shell, etc. — are immediately rejected by Final Cut Pro when the subtitle count is high.
This undocumented behavior forces third-party Workflow Extension developers to engage in time-consuming blind guesswork and reverse-engineering just to achieve basic, reliable drag-and-drop integration.
Our request:
We kindly ask Apple to publish a detailed, official specification for the Draggable FCPXML Text Protocol (or expand the existing FCPXML Reference) that clearly defines:
The minimal and recommended XML structure for dragging content into the timeline as a Compound Clip or Storyline.
Exact roles and requirements for , , , , , and any implicit “mainflow” patterns.
Best practices for handling large numbers of nested titles/subtitles.
Reasons why certain nesting patterns are rejected or produce multi-level fake compounds.
Any version-specific differences across FCPXML DTD versions.
This specification is critically necessary for the FCPX Workflow Extension ecosystem. Reliable drag-and-drop from extensions back to the timeline is one of the most valuable integration points for subtitle/caption tools, transcription services, title generators, and other workflow utilities. Without clear guidelines, developers waste significant time on trial-and-error, leading to inconsistent user experiences and slower innovation in the Final Cut Pro community.
We are more than happy to share our complete experiment logs, working and failing XML samples, and GitHub repository with the documentation or engineering team if it helps accelerate this.
Thank you in advance for any official clarification or guidance. Clear documentation in this area would greatly benefit both developers and Final Cut Pro users.
0
0
13