Over-Engineering a Rocket League Montage Generator with Go and OCR

Author Thorn Hall

VIEW THE RESULT HERE: VIDEO

The Problem

The game I play the most is Rocket League. Sometimes, when I'm playing, I do something cool and I clip it.

"Clipping it" in this context means I use the Xbox Game Bar on my PC to save the last 30 seconds of footage. The clip then gets saved to a specific folder on my PC.

The ultimate goal of clipping it is to share it. Here is the issue though:

  • Only about 12 seconds of that clip are relevant (the setup + the goal).
  • No one has the attention span for 30-second clips (myself included).
  • I'd need to manually trim the clip to get it to an acceptable length.
  • I've been playing Rocket League for years, but I've never gone back and trimmed my clips manually.
  • I now have over 300 un-trimmed, 30-second videos sitting on my hard drive.

The Solution

I wanted to automate trimming these 30-second clips down to the relevant ~11-12 seconds. There was no chance I'd ever go through all 300+ clips myself. More importantly, I wanted to automate creating a montage of these trimmed clips, complete with transitions and music, without ever opening video editing software. I don't know how to edit videos, but I do know how to code.

I decided to over-engineer a solution using Go, OCR, and FFmpeg.

Clip Trimming: The Goal Detector

The first challenge was automatically finding where the goal happened in the video file.

I realized that Rocket League is consistent: when a goal is scored, the text [PLAYERNAME] SCORED always appears in the exact center of the screen. If I could read that text programmatically, I would know the exact timestamp of the goal.

Why Apple Vision?

My first thought was to use Tesseract (an open-source OCR engine), but it’s notoriously slow and heavy to bundle. Since I’m developing on a Mac, I realized I could tap into the native Apple Vision framework (the same tech that lets you copy text from photos on your iPhone). It is blazingly fast and highly accurate.

To accomplish this, my Go program does the following:

  1. Frame extraction: ffmpeg pulls one frame per second (1 FPS) from the video.
  2. Cropping: Instead of scanning the whole image, we crop a small region in the center where the text appears. This reduces noise and speeds up processing.
  3. OCR via CGO: I wrote a small Objective-C bridge to pass these frames from Go to Apple's VNRecognizeTextRequest.
  4. Fuzzy Matching: OCR isn't always perfect (e.g., it might read "SC0RED" or "SCOR ED"). I implemented a Levenshtein distance check to fuzzy match the word "SCORED."
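
Steps 1, 2, and 4 above can be sketched roughly as follows. This is a minimal illustration, not the code from the project: the function names (frameArgs, levenshtein, looksLikeScored), the crop size of iw/3 by ih/6, and the sliding-window matching strategy are all assumptions made for the example.

```go
package main

import "strings"

// frameArgs builds FFmpeg arguments that sample one frame per second and keep
// only a centered crop. The crop size (a third of the width, a sixth of the
// height) is an assumed value; the crop filter centers the region by default.
func frameArgs(video, outPattern string) []string {
	return []string{"-i", video, "-vf", "fps=1,crop=iw/3:ih/6", outPattern}
}

func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein returns the edit distance between a and b, where insertions,
// deletions, and substitutions each cost 1.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// looksLikeScored reports whether any 6-character window of the OCR text is
// within maxDist edits of "SCORED". Whitespace is removed first so that a
// split reading such as "SCOR ED" collapses into one word.
func looksLikeScored(ocrText string, maxDist int) bool {
	const target = "SCORED"
	text := []rune(strings.Join(strings.Fields(strings.ToUpper(ocrText)), ""))
	n := len(target)
	if len(text) < n {
		return levenshtein(string(text), target) <= maxDist
	}
	for i := 0; i+n <= len(text); i++ {
		if levenshtein(string(text[i:i+n]), target) <= maxDist {
			return true
		}
	}
	return false
}
```

With maxDist set to 1, both "SC0RED" and "SCOR ED" match. The threshold is a tuning knob: a looser value tolerates worse OCR output but also raises the odds of matching unrelated on-screen text.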

Once a match is found at timestamp $T$, the program tells FFmpeg to slice the video from $T - 11s$ to $T$.
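
As a rough sketch of that slicing step, the helper below turns a detected goal time into FFmpeg arguments. The name trimArgs, the clamp to zero, and the choice to re-encode with libx264 and aac are assumptions for illustration; the post does not say which encoder settings were used.

```go
package main

import "strconv"

// trimArgs builds FFmpeg arguments that cut the window [goalSec-leadSec, goalSec]
// out of input. If the goal happened less than leadSec into the clip, the
// window starts at 0 instead of at a negative timestamp.
func trimArgs(input, output string, goalSec, leadSec float64) []string {
	start := goalSec - leadSec
	if start < 0 {
		start = 0
	}
	secs := func(v float64) string {
		return strconv.FormatFloat(v, 'f', 3, 64)
	}
	return []string{
		"-y",               // overwrite the output if it already exists
		"-ss", secs(start), // seek to the start of the window
		"-i", input,
		"-t", secs(goalSec - start), // keep everything up to the goal
		// Re-encoding is an assumption here; stream copy would be faster,
		// but its cuts generally land on keyframes rather than exact times.
		"-c:v", "libx264", "-c:a", "aac",
		output,
	}
}
```

For a goal detected at $T = 25s$, this yields a seek to 14.000 and a duration of 11.000.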

The Montage: Fighting FFmpeg

After trimming the clips, I wanted to merge them into a single highlight reel. This turned out to be the hardest part.

Attempt 1: The "Concat" Demuxer

My first attempt used FFmpeg's concat demuxer. It was nearly instantaneous, but the result was jarring. It was just "Hard Cut" -> "Hard Cut" -> "Hard Cut." It felt less like a montage and more like a slideshow of chaos.
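
For reference, the concat demuxer reads a small text file listing the inputs, one per line. A minimal sketch of generating that file and the matching arguments might look like this; concatList and concatArgs are hypothetical helper names, and the use of stream copy (-c copy) is an assumption that fits the near-instant runtime described above.

```go
package main

import "strings"

// concatList renders the list file consumed by FFmpeg's concat demuxer:
// one `file '<path>'` directive per clip.
func concatList(paths []string) string {
	var sb strings.Builder
	for _, p := range paths {
		// Inside single quotes, a literal ' has to be written as '\''.
		escaped := strings.ReplaceAll(p, "'", `'\''`)
		sb.WriteString("file '" + escaped + "'\n")
	}
	return sb.String()
}

// concatArgs builds the FFmpeg arguments for joining the listed clips.
// "-safe 0" lets the list contain absolute paths; "-c copy" joins the
// streams without re-encoding, which is why the result is a hard cut.
func concatArgs(listPath, output string) []string {
	return []string{"-f", "concat", "-safe", "0", "-i", listPath, "-c", "copy", output}
}
```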

Attempt 2: Complex Filters & Transitions

I decided to use FFmpeg's xfade (video transition) and acrossfade (audio transition) filters to add a "Wipe Left" effect between every clip.

This introduced a new problem: argument list explosion. Chaining 300 clips into a single FFmpeg command produced a filter graph string so large that the command failed to launch with Argument list too long.
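
To see why the string grows so quickly, here is a sketch of how such a filter graph could be assembled. Every extra clip adds one xfade and one acrossfade stage, and each xfade needs an offset equal to the length of the merged stream so far minus the transition length. The function name xfadeGraph and the stream labels are made up for this example, and it assumes every input has both a video and an audio stream with matching resolution and frame rate.

```go
package main

import (
	"fmt"
	"strings"
)

// xfadeGraph builds a -filter_complex string that chains len(durations)
// inputs with a "wipeleft" video transition and an audio crossfade.
// durations holds each clip's length in seconds; fade is the transition
// length in seconds. The final outputs are labeled [vN] and [aN], where
// N is len(durations)-1.
func xfadeGraph(durations []float64, fade float64) string {
	if len(durations) < 2 {
		return ""
	}
	var parts []string
	prevV, prevA := "[0:v]", "[0:a]"
	elapsed := durations[0] // length of the merged stream so far
	for i := 1; i < len(durations); i++ {
		outV := fmt.Sprintf("[v%d]", i)
		outA := fmt.Sprintf("[a%d]", i)
		// The transition starts `fade` seconds before the merged stream ends.
		offset := elapsed - fade
		parts = append(parts,
			fmt.Sprintf("%s[%d:v]xfade=transition=wipeleft:duration=%.3f:offset=%.3f%s",
				prevV, i, fade, offset, outV),
			fmt.Sprintf("%s[%d:a]acrossfade=d=%.3f%s", prevA, i, fade, outA),
		)
		prevV, prevA = outV, outA
		// Each overlap shortens the merged stream by one fade length.
		elapsed += durations[i] - fade
	}
	return strings.Join(parts, ";")
}
```

Three 11-second clips with a 1-second fade produce transitions at offsets 10 and 20. With 300 clips the graph has 598 stages, on top of 300 separate -i arguments.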

The Final Solution: Recursive Batching

To solve this, I wrote a recursive batching system in Go:

  1. Chunking: The program splits the 300 clips into batches of 15.
  2. Rendering: It renders each batch of 15 clips into a mini-montage using hardware acceleration (h264_videotoolbox on Mac), which sped up encoding from ~45 minutes to ~3 minutes.
  3. Stitching: Finally, it stitches the pre-rendered batches together into the final video.
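
The three steps above can be sketched as a small recursive reduction. This is an illustration under assumptions rather than the project's actual code: the render callback stands in for the FFmpeg invocation, and the names chunk, renderFunc, and montage, along with the output file naming, are hypothetical.

```go
package main

import "fmt"

// chunk splits items into consecutive batches of at most size elements.
func chunk(items []string, size int) [][]string {
	var out [][]string
	for size > 0 && len(items) > 0 {
		n := size
		if len(items) < n {
			n = len(items)
		}
		out = append(out, items[:n])
		items = items[n:]
	}
	return out
}

// renderFunc merges clips (with transitions) into a single file at out.
// In a real pipeline this would shell out to FFmpeg.
type renderFunc func(clips []string, out string) error

// montage reduces clips to one file: it renders every batch, then calls
// itself on the rendered outputs until only a single file remains.
func montage(clips []string, size, level int, render renderFunc) (string, error) {
	if len(clips) == 0 {
		return "", fmt.Errorf("no clips to render")
	}
	if len(clips) == 1 {
		return clips[0], nil
	}
	if size < 2 {
		return "", fmt.Errorf("batch size must be at least 2, got %d", size)
	}
	var merged []string
	for i, batch := range chunk(clips, size) {
		out := fmt.Sprintf("level%d_batch%03d.mp4", level, i)
		if err := render(batch, out); err != nil {
			return "", err
		}
		merged = append(merged, out)
	}
	return montage(merged, size, level+1, render)
}
```

With 300 clips and a batch size of 15, the first pass produces 20 mini-montages, the second pass produces 2, and the third produces the final file. No single FFmpeg command ever sees more than 15 inputs.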

The Finishing Touches

A montage isn't complete without music and a place to watch it.

  1. Audio mixing: I wrote a function that scans a directory for music, shuffles the playlist, and uses FFmpeg's acrossfade to mix the songs into a continuous radio station, fading them out exactly when the video ends.
  2. The web player: I uploaded the result to a DigitalOcean Space and built a custom HTML5 player. I used HLS (HTTP Live Streaming) to ensure it loads instantly on mobile, and styled the UI with CSS to match the neon, cyberpunk aesthetic of Rocket League.
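
The audio mixing step can be sketched along these lines. The track type and the helpers shuffled, tracksNeeded, and fadeOutFilter are hypothetical names for this example, and using FFmpeg's afade filter for the final fade-out is an assumption; the post only says the music fades out when the video ends.

```go
package main

import (
	"fmt"
	"math/rand"
)

// track is a song on disk together with its length in seconds.
type track struct {
	Path    string
	Seconds float64
}

// shuffled returns a shuffled copy of tracks, leaving the input untouched.
func shuffled(tracks []track, rng *rand.Rand) []track {
	out := append([]track(nil), tracks...)
	rng.Shuffle(len(out), func(i, j int) { out[i], out[j] = out[j], out[i] })
	return out
}

// tracksNeeded returns how many tracks from the front of the playlist are
// required to cover videoLen seconds, given that each consecutive pair
// overlaps by crossfade seconds. If the whole playlist is too short, it
// returns len(playlist).
func tracksNeeded(playlist []track, videoLen, crossfade float64) int {
	total := 0.0
	for i, t := range playlist {
		total += t.Seconds
		if i > 0 {
			total -= crossfade
		}
		if total >= videoLen {
			return i + 1
		}
	}
	return len(playlist)
}

// fadeOutFilter returns an afade filter whose fade ends at videoLen.
func fadeOutFilter(videoLen, fade float64) string {
	start := videoLen - fade
	if start < 0 {
		start = 0
	}
	return fmt.Sprintf("afade=t=out:st=%.3f:d=%.3f", start, fade)
}
```

For a 300-second montage with a 3-second fade, fadeOutFilter produces afade=t=out:st=297.000:d=3.000, so the music reaches silence on the last frame.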

Conclusion

Was it worth spending 8 hours writing code to avoid 2 hours of video editing?

Absolutely. Not only did I save my clips from digital purgatory, but I now have a pipeline that can generate a new season highlight reel every week while I sleep.

VIEW THE RESULT HERE: VIDEO
