Proof of concept code to test distributed transcoding idea

transcoding-poc

Built by copying a lot of code from https://git.walros.dev/marcsello/stream-poc2. Make sure to check that out first. This does mostly the same thing, with one big difference: transcoding is done "remotely". Also, the on-disk format is not compatible!

Components

transcoding-poc is made up of three components, instead of the two in stream-poc2 (and in stream-poc2 those two components were compiled into one binary...). Each component is compiled into a separate binary.

Server

Server is the largest component, but it does seemingly little.

  • Lists media
  • Keeps track of transcoders
    • It has an internal state of all transcoders
    • It tries not to overcommit the transcoders by using a dumb reserve-free algorithm
  • Serves media
    • If the original format is requested, serve the original
    • If a different format is requested, look it up in the cache
      • If available in the cache: serve it from the cache
      • If not: pipe it through one of the transcoders, then save the result in the cache

Usage:

bin/server <db path> <transcoder address 1> {transcoder address N...}

By default, it binds to port 8080; this can be overridden with the PORT environment variable.

Ingest

(formerly import)

This is how new media gets into the db. It probes the new media with ffprobe, then splits it into separate segments per stream. This does not use the remote transcoders; it calls ffmpeg and ffprobe directly.

Usage:

bin/ingest -db <db path> -id <new media id> -path <path to the media>

Transcoder

This is basically just FFmpeg wrapped in an HTTP server.

Usage:

bin/transcoder -listen <listen address> -transcoder <transcoder pipeline>

Valid transcoder pipelines:

  • nvidia (nvdec+cuda+nvenc)
  • vaapi
  • software (libx264, libx265)

Endpoints:

  • /ping returns 200 OK
  • /config returns the config and the supported profiles
  • /status returns some internal status
  • /job this is where the magic happens: the video is streamed as the request body, and the transcoded video stream is streamed back as the response body.

Testing manually using curl:

curl -v -X POST -H 'X-Transcoder-Profile: 144p-hevc' -H 'X-Transcoder-Source: {"codec":"libx264","resolution":{"width":1920,"height":1080}}' --data-binary "@s1.ts" 'localhost:8090/job' > /tmp/asd.ts

The transcoder also has an internal limiter to cap the number of concurrent jobs it takes. When the limit is reached, incoming jobs are kept waiting for a while, but if they keep coming, it begins dropping them (once there are more than capacity*4 jobs in the system).

Changes from stream-poc2

  • The on-disk format of the db changed: the metadata format changed, and the original video stream is now in a subfolder named after the original profile.
  • Components are now compiled into separate binaries.
  • import was renamed to ingest.
  • The video stream is now transcoded to AVC during ingest if it's not in a compatible format (HEVC/AVC).
  • The cache is no longer pre-transcoded during import; instead it is populated automatically by saving the segments transcoded while the media is being watched.
  • Fixes around the codec profile.
  • Proper handling of non-16:9 media.
  • The master playlist now only offers transcoded streams that are "smaller" than the original media.
  • Support for three transcoding pipelines with scaling: nvidia, vaapi, software.
  • Added a back button on the media page.
  • The preset is no longer exposed for the user to change on the fly.
  • HTML files are now compiled into the server binary, so they do not need to be present on the filesystem at runtime.
  • Added some more ffmpeg arguments to tune for low-latency transcoding in general.

db structure:

  • db/ Root folder for the "db".
    • {id}/ Folder for each imported media, identified by "id".
      • meta.json All metadata used by the server to handle the video.
      • segmentationResult.json Output of the segmentation process stored for debugging purposes.
      • segments_video_{profile}.csv Output of the segment data for the video stream. Used during import, kept for debugging.
      • segments_audio_{idx}.csv Output of the segment data for the audio stream identified by "idx". Used during import, kept for debugging.
      • video/ Folder for the video segments in the original format.
        • {profile}/ Subfolder with the name of the original profile.
          • s{i}.ts Video segment number "i".
      • audio/ Folder for the audio segments in mp3 format.
        • {idx}/ Folder for the audio segments for the audio stream identified by "idx".
          • s{i}.ts Audio segment number "i".
      • cache/ Cache folder for pre-transcoded video segments.
        • video/ Video cache folder.
          • {profile}/ Folder for a specific transcoding profile.
            • s{i}.ts Transcoded video segment number "i".

Known issues

There is little to no error handling on the transcoding side: if transcoding fails (or is sometimes even canceled), it will happily write corrupted or even empty files into the cache, breaking the player. The best you can do in this case is delete the contents of the cache.

Results

What is a PoC without results?

I tested it locally, did some empirical measurements, and it seems like there is little added latency. Both the transcoder and the server log how long a transcode took from their perspective, and these values are usually very close, differing by only a few milliseconds.

It would be interesting to test this by putting the server and the transcoder on different machines with real network between them...