So, as you might have gathered, I now work for a company called Aframe (lower-case f. Very fussy about that. As if I didn’t spend enough time correcting people’s usage of my name). We do video stuff.
Of course, muggins here gets to be the one responsible for an awful lot of that video stuff. We have in-house expertise, but they’re largely people with a huge amount of video experience and no programming experience. So this has been a bit of a learning experience for me, and it’s far from over yet.
Anyway, one of the things we do is convert all the video we get into a version you can view on the website. It’s not really our major selling feature, but it’s the basic feature on which we build a hell of a lot of other stuff.
We’ve had this collection of video from a particular customer for a while with the embarrassing feature that the version we were producing for the website had no sound. I couldn’t figure it out at all – the original had 8 audio channels, which seemed likely to be the source of the problem, but no matter which one I played in VLC it was always silent.
Actually… it turns out that was just VLC lying to us and not switching channels. Sigh. (I’d file a bug report, but channel switching clearly works in other contexts and we can’t share the video, so I don’t know how best to do that.)
We revisited this yesterday and verified that there are in fact perfectly good audio tracks on channels 3 and 4, but the other 6 are silent and we were defaulting to tracks 1 and 2. The camera in question has several different ways of taking audio in, and which tracks are non-silent depends on which audio inputs you use. Of course, all the channels are there regardless of what you do, and there’s no metadata (that I could find) marking which ones contain sound.
It’s easy enough for our transcode process to select the right tracks, but combining the audio tracks is harder (some transcoders support it, some don’t). So ideally what we would do is just remove the silent audio tracks (you probably guessed this from the title).
But how do you do that? There’s no metadata saying which are silent, so you have to work out from the audio itself which tracks are silent and which contain sound. It has to be automated (listening to it is no good), and it has to work with the audio tracks in an arbitrary codec.
I pondered this for a little while and came up with a solution. It’s duct tape programming in the extreme, but actually it works very nicely:
We have no metadata. So we have to use the data. We don’t know what the format is, so we have to convert it to a standard one. What standard format would make it really easy to detect whether a track is silent? Well… an uncompressed one. Pick a random simple uncompressed audio format? Let’s try wav.
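This minimal Python sketch illustrates the conversion step. It runs ffmpeg once per audio track, and each run decodes a single stream to its own wav. Several details are assumptions, not the code from our system: the helper names (wav_extract_command, extract_tracks), the output file naming, and the use of current ffmpeg's 0:N stream syntax (the command later in this post uses the older 0.N form). Requesting pcm_s16le explicitly keeps the output as signed 16-bit samples, in which a silent sample is two 0 bytes.

```python
import subprocess
from collections.abc import Iterable
from pathlib import Path


def wav_extract_command(source: str, stream_index: int, out_path: str) -> list[str]:
    """Build an ffmpeg command that decodes one stream of `source` to a wav file."""
    return [
        "ffmpeg", "-y",               # overwrite out_path if it already exists
        "-i", source,
        "-map", f"0:{stream_index}",  # current ffmpeg syntax: input 0, stream N
        "-acodec", "pcm_s16le",       # signed 16-bit PCM, so silence is 0x00 bytes
        out_path,
    ]


def extract_tracks(source: str, stream_indices: Iterable[int],
                   workdir: str) -> dict[int, Path]:
    """Decode each listed stream to its own wav; return {stream index: wav path}."""
    outputs: dict[int, Path] = {}
    for idx in stream_indices:
        out = Path(workdir) / f"track{idx}.wav"
        # check=True raises CalledProcessError if ffmpeg exits with an error
        subprocess.run(wav_extract_command(source, idx, str(out)),
                       check=True, capture_output=True)
        outputs[idx] = out
    return outputs
```

Suppose the audio tracks are streams 1 to 8 (ffprobe will list the real indices). Then extract_tracks("original.mov", range(1, 9), "/tmp/tracks") would write track1.wav to track8.wav into an existing /tmp/tracks directory.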
It turns out that detecting a silent wav file is trivially easy: it consists overwhelmingly of 0 bytes. There’s a brief header at the beginning, and then the rest of the file is just a big long chain of 0s (at least with signed PCM such as 16-bit wav, which is what ffmpeg writes by default; 8-bit wav is unsigned, so its silence is a run of 0x80 bytes instead). So for each audio track we generate a wav, check what percentage of it is 0 bytes, and if it’s >= 99.9% we declare it to be silent.
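Here is a Python sketch of that zero-byte check. It counts 0x00 bytes across the whole file, header included, and applies the 99.9% threshold. zero_byte_fraction and is_silent are hypothetical names. Reading in 1 MiB chunks only serves to avoid loading a long track into memory at once.

```python
CHUNK_SIZE = 1 << 20  # read 1 MiB at a time


def zero_byte_fraction(path: str) -> float:
    """Return the fraction of bytes in the file (header included) that are 0x00."""
    total = zeros = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            total += len(chunk)
            zeros += chunk.count(0)  # bytes.count accepts an int in 0..255
    # An empty file contains no sound, so treat it as entirely "zero".
    return zeros / total if total else 1.0


def is_silent(path: str, threshold: float = 0.999) -> bool:
    """Declare a wav silent if at least `threshold` of its bytes are 0x00."""
    return zero_byte_fraction(path) >= threshold
```

Because the header is counted, a very short silent track can fall below the threshold. Skipping the header, for example by reading only the frames with Python's wave module, would avoid that at the cost of a little more code.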
Once we’ve determined which tracks are silent it’s then a simple matter (admittedly a simple matter which took me a reasonable amount of wrestling with command line options and asking for help on IRC) to use ffmpeg to cut out the silent tracks. The invocation looks approximately like:
ffmpeg -i original.mov -y -ac 2 -map 0.0 -map 0.3 -map 0.4 -acodec copy -vcodec copy output.mov -newaudio
-ac sets the number of audio channels the output should have, the -map options say which streams (tracks) from the source to use (the order is significant), and -newaudio says “Don’t fail in mysterious and incomprehensible ways as a result of my having tried to change the audio channels”.
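The following Python sketch turns the silence results into that command line. It keeps the option order of the invocation above, including the legacy input.stream map syntax and -newaudio, which current ffmpeg releases may not accept. Three choices are assumptions drawn from the single example rather than tested behaviour: setting -ac to the number of kept tracks, passing the video stream index explicitly, and refusing to proceed when every track is silent. remove_silent_command is a hypothetical name.

```python
from collections.abc import Mapping


def remove_silent_command(source: str, output: str, video_stream: int,
                          audio_silent: Mapping[int, bool]) -> list[str]:
    """Build an ffmpeg command that keeps the video plus every non-silent audio stream.

    audio_silent maps each audio stream index to the result of the silence check.
    Uses the legacy 0.N map syntax and -newaudio, as in the invocation above.
    """
    keep = sorted(idx for idx, silent in audio_silent.items() if not silent)
    if not keep:
        raise ValueError("every audio track is silent; nothing to keep")
    # Assumption: -ac equals the number of kept tracks (2 in the example).
    args = ["ffmpeg", "-i", source, "-y", "-ac", str(len(keep))]
    for idx in [video_stream, *keep]:  # map order decides output stream order
        args += ["-map", f"0.{idx}"]
    args += ["-acodec", "copy", "-vcodec", "copy", output, "-newaudio"]
    return args
```

With audio on streams 1 to 8 and sound only on 3 and 4, joining the result with spaces gives exactly the command shown above.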
In the unlikely event that anyone who isn’t us finds this useful, I’ve started collecting the results of experiments like this into a repo on GitHub. The code for this post is there as “removesilent”. It’s not the exact code we’re using at work (that’s better integrated into our system), but it’s pretty close.