who spoke when

Did you notice the coloured bar with stripes under the video?

What is that bar?

Each colour in that bar stands for a different speaker, and each segment marks a stretch of time during which that speaker was talking. This is called "speaker diarisation": it identifies who spoke and when. It is a different problem from speech recognition; we are not trying to identify what is said.

Why do that?

Well, in this case it is useful for identifying the different parts of the talk: I can skip the introduction, jump through the questions in a Q&A, spot a discussion between groups of speakers, or get an idea of the general flow of the talk... basically it gives a "bird's ear view" of the media and makes it easier to navigate. Yes, you can already click and drag along the slider of a YouTube video to see snapshots, but you still have to browse slowly through all of them. Also, audio files don't have snapshots at all.

This is great! Why is this feature not all over YouTube?

The implementation described below does not scale as is. Identifying the speaker segments is computationally intensive: it can take almost as long as playing the media itself.


How to build your own diarisation bars:

download the audio of the video/podcast

        youtube-dl -x 'http://www.youtube.com/watch?v=klZWuI6Fqgk' -o edge.of.sky.m4a

convert it to 16 kHz, 16-bit, mono PCM WAV

        ffmpeg -i edge.of.sky.m4a -acodec pcm_s16le -ac 1 -ar 16000 edge.of.sky.wav
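Before running the diarisation tool, it may be worth confirming that ffmpeg produced the expected format. A minimal sketch using Python's standard `wave` module (the function name `check_wav_format` is hypothetical):

```python
import wave

def check_wav_format(path):
    """Return (ok, duration_seconds) for a WAV file.

    ok is True when the file is 16 kHz, 16-bit (2-byte samples), mono,
    uncompressed PCM, i.e. what the ffmpeg command above should produce.
    """
    with wave.open(path, "rb") as w:
        ok = (w.getframerate() == 16000
              and w.getsampwidth() == 2      # bytes per sample
              and w.getnchannels() == 1
              and w.getcomptype() == "NONE")  # uncompressed PCM
        duration = w.getnframes() / w.getframerate()
    return ok, duration
```

For example, `ok, seconds = check_wav_format("edge.of.sky.wav")`; the duration is also handy later for scaling the bar to the length of the media.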

apply the speaker diarisation tool from LIUM (links below) to generate the segments file

        java -Xmx2048m -jar ./lium_spkdiarization-8.4.1.jar --fInputMask=./edge.of.sky.wav --sOutputMask=./edge.of.sky.seg --doCEClustering edge.of.sky
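The segments file is plain text. Assuming the usual LIUM layout, each segment line holds eight whitespace-separated fields (show name, channel, start, length, gender, band, environment, speaker label), with start and length counted in 10 ms feature frames, and lines starting with `;;` are comments; it is worth checking this against the documentation of your LIUM version. A minimal Python sketch (the names `Segment`, `parse_seg` and `speaking_time` are hypothetical) that turns the file into segments in seconds:

```python
from collections import namedtuple

# One diarisation segment, with times converted to seconds.
Segment = namedtuple("Segment", ["start", "end", "speaker"])

FRAMES_PER_SECOND = 100  # assumption: LIUM counts time in 10 ms frames

def parse_seg(lines):
    """Parse LIUM .seg lines into Segments sorted by start time."""
    segments = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;"):
            continue  # skip blank lines and ";;" comment lines
        fields = line.split()
        start, length = int(fields[2]), int(fields[3])
        speaker = fields[7]  # cluster label, e.g. "S0"
        segments.append(Segment(start / FRAMES_PER_SECOND,
                                (start + length) / FRAMES_PER_SECOND,
                                speaker))
    return sorted(segments)

def speaking_time(segments):
    """Total speaking time in seconds for each speaker label."""
    totals = {}
    for seg in segments:
        totals[seg.speaker] = totals.get(seg.speaker, 0.0) + (seg.end - seg.start)
    return totals
```

Usage: `with open("edge.of.sky.seg") as f: segments = parse_seg(f)`.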

run the R script to generate the bar

        R CMD BATCH diarise.r

finally, open edge.of.sky.png and embed it under the video or podcast
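The post doesn't include diarise.r itself. As a rough, hypothetical stand-in (not the author's script), here is a standard-library Python sketch that draws a similar bar as SVG rather than PNG, from (start, end, speaker) tuples in seconds. The hex values are the eight colours of ColorBrewer's Accent scheme, assumed to correspond to the palette the post uses; the function name and default size are made up for illustration.

```python
from xml.sax.saxutils import escape

# ColorBrewer "Accent" qualitative palette, 8 colours.
ACCENT = ["#7fc97f", "#beaed4", "#fdc086", "#ffff99",
          "#386cb0", "#f0027f", "#bf5b17", "#666666"]

def diarisation_bar_svg(segments, duration, width=800, height=20):
    """Return an SVG string with one coloured rect per speaker segment.

    segments: iterable of (start, end, speaker), times in seconds.
    duration: length of the media in seconds, so the bar lines up with
    the player's timeline. Speakers get colours in order of first
    appearance; with more than 8 speakers the colours repeat.
    """
    colours = {}
    rects = []
    for start, end, speaker in sorted(segments):
        if speaker not in colours:
            colours[speaker] = ACCENT[len(colours) % len(ACCENT)]
        x = width * start / duration
        w = width * (end - start) / duration
        rects.append(
            f'<rect x="{x:.2f}" y="0" width="{w:.2f}" height="{height}" '
            f'fill="{colours[speaker]}"><title>{escape(speaker)}</title></rect>'
        )
    # White background: gaps with no detected speech stay blank.
    return (f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
            f'height="{height}">'
            f'<rect width="{width}" height="{height}" fill="white"/>'
            + "".join(rects) + "</svg>")
```

The result can be written with `open("edge.of.sky.svg", "w").write(...)` and embedded directly, or converted to PNG with any SVG tool.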


The chosen colour palette (Accent1), taken from ColorBrewer 2.0, is designed for categorical data, in this case the different speakers: it maximises the hue contrast between colours. It is also suitable for dichromats.


This speaker diarisation bar was inspired by the "moodbar" that I've been using for many years to navigate music files.









other speaker diarisation tools:



