In this post, we’ll see how to use videogrep, a tool created by Sam Lavigne.
Everything is well explained in his GitHub repository or on this page, but I think a few simple steps will be easier to follow for people who are not so confident with technology.
This post will also serve as a reminder for me.
Download the videos with YT-DLP
Install YT-DLP with this command:
pip install yt-dlp
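We use YT-DLP to download the videos. Run this command with the link to the video or the playlist you want to download. The --write-subs, --sub-langs it and --convert-subs srt options save the Italian subtitles as .srt when the video has them, and -f 137+140 picks the 1080p MP4 video stream together with the M4A audio stream.
yt-dlp "https://www.youtube.com/watch?v=gxX8wZpc8ME&list=PLNlAEAan3IFU4UntqHaFtkVY501sF5awP" --write-subs --sub-langs it --convert-subs srt -f 137+140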
Create subtitles with Vosk
To generate .srt files we need the Vosk transcriber, an open source speech-to-text (STT) tool (GitHub repo). To install Vosk, run:
pip install vosk
If the YouTube video doesn’t have subtitles, you can create them with Vosk.
This is the simplest command you can use. It uses the small model.
vosk-transcriber -l it -i video.mp4 -t srt -o video.srt
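When a folder mixes videos that already have subtitles with videos that don't, it can help to transcribe only the missing ones. The following is a minimal sketch, not part of Vosk itself: the function names videos_without_srt and transcribe_missing are made up for this example, and it assumes downloaded subtitles are named either video.srt or video.it.srt (with a language code in the middle). The vosk-transcriber call uses the same flags as the command above.

```shell
# Print every .mp4 in a folder that has no matching .srt file.
# Both "name.srt" and "name.<lang>.srt" count as a match (assumed naming).
videos_without_srt() {
  dir="${1:-.}"
  for video in "$dir"/*.mp4; do
    [ -e "$video" ] || continue            # the glob matched nothing
    base="${video%.mp4}"
    found=no
    for sub in "$base".srt "$base".*.srt; do
      if [ -e "$sub" ]; then found=yes; fi
    done
    [ "$found" = yes ] || printf '%s\n' "$video"
  done
}

# Transcribe only the videos listed by videos_without_srt (small model).
transcribe_missing() {
  videos_without_srt "${1:-.}" | while IFS= read -r video; do
    # </dev/null keeps the transcriber from consuming the list of files
    vosk-transcriber -l it -i "$video" -t srt -o "${video%.mp4}.srt" </dev/null
  done
}
```

Running videos_without_srt . first shows which files would be transcribed; transcribe_missing . then does the work.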
If you want a more precise transcription, you can download the full model for your language. At the moment, you need to unzip the model before using it.
vosk-transcriber -l it -i ./video.mp4 -m /home/ale/Downloads/vosk-model-it-0.22 -o ./video.srt -t srt
In this case, you have to specify where the full model is located with -m. Note that you can also transcribe several videos at once: just pass ./ as the input without specifying the name of a video.
If you are transcribing multiple videos with Vosk, the current version has a bug that doesn’t allow spaces in the video file names.
To work around it, run this command once you are in the folder where your videos are. It puts - in place of the spaces in every file name.
rename 's/\s+/-/g' *
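The rename command above uses Perl expression syntax, and on some systems rename is a different tool that does not accept it. As a fallback, here is a small sketch that does the same job with only mv and sed. The function name hyphenate_names is invented for this example, and it skips a file instead of overwriting when the new name is already taken.

```shell
# Replace each run of whitespace in the file names of a folder with a
# single hyphen, like: rename 's/\s+/-/g' *
hyphenate_names() {
  dir="${1:-.}"
  for path in "$dir"/*; do
    [ -e "$path" ] || continue             # the glob matched nothing
    name="${path##*/}"                     # file name without the folder
    new=$(printf '%s' "$name" | sed -E 's/[[:space:]]+/-/g')
    [ "$name" = "$new" ] && continue       # no whitespace, nothing to do
    if [ -e "$dir/$new" ]; then
      echo "skipping '$name': '$new' already exists" >&2
      continue
    fi
    mv -- "$path" "$dir/$new"
  done
}
```

Calling hyphenate_names . from the folder where the videos are turns a name like "my first video.mp4" into "my-first-video.mp4".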
Installing and using videogrep
To install videogrep, run:
pip install videogrep
Finally, we can use videogrep. Before cutting a video, it’s useful to know how many times each word is repeated. To do that, we can use -n 1:
videogrep --input "./video.mp4" -n 1
Once we know which word (or words!) we want to use, we remove -n 1 and pass the word with --search:
videogrep --input "./video.mp4" --search 'ciao'
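Rendering a supercut can take a while, so it may be worth checking the matches first. This sketch wraps the two steps in a helper. The name supercut and its argument order are made up for this example. It relies on videogrep's --demo option, which shows the search results without making the video, and on --output, which sets the name of the generated file.

```shell
# Preview the matches, then render them into a named file.
# Usage: supercut VIDEO WORD [OUTPUT]
supercut() {
  video="$1"
  query="$2"
  out="${3:-supercut.mp4}"                 # same default name videogrep uses
  # First pass: only print the subtitle lines that match
  videogrep --input "$video" --search "$query" --demo
  # Second pass: cut and join the matching clips
  videogrep --input "$video" --search "$query" --output "$out"
}
```

For example, supercut ./video.mp4 ciao ciao.mp4 lists every line containing "ciao" and then writes the result to ciao.mp4.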