Video file to subtitles file

Chris Angelico rosuav at gmail.com
Sun Aug 30 15:26:27 EDT 2020


On Mon, Aug 31, 2020 at 3:16 AM Christian Gollwitzer <auriocus at gmx.de> wrote:
>
> On 30.08.20 at 17:25, MRAB wrote:
> > On 2020-08-30 07:23, Muskan Sanghai wrote:
> >> On Sunday, August 30, 2020 at 11:46:15 AM UTC+5:30, Chris Angelico wrote:
> >>> I recommend looking into CMU Sphinx then. I've used that from Python.
> >>> The results are highly entertaining.
> >>> ChrisA
> >> Okay I will try it, thank you.
> >>
> > Speech recognition works best when there's a single voice, speaking
> > clearly, with little or no background noise. Movies tend not to be like
> > that.
> >
> > Which is why the results are "highly entertaining"...
>
>
> Well, with enough effort it is possible to build a system that is more
> useful than "entertaining". Google did that: English YouTube videos can
> be annotated with subtitles from speech recognition. For example, try
> this video:
> https://www.youtube.com/watch?v=lYVLpC_8SQE
>
> Go to the settings thing (the little gear icon in the nav bar) and
> switch on subtitles, English autogenerated. You'll see a word-by-word
> transcription of the text, and most of it is accurate.
>
> There are strong arguments that anything one can build with open source
> tools will be inferior. 1) Google probably has a bunch of highly
> qualified AI experts working on this. 2) They have an enormous
> corpus of training data. Many videos already have user-provided
> subtitles. They can feed all of this into the training.
>
> I'm waiting to be disproven on this point ;)
>

The OP doesn't want to use Google's services for this. That doesn't
disprove your point, but....... :)

ChrisA

