Video file to subtitles file

Chris Angelico rosuav at gmail.com
Sat Aug 29 09:36:51 EDT 2020


On Sat, Aug 29, 2020 at 11:15 PM Barry Scott <barry at barrys-emacs.org> wrote:
> > On 29 Aug 2020, at 12:51, Muskan Sanghai <muskansanghai at gmail.com> wrote:
> > On Friday, August 28, 2020 at 10:59:29 PM UTC+5:30, Chris Angelico wrote:
> >> Not familiar with Openshot, but it's worth looking into.
> >> Alternatively, I'd definitely recommend ffmpeg for anything like this
> >> sort of job. But if you actually need to OCR something, then you may
> >> need to do some scripting work. I don't have code to offer you, but it
> >> would involve FFMPEG to lift the images, something like Tesseract to
> >> do the actual OCRing, and then you'd write the rest of it yourself in
> >> Python.
> >>
> >> Other than that, this probably is something best done with a dedicated
> >> movie editing tool, not Python. Use what exists.
> >>
> >> ChrisA
> > I want to extract subtitles from an MPEG video (which does not have any previous subtitles)
>
> If it has no subtitles there is nothing to extract?
>
> I recall that in MPEG, subtitles are RLE-encoded bitmaps with timing and position data,
> which allows the player to show a bitmap at position X, Y starting at T0 and remove it at T1, etc.
> You have to track multiple subtitles at the same time.
>
> You should be able to extract the subtitle bitmaps and timing data with modest work.
> You could use OCR technology to turn the subtitles into text.

That's what I was thinking of. I have a separate project that involves
grabbing image frames from the subtitles track, running them through
Tesseract (for OCR), and attempting to intelligently parse two
concurrent tracks of subtitles. Probably more complicated than needed
here though.
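
For what it's worth, here is a rough sketch of the Python glue around
the OCR step. It is an illustration, not the code from that project.
It assumes the subtitle images have already been dumped at a fixed
interval (with ffmpeg or similar) and that the tesseract command-line
tool is on PATH. The function names (ocr_image, collapse, srt_time,
to_srt) are made up for this example, and it handles only a single
subtitle track.

```python
import subprocess


def ocr_image(path):
    """OCR one image via the tesseract CLI (assumed to be installed)."""
    result = subprocess.run(
        ["tesseract", path, "stdout"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def collapse(samples, frame_interval):
    """Merge per-frame OCR results into subtitle cues.

    samples is [(time_in_seconds, text), ...] in time order, one entry
    per sampled frame. Adjacent frames with identical text become one
    (start, end, text) cue; frames with no text end the current cue.
    """
    cues = []
    for t, text in samples:
        text = text.strip()
        if not text:
            continue
        # Extend the previous cue only if it ends where this frame starts.
        if (cues and cues[-1][2] == text
                and abs(cues[-1][1] - t) < frame_interval / 2):
            cues[-1] = (cues[-1][0], t + frame_interval, text)
        else:
            cues.append((t, t + frame_interval, text))
    return cues


def srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


def to_srt(cues):
    """Render cues as the text of an .srt file."""
    blocks = []
    for n, (start, end, text) in enumerate(cues, 1):
        blocks.append(f"{n}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(blocks)
```

You would call ocr_image() on each dumped frame to build the samples
list, then pass it through collapse() and to_srt(). Real OCR output is
noisy, so exact string equality between frames is optimistic; some
fuzzy matching would probably be needed in practice.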

I don't understand the OP's request though. Extract subtitles when
there aren't any?

ChrisA

