[Web-SIG] HTML parsing - get text position and font size

Dirkjan Ochtman dirkjan at ochtman.nl
Mon Jan 12 13:16:00 CET 2009


2009/1/12 Girish Redekar <girish.redekar at gmail.com>:
> is still tedious as font sizes in html/css can be expressed in multiple
> methods (like <FONT> tags, sizes in pixels, relative sizes, default larger
> size for header etc). One can get down and code each of these cases, but I
> was hoping someone has already (and reliably) worked on the same

So basically you want a full-on headless browser? Pretty non-trivial.

Your best bet would probably be to hook into a Mozilla instance
somehow (PyXPCOM, anyone?) and try to read the computed styles from
the DOM there.
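For a sense of why "code each of these cases" gets tedious, here is a minimal Python sketch that handles font sizes alone. It assumes a 16px default "medium", the CSS Fonts Level 4 keyword ratios, the HTML legacy `<font size>` mapping and the usual UA-stylesheet heading sizes. The function names and the 1.2 ratio for `larger`/`smaller` are hypothetical (browsers pick their own). It only resolves one declared value against a known parent size; the cascade, inheritance and which rule wins are left to you, which is exactly the work a browser engine already does.

```python
import re

# Assumption: the user agent's default "medium" size is 16px.
MEDIUM_PX = 16.0

# CSS absolute-size keywords as multiples of "medium" (CSS Fonts Level 4).
KEYWORD_SCALE = {
    "xx-small": 3 / 5, "x-small": 3 / 4, "small": 8 / 9, "medium": 1.0,
    "large": 6 / 5, "x-large": 3 / 2, "xx-large": 2.0, "xxx-large": 3.0,
}

# Legacy <font size="1".."7"> values map onto the keywords above.
FONT_TAG_KEYWORD = {1: "x-small", 2: "small", 3: "medium", 4: "large",
                    5: "x-large", 6: "xx-large", 7: "xxx-large"}

# Typical UA stylesheet defaults for headings, in em of the parent size.
HEADING_EM = {"h1": 2.0, "h2": 1.5, "h3": 1.17, "h4": 1.0, "h5": 0.83, "h6": 0.67}

# Hypothetical ratio for "larger"/"smaller"; real browsers choose their own.
RELATIVE_RATIO = 1.2

_LENGTH = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*(px|pt|em|%)\s*$")


def font_tag_px(size_attr):
    """Resolve a <font size="..."> attribute ("5", "+1", "-2") to pixels."""
    size_attr = size_attr.strip()
    if size_attr.startswith(("+", "-")):
        n = 3 + int(size_attr)  # relative sizes are offsets from 3
    else:
        n = int(size_attr)
    n = max(1, min(7, n))  # out-of-range values are clamped to 1..7
    return MEDIUM_PX * KEYWORD_SCALE[FONT_TAG_KEYWORD[n]]


def css_font_size_px(value, parent_px):
    """Resolve a CSS font-size value to pixels, given the parent size in px."""
    value = value.strip().lower()
    if value in KEYWORD_SCALE:
        return MEDIUM_PX * KEYWORD_SCALE[value]
    if value == "larger":
        return parent_px * RELATIVE_RATIO
    if value == "smaller":
        return parent_px / RELATIVE_RATIO
    m = _LENGTH.match(value)
    if m is None:
        raise ValueError("unsupported font-size: %r" % value)
    number, unit = float(m.group(1)), m.group(2)
    if unit == "px":
        return number
    if unit == "pt":
        return number * 4 / 3  # CSS fixes 96px = 72pt = 1in
    if unit == "em":
        return number * parent_px
    return number * parent_px / 100  # percentage of the parent size


def heading_px(tag, parent_px):
    """Default size of an <h1>..<h6> element when no author style applies."""
    return HEADING_EM[tag.lower()] * parent_px
```

For example, `css_font_size_px("12pt", 16)` and `heading_px("h1", 8)` both come out at 16px, which is the point: the same rendered size can arrive by many routes, and you only know the parent size after resolving every ancestor.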

Cheers,

Dirkjan

