[Tutor] man pages parsing (still)

Tiago Saboga tiagosaboga at terra.com.br
Mon Sep 11 15:55:00 CEST 2006


I'm still at it, trying to parse man pages (I want to gather a list of all 
options with their help strings). I've tried using regular expressions on both 
the formatted output of man and the troff source files, and I discovered what 
the doclifter man page already says: you have to rely on a number of hints, 
and it's really not simple. So I'm now using doclifter, and it works, but it 
is terribly slow. Doclifter itself takes around a second to parse the troff 
file, but my few lines of code take 25 seconds to parse the resulting XML. 
I've pasted the code at http://pastebin.ca/166941
and I'd like to hear from you how I could optimize it.
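
To make the task concrete, here is a minimal sketch of this kind of extraction. It is not the code from the pastebin link. It assumes the XML is DocBook without namespaces and that each option sits in a <varlistentry> with a <term> and a <listitem>; the function name iter_options and the SAMPLE document are made up for illustration. It uses ElementTree's iterparse so the tree is processed as a stream instead of being built and searched as a whole.

```python
import io
import xml.etree.ElementTree as ET


def iter_options(source):
    """Yield (option, help_text) pairs, one per <varlistentry> element.

    Assumption: DocBook-style markup without XML namespaces.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "varlistentry":
            continue
        term = elem.find("term")
        item = elem.find("listitem")
        if term is not None and item is not None:
            # Join all nested text and collapse runs of whitespace.
            name = " ".join("".join(term.itertext()).split())
            text = " ".join("".join(item.itertext()).split())
            yield name, text
        # Drop the entry's children and text to keep memory use low.
        # Note: entries nested inside another entry are cleared first,
        # so their text will be missing from the outer entry.
        elem.clear()


# Hypothetical input, hand-written for this example.
SAMPLE = b"""<refentry><refsect1><variablelist>
<varlistentry><term><option>-a</option>, <option>--all</option></term>
<listitem><para>do not ignore entries starting with .</para></listitem>
</varlistentry>
<varlistentry><term><option>-l</option></term>
<listitem><para>use a long
   listing format</para></listitem>
</varlistentry>
</variablelist></refsect1></refentry>"""

options = list(iter_options(io.BytesIO(SAMPLE)))
for name, text in options:
    print(name, "::", text)
```

A single streaming pass like this touches each element once. Whether it helps in this case depends on where the 25 seconds are actually spent, so profiling the existing code first would be the safer way to find out.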

Thanks,

Tiago.

