ascii txt to LaTeX

Harry George hgg9140 at cola2.ca.boeing.com
Thu Apr 24 10:09:19 EDT 2003


marco <marco.rossini at gmx.ch> writes:

> I wrote a Python program for converting a word-wrapped(!) ASCII text into
> LaTeX format. It actually doesn't do much: it just finds(!) and places
> paragraphs (\n\n) and replaces apostrophes and umlauts. It does routine stuff:
> why do boring things myself if my computer can do them for me?
> 
> Feel free to use it (GPL). I would also appreciate critique of my code.
> 
> Probably someone has written such a program (or a better one) before, but I
> don't care; it was easy for me to do and it's all _I_ need.
> 
>         marco
> 
> #!/usr/bin/python
> 
> # Converts a regular word-wrapped ASCII text into formatted LaTeX
> # detects paragraphs, replaces apostrophes, replaces umlauts
> 
> # Program written by Marco Rossini <marco.rossini at gmx.ch>
> # Copyright: GPL
> 
> from sys import argv
> from sys import exit
> from math import ceil
> from string import find
> from string import join
> from string import strip
> from string import replace
> from string import whitespace
> from string import punctuation
> 
> if len(argv) != 2: exit("txt2latex: Argument error!")
> 
> try: f = file(argv[1],"r")
> except IOError: exit("txt2latex: File not found!")
> 
> 
> # GET THE MAXIMAL NUMBER OF CHARACTERS PER LINE
> array = f.readlines()
> linelength = 0
> for i in range(len(array)):
>     array[i] = strip(array[i])
>     if len(array[i]) > linelength: linelength = len(array[i])
> 
> # GUESS IF IT'S A PARAGRAPH BREAK
> for i in range(len(array)-1):
>     nleft = linelength - len(array[i])
>     nright = find(array[i+1]," ")
>     if nright == -1: nright = len(array[i+1])
>     # if it is, append \n\n to the line, else a space
>     if nright+1 <= nleft:
>         if len(array[i]) > 0: array[i] += '\n\n'
>     else:
>         array[i] += ' '
> 
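For comparison, the same paragraph guess as a Python 3 function. This is a minimal sketch, not a tested replacement: the name `join_paragraphs` is made up, and it assumes the input is the list of lines that `readlines()` returns. The rule is unchanged: if the first word of the next line would have fit into the room left on the current line (plus one space), that line was not wrapped automatically, so it is taken to end a paragraph.

```python
def join_paragraphs(lines):
    """Hypothetical helper: re-join word-wrapped lines, guessing paragraph ends.

    Same heuristic as the quoted script; the longest line is assumed to be
    the wrap width.
    """
    lines = [line.strip() for line in lines]
    width = max((len(line) for line in lines), default=0)
    out = []
    for cur, nxt in zip(lines, lines[1:]):
        room = width - len(cur)
        # length of the next line's first word (whole line if no space)
        first_word = len(nxt.split(" ", 1)[0])
        if first_word + 1 <= room:
            # the next word would have fit: treat this as a paragraph end
            if cur:
                out.append(cur + "\n\n")
        else:
            # ordinary wrap: join with a single space
            out.append(cur + " ")
    if lines:
        out.append(lines[-1])
    return "".join(out)


print(join_paragraphs(["This is a line of", "text that wraps.", "", "New para here."]))
```

`default=0` keeps an empty file from raising. Because the longest line stands in for the wrap width, a single overlong line makes most other lines look like paragraph ends; that is a limit of the heuristic itself, not of this version.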
> # the lines are joined, the text is NOT word wrapped anymore
> text = join(array,"")
> 
> # Replace apostrophes intelligently
> # (the start and end of the text count as whitespace)
> i = 0
> while i < len(text):
>     c = text[i]
>     prev_ch = text[i-1:i]     # "" at the start of the text
>     next_ch = text[i+1:i+2]   # "" at the end of the text
>     before_word = prev_ch == "" or prev_ch in whitespace
>     after_word = next_ch == "" or next_ch in whitespace + punctuation
>     if c == '\"' and before_word:
>         # ... before a word
>         text = text[:i] + "``" + text[i+1:]
>         i += 1
>     elif c == '\'' and before_word:
>         # ... before a word (single)
>         text = text[:i] + "`" + text[i+1:]
>     elif c == '\"' and after_word:
>         # ... after a word, before whitespace or punctuation
>         text = text[:i] + "''" + text[i+1:]
>         i += 1
>     i += 1
> 
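The same three quoting rules can also be written with regular expressions instead of walking the string by index. This is a sketch in Python 3 that swaps in the `re` module for the character loop; `latex_quotes` is a hypothetical name, and treating the start and end of the text as whitespace is an assumption of this version.

```python
import re
import string

# Character class matching any ASCII punctuation character.
_PUNCT = "[" + re.escape(string.punctuation) + "]"


def latex_quotes(text):
    """Hypothetical helper: turn ASCII quotes into LaTeX quotes.

    Assumes the start and end of the text count as whitespace.
    """
    # opening double quote: at the start or after whitespace
    text = re.sub(r'(?<!\S)"', "``", text)
    # opening single quote: at the start or after whitespace
    text = re.sub(r"(?<!\S)'", "`", text)
    # remaining double quotes before whitespace, punctuation or the end close
    text = re.sub(r'"(?=\s|' + _PUNCT + r'|$)', "''", text)
    return text


print(latex_quotes('He said "hi" and \'yes\'.'))
```

Apostrophes inside words (`don't`) and closing single quotes are left untouched, since plain `'` is already LaTeX's closing single quote.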
> # Here replacements for umlauts. modify if you like to.
> text = replace(text,"ä","\\\"{a}")
> text = replace(text,"Ä","\\\"{A}")
> text = replace(text,"ö","\\\"{o}")
> text = replace(text,"Ö","\\\"{O}")
> text = replace(text,"ü","\\\"{u}")
> text = replace(text,"Ü","\\\"{U}")
> 
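In Python 3, where `str` is Unicode, the six umlaut replacements can be done in one pass with `str.translate` and a table from `str.maketrans`, which accepts a dict mapping single characters to replacement strings. A sketch; `latex_umlauts` and `UMLAUTS` are hypothetical names.

```python
# Map each umlaut to the LaTeX accent command used in the quoted script.
UMLAUTS = str.maketrans({
    "ä": r'\"{a}', "Ä": r'\"{A}',
    "ö": r'\"{o}', "Ö": r'\"{O}',
    "ü": r'\"{u}', "Ü": r'\"{U}',
})


def latex_umlauts(text):
    """Hypothetical helper: replace German umlauts with LaTeX accents."""
    return text.translate(UMLAUTS)


print(latex_umlauts("Köln"))  # K\"{o}ln
```

This only works if the input file was decoded with the encoding it was actually written in, e.g. `open(path, encoding="latin-1")` for a Latin-1 file; which encoding applies is an assumption about the input.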
> # handle output
> print text
> 

See also:
SDF http://www.xpenguin.biz/download/sdf/doc/

POD http://www.perldoc.com/perl5.6/pod/perlpod.html

PDX http://www.seanet.com/~hgg9140/comp/pdx-1.6.1/doc/index.html
    (my own shot at this)



-- 
harry.g.george at boeing.com
6-6M31 Knowledge Management
Phone: (425) 294-8757
