Seeking assistance - string processing.

John Machin sjmachin at lexicon.net
Tue Nov 14 06:44:43 EST 2006


billpaterson2006 at googlemail.com wrote:
> I've been working on some code to search for specific text strings and
> act upon them in some way. I've got the conversion sorted

What does that mean? There is no sort in the computer sense, and if you
mean as in "done" ...

> however there
> is 1 problem remaining.
>
> I am trying to work out how to make it find a string like this "==="
> and when it has found it, I want it to add "===" to the end of the
> line.

The answer is at the end. Now take a deep breath, and read on carefully
and calmly:

>
> For example.
>
> The text file contains this:
>
> ===Heading
>
> and I am trying to make it be processed and outputted as a .dat file
> with the contents
>
> ===Heading===
>
> Here's the code I have got so far.
>
> import string

Not needed for this task. In fact the string module has only minimal
use these days. From what book or tutorial did you get the idea to use
result = string.replace(source_string, old, new) instead of result =
source_string.replace(old, new) sometimes? You should be using the
result = source_string.replace(old, new) way all the time.

What version of Python are you using?

> import glob
> import os
>
> mydir = os.getcwd()
> newdir = mydir#+"\\Test\\";

Make a real comment obvious. Don't do what you did and leave dead code
hanging off the end of a live line: *delete* unwanted code, or, if it may
be wanted in the future, put in a real comment to say why it is there.

What was the semicolon for?

Consider using os.path.join() -- it's portable. Don't say "But my code
will only ever be run on Windows". If you write code like that, it will
be a self-fulfilling prophecy -- no-one will want to try to run it
anywhere else.
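
A minimal sketch of the os.path.join() suggestion. The helper name
input_path and the file name "example.txt" are hypothetical, chosen only
for illustration:

```python
import os

def input_path(directory, filename):
    # os.path.join() inserts whichever separator the current platform uses,
    # so no "\\" literal is needed.
    return os.path.join(directory, filename)

# "example.txt" is a made-up name standing in for one of the globbed files.
fileloc = input_path(os.getcwd(), "example.txt")
```

The same call would replace newdir+"\\"+filename inside the loop.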

>
> for filename in glob.glob1(newdir,"*.txt"):
>     #print "This is an input file: " + filename
No it isn't; it's a *name* of a file
>     fileloc = newdir+"\\"+filename
>     #print fileloc
>
>     outputname = filename
>     outputfile = string.replace(outputname,'.txt','.dat')

No again, it's not a file.

Try outputname = filename.replace('.txt', '.dat')
Also consider what happens if the name of the input file is foo.txt.txt
[can happen]
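
One way to sidestep the foo.txt.txt problem is to swap only the final
extension with os.path.splitext(). This is a sketch; the function name
output_name_for is an assumption, not something from the original code:

```python
import os

def output_name_for(input_name, new_ext=".dat"):
    # splitext() splits off only the last extension, so an earlier ".txt"
    # in the name is left alone.
    root, _old_ext = os.path.splitext(input_name)
    return root + new_ext

print(output_name_for("foo.txt.txt"))
# Compare with str.replace(), which rewrites every occurrence:
print("foo.txt.txt".replace(".txt", ".dat"))
```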

>     #print filename
>     #print a
>
>     print "This is an input file: " + filename + ".  Output file:
> "+outputfile

No it isn't.


>
>     #temp = newdir + "\\" + outputfile
>     #print temp
>
>
>     fpi = open(fileloc);
>     fpo = open(outputfile,"w+");

Why the "+"?
Semi-colons?

>
>     output_lines = []

Why not just write as you go? What happens with a 1GB file? How much
memory do you have on your computer?


>     lines = fpi.readlines()

Whoops. That's now at least 2GB of memory you need.

>
>     for line in lines:

No, use "for line in fpi"
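
A rough sketch of writing as you go, so that only one line is held in
memory at a time. The name convert_stream and the transform argument are
hypothetical, and io.StringIO objects stand in for the real files:

```python
import io

def convert_stream(fpi, fpo, transform):
    """Copy fpi to fpo one line at a time, applying transform to each line."""
    for line in fpi:                 # the file object hands out lines one by one
        fpo.write(transform(line))   # written immediately, nothing accumulated

# In-memory files standing in for the real input and output files.
source = io.StringIO("first\nsecond\n")
target = io.StringIO()
convert_stream(source, target, str.upper)
```

With this shape there is no need for output_lines or readlines() at all.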

>         if line.rfind("--------------------") is not -1:

Quick, somebody please count the "-" signs in there; we'd really like
to know what this program is doing. If there are more identical
characters than you have fingers on your hand, don't do that. Use
character * count, e.g. "-" * 20. Then consider giving it a name.
Consider putting in a comment to explain what your code is doing, if
you can. For example, why use rfind instead of find? Both will give the
same result if there are 0 or 1 occurrences of the sought string, and
you aren't using the position if there are 1 or more occurrences. Then
consider that if you need a comment for code like that, then maybe your
variable names are not very meaningful.
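
Something along these lines, as a sketch only. The constant names are
hypothetical, and the widths 20 and 4 are assumptions about what was
intended:

```python
# Assumed widths; adjust to whatever the input files really contain.
LONG_RULE = "-" * 20    # the separator being searched for
SHORT_RULE = "-" * 4    # what it should be replaced with

def shorten_rule(line):
    # Search for and replace the *same* named string, so the two
    # cannot drift apart by a couple of characters.
    if LONG_RULE in line:
        return line.replace(LONG_RULE, SHORT_RULE)
    return line
```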

>             new = line.replace("------------------","----")

Is that the same number of "-"? Are you sure?

>         elif line.rfind("img:") is not -1:
>             new = line.replace("img:","[[Image:")
>         elif line.rfind(".jpg") is not -1:
>             new = line.replace(".jpg",".jpg]]")

That looks like a pattern to me. Consider setting up a list of (old,
new) tuples and looping over it.
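
A sketch of the table-driven version, using the (old, new) pairs from
the posted code. The names REPLACEMENTS and convert are hypothetical.
Like the if/elif chain, it stops at the first pair that matches:

```python
# (old, new) pairs taken from the posted if/elif chain.
REPLACEMENTS = [
    ("img:", "[[Image:"),
    (".jpg", ".jpg]]"),
    (".gif", ".gif]]"),
]

def convert(line):
    for old, new in REPLACEMENTS:
        if old in line:
            # First match wins, mirroring the elif chain.
            return line.replace(old, new)
    return line  # no pattern matched: pass the line through unchanged
```

Note that a line such as "img:cat.jpg" only gets its "img:" rewritten,
exactly as in the posted code; if both changes are wanted on one line,
drop the early return and apply every pair in turn.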

>         elif line.rfind(".gif") is not -1:
>             new = line.replace(".gif",".gif]]")
>         else:
>             output_lines.append(line);
>             continue
>         output_lines.append(new);
>

Try this:
        else:
            new = line
        fpo.write(new)

> for line in output_lines:
>     fpo.write(line)
>
> fpi.close()
> fpo.flush()

News to me that close() doesn't automatically do flush() on a file
that's been open for writing.
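
For what it's worth, a small sketch showing that close() alone is
enough, and that plain "w" will do for a file that is only written. The
helper name write_lines and the temporary demo path are assumptions made
for the example:

```python
import os
import tempfile

def write_lines(path, lines):
    fpo = open(path, "w")    # plain "w": the output is never read back
    try:
        for line in lines:
            fpo.write(line)
    finally:
        fpo.close()          # close() flushes buffered data before closing

# Throwaway location so the example does not touch real files.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.dat")
write_lines(demo_path, ["===Heading===\n"])
```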

> fpo.close()
>
>
> I hope this gets formatted correctly :-p
>
> Cheers, hope you can help.

Answer to your question:

string1 in string2 beats string2.[r]find(string1) for readability and
(maybe) for speed too

elif "===" in line:  # should be safe to assume your audience can count to 3
    new = line[:-1] + "===\n"
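
One caveat: line[:-1] assumes the line ends with a newline, which the
last line of a file may not. A slightly more defensive sketch follows;
the function name close_heading is hypothetical, and it assumes you are
happy for the output line to always end in "\n":

```python
def close_heading(line, marker="==="):
    if marker not in line:
        return line
    # Strip the line ending if there is one, rather than blindly
    # dropping the last character.
    body = line.rstrip("\r\n")
    return body + marker + "\n"
```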

HTH,
John



