newbe question about removing items from one file to another file
Simon Forman
rogue_pedro at yahoo.com
Mon Aug 28 02:03:29 EDT 2006
Eric_Dexter at msn.com wrote:
> def simplecsdtoorc(filename):
>     file = open(filename,"r")
>     alllines = file.read_until("</CsInstruments>")
>     pattern1 = re.compile("</")
>     orcfilename = filename[-3:] + "orc"
>     for line in alllines:
>         if not pattern1
>             print >>orcfilename, line
>
> I am pretty sure my code isn't close to what I want. I need to be able
> to skip html-like commands from <defined> to <undefined> and to key on
> another word in addition to </CsInstruments> to end the routine
>
> I was also looking at se 2.2 beta but didn't see any easy way to use it
> for this or for that matter search and replace where I could just add
> it as a menu item and not worry about it.
>
> thanks for any help in advance
If you're dealing with HTML or HTML-like files, do check out
BeautifulSoup. I had reason to use it the other day and man is it ever
useful!
In the meantime, there are a few minor points about the code you posted:
1) open() defaults to mode 'r', so you can leave it out when you call
open() to read a file.
2) 'file' is a builtin type (it's the type of file objects returned by
open()) so you shouldn't use it as a variable name.
3) file objects don't have a read_until() method. You could say
something like:
    f = open(filename)
    lines = []
    for line in f:
        lines.append(line)
        if '</CsInstruments>' in line:
            break
4) filename[-3:] will give you the last 3 chars in filename. I'm
guessing that you want all but the last 3 chars, which is filename[:-3].
Better still, see the os.path.splitext() function, and indeed the other
functions in os.path too:
http://docs.python.org/lib/module-os.path.html
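As a minimal sketch of the splitext() suggestion: the helper name orc_name and the sample filenames are made up for illustration, and the point is only that splitext() does not care how long the extension is.

```python
import os.path

def orc_name(filename):
    # orc_name is a hypothetical helper, not part of any library.
    # splitext() splits "song.csd" into ("song", ".csd"); keep the stem
    # and attach the new extension.
    stem, _ext = os.path.splitext(filename)
    return stem + ".orc"

print(orc_name("song.csd"))
```

Unlike filename[:-3] + "orc", this keeps working if the extension is not exactly three characters long.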
5) the regular expression objects returned by re.compile() always
evaluate as true, so "if not pattern1" never fires. You want to call
the pattern's search() method on the data to search:
    if not pattern1.search(line):
But, 6) using re for a pattern as simple as "</" is way overkill. Just
use 'in' or the find() method of strings:
    if "</" not in line:
or:
    pos = line.find("</")
    if pos == -1:
        print >>orcfilename, line
    else:
        print >>orcfilename, line[:pos]
7) the "print >> file" usage requires a file (or file-like object,
anything with a write() method I think) not a string. You need to use
it like this:
    orcfile = open(orcfilename, 'w')
    #...
    print >> orcfile, line
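The write()-method guess can be checked with an in-memory buffer. This is only a sketch, and it assumes Python 3, where the statement form above is spelled print(..., file=f). io.StringIO stands in for a real file and the sample text is made up:

```python
import io

buf = io.StringIO()          # file-like object: has write(), nothing on disk
print("instr 1", file=buf)   # Python 3 spelling of: print >> buf, "instr 1"
contents = buf.getvalue()    # everything print() wrote, including the newline
```

Passing a plain string such as orcfilename instead fails, because a str has no write() method.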
8) If you have a list of lines anyway, you can use the writelines()
method of files to write them in one go:
    open(orcfilename, 'w').writelines(lines)
after first stripping the unwanted data from the last line in the list
(the one containing </CsInstruments>) using find() as shown above.
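Putting the points above together, a rough sketch might look like the following. It assumes Python 3 and uses "with" so the files get closed. The name csd_to_orc is made up, and it only does what the points cover: it cuts each line at the first "</" and stops at </CsInstruments>. Opening tags still pass through untouched, so treat it as a starting point rather than a finished converter.

```python
import os.path

END_MARKER = "</CsInstruments>"

def csd_to_orc(filename):
    # Hypothetical helper: copy everything up to END_MARKER into a .orc file.
    orcfilename = os.path.splitext(filename)[0] + ".orc"
    lines = []
    with open(filename) as f:
        for line in f:
            pos = line.find("</")
            if pos == -1:
                lines.append(line)
            else:
                # Keep only the text before the closing tag.
                kept = line[:pos].rstrip()
                if kept:
                    # Slicing dropped the newline, so put one back.
                    lines.append(kept + "\n")
            if END_MARKER in line:
                break
    with open(orcfilename, "w") as orcfile:
        orcfile.writelines(lines)
    return orcfilename
```

Calling csd_to_orc("song.csd") would write song.orc next to the input file and return that name.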
I hope this helps.
Check out the docs on file objects:
http://docs.python.org/lib/bltin-file-objects.html, but like I said,
if you're dealing with html or html-like files, be sure to check out
beautifulsoup. Also, there's the elementtree package for parsing XML
that could help here too.
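For the elementtree route, here is a small sketch using xml.etree.ElementTree, the name under which elementtree ships in the standard library since Python 2.5. The sample document is made up, and every tag name in it other than CsInstruments is an assumption. It also only works if the file is well-formed XML, which a real file may not be if its code contains bare "<" or "&" characters.

```python
import xml.etree.ElementTree as ET

# Made-up sample; only the CsInstruments tag comes from the question.
sample = """<CsoundSynthesizer>
<CsInstruments>
instr 1
endin
</CsInstruments>
<CsScore>
e
</CsScore>
</CsoundSynthesizer>"""

root = ET.fromstring(sample)
# .text is the character data directly inside <CsInstruments>.
orchestra = root.find("CsInstruments").text.strip()
print(orchestra)
```

If parsing raises xml.etree.ElementTree.ParseError on your files, the line-by-line approach is the safer bet.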
~Simon