[Tutor] reading into a text file
Alan Gauld
alan.gauld at btinternet.com
Sun Jul 26 00:07:50 CEST 2009
<keithabt at beyondbb.com> wrote
> Hi, I am trying to read an HTML document into a text file
Hi, welcome to the tutor list.
The first thing to point out is that HTML files are just text files with
a particular structure. So far as reading them in Python
goes, they are no different from any other text file.
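To illustrate, here is a minimal sketch, assuming Python 3 and a UTF-8 encoded file (the file name page.html and the read_text helper are hypothetical). It writes a tiny HTML file into a temporary folder and reads it back with the ordinary open() call, exactly as you would any other text file.

```python
import os
import tempfile


def read_text(path):
    """Return the whole contents of a text file (HTML included) as one string."""
    # Assumes UTF-8; pass a different encoding if your file uses one.
    with open(path, encoding="utf-8") as f:
        return f.read()


# Create a small, hypothetical HTML file so the example is self-contained.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "page.html")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("<html>\n<body><p>Hello</p></body>\n</html>\n")

text = read_text(demo_path)
# splitlines() breaks the text on line endings, just as for any text file.
print(text.splitlines())  # ['<html>', '<body><p>Hello</p></body>', '</html>']
```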
> purpose of splitting the data.
Now this is where it gets interesting: it depends on what
exactly you are trying to "split". What do you mean by
the "data"? If it's the HTML elements, there are specialised
Python tools that will make this a lot easier. But if it is
simply splitting into separate lines, read on...
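For the HTML-elements case, one such tool is the html.parser module in Python 3's standard library (BeautifulSoup is a popular third-party alternative). The sketch below is only an illustration: the ParagraphCollector class is hypothetical, and it gathers the text inside each <p> element while ignoring nested tags such as <b>.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text inside each <p> element (a minimal sketch)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # Start a new, empty paragraph entry.
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        # Only keep text that appears while inside a <p> element.
        if self._in_p:
            self.paragraphs[-1] += data


html = "<p><b>First</b> item</p><p><b>Second</b> item</p>"
parser = ParagraphCollector()
parser.feed(html)
parser.close()
print(parser.paragraphs)  # ['First item', 'Second item']
```

Unlike splitting on a fixed string, the parser does not care whether the bold tag immediately follows the paragraph tag.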
> I am not sure I am doing this right. I have my html
> document, my text file and a python script I called
> convt.py on the desktop
It's usually a bad idea in Windows to do anything on
the Desktop; keep that as a place for icons that
launch programs. Put working files into separate project
folders.
> in a folder.
I assume this means a folder on your Desktop?
That's slightly better, but it still leaves problems, because the
true path to your folder is long and contains spaces:
C:\Documents and Settings\YourName\Desktop\YourFolder
which Windows tries to hide most of the time!
Personally, I'd recommend creating a "Projects" or "Work"
folder at the top level of one of your drives (D:, say, if you have more than one)
and moving your folder under that. Then the full path becomes
D:\Work\MyFolder
which is a lot easier to deal with and less likely to run into Windows
"cleverness" issues.
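If you want to see how such paths are put together from Python, pathlib's PureWindowsPath manipulates Windows-style paths on any operating system without touching the disk. A small Python 3 sketch; YourName, YourFolder and MyFolder are the placeholder names used above, not real folders.

```python
from pathlib import PureWindowsPath

# Placeholder folders from the discussion above (not real paths on your machine).
desktop_folder = PureWindowsPath(r"C:\Documents and Settings\YourName\Desktop\YourFolder")
work_folder = PureWindowsPath(r"D:\Work\MyFolder")

# The "/" operator joins path components with Windows separators.
print(desktop_folder / "source.txt")  # C:\Documents and Settings\YourName\Desktop\YourFolder\source.txt
print(work_folder / "source.txt")     # D:\Work\MyFolder\source.txt

# The shorter path has fewer components and no spaces, so it needs no
# quoting when typed at the Windows command prompt.
print(len(desktop_folder.parts), len(work_folder.parts))  # 5 3
```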
> I opened up IDLE (python GUI) opened the folder on my desktop and then
> went to run module.
OK, I'm still not 100% clear on what you are doing here, but
this is probably a good time to get to know the Windows
command prompt. (Take a look at the box on the Getting Started
topic in my tutorial for a brief intro.) That's a better way to run your
programs on real data IMHO.
> I keep getting an error saying "No such file or directory: 'source.txt'
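If you start a command prompt
Start->Run
Type CMD, hit OK
then at the prompt
C:\WINDOWS> or similar
change to the folder that holds both your script and source.txt, e.g.
cd /d D:\Work\MyFolder
and type
python myscript.py
It should now find the file, because a bare name like "source.txt"
is looked for in the current directory (the one the prompt shows),
not necessarily in the folder where the script lives.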
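The "No such file or directory" error comes from a relative filename: open("source.txt") looks in the current working directory. A minimal Python 3 sketch for checking this; the helper names where_open_looks and script_relative are hypothetical. The second helper builds an absolute path next to the script itself, so the script can find its data whichever folder it is launched from.

```python
import os


def where_open_looks(filename):
    """Return the absolute path open(filename) would try for a relative name."""
    return os.path.join(os.getcwd(), filename)


def script_relative(filename, script_path):
    """Return a path to `filename` in the same folder as `script_path`."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.join(script_dir, filename)


print("Current working directory:", os.getcwd())
print("open('source.txt') tries:", where_open_looks("source.txt"))
print("Found?", os.path.exists("source.txt"))

# Inside a script such as convt.py you would pass the special __file__ name:
#     u = open(script_relative("source.txt", __file__)).read()
```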
> I am new to python so I really don't know if I am doing this right.
>
> #!/usr/bin/python
> u=open("source.txt").read()
> lines = u.split("<p><b>")
> print(lines[1])
You are splitting on the string "<p><b>".
That means each "line" starts where a paragraph tag is followed immediately
by a bold tag; is that really what you want? If so, it looks fine.
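To see exactly what str.split() hands back, here is a small Python 3 sketch on a made-up string. The separator itself is removed, and lines[0] holds whatever precedes the first <p><b>, which is why the script above prints lines[1]. The chunks_after helper is hypothetical; it drops that leading piece and avoids the IndexError that lines[1] raises when the text contains no <p><b> at all.

```python
def chunks_after(text, sep="<p><b>"):
    """Return the pieces of `text` that follow each occurrence of `sep`."""
    # split() always returns at least one element: the text before the first sep.
    return text.split(sep)[1:]


sample = "Intro text<p><b>Title one</b> body one<p><b>Title two</b> body two"
lines = sample.split("<p><b>")
print(lines[0])    # Intro text
print(lines[1])    # Title one</b> body one
print(len(lines))  # 3

print(chunks_after(sample))          # ['Title one</b> body one', 'Title two</b> body two']
print(chunks_after("no tags here"))  # []
```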
You could modify your program so that it takes the filename at the
command line, so you can process more than one file:
python myscript.py foo.html
or
python myscript.py bar.html
for example
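A minimal Python 3 sketch of that idea, using sys.argv (the list of command-line arguments, with the script name first). The function names and the UTF-8 encoding are assumptions for illustration; the separator is the <p><b> string from the original script.

```python
import sys


def split_text(text, sep="<p><b>"):
    """Return the chunks of `text` that follow each `sep`."""
    return text.split(sep)[1:]


def split_file(path, sep="<p><b>"):
    """Read `path` (assumed UTF-8) and split it with split_text()."""
    with open(path, encoding="utf-8") as f:
        return split_text(f.read(), sep)


def main(filenames):
    """Process every file named on the command line."""
    if not filenames:
        print("usage: python myscript.py FILE [FILE ...]")
        return
    for path in filenames:
        chunks = split_file(path)
        print(path, "->", len(chunks), "chunk(s)")
        if chunks:
            print(chunks[0])


if __name__ == "__main__":
    # sys.argv[0] is the script name; the file names follow it.
    main(sys.argv[1:])
```

With this version, python myscript.py foo.html bar.html reports on both files in turn.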
HTH,
--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/