[Tutor] reading into a text file

Alan Gauld alan.gauld at btinternet.com
Sun Jul 26 00:07:50 CEST 2009


<keithabt at beyondbb.com> wrote

> Hi I am trying to read a html document into a text file 

Hi, welcome to the tutor list.

First thing to point out is that HTML files are just text files with 
a particular structure. But so far as reading them in Python 
goes they are no different to any other text file.
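
To show what I mean, here is a minimal sketch that reads an HTML file exactly as it would read any other text file. The file name page.html, its content, and the helper read_text are made up for the demonstration; the sketch is written for Python 3.

```python
import os
import tempfile

def read_text(path):
    """Return the whole content of a text file as one string."""
    with open(path) as f:
        return f.read()

# Demo only: write a tiny HTML page to a temporary folder, then read it
# back the same way we would read a plain .txt file.
demo_path = os.path.join(tempfile.mkdtemp(), "page.html")
with open(demo_path, "w") as f:
    f.write("<html><body><p><b>Hello</b></p></body></html>\n")

text = read_text(demo_path)
print(text)
```

Nothing in read_text knows or cares that the file contains HTML.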

> purpose of spliting the data. 

Now this is where it gets interesting: it depends on what 
exactly you are trying to "split". What do you mean by
the "data"? If it's the HTML elements, there are specialised 
Python tools that will make this a lot easier. But if it is 
simply splitting into separate lines, read on...
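
As a rough illustration of both cases: one such tool is the HTML parser in the standard library (html.parser in Python 3; the Python 2 module was called HTMLParser). The class name BoldText and the sample page below are my own inventions for the example, not anything from your files.

```python
from html.parser import HTMLParser  # Python 3 module name

class BoldText(HTMLParser):
    """Collect the text found inside <b>...</b> elements."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.depth = 0      # how many <b> elements we are currently inside
        self.found = []     # one string per <b> element seen

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self.depth += 1
            self.found.append("")

    def handle_endtag(self, tag):
        if tag == "b" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Only keep text that appears while we are inside a <b> element.
        if self.depth > 0:
            self.found[-1] += data

page = "<p><b>Name</b> Alice</p>\n<p><b>Age</b> 30</p>\n"

parser = BoldText()
parser.feed(page)
parser.close()
print(parser.found)        # ['Name', 'Age']

# Splitting into separate lines needs no special tool at all:
print(page.splitlines())   # ['<p><b>Name</b> Alice</p>', '<p><b>Age</b> 30</p>']
```

The parser copes with attributes, odd spacing and so on, which plain string splitting does not.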

> I am not sure I am doing this right. I have my html 
> document, my text file and a python script I called 
> convt.py on the desktop 

It's usually a bad idea in Windows to do anything on 
the Desktop; keep that as a place for putting icons to 
launch programs. Put working files into separate project
folders.

> in a folder. 

I assume this means a folder on your Desktop?

That's slightly better but still leaves problems because the 
true path to your folder is:

C:\Documents and Settings\YourName\Desktop\YourFolder

Which Windows tries to hide most of the time!

Personally I'd recommend creating a "Projects" or "Work"
folder at the top level of one of your drives (if you have more than one)
and moving your folder under that. Then the full path becomes

D:\Work\MyFolder

Which is a lot easier to deal with and less likely to run into Windows 
"cleverness" issues.

> I opened up IDLE (python GUI) opened the folder on my desk top and then 
> went to run module. 

OK, I'm still not 100% clear on what you are doing here but 
this is probably a good time to get to know the Windows 
command prompt. (Take a look at the box on the Getting Started 
topic in my tutorial for a brief intro.) That's a better way to run your 
programs on real data IMHO.

> I keep getting an error saying "No such file or directory: 'source.txt'

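If you start a command prompt

Start->Run
Type CMD, hit OK

you will see a prompt like

C:\WINDOWS> or similar

Python looks for 'source.txt' in the current directory, so first 
change to the folder that holds your script and your data file, 
for example

cd /d D:\Work\MyFolder

and then type 

python myscript.py

It should now find it.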
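
The "No such file or directory" error usually means that Python is looking in a different folder from the one you expect. A relative name like 'source.txt' is resolved against the current working directory, which is not necessarily the folder the script lives in. Here is a small sketch you can run to see where Python is actually looking; the helper name resolve is just something I made up for the example, and it assumes Python 3.

```python
import os

def resolve(filename, base_dir=None):
    """Return the absolute path a relative filename refers to.

    With no base_dir the name is resolved against the current working
    directory, which is what open("source.txt") does.
    """
    if base_dir is None:
        base_dir = os.getcwd()
    return os.path.abspath(os.path.join(base_dir, filename))

print("Current directory:", os.getcwd())
print("open('source.txt') looks for:", resolve("source.txt"))
print("Is it there?", os.path.exists(resolve("source.txt")))
```

If the last line prints False, either change directory before running the script or give open() the full path to the file.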

> I am new to python so I really don't know if I am doing this right.  
> 
> #!/usr/bin/python
> u=open("source.txt").read()
> lines = u.split("<p><b>")
> print lines[1]

You are splitting on the sequence <p><b>.
That is, each "line" begins where a paragraph tag is followed immediately 
by a bold tag. Is that really what you want? If so it looks fine.
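
To see what that split actually produces, here is a short sketch on a made-up page (the sample string is mine, not your data). Note that the separator itself is removed, and that element 0 is everything before the first <p><b>.

```python
page = "<html><body><p><b>First</b> one</p><p><b>Second</b> two</p></body></html>"

pieces = page.split("<p><b>")
for i, piece in enumerate(pieces):
    print(i, repr(piece))

# If the separator never occurs, split() returns a one-element list,
# so asking for element [1] would raise an IndexError.
no_match = "no tags here".split("<p><b>")
print(len(no_match))   # 1
```

So lines[1] in your script is the text following the first <p><b>, up to the next one.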

You could modify your program so that it takes the filename at the 
command line, so you can process more than one file:

python myscript.py foo.html

or

python myscript.py bar.html

for example
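
One possible way to do that is with sys.argv, which holds the words typed on the command line. This is only a sketch under my own assumptions: the function names first_piece and main are invented here, it uses the Python 3 print() function rather than the print statement in your script, and it reports a missing file instead of stopping with a traceback.

```python
import sys

def first_piece(text, marker="<p><b>"):
    """Return the text following the first marker, or None if it is absent."""
    pieces = text.split(marker)
    if len(pieces) < 2:
        return None
    return pieces[1]

def main(argv):
    # argv[0] is the script name; anything after it is a file to process.
    if len(argv) < 2:
        print("usage: python myscript.py file.html [more files...]")
        return
    for filename in argv[1:]:
        try:
            with open(filename) as f:
                text = f.read()
        except IOError as err:   # e.g. "No such file or directory"
            print("Cannot read %s: %s" % (filename, err))
            continue
        print(first_piece(text))

if __name__ == "__main__":
    main(sys.argv)
```

You can give it several files at once, e.g. python myscript.py foo.html bar.html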

HTH,


-- 
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/


