[Tutor] PDF to TXT

Hongbao Chen microcore at yahoo.com.cn
Mon Jan 24 01:39:45 CET 2011


Check the Python script and track down what is raising the exception. Good
luck.

Cheers

-----Original Message-----
From: tutor-bounces+microcore=yahoo.com.cn at python.org
[mailto:tutor-bounces+microcore=yahoo.com.cn at python.org] On Behalf Of
tutor-request at python.org
Sent: Monday, January 24, 2011 7:33 AM
To: tutor at python.org
Subject: Tutor Digest, Vol 83, Issue 100

Today's Topics:

   1. Re: What is a semantic error? (Richard D. Moores)
   2. PDF to TXT (Robert Berman)
   3. Re: Telephone app (David Hutto)
   4. Re: Telephone app (Walter Prins)
   5. Re: Telephone app (bob gailer)
   6. Re: Telephone app (Steven D'Aprano)


----------------------------------------------------------------------

Message: 1
Date: Sun, 23 Jan 2011 11:24:27 -0800
From: "Richard D. Moores" <rdmoores at gmail.com>
To: Tutor List <tutor at python.org>
Subject: Re: [Tutor] What is a semantic error?
Message-ID:
	<AANLkTikokvgUkjMV-=kBib10ZKCZkRFDK_c3Kv=jat+9 at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Thanks, Tutors, for the excellent replies. I think I've got it now.

Dick


------------------------------

Message: 2
Date: Sun, 23 Jan 2011 11:56:14 -0500
From: Robert Berman <bermanrl at cfl.rr.com>
To: Tutor at python.org
Subject: [Tutor] PDF to TXT
Message-ID: <1295801774.1653.11.camel at bermanrl-desktop>
Content-Type: text/plain; charset="UTF-8"

Hi,

I am trying to convert .pdf files to .txt files. The script below is
mostly taken from what I found through Google, and it appears to be the
approach most consistently favored
(http://code.activestate.com/recipes/577095-convert-pdf-to-plain-text/).

I am using Win 7, Python 2.7.1.
My code:

#pdf2txt.py
import sys
import pyPdf
import os

def getPDFContent(path):
    content = ""
    # Load PDF into pyPDF
    pdf = pyPdf.PdfFileReader(file(path, "rb"))
    # Iterate pages
    for i in range(0, pdf.getNumPages()):
        # Extract text from page and add to content
        content += pdf.getPage(i).extractText() + " \n"
    # Collapse whitespace
    # content = u" ".join(content.replace(u"\xa0", u" ").strip().split())
    return content

def main():
    pdf = sys.argv[1]
    filedir,filename = os.path.split(pdf)
    nameonly = os.path.splitext(filename)
    newname = nameonly[0] + ".txt"
    outtxt = os.path.join(filedir,newname)
    f = open(outtxt,'w')
    f.write(getPDFContent(pdf))
    f.close()

main()
exit()

============================================================================

The program runs for a while and then dies while in one of the pypdf
functions.  The trace is below. Any insight into how to resolve this
situation will be most appreciated.
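
One way to narrow down a failure like this is to extract the text one page
at a time and note which pages raise, rather than letting the first bad page
abort the whole run. The following is a minimal sketch and not part of the
original script: extract_pages and the two stub page classes are
hypothetical names, and the only assumption about a page object is that it
has an extractText() method, as the pyPdf pages in the script above do.

```python
def extract_pages(pages):
    """Return (text, failures); failures lists (page_index, error) pairs."""
    chunks = []
    failures = []
    for index, page in enumerate(pages):
        try:
            chunks.append(page.extractText())
        except Exception as exc:  # broad on purpose: record the error and continue
            failures.append((index, repr(exc)))
    return " \n".join(chunks), failures


# Stand-ins for real pages, used only to demonstrate the behaviour.
class GoodPage(object):
    def __init__(self, text):
        self.text = text

    def extractText(self):
        return self.text


class BadPage(object):
    def extractText(self):
        raise ValueError("simulated parse error")


text, failures = extract_pages([GoodPage("one"), BadPage(), GoodPage("three")])
print(text)
print(failures)
```

With a real file, the page list could presumably be built as
[pdf.getPage(i) for i in range(pdf.getNumPages())], and the failures list
would then show which pages trigger the exception.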

Thank you,

Robert

============================================================================
The trace I get is:
decimal.InvalidOperation: Invalid literal for Decimal: '.'
  File "C:\Users\bermanrl\Projects\ScriptSearch\testdir\pdf2txt.py", line 28, in <module>
    main()
  File "C:\Users\bermanrl\Projects\ScriptSearch\testdir\pdf2txt.py", line 25, in main
    f.write(getPDFContent(pdf))
  File "C:\Users\bermanrl\Projects\ScriptSearch\testdir\pdf2txt.py", line 13, in getPDFContent
    content += pdf.getPage(i).extractText() + " \n"
  File "C:\Python27\Lib\site-packages\pyPdf-1.13-py2.7-win32.egg\pyPdf\pdf.py", line 1381, in extractText
    content = ContentStream(content, self.pdf)
  File "C:\Python27\Lib\site-packages\pyPdf-1.13-py2.7-win32.egg\pyPdf\pdf.py", line 1464, in __init__
    self.__parseContentStream(stream)
  File "C:\Python27\Lib\site-packages\pyPdf-1.13-py2.7-win32.egg\pyPdf\pdf.py", line 1503, in __parseContentStream
    operands.append(readObject(stream, None))
  File "C:\Python27\Lib\site-packages\pyPdf-1.13-py2.7-win32.egg\pyPdf\generic.py", line 87, in readObject
    return NumberObject.readFromStream(stream)
  File "C:\Python27\Lib\site-packages\pyPdf-1.13-py2.7-win32.egg\pyPdf\generic.py", line 234, in readFromStream
    return FloatObject(name)
  File "C:\Python27\Lib\site-packages\pyPdf-1.13-py2.7-win32.egg\pyPdf\generic.py", line 207, in __new__
    return decimal.Decimal.__new__(cls, str(value), context)
  File "C:\Python27\Lib\decimal.py", line 548, in __new__
    "Invalid literal for Decimal: %r" % value)
  File "C:\Python27\Lib\decimal.py", line 3844, in _raise_error
    raise error(explanation)
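
The final frames of the trace suggest that pyPdf's number parser was handed
a token consisting of a lone "." and passed it to decimal.Decimal, which
rejects it. Below is a minimal sketch that reproduces just that step with
the standard library. to_decimal is a hypothetical helper for illustration,
not a pyPdf function, and falling back to zero is an assumption about what
would be acceptable when only the text is wanted.

```python
import decimal


def to_decimal(token, fallback=decimal.Decimal(0)):
    """Convert a numeric token to Decimal, returning fallback if it is malformed."""
    try:
        return decimal.Decimal(token)
    except decimal.InvalidOperation:
        return fallback


# A lone "." is not a valid decimal literal, which is what the trace reports.
try:
    decimal.Decimal(".")
    raised = False
except decimal.InvalidOperation:
    raised = True

print(raised)
print(to_decimal("."), to_decimal("12.5"))
```

If this is indeed the cause, the problem lies in how that particular PDF
writes its numbers rather than in pdf2txt.py itself.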



------------------------------

Message: 3
Date: Sun, 23 Jan 2011 16:04:31 -0500
From: David Hutto <smokefloat at gmail.com>
To: Alan Gauld <alan.gauld at btinternet.com>
Cc: tutor at python.org
Subject: Re: [Tutor] Telephone app
Message-ID:
	<AANLkTi=SBcaYNDjbpqXtcuc7N8vgt9-9ov5UfjL68vxe at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

> Can you step back a bit and explain what it is you are trying to
> accomplish? "flow through" and "functional flow through" are meaningless
> terms in telecomms - at least so far as I am aware (after my 35 years in
> telecomms engineering...)


It's two fold. First is the obvious of conducting a call/receiving the
caller id info. The second is to send directly to the phone, and
transmit the caller id data.

By flow through, I mean that the phone has one of 2 states(on the
hook, off the hook), and three sub states(on the hook/off the hook in
use/off the hook not in use).

On the actual phone we pick up the receiver, or press the button on
the cordless to receive, but the line is always connected, meaning it
stops at the phone(terminal). I pick up the receiver, and transmit a
series of specific tones which indicate the area code, trunk number
and extension(if I remember this correctly, it's been a while since I
studied the phone itself).

So I have to receive the signal that the phone is ringing(then I'm
assuming it sends the caller id info in between rings in some form)/or
transmit a series of tones to them to connect.

So I think my main question is what modules might be relevant to doing
this? And should I be thinking of it any differently than a USB port
which has 4 pins two data(+-), and two dc current(+-)?


------------------------------

Message: 4
Date: Sun, 23 Jan 2011 21:33:33 +0000
From: Walter Prins <wprins at gmail.com>
To: David Hutto <smokefloat at gmail.com>
Cc: Alan Gauld <alan.gauld at btinternet.com>, tutor at python.org
Subject: Re: [Tutor] Telephone app
Message-ID:
	<AANLkTi=EAEQT1bL-eHJn-wnW-Wx3bgGwwNr6DjMv1Gii at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

On 23 January 2011 21:04, David Hutto <smokefloat at gmail.com> wrote:

> So I have to receive the signal that the phone is ringing(then I'm
> assuming it sends the caller id info in between rings in some form)/or
> transmit a series of tones to them to connect.
>
> So I think my main question is what modules might be relevant to doing
> this? And should I be thinking of it any differently than a USB port
> which has 4 pins two data(+-), and two dc current(+-)?
>
>
I think you're thinking at too low a level. As Alan alluded to, this type of
stuff is done via a voice modem that you can control directly (via a serial
port) and get signals from, e.g. using its command set.  A common standard
for about 3 decades has been the Hayes command set:
http://en.wikipedia.org/wiki/Hayes_command_set

As for control from Python - given that the modem would be present as a
serial (COM port) device in the system, I'd have thought that (at worst)
you'd be looking to use PySerial to interact with the modem.  There may also
be more targeted wrappers specifically for modems (don't know, haven't
looked).  And as mentioned before, you can probably also use the more
abstract interface provided by the operating system (TAPI stuff).
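
As a rough illustration of the serial-port approach, the sketch below sends
a Hayes "AT" command and collects the reply. It rests on several
assumptions: send_command and FakeModem are hypothetical names, the port is
any object with write() and readline() methods (a PySerial serial.Serial
instance opened on the modem's COM port with a timeout would be the
real-world candidate), and the commands for caller ID vary by modem, so
none are shown here.

```python
import io


def send_command(port, command):
    """Send one AT command terminated by CR and return the non-blank reply lines."""
    port.write((command + "\r").encode("ascii"))
    replies = []
    while True:
        raw = port.readline()
        if not raw:
            # Empty read: timeout or end of input, so stop waiting.
            break
        line = raw.decode("ascii", "replace").strip()
        if not line:
            # Skip the blank lines modems put around their replies.
            continue
        replies.append(line)
        if line in ("OK", "ERROR"):
            break
    return replies


class FakeModem(object):
    """Stand-in for a serial port: records writes and replays a canned reply."""

    def __init__(self, reply):
        self.sent = b""
        self._reply = io.BytesIO(reply)

    def write(self, data):
        self.sent += data
        return len(data)

    def readline(self):
        return self._reply.readline()


modem = FakeModem(b"\r\nOK\r\n")
replies = send_command(modem, "AT")
print(replies)
```

Testing against a fake port first keeps the protocol logic separate from
the hardware, which would need checking against the modem's own manual.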

And yes, USB is quite different from the POTS (Plain Old Telephone System).
Forget any ideas that they're anywhere near the same thing.

Hope that helps.

Walter

------------------------------

Message: 5
Date: Sun, 23 Jan 2011 17:41:39 -0500
From: bob gailer <bgailer at gmail.com>
To: David Hutto <smokefloat at gmail.com>
Cc: tutor at python.org
Subject: Re: [Tutor] Telephone app
Message-ID: <4D3CAEA3.60503 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 1/23/2011 4:04 PM, David Hutto wrote:

Warning - do NOT connect a telco landline to a USB port. The line
voltage when "on hook" is around 50 V and rises to over 100 V when ringing.
That will certainly fry the port.

There are expansion cards and other devices designed to connect to the
landline and to a phone. They also process caller id and send touch tone
signals.

I've tried to follow your explanation. It is too vague for me to make
sense of.

I guess you want to take the line that comes to you from your local
telco, stick something computer-wise between it and an ordinary analog
phone, so the computer can receive and process the caller id from an
incoming call, and also ensure that the caller id appears on the phone
itself, and use the computer to dial numbers (NOT known as caller id).

Correct so far?

> It's two fold. First is the obvious of conducting a call/receiving the
> caller id info.

May be obvious to you, but not to me! To support your query please
provide some kind of wiring diagram and define "conducting a call".
> The second is to send directly to the phone, and transmit the caller id
> data.

Again this is not very precise or clear. What do you want to send to the
phone?
> By flow through, I mean that the phone has one of 2 states(on the
> hook, off the hook)
I'm OK with that.
> off the hook in use/off the hook not in use).

That is not clear.
> On the actual phone we pick up the receiver, or press the button on
> the cordless to receive, but the line is always connected, meaning it
> stops at the phone(terminal). I pick up the receiver, and transmit a
> series of specific tones which indicate the area code, trunk number
> and extension(if I remember this correctly, it's been a while since I
> studied the phone itself).
>
> So I have to receive the signal that the phone is ringing(then I'm
> assuming it sends the caller id info in between rings in some form)/or
> transmit a series of tones to them to connect.

Huh?
> So I think my main question is what modules might be relevant to doing
> this?

As someone mentioned earlier - TAPI is your friend.
> And should I be thinking of it any differently than a USB port
> which has 4 pins two data(+-), and two dc current(+-)?

As I warned above, YES.


--
Bob Gailer
919-636-4239
Chapel Hill NC



------------------------------

Message: 6
Date: Mon, 24 Jan 2011 10:28:05 +1100
From: Steven D'Aprano <steve at pearwood.info>
To: tutor at python.org
Subject: Re: [Tutor] Telephone app
Message-ID: <4D3CB985.3000006 at pearwood.info>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

bob gailer wrote:
> On 1/23/2011 4:04 PM, David Hutto wrote:
[...]
> I guess you want to take the line that comes to you from your local
> telco, stick something computer-wise between it and an ordinary analog
> phone, so the computer can receive and process the caller id from an
> incoming call, and also ensure that the caller id appears on the phone
> itself, and use the computer to dial numbers (NOT known as caller id).

Folks, this question has nothing to do with Python and is off-topic for
this list. Can you all take it off-list please?

There are probably communities on the Internet or Usenet that are
interested in low-level telecommunications protocols and devices. We
don't go there to talk about Python; please don't stay here talking
about their areas of expertise.


Thank you.



--
Steven


------------------------------

_______________________________________________
Tutor maillist  -  Tutor at python.org
http://mail.python.org/mailman/listinfo/tutor


End of Tutor Digest, Vol 83, Issue 100
**************************************

