[Csv] PEP 305

sjmachin at lexicon.net
Fri Oct 3 00:45:53 CEST 2003


The data in longs.csv has suffered a triple-witching, and can be recovered easily by
reversing the spells:
(1) remove two instances of " from the front and back of the string
(2) CSV decoding with quote char of " and delimiter = [anything not in the string, e.g. the TAB character]
(3) normal European CSV decoding with quote char of " and period/dot as the delimiter

Well, easily using my homebrew 'delimited' module, anyway :-)

>>> import delimited
>>> guff = '"""INTC.""Intel Corporation"".""1"".""2,07"".""0,22"".""13,00"".""53.669.700"".""28,37"""""'
>>> unpk1 = delimited.unpacker(delimiter="\t")
>>> unpk2 = delimited.unpacker(delimiter=".")
>>> guff2 = guff[2:-2]
>>> guff2
'"INTC.""Intel Corporation"".""1"".""2,07"".""0,22"".""13,00"".""53.669.700"".""28,37"""'
>>> guff3 = unpk1(guff2)
>>> guff3
['INTC."Intel Corporation"."1"."2,07"."0,22"."13,00"."53.669.700"."28,37"']
# interesting that the ticker code (INTC) is *not* quoted
>>> guff4 = unpk2(guff3[0])
>>> guff4
['INTC', 'Intel Corporation', '1', '2,07', '0,22', '13,00', '53.669.700', '28,37']

which appears to be what Roberto expected.
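
For anyone without 'delimited', the same three steps can probably be reproduced with the standard csv module. What follows is only a sketch, assuming a current Python 3 where csv.reader accepts a list of strings; the function name unspell is made up for illustration and is not part of either module.

```python
import csv

def unspell(line):
    """Undo the three layers of quoting on one line of longs.csv (hypothetical helper)."""
    # (1) remove two instances of " from the front and back of the string
    stripped = line[2:-2]
    # (2) decode with quote char " and a delimiter that never occurs in the data (TAB);
    #     this yields a single field with the doubled quotes collapsed
    (outer,) = next(csv.reader([stripped], delimiter="\t", quotechar='"'))
    # (3) decode again with quote char " and period/dot as the delimiter
    return next(csv.reader([outer], delimiter=".", quotechar='"'))

guff = '"""INTC.""Intel Corporation"".""1"".""2,07"".""0,22"".""13,00"".""53.669.700"".""28,37"""""'
fields = unspell(guff)
print(fields)
```

The dots inside "53.669.700" survive step (3) because that field is quoted at that point, which is why the number comes out in one piece.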

> 
> (Let's keep csv at mail.mojam.com in the loop.  This is good input for
> all of us.) 
> 
>     >> Using the attached CSV file (which I think is correct and uses
>     >> your screener object, I get
>     >>
>     >> ['INTC', 'Intel Corporation', '1', '2,07', '0,22', '13,00', '53', '669', '700', '28,37']
>     >>
>     >> which looks fine to me.
>
>     Roberto> but it doesn't to me, because 53, 669, 700 are not three
>     Roberto> different data, but the single number 53669700, only, as you
>     Roberto> can see in the following line, is represented with dots as
>     Roberto> usual in financial conventions.
> 
> I understand that it wasn't quite right.  I had to guess about the
> quoting. It's still all wrong.  It's not just that there are extra
> quotation marks at the beginning and the end (the ones you stripped),
> it's that every other quotation mark is doubled.  The parser only
> supports a single character quote character, so they are a problem.
> 
> One thing you can do to make life easier is to write a generator
> function which sits between the file and the parser.  It will strip
> the extra quotes in each line.
> 
> I've attached a simple Python script (which requires Python 2.2 or
> 2.3) that seems to work correctly, as well as your longs.csv file
> (with the extra leading and trailing triple quotes) so the other
> developers can see it.
> 
> Skip
> 
> 
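
The generator idea quoted above might look roughly like the sketch below. This is not Skip's attached script, which is not reproduced here; it is a guess at the approach, assuming every line has exactly the shape of the INTC sample (three extra quote characters at each end, and every remaining quote doubled). The name strip_extra_quotes is hypothetical.

```python
import csv
import io

def strip_extra_quotes(lines):
    """Yield each line with the surplus quoting removed (hypothetical filter)."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # Drop three quote characters from each end, then collapse doubled quotes.
        # Assumption: all lines are wrapped the same way as the INTC sample.
        yield line[3:-3].replace('""', '"')

# Stand-in for the real file: one line in the same shape as longs.csv
sample = io.StringIO(
    '"""INTC.""Intel Corporation"".""1"".""2,07"".""0,22"".""13,00"".""53.669.700"".""28,37"""""\n'
)

# The generator sits between the file and the parser
rows = list(csv.reader(strip_extra_quotes(sample), delimiter="."))
print(rows)
```

Blind slicing like this is fragile if any line deviates from that shape, so the decode-twice route is likely the safer of the two.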



