[Numpy-discussion] fromfile() for reading text (one more time!)

alan at ajackson.org alan at ajackson.org
Sat Jan 23 13:50:19 EST 2010


>On Mon, Jan 4, 2010 at 10:39 PM,  <alan at ajackson.org> wrote:
>>>Hi folks,
>>>
>>>I'm taking a look once again at fromfile() for reading text files. I
>>>often have the need to read a LOT of numbers from a text file, and it
>>>can actually be pretty darn slow to do it the normal python way:
>>>
.....................big snip
>>
>> I agree. I've tried using it, and usually find that it doesn't quite get there.
>>
>> I rather like the R command(s) for reading text files - except then I have to
>> use R which is painful after using python and numpy. Although ggplot2 is
>> awfully nice too ... but that is a later post.
>>
>>     read.table(file, header = FALSE, sep = "", quote = "\"'",
>>                dec = ".", row.names, col.names,
>>                as.is = !stringsAsFactors,
>>                na.strings = "NA", colClasses = NA, nrows = -1,
>>                skip = 0, check.names = TRUE, fill = !blank.lines.skip,
>>                strip.white = FALSE, blank.lines.skip = TRUE,
>>                comment.char = "#",
>>                allowEscapes = FALSE, flush = FALSE,
>>                stringsAsFactors = default.stringsAsFactors(),
>>                fileEncoding = "", encoding = "unknown")
....................... big snip
>
>
>Aren't the newly improved
>
>numpy.genfromtxt(fname, dtype=<type 'float'>, comments='#',
>delimiter=None, skiprows=0, converters=None, missing='',
>missing_values=None, usecols=None, names=None, excludelist=None,
>deletechars=None, case_sensitive=True, unpack=None, usemask=False,
>loose=True)
>
>and friends intended to handle all this?
>
>Josef
>

Reopening an old thread...

genfromtxt is a big step forward. As a learning exercise, I'm working through
the book "Using R for Data Analysis and Graphics, Introduction, Code, and
Commentary" by J H Maindonald (available online) in Python, to see what it
takes in Python/numpy to work his examples and problems. With that
introduction, here is a case that I believe genfromtxt fails on, because it
doesn't support the reasonable (IMHO) behavior of treating a quote-delimited
string in the input file as a single field.

Below is the example from the book. There are two issues: the header for the
first field is quote-blank-quote, and the values in the first field consist of
one to three blank-delimited words enclosed in quotes. I'm putting something
together to read it using shlex.split, since it honors strings protected by
quote pairs.
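
A minimal sketch of the shlex.split idea, using three rows copied from the
sample data in this message. read_quoted_table is a hypothetical helper name
(it is not part of numpy or genfromtxt), and the sketch assumes the first row
is the header, the first column is a label, and every other column is numeric.

```python
import shlex

# Three rows copied from the sample data in this message, plus its header.
sample = '''" " "distance" "climb" "time"
"Greenmantle" 2.5 650 16.083
"Craig Dunain" 6 900 33.65
"Bens of Jura" 16 7500 204.617
'''


def read_quoted_table(lines):
    """Hypothetical helper: split each line with shlex.split so that a
    quoted string containing blanks stays a single field."""
    rows = [shlex.split(line) for line in lines if line.strip()]
    header, body = rows[0], rows[1:]
    # Assumption: column 0 holds the row label, the rest are numbers.
    names = [row[0] for row in body]
    values = [[float(field) for field in row[1:]] for row in body]
    return header, names, values


header, names, values = read_quoted_table(sample.splitlines())
print(header)
print(names)
print(values)
```

The quote-blank-quote header comes back as a one-character string holding a
single blank, so it survives as a field of its own. Passing values to
numpy.array should give a 2-D float array if one is wanted. Note that
shlex.split also treats backslashes as escapes, which may not suit every file.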

I'm not an Excel person, but I think Excel might export data in a format
similar to what is shown below.

" " "distance" "climb" "time"
"Greenmantle" 2.5 650 16.083
"Carnethy" 6 2500 48.35
"Craig Dunain" 6 900 33.65
"Ben Rha" 7.5 800 45.6
"Ben Lomond" 8 3070 62.267
"Goatfell" 8 2866 73.217
"Bens of Jura" 16 7500 204.617
"Cairnpapple" 6 800 36.367
"Scolty" 5 800 29.75
"Traprain" 6 650 39.75
"Lairig Ghru" 28 2100 192.667


-- 
-----------------------------------------------------------------------
| Alan K. Jackson            | To see a World in a Grain of Sand      |
| alan at ajackson.org          | And a Heaven in a Wild Flower,         |
| www.ajackson.org           | Hold Infinity in the palm of your hand |
| Houston, Texas             | And Eternity in an hour. - Blake       |
-----------------------------------------------------------------------


