Questions on Pandas

Steven D'Aprano steve at pearwood.info
Fri Jun 26 02:26:04 EDT 2015


On Fri, 26 Jun 2015 02:34 pm, Tommy C wrote:

> Hi there, I have a number of questions related to the Pandas exercises
> found from the book, Python for Data Analysis by Wes McKinney.
> Particularly, these exercises are from Chapter 6 of the book. It'd be much
> appreciated if you could answer the following questions!

Too many questions for one post!


> Does the header appear as "X.#" by default when it is set to be None?

Why don't you try setting header to something else and see what happens?


> 2.
> [code]
> Input: chunker = pd.read_csv('ch06/ex6.csv', chunksize=1000)
> Input: chunker
> Output: <pandas.io.parsers.TextParser at 0x8398150>
> 
> [/code]
> 
> Please explain the idea of chunksize and the output meaning.

The output shows that chunker is a TextParser object.
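
A minimal sketch of what that object does, assuming a small made-up CSV held in memory in place of ch06/ex6.csv (the data and the chunksize of 2 are illustrative only). Recent pandas versions, as far as I know, name the class TextFileReader rather than TextParser, but it is used the same way: iterating over it yields one DataFrame per chunk.

```python
import io

import pandas as pd

# Hypothetical stand-in for ch06/ex6.csv: five data rows and a header.
csv_text = "key,value\nA,1\nB,2\nA,3\nC,4\nA,5\n"

# With chunksize set, read_csv returns an iterable reader, not a DataFrame.
chunker = pd.read_csv(io.StringIO(csv_text), chunksize=2)
print(type(chunker))

# Each iteration yields a DataFrame holding at most `chunksize` rows.
sizes = [len(piece) for piece in chunker]
print(sizes)
```

The last chunk is simply shorter when the row count is not a multiple of chunksize.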

The chunksize lets you set how many rows are read at a time, just as it says
in your next question:

> 3.
> [code]
> The TextParser object returned by read_csv allows you to iterate over the
> parts of the file according to the chunksize. 

Do you have access to the csv file? How many rows does it take to get the
results you see below?

> For example, we can iterate 
> over ex6.csv, aggregating the value counts in the 'key' column like so:
> chunker = pd.read_csv('ch06/ex6.csv', chunksize=1000)
> tot = Series([])
> for piece in chunker:
>  tot = tot.add(piece['key'].value_counts(), fill_value=0)
> tot = tot.order(ascending=False)
> We have then:
> In [877]: tot[:10]
> Out[877]:
> E 368
> X 364
> L 346
> O 343
> Q 340
> M 338
> J 337
> F 335
> K 334
> H 330
> 
> [/code]
> 
> I couldn't run the Series function successfully... is there something
> missing in this code?

I don't know. What did you do, and what error did you get? "I couldn't run
this successfully..." could mean anything:

- is the keyboard plugged in?
- did you import Pandas?
- did you make a typo?

and about a million other possible things could have gone wrong. Unless you
tell us *what you did* and *what happened*, how can we possibly guess why
you can't run the code?
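
For comparison, here is a self-contained sketch of the quoted loop that can be pasted and run as-is. It rests on assumptions of mine, not on the book: the CSV is made-up in-memory data, Series is reached through an explicit pandas import, and sort_values stands in for order, which later pandas releases removed.

```python
import io

import pandas as pd

# Hypothetical stand-in for ch06/ex6.csv: six rows with a 'key' column.
csv_text = "key,value\nA,1\nB,2\nA,3\nC,4\nA,5\nB,6\n"

# Start from an empty Series; an explicit dtype avoids a warning in
# some pandas versions.
tot = pd.Series(dtype=float)

# Add each chunk's counts to the running total; fill_value=0 treats
# keys missing on either side as zero.
for piece in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    tot = tot.add(piece["key"].value_counts(), fill_value=0)

# Sort descending by count (Series.order in the quoted code).
tot = tot.sort_values(ascending=False)
print(tot)
```

If something like this still fails for you, the full traceback it prints is exactly what to paste into your next post.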


> Error occured as I tried to run this code with sys.stdout.

Again, are we supposed to guess what the error was?

And now I'm bored and have stopped reading. I suggest you think a bit more
carefully about the questions you ask, and the way you ask them. Imagine
that we're not watching over your shoulder to see what you did wrong when
you get an error. Don't assume that an error means the tutorial is wrong.
Please give more detail: what you did, and COPY AND PASTE the result you
got, don't summarise it, or re-type it from memory.

Ideally, you should have one question per post, or at least no more than a
few *related* questions. No, "all part of the same tutorial" doesn't make
them related. Try to give each set of questions a descriptive subject line,
so people can keep track of what you are asking and which questions they
care about and which ones they don't.



-- 
Steven



