Newbie problem with Python pandas

RueTheDay nospam at nospam.com
Sun Jan 6 08:57:17 EST 2013


I'm working my way through the examples in the O'Reilly book Python For 
Data Analysis and have encountered a snag.

The following code is supposed to analyze some web server log data and 
produce aggregate counts by client operating system.

###################
import json # used to process json records
from pandas import DataFrame, Series
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

path = '/home/rich/code/sample.txt'
records = [json.loads(line) for line in open(path)]  # read in records one line at a time
frame = DataFrame(records)

cframe = frame[frame.a.notnull()]
operating_system = np.where(cframe['a'].str.contains('Windows'), 'Windows', 'Not Windows')
by_tz_os = cframe.groupby(['tz', operating_system])
agg_counts = by_tz_os.size().unstack().fillna(0)
indexer = agg_counts.sum(1).argsort()
count_subset = agg_counts.take(indexer)[-10:]
print count_subset
####################

I am getting the following error when running on Python 2.7 on Ubuntu 
12.04:

>>>>>>
Traceback (most recent call last):
  File "./lp1.py", line 12, in <module>
    operating_system = np.where(cframe['a'].str.contains('Windows'),'Windows', 'Not Windows')
AttributeError: 'Series' object has no attribute 'str'
>>>>>>>

Note that I was able to get the code to work fine on Windows 7, so this 
appears to be specific to Linux.
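
Since the same script behaves differently on the two machines, one thing 
worth ruling out is that the two installs are simply different versions of 
pandas rather than this being an OS issue. Below is a minimal check that 
could be run on both boxes. The name has_str_accessor is just my own 
label, and I'm assuming pandas exposes __version__ on both installs.

```python
import sys

import numpy as np
import pandas as pd

# Report the interpreter and library versions so the two machines can be compared.
print("python: " + sys.version.split()[0])
print("numpy:  " + np.__version__)
print("pandas: " + pd.__version__)

# Check directly whether a Series of strings exposes the .str accessor
# that the failing line relies on.
has_str_accessor = hasattr(pd.Series(['Windows NT 6.1']), 'str')
print("Series has .str: " + str(has_str_accessor))
```

If the last line prints False on the Ubuntu machine and True on Windows 7, 
the difference would be in the pandas install and not in Linux itself.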

A little Googling showed that others have encountered this problem and 
suggested replacing the np.where with a find, like so:

########
# find() returns 0 for a match at the very start of the string, so test >= 0
operating_system = ['Windows' if a.find('Windows') >= 0 else 'Not Windows'
                    for a in cframe['a']]
########

This appears to solve the first problem, but then it fails on the next 
line with:

>>>>>>>>
Traceback (most recent call last):
  File "./lp1.py", line 14, in <module>
    by_tz_os = cframe.groupby(['tz', operating_system])
  File "/usr/lib/pymodules/python2.7/pandas/core/generic.py", line 133, in groupby
    sort=sort)
  File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 522, in groupby
    return klass(obj, by, **kwds)
  File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 115, in __init__
    level=level, sort=sort)
  File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 705, in _get_groupings
    ping = Grouping(group_axis, gpr, name=name, level=level, sort=sort)
  File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 600, in __init__
    self.grouper = self.index.map(self.grouper)
  File "/usr/lib/pymodules/python2.7/pandas/core/index.py", line 591, in map
    return self._arrmap(self.values, mapper)
  File "generated.pyx", line 1141, in pandas._tseries.arrmap_int64 (pandas/src/tseries.c:40593)
TypeError: 'list' object is not callable
>>>>>>>>>
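
The traceback suggests that the plain Python list is being handed to 
index.map() as if it were a function. One thing that might be worth trying 
is wrapping the list in a NumPy array before grouping, since np.where 
returned an array in the original code. Here is a minimal sketch of that 
idea. The records are toy data I made up for illustration, and I have not 
confirmed that this avoids the TypeError on the older install.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the log records; these values are invented for illustration.
records = [
    {'a': 'Mozilla/5.0 (Windows NT 6.1)', 'tz': 'America/New_York'},
    {'a': 'Mozilla/5.0 (Macintosh)', 'tz': 'America/New_York'},
    {'a': 'Mozilla/5.0 (Windows NT 5.1)', 'tz': 'Europe/London'},
    {'a': None, 'tz': 'Europe/London'},
]
frame = pd.DataFrame(records)
cframe = frame[frame['a'].notnull()]

# Build the labels without the .str accessor, then convert the list to an
# array so groupby receives the same kind of object np.where would return.
labels = ['Windows' if 'Windows' in a else 'Not Windows' for a in cframe['a']]
operating_system = np.array(labels)

by_tz_os = cframe.groupby(['tz', operating_system])
agg_counts = by_tz_os.size().unstack().fillna(0)
print(agg_counts)
```

The array has one label per row of cframe, in the same order, which is 
what groupby needs when a key is passed as an array and not as a column 
name.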

The problem looks to be with the pandas module and appears to be Linux-
specific.

Any ideas?  I'm pulling my hair out over this.


