Newbie problem with Python pandas
RueTheDay
nospam at nospam.com
Sun Jan 6 08:57:17 EST 2013
I'm working my way through the examples in the O'Reilly book Python for
Data Analysis and have encountered a snag.
The following code is supposed to analyze some web server log data and
produce aggregate counts by client operating system.
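###################
import json  # used to process JSON records
from pandas import DataFrame, Series
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

path = '/home/rich/code/sample.txt'
# Read in the records one line at a time (one JSON object per line).
with open(path) as f:
    records = [json.loads(line) for line in f]
frame = DataFrame(records)
# Keep only records that have a user-agent string in column 'a'.
cframe = frame[frame.a.notnull()]
# Label each record 'Windows' or 'Not Windows' from its agent string.
operating_system = np.where(cframe['a'].str.contains('Windows'),
                            'Windows', 'Not Windows')
by_tz_os = cframe.groupby(['tz', operating_system])
# Rows are time zones, columns are OS labels; missing pairs become 0.
agg_counts = by_tz_os.size().unstack().fillna(0)
# Positions that sort the time zones by total count, ascending,
# so the last 10 rows are the 10 busiest time zones.
indexer = agg_counts.sum(axis=1).argsort()
count_subset = agg_counts.take(indexer)[-10:]
print(count_subset)
####################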
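For reference, here is a minimal sketch of the same pipeline run against a few made-up in-memory records instead of the log file, so it can be tried without sample.txt. It assumes a pandas release that provides the vectorized .str string methods; the sample records, the helper name count_by_tz_os, and the use of sort_values in place of argsort/take are illustrative choices, not taken from the book.

```python
import json

import numpy as np
import pandas as pd

# Hypothetical sample: one JSON record per line, like the log file.
SAMPLE_LINES = [
    '{"tz": "America/New_York", "a": "Mozilla/5.0 (Windows NT 6.1)"}',
    '{"tz": "America/New_York", "a": "Mozilla/5.0 (Macintosh)"}',
    '{"tz": "America/New_York", "a": "Mozilla/5.0 (Windows NT 5.1)"}',
    '{"tz": "Europe/London", "a": "Mozilla/5.0 (X11; Linux x86_64)"}',
    '{"tz": "Europe/London"}',
]


def count_by_tz_os(lines, top_n=10):
    """Count Windows vs. non-Windows agents per time zone (hypothetical helper)."""
    frame = pd.DataFrame([json.loads(line) for line in lines])
    # Drop records with no user-agent string in column 'a'.
    cframe = frame[frame["a"].notnull()]
    # Label each remaining record by whether its agent mentions Windows.
    operating_system = np.where(
        cframe["a"].str.contains("Windows"), "Windows", "Not Windows"
    )
    by_tz_os = cframe.groupby(["tz", operating_system])
    # One row per time zone, one column per OS label; missing pairs become 0.
    agg_counts = by_tz_os.size().unstack().fillna(0)
    # Keep the top_n time zones with the largest totals, smallest first.
    totals = agg_counts.sum(axis=1).sort_values()
    return agg_counts.loc[totals.index[-top_n:]]


if __name__ == "__main__":
    print(count_by_tz_os(SAMPLE_LINES))
```

sort_values on the row totals expresses the same "sort ascending, keep the last N" idea as argsort followed by take, while keeping the time-zone labels attached.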
I am getting the following error when running on Python 2.7 on Ubuntu
12.04:
>>>>>>
Traceback (most recent call last):
  File "./lp1.py", line 12, in <module>
    operating_system = np.where(cframe['a'].str.contains('Windows'),'Windows', 'Not Windows')
AttributeError: 'Series' object has no attribute 'str'
>>>>>>>
Note that I was able to get the code to work fine on Windows 7, so this
appears to be specific to Linux.
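Before concluding that the platform is to blame, it may be worth confirming which pandas release each machine actually imports. The assumption here is that the distro package under /usr/lib/pymodules (the path in the later traceback) is an older release than the one installed on Windows, and that an older release may simply lack the .str accessor. The sketch below only reports what the running interpreter sees; describe_pandas is a hypothetical helper name.

```python
import pandas as pd


def describe_pandas():
    """Report the imported pandas version and whether Series has .str (hypothetical helper)."""
    return {
        "version": pd.__version__,
        # Module path: shows whether a distro package or a separate install was imported.
        "location": pd.__file__,
        # False on releases that predate the vectorized string methods.
        "has_str_accessor": hasattr(pd.Series, "str"),
    }


if __name__ == "__main__":
    for key, value in describe_pandas().items():
        print("%s: %s" % (key, value))
```

Running this on both machines and comparing the version lines would show whether the difference is in pandas rather than in the operating system.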
A little Googling showed that others have hit this problem, and one
suggestion was to replace the np.where call with a find, like so:
########
# find returns -1 when 'Windows' is absent and 0 when it is at the start.
operating_system = ['Windows' if a.find('Windows') >= 0 else 'Not Windows'
                    for a in cframe['a']]
########
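A find-based test needs care: str.find returns -1 when the substring is absent and 0 when it sits at the very start of the string, so a test of a.find('Windows') > 0 would label an agent string that begins with 'Windows' as 'Not Windows'. A membership test (or >= 0) avoids that. The agent strings below are made up for illustration.

```python
# Hypothetical user-agent strings for illustration.
agents = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64)",
    "Windows-RSS-Platform/2.0 (MSIE 9.0; Windows NT 6.1)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3)",
]

# str.find returns the index of the first match, or -1 if there is none,
# so "> 0" misses a match at position 0.
buggy = ["Windows" if a.find("Windows") > 0 else "Not Windows" for a in agents]

# "in" tests for the substring anywhere, including position 0.
fixed = ["Windows" if "Windows" in a else "Not Windows" for a in agents]

print(buggy)  # → ['Windows', 'Not Windows', 'Not Windows']
print(fixed)  # → ['Windows', 'Windows', 'Not Windows']
```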
This appears to solve the first problem, but then it fails on the next
line with:
>>>>>>>>
Traceback (most recent call last):
  File "./lp1.py", line 14, in <module>
    by_tz_os = cframe.groupby(['tz', operating_system])
  File "/usr/lib/pymodules/python2.7/pandas/core/generic.py", line 133, in groupby
    sort=sort)
  File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 522, in groupby
    return klass(obj, by, **kwds)
  File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 115, in __init__
    level=level, sort=sort)
  File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 705, in _get_groupings
    ping = Grouping(group_axis, gpr, name=name, level=level, sort=sort)
  File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 600, in __init__
    self.grouper = self.index.map(self.grouper)
  File "/usr/lib/pymodules/python2.7/pandas/core/index.py", line 591, in map
    return self._arrmap(self.values, mapper)
  File "generated.pyx", line 1141, in pandas._tseries.arrmap_int64 (pandas/src/tseries.c:40593)
TypeError: 'list' object is not callable
>>>>>>>>>
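The traceback hints at what went wrong: the list ends up in self.index.map(self.grouper), which suggests this older pandas treated a plain Python list inside the key list as a mapping function rather than as per-row labels. That reading is an inference from the traceback, not something confirmed here. A possible sidestep is to pass the labels as a NumPy array or as a Series aligned with cframe's index. The sketch below shows the Series form on a made-up stand-in for cframe; whether it avoids the error on the Ubuntu-packaged release has not been verified.

```python
import pandas as pd

# Hypothetical stand-in for cframe: time zone plus user-agent string.
cframe = pd.DataFrame(
    {
        "tz": ["America/New_York", "America/New_York", "Europe/London"],
        "a": [
            "Mozilla/5.0 (Windows NT 6.1)",
            "Mozilla/5.0 (Macintosh)",
            "Windows-RSS-Platform/2.0",
        ],
    }
)

# Same per-row labels as the list comprehension, wrapped in a Series that
# shares cframe's index so groupby can align it row by row.
operating_system = pd.Series(
    ["Windows" if "Windows" in a else "Not Windows" for a in cframe["a"]],
    index=cframe.index,
    name="os",
)

by_tz_os = cframe.groupby(["tz", operating_system])
# Rows are time zones, columns are OS labels; missing pairs become 0.
agg_counts = by_tz_os.size().unstack().fillna(0)
print(agg_counts)
```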
The problem looks to be with the pandas module and appears to be
Linux-specific.
Any ideas? I'm pulling my hair out over this.