[Tutor] Reading CSV files in Pandas

Danny Yoo dyoo at hashcollision.org
Mon Oct 21 19:52:38 CEST 2013


On Sat, Oct 19, 2013 at 7:29 AM, Manish Tripathi <tr.manish at gmail.com>
wrote:
>
> I am trying to import a CSV file in Pandas but it throws an error. The
> format of the data when opened in Notepad++ is as follows, with the first
> row being the column names:
>
> "End Customer Organization ID,End Customer Organization Name,End Customer
Top Parent Organization ID,End Customer Top Parent Organization
Name,Reseller Top Parent ID,Reseller Top Parent Name,Business,Rev Sum
Division,Rev Sum Category,Product Family,Version,Pricing Level,Summary
Pricing Level,Detail Pricing Level,MS Sales Amount,MS Sales Licenses,Fiscal
Year,Sales Date"
> "11027676,Baroda Western Uttar Pradesh Gramin
Bankgfhgfnjgfnmjmhgmghmghmghmnghnmghnmhgnmghnghngh,4078446,Bank Of
Barodadfhhgfjyjtkyukujkyujkuhykluiluilui;iooi';po'fserwefvegwegf,1809012,""Hcl
Infosystems Ltd - Partnerdghftrutyhb
frhywer5y5tyu6ui7iukluyj,lgjmfgnhfrgweffw"",Server &
CALsdgrgrfgtrhytrnhjdgthjtyjkukmhjmghmbhmgfngdfbndfhtgh,SQL Server &
CALdfhtrhtrgbhrghrye5y45y45yu56juhydsgfaefwe,SQL
CALdhdfthtrutrjurhjethfdehrerfgwerweqeadfawrqwerwegtrhyjuytjhyj,SQL
CALdtrye45y3t434tjkabcjkasdhfhasdjkcbaksmjcbfuigkjasbcjkasbkdfhiwh,2005,Openfkvgjesropiguwe90fujklascnioawfy98eyfuiasdbcvjkxsbhg,Open
Lklbjdfoigueroigbjvwioergyuiowerhgosdhvgfoisdhyguiserhguisrh,""Open
Stddfm,vdnoghioerivnsdflierohgushdfovhsiodghuiohdbvgsjdhgouiwerho"",125.85,1,FY07,12/28/2006"
> "12835756,Uttam Strips Pvt Ltd,12835756,Uttam Strips Pvt
Ltd,12565538,Redington C/O Fortis Financial Services Ltd,MBS,Dynamics
ERP,Dynamics NAV,Dynamics NAV Business Essentials,Non-specific,Other,MBS
SA,MBS New Customer Enhanc. Def,0,0,FY09,9/15/2008"
> "12233135,Bhagwan Singh Tondon,12233135,Bhagwan Singh Tondon,2652941,H B
S Systems Pvt Ltd,Server & CAL,SQL Server & CAL,SQL CAL,SQL
CAL,Non-specific,Open,Open L&SA,Deferred Open L&SA - New,0,0,FY09,9/15/2008"
> "11602305,Maya Academy Of Advanced Cinematics,9750934,Maya Entertainment
Ltd,336146,Embee Software Pvt Ltd,Server & CAL,Windows Server & CAL,Windows
Server HPC,Windows Compute Cluster Server,Non-specific,Open,Open V/MYO -
Rec,OLV Perpet L&SA Recur-Def,0,0,FY09,9/25/2008"
> "13336009,Remiel Softech Solution Pvt Ltd,13336009,Remiel Softech
Solution Pvt Ltd,13335482,Redington C/O Remiel Softech Solutions Pvt
Ltd,MBS,Dynamics ERP,Dynamics NAV,Dynamics NAV Business
Essentials,Non-specific,Other,MBS SA,MBS New Customer Enhanc.
Def,0,0,FY09,12/23/2008"
> "7872800,Science Application International Corporation,2839760,GOVERNMENT
OF KARNATAKA,10237455,Cubic Computing P.L,Server & CAL,SQL Server & CAL,SQL
Server Standard,SQL Server Standard Edition,Non-specific,Open,Open
SA/UA,Deferred Open SA - Renewal,0,0,FY09,1/15/2009"
> "13096361,Pratham Software Pvt Ltd,13096361,Pratham Software Pvt
Ltd,10133086,Krap Computer,Information Worker,Office,Office Standard /
Basic,Office Standard,2007,Open,Open L,Open Std,7132.44,28,FY09,9/24/2008"
> "12192276,Texmo Precision Castings,12192276,Texmo Precision
Castings,4059430,Quadra Systems. - Partner,Server & CAL,Windows Server &
CAL,Windows Standard Server,Windows Server Standard,Non-specific,Open,Open
L&SA,Deferred Open L&SA - New,0,0,FY09,11/15/2008"
>
> Kindly note that the same .csv file, when double-clicked, opens in Excel
> as comma-separated values BUT with NO quotation marks around each line,
> unlike what Notepad++ shows.
>
> I have used encoding as UTF-8 which gives the following error:

Questions:

* Where is this data coming from?
* Who or what is generating this file?
* Is it being automatically generated, or is someone manually typing in the
file's content?


Knowing the answers to these questions may help to isolate what the actual
problem is.

A good, responsible source for this input should say up front how its bytes
are to be interpreted.  Otherwise you are put in the position of having to
guess the proper interpretation.  Guessing can be fun sometimes, I suppose,
but I personally don't like doing it unless I have no choice.