*.csv to *.txt after adding columns

Dave Angel davea at davea.name
Wed Sep 18 03:55:52 EDT 2013


On 17/9/2013 22:28, Bryan Britten wrote:

> Dave -
>
> I can't print the output because there are close to 1,000,000 records. It would be extremely inefficient and resource intensive to look at every row.

Not if you had made a sample directory with about three files, each
containing half a dozen lines.
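A sample directory like that is quick to build. Here is a minimal sketch, assuming the real data lives in *.csv files in one directory; make_sample_dir and its parameters are hypothetical names chosen for illustration. It copies the first few lines of the first few CSV files into a separate directory, so the script can be pointed there and every output file printed in full.

```python
import glob
import itertools
import os

def make_sample_dir(src_dir, dst_dir, n_files=3, n_lines=6):
    """Copy the first n_lines of the first n_files *.csv files (hypothetical helper)."""
    if not os.path.isdir(dst_dir):
        os.makedirs(dst_dir)
    copied = []
    for src in sorted(glob.glob(os.path.join(src_dir, '*.csv')))[:n_files]:
        dst = os.path.join(dst_dir, os.path.basename(src))
        with open(src) as fin, open(dst, 'w') as fout:
            # islice stops after n_lines without reading the rest of the file
            fout.writelines(itertools.islice(fin, n_lines))
        copied.append(dst)
    return copied
```

Running the script against the sample directory instead of the real one leaves the script itself unchanged; only the directory it reads from differs.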

> Like I said, when I take just one file and run the code over the first
> few records I get what I'd expect to see. Here's an example (non-redacted code):
>
> INPUT:
>
> import csv
>
> fileHandle = 'C:/Users/Bryan/Data Analysis/Crime Analysis/Data/'

Now, that directory specification ends with a slash, so concatenating
with "+" will work correctly.  Your original did not end with one.

>
> varNames = 'ID\tCaseNum\tDate\tTime\tBlock\tIUCR\tPrimaryType\tDescription\tLocDesc\tArrest\tDomestic\tBeat\tDistrict\tWard\tCommArea\tFBICode\tXCoord\tYCoord\tYear\tUpdatedOn\tLat\tLong\tLoc\n'
>
> outFile = open(fileHandle + 'ChiCrime01_02.txt', 'w')
> inFile = open(fileHandle + 'ChiCrime01_02.csv', 'rb')

Instead of changing this code, you could have switched to a directory
containing only one file.

>
> Like I said, the output is exactly what I want, but it doesn't seem to be writing to the file and I don't know why. I said I didn't know if it was raising an exception because I'm new to Python and I didn't know if there were some methods that included "silent" errors where it would continue the code but produce the wrong results, such as not writing my files. 
>

The only "silent" exception I know of is SystemExit, the one raised by
sys.exit() to terminate the process.
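To see the difference, here is a small sketch; run_and_report, quits and fails are hypothetical names for illustration. A failed open() raises an ordinary exception (IOError in Python 2, FileNotFoundError in Python 3) that prints a traceback if nothing catches it. sys.exit() raises SystemExit, which derives from BaseException rather than Exception, and left uncaught it ends the program without a traceback.

```python
import sys

def run_and_report(func):
    """Call func and describe how it ended (hypothetical helper)."""
    try:
        func()
    except SystemExit as e:
        # sys.exit() lands here; uncaught, it would end the process quietly
        return 'SystemExit(%r)' % (e.code,)
    except Exception as e:
        # ordinary errors land here; uncaught, they print a traceback
        return '%s: %s' % (type(e).__name__, e)
    return 'finished normally'

def quits():
    sys.exit(0)

def fails():
    # hypothetical directory that is assumed not to exist
    open('no_such_dir_for_demo/ChiCrime01_02.txt', 'w')

print(run_and_report(quits))  # SystemExit(0)
print(run_and_report(fails))
```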

> Lastly, why does everyone seem to push for os.path.join versus the method I have used? Is it just a 'standard' that people like to see?
>

Because os.path.join is smart enough to add a slash only when one is
needed.  That can be useful, especially if the directory you're using
as a prefix came from the user.
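A short illustration using the directory from this thread. It calls posixpath.join (the module os.path refers to on Linux and Mac) so the result is the same on every platform; on Windows, os.path.join would insert a backslash instead, which Windows also accepts.

```python
import posixpath  # os.path is this module on Linux/macOS; on Windows it is ntpath

base = 'C:/Users/Bryan/Data Analysis/Crime Analysis/Data'  # no trailing slash
name = 'ChiCrime01_02.csv'

# Plain concatenation depends on the caller remembering the trailing slash.
concatenated = base + name
# join inserts a separator only when the left-hand part lacks one.
joined = posixpath.join(base, name)
joined_slash = posixpath.join(base + '/', name)

print(concatenated)  # C:/Users/Bryan/Data Analysis/Crime Analysis/DataChiCrime01_02.csv
print(joined)        # C:/Users/Bryan/Data Analysis/Crime Analysis/Data/ChiCrime01_02.csv
print(joined_slash)  # identical to joined: no doubled slash
```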


I think Peter's suggestion is probably right on: you don't limit your
input files to *.csv, so the second time through you will also be
processing the *.txt files the first run produced.
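One way to limit the inputs, as a sketch: your loop isn't shown in this thread, so csv_jobs is a hypothetical helper, and deriving each output name by swapping the extension for .txt is an assumption. Matching only *.csv means .txt files from an earlier run are never picked up as inputs.

```python
import glob
import os

def csv_jobs(data_dir):
    """Pair each *.csv in data_dir with a .txt output path (hypothetical helper).

    Only *.csv files are matched, so .txt files written by an earlier
    run are never treated as inputs.
    """
    jobs = []
    for in_path in sorted(glob.glob(os.path.join(data_dir, '*.csv'))):
        out_path = os.path.splitext(in_path)[0] + '.txt'
        jobs.append((in_path, out_path))
    return jobs
```

Each (in_path, out_path) pair can then be opened the same way as in your single-file example, with 'rb' for the CSV and 'w' for the output.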

Another useful debugging aid would have been to add print statements
right after opening the files, perhaps something like:

print "Reading:", inFile.name
print "Writing:", outFile.name

If those names had been missing slashes, I would have been vindicated,
while if they were the same, you'd know Peter had nailed it.
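Here is why identical names would explain empty output, as a sketch under an assumption: that the script derives each output name by swapping the input's extension for .txt, so a .txt input maps to itself. Opening a file with mode 'w' truncates it immediately, so by the time the input is read there is nothing left. reopen_same_path_demo is a hypothetical name, and the two "rows" are placeholder text, not real data.

```python
import os
import shutil
import tempfile

def reopen_same_path_demo():
    """Return what is left to read when input and output are the same file."""
    tmpdir = tempfile.mkdtemp()
    try:
        path = os.path.join(tmpdir, 'ChiCrime01_02.txt')
        # Stand-in for a .txt file written by an earlier run.
        with open(path, 'w') as f:
            f.write('row 1\nrow 2\n')

        # Second run: the .txt is treated as an input, and swapping its
        # extension for '.txt' yields the very same path for the output.
        out_path = os.path.splitext(path)[0] + '.txt'
        outFile = open(out_path, 'w')  # mode 'w' empties the file immediately
        inFile = open(path, 'r')
        print('Reading: ' + inFile.name)
        print('Writing: ' + outFile.name)
        rows = inFile.readlines()      # the data is already gone
        inFile.close()
        outFile.close()
        return rows
    finally:
        shutil.rmtree(tmpdir)

print(reopen_same_path_demo())  # []
```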



-- 
DaveA





More information about the Python-list mailing list