*.csv to *.txt after adding columns
Dave Angel
davea at davea.name
Wed Sep 18 03:55:52 EDT 2013
On 17/9/2013 22:28, Bryan Britten wrote:
> Dave -
>
> I can't print the output because there are close to 1,000,000 records. It would be extremely inefficient and resource intensive to look at every row.
Not if you made a sample directory with about 3 files, each containing
half a dozen lines.
> Like I said, when I take just one file and run the code over the first
> few records I get what I'd expect to see. Here's an example (non-redacted code):
>
> INPUT:
>
> import csv
>
> fileHandle = 'C:/Users/Bryan/Data Analysis/Crime Analysis/Data/'
Now, that directory specification ends with a slash, so joining it to a
file name with "+" will work correctly. The one in your original code
did not end with a slash.
>
> varNames = 'ID\tCaseNum\tDate\tTime\tBlock\tIUCR\tPrimaryType\tDescription\tLocDesc\tArrest\tDomestic\tBeat\tDistrict\tWard\tCommArea\tFBICode\tXCoord\tYCoord\tYear\tUpdatedOn\tLat\tLong\tLoc\n'
>
> outFile = open(fileHandle + 'ChiCrime01_02.txt', 'w')
> inFile = open(fileHandle + 'ChiCrime01_02.csv', 'rb')
Instead of changing this code, you could have switched to a directory
containing only one file.
>
> Like I said, the output is exactly what I want, but it doesn't seem to be writing to the file and I don't know why. I said I didn't know if it was raising an exception because I'm new to Python and I didn't know if there were some methods that included "silent" errors where it would continue the code but produce the wrong results, such as not writing my files.
>
The only "silent" exception I know of is the one triggered by
sys.exit(), used to terminate the process.
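To make that concrete, here is a minimal sketch of how sys.exit() behaves: it raises SystemExit, which a surrounding try/except can catch like any other exception. The helper run_and_report and the two sample functions are hypothetical names made up for this illustration, not anything from your script.

```python
import sys

def run_and_report(func):
    """Call func and return a short string describing how it ended."""
    try:
        func()
    except SystemExit as exc:
        # sys.exit() works by raising SystemExit; exc.code is its argument.
        return "exited with code %r" % (exc.code,)
    except Exception as exc:
        # Any ordinary error lands here instead of passing silently.
        return "failed: %s" % exc
    return "finished normally"

def quits_early():
    # Hypothetical function that terminates the process early.
    sys.exit(3)

def divides_by_zero():
    # Hypothetical function that raises an ordinary exception.
    return 1 / 0

status_exit = run_and_report(quits_early)
status_error = run_and_report(divides_by_zero)
print(status_exit)
print(status_error)
```

If nothing catches an ordinary exception, Python prints a traceback, so a script that ends without one and without writing its files most likely ran the wrong code path rather than hiding an error.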
> Lastly, why does everyone seem to push for os.path.join versus the method I have used? Is it just a 'standard' that people like to see?
>
Because os.path.join will be smart enough to add the slashes only if
they are necessary. That can be useful, especially if the directory
you're using as a prefix came from the user.
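A small sketch of the difference, assuming a relative directory named 'data' chosen only for illustration (the file name is borrowed from your example):

```python
import os.path

filename = 'ChiCrime01_02.csv'

# os.path.join inserts a separator only when the prefix lacks one,
# so both spellings of the directory give an equivalent path.
with_slash = os.path.join('data/', filename)
without_slash = os.path.join('data', filename)

# Plain concatenation silently glues the two names together
# when the trailing slash is missing.
concatenated = 'data' + filename

print(with_slash)
print(without_slash)
print(concatenated)
```

The concatenated version does not raise an error; open() in 'w' mode would simply create a file with an unexpected name in the current directory, which looks a lot like "nothing was written".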
I think Peter's suggestion is probably right on; you don't limit your
infiles to *.csv, so you will be processing *.txt files the second time.
Another useful debugging aid would have been to add print statements
right after opening the files, perhaps something like:
    print "Reading:", inFile.name
    print "Writing:", outFile.name
If those names had been missing slashes, I would have been vindicated,
while if they were the same, you'd know Peter had nailed it.
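Here is one way to try both ideas together, a tiny sample directory plus an input list limited to *.csv. This is only a sketch: the function name csv_inputs and the file names a.csv, b.csv and a.txt are made up for the test, and the directory is a throwaway one created by tempfile.

```python
import glob
import os
import shutil
import tempfile

def csv_inputs(directory):
    """Return the sorted paths of the *.csv files in directory."""
    return sorted(glob.glob(os.path.join(directory, '*.csv')))

# Build a throwaway directory holding two .csv files and one .txt file,
# the .txt standing in for output left behind by an earlier run.
sample_dir = tempfile.mkdtemp()
try:
    for name in ('a.csv', 'b.csv', 'a.txt'):
        with open(os.path.join(sample_dir, name), 'w') as handle:
            handle.write('1,2,3\n')
    # Only the .csv files should be picked up as inputs.
    found = [os.path.basename(path) for path in csv_inputs(sample_dir)]
finally:
    # Remove the sample directory whether or not the check succeeded.
    shutil.rmtree(sample_dir)

print(found)
```

With the pattern restricted this way, a second run over the same directory no longer treats the previously written *.txt files as input.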
--
DaveA