Extracting data from dump file

John Machin sjmachin at lexicon.net
Fri Nov 23 16:28:59 EST 2007


On Nov 24, 5:46 am, TYR <a.harrow... at gmail.com> wrote:
> I have a large dump file that originated in a MySQL db; I need to get
> it into an SQLite file.
>
> Various options are suggested around the web; none of them seem to
> work (most failing to import the thing in the first place). So I
> removed the assorted taggery from each end, leaving just a big text
> file taking the following format:
>
> ('value', 'value', 'value, 'value'),
> ('value','value','value','value')...

Consider the possibility that individual data rows are separated by
"),\n(" or maybe "), \n(" or something else, not "),(".

What does
    print repr(open("my_dump.txt", "rb").read(1000))
tell you? Are there any "\n" in there at all? BTW, what platform are
you running on?
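For example (a sketch with a made-up sample file standing in for the
real dump, whose contents we haven't seen):

```python
# Write a tiny stand-in "dump" so the inspection step is runnable;
# the real file's contents are unknown, so this is only illustrative.
sample = "('a', 'b', 'c'),\n('d', 'e', 'f'),\n"
with open("sample_dump.txt", "w") as f:
    f.write(sample)

# repr() of the raw bytes makes newlines and the exact row
# separator visible instead of being rendered invisibly.
with open("sample_dump.txt", "rb") as f:
    head = f.read(1000)
print(repr(head))  # the \n between rows now shows up literally
```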

>
> I planned to find some way of splitting the thing at the commas
> outside the bracketed groups, thus giving me a list of tuples; then I
> could of course CREATE TABLE foo VALUES  '1', '2, '3', '4' and then

CREATE TABLE foo VALUES '1', .... are you sure?
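(For the record: CREATE TABLE declares the columns once, and INSERT
... VALUES supplies each row. A minimal sqlite3 sketch of that split;
the table and column names are made up for illustration:)

```python
import sqlite3

# CREATE TABLE defines the shape; INSERT ... VALUES adds rows to it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (a TEXT, b TEXT, c TEXT, d TEXT)")
conn.execute("INSERT INTO foo VALUES (?, ?, ?, ?)", ("1", "2", "3", "4"))
print(conn.execute("SELECT * FROM foo").fetchone())
```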


> iterate through the list INSERTing INTO.
>
> Then my problems began; I tried using the python csv method, replacing
> the string ),( with \t and then using \t as the delimiter. First
> problem; there's a size limit coded into the module. No problem, use
> csv.field_size_limit() to alter it.

>>> import csv
>>> csv.field_size_limit()
131072
>>>

And do you really expect that any one of your data rows will take more
than 128KB in the dump file?? How large did you have to set the limit
to make it "work"???

> Problem; it doesn't actually parse
> at all,

Sounds like it did parse, with the result being one csv row containing
one very long field.
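A quick demonstration of that failure mode (sketch; io.StringIO
stands in for your file object):

```python
import csv
import io

# If the chosen delimiter never occurs in the data, csv does not
# complain; it simply returns the whole line as a single field.
line = "('a', 'b'),('c', 'd')"
row = next(csv.reader(io.StringIO(line), delimiter="\t"))
print(len(row))  # 1 -- one very long field, not four values
```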

> just sends the whole thing as a string and the SQL INSERT
> fails with a "not enough args" error.
>
> Tried using string.split() and re.split(data, r'\t'); first gave the
> same error, second failed with a "too many named groups" error.

All of these are indications that you don't have the correct
delimiter. Note also that re.split takes the pattern FIRST:
re.split(data, r'\t') uses your data as the regular expression, so
all the parenthesised groups in it are what trigger the "too many
named groups" error. You want re.split(r'\t', data).

> Tried
> using ; as a delimiter and going back to csv; this fails to match
> the ; for some reason. Any ideas?

Examine your dump carefully. Are there any newlines other than at the
logical end of data rows (like e.g. it's folded at column 78)? If so,
you'll have to replace those by spaces or delete them. Work out
exactly what is separating data rows. Let's call it "sep". If you can
use file_contents.replace(sep, "\t"), then you can use
file_contents.split(sep) directly; there's no need for a two step
approach. If sep is variable, then use re.split.
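Something along these lines (a sketch; the sample string and the
exact pattern are assumptions about what your dump looks like):

```python
import re

# Split on "),(" while tolerating any whitespace (including
# newlines) between the closing and opening parentheses.
body = "('a', 'b'),\n('c', 'd'), ('e', 'f')"
rows = re.split(r"\),\s*\(", body)

# The first row keeps a stray "(" and the last a stray ")"; trim them.
rows[0] = rows[0].lstrip("(")
rows[-1] = rows[-1].rstrip(")")
print(rows)  # ["'a', 'b'", "'c', 'd'", "'e', 'f'"]
```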

Try this on a small subset of your dump file. Before you go anywhere
near the database, check the results of your split: did you get the
expected number of rows [currently you are getting 1 row], does each
row look OK? Unsubtle hint: the first row may start with "junk(" and
the last may end with ")junk" if you haven't already allowed for that.

Then try inserting into the database. Print a row number and row
contents before each insert (or wrap try/except around each insert) so
that you know which row any error relates to.
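In sqlite3 terms, something like this (sketch; assumes each row has
already been split into a 4-tuple of strings, and uses made-up data):

```python
import sqlite3

# Made-up rows standing in for the parsed dump data.
rows = [("1", "2", "3", "4"), ("5", "6", "7", "8")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (a TEXT, b TEXT, c TEXT, d TEXT)")
for i, row in enumerate(rows, 1):
    try:
        conn.execute("INSERT INTO foo VALUES (?, ?, ?, ?)", row)
    except sqlite3.Error as e:
        # Report the row number and contents so a bad row is easy to find.
        print("row %d failed: %r (%s)" % (i, row, e))
conn.commit()
```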

If you still have a problem, ask again, but do show exactly what your
dump file looks like and what your code is -- my crystal ball needs a
long recharge after all of the above!

HTH,
John
