[Tutor] Optimal solution in dealing with huge databases inpython
Eric Walstad
eric at ericwalstad.com
Thu Jan 25 23:57:29 CET 2007
Alan Gauld wrote:
> "Shadab Sayani" <shadabsayani at yahoo.com> wrote
>
>> Thank you very much for immediate response.I didnt get
>> the point of loading the data using SQL.What does that
>> mean?
>
> It means writing a SQL file that can then be run from
> the database SQL prompt. I don't know what that means
> in PostGres terms
psql <dbname> -f <loadfile>
> Loadfile.sql would in turn contain lots of SQL commands like:
>
> INSERT into CUSTOMER
> VALUES ( 'fred', 'bloggs', 45, '01773-987321');
I think PostgreSQl will wrap each of these INSERT statements into a
transaction, which will add a lot of overhead and slow the process. I
find the following pattern to result in very fast loads with the psql
command line tool:
"""
ALTER TABLE <table_name> DROP CONSTRAINT <constraint_name>;
DROP INDEX <index_name>
COPY <table_name> (<list of field names>)
FROM stdin;
<rows of tab delimited field data>
\.
CREATE INDEX <index_name> ON <table_name> (<field/fields>);
ALTER TABLE <table_name> ADD CONSTRAINT <constraint_name> FOREIGN KEY
(<field_name>) REFERENCES <other_table>(<other_field>);
"""
You can wrap parts/all of the above in BEGIN;/COMMIT; if you want them
done in a transaction.
> One thing to note is that due to caching issues you might find
> it works better if you keep the individual loader files fairly small
> - say 1000 or so records each. On other databases (DB2 for
> example) very big data files seem to be faster, it just depends on
> how the internal SQL engine works.
My SQL files are about 350MB, fwiw. I haven't tried breaking them down
to smaller files because I haven't read in the PostgreSQL docs, or
forums, that doing so would be helpful.
>> Do have any idea about the C api for Postgresql and
>> some documentation to use it?
I've not used the C interface directly. I think 'psql' will do what you
want:
man psql
or, just browse the docs online for your db version (psql --version)
<http://www.postgresql.org/docs/>
<http://www.postgresql.org/docs/8.1/interactive/sql-copy.html>
Best regards,
Eric.
More information about the Tutor
mailing list