[Tutor] Optimal solution in dealing with huge databases inpython

Eric Walstad eric at ericwalstad.com
Thu Jan 25 23:57:29 CET 2007


Alan Gauld wrote:
> "Shadab Sayani" <shadabsayani at yahoo.com> wrote 
> 
>> Thank you very much for immediate response.I didnt get
>> the point of loading the data using SQL.What does that
>> mean?
> 
> It means writing a SQL file that can then be run from 
> the database SQL prompt. I don't know what that means 
> in PostGres terms

psql <dbname> -f <loadfile>


> Loadfile.sql would in turn contain lots of SQL commands like:
> 
> INSERT into CUSTOMER 
> VALUES ( 'fred', 'bloggs', 45, '01773-987321');
I think PostgreSQl will wrap each of these INSERT statements into a
transaction, which will add a lot of overhead and slow the process.  I
find the following pattern to result in very fast loads with the psql
command line tool:
"""
ALTER TABLE  <table_name> DROP CONSTRAINT <constraint_name>;
DROP INDEX <index_name>

COPY <table_name> (<list of field names>)
FROM stdin;
<rows of tab delimited field data>
\.

CREATE INDEX <index_name> ON <table_name> (<field/fields>);
ALTER TABLE <table_name> ADD CONSTRAINT <constraint_name> FOREIGN KEY
(<field_name>) REFERENCES <other_table>(<other_field>);
"""
You can wrap parts/all of the above in BEGIN;/COMMIT; if you want them
done in a transaction.


> One thing to note is that due to caching issues you might find 
> it works better if you keep the individual loader files fairly small 
> - say 1000 or so records each. On other databases (DB2 for 
> example) very big data files seem to be faster, it just depends on 
> how the internal SQL engine works.
My SQL files are about 350MB, fwiw.  I haven't tried breaking them down
to smaller files because I haven't read in the PostgreSQL docs, or
forums, that doing so would be helpful.


>> Do have any idea about the C api for Postgresql and
>> some documentation to use it?
I've not used the C interface directly.  I think 'psql' will do what you
want:
man psql
or, just browse the docs online for your db version (psql --version)
<http://www.postgresql.org/docs/>
<http://www.postgresql.org/docs/8.1/interactive/sql-copy.html>


Best regards,

Eric.


More information about the Tutor mailing list