[Ncr-Python.in] [ILUGD] Which technology is better for large database calculations

Vinay Dahiya vinay.not.nice at gmail.com
Mon May 12 21:22:30 CEST 2014


Hey Raakesh,
You can use the read-only iterator from openpyxl for the Excel reading
problem; it works really nicely.
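For reference, a minimal sketch of that (the file name and the helper
function are made up; read_only mode streams rows lazily instead of
loading the whole workbook into memory):

```python
# Sketch: stream rows from a large .xlsx file one at a time.
# Assumes openpyxl is installed; "data.xlsx" is a placeholder path.
from openpyxl import load_workbook

def iter_excel_rows(path):
    """Yield each row of the first worksheet as a tuple of cell values."""
    wb = load_workbook(path, read_only=True)  # lazy, low-memory mode
    ws = wb.active
    for row in ws.iter_rows(values_only=True):
        yield row
    wb.close()
```

With a million rows this keeps memory flat, since only one row is held
at a time.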
If you have to use a relational database, then MySQL and Postgres are
good options because they can handle a million rows easily. We have our
website running on a MySQL database that has around 10 million rows in
just one table.
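For loading that many rows quickly, batch the inserts inside a single
transaction. A minimal sketch of the pattern (shown with the stdlib
sqlite3 module so it is self-contained; the table name, columns, and
batch size are made up, but the same DB-API executemany() call works
with the MySQL and Postgres drivers):

```python
# Batched-insert sketch using the DB-API executemany() pattern.
import sqlite3

def bulk_insert(conn, rows, batch_size=10_000):
    """Insert rows in batches, committing once at the end for speed."""
    cur = conn.cursor()
    for i in range(0, len(rows), batch_size):
        cur.executemany(
            "INSERT INTO measurements (name, value) VALUES (?, ?)",
            rows[i:i + batch_size],
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (name TEXT, value REAL)")
bulk_insert(conn, [("m%d" % i, float(i)) for i in range(1000)])
print(conn.execute("SELECT COUNT(*) FROM measurements").fetchone()[0])
# prints 1000
```

Committing per batch instead of per row is usually the single biggest
speedup for bulk loads.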
But if you can go for NoSQL databases, I would like to propose some
crazier ideas like MongoDB and Elasticsearch; both have plenty of APIs
for the trivial relational-database operations, and both are quite
scalable options too.

On Tue, May 13, 2014 at 12:31 AM, Saurabh Kumar <thes.kumar at gmail.com> wrote:
> You should probably consider pandas and its HDF5 store [1]. I am also
> sharing some links that you might find helpful [2][3].
>
> [1]: http://pandas.pydata.org/pandas-docs/dev/io.html#hdf5-pytables
> [2]: http://www.chrisstucchio.com/blog/2013/hadoop_hatred.html
> [3]: http://stackoverflow.com/a/14268804/782901
>
> Disclaimer: I'm not a pandas expert.
>
> Cheers,
>
> Saurabh Kumar
> http://keybase.io/theskumar
>
>
>
> On Mon, May 12, 2014 at 11:47 PM, Raakesh kumar <kumar3180 at gmail.com> wrote:
>>
>> Hi All,
>> I am seeking help in terms of DB technology for a specific requirement. I
>> have done some research but cannot conclude which one is best.
>> So the requirement is:
>> I have an application and 3-4 Excel documents in a specified format
>> (expecting around 1 million rows). I have to upload them, save the data
>> into some tables, and run some calculations against the database values.
>> After that it will present some numbers/charts based on the calculations.
>>
>> So my first question is: how should I select a database technology that
>> will hold up in this scenario? I found that MySQL can perform read
>> operations well in this scenario, but can someone help me understand it
>> properly? I also read that Postgres is a better choice in terms of
>> reliability and structure, although I am not sure.
>>
>> The second question is: how much time will it take to upload 1 million
>> rows of data from Excel, considering there will be 6 columns? And how can
>> I optimize read and write operations for such a large dataset?
>>
>> Third, what technologies at the application (programming) level can I
>> use to achieve better performance? One solution I can think of is
>> Hadoop, but I need guidance on this too.
>>
>> Thanks
>>
>>
>> --
>> Regards
>> RAKESH KUMAR
>> http://www.raakeshkumar.in
>>
>>
>> _______________________________________________
>> https://mail.python.org/mailman/listinfo/ncr-python.in
>> Mailing list guidelines
>> :http://python.org.in/wiki/NcrPython/MailingListGuidelines
>
>
>



-- 
pygoku
(Vinay Dahiya)
http://vinaydahiya.com

