[AstroPy] Astropy and large VOTable files
Michael Droettboom
mdroe at stsci.edu
Mon May 18 16:29:57 EDT 2015
I understood Jennifer's question to be specific to the VOTable XML file
format and the problems are really specific to parsing XML. In
iterator/streaming interface could probably be built on top of it,
however. Arbitrary (random access) slicing isn't really possible with
XML, though.
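
To make the streaming idea concrete, here is a minimal sketch using only the Python standard library. It is not part of astropy.io.votable: iter_row_chunks and select_rows are hypothetical names, and the sketch assumes the table uses the plain TABLEDATA serialization (TR/TD elements), not BINARY. Cells are left as raw strings; converting them according to the FIELD datatypes is omitted.

```python
import io
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def iter_row_chunks(source, chunk_size=1000):
    """Yield lists of rows (each a list of cell strings) from TABLEDATA.

    Hypothetical helper, not an astropy function.
    """
    chunk = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if _local(elem.tag) == "TR":
            chunk.append([td.text for td in elem if _local(td.tag) == "TD"])
            # Free the cells once copied; the emptied TR element itself
            # stays attached to its parent, so memory is reduced, not zero.
            elem.clear()
            if len(chunk) >= chunk_size:
                yield chunk
                chunk = []
    if chunk:
        yield chunk


def select_rows(source, keep, chunk_size=1000):
    """Return only the rows for which keep(row) is true, reading in chunks."""
    selected = []
    for chunk in iter_row_chunks(source, chunk_size):
        selected.extend(row for row in chunk if keep(row))
    return selected


# Tiny made-up table, for illustration only.
SAMPLE = b"""<?xml version="1.0"?>
<VOTABLE version="1.3" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
 <RESOURCE><TABLE>
  <FIELD name="id" datatype="int"/>
  <FIELD name="mag" datatype="double"/>
  <DATA><TABLEDATA>
   <TR><TD>1</TD><TD>17.2</TD></TR>
   <TR><TD>2</TD><TD>21.5</TD></TR>
   <TR><TD>3</TD><TD>19.9</TD></TR>
  </TABLEDATA></DATA>
 </TABLE></RESOURCE>
</VOTABLE>"""

# Keep rows whose second column (mag) is brighter than 20.
bright = select_rows(io.BytesIO(SAMPLE), keep=lambda row: float(row[1]) < 20.0,
                     chunk_size=2)
print(bright)  # [['1', '17.2'], ['3', '19.9']]
```

Every row still has to be parsed in order, which is the point about XML above: the cut can be applied while streaming, but there is no way to jump straight to row N.
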
Mike
On 05/18/2015 02:09 PM, Andrew Hearin wrote:
> Being able to read large data in chunks, make cuts on the chunks, and
> return a table of rows that pass the cuts is a pretty common data
> mining task that I think would be good to include in Astropy. I’m
> happy to (re-)raise a GitHub issue for this purpose, and contribute
> some code, but first: Jennifer, this is the functionality you are
> describing, right? If so: Mike, do you see any fundamental obstacles
> with this?
>
>
>
> On May 18, 2015, at 2:00 PM, Michael Droettboom <mdroe at stsci.edu> wrote:
>
>> Thanks for the question.
>>
>> Unfortunately, it will read the entire file into memory each time.
>> It does read it in as a NumPy array, however, so the memory used
>> should generally be less than the space on disk, depending on the
>> content.
>>
>> XML doesn't really support the kind of slicing that FITS (or another
>> binary format) can, because you can't know how big something is (or
>> even what it is!) without parsing the whole file. That said, given
>> the constraint of the file format, minimal memory usage is one of the
>> main design features of astropy.io.votable, so I'd recommend trying
>> it on large files and seeing how it goes. It shouldn't ever take
>> significantly more memory than a binary array of data, i.e. the same
>> as the equivalent FITS file loaded entirely into memory.
>>
>> Cheers,
>> Mike
>>
>> On 05/17/2015 10:11 AM, Jennifer Baldwin wrote:
>>> Hi all,
>>>
>>> I was trying to find an answer to this but could not. I am wondering
>>> if parse_single_table will attempt to read an entire VOTable file?
>>> Or if it will operate the same way as for FITS files so that when
>>> you slice the returned data array, it only loads the part it needs
>>> into memory? I'm concerned with how it will perform with extremely
>>> large xml files, but could not find a direct answer anywhere in the
>>> documentation.
>>>
>>> Thanks!
>>>
>>>
>>> _______________________________________________
>>> AstroPy mailing list
>>> AstroPy at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/astropy