[Expat-bugs] [ expat-Bugs-1284386 ] Byte count in large XML files fails

SourceForge.net noreply at sourceforge.net
Sun Nov 27 21:22:31 CET 2005


Bugs item #1284386, was opened at 2005-09-08 01:01
Message generated for change (Comment added) made by pointsman
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=1284386&group_id=10127

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Rolf Ade (pointsman)
Assigned to: Karl Waclawek (kwaclaw)
Summary: Byte count in large XML files fails

Initial Comment:

XML_GetCurrentByteIndex(XML_Parser parser) returns a
long, which on most 32-bit systems is only 32 bits
wide. That means that for XML input larger than 2 GB,
XML_GetCurrentByteIndex() does not return the right
number.

Sure, such big XML files will be parsed in chunks, so
it is possible to keep track of the number of overflows
yourself, but come on.

Introducing long long into a source base as portable as
expat is surely a limbo dance of its own, but that would
do it.

If you switch to long long where available, please also
consider XML_GetCurrentLineNumber() and
XML_GetCurrentColumnNumber(). They return an int, which
on most 32-bit systems tops out at about 2 billion. I
haven't run into these two limits in real life, though,
as I in fact did with XML_GetCurrentByteIndex().

----------------------------------------------------------------------

>Comment By: Rolf Ade (pointsman)
Date: 2005-11-27 20:22

Message:
Logged In: YES 
user_id=13222

Karl,

Most reasonable 32-bit platforms support file sizes over
2 GB these days. It was in fact on a 32-bit platform that
I stumbled over the problem. So much for your easy
question.

Much harder is how to solve this in a portable way. I'm
afraid that may need platform-dependent #defines (with a
fallback to long).

I'll go digging into what other portable software does in
this case and will come back with a more concrete proposal.

rolf




----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2005-11-27 19:22

Message:
Logged In: YES 
user_id=290026

Rolf,

should the type be a 64-bit integer on all platforms,
or 32-bit on 32-bit platforms and 64-bit on 64-bit platforms?
I think we are talking about m_parseEndByteIndex,
POSITION.lineNumber and POSITION.columnNumber.

Options could be size_t or ptrdiff_t.
MS VC++ 6.0 does not know about long long, but it knows
about __int64. Is there an ANSI definition for 64-bit ints?

What do you suggest that works on all platforms?

Karl





----------------------------------------------------------------------


