[Expat-bugs] [ expat-Bugs-2020141 ] Characters lost during parsing

SourceForge.net noreply at sourceforge.net
Sun Aug 3 16:47:45 CEST 2008


Bugs item #2020141, was opened at 2008-07-17 03:22
Message generated for change (Comment added) made by hartwork
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=2020141&group_id=10127

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Pending
Resolution: None
Priority: 5
Private: No
Submitted By: Andrew D. Arenson (arenson9)
>Assigned to: Sebastian Pipping (hartwork)
Summary: Characters lost during parsing

Initial Comment:
Characters can be lost during parsing.

My example file is too big to attach through this interface, so I've made it available at:

http://miniscd.uits.iupui.edu/aarenson/example6.xml

The Perl program I used to demonstrate the error is attached to this submission, and here it is as well:

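#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

my $XmlFile = shift @ARGV;

# State shared between the handlers.
my $curTag = "";   # lower-cased name of the most recently opened element
my $ID     = "";   # last character data seen inside <globalid>

my $xp = XML::Parser->new(Handlers => {Start => \&start,
                                       End   => \&end,
                                       Char  => \&cdata});
$xp->parsefile($XmlFile);

# Start handler: receives (parser, element name, attributes...).
sub start { $curTag = lc($_[1]); }

# End handler: forget the current tag once an element closes.
sub end { $curTag = ""; }

# Char handler: report every call whose data is exactly ".5".
sub cdata {
    my ($xp,$data) = @_;

    if ($curTag eq "globalid") { $ID = $data; }

    if ($data eq ".5") {
        print "ID: $ID; TAG: $curTag\n";
    }
}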


When I use the above program on the example XML file, the last value in the XML file, '52.5', gets parsed as just '.5'.


I wonder if this is related to something that was reported twelve months ago on the XML::Parser bug list at:

   http://rt.cpan.org/Public/Bug/Display.html?id=28585

That bug report on the XML::Parser bug list has not been opened. It is still listed as New.

I'm sorry I don't know the version number of Expat that I'm using or how to determine it.


----------------------------------------------------------------------

>Comment By: Sebastian Pipping (hartwork)
Date: 2008-08-03 16:47

Message:
Logged In: YES 
user_id=1022691
Originator: NO

Hello Andrew. Are you aware that a character data handler can receive
the content of a single element split across several handler calls?
For instance, your "52.5" text could arrive as two calls, delivering
"52" first and then ".5".

I modified the script you attached to help trace this. Please let us
know if this is happening on your machine. (On mine, "52.5" is delivered
as a single unit from the XML file you provided.)
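
To illustrate the split delivery described above, here is a minimal sketch
of one way a script can cope with it: append the character data to a buffer
in the Char handler and only look at the complete text in the End handler.
This is not the attached show_err_2.pl. The inline sample XML, the $buffer
variable and the %text_of hash are made up for the example, and the
structure of example6.xml is assumed, not known.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical names: $buffer collects the text of the element currently
# being read, %text_of maps a lower-cased element name to its full text.
my $buffer = '';
my %text_of;

my $parser = XML::Parser->new(Handlers => {
    # A new element starts: begin with an empty buffer.
    Start => sub { $buffer = ''; },

    # Char may be called several times per element, so append, never assign.
    Char  => sub { my ($expat, $data) = @_; $buffer .= $data; },

    # The element is closed: the buffer now holds all of its text.
    End   => sub {
        my ($expat, $tag) = @_;
        $text_of{lc $tag} = $buffer;
        $buffer = '';
    },
});

# Made-up sample input standing in for the real file.
$parser->parse('<record><GlobalID>abc</GlobalID><value>52.5</value></record>');

print "ID: $text_of{globalid}; value: $text_of{value}\n";
```

With this approach it does not matter whether "52.5" arrives in one call or
as "52" followed by ".5"; the End handler sees the joined text either way.
The sketch only handles elements that contain text and no child elements
with text of their own.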

Best regards, Sebastian
File Added: show_err_2.pl

----------------------------------------------------------------------


