xml.minidom is stripping out my CRLF's in attrib values!!

Ahmad Baitalmal ahmad at NOSPAMbitbuilder.com
Mon Sep 9 16:11:46 EDT 2002


That is the solution I went with.
Just replaced all "\n" with "&#010;" and all "\r" with "&#013;".
Works fine now, but, I think I will try the xml:space solution also.
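
For anyone finding this later, here is a minimal sketch of that replacement. It is not the exact code I used: escape_attr is a hypothetical helper name, and the tab mapping is an extra I added because tabs get normalized the same way. It assumes the attribute value is written between double quotes.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape


def escape_attr(value):
    """Escape a string for use inside a double-quoted XML attribute.

    escape() handles &, < and > first; the extra mapping then turns the
    quote and the whitespace characters into references so that the
    parser does not normalize them to spaces.
    """
    return escape(value, {
        '"': "&quot;",
        "\n": "&#10;",
        "\r": "&#13;",
        "\t": "&#9;",  # assumption: tabs should survive as well
    })


original = "spotted\r\nwith black and white"
xml_text = '<cow><hide value="%s"></hide></cow>' % escape_attr(original)

# Parse the document back and read the attribute value.
doc = minidom.parseString(xml_text)
round_tripped = doc.getElementsByTagName("hide")[0].getAttribute("value")
```

With the references in place, round_tripped should compare equal to original, CR and LF included.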

Thanks all!

Duncan Booth wrote:
> Ahmad Baitalmal <ahmad at NOSPAMbitbuilder.com> wrote in
> news:3D7C8C76.8060700 at NOSPAMbitbuilder.com: 
> 
> 
>>That's not what my problem is; that one I know about (CRLFs between
>>nodes).
>>
>>Here is the deal:
>><cow>
>>    <hide value="spotted
>>with black and white"></hide>
>></cow>
>>
>>After the word "spotted" there is a CRLF, -inside- the attribute
>>value. Shouldn't it be treated as part of the value?
>>
>>The value now comes stripped of that crlf.
> 
> 
> The XML specification, section 3.3.3, specifies that attribute values must
> be normalised. The normalisation will convert your newline to a space and,
> for attributes not declared as CDATA, also collapse runs of spaces:
> 
> 
>>3.3.3 Attribute-Value Normalization
>>Before the value of an attribute is passed to the application or
>>checked for validity, the XML processor must normalize the attribute
>>value by applying the algorithm below, or by using some other method
>>such that the value passed to the application is the same as that
>>produced by the algorithm. 
>>
>>All line breaks must have been normalized on input to #xA as described
>>in 2.11 End-of-Line Handling, so the rest of this algorithm operates
>>on text normalized in this way. 
>>
>>Begin with a normalized value consisting of the empty string.
>>
>>For each character, entity reference, or character reference in the
>>unnormalized attribute value, beginning with the first and continuing
>>to the last, do the following: 
>>
>>For a character reference, append the referenced character to the
>>normalized value. 
>>
>>For an entity reference, recursively apply step 3 of this algorithm to
>>the replacement text of the entity. 
>>
>>For a white space character (#x20, #xD, #xA, #x9), append a space
>>character (#x20) to the normalized value. 
>>
>>For another character, append the character to the normalized value.
>>
>>If the attribute type is not CDATA, then the XML processor must
>>further process the normalized attribute value by discarding any
>>leading and trailing space (#x20) characters, and by replacing
>>sequences of space (#x20) characters by a single space (#x20)
>>character. 
>>
>>Note that if the unnormalized attribute value contains a character
>>reference to a white space character other than space (#x20), the
>>normalized value contains the referenced character itself (#xD, #xA or
>>#x9). This contrasts with the case where the unnormalized value
>>contains a white space character (not a reference), which is replaced
>>with a space character (#x20) in the normalized value and also
>>contrasts with the case where the unnormalized value contains an
>>entity reference whose replacement text contains a white space
>>character; being recursively processed, the white space character is
>>replaced with a space character (#x20) in the normalized value. 
>>
>>All attributes for which no declaration has been read should be
>>treated by a non-validating processor as if declared CDATA. 
>>
> 
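
The quoted algorithm can be sketched in a few lines of Python for the CDATA case. This is only an illustration under assumptions, not what expat or minidom actually runs: normalize_cdata_attribute is a made-up name, only the five predefined entities are handled, and no well-formedness checking is done.

```python
import re

# One token per match: hex char ref, decimal char ref, entity ref, or a
# single character (DOTALL so that "." also matches newlines).
_TOKEN = re.compile(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|&(\w+);|(.)", re.DOTALL)

# Assumption: only the predefined entities are known to this sketch.
_PREDEFINED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}


def normalize_cdata_attribute(raw):
    """Apply XML 1.0 section 3.3.3 normalization to a CDATA attribute."""
    # Section 2.11: line breaks are normalized to #xA before anything else.
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    out = []
    for match in _TOKEN.finditer(text):
        hex_ref, dec_ref, entity, char = match.groups()
        if hex_ref is not None:
            # Character reference: append the referenced character as-is.
            out.append(chr(int(hex_ref, 16)))
        elif dec_ref is not None:
            out.append(chr(int(dec_ref)))
        elif entity is not None:
            # Entity reference: append its (already processed) replacement.
            out.append(_PREDEFINED[entity])
        elif char in " \t\n\r":
            # Literal white space becomes a single space character.
            out.append(" ")
        else:
            out.append(char)
    return "".join(out)
```

Feeding it "spotted\r\nwith" gives a single space between the words, while "spotted&#13;&#10;with" keeps the CR and LF, which is the contrast the note in the spec describes.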
> 
> So the only way to get a newline into an attribute is to escape it using
> a character reference.
> 
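
A quick way to see both behaviours with xml.dom.minidom, using the cow example from the thread. The helper name hide_value is only for illustration.

```python
from xml.dom import minidom

# Literal CRLF inside the attribute value, as in the original question.
literal = '<cow><hide value="spotted\r\nwith black and white"></hide></cow>'

# The same value with the CR and LF written as character references.
escaped = '<cow><hide value="spotted&#13;&#10;with black and white"></hide></cow>'


def hide_value(xml_text):
    """Parse the document and return the value attribute of <hide>."""
    doc = minidom.parseString(xml_text)
    return doc.getElementsByTagName("hide")[0].getAttribute("value")


literal_value = hide_value(literal)
escaped_value = hide_value(escaped)
```

The literal line break should come back as one space, since CRLF is first folded to a single #xA and then normalized, while the escaped form should come back with "\r\n" intact.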


-- 
-  - -- ----  ----------------------------------------- --- -- -   -
Ahmad Baitalmal
BitBuilder
web: http://www.bitbuilder.com
-  - -- ---- -------------------------------------------------------------- --- -- -   -



