[Python-Dev] beta1 coming real soon

"Martin v. Löwis" martin at v.loewis.de
Tue Jun 13 20:33:59 CEST 2006


Walter Dörwald wrote:
> And passing MB_ERR_INVALID_CHARS in a call to MultiByteToWideChar()
> doesn't help either, because AFAICT there's no information about the
> error location. What could work would be to try MultiByteToWideChar()
> with various string lengths to try to determine whether the error is due
> to an incomplete byte sequence or invalid data. But that sounds ugly and
> slow to me.

That's all true, yes.
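
For illustration, the probing approach could look roughly like the
sketch below. It makes several assumptions: strict_decode_ok() is a
hypothetical stand-in for a MultiByteToWideChar() call with
MB_ERR_INVALID_CHARS (it reports success or failure and nothing else),
the toy double-byte rules it checks are invented for the example and
match no real code page, and probe() and its result names are likewise
made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an all-or-nothing decoder: returns 1 if the
   whole buffer decodes, 0 otherwise, with no error position.
   Toy rules (assumed): bytes below 0x80 are single-byte characters,
   0x81..0xFE are lead bytes that need a trail byte in 0x40..0xFE,
   and 0x80 and 0xFF are never valid. */
static int strict_decode_ok(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        if (s[i] < 0x80) { i += 1; continue; }
        if (s[i] < 0x81 || s[i] > 0xFE) return 0;      /* invalid byte */
        if (i + 1 >= n) return 0;                      /* truncated */
        if (s[i + 1] < 0x40 || s[i + 1] > 0xFE) return 0;
        i += 2;
    }
    return 1;
}

enum probe_result { PROBE_OK, PROBE_MAYBE_INCOMPLETE, PROBE_INVALID };

/* Retry with shorter lengths. max_char_len is the longest byte sequence
   for one character in the encoding (2 for a double-byte code page). */
static enum probe_result probe(const unsigned char *s, size_t n,
                               size_t max_char_len)
{
    size_t cut;
    assert(max_char_len >= 1);
    if (strict_decode_ok(s, n))
        return PROBE_OK;
    /* If dropping fewer than max_char_len trailing bytes lets the rest
       decode, the failure is confined to the tail of the buffer. */
    for (cut = 1; cut < max_char_len && cut <= n; cut++) {
        if (strict_decode_ok(s, n - cut))
            return PROBE_MAYBE_INCOMPLETE;
    }
    return PROBE_INVALID;
}
```

Each probe repeats a full decode of the buffer, which is where the
cost comes from. The result is also ambiguous: a trailing byte that is
invalid on its own (0xFF in the toy rules) is reported as possibly
incomplete, because the decoder gives no way to tell the two cases
apart.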

>> but can't possibly work for ISO-2022.
> 
> So does that mean that IsDBCSLeadByte() returns garbage in this case?

IsDBCSLeadByteEx is documented to only validate lead bytes for selected
code pages; MSDN versions differ in what these code pages are. The
current online version says

"This function validates leading byte values only in the following code
pages: 932, 936, 949, 950, and 1361."

whereas my January 2006 MSDN (DVD version) says

"IsDBCSLeadByteEx does not validate any lead byte in multi-byte
character set (MBCS) code pages, for example, code pages 52696, 54936,
51949 and 5022x."

Whether this limitation also applies to IsDBCSLeadByte, I cannot tell:
- maybe they forgot to document the limitation there as well
- maybe you can't use one of the unsupported code pages as CP_ACP,
  so the problem cannot occur
- maybe IsDBCSLeadByte does indeed work correctly in these cases, when
  IsDBCSLeadByteEx doesn't

The last of these is difficult to believe, though, as IsDBCSLeadByte is
likely implemented as


BOOL IsDBCSLeadByte(BYTE TestChar)
{
  return IsDBCSLeadByteEx(GetACP(), TestChar);
}

Regards,
Martin