PEP 249 Compliant error handling

Wed Oct 18 12:32:48 EDT 2017

On Oct 17, 2017, at 12:02 PM, MRAB <python at mrabarnett.plus.com> wrote:
> 
> On 2017-10-17 20:25, Israel Brewster wrote:
>> 
>>> On Oct 17, 2017, at 10:35 AM, MRAB <python at mrabarnett.plus.com <mailto:python at mrabarnett.plus.com>> wrote:
>>> 
>>> On 2017-10-17 18:26, Israel Brewster wrote:
>>>> I have written and maintain a PEP 249 compliant (hopefully) DB API for the 4D database, and I've run into a situation where corrupted string data from the database can cause the module to error out. Specifically, when decoding the string, I get a "UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 86-87: illegal UTF-16 surrogate" error. This makes sense, given that the string data got corrupted somehow, but the question is "what is the proper way to deal with this in the module?" Should I just throw an error on bad data? Or would it be better to set the errors parameter to something like "replace"? The former feels a bit more "proper" to me (there's an error here, so we throw an error), but leaves the end user dead in the water, with no way to retrieve *any* of the data (from that row at least, and perhaps any rows after it as well). The latter option sort of feels like sweeping the problem under the rug, but does at least leave an error character in the s
>>> tring to
>>> l
>>>>  et them know there was an error, and will allow retrieval of any good data.
>>>> Of course, if this was in my own code I could decide on a case-by-case basis what the proper action is, but since this a module that has to work in any situation, it's a bit more complicated.
>>> If a particular text field is corrupted, then raising UnicodeDecodeError when trying to get the contents of that field as a Unicode string seems reasonable to me.
>>> 
>>> Is there a way to get the contents as a bytestring, or to get the contents with a different errors parameter, so that the user has the means to fix it (if it's fixable)?
>> 
>> That's certainly a possibility, if that behavior conforms to the DB API "standards". My concern in this front is that in my experience working with other PEP 249 modules (specifically psycopg2), I'm pretty sure that columns designated as type VARCHAR or TEXT are returned as strings (unicode in python 2, although that may have been a setting I used), not bytes. The other complication here is that the 4D database doesn't use the UTF-8 encoding typically found, but rather UTF-16LE, and I don't know how well this is documented. So not only is the bytes representation completely unintelligible for human consumption, I'm not sure the average end-user would know what decoding to use.
>> 
>> In the end though, the main thing in my mind is to maintain "standards" compatibility - I don't want to be returning bytes if all other DB API modules return strings, or visa-versa for that matter. There may be some flexibility there, but as much as possible I want to conform to the majority/standard/whatever
>> 
> The average end-user might not know which encoding is being used, but providing a way to read the underlying bytes will give a more experienced user the means to investigate and possibly fix it: get the bytes, figure out what the string should be, update the field with the correctly decoded string using normal DB instructions.

I agree, and if I was just writing some random module I'd probably go with it, or perhaps with the suggestion offered by Karsten Hilbert. However, neither answer addresses my actual question, which is "how does the STANDARD (PEP 249 in this case) say to handle this, or, baring that (since the standard probably doesn't explicitly say), how do the MAJORITY of PEP 249 compliant modules handle this?" Not what is the *best* way to handle it, but rather what is the normal, expected behavior for a Python DB API module when presented with bad data? That is, how does psycopg2 behave? pyodbc? pymssql (I think)? Etc. Or is that portion of the behavior completely arbitrary and different for every module?

It may well be that one of the suggestions *IS* the normal, expected, behavior, but it sounds more like you are suggesting how you think would be best to handle it, which is appreciated but not actually what I'm asking :-) Sorry if I am being difficult.

> -- 
> https://mail.python.org/mailman/listinfo/python-list