[Python-ideas] Add a builtin method to 'int' for base/radix conversion

Yuvgoog Greenle ubershmekel at gmail.com
Wed Sep 16 04:39:55 CEST 2009


Also, negative numbers aren't taboo. FTFY:
import string

def mirror_dict(d):
    # Add the reverse mapping (val -> key) in place. Iterate over a copy of
    # the items because a dict must not change size during iteration.
    for key, val in list(d.items()):
        d[val] = key

_lows = string.digits + string.ascii_lowercase
default_alphabet = dict(zip(range(len(_lows)), _lows))
mirror_dict(default_alphabet)

def encode_int(value, base, alphabet=default_alphabet):
    # Remember the sign and work with the magnitude.
    is_negative = value < 0
    value = abs(value)

    cs = []
    while True:
        value, index = divmod(value, base)
        cs.insert(0, alphabet[index])
        if not value:
            break

    if is_negative:
        cs.insert(0, '-')

    return ''.join(cs)

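def decode_int(str_num, base, alphabet=default_alphabet):
    # Accept a mirrored dict (keep only its char -> int entries) or a string.
    if isinstance(alphabet, dict):
        char_to_int = {c: i for c, i in alphabet.items() if isinstance(c, str)}
    else:
        char_to_int = {c: i for i, c in enumerate(alphabet)}
    # A leading '-' marks a negative number, matching encode_int.
    is_negative = str_num.startswith('-')
    if is_negative:
        str_num = str_num[1:]
    value = 0
    for c in str_num:
        value = value * base + char_to_int[c]
    return -value if is_negative else value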

alphabet = '1ilI|:'
enc = encode_int(10**10, len(alphabet), alphabet)
print('|IIli|l|ili||')
print(enc)

dec = decode_int(enc, len(alphabet), alphabet)
print(dec)
print(10000000000)

print(encode_int(12345, 32))
print(encode_int(-12345, 32))

-----
The only problem with {'A': 10, 'a': 10} is that it's not reversible. If we
wanted to encode 10, which should be used: A or a?
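
One possible way around this, as a rough sketch only: keep two tables instead
of one. A plain string of canonical digits is used for encoding, and a dict
built from it (with the uppercase letters added) is used for decoding. The
names ENCODE_DIGITS and DECODE_TABLE, the choice of lowercase as the canonical
output, and the "base must be at least 2" check are my assumptions, not
anything agreed on in this thread. The "translation table too small for base"
message is the one MRAB suggested.

```python
import string

# Canonical digits used when encoding: exactly one character per value.
ENCODE_DIGITS = string.digits + string.ascii_lowercase

# Decoding table: several characters may share a value ('A' and 'a' -> 10).
DECODE_TABLE = {c: i for i, c in enumerate(ENCODE_DIGITS)}
DECODE_TABLE.update({c.upper(): i for c, i in list(DECODE_TABLE.items())})


def encode_int(value, base, digits=ENCODE_DIGITS):
    if base < 2:
        raise ValueError("base must be at least 2")
    if base > len(digits):
        raise ValueError("translation table too small for base")
    sign = '-' if value < 0 else ''
    value = abs(value)
    cs = []
    while True:
        value, index = divmod(value, base)
        cs.append(digits[index])
        if not value:
            break
    return sign + ''.join(reversed(cs))


def decode_int(text, base, table=DECODE_TABLE):
    is_negative = text.startswith('-')
    if is_negative:
        text = text[1:]
    value = 0
    for c in text:
        digit = table[c]
        if digit >= base:
            raise ValueError("digit %r out of range for base %d" % (c, base))
        value = value * base + digit
    return -value if is_negative else value


print(encode_int(255, 16))
print(decode_int('FF', 16), decode_int('ff', 16))
```

Decoding stays lenient (either case is accepted) while encoding always emits
the same character for a given value, so encode-then-decode round-trips.
Decode-then-encode does not preserve case, which seems unavoidable.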


On Tue, Sep 15, 2009 at 7:48 PM, MRAB <python at mrabarnett.plus.com> wrote:

> Mark Dickinson wrote:
>
>> On Mon, Sep 14, 2009 at 3:51 AM, Yuvgoog Greenle <ubershmekel at gmail.com>
>> wrote:
>>
>>> Btw, when you say translation table, do you mean just a string? Because a
>>> translation table would need to be continuous from 0 to the base, so a
>>> real dictionary-esque table may be overkill. The only advantage of a table
>>> might be to convert certain digits into multiple bytes (some sort of ad-hoc
>>> Unicode use case?).
>>>
>>
>> Yes, sorry, I just meant a string (or possibly some other iterable of
>> characters).
>> Something like (3.x code):
>>
>> def encode_int(n, alphabet):
>>    if n < 0:
>>        raise ValueError("nonnegative integers only, please")
>>    base = len(alphabet)
>>    cs = []
>>    while True:
>>        n, c = divmod(n, base)
>>        cs.append(alphabet[c])
>>        if not n:
>>            break
>>    return ''.join(reversed(cs))
>>
>> def decode_int(s, alphabet):
>>    base = len(alphabet)
>>    char_to_int = {c: i for i, c in enumerate(alphabet)}
>>    n = 0
>>    for c in s:
>>        n = n * base + char_to_int[c]
>>    return n
>>
>> >>> alphabet = '1ilI|:'
>> >>> encode_int(10**10, alphabet)
>> '|IIli|l|ili||'
>> >>> decode_int(_, alphabet)
>> 10000000000
>>
>> This doesn't allow negative numbers.  If negative numbers should be
>> permitted, there are some decisions to be made there too.  How are
>> they represented?  With a leading '-'?  If so, then '-' should not be
>> permitted in the alphabet.  Should the negative sign character be
>> user-configurable?
>>
>> One problem with allowing multi-character digits in encoding is that it
>> complicates the decoding:  parsing the digit string is no longer trivial.
>> I don't see how to make this a viable option.
>>
>> I'm still only +0 (now leaning towards -0, having seen how easy this
>> is to implement, and thinking about how much possible variation
>> there might be in what's actually needed) on adding something like this.
>>
> I'd prefer the arguments to be: value, base, optional translation table.
> The translation table would default to 0-9, A-Z/a-z (when decoding,
> multiple characters could map to the same numeric value, eg 'A' => 10
> and 'a' => 10, hence the ability to use a dict). The default translation
> table would work up to base 36; higher bases would raise a ValueError
> exception "translation table too small for base".
>
> Could a single translation table work both ways? A dict for decoding
> could contain {'A': 10, 'a': 10}, but how could you reverse that for
> encoding?