right list for SIGABRT python binary question ?

dieter dieter at handshake.de
Thu Nov 2 02:56:50 EDT 2017


Karsten Hilbert <Karsten.Hilbert at gmx.net> writes:
>> >> It points to a memory corruption.
>> 
>> The i386/x64 architecture supports memory access breakpoints
>> and GDB, too, has support for this. You know the address which
>> gets corrupted. Thus, the following approach could have a chance
>> to succeed:
>> 
>>    Put a "memory write" breakpoint on the address which gets corrupted.
>>    This should stop the program each time this address is written.
>>    Then check the backtrace. As the address forms part of the
>>    address block prologue, it should only be accessed from
>>    Python's "malloc" (and "free") implementation. Any other access
>>    indicates bad behaviour.
>
> I understand. Thank you for the explanation. This may seem
> like a dumb question: the actual address that gets corrupted
> varies from run to run (it may be the same "place" in the
> code but that place gets put at a different address each
> run).

That's sad.

It has been a long time (more than 10 years)
since I last had to analyse this kind of memory corruption. Fortunately,
in my case, the address was stable across runs.
Most likely, ASLR was not yet in use on standard Linux platforms at that time.

Maybe you can find a way to disable ASLR.
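
On Linux there are several standard ways to do this: GDB's
"set disable-randomization on" setting (the default for programs started
from within GDB), "setarch -R", or the system-wide
/proc/sys/kernel/randomize_va_space switch. The following is only a
minimal sketch of what "setarch -R" does, written in Python with ctypes.
The helper names (personality_without_aslr, exec_without_aslr) are made up
for illustration; the constant comes from <sys/personality.h>. It assumes
Linux and glibc.

```python
import ctypes
import os

# Linux-specific constant from <sys/personality.h>.
ADDR_NO_RANDOMIZE = 0x0040000
# Passing 0xffffffff to personality() queries the current value
# without changing it.
QUERY_PERSONALITY = 0xFFFFFFFF


def personality_without_aslr(current):
    """Return the personality value with address randomization disabled."""
    return current | ADDR_NO_RANDOMIZE


def exec_without_aslr(argv):
    """Replace this process with argv, with ASLR disabled (hypothetical helper).

    Assumes Linux with glibc; personality(2) does not exist elsewhere.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    libc.personality.argtypes = [ctypes.c_ulong]
    libc.personality.restype = ctypes.c_int

    current = libc.personality(QUERY_PERSONALITY)
    if current == -1:
        raise OSError(ctypes.get_errno(), "personality() query failed")
    if libc.personality(personality_without_aslr(current)) == -1:
        raise OSError(ctypes.get_errno(), "personality() update failed")
    # The personality is inherited across exec, so the new program
    # starts with a non-randomized address space layout.
    os.execvp(argv[0], argv)


# Example use (not executed here):
#     exec_without_aslr(["python", "failing_script.py"])
```

Whether the corrupted address then becomes stable still depends on the
application itself behaving the same way on every run.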

If ASLR is the cause of the randomness, it might also be possible to
compute the new address. More on this later.


In another message, you reported how you tried to obtain an invariant
for the affected address by using "info symbol". I do not have much
hope that this will succeed:
Most likely, the corrupted memory block is part of the
heap (it may also be a stack block, wrongly freed; this would be
a local error and easily detectable from the traceback).
If you use "info symbol" on a heap address, the information you get
is not very reliable, especially if ASLR is in effect (which
randomizes the various process segments and the heap blocks
independently of one another).


Back to an approach for possibly computing the corrupted address
for a new run. The approach assumes a heap address and relies on the fact
that "malloc" (and friends) requests large (which hopefully means few)
memory blocks from the OS, which are then split into smaller blocks internally.
You can then catalog the large memory block requests and determine
the index of the block that contains the corrupted address, and the offset
of that address within the block. In a following run,
you determine the new base address of this block and apply the
same offset to find the corrupted address.
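
The arithmetic can be sketched in Python as follows. This is only an
illustration under the assumptions stated above: the block catalogs are
assumed to have been collected by some other means, in request order, and
all addresses, sizes and function names (locate, relocate) are invented
for the example.

```python
def locate(blocks, address):
    """Find which large block contains address.

    blocks is a list of (base, size) pairs in the order the blocks were
    requested from the OS. Returns (index, offset within that block).
    """
    for index, (base, size) in enumerate(blocks):
        if base <= address < base + size:
            return index, address - base
    raise ValueError("address %#x is not inside any cataloged block" % address)


def relocate(blocks, index, offset):
    """Apply a known (index, offset) to the block catalog of a new run."""
    base, size = blocks[index]
    if offset >= size:
        raise ValueError("offset lies outside the block")
    return base + offset


# First run: hypothetical catalog and the address seen to be corrupted.
run1_blocks = [(0x7F3A10000000, 0x40000), (0x7F3A20000000, 0x40000)]
corrupted = 0x7F3A20001238
index, offset = locate(run1_blocks, corrupted)

# Second run: same request sequence, different (randomized) base addresses.
run2_blocks = [(0x7F9B40000000, 0x40000), (0x7F9B58000000, 0x40000)]
new_address = relocate(run2_blocks, index, offset)

# GDB command to set a write watchpoint on the recomputed address.
gdb_command = "watch *(unsigned long *) %#x" % new_address
print(gdb_command)
```

The watchpoint has to be set after the block has been allocated in the new
run but before the corruption happens.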

Of course, this assumes that your application is totally deterministic
(apart from maybe ASLR).



