How to replace an instance method?

Wed Sep 21 12:41:24 EDT 2022

Another possibility would be to use a lambda function or a callable
object. This adds some call overhead, but it also lets you inject
extra parameters into the function call, and it does not require any
extra import.

    obj.old_method_name = lambda *a, **kw: new_method_name(obj, *a, **kw)

A full example goes like this:

    class C:

        def __init__(self):
            self.value = 21

        def get(self):
            return self.value

    def new_get(self):
        return self.value * 2

    obj = C()
    print(obj.get())
    obj.get = lambda *a, **kw: new_get(obj, *a, **kw)
    print(obj.get())

This would first output 21 and then 42.
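
The callable-object variant mentioned above could look like the sketch
below. ScaledGet and its factor parameter are names I made up for
illustration; the point is only that the extra parameter is stored when
the replacement is installed, and that the change affects a single
instance.

```python
class C:
    def __init__(self):
        self.value = 21

    def get(self):
        return self.value


class ScaledGet:
    """Hypothetical callable that stands in for C.get on one instance."""

    def __init__(self, obj, factor):
        self.obj = obj        # instance whose method is being replaced
        self.factor = factor  # extra parameter injected at replacement time

    def __call__(self, *args, **kwargs):
        return self.obj.value * self.factor


obj = C()
other = C()
obj.get = ScaledGet(obj, factor=2)  # shadows C.get on this instance only
print(obj.get())    # goes through ScaledGet.__call__
print(other.get())  # other instances still use C.get
```

Unlike the lambda, the callable keeps its own reference to the
instance, so it does not depend on the name obj still pointing at the
same object later on.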

--

What you are trying to do requires more than just replacing the
function _convert_cell. By default, OpenpyxlReader loads the workbook
in read_only mode, discarding all links. This means that the cell
object present in _convert_cell has no hyperlink attribute. There is
no option to make it load the links. To force them to be loaded, we
need to replace load_workbook as well. This is the method that asks
openpyxl to load the workbook and that decides whether the links are
discarded.

The second problem is that as soon as you instantiate an ExcelFile
object, it instantiates an OpenpyxlReader and loads the file, leaving
you no time to replace the functions. Happily, ExcelFile gets the
engine class from a static dictionary called _engines. This means that
we can extend OpenpyxlReader, override those two methods, and replace
the reference in ExcelFile._engines. The full source is:

    import pandas as pd

    class MyOpenpyxlReader(pd.ExcelFile.OpenpyxlReader):

        def load_workbook(self, filepath_or_buffer):
            from openpyxl import load_workbook
            # read_only=False makes openpyxl build full cell objects,
            # which carry the hyperlink attribute.
            return load_workbook(
                filepath_or_buffer,
                read_only=False,
                data_only=False,
                keep_links=True
            )

        def _convert_cell(self, cell, convert_float: bool):
            # Let pandas convert the value, then attach the link target.
            value = super()._convert_cell(cell, convert_float)
            if cell.hyperlink is None:
                return value
            else:
                return (value, cell.hyperlink.target)

    pd.ExcelFile._engines["openpyxl"] = MyOpenpyxlReader
    df = pd.read_excel("links.xlsx")
    print(df)

The source above worked on Python 3.8.10, pandas 1.5.0, and openpyxl
3.0.10. The output for a sample xlsx file with the columns id, page
(a page name with a link), and last access is shown next. The first
element in the second column's output tuple is the cell's text and the
second element is the cell's link:

       id                                         page  last access
    0   1            (google, https://www.google.com/)   2022-04-12
    1   2                  (gmail, https://gmail.com/)   2022-02-06
    2   3          (maps, https://www.google.com/maps)   2022-02-17
    3   4                    (bbc, https://bbc.co.uk/)   2022-08-30
    4   5            (reddit, https://www.reddit.com/)   2022-12-02
    5   6  (stackoverflow, https://stackoverflow.com/)   2022-05-25
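
To see why swapping the entry in _engines is enough, here is a toy
version of the mechanism with no pandas involved. File, DefaultReader
and TaggedReader are hypothetical names, not pandas classes; the sketch
only assumes, as described above, that the outer class looks up its
reader class in a class-level dictionary at instantiation time.

```python
class DefaultReader:
    def __init__(self, path):
        self.path = path

    def convert(self, value):
        return value


class File:
    # Reader classes are looked up by name when a File is created.
    _engines = {"default": DefaultReader}

    def __init__(self, path, engine="default"):
        self._reader = self._engines[engine](path)

    def read(self, values):
        return [self._reader.convert(v) for v in values]


class TaggedReader(DefaultReader):
    def convert(self, value):
        # Extend the base behaviour instead of replacing it.
        return (super().convert(value), self.path)


before = File("a.xlsx").read([1, 2])
File._engines["default"] = TaggedReader  # swap the class before instantiation
after = File("a.xlsx").read([1, 2])
print(before)
print(after)
```

Every File created after the swap uses the subclass, which is also the
weak spot: the change is global to the process, not local to one call.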

--

Should you do any of this? No.

1. What makes a good developer is the ability to write clear and
maintainable code. None of these options is clear; they all increase
cognitive complexity and reduce reliability.
2. We are manipulating internal class attributes and internal methods
(those starting with _). Internal elements are not guaranteed to stay
there from one version to the next, even across minor updates. You
should not manipulate them unless you are working against a fixed
library version, for example when writing tests that check whether the
internal state has changed, when hacking, or when debugging. Python
assumes you will access these attributes wisely.
3. If you are working with other developers and you commit this code,
there is a good chance that another developer is using a slightly
different pandas version that lacks one of these elements. You will
break the build, and your team will complain and start thinking you
are a naive developer.
4. Even if you adapt your code for multiple pandas versions, you will
end up with multiple ifs and different implementations. You don't want
to maintain this over time.
5. It clearly takes more time to understand pandas' internals than to
write your own reader using openpyxl. Doing so is not cumbersome, and
if it changes the execution time from 20ms to 40ms but is much more
reliable and maintainable, we surely prefer the latter.
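
For point 5, a minimal sketch of such a reader, using only the public
openpyxl API, might look like this. The workbook is built in memory so
the example is self-contained, and read_rows is a name I made up; a
real reader would take your file path and probably feed the rows into
a DataFrame.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Build a small workbook in memory (stands in for a real xlsx file).
wb = Workbook()
ws = wb.active
ws.append(["id", "page"])
ws.append([1, "google"])
ws.append([2, "plain"])
ws["B2"].hyperlink = "https://www.google.com/"
buffer = BytesIO()
wb.save(buffer)
buffer.seek(0)


def read_rows(source):
    """Return the data rows; linked cells become (value, target) tuples."""
    sheet = load_workbook(source).active  # default mode, not read_only
    rows = []
    for row in sheet.iter_rows(min_row=2):  # skip the header row
        rows.append([
            cell.value if cell.hyperlink is None
            else (cell.value, cell.hyperlink.target)
            for cell in row
        ])
    return rows


rows = read_rows(buffer)
print(rows)
```

This touches no underscore-prefixed names, so it should not depend on
which pandas version happens to be installed.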

The only scenario I see in which this would be acceptable is when you
or your boss have an important presentation in the next hour, and you
need a quick fix to make it work in order to demonstrate it. After the
presentation is over and people have validated the functionality you
should properly implement it.

Keep It Simple and Stupid (KISS)

-- 
Diego Souza
Wespa Intelligent Systems
Rio de Janeiro - Brasil

On Mon, Sep 19, 2022 at 1:00 PM <python-list-request at python.org> wrote:
>
>
> From: "Weatherby,Gerard" <gweatherby at uchc.edu>
> Date: Mon, 19 Sep 2022 13:06:42 +0000
> Subject: Re: How to replace an instance method?
> Just subclass and override whatever method you wish to modify
> “Private” is conceptual. Mostly it means when the next version of a module comes out, code that you wrote that accesses *._ parts of the module might break.
> ___
>
>
> import pandas
>
>
> class MyClass(pandas.ExcelFile.OpenpyxlReader):
>
>      def _convert_cell(self, cell, convert_float: bool) -> 'Scalar':
>          """override"""
>          # do whatever you want, or call the base class version
>          return super()._convert_cell(cell, convert_float)
>
> —
> Gerard Weatherby | Application Architect NMRbox | NAN | Department of Molecular Biology and Biophysics
>  UConn Health 263 Farmington Avenue, Farmington, CT 06030-6406 uchc.edu
> On Sep 17, 2022, 5:29 PM -0400, Ralf M. <Ralf_M at t-online.de>, wrote:
>
> Am 17.09.2022 um 00:35 schrieb Dan Stromberg:
>
>
> On Fri, Sep 16, 2022 at 2:06 PM Ralf M. <Ralf_M at t-online.de
> <mailto:Ralf_M at t-online.de>> wrote:
>
> I would like to replace a method of an instance, but don't know how to
> do it properly.
>
>
> You appear to have a good answer, but... are you sure this is a good idea?
>
> It's definitely a dirty hack.
>
> It'll probably be confusing to future maintainers of this code, and I
> doubt static analyzers will like it either.
>
> I agree that I will have to add sufficient comments for the future
> maintainer, should there ever be one (and even for me to still
> understand it next year). I don't use static analyzers.
>
> I'm not the biggest fan of inheritance you'll ever meet, but maybe this
> is a good place for it?
>
> Using a derived version of the class in question to overwrite the
> method was my first idea, however I don't instantiate the class in
> question myself, it is instantiated during the initialisation of
> another class, so I would at least have to derive a modified version of
> that as well. And that code is rather complex, with metaclasses and
> custom decorators, and I feel uncomfortable messing with that, while
> the method I intend to change is quite simple and straightforward.
>
> In case anybody is interested what I'm trying to achieve:
>
> It's simple in pandas to read an excel file into a dataframe, but only
> the cell value is read. Sometimes I need more / other information, e.g.
> some formatting or the hyperlink in a cell. Reopening the file with
> openpyxl and getting the info is possible, but cumbersome.
> Looking into the pandas code for reading excel files (which uses
> openpyxl internally) I noticed a method (of an internal pandas class)
> that extracts the value from an openpyxl cell. This method is rather
> simple and seems the ideal spot to change to get what I want.
>
> My idea is to instantiate pandas.ExcelFile (official pandas API), get
> the reader instance (an attribute of the ExcelFile object) and modify
> the method of the reader instance.
>
> The fact that the method I change and the ExcelFile attribute containing
> the reader are both private (start with _) doesn't make it any better,
> but I'm desperate enough to be willing to adapt my code to every major
> pandas release, if necessary.
>
> Ralf M.