[Tutor] subtyping builtin type

Steven D'Aprano steve at pearwood.info
Thu Jan 2 03:21:25 CET 2014


On Wed, Jan 01, 2014 at 02:49:17PM +0100, spir wrote:
> On 01/01/2014 01:26 AM, Steven D'Aprano wrote:
> >On Tue, Dec 31, 2013 at 03:35:55PM +0100, spir wrote:
[...]
> I take the opportunity to add a few features, but would do 
> without Source altogether if it were not for 'i'.
> The reason is: it is for parsing library, or hand-made parsers. Every 
> matching func, representing a pattern (or "rule"), advances in source 
> whenever match is ok, right? Thus in addition to return the form (of what 
> was matched), they must return the new match index:
> 	return (form, i)

The usual way to do this is to make the matching index an attribute of 
the parser, not the text being parsed. In OOP code, you make the parser 
an object:

class Parser:
    def __init__(self, source):
        self.current_position = 0  # Always start at the beginning
        self.source = source
    def parse(self):
        ...

parser = Parser("some text to be parsed")
for token in parser.parse():
    handle(token)

The index is not an attribute of the source text, because the source 
text doesn't care about the index. Only the parser cares about the 
index, so it should be the responsibility of the parser to manage.


> Symmetrically, every match func using another (meaning nearly all) receive 
> this pair. (Less annoyingly, every match func also takes i as input, in 
> addition to the src str.) (There are also a handful of other annoying 
> points, consequences of those ones.)

The match functions are a property of the parser, not the source text. 
So they should be methods on a Parser object. Since they need to track 
the index (or indexes), the index ought to be an attribute on the 
parser, not the source text.
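
As a rough sketch of that idea (the rule names match_literal, match_digits
and match_sum are made up for illustration, they are not from your code),
each match method reads and advances the parser's own position, so nothing
needs to pass or return an index:

```python
class Parser:
    """Sketch only: match functions as methods sharing one index."""

    def __init__(self, source):
        self.source = source  # a plain str, never modified
        self.pos = 0          # parser state, not source state

    def match_literal(self, literal):
        # Hypothetical rule: match an exact string at the current position.
        if self.source.startswith(literal, self.pos):
            self.pos += len(literal)
            return literal
        return None

    def match_digits(self):
        # Hypothetical rule: match a run of one or more digits.
        start = self.pos
        while self.pos < len(self.source) and self.source[self.pos].isdigit():
            self.pos += 1
        return self.source[start:self.pos] or None

    def match_sum(self):
        # Composite rule built from the others: digits '+' digits.
        # On failure, rewind so the caller can try another rule.
        start = self.pos
        left = self.match_digits()
        if left is not None and self.match_literal("+") is not None:
            right = self.match_digits()
            if right is not None:
                return (left, "+", right)
        self.pos = start
        return None


parser = Parser("12+34")
result = parser.match_sum()
```

Note that match_sum returns only the form; the new index lives on the
parser, and a failed match puts it back where it started.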


> If I have a string that stores its index, all of this mess is gone.

What you are describing is covered by Martin Fowler's book 
"Refactoring". He describes the problem:

    A field is, or will be, used by another class more than the 
    class on which it is defined.

and the solution is to move the field from that class to the class where 
it is actually used.

("Refactoring - Ruby Edition", by Jay Fields, Shane Harvie and Martin 
Fowler.)

Having a class (in your case, Source) carry around state which is only 
used by *other functions* is a code smell. That means that Source is 
responsible for things it has no need of. That's poor design.

By making the parser a class, instead of a bunch of functions, they can 
share state -- the *parser state*. That state includes:

- the text being parsed;
- the tokens that can be found; and
- the position in the text.

The caller can create as many parsers as they need:

parse_this = Parser("some text")
parse_that = Parser("different text")

without them interfering, and then run the parsers independently of each 
other. The implementer, that is you, can change the algorithm used by 
the Parser without the caller needing to know. With your current design, 
you start with this:

# caller is responsible for tracking the index
source = Source("some text")
assert source.i == 0
parse(source)


What happens if next month you decide to change the parsing algorithm? 
Now it needs not one index, but two. You change the parse() function, 
but the caller's code breaks because Source only has one index. You 
can't change Source, because other parts of the code are relying on 
Source having exactly a single index. So you have to introduce *two* new 
pieces of code, and the caller has to make two changes:

source = SourceWithTwoIndexes("some text")
assert source.i == 0 and source.j == -1
improved_parse(source)


Instead, if the parser is responsible for tracking its own data (the 
index, or indexes), then the caller doesn't need to care if the parsing 
algorithm changes. The internal details of the parser are irrelevant to 
the caller. This is a good thing!

parser = Parser("some text")
parser.parse()

With this design, if you change the internal details of the parser, the 
caller doesn't need to change a thing. They get the improved parser for 
free.
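
To make that concrete, here is a small sketch with two interchangeable
implementations. Both class names and the word-splitting task are invented
for the example; the point is only that the second version needs two
indexes and the calling code is identical either way.

```python
class OneIndexParser:
    """Hypothetical first version: scans with a single index."""

    def __init__(self, source):
        self._source = source
        self._i = 0

    def parse(self):
        # Collect whitespace-separated words one character at a time.
        words = []
        current = ""
        while self._i < len(self._source):
            ch = self._source[self._i]
            if ch.isspace():
                if current:
                    words.append(current)
                current = ""
            else:
                current += ch
            self._i += 1
        if current:
            words.append(current)
        return words


class TwoIndexParser:
    """Hypothetical improved version: same interface, two indexes."""

    def __init__(self, source):
        self._source = source
        self._i = 0  # start of the current word
        self._j = 0  # scanning position

    def parse(self):
        # Slice each word out of the source instead of building it up.
        words = []
        while self._j < len(self._source):
            if self._source[self._j].isspace():
                if self._j > self._i:
                    words.append(self._source[self._i:self._j])
                self._i = self._j + 1
            self._j += 1
        if self._j > self._i:
            words.append(self._source[self._i:self._j])
        return words


def run(parser_class, text):
    # The caller's code: it never touches an index.
    return parser_class(text).parse()


old_result = run(OneIndexParser, "some text  to parse ")
new_result = run(TwoIndexParser, "some text  to parse ")
```

Swapping one class for the other changes nothing on the caller's side.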

Since the parser tracks both the source text and the index, it doesn't 
need to worry that the Source object might change the index. 

With your design, the index is part of the source text. That means that 
the source text is free to change the index at any time. But it can't do 
that, since there might be a parser in the middle of processing it. So 
the Source class has to carry around data that it isn't free to use.

This is the opposite of encapsulation. It means that the Source object 
and the parsing code are tightly coupled. The Source object has no way 
of knowing whether it is being parsed or not, but has to carry around 
this dead weight, an unused (unused by Source) field, and avoid using it 
for any reason, *just in case* it is being used by a parser. This is the 
very opposite of how OOP is supposed to work.
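
A small sketch of the coupling problem, under the assumption that Source
looks roughly like "text plus an index i" (I am guessing at your class;
read_word and WordReader are invented here). Two pieces of code sharing
one Source step on each other's position, while two parser objects over
the same plain string do not:

```python
class Source:
    # Assumed stand-in for the design under discussion: text plus index.
    def __init__(self, text):
        self.text = text
        self.i = 0


def read_word(source):
    # Function-style matcher that advances the index stored on the source.
    start = source.i
    while source.i < len(source.text) and not source.text[source.i].isspace():
        source.i += 1
    word = source.text[start:source.i]
    source.i += 1  # skip the separator
    return word


class WordReader:
    # Parser-style alternative: the position belongs to the reader.
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def read_word(self):
        start = self.pos
        while self.pos < len(self.text) and not self.text[self.pos].isspace():
            self.pos += 1
        word = self.text[start:self.pos]
        self.pos += 1  # skip the separator
        return word


# Two consumers of one Source: the second starts where the first stopped.
shared = Source("alpha beta")
first_seen = read_word(shared)
second_seen = read_word(shared)

# Two readers over the same str: each starts at the beginning.
text = "alpha beta"
reader_a = WordReader(text)
reader_b = WordReader(text)
word_a = reader_a.read_word()
word_b = reader_b.read_word()
```

With the shared Source the second consumer gets the second word, whether
it wanted that or not; with separate readers both get the first word.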


> It 
> makes for clean and simple interfaces everywhere. Also (one of the 
> consequences) I can directly provide match funcs to the user, instead of 
> having to wrap them inside a func which only utility is to hide the 
> additional index (in both input & output).

I don't quite understand what you mean here. 



-- 
Steven

