[Tutor] subtyping builtin type
Steven D'Aprano
steve at pearwood.info
Thu Jan 2 03:21:25 CET 2014
On Wed, Jan 01, 2014 at 02:49:17PM +0100, spir wrote:
> On 01/01/2014 01:26 AM, Steven D'Aprano wrote:
> >On Tue, Dec 31, 2013 at 03:35:55PM +0100, spir wrote:
[...]
> I take the opportunity to add a few features, but would do
> without Source altogether if it were not for 'i'.
> The reason is: it is for parsing library, or hand-made parsers. Every
> matching func, representing a pattern (or "rule"), advances in source
> whenever match is ok, right? Thus in addition to return the form (of what
> was matched), they must return the new match index:
> return (form, i)
The usual way to do this is to make the matching index an attribute of
the parser, not the text being parsed. In OOP code, you make the parser
an object:
    class Parser:
        def __init__(self, source):
            self.current_position = 0  # Always start at the beginning
            self.source = source

        def parse(self):
            ...

    parser = Parser("some text to be parsed")
    for token in parser.parse():
        handle(token)
The index is not an attribute of the source text, because the source
text doesn't care about the index. Only the parser cares about the
index, so it should be the responsibility of the parser to manage.
> Symmetrically, every match func using another (meaning nearly all) receive
> this pair. (Less annoyingly, every match func also takes i as input, in
> addition to the src str.) (There are also a handful of other annoying
> points, consequences of those ones.)
The match functions are a property of the parser, not the source text.
So they should be methods on a Parser object. Since they need to track
the index (or indexes), the index ought to be an attribute on the
parser, not the source text.
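As a minimal sketch of that idea (the method names match_literal and
match_digits, the attribute name position, and the sample input are
hypothetical, not taken from your code), match functions written as
methods can read and advance the parser's own position, so neither
their arguments nor their return values need to carry an index:

```python
class Parser:
    """Illustrative parser: the position lives on the parser, not the text."""

    def __init__(self, source):
        self.source = source
        self.position = 0  # always start at the beginning

    def match_literal(self, literal):
        # Hypothetical match method: succeed if `literal` occurs at the
        # current position, and advance past it.
        if self.source.startswith(literal, self.position):
            self.position += len(literal)
            return literal
        return None  # no match; the position is left unchanged

    def match_digits(self):
        # Hypothetical match method: consume a run of ASCII digits.
        start = self.position
        while (self.position < len(self.source)
               and self.source[self.position] in "0123456789"):
            self.position += 1
        return self.source[start:self.position] or None


parser = Parser("x=42")
name = parser.match_literal("x")
equals = parser.match_literal("=")
number = parser.match_digits()
```

Each method returns only the form it matched; the new index is
recorded on the parser as a side effect.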
> If I have a string that stores its index, all of this mess is gone.
What you are describing is covered by Martin Fowler's book
"Refactoring". He describes the problem:
    A field is, or will be, used by another class more than the
    class on which it is defined.
and the solution is to move the field from that class to the class where
it is actually used.
("Refactoring - Ruby Edition", by Jay Fields, Shane Harvie and Martin
Fowler.)
Having a class (in your case, Source) carry around state which is only
used by *other functions* is a code-smell. That means that Source is
responsible for things it has no need of. That's poor design.
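Here is a rough before-and-after sketch of that refactoring. The
function match_word_before and the method match_word are invented for
illustration, and Source here is only my guess at the shape of your
class:

```python
# Before: Source carries an index that only the parsing code ever touches.
class Source:
    def __init__(self, text):
        self.text = text
        self.i = 0  # never used by Source itself


def match_word_before(source):
    # Hypothetical free function: reads and updates a field it doesn't own.
    start = source.i
    while source.i < len(source.text) and source.text[source.i].isalpha():
        source.i += 1
    return source.text[start:source.i]


# After moving the field: the index lives in the class that uses it.
class Parser:
    def __init__(self, text):
        self.text = text  # a plain str is enough now
        self.i = 0

    def match_word(self):
        start = self.i
        while self.i < len(self.text) and self.text[self.i].isalpha():
            self.i += 1
        return self.text[start:self.i]


before = match_word_before(Source("hello world"))
after = Parser("hello world").match_word()
```

Both versions match the same word, but in the second one there is no
need for a special string type at all.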
By making the parser a class, instead of a bunch of functions, they can
share state -- the *parser state*. That state includes:
- the text being parsed;
- the tokens that can be found; and
- the position in the text.
The caller can create as many parsers as they need:
    parse_this = Parser("some text")
    parse_that = Parser("different text")
without them interfering, and then run the parsers independently of each
other. The implementer, that is you, can change the algorithm used by
the Parser without the caller needing to know. With your current design,
you start with this:
    # caller is responsible for tracking the index
    source = Source("some text")
    assert source.i == 0
    parse(source)
What happens if next month you decide to change the parsing algorithm?
Now it needs not one index, but two. You change the parse() function,
but the caller's code breaks because Source only has one index. You
can't change Source, because other parts of the code are relying on
Source having exactly a single index. So you have to introduce *two* new
pieces of code, and the caller has to make two changes:
    source = SourceWithTwoIndexes("some text")
    assert source.i == 0 and source.j == -1
    improved_parse(source)
Instead, if the parser is responsible for tracking its own data (the
index, or indexes), then the caller doesn't need to care if the parsing
algorithm changes. The internal details of the parser are irrelevant to
the caller. This is a good thing!
    parser = Parser("some text")
    parser.parse()
With this design, if you change the internal details of the parser, the
caller doesn't need to change a thing. They get the improved parser for
free.
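To illustrate, here is a sketch of two interchangeable implementations.
Splitting on whitespace is just a stand-in for a real grammar, and the
names ImprovedParser, _i, _start, _end and run_caller are all made up
for this example. In practice the second version would simply replace
the first under the same class name; they have different names here
only so that both fit in one runnable snippet:

```python
class Parser:
    """Version 1: tracks a single private index."""

    def __init__(self, source):
        self.source = source
        self._i = 0

    def parse(self):
        tokens = []
        word = ""
        while self._i < len(self.source):
            char = self.source[self._i]
            self._i += 1
            if char.isspace():
                if word:
                    tokens.append(word)
                word = ""
            else:
                word += char
        if word:
            tokens.append(word)
        return tokens


class ImprovedParser:
    """Version 2: tracks two private indexes (token start and end)."""

    def __init__(self, source):
        self.source = source
        self._start = 0
        self._end = 0

    def parse(self):
        tokens = []
        n = len(self.source)
        while self._end < n:
            # Skip whitespace, then mark where the next token starts.
            while self._end < n and self.source[self._end].isspace():
                self._end += 1
            self._start = self._end
            while self._end < n and not self.source[self._end].isspace():
                self._end += 1
            if self._end > self._start:
                tokens.append(self.source[self._start:self._end])
        return tokens


def run_caller(parser_class):
    # The caller's code: it never mentions an index.
    parser = parser_class("some text")
    return parser.parse()


old_tokens = run_caller(Parser)
new_tokens = run_caller(ImprovedParser)
```

The caller's two lines are identical in both cases, even though the
number of indexes changed.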
Since the parser tracks both the source text and the index, it doesn't
need to worry that the Source object might change the index.
With your design, the index is part of the source text. That means that
the source text is free to change the index at any time. But it can't do
that, since there might be a parser in the middle of processing it. So
the Source class has to carry around data that it isn't free to use.
This is the opposite of encapsulation. It means that the Source object
and the parsing code are tightly coupled. The Source object has no way
of knowing whether it is being parsed or not, but has to carry around
this dead weight, an unused (unused by Source) field, and avoid using it
for any reason, *just in case* it is being used by a parser. This is the
very opposite of how OOP is supposed to work.
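A small sketch of the coupling problem, assuming a Source class shaped
roughly like yours (the function read_char and the Parser method of the
same name are hypothetical). Two pieces of code that share one Source
also share its index, whether they mean to or not, while two parsers
over the same plain string cannot disturb each other:

```python
class Source:
    # Hypothetical stand-in for the design under discussion.
    def __init__(self, text):
        self.text = text
        self.i = 0


def read_char(source):
    # Free function that advances the index stored on the source.
    char = source.text[source.i]
    source.i += 1
    return char


class Parser:
    def __init__(self, text):
        self.text = text
        self.position = 0  # private to this parser

    def read_char(self):
        char = self.text[self.position]
        self.position += 1
        return char


# Shared Source: the second reader silently starts where the first stopped.
shared = Source("abc")
first_reader = read_char(shared)
second_reader = read_char(shared)

# Separate parsers over the same str: each starts at the beginning.
text = "abc"
p1, p2 = Parser(text), Parser(text)
first_parser = p1.read_char()
second_parser = p2.read_char()
```

With the shared Source the two reads see different characters; with
separate parsers both see the first character of the text.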
> It
> makes for clean and simple interfaces everywhere. Also (one of the
> consequences) I can directly provide match funcs to the user, instead of
> having to wrap them inside a func whose only utility is to hide the
> additional index (in both input & output).
I don't quite understand what you mean here.
--
Steven