[Tutor] [Re: class/type methods/functions]

Sat Nov 1 17:21:58 CET 2008

Thank you for this relevant & precise review, Albert. I will answer 
specific points, then give an overall introduction to the problem(s) 
addressed by this project, which may clarify some topics.

<this is quite a long post>

A.T.Hofkamp wrote:

 > However, by moving the 'type' information to a separate object, your 
classes become less cluttered, and the problem (and thus the solution) 
becomes easier to understand/program.
 > You have a Factory class + factory object (usually 1) that represents 
the real factory, and you have a Product class + many product objects for 
the products it makes.
 >
 > Each product object has properties and functions (ie its color and 
how you can use it).
 > Meta information, like what kind of products exist and how to 
make/transform them however is *not* part of the product object but 
belongs in the factory object.
 > (Look at real life, you can use a car to get from A to B and back, 
but a car cannot tell you what kinds of other cars exist, or how to 
install eg a radio.)
 > In your application, that would mean that the things you call 
'instance' above are part of eg the 'format' object, and things you name 
'type' above, are part of the 'formatfactory' (or 'formattype') object.
 >
 > The difference between your approach and the factory pattern imho is 
that the 'type' information of an object is not stored in its class, but 
in another object (of class Factory).
[...]

I think I get it better now.
FactoryClass ==> factory
                    |
                 "models"
                    |
                    v
               ObjectClass ==> objects

Couldn't the factory actually be the same object as ObjectClass?
Isn't this in a way similar to the use of meta-classes (which I have 
never used, or even explored)?
I think I understand the point of this pattern, mainly for 
clarification. In the case of my project, it could be used to implement 
the flexibility of the types/classes, which depend on config parameters 
that may change at runtime. However, this seems to me like overkill: 
over-structuring and over-abstraction. I don't make heavy use of 
type/class attributes, and they rather clearly appear as what they 
are/mean.
(See also below for further clarification.)

 >> class Format(Symbol):
 >>   ''' block formats symbol '''
 >>   # import relevant config data for this symbol type
 >>   from_codes = config.formats.from_codes
 >>   to_codes = config.formats.to_codes
 >>   names = config.formats.names
 >>   # open_format to be checked with closing tag (xhtml only)
 >>   open_format = None
 >>   def __init__(self, source, mark, section_level, posV, posH, close 
= False,
 >>                   list_type = None, list_level=0):
 >>       # posV & posH ~ line & element numbers for error output
 >>       # 'mark' can be a wiki code or an xhtml tag
 >>       self.section_level = section_level        # = current title level
 >>       self.indent = section_level * TAB         # for nicer xhtml output
 >
 > As a side-note:
 > self.indent can be computed from self.section_level. In general, it 
is better in such cases not to store the computable attribute, but 
instead to compute whenever you need it.
 > This strategy prevents that you get inconsistencies in your data (ie 
suppose that self.section_level == 1 and self.indent == TAB + TAB at 
some point in the program, which of both is then the true value?)

Right! I have dropped self.indent. In my case, such an unexpected change 
could not happen, but I agree that this is the better practice.
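
A minimal sketch of the "compute it when needed" advice, assuming a 
stripped-down class that keeps only the section level. The value of TAB 
and the reduced constructor are assumptions made for illustration, not 
the project's actual code:

```python
TAB = "    "  # assumed value; the project's real TAB constant is not shown


class BlockFormat:
    """Hypothetical, stripped-down symbol keeping only the section level."""

    def __init__(self, section_level):
        self.section_level = section_level

    @property
    def indent(self):
        # Computed on demand, so it can never disagree with section_level.
        return self.section_level * TAB


fmt = BlockFormat(section_level=1)
fmt.section_level = 2          # indent follows automatically
print(repr(fmt.indent))
```

Callers still write fmt.indent, as they would with a stored attribute, 
so the rest of the code does not need to change.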

 >
 >>       self.close = close                        # for automatic 
closing (wiki only)
 >>       self.list_level = list_level              # nesting level
 >>       self.list_type = list_type                # bullet or number
 >>       self.error = False                        # flag used for no 
output
 >
 > As a side-note:
 > you may want to consider using doc strings for documenting your class 
and instance variables, see eg pydoc or epydoc.

Right again! I will move argument and attribute description to doc strings.

 > A variable like "section_level" leads me to believe that Format class 
is a part of a page.

This is true. All (symbol) types, of which Format is an example, represent 
document elements/language features. There is a type for plain text 
content, one for (block) format, one for additional 'aspect' (e.g. 
strong or a custom span class), one for links, etc... See below.
Section_level is used for writing to wiki, because wiki codes for 
headers are usually of the form "===", where the number of chars gives 
the section level. Additionally, it lets me write xhtml docs in a nicer 
form, where indentation represents the logical structure, similar to 
python code:
<h1> section 1 </h1>
some text
    <h2> section 1.1 </h2>
    some text
    <h2> section 1.2</h2>
    some text
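
As a rough sketch of how a section level could drive both outputs: the 
function names and the TAB value are hypothetical, and the xhtml 
indentation of (level - 1) tabs is an assumption chosen to match the 
sample above.

```python
TAB = "    "  # assumed indent unit


def wiki_title(text, level):
    """Write a title with one '=' per section level (closing marks omitted)."""
    return "=" * level + " " + text


def xhtml_title(text, level):
    """Write an <hN> title, indented one TAB per level below the first."""
    indent = TAB * (level - 1)
    return "%s<h%d> %s </h%d>" % (indent, level, text, level)


print(wiki_title("section 1.1", 2))
print(xhtml_title("section 1.1", 2))
```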

By the way, this answers your question about the meaning of "Format": 
after your remark, I renamed it back to BlockFormat. A BlockFormat instance 
represents a block formatting mark. For instance:
table_cell  : xhtml <td> <--> wiki |
list_item   : xhtml <li> <--> wiki * or #
paragraph   : xhtml <p>  <--> wiki (implicit)
I first considered making a symbol type for each kind of block format. 
But I finally merged them all into BlockFormat, because the only major 
difference is their class (=CSS class), that becomes an instance attribute.

 >
 >>       # read & record symbol data
 >>       if source == WIKI:
 >>           self.from_wiki(mark, posV, posH)
 >>           return
 >>       if source == XHTML:
 >>           self.from_xhtml(mark, posV, posH)
 >
 > This looks like a (factory!) class hierarchy you currently don't 
have. In OOP I would expect something like
 >
 > self.load(...)    # 'self.from()' is not possible, since 'from' is a 
reserved word in Python
 >
 > and the underlying object type of 'self' would decide the conversion 
you actually perform.

In a sense, yes. I first considered creating a version of each type for 
each language, e.g. WikiBlockFormat & HTMLBlockFormat. Then I merged 
them because they actually represent the same thing: a common language 
feature. The type holds r/w methods for both languages. I actually find 
this clearer and more consistent than having specific types for each 
language, all having the same meaning ("block format") & holding the same 
data (class, open/close status, additional sub-kind & nesting level for 
list items). Right? Or have I misunderstood?
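
To make the "one type, r/w methods for each language" idea concrete, 
here is a reduced sketch. The two code tables, the classmethod 
constructors and the two-argument signature are assumptions for 
illustration; the real class reads its codes from the config and takes 
more parameters.

```python
# Hypothetical code tables; the real ones come from the project's config.
WIKI_CODES = {"table_cell": "|", "list_item": "*"}
XHTML_TAGS = {"table_cell": "td", "list_item": "li"}


class BlockFormat:
    """One language-neutral symbol with read/write methods per language."""

    def __init__(self, kind, close=False):
        self.kind = kind        # e.g. "table_cell"
        self.close = close      # open/close status

    @classmethod
    def from_wiki(cls, mark):
        # Reverse lookup: wiki code -> kind. Wiki block codes are not closed.
        for kind, code in WIKI_CODES.items():
            if code == mark:
                return cls(kind)
        raise ValueError("unknown wiki code: %r" % mark)

    @classmethod
    def from_xhtml(cls, tag):
        # tag looks like "<td>" or "</td>"
        close = tag.startswith("</")
        name = tag.strip("</>")
        for kind, xname in XHTML_TAGS.items():
            if xname == name:
                return cls(kind, close)
        raise ValueError("unknown xhtml tag: %r" % tag)

    def to_wiki(self):
        # A closing block mark has no wiki counterpart.
        return "" if self.close else WIKI_CODES[self.kind]

    def to_xhtml(self):
        return "<%s%s>" % ("/" if self.close else "", XHTML_TAGS[self.kind])


print(BlockFormat.from_wiki("|").to_xhtml())
```

Whichever language the symbol was read from, the instance holds the same 
data, which is the argument for not splitting it into per-language types.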

 > In the factory pattern, you'd have a formattype object (of class 
FormatType) that holds the global (config?) information, and each Format 
object may hold a reference to formattype (if you want).

Yes. The reference would mainly be used to access config data, in order 
to check the source text validity and to write (back) to a specific 
language.

 >> Now, imagine the source is a wiki text, and the user wishes to 
output into a different wiki language. At some point between reading & 
writing, the config will be rebuilt and, as a consequence, so will the 
data held by each Symbol (sub-)type. As this is an action of the type, I 
wish it to be performed by the type. So:
 >
 > That's one option. Another option may be to make the page object 
language-agnostic (ie you have a 'list', a 'picture' and many more page 
elements, but not attached to a specific language.)

Exactly. You reach here the core of the model. The result of parsing is 
fully "language agnostic". An object I called tortue (turtle) ;-) parses 
the source document; it's a kind of state machine, as it needs to 'know' 
its position (e.g. start of a block) and 'remember' things like open 
tags. The result is simply a list of symbols which builds an abstract 
representation of the parsed text. Right? These symbols are independent 
of the input language -- actually that is *the* point -- and are able to 
further 'express' themselves in any known language.
Presently the turtle can only parse wiki and a third format (table, see 
below), not xhtml; but xhtml is simpler to parse (more explicit & 
regular) because it is less human-oriented.
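
A very reduced sketch of the "remembering" part of such a state machine, 
assuming symbols are plain (type, value, status) tuples and that only 
one aspect code, '**', exists. The function name and the tuple layout 
are illustrative assumptions, not the turtle's real interface:

```python
def symbolise(line):
    """Split one line of wiki text into (type, value, status) symbols.

    Only the '**' aspect code is recognised; everything else is plain text.
    """
    symbols = []
    strong_open = False            # the state the turtle must remember
    for i, chunk in enumerate(line.split("**")):
        if i > 0:                  # a '**' code sat just before this chunk
            strong_open = not strong_open
            status = "open" if strong_open else "close"
            symbols.append(("SegmentAspect", "important", status))
        if chunk:
            symbols.append(("Text", chunk, None))
    return symbols


for symbol in symbolise("some **bold** text"):
    print(symbol)
```

The single boolean is the smallest possible "memory"; tracking block 
starts or nested lists would need more state than this sketch holds.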

[Side note: Each symbol type is implemented as a sub-type of a 
super-type called Symbol. Symbol presently is of nearly no use, but who 
knows? Now, thanks to your explanations, I tend to see it as a symbol 
type factory! It could even hold the whole configuration, instead of 
each symbol holding its relevant part.]

 > Also, you have a number of language objects that know how to convert 
the page to their language.
 > A few class definitions to make it a bit clearer (hopefully):
 >
 > class Language(object):
 >     """
 >     Base class containing symbols/tags of any language used in your 
application. It also states what operations you can do, and has common code.
 >     """
 >     def __init__(self):
 >         # Setup common attributes (valid for all languages)
 >
 >     def load_page(self, ....):
 >         """ Load a page """
 >         raise NotImplementedError("Implement me in a derived class")
 >
 > Language class is not really needed, but very nice to make clear what 
operations you can do with it.
 >
 > class WikiLanguage(Language):
 >     """
 >     Properties and symbols/tags that exist in the wiki language
 >     """
 >     def __init__(self):
 >         Language.__init__(self)
 >         self.list_tag = '*'
 >
 >     def load_page(self, ....):
 >         # load page in wiki format, and return a Page object
 >
 > class XHtmlLanguage(Language):
 >     """
 >     Properties and symbols/tags that exist in the xhtml language
 >     """
 >     def __init__(self):
 >         Language.__init__(self)
 >         self.list_tag = 'li'
 >
 >     def load_page(self, ....):
 >         # load page in xhtml format, and return a Page object
 >
 > For each language class that you have, you make a object (probably 1 
for each language that you have). It describes what the language 
contains and how it performs its functions.
 > A Language object also knows how to load/save a page in its language, 
how to render it, etc.

Well, I understand your point of view. However, this is where I rather 
disagree. Additional information is probably needed for a constructive 
exchange, now. You will find some below.

 > class Page(object):
 >     """
 >     A page of text
 >     """
 >     def __init__(self, lang):
 >         self.symbols = [] # Contents of the page, list or tree of Elements

Exactly. Actually, you could replace 'Page' with Turtle (I see it as a 
dynamic thing) and add these methods:
    def symbolise_wiki(self, wiki_doc):
        <parse & save into self.symbols>
    def symbolise_xhtml(self, html_doc):
        <to be done>

 > class Element(object):
 >     """
 >     A concept that exists at a page (list, text, paragraph, picture, etc)
 >     """

= Symbol -- except that symbol is not language specific

 > class ListElement(Element):
 >     ....
 >
 > class TextElement(Element):
 >     .....

= Symbol sub-types -- ditto

 > A page is simply a tree or a list of Elements. Since a page here is 
language-agnostic, it doesn't even need to know its language.
 > (don't know whether this would work for you).

Perfectly well. I first started building a tree-like model of a page, then 
switched to a simple list (which better matches both wiki and xhtml 
expression -- actually a series of tokens, with the block as the highest 
level of structure -- the rest being implicit). Maybe I will go back to a 
tree model later, just as an additional tool. It may also be useful for 
semantic parsing?

 > Hope it makes some sense,

I'm rather impressed by how clearly you're able to dive into a (for me 
rather complex) problem, without even knowing what kind of need it is 
supposed to meet. So, if you would like to know a bit more, I will start 
from the beginning.

You may have a look at www.creole.org. Creole is an attempt to create an 
interwiki standard language, in order to allow users of several wiki 
engines (& languages) to contribute on 'foreign' wiki sites that use 
another format.
I have had for a long time the idea that it is indeed possible to allow 
(programming) language customization. Implemented through an editor 
configuration layer, this allows both respect for a common standard for 
code sharing, and comfortable personal use (Gemütlichkeit). Right? This 
is similar to syntax highlighting or indent preferences. The /saved/ 
form is not touched (nor, obviously, are the semantics).
[Note that 95% of programmers seem not to reach this point, as they 
argue that this would launch millions of weird versions of their beloved 
language into the wild.]
[Note: I plan to extend a python editor to allow this. I long for the 
day when I can get rid of the ':' at the end of headers, use ':' for 
assignment, and finally use '=' for "equals" -- among lots of other (so 
important for me) details.]
This applies to wikis too, of course. What I propose: a subset of xhtml, 
matching common wiki language features, can be used as the 
saving/standard/exchange format. The present project, an amateur work 
basically done for pleasure, was at first an attempt to build a 
demonstration of how this may actually work. Now it has become a bit more.

A list of requirements:
* use of several wiki language configurations
* further customization (i.e. definition of differences only)
* inner abstract representation (currently being refactored)
* r/w to/from the presently configured wiki language
* use of an xhtml subset as saving format, thus
* r/w to/from xhtml (write ok, read planned)
* r/w to/from table format (ok)
The last format is used for debugging; but it's also a kind of template or 
bridge for DB r/w.
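
The "definition of differences only" requirement can be sketched as a 
base configuration plus a small override. The dictionary layout, the 
names CREOLE and customise, and the chosen codes are assumptions for 
illustration, not the project's actual config format:

```python
# Hypothetical base configuration for one wiki language.
CREOLE = {"strong": "**", "emphasis": "//", "list_item": "*", "table_cell": "|"}


def customise(base, differences):
    """Return a new config: the base codes, overridden by the user's differences."""
    config = dict(base)          # copy, so the base language stays untouched
    config.update(differences)
    return config


# The user only states what differs from the base language.
my_wiki = customise(CREOLE, {"emphasis": "''"})
print(my_wiki["emphasis"], my_wiki["strong"])
```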

I had never thought of implementing whole language specifications as you 
propose above. This is actually an option. But first, here is how it 
works up to now:
Each symbol type started as a kind of description of a common wiki 
language feature, for instance a token that introduces a section title:

type "title", level 3: creole ===  <--> xhtml <h3>...</h3>
According to this pattern, both of the following source text snippets
|I'm a **pride** cell
<td>I'm a <strong>pride</strong> cell</td>
will be represented internally as the same list of symbols, whose output in 
table format is:
BlockFormat table_cell  open
Text    I'm a
SegmentAspect   important   open
Text    pride
SegmentAspect   important   close
Text     cell
BlockFormat table_cell  close
Conversely, this symbol list can be output in either form.
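
A small sketch of that last step, writing the same symbol list out in 
either form. The tuple representation and the two code tables are 
hypothetical stand-ins for the real symbol objects and config:

```python
# The symbol list from the table above, as (type, value, status) tuples.
SYMBOLS = [
    ("BlockFormat", "table_cell", "open"),
    ("Text", "I'm a ", None),
    ("SegmentAspect", "important", "open"),
    ("Text", "pride", None),
    ("SegmentAspect", "important", "close"),
    ("Text", " cell", None),
    ("BlockFormat", "table_cell", "close"),
]

# Hypothetical code tables: kind -> (opening mark, closing mark).
WIKI = {"table_cell": ("|", ""), "important": ("**", "**")}
XHTML = {"table_cell": ("<td>", "</td>"), "important": ("<strong>", "</strong>")}


def write(symbols, codes):
    """Express a symbol list in the language described by `codes`."""
    out = []
    for type_, value, status in symbols:
        if type_ == "Text":
            out.append(value)
        else:
            opening, closing = codes[value]
            out.append(opening if status == "open" else closing)
    return "".join(out)


print(write(SYMBOLS, WIKI))
print(write(SYMBOLS, XHTML))
```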

Now, as you clearly explain above, in order to implement language 
"super_objects", I will still need to build classes for each type of 
symbol. For instance a Link type, inside the XHTML_language object. Now, 
I need nearly the same object, with the same semantics and same held 
data inside the wiki_language object, and also inside the table_format 
object. Correct? So why not just add r/w methods for each language 
inside a single, common, symbol type?

Now, the reason the idea of a language object didn't occur to me is 
probably that it is too hard for me! Anyway, I see several 
uses/advantages for it:
* for wiki, hold the current config
* hold language specific r/w rules (e.g. the <...>, presently a tool 
function does it)
* specify syntactic rules
This is the hard part for me! For instance, the wiki config is presently 
almost only about the lexicon (i.e. the choice of codes); only some 
syntactic details can be set -- e.g. whether an aspect code must be 
closed. The real syntax rules are hidden, actually implicit in the r/w 
methods of each language and in the turtle's symbolisation method. 
Examples of such rules:
* wiki: block format codes are single characters, lie at the start of a 
block, and are not closed
* html: a segment aspect takes either a <xxx>...</xxx> or a <span 
class="xxx"> </span> form.
* both list nesting and list mixing are available
Now, if I could specify such rules as a set of parameters, then I 
could write a general parsing (symbolisation) method for the turtle that 
takes this rule set as a parameter. The same goes for each symbol type's 
read & write methods. Now, this seems much too difficult for my 
programming talent: the way I see it, it tends towards a parser generator.
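
For the writing side at least, a rule set passed as a parameter need not 
be a full parser generator. The sketch below separates the lexicon 
(codes) from a few syntactic templates; every key name, and the 
write_symbol function itself, is a hypothetical illustration rather than 
the project's design:

```python
# Hypothetical rule sets: a lexicon ("codes") plus syntactic templates.
WIKI_RULES = {
    "codes": {"list_item": "*", "important": "**"},
    "block_open": "{code}",
    "block_close": "",            # wiki block codes are not closed
    "aspect_open": "{code}",
    "aspect_close": "{code}",
}
XHTML_RULES = {
    "codes": {"list_item": "li", "important": "strong"},
    "block_open": "<{code}>",
    "block_close": "</{code}>",
    "aspect_open": "<{code}>",
    "aspect_close": "</{code}>",
}


def write_symbol(symbol, rules):
    """General writer: the language lives entirely in `rules`."""
    type_, value, status = symbol
    if type_ == "Text":
        return value
    family = "block" if type_ == "BlockFormat" else "aspect"
    template = rules["%s_%s" % (family, status)]
    return template.format(code=rules["codes"][value])


print(write_symbol(("SegmentAspect", "important", "open"), XHTML_RULES))
```

Reading is the harder direction: a general symbolisation method would 
also need rules about position (start of block) and nesting, which this 
sketch does not attempt.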

 > Albert

Denis