New Python regex Doc (was: Python documentation moronicities)

Xah Lee xah at xahlee.org
Sat May 7 19:28:11 EDT 2005


Let me expose yet another fucking incompetent part of the Python doc, as
an illustration of the info tech industry's masturbatory and ignorant
nature.

The official Python doc on regex syntax (
http://python.org/doc/2.4/lib/re-syntax.html ) says:

--begin quote--

"|"
A|B, where A and B can be arbitrary REs, creates a regular expression
that will match either A or B. An arbitrary number of REs can be
separated by the "|" in this way. This can be used inside groups (see
below) as well. As the target string is scanned, REs separated by "|"
are tried from left to right. When one pattern completely matches, that
branch is accepted. This means that once A matches, B will not be
tested further, even if it would produce a longer overall match. In
other words, the "|" operator is never greedy. To match a literal "|",
use \|, or enclose it inside a character class, as in [|].

--end quote--
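To make the quoted behavior concrete, here is a minimal sketch using Python's standard `re` module; the patterns and test strings are made up for illustration. It shows that alternatives are tried left to right, so the first branch wins even when a later one would match more. The third case adds a caveat the quote glosses over: a branch is only "accepted" as long as the rest of the pattern can still match, and otherwise the engine backtracks into the next alternative.

```python
import re

# Alternatives are tried left to right: "a" is tried before "ab",
# so the shorter branch wins even though "ab" would match more text.
short_first = re.match(r"a|ab", "abc")
# Reversing the order of the alternatives changes the result.
long_first = re.match(r"ab|a", "abc")
print(short_first.group())  # a
print(long_first.group())   # ab

# If the first branch matches but the rest of the pattern then fails,
# the engine backtracks and tries the next alternative.
backtracked = re.match(r"(a|ab)c", "abc")
print(backtracked.group())  # abc

# A literal "|" must be escaped or placed inside a character class.
print(re.findall(r"\|", "x|y|z"))   # ['|', '|']
print(re.findall(r"[|]", "x|y|z"))  # ['|', '|']
```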

Note: “In other words, the "|" operator is never greedy.”

Note the urge to inject the highbrow jargon “greedy” here as a
latched-on sentence.

“never greedy”? What is greedy anyway?

“Greedy”, when used in the context of computing, describes a
certain characteristic of algorithms. An algorithm for a
minimization or maximization problem is greedy when, whenever it faces
a choice, it simply takes the option that looks best at that moment
(the shortest next step, say), without considering whether that choice
actually leads to an optimal solution.

The rub is that such a strategy often fails to obtain the optimal
result. If you go from New York to San Francisco and always choose the
road that most directly faces your destination, you may never get
there.
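The classic textbook illustration of a greedy algorithm missing the optimum is making change with the fewest coins. Below is a minimal sketch; the function names and the coin denominations 1, 3 and 4 are hypothetical, picked only because they expose the flaw. The greedy version always grabs the largest coin that fits, while the dynamic-programming version considers every choice before committing.

```python
def greedy_change(coins, amount):
    """Greedy choice: always take the largest coin that still fits."""
    picked = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            amount -= coin
            picked.append(coin)
    # None means the greedy walk got stuck without reaching zero.
    return picked if amount == 0 else None


def optimal_change(coins, amount):
    """Dynamic programming: fewest coins over all possible choices."""
    # best[t] holds a shortest list of coins summing to t, or None.
    best = [[]] + [None] * amount
    for total in range(1, amount + 1):
        for coin in coins:
            if coin <= total and best[total - coin] is not None:
                candidate = best[total - coin] + [coin]
                if best[total] is None or len(candidate) < len(best[total]):
                    best[total] = candidate
    return best[amount]


coins = [1, 3, 4]  # hypothetical denominations chosen to expose the flaw
print(greedy_change(coins, 6))   # [4, 1, 1]  (three coins)
print(optimal_change(coins, 6))  # [3, 3]     (two coins)
```

Taking the 4 first looks best locally, but it forces two 1s afterwards, which is exactly the "never considering whether the choice is optimal" trait described above.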

For an algorithm to be greedy, it must face choices. In the case of
alternatives in a regex such as "regex1|regex2|regex3", there is really
no selection involved, just a given sequence being followed.

What the writer was thinking when he latched on the remark about
greediness is that the result may not come from the alternative that
matches the longest substring, and therefore the operator is not
“greedy”. It's not greedy, Python docer's ass.

Such blind jargon throwing, found everywhere in tech docs, is a
significant reason why the computing industry is filled with shams the
likes of unix, Perl, Programming Patterns, eXtreme Programming, and
“Unified Modeling Language”, fucking shits.

----
A better-written doc for the complete regex module is at:
http://xahlee.org/perl-python/python_re-write/lib/module-re.html

See also: Responsible Software Licensing
http://xahlee.org/UnixResource_dir/writ/responsible_license.html

 Xah
 xah at xahlee.org
 http://xahlee.org/

