Seeking feedback on parsing method.
Matt Gushee
mgushee at havenrock.com
Wed Jan 5 12:31:43 EST 2000
Greetings, everybody--

I've written a class for handling CSS stylesheets. The whole module is
included at the end of this message, and you're welcome to use it,
hack it, comment on it, whatever. It has a couple of features that I
think are kinda cool (yes, they're documented). But I'd particularly
like to know what you all think of my parsing method.

I wrote the parser from scratch rather than using an external parsing
library because I wanted something that could be easily included in a
package for public distribution and wouldn't require the user to
download extra stuff. However, I don't have a whole lot of experience
with parsers, so there's probably a lot of room for improvement. I'm
also not positive I've accounted for all features of CSS
syntax. Anyway, here's the parsing code:

    ## These regexps are defined in the class namespace

    ## capture a specification group -- e.g., a group of style
    ## properties pertaining to a particular context
    specgroup = re.compile(r"([^{}]+){([^{}]+)}")

    ## weed out supposed property definitions that are empty
    ## or missing the required colon.
    bogus = re.compile(r"^[^:]*$")

    def _parse(self, cssdata):
        while cssdata:
            spec = self.specgroup.match(cssdata)
            if spec is None:
                break
            selectors, stylegroup = spec.groups()
            ## Start each group with an empty dictionary so that
            ## properties do not carry over from the previous group.
            propdict = {}
            ## Parse the styles first. That way, if there are
            ## multiple identifiers, we don't have to redo
            ## the styles for each one.
            for style in string.split(stylegroup, ';'):
                if self.bogus.match(style):
                    ## Skip this one, but keep reading the rest.
                    continue
                ## Split on the first colon only; values may
                ## contain colons of their own.
                style = string.split(style, ':', 1)
                prop = string.strip(style[0])
                val = string.split(style[1], ',')
                if len(val) < 2:
                    propdict[prop] = string.strip(val[0])
                else:
                    propdict[prop] = []
                    for item in val:
                        propdict[prop].append(string.strip(item))
            for sel in string.split(selectors, ','):
                sel = string.strip(sel)
                ## Hierarchical selector, e.g. .Body P
                hiersel = string.split(sel)
                if len(hiersel) > 1:
                    sel = tuple(hiersel)
                if not self.has_key(sel):
                    self[sel] = {}
                for prop, val in propdict.items():
                    if not self[sel].has_key(prop):
                        self[sel][prop] = val
            cssdata = cssdata[spec.end():]

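
For comparison, here is a minimal sketch of the same regex-driven approach in present-day Python 3. It is not the module's code: the names RULE and parse_css are made up for this illustration, it returns a plain dictionary instead of filling a CSS instance, and it uses finditer, so it skips over text that doesn't look like a rule instead of stopping at it. It assumes flat CSS with no comments, no @-rules, and no nested braces, and like the code above it splits values on every comma.

```python
import re

# One rule: selector text, then a brace-delimited declaration block.
RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def parse_css(text):
    """Return {selector: {property: value-or-list}} for simple, flat CSS."""
    rules = {}
    for match in RULE.finditer(text):
        selectors, body = match.groups()
        props = {}  # fresh for every rule, so nothing leaks between rules
        for declaration in body.split(";"):
            # Split on the first colon only; the value may contain colons.
            name, sep, value = declaration.partition(":")
            if not sep or not name.strip():
                continue  # empty or colon-less declaration: skip it
            items = [item.strip() for item in value.split(",")]
            props[name.strip()] = items[0] if len(items) == 1 else items
        for selector in selectors.split(","):
            parts = selector.split()
            if not parts:
                continue
            # A hierarchical selector such as ".Body P" becomes a tuple key.
            key = parts[0] if len(parts) == 1 else tuple(parts)
            # The first definition of a property wins, as in the posted code.
            target = rules.setdefault(key, {})
            for name, value in props.items():
                target.setdefault(name, value)
    return rules
```

Calling parse_css on the text of a stylesheet gives a dictionary in the same shape as the one described in the class documentation below, with string keys for simple selectors and tuple keys for hierarchical ones.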
And the module:
------------- cut below this line -----------------------------------
#! /usr/bin/env python
"""
css.py
-----------

Provides a CSS class, which reads in a css specification in
file or string form and wraps it in a dictionary-like object.
See the class documentation for more info.

Copyright 2000 by Matthew C. Gushee

The author hereby grants permission to use, redistribute, or
modify this program for any purpose, on condition that the
above copyright notice and this statement be retained without
alteration. This program carries no warranty of any kind.

Please send comments and questions to <mgushee at havenrock.com>.
"""

import string, re, copy, os
from UserDict import UserDict

class CSS(UserDict):
    """
    A dictionary-like wrapper for CSS style specifications.

    Usage:

        mycss = CSS('./mystyle.css')
        mycss = CSS(css_data)

    Instances may be created based on CSS files, or on data
    in CSS or Python dictionary form. See __init__ method
    documentation for details.

    Style data is stored as a dictionary in the form:

        {<selector>: {<prop-name>: [<prop-value>, ...], ...}, ...}

    Each key of the dictionary is a single CSS selector, which may
    be either an identifier in string form (i.e., an element type
    such as 'h1', 'table', etc., a class identifier such as '.Menu',
    or a unique ID such as '#my-dog-photo') or a tuple of such
    identifiers. The value of each item is also a dictionary, its
    keys being things like 'font-family', 'margin-l', and so on;
    and its values being either strings or lists of strings.

    Note that if a stylesheet contains a group of properties with
    multiple selectors, the CSS instance will have one key for each
    of these selectors. E.g.,

        .MainMenu, .WhatsNew, H5 { spam: eggs }

    ... becomes

        {'.MainMenu': {'spam': 'eggs'}, '.WhatsNew': {'spam': 'eggs'},
         'H5': {'spam': 'eggs'}}

    Hierarchical selectors, on the other hand, are parsed into tuples.
    For example:

        .BodyText A { spam: eggs }

    ... becomes

        {('.BodyText', 'A'): {'spam': 'eggs'}}

    In this documentation, I use the term 'context' to refer to a
    single or hierarchical selector *in the form* used by the CSS
    class methods -- a string like 'BODY', or a sequence like
    ('TABLE', 'A'), or ['.Menu', 'UL']. Although the methods are
    currently written to accept context arguments in list form, I
    may at some point decide to restrict them to tuples.

    Public methods:

        getspec   Return style data for a particular context.
        setspec   Define style data for a particular context.
        merge     Assimilate data from another CSS instance.
        remove    Remove specified data from the current instance.

    Additionally, the CSS class overloads the + and - operators to
    provide a convenient means of adding and subtracting data, e.g.:

        this_style = this_style + that_style

    (that_style may be a CSS class instance or a dictionary in the
    same form as CSS.data) If the two objects have any keys in common,
    the values in the right-hand object take precedence. Thus, the
    above operation may overwrite data in this_style. If you want to
    preserve all existing data while adding new data, simply reverse
    the order:

        this_style = that_style + this_style

    The following removes all data contained in that_style from
    this_style:

        this_style = this_style - that_style

    You can even empty all the data by doing this:

        this_style = this_style - this_style

    The current version of this class will probably fail to parse
    most stylesheets containing syntax errors. It makes no attempt
    to validate the stylesheet, however, so any data following CSS
    syntax will work.
    """

    ## capture a specification group -- e.g., a group of style
    ## properties pertaining to a particular context
    specgroup = re.compile(r"([^{}]+){([^{}]+)}")

    ## weed out supposed property definitions that are empty
    ## or missing the required colon.
    bogus = re.compile(r"^[^:]*$")

    def __init__(self, datasrc, defaultcontext=None):
        """
        Arguments:

        datasrc -- The data source for this instance. May be given in
            any of three forms: (1) the name of a CSS file, (2) a string
            containing style data in CSS syntax, or (3) a dictionary in
            the same form as self.data

        defaultcontext -- (OPTIONAL) The default context from which
            others inherit properties. If your stylesheet is for an
            ordinary web page, for example, you might use "BODY" as a
            default context. Must be present in the data source.
        """
        UserDict.__init__(self)
        if type(datasrc) is type({}):
            self.data = datasrc
        elif type(datasrc) is type(''):
            ## Check the type before calling os.path.isfile(),
            ## which only accepts strings.
            if os.path.isfile(datasrc):
                cssdata = self._readin(datasrc)
            else:
                cssdata = datasrc
            self._parse(cssdata)
        else:
            raise RuntimeError, 'Invalid data type: %s' % type(datasrc)
        self.defaultcontext = defaultcontext

    def __add__(self, other):
        if ((type(other) is type(self) and
             other.__class__ is self.__class__) or
            type(other) is type({})):
            return CSS(self.merge(other))
        elif type(other) is type(''):
            ## A string may be a file name or CSS data;
            ## CSS() works out which.
            temp = CSS(other)
            newcss = CSS(self.merge(temp))
            del(temp)
            return newcss
        else:
            raise RuntimeError, 'Invalid data type: %s' % type(other)

    def __radd__(self, other):
        ## Check the type before handing 'other' to CSS(); only
        ## dictionaries and strings are acceptable here.
        if type(other) is type({}) or type(other) is type(''):
            temp = CSS(other)
            newcss = CSS(self.merge(temp, 1))
            del(temp)
            return newcss
        else:
            raise RuntimeError, 'Invalid data type: %s' % type(other)

    def __sub__(self, other):
        if ((type(other) is type(self) and
             other.__class__ is self.__class__) or
            type(other) is type({})):
            return CSS(self.remove(other))
        elif type(other) is type(''):
            temp = CSS(other)
            newcss = CSS(self.remove(temp))
            del(temp)
            return newcss
        else:
            raise RuntimeError, 'Invalid data type: %s' % type(other)

    def __rsub__(self, other):
        if type(other) is type({}) or type(other) is type(''):
            temp = CSS(other)
            newcss = CSS(self.remove(temp, 1))
            del(temp)
            return newcss
        else:
            raise RuntimeError, 'Invalid data type: %s' % type(other)

    def _error(self, strict, errmsg=''):
        if strict:
            raise RuntimeError, errmsg
        else:
            return 0

    def _readin(self, file):
        try:
            f = open(file, 'r')
        except IOError:
            print 'Unable to read file %s' % file
            ## Re-raise: without an open file there is nothing
            ## to read below.
            raise
        css = f.read()
        f.close()
        return css

    def _parse(self, cssdata):
        while cssdata:
            spec = self.specgroup.match(cssdata)
            if spec is None:
                break
            selectors, stylegroup = spec.groups()
            ## Start each group with an empty dictionary so that
            ## properties do not carry over from the previous group.
            propdict = {}
            ## Parse the styles first. That way, if there are
            ## multiple identifiers, we don't have to redo
            ## the styles for each one.
            for style in string.split(stylegroup, ';'):
                if self.bogus.match(style):
                    ## Skip this one, but keep reading the rest.
                    continue
                ## Split on the first colon only; values may
                ## contain colons of their own.
                style = string.split(style, ':', 1)
                prop = string.strip(style[0])
                val = string.split(style[1], ',')
                if len(val) < 2:
                    propdict[prop] = string.strip(val[0])
                else:
                    propdict[prop] = []
                    for item in val:
                        propdict[prop].append(string.strip(item))
            for sel in string.split(selectors, ','):
                sel = string.strip(sel)
                ## Hierarchical selector, e.g. .Body P
                hiersel = string.split(sel)
                if len(hiersel) > 1:
                    sel = tuple(hiersel)
                if not self.has_key(sel):
                    self[sel] = {}
                for prop, val in propdict.items():
                    if not self[sel].has_key(prop):
                        self[sel][prop] = val
            cssdata = cssdata[spec.end():]

    def getspec(self, context=None, strict=0, inherit=1,
                usecurrent=[], set={}):
        """
        Return dictionary of style properties for a given context.

        Arguments (all optional):

        context -- (A dictionary key representing the context where
            this style spec will apply). If omitted, self.defaultcontext
            will be used.

        strict -- (boolean) If true, any properties having values of
            None, and any context names that do not exist in self.data,
            will cause errors. If false, values of None will be returned,
            and non-existent context names will be silently ignored.

        inherit -- (boolean) If true, the returned data will include
            inherited properties. If false, only properties explicitly
            defined for this context will be returned. If true and
            'usecurrent' is empty, all properties applying to this
            context, whether explicitly defined or inherited, will be
            returned.

        usecurrent -- (list of property names) If any names are
            listed, only the listed properties will be returned, using
            inherited values for those not explicitly defined in this
            context.

        set -- (dictionary of property names and values) Sets
            properties to the specified values. This argument overrides
            existing values. If a 'set' argument is supplied, 'inherit'
            is false, and 'usecurrent' is omitted, the method will simply
            return the value of 'set'.
        """
        spec = {}
        base = self.defaultcontext or ''
        ## Fall back to the default context before 'context'
        ## is indexed below.
        if context is None:
            context = base
        if type(context) is type(''):
            context = [context]
        elif type(context) is type(()):
            if self.data.has_key(context):
                context = list(context) + [context]
            else:
                context = list(context)
        if inherit:
            if base and context[0] != base:
                context.insert(0, base)
        else:
            ## Keep a one-item list so the loops below see whole
            ## selectors rather than single characters.
            context = [context[-1]]
        if usecurrent or not inherit:
            inheritall = 0
        else:
            inheritall = 1
        if usecurrent:
            ## Work down from current context to default
            context.reverse()
            found = []
            for prop in usecurrent:
                for layer in context:
                    if self.data.has_key(layer):
                        propdict = self.data[layer]
                        if propdict.has_key(prop):
                            spec[prop] = propdict[prop]
                            found.append(prop)
                            break
                    elif strict:
                        raise RuntimeError, "Invalid selector: '%s'." % layer
            for prop in usecurrent:
                if prop not in found:
                    spec[prop] = None
        if inheritall:
            ## Work up from default context
            for layer in context:
                if self.data.has_key(layer):
                    for prop in self.data[layer].keys():
                        spec[prop] = self.data[layer][prop]
                elif strict:
                    raise RuntimeError, "Invalid selector: '%s'." % layer
        if set:
            for prop in set.keys():
                spec[prop] = set[prop]
        if strict:
            for prop in spec.keys():
                if spec[prop] is None:
                    raise RuntimeError, "Property not found: '%s'." % prop
        return spec

    def setspec(self, context=None, strict=0, inherit=1,
                keeponly=[], set={}):
        """
        Define style properties for a particular context.

        [ DON'T USE THIS YET! Something is screwy in the way this
        method calls self.getspec(). To be fixed soon. ]

        Arguments (all optional):

        context -- (A dictionary key representing the context where
            this style spec will apply). The key may, but need not, be
            present in the instance data. If omitted, self.defaultcontext
            will be used.

        strict -- (boolean) See getspec documentation.

        inherit -- (boolean) If true, any existing properties applying
            to the given context, but not specified in 'set', will be
            retained, including inherited properties. If false, the new
            settings will include only the properties explicitly defined
            for the context.

        keeponly -- (list of property names) If empty, all existing
            properties will be retained. If any property names are listed,
            only the properties listed here or in 'set' will be kept; all
            others will be removed.

        set -- (dictionary of property names and values) Sets
            properties to the specified values. This argument overrides
            inherited values.
        """
        if context is None:
            ## Fall back to the default context; complain only
            ## if there isn't one either.
            context = self.defaultcontext
            if context is None:
                raise RuntimeError, 'No context for style spec.'
        if not self.data.has_key(context):
            self.data[context] = {}
        if keeponly:
            newspec = self.getspec(context, strict,
                                   usecurrent=keeponly, set=set)
        elif inherit:
            newspec = self.getspec(context, strict, set=set)
        elif set:
            newspec = set
        else:
            newspec = self.getspec(context)
        ## Store the result even when 'set' is empty.
        self.data[context] = newspec

    def merge(self, cssobj, selfish=0):
        """
        Assimilate data from another CSS instance.

        Arguments:

        cssobj -- a CSS class instance

        selfish -- (optional, boolean) In case of conflicts, this
            argument specifies whether new or existing data takes
            precedence. If false, new data (specified in 'cssobj')
            will take precedence; if true, all existing data will
            be preserved.
        """
        if selfish:
            ## Dunno why I was getting errors with this ...
            ## result = copy.deepcopy(cssobj)
            result = {}
            for k in cssobj.keys():
                result[k] = copy.copy(cssobj[k])
            newdata = self.data
        else:
            ## result = copy.deepcopy(self.data)
            result = {}
            for k in self.data.keys():
                result[k] = copy.copy(self.data[k])
            newdata = cssobj
        for k in newdata.keys():
            if result.has_key(k):
                for l in newdata[k].keys():
                    result[k][l] = newdata[k][l]
            else:
                result[k] = newdata[k]
        return result

    def remove(self, cssobj, selfish=0):
        """
        Remove specified data from the current instance.

        cssobj -- a CSS class instance

        selfish -- (optional, boolean) If true, the contents
            of self.data will be removed from cssobj.data. If false,
            the contents of cssobj.data will be removed from
            self.data.
        """
        if selfish:
            ## Copy into a plain dictionary, as merge() does.
            ## cssobj may be a CSS instance, and the result is
            ## passed back to CSS(), which expects a dictionary.
            result = {}
            for k in cssobj.keys():
                result[k] = copy.copy(cssobj[k])
            rmdata = self.data
        else:
            result = copy.deepcopy(self.data)
            rmdata = cssobj
        for k in rmdata.keys():
            if result.has_key(k):
                for m in rmdata[k].keys():
                    if result[k].has_key(m):
                        del result[k][m]
                if not result[k]:
                    del result[k]
        return result

if __name__ == '__main__':
    import sys, os
    cssfile = raw_input('What file would you like to parse?\n> ')
    cssfile = string.strip(cssfile)
    if string.find(cssfile, '~/') == 0:
        try:
            cssfile = os.path.join(os.environ['HOME'], cssfile[2:])
        except:
            print """This system appears not to support filenames beginning
with '~'. Please try again using the full path.
"""
            sys.exit()
    try:
        mycss = CSS(cssfile)
        print """'%s' successfully parsed. Data is as follows:
%s
""" % (cssfile, mycss)
    except:
        ## The message has two %s slots, so supply the name twice.
        print """Failed to parse '%s.' Please check for CSS syntax
errors. If your CSS file is correct, please send a
detailed bug report to <mgushee at havenrock.com>, including
a copy of '%s'.
""" % (cssfile, cssfile)
------------- cut above this line -----------------------------------
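
The precedence rules behind + and - (and behind merge and remove) can be illustrated on plain dictionaries. The following is a rough Python 3 sketch under that assumption, not the module's code: the functions are free-standing, take dictionaries in the same form as CSS.data, and the parameter names existing, new and unwanted are invented for this example.

```python
import copy


def merge(existing, new, selfish=False):
    """Combine two {selector: {prop: value}} dicts without mutating either.

    By default values from `new` win on conflict; with selfish=True the
    values from `existing` win instead.
    """
    base, overlay = (new, existing) if selfish else (existing, new)
    result = copy.deepcopy(base)
    for selector, props in overlay.items():
        result.setdefault(selector, {}).update(copy.deepcopy(props))
    return result


def remove(existing, unwanted):
    """Return a copy of `existing` minus every property named in `unwanted`."""
    result = copy.deepcopy(existing)
    for selector, props in unwanted.items():
        if selector not in result:
            continue
        for name in props:
            # A property is dropped whenever its name matches,
            # whatever value it holds on either side.
            result[selector].pop(name, None)
        if not result[selector]:
            del result[selector]  # drop selectors left with no properties
    return result
```

In these terms, this_style + that_style corresponds to merge(this_style, that_style), that_style + this_style keeps the existing values of this_style on conflict, and remove(this_style, this_style) leaves an empty dictionary.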
--
Matt Gushee
Portland, Maine, USA
mgushee at havenrock.com
http://www.havenrock.com/