Design of a URL encoded language to specify sets of files on a WebDAV server

Andrew James drew at gremlinhosting.com
Thu Nov 18 18:10:14 EST 2004


Gentlemen,
I'm currently in the process of designing a language which will be used
to specify sets of files on a WebDAV server, encoded in a URL. The aims
of the language are to (in no particular order):
 * Be concise, aesthetic and easy to type
 * Be as similar as possible to existing query languages
 * Allow for (nested) boolean operations
 * Be cross-platform (i.e. avoid any characters that cannot be used in
filenames on *NIX or Win32)

There is a project Wiki which I will release once I've got it into a
semi-stable state.

This is part of a wider project (in Python, of course) which I'm
developing for my degree but which may have more uses. I'd like to draw
on all the experience here and ask some questions so that I can better
shape the implementation to suit the people who may choose to use it. 

I would very much appreciate it if anyone could spare the time to have a
look through the specification and documentation below and let me know
about the following:

 * Whether you think the language is up to the job (and if not, why not)
 * Any additions which you think should be made that will increase
functionality and/or decrease ambiguity
 * Any common pitfalls which you think I might fall into
 * Any bugs in my specification

A more verbose query language (probably SQLXML) will also be
implemented in the project, but one key feature is that users should be
able to search simply by typing a query into the 'Goto' box.

Well, that's about all. Please see the specification below and I hope to
be hearing your feedback in the near future.

Regards,
Andrew

MetaFS Path Query Language
==========================

Background
One way of specifying search criteria in MetaFS is by using a dynamic
URL, making it trivial to work with MetaFS from the command line. This
document explains the characteristics and formatting of these queries
and offers some examples to be expanded upon.


Diving In
The query language used by MetaFS is closely related to XPath/LaTeX,
with a few restrictions around reserved characters in filenames. The
basic format of a query is:

{BASE}/fs/catA/catB/catC[criteria]

An example of a simple query based on this format would be:

{BASE}/fs/photos/america/beach[type=jpeg,author='Andrew James']

This query is equivalent to searching for files which are in the
photos, america and beach categories (the intersection), have a JPEG
MIME type and have Andrew James as their author.
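
Because the query travels inside a URL, the space in author='Andrew
James' (and possibly other characters) will normally arrive
percent-encoded. Here is a minimal Python sketch of how a server might
separate the category path from the bracketed criteria. It assumes that
the {BASE}/fs/ prefix has already been stripped and that any criteria
form a single trailing [...] block. split_query is a hypothetical
helper, not part of MetaFS.

```python
from urllib.parse import quote, unquote


def split_query(path):
    """Split a path query into (category_expression, criteria_string).

    Hypothetical helper: assumes the '{BASE}/fs/' prefix is already removed
    and that any criteria form a single trailing '[...]' block.
    """
    path = unquote(path)  # undo percent-encoding, e.g. '%20' -> ' '
    if path.endswith("]") and "[" in path:
        start = path.index("[")  # category names cannot contain '['
        return path[:start], path[start + 1:-1]
    return path, ""  # no criteria block


# A client would percent-encode the query, leaving the operator characters intact.
encoded = quote("photos/america/beach[type=jpeg,author='Andrew James']", safe="/[]=,'")
print(encoded)               # photos/america/beach[type=jpeg,author='Andrew%20James']
print(split_query(encoded))  # ('photos/america/beach', "type=jpeg,author='Andrew James'")
```

Splitting at the first '[' is safe here only because the grammar's Ident
token cannot contain '['.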

This type of query should be enough to perform most sorts of simple
searches, but MetaFS also includes some advanced features that can be
accessed with more complicated queries.


Advanced Features
Boolean Logic Engine
MetaFS includes a complete Boolean logic engine which supports both
grouping of terms and the Boolean operators AND, OR and NOT.

Reserved Operator Characters

  /    Boolean AND (categories only)
  ^    Boolean OR (categories only)
  ~    Boolean NOT (categories only)
  ()   Term grouping (categories only)
  -    Less than (criteria only)
  +    Greater than (criteria only)
  =    Logical EQUALS (criteria only)
  !    Logical NOT (criteria only)
  ~    Logical CONTAINS (criteria only)
  ,    Boolean AND (criteria only)

This allows us to create much more complex queries, such as

{BASE}/fs/photos/~(america^france)

or even

{BASE}/fs/music/mp3[artist~'Jackson',artist!~'Michael']
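
The category operators map naturally onto set operations: '/' is
intersection, '^' is union and '~' is complement. The following minimal
Python sketch evaluates the category part of a query. It assumes a
hypothetical in-memory index from category name to a set of file IDs,
and it takes ~X to mean "every indexed file not in X". It also reads the
category precedence list below as highest-first ('~', then '^', then
'/'); that reading is an assumption. CategoryQuery and INDEX are
illustrative names only.

```python
import re

# Hypothetical category index: category name -> set of file IDs.
INDEX = {
    "photos":  {"a.jpg", "b.jpg", "c.jpg", "d.jpg"},
    "america": {"a.jpg", "x.doc"},
    "france":  {"b.jpg"},
}

TOKEN = re.compile(r"[A-Za-z]\w*|[/^~()]")


def tokenize(expr):
    tokens, pos = [], 0
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if m is None:
            raise SyntaxError(f"unexpected {expr[pos]!r} at offset {pos}")
        tokens.append(m.group())
        pos = m.end()
    return tokens


class CategoryQuery:
    """Recursive-descent evaluator: '/' = AND, '^' = OR, '~' = NOT."""

    def __init__(self, expr, index):
        self.tokens, self.pos, self.index = tokenize(expr), 0, index
        self.universe = set().union(*index.values())  # every indexed file

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a term'}, got {tok!r}")
        self.pos += 1
        return tok

    def evaluate(self):
        result = self.and_expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r}")
        return result

    def and_expr(self):  # '/' binds loosest (intersection)
        result = self.or_expr()
        while self.peek() == "/":
            self.take("/")
            result &= self.or_expr()
        return result

    def or_expr(self):  # '^' binds tighter than '/' (union)
        result = self.unary()
        while self.peek() == "^":
            self.take("^")
            result |= self.unary()
        return result

    def unary(self):  # '~' binds tightest (complement)
        if self.peek() == "~":
            self.take("~")
            return self.universe - self.unary()
        return self.atom()

    def atom(self):
        if self.peek() == "(":
            self.take("(")
            result = self.and_expr()
            self.take(")")
            return result
        tok = self.take()
        if not tok[0].isalpha():
            raise SyntaxError(f"expected a category name, got {tok!r}")
        return set(self.index.get(tok, ()))  # copy so the index is never mutated


print(sorted(CategoryQuery("photos/~(america^france)", INDEX).evaluate()))
# ['c.jpg', 'd.jpg']
```

Under the opposite reading of the precedence list ('/' binding tighter
than '^'), and_expr and or_expr would simply swap places.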

Operator Order of Precedence
The operator order of precedence for filesystem queries is as follows:

  Categories:  (), ~, ^, /
  Criteria:    ',', !, (=, -, +, ~)

Namespaces
Metadata criteria are not limited to the default metadata attributes
which MetaFS assigns to files; they can also refer to attributes in
other namespaces. A namespaced attribute is written by prefixing its
name with the namespace and a colon, as in XML. For example, one could
specify criteria like:

[owner=drew,moddate+10-10-2004,ns:bitrate='128']
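
Criteria are simpler to handle than categories because they combine only
with ',' (AND). The following minimal Python sketch checks a bracketed
criteria string against one file's metadata, following the operator
table above. It makes three assumptions:

 * '!' may prefix '=' or '~', as in the !~ of the earlier example
 * a namespaced attribute is stored under its full ns:name key
 * dates need no special handling, because the specification does not
say whether 10-10-2004 is day-first, so bare dates stay plain strings

parse_criteria and matches are hypothetical names.

```python
import re

# One criterion: attribute name (optionally 'ns:'-prefixed), operator, value.
CRITERION = re.compile(
    r"([A-Za-z]\w*(?::[A-Za-z]\w*)?)"  # name, e.g. artist or ns:bitrate
    r"(!?[=~]|[-+])"                   # =, ~, !=, !~, - (less), + (greater)
    r"('[^']*'|[^,']+)"                # quoted string or bare value
    r"(?:,|$)"                         # ',' separates ANDed criteria
)


def coerce(raw):
    if raw.startswith("'"):
        return raw[1:-1]  # quoted values are always strings
    try:
        return float(raw) if "." in raw else int(raw)
    except ValueError:
        return raw  # bare words (drew, jpeg) and dates stay strings


def parse_criteria(text):
    """Turn a string such as a=1,b~'x' into a list of (name, op, value)."""
    criteria, pos = [], 0
    while pos < len(text):
        m = CRITERION.match(text, pos)
        if m is None:
            raise SyntaxError(f"bad criterion at offset {pos}: {text[pos:]!r}")
        name, op, raw = m.groups()
        criteria.append((name, op, coerce(raw)))
        pos = m.end()
    return criteria


def matches(meta, criteria):
    """True if the metadata dict satisfies every criterion (logical AND)."""
    for name, op, value in criteria:
        actual = meta.get(name)
        negate, base = op.startswith("!"), op.lstrip("!")
        if actual is None:
            ok = False  # a missing attribute never matches, so '!' makes it pass
        elif base == "=":
            ok = actual == value
        elif base == "~":
            ok = str(value) in str(actual)
        else:
            try:
                ok = actual < value if base == "-" else actual > value
            except TypeError:  # e.g. comparing a string with a number
                ok = False
        if ok == negate:
            return False
    return True


song = {"artist": "Janet Jackson", "ns:bitrate": "128", "year": 1986}
crits = parse_criteria("artist~'Jackson',artist!~'Michael',ns:bitrate='128',year+1980")
print(matches(song, crits))                           # True
print(matches({"artist": "Michael Jackson"}, crits))  # False
```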


Query Language Grammar
Below is the initial (and no doubt buggy) specification for the MetaFS
query language, written in the BNF dialect used by TPG (Toy Parser
Generator, a parser generator for Python).


Language Specification
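    # Tokens
    separator space '\s+';
    token Date '\d\d-\d\d-\d\d\d\d';  # before Num, so a date is one token
    token Num '\d+(\.\d+)?';          # dot escaped
    token FileID '(\w+\.\w+)';        # before Ident, so 'a.jpg' is one token
    token Ident '[a-zA-Z]\w*';
    token CharList '\'[^\']*\'';      # ends at the next quote, not the last
    token Tilde '~';                  # category NOT and criteria CONTAINS
    token CatOp '[/\^]';
    token Not '!';
    token MetaOp '[=\+\-]';
    token EmptyLine '^$';

    # Rules
    START -> CatExpr ('\[' MetaExpr '\]')?
    | FileID
    | EmptyLine
    ;
    CatExpr -> CatTerm (CatOp CatTerm)*
    ;
    CatTerm -> Tilde? CatName
    ;
    CatName -> Ident
    | '\(' CatExpr '\)'
    ;
    MetaExpr -> MetaCrit (',' MetaCrit)*
    ;
    MetaCrit -> AttrName Not? (MetaOp | Tilde) Value
    ;
    AttrName -> Ident (':' Ident)?    # optional namespace prefix, as in ns:bitrate
    ;
    Value -> CharList | Num | Date | Ident
    ;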
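
To check the token definitions independently of TPG, the lexical layer
can be sketched with Python's re module. In a regex alternation the
alternatives are tried left to right and the first match wins, so the
order of the list matters. The sketch makes these choices:

 * FileID and EmptyLine are omitted
 * ':' is added to the punctuation for namespaced names
 * Date is tried before Num, so that 10-10-2004 stays one token
 * the dot in Num is escaped
 * CharList is written as '[^']*', so that two quoted values in one
criteria list are not swallowed as one

The token names follow the grammar. tokenize is a hypothetical helper.

```python
import re

# Token patterns adapted from the grammar; the first alternative that matches wins.
TOKEN_SPEC = [
    ("Date",     r"\d\d-\d\d-\d\d\d\d"),  # before Num, so a date stays one token
    ("Num",      r"\d+(?:\.\d+)?"),       # escaped dot: '10.5', not '10x5'
    ("Ident",    r"[a-zA-Z]\w*"),
    ("CharList", r"'[^']*'"),             # stops at the next quote
    ("CatUnOp",  r"~"),
    ("CatOp",    r"[/^]"),
    ("MetaOp",   r"[=+\-!]"),
    ("Punct",    r"[\[\](),:]"),          # inline tokens from the rules, plus ':'
    ("space",    r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(query):
    """Return a list of (kind, text) pairs; raise SyntaxError on a stray character."""
    tokens, pos = [], 0
    while pos < len(query):
        m = MASTER.match(query, pos)
        if m is None:
            raise SyntaxError(f"unexpected {query[pos]!r} at offset {pos}")
        if m.lastgroup != "space":  # separators are discarded
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("[moddate+10-10-2004]"))
# [('Punct', '['), ('Ident', 'moddate'), ('MetaOp', '+'), ('Date', '10-10-2004'), ('Punct', ']')]
print(tokenize("[artist!~'Michael']"))
```

Note that '~' comes out as CatUnOp even inside the criteria brackets.
The lexer alone cannot tell category NOT from criteria CONTAINS, so the
parser has to accept the same token in both positions.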

Test Queries
The following test queries have been run through the parser to check
that they are parsed correctly, and the grammar above handles them
without errors. As the queries show, the grammar allows for term
nesting, unary operators and arbitrarily complex Boolean expressions;
in addition, file metadata can be queried for extra filtering.

parseTests = (
    "simple",
    "this/is/a/simple/test",
    "a/test/with/metadata[author='drew',date=10]",
    "music/mp3/~jackson/michael",
    "docs/latex/~(computer^science)",
    "media/video/((comedy/action)^thriller)"
)
-- 
Andrew James <drew at gremlinhosting.com>



