ANN: Shipyard 0.02
Florian Diesch
diesch at spamfence.net
Sun Oct 19 19:07:59 CEST 2008
I'm happy to announce version 0.02 of the Shipyard Python module
<http://www.florian-diesch.de/software/shipyard/>
=================
What is shipyard?
=================
Shipyard is a module to process data in a format inspired by email
headers (RFC 2822).
The goal of shipyard is to have a simple, human-readable and human-writable
replacement for CSV that works better for long data and many rows and
doesn't need difficult escaping rules for special characters.
It's called ``shipyard`` because that word contains ``py`` and doesn't
seem to be taken yet.
===========
File format
===========
Character encoding
==================
A character encoding can be specified, similar to :pep:`0263`, using::

    # -*- coding: <encoding name> -*-

in the first line. ``#`` is replaced with the actual `comment`_ mark.
More precisely, the first line must match the regular
expression::

    ^#.*coding[:=]\s*([-\w.]+)

Again ``#`` is replaced by the actual `comment`_ mark. The first group
of this expression is then interpreted as encoding name.
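
As a rough illustration of the rule above, the check could be sketched like
this. This is not Shipyard's own code; the function name ``detect_encoding``
and its ``comment_mark`` parameter are made up for the example.

```python
import re


def detect_encoding(first_line, comment_mark="#"):
    """Return the encoding named in first_line, or None if there is none.

    Hypothetical helper, not part of Shipyard's API.
    """
    # Same expression as in the text, with "#" replaced by the
    # (escaped) comment mark that is actually in use.
    pattern = r"^%s.*coding[:=]\s*([-\w.]+)" % re.escape(comment_mark)
    match = re.match(pattern, first_line)
    # The first group of the expression is the encoding name.
    return match.group(1) if match else None
```

A first line that does not match simply yields ``None``, so the caller can
fall back to whatever default encoding it prefers.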
Data set
========
A *data set* consists of zero or more `records <#record>`__ separated
by one or more empty lines.
Comment
=======
Lines starting with the *comment mark* (default: ``#``) are
ignored. Comments can be used in or between `records <#record>`__.
Record
======
A *record* consists of one or more `fields <#field>`__.
Field
=====
A *field* is a line that has the form::

    key: value

*key* is a string that

- doesn't contain a colon
- doesn't start with the `comment`_ mark
- doesn't start with the `continuation`_ mark

*value* is an arbitrary string. It can span multiple lines using
`continuation`_ marks.
Continuation
============
If a line starts with the *continuation mark* (default: " " [one blank])
it gets appended to the preceding line, with the
continuation mark removed.
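
To make the rules of this section concrete, here is a minimal sketch of a
reader for the format. It is not Shipyard's parser: ``parse_records`` and its
parameters are hypothetical names, and the handling of whitespace around keys
and values is an assumption that the description above does not settle.

```python
def parse_records(lines, comment_mark="#", continuation_mark=" "):
    """Yield one dict per record from an iterable of text lines.

    Hypothetical sketch of the file format rules, not Shipyard's parser.
    """
    record, last_key = {}, None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(comment_mark):
            continue  # comments are ignored, in or between records
        if line == "":
            # One or more empty lines end the current record, if any.
            if record:
                yield record
            record, last_key = {}, None
        elif line.startswith(continuation_mark) and last_key is not None:
            # Append to the preceding line, continuation mark removed.
            record[last_key] += line[len(continuation_mark):]
        else:
            # A key never contains a colon, so split at the first one.
            key, _, value = line.partition(":")
            # Assumption: surrounding whitespace of the key and leading
            # whitespace of the value are not significant.
            last_key = key.strip()
            record[last_key] = value.lstrip()
    if record:
        yield record


sample = [
    "# -*- coding: utf-8 -*-\n",
    "id: 0\n",
    "name: Martin\n",
    "  Chalfie\n",      # continuation: one blank is the mark, the rest is data
    "\n",
    "\n",
    "# a comment between records\n",
    "id: 1\n",
    "name: Osamu Shimomura\n",
]
records = list(parse_records(sample))
```

Note that only the continuation mark itself is removed, so a space that should
separate two joined lines has to be written after the mark, as in the sample.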
=====
Usage
=====
Obviously we need to import shipyard:
>>> import shipyard
First we open the file:
>>> input = open('nobel.sy')
Then we create a parser object:
>>> reader = shipyard.Parser(keep_linebreaks=False,
...                          keys=['id', 'discipline', 'year',
...                                'name', 'country', 'rationale'])
For every record the given keys are initialized with None.
Now we can iterate through the records:
>>> for record in reader.parse(input): # doctest:+ELLIPSIS
...     print record['country']
United States
Japan
United States
...
Instead of iterating we may want to get a list of dicts:
>>> input.seek(0)
>>> lod = reader.get_list(input)
>>> print lod # doctest:+ELLIPSIS
[{u'discipline': u'Chemistry', u'name': u'Martin Chalfie', ...}, {u'discipline': u'Chemistry', u'name': u'Osamu Shimomura', ...}, ...]
Sometimes we need a dict of dicts (using the 'id' field as key):
>>> input.seek(0)
>>> dod = reader.get_dict(input, key='id')
>>> print dod.keys()
[u'11', u'10', u'1', u'0', u'3', u'2', u'5', u'4', u'7', u'6', u'9', u'8']
>>> print dod[u'5'][u'rationale']
for the discovery of the mechanism of spontaneous brokensymmetry in subatomic physics
If we don't want dicts we can use the 'factory' parameter:
>>> input.seek(0)
>>> los = reader.get_list(input, factory = lambda **keys: ', '.join(keys.values()))
>>> print los[0]
Chemistry, Martin Chalfie, United States, for the discovery and development of the green fluorescentprotein, GFP, 2008, 0
Of course a class works as a factory, too:
>>> input.seek(0)
>>> class Laureate(object):
...     def __init__(self, id, discipline, year, name, country, rationale):
...         self.name = name
>>> doo = reader.get_dict(input, key='id', factory = Laureate)
>>> print doo[u'2'] # doctest:+ELLIPSIS
<Laureate object at ...>
>>> print doo[u'2'].name
Roger Y. Tsien
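
The two factory examples above suggest that the factory is called with the
record's fields as keyword arguments, and that the keys given to the parser
are pre-filled with None. Under those assumptions, the mechanism could look
roughly like the following sketch. ``make_object`` is a hypothetical helper
written for this illustration, not a function of the shipyard module.

```python
def make_object(record, keys, factory=dict):
    """Build one result object from a parsed record.

    Hypothetical helper, not part of Shipyard's API.
    """
    # Every expected key starts out as None ...
    fields = dict.fromkeys(keys)
    # ... and is overwritten by whatever the record actually contains.
    fields.update(record)
    # Assumption: the factory receives the fields as keyword arguments.
    return factory(**fields)


class Laureate(object):
    def __init__(self, id, discipline, year, name, country, rationale):
        self.name = name
        self.rationale = rationale


KEYS = ['id', 'discipline', 'year', 'name', 'country', 'rationale']
# A record with missing fields still satisfies the constructor,
# because the missing keys are passed as None.
laureate = make_object({'id': '2', 'name': 'Roger Y. Tsien'}, KEYS, Laureate)
```

In this sketch a field that is not listed in ``keys`` would be passed on as
well, so a class with a fixed argument list would reject it with a TypeError.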
Now let's write a Shipyard file.
First we create a StringIO (any other file-like object will do, too):
>>> import StringIO
>>> output = StringIO.StringIO()
Next we need a Writer object:
>>> writer = shipyard.Writer(keys=('foo', 'bar'), coding='utf-8')
Now we can use write() to write a single record:
>>> writer.write(output, {'foo': 1, 'bar': 2})
>>> print output.getvalue()
foo: 1
bar: 2
<BLANKLINE>
<BLANKLINE>
Using write_many() we can write a list of records:
>>> output = StringIO.StringIO()
>>> d = [dict((('foo', i), ('bar', 2*i))) for i in range(3)]
>>> writer.write_many(output, d)
>>> print output.getvalue()
foo: 0
bar: 0
<BLANKLINE>
foo: 1
bar: 2
<BLANKLINE>
foo: 2
bar: 4
<BLANKLINE>
<BLANKLINE>
To get an encoding line we use write_coding():
>>> output = StringIO.StringIO()
>>> writer.write_coding(output)
>>> print output.getvalue()
#-*- coding: utf-8 -*-
<BLANKLINE>
<BLANKLINE>
Now let's do everything at once using write_full():
>>> output = StringIO.StringIO()
>>> writer.write_full(output, d)
>>> print output.getvalue()
#-*- coding: utf-8 -*-
<BLANKLINE>
foo: 0
bar: 0
<BLANKLINE>
foo: 1
bar: 2
<BLANKLINE>
foo: 2
bar: 4
<BLANKLINE>
<BLANKLINE>
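
For readers who want to see what the output above amounts to, here is a small
stand-alone sketch that produces the same text for simple values. It is not
how the Writer class is implemented; ``write_records`` and its parameters are
hypothetical, and values spanning several lines are not handled.

```python
import io


def write_records(out, records, keys, coding=None, comment_mark="#"):
    """Write records to the file-like object out, one field per line.

    Hypothetical sketch, not Shipyard's Writer.
    """
    if coding is not None:
        # Encoding line, followed by one empty line.
        out.write("%s-*- coding: %s -*-\n\n" % (comment_mark, coding))
    for record in records:
        # Fields are written in the order given by keys.
        for key in keys:
            out.write("%s: %s\n" % (key, record[key]))
        # An empty line separates this record from the next one.
        out.write("\n")


buf = io.StringIO()
data = [{"foo": i, "bar": 2 * i} for i in range(3)]
write_records(buf, data, keys=("foo", "bar"), coding="utf-8")
text = buf.getvalue()
```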
Florian
--
<http://www.florian-diesch.de/>
-----------------------------------------------------------------------
** Hi! I'm a signature virus! Copy me into your signature, please! **
-----------------------------------------------------------------------