[omaha] GAE support for web2py - Some corrections

Tim Alexander dragonfyre13 at gmail.com
Thu Feb 23 21:26:42 CET 2012


Wanted to post to the list, since I was inaccurate on a few things in
talking about GAE support in web2py. Some of the stuff I was thinking was
surrounding the googleSQL engine, and some of it was workarounds that I
thought were transparent (below). Here's a synopsis of
restrictions/workarounds/alternatives on GAE. Note, everything else in
web2py works as expected, other than the restrictions, most have relatively
straightforward workarounds (posted to the web2py mailing list for most of
them, some in the book itself, but most DB related in the
web2pyslices.comlink directly below)

First, the good stuff: In 1.69.1 (in 2009) "workarounds" were added to
web2py so that what can't be done in BigTable itself is done at the web2py
layer level. For example, joins, OR, orderby (including complex orderby),
LIKE, datetime manipulations, etc. Here's a great
page<http://www.web2pyslices.com/slices/take_slice/16>on those, and
they work quite well in my limited experience and from what I
hear. I believe it's not in the book because you have to be careful of
using these methods since they can potentially be quite inefficient
(performing evaluations in python of large data sets, rather than a
database).

Straight from the book, however:

*GAE also supports a Google SQL database (compatible with MySQL) and a
Google NoSQL (referred to as "Datastore").**
**web2py supports both. If you wish to use Google SQL database follow the
instructions on Chapter 6. This section assues you will be using the Google
Datastore.**
**The GAE platform provides several advantages over normal hosting
solutions:*


   - *Ease of deployment. Google completely abstracts the underlying
      architecture.*
      - *Scalability. Google will replicate your app as many times as it
      takes to serve all concurrent requests.*
      - *One can choose between a SQL and a NoSQL database (or both
      together).*

*But also some disadvantages:*


   - *No read or write access to the file system.*
      - *No HTTPS unless you use the appspot.com domain with a Google
      certificate.*

*and some Datastore specific disadvantages:*


   - *No typical transactions.*
      - *No complex datastore queries. In particular there are no JOIN,
      LIKE, and DATE/DATETIME operators.*
      - *No multiple OR sub-queries unless they involve one and the same
      field.*
      - *Because of the readonly filesystem, web2py cannot store sessions,
      error tickets, cache files and uploaded files in the filesystem;
they must
      be stored in the datastore and not in the filesystem.*

Also from another section of the book:

*Google NoSQL (Datastore) does not allow joins, left joins, aggregates,
expression, OR involving more than one table, the like operator and search
in "text"" fields. Transactions are limited and not provided automatically
by web2py (you need to use the Google API run_in_transaction which you can
look up in the Google App Engine documentation online). Google also limits
the number of records you can retrieve in each one query (1000 at the time
of writing). On the Google datastore record IDs are integer but they are
not sequential. While on SQL the "list:string" type is mapped into a "text"
type, on the Google Datastore it is mapped into aListStringProperty.
Similarly "list:integer" and "list:reference" are mapped into
"ListProperty". This makes that searches for content inside these fields
types are more efficient on Google NoSQL than on SQL databases.*


Note, joins and many to many relationships be worked around using
tagging<http://web2py.com/books/default/chapter/29/6#Many-to-many>,
and the "OR" restriction is worked around in the current DAL code by
essentially doing it programmatically. Text field search IS supported in
two ways, *kinda*. See this
post<http://www.billkatz.com/2009/6/Simple-Full-Text-Search-for-App-Engine>for
the idea behind how it's done. Essentially, it boils down to storage
of
ngrams into a stringlistproperty<http://code.google.com/appengine/docs/python/datastore/typesandpropertyclasses.html#StringListProperty>object,
which actually works quite well on GAE (been pleased with this
method for other stuff). If you just want simple, and not neccesarily
performant, this works fine too, though since it's run directly in web2py
it is more expensive (this is the semi-official way of doing in web2py):
db(...).select().find(lambda row: row.find('substring')>=0)

Eli, you were interested specifically in OR queries, here's the method as on
the slices page <http://www.web2pyslices.com/slices/take_slice/16> (with a
complex orderby thrown in). You can see the computation is done in web2py,
rather than the DB:
on RDBS

rows = db((purchase.buyer==kenny)|(purchase.buyer==cartman))\
         (purchase.buyer==buyer.id).select(orderby=buyer.name)

on RDBS and GAE

rows = (db(purchase.buyer==kenny).select()|db(purchase.buyer==cartman)\
       .select()).sort(lambda row:row.buyer.name)
print rows


More information about the Omaha mailing list