[Python-Dev] Unicode patches checked in

M.-A. Lemburg mal@lemburg.com
Wed, 15 Mar 2000 18:39:14 +0100


Vladimir Marangozov wrote:
> 
> > [me]
> > >
> > > Perhaps it would make sense to move the Unicode database on the
> > > Python side (write it in Python)? Or init the database dynamically
> > > in the unicodedata module on import? It's quite big, so if it's
> > > possible to avoid the static declaration (and if the unicodedata module
> > > is enabled by default), I'd vote for a dynamic initialization of the
> > > database from reference (Python ?) file(s).
> 
> [Marc-Andre]
> >
> > The unicodedatabase module contains the Unicode database
> > as static C data - this makes it shareable among (Python)
> > processes.
> 
> The static data is shared if the module is a shared object (.so).
> If unicodedata is not a .so, then you'll have a separate copy of the
> database in each process.

Uhm, comparing the two versions Python 1.5 and the current
CVS Python I get these figures on Linux:

Executing : ./python -i -c '1/0'

Python 1.5: 1208 kB / 728 kB (resident/shared)
Python CVS: 1280 kB / 808 kB (resident/shared)

Not much of a change if you ask me, and the CVS version has the
unicodedata module linked statically... so there's got to be
some sharing and load-on-demand going on behind the scenes:
this is what I was referring to when I mentioned static
C data. The OS can much better deal with these sharing techniques
and delayed loads than anything we could implement on top of
it in C or Python.

But perhaps this is Linux-specific...
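
For anyone who wants to repeat the comparison, here is a minimal sketch
of one way to read resident/shared figures on Linux. It assumes a /proc
filesystem with the statm layout described in proc(5), and it is not
necessarily how the numbers above were obtained. The function names are
made up for illustration.

```python
import os


def parse_statm(text, page_size):
    """Return (resident_kb, shared_kb) from the contents of /proc/<pid>/statm."""
    fields = text.split()
    # Field order per proc(5): size, resident, shared, text, lib, data, dt.
    # All values are counted in pages, so convert them to kB.
    resident_pages = int(fields[1])
    shared_pages = int(fields[2])
    return (resident_pages * page_size // 1024,
            shared_pages * page_size // 1024)


def current_process_memory():
    """Linux only: /proc/self/statm does not exist on other platforms."""
    with open("/proc/self/statm") as f:
        return parse_statm(f.read(), os.sysconf("SC_PAGE_SIZE"))
```

Calling current_process_memory() inside each interpreter build gives a
(resident, shared) pair in kB that can be compared across versions.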
 
> > Python modules don't provide this feature: instead a dictionary
> > would have to be built on import which would increase the heap
> > size considerably. Those dicts would *not* be shareable.
> 
> I haven't mentioned dicts, have I? I suggested that the entries in the
> C version of the database be rewritten in Python (or a text file).
> The unicodedata module would, in its init function, allocate memory
> for the database and would populate it before returning "import okay"
> to Python -- this is one way to init the db dynamically, among others.

I'm leaving this as an exercise to the interested reader ;-)
Really, if you have better ideas for the unicodedata module,
please go ahead.
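
As a starting point for that exercise, here is a rough sketch of what
populating the database at import time could look like on the Python
side. It assumes records in the semicolon-separated layout of
UnicodeData.txt; the names SAMPLE_RECORDS, load_database and DATABASE
are hypothetical, and a dict is used only to keep the sketch short (the
proposal above talks about memory allocated in the C init function).

```python
# Sample records in the semicolon-separated layout of UnicodeData.txt.
# A real loader would read the full reference file instead (assumption).
SAMPLE_RECORDS = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;
"""


def load_database(text):
    """Build {code point: (name, general category)} from record lines."""
    db = {}
    for line in text.splitlines():
        if not line:
            continue  # skip blank lines
        fields = line.split(";")
        db[int(fields[0], 16)] = (fields[1], fields[2])
    return db


# Populated when the module is imported: the table lives on the heap,
# so every process builds and keeps its own private copy.
DATABASE = load_database(SAMPLE_RECORDS)
```

Whatever container is used, the populated table is private, writable
memory in each process, which is the cost being weighed here against
static C data that the OS can page in on demand.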
 
> As to sharing the database among different processes, this is a classic
> IPC problem, which has nothing to do with the static C declaration of the db.
> Or, hmmm, one of us is royally confused <wink>.

Could you check this on other platforms ? Perhaps Linux is
doing more than other OSes are in this field.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/