Perl vs. Python for text manipulation

Serge Orlov sombDELETE at pobox.ru
Sat Jan 24 13:00:17 EST 2004


"Aahz" <aahz at pythoncraft.com> wrote in message news:buu7pq$i67$1 at panix1.panix.com...
> In article <buu6j5$12lf$1 at nadya.doma>, Serge Orlov <sombDELETE at pobox.ru> wrote:
> >"Ganesan R" <rganesan at myrealbox.com> wrote in message news:ou4qulnwy0.fsf at andlx-anamika.cisco.com...
> >>
> >> fileinput is not optimized yet, at least I don't remember any mails
> >> about fileinput in python-devel since python 2.3 was released.  I
> >> know that it is slow. for line in file: does seem to be optimized
> >> though. The last time I ran the tests python was definitely twice as
> >> slow (which was before python 2.3 was officially released); now it
> >> appears to be only about 40% slower. I need to revisit these crude
> >> benchmarks.
> >
> >Since this problem is not IO bound but rather python internals bound,
> >it makes sense to try psyco.
>
> Psyco won't help and might actually make things worse.

Sorry, I was dreaming :) I was dreaming about the day when psyco can
recognize the file usage pattern and optimize the hell out of it. Of course
this optimization is not there at this time.

> At this point,
> Perl's speed advantage should come from two and only two sources: Perl
> optimizes the snot out of platform I/O (so these speed tests don't apply
> to a platform Perl hasn't been ported to), and Perl does not use the
> thread-safe forms of I/O.  Python does a fair amount of internal caching
> to make up for that, and the file object is already written in C.

I'm not sure I understand what "optimize the snot out of platform I/O"
means. You just use the bare-bones non-caching API and do your own simple
caching, doing as little as possible. This allows programs that work
with files as streams (read a lot sequentially from one file, write
sequentially to another) to run at top speed. Other access patterns may
suffer, but not very much.

As for threads, is it required that after one thread calls .write() another
thread can immediately .read() what the first thread wrote? If not, threads
can have separate caches. After all, doing I/O on the same file from multiple
threads is uncommon, so that case can afford to suffer. 99+ percent of I/O in
the world <wink> is done from one thread, so why should it suffer?
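
To make the "bare-bones API plus your own simple caching" idea concrete, here
is a minimal sketch in Python. It is only an illustration, not how Perl or
Python's file object actually works: the names iter_lines and chunk_size are
made up for this example, and it assumes a single thread reading sequentially
with "\n" line endings.

```python
import os


def iter_lines(path, chunk_size=64 * 1024):
    """Yield lines (bytes, newline included) from path.

    Hypothetical helper: uses the unbuffered os.read() call directly and
    keeps its own one-chunk cache, with no locking at all.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        pending = b""  # incomplete last line carried over between chunks
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:  # empty read means end of file
                break
            lines = (pending + chunk).split(b"\n")
            pending = lines.pop()  # last piece may be a partial line
            for line in lines:
                yield line + b"\n"
        if pending:  # file did not end with a newline
            yield pending
    finally:
        os.close(fd)
```

Usage would look like "for line in iter_lines('input.txt'): ...". Whether
this beats the built-in file iterator depends on the platform and Python
version, so it would have to be measured rather than assumed.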

-- Serge.