[Python-bugs-list] [ python-Bugs-603509 ] MemoryError when eval'ing string

Mon, 02 Sep 2002 14:23:09 -0700

Bugs item #603509, was opened at 2002-09-02 09:56
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=603509&group_id=5470

Category: Python Interpreter Core
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Martin v. Löwis (loewis)
Summary: MemoryError when eval'ing string

Initial Comment:
eval("'label;home;encoding=quoted-printable:r.'")

dies with a bogus MemoryError.  Assigned to Martin 
because this minimal substring dies the same way:

eval("'coding=q'")

Of course the result should be the string

    coding=q

Somehow it looks like parsing a string literal is getting 
mixed up with searching for a source-file encoding.
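
A minimal sketch of how such a mix-up could arise, assuming the encoding search is a loose, unanchored match in the spirit of the PEP 263 pattern `coding[:=]\s*([-\w.]+)`. The interpreter's real detection code is written in C and is not reproduced here; the names `LOOSE_CODING` and `find_encoding_loose` are hypothetical:

```python
import re

# Assumption: a loose, unanchored pattern modelled on the PEP 263 wording.
# It does not check whether the match sits inside a comment.
LOOSE_CODING = re.compile(r"coding[:=]\s*([-\w.]+)")


def find_encoding_loose(line):
    """Return the encoding name a loose search would report, or None."""
    match = LOOSE_CODING.search(line)
    return match.group(1) if match else None


# The string literal from the report looks like an encoding declaration
# naming the (nonexistent) encoding "q".
print(find_encoding_loose("'coding=q'"))                 # q
# A genuine declaration is found as intended.
print(find_encoding_loose("# -*- coding: latin-1 -*-"))  # latin-1
```

Under that assumption, the source text 'coding=q' is taken to declare an encoding named q, which would explain why a plain string literal ends up on the encoding path at all.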

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-09-02 17:23

Message:

I don't understand the deeper issues here, but

    eval(repr(s)) == s

must be true for every string s.  Take that as an absolute 
requirement and I'm sure you'll find a way to do it <wink>.

Waiting for a complaint isn't really an option.  It's been 
perfectly safe to dump strings out to text files via repr(), 
and restore them via eval(), since Python's first release.  
The program I was running when this happened was doing 
exactly that.  The strings it was dumping and restoring 
came from c.l.py msgs, and there's no string that can be 
guaranteed not to show up there.  In particular, it's likely 
that a msg containing an encoding decoration will show up 
there as an example.
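
The requirement can be written down as a small check. This is only a sketch: the helper name `round_trips` and the sample list are illustrative assumptions, not taken from the program mentioned above.

```python
def round_trips(s):
    """Check that eval(repr(s)) reproduces the string s exactly."""
    return eval(repr(s)) == s


# Illustrative samples, including the strings from this report and a few
# that merely look like encoding declarations.
samples = [
    "coding=q",
    "label;home;encoding=quoted-printable:r.",
    "#coding=q",
    "# -*- coding: latin-1 -*-",
    "line one\nline two",
    "it's \"quoted\"",
    "",
]

failures = [s for s in samples if not round_trips(s)]
print("strings that failed to round-trip:", failures)
```

On an interpreter that meets the requirement, the failures list comes out empty.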

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-09-02 13:40

Message:

The attached patch fixes the problem. It is still possible
to trick this code, with

eval("'#coding=q'")

I'm not really sure how to deal with that; I see the
following options:
1. tighten PEP 263 to require that the encoding comment is
the only thing in a source line.
2. perform some minimal scanning of the line, to see whether
we are inside a string literal when we see the #. This can
probably be tricked with a multi-line string.
3. perform source encoding analysis in the tokenizer
proper, where comments are detected. This would be a heavy
change.
4. Just apply this patch, and wait until somebody complains.
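
For illustration, option 1 might look roughly like the sketch below. It is written in Python rather than C and is not the attached patch. It assumes an anchored pattern that accepts only whitespace before the `#`, which is the form the PEP 263 text later settled on; the names `STRICT_CODING` and `find_encoding_strict` are hypothetical.

```python
import re

# Assumption: the declaration must be a comment with nothing but
# whitespace before the '#'.
STRICT_CODING = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def find_encoding_strict(line):
    """Return the declared encoding if the line is a pure comment line."""
    match = STRICT_CODING.match(line)
    return match.group(1) if match else None


# A string literal that merely contains '#coding=q' is no longer matched.
print(find_encoding_strict("'#coding=q'"))                # None
# A comment that trails code on the same line is rejected as well.
print(find_encoding_strict("x = 1  # coding=q"))          # None
# A genuine declaration on a line of its own is still found.
print(find_encoding_strict("# -*- coding: latin-1 -*-"))  # latin-1
```

This handles the single-line cases. A line inside a multi-line string that begins with # would still be matched, which is the same weakness noted for option 2.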

Directions appreciated.

----------------------------------------------------------------------
