[Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42)

noreply@sourceforge.net noreply@sourceforge.net
Wed, 10 Jul 2002 13:51:14 -0700


Patches item #474274, was opened at 2001-10-23 16:15
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Brett Cannon (bcannon)
Assigned to: Skip Montanaro (montanaro)
Summary: Pure Python strptime() (PEP 42)

Initial Comment:
The attached file contains a pure Python version of
strptime().  It attempts to operate as much like
time.strptime() within reason.  Where vagueness or
obvious platform dependence existed, I tried to
standardize and be reasonable.

PEP 42 makes a request for a portable, consistent
version of time.strptime():

- Add a portable implementation of time.strptime() that
works in
      clearly defined ways on all platforms.

This module attempts to close that feature request.

The code has been tested thoroughly by myself as well
as some other people who happened to have caught the
post I made to c.l.p a while back and used the module.

It is available at the Python Cookbook
(http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036).
 It has been approved by the editors there and thus is
listed as approved.  It is also being considered for
inclusion in the book (thanks, Alex, for encouraging
this submission).

A PyUnit testing suite for the module is available at
http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime
along with the code for the function itself.
Localization has been handled in a modular way using
regexes.  All of it is self-explanatory in the doc
strings.  It is very straight-forward to include your
own localization settings or modify the two languages
included in the module  (English and Swedish).

If the code needs to have its license changed, I am
quite happy to do it (I have already given the OK to
the Python Cookbook).

-Brett Cannon

----------------------------------------------------------------------

>Comment By: Brett Cannon (bcannon)
Date: 2002-07-10 13:51

Message:
Logged In: YES 
user_id=357491

The actual 2.1.3 edition of strptime is now up.  I don't
think there are any changes, but since I renamed the file
_strptime.py, I figured uploading it again wouldn't hurt.

I also uploaded a new contextual diff of the time module
taken from CVS on 2002-07-10.  The only difference between
this and the previous diff (which was against 2.2.1's time
module) is the change of the imported module to _strptime.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-26 21:54

Message:
Logged In: YES 
user_id=357491

Uploaded 2.1.2 (but accidentally labelled it 2.1.3 down
below!).  Just a little bit more cleanup.  Biggest change is
that I changed the default format string and made strptime()
raise ValueError instead of TypeError.  This was all done to
match the time module docs.

I also fiddled with the regexes so that the groups were
none-capturing.  Mainly done for a possible performance
improvement.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-23 18:06

Message:
Logged In: YES 
user_id=357491

2.1.1 is now uploaded.  Almost a purely syntatical change. 
>From discussions on python-dev I renamed the helper fxns so
they are all lowercase-style.  Also changed them so that
they state what the fxn returns.

I also put all of the imports on their own line as per PEP 8.

The only semantical change I did was directly import
re.compile since it is the only thing I am using from the re
module.

These changes required tweaking of my exhaustive testing
suite, so that got uploaded, too.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-20 21:35

Message:
Logged In: YES 
user_id=357491

I have uploaded a contextual diff of timemodule.c with a
callout to strptime.strptime when HAVE_STRPTIME is not
defined just as Guido requested.

It's my first extension module, so I am not totally sure of
myself with it.  But since Alex Marttelli told me what I
needed to do I am fairly certain it is correct.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-19 14:49

Message:
Logged In: YES 
user_id=357491

2.1.0 is now up and ready for use.  I only changed two
things to the code, but since they change the semantics of
stprtime()s use, I made this a new minor release.

One, I removed the ability to pass in your own LocaleTime
object.  I did this for two reasons.  One is because I
forgot about how default arguments are created at the time
of function creation and not at each fxn call.  This meant
that if someone was not thinking and ran strptime() under
one locale and then switched to another locale without
explicitly passing in a new LocaleTime object for every call
for the new locale, they would get bad matches.  That is not
good.

The other reason was that I don't want to force users to
pass in a LocaleTime object on every call if I can't have a
default value for it.  This is meant to act as a drop-in
replacement for time.strptime().  That forced the removal of
the parameter since it can't have a default value.

In retrospect, though, people will probably never parse log
files in other languages other then there default locale. 
And if they were, they should change the locale for the
interpreter and not just for strptime().

The second change was what triggers strptime() to return an
re object that it can use.  Initially it was any nothing
value (i.e., would be considered false), but I realized that
an empty string could trigger that and it would be better to
raise a TypeError then let some error come up from trying to
use the re object in an incorrect way.

Now, to have an re object returned, you pass in False.  I
figured that there is a very minimal chance of passing in
False when you meant to pass in a string.  Also, False as
the data_string, to me, means that I don't want what would
normally be returned.

I debated about removing this feature from strptime(), but I
profiled it and most of the time comes from TimeRE's
__getitem__.  So building the string to be compiled into a
regex is the big bottleneck.  Using a precompiled regex
instead of constructing a new one everytime took 25% of the
time overall for strptime() when calling strptime() 10,000
times in a row.  This is a conservative number, IMO, for
calls in a row; I checked the Apache hit logs for a single
day on Open Computing Facility's web server
(http://www.ocf.berkeley.edu/) and there were 188,562 hits
on June 16 alone.  So I am going to keep the feature until
someone tells me otherwise.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-18 12:05

Message:
Logged In: YES 
user_id=357491

I have uploaded v. 2.0.4.  It now uses the calendar module
to figure out the names of weekdays and months.  Thanks goes
out to Guido for pointing out this undocumented feature of
calendar.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-17 13:11

Message:
Logged In: YES 
user_id=357491

I uploaded v.2.0.3.  Beyond implementing what I mentioned
previously (raising TypeError when a match fails, adding \d
to all applicable regexes) I did a few more things.

For one, I added a special " \d" to the numeric month regex.
 I discovered that ANSI C for ctime displays the month with
a leading space if it is a single digit.  So to deal with
that since at least Skip's C library likes to use that
format for %c, I went ahead and added it.

I changed all attributes in LocaleTime to lists.  A recent
mail on python-dev from GvR said that lists are for
homogeneous data, which everything that is grouped together
in LocaleTime is.  It also simplified the code slightly and
led to less conversions of data types.

I also added a method that raises a TypeError if you try to
assign to any of LocaleTime's attributes.  I thought that if
you left out the set value for property() it wouldn't work;
didn't realize it just defaults over to __setitem__.  So I
added that method as the set value for all of the property()s.

It does require 2.2.1 now since I used True and False
without defining them.  Obviously just set those values to 1
and 0 respectively if you are running under 2.2

I also updated the overly exhaustive PyUnit suite that I
have for testing my code.   It is not black-box testing,
though; Skip's pruned version of my testing suite fits that
bill (I think).

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-12 17:46

Message:
Logged In: YES 
user_id=357491

I am back from my vacation and ready to email python-
dev about getting this patch accepted (whether to modify 
time or make this a separate module, etc.).  I think I will 
do the email on June 17.

Before then, though, I am going to make two changes.  
One is the raise a Value Error exception if the regex doesn't 
match (to try to match time.strptime()s exception as seen 
in Skip's run of the unit test).  The other change is to tack 
on a \d on all numeric formats where it might come out as 
a single digit (i.e., lacking a leading zero).  This will be for 
v2.0.3 which I will post before June 17.

If there is any reason anyone thinks I should hold back on 
this, please let me know!  I would like to have this code as 
done as possible before I make any announcement.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 23:32

Message:
Logged In: YES 
user_id=357491

I went ahead an implemented most of Neal's suggestions.  On
a few, of them, though, I either didn't do it or took a
slightly different route.

For the 'yY' vs. ('y', 'Y'), I went with 'yY'.  If it gives
a performance boost, why not since it doesn't make the code
harder to read.  Implementing it actually had me catch some
redundant code for dealing with a literal %.

The tests in the __init__ for LocaleTime have been reworked
to check that they are either None or have the proper
length, otherwise they raise a TypeError.

I have gone through and tried to catch all the lines that
were over 80 characters and cut them up to fit.

For the adding of '' to tuples, I created a method that
could specify front or back concatination.  Not much
different from before, but it allows me to specify front or
back concatination easily.

I explained why the various magic dates were used.

I in no way have to worry about leap year.  Since it is not
validating the data string for validity the fxn just takes
the data and uses it.  I have no reason to calc for leap year.

date_time[offset] has been replaced with current_format and
added the requisite two lines to assign between it and the list.

You are only supposed to use __new__ when it is immutable. 
Since dict is obviously mutable, I don't need to worry about it.

Used Neal's suggested shortening of the sorter helper fxn.

I also used the suggestion of doing x = y = z = -1.  Now it
barely fits on a single line instead of two.

All numerical compares use == and != instead of is and is
not.  Didn't know about that dependency on
NSMALL((POS)|(NEG))INTS; good thing to know.

The doc string was backwards.  Thanks for catching that, Neal.

I also went through and added True and False where
appropriate.  There is a line in the code where True = 1;
False = 0 right at the top.  That can obviously be removed
if being run under Python 2.3.

And I completely understand being picky about minute details
where maintainability is a concern.  I just graduated from
Cal and so the memory of seeing beginning programmers' code
is still fresh in my mind <shudders>.

And I will query python-dev about how to go about to get
this added after the bugs are fixed and I am back home
(going to be out of town until June 16).  I will still be
periodically checking email, though, so I will continue to
implement any suggestions/bugfixes that anyone suggests/finds.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-04 16:33

Message:
Logged In: YES 
user_id=33168

Hopefully, I'm looking at the correct patch this time. :-)

To answer one question you had (re:  'yY' vs. ('y', 'Y')),
I'm not sure people really care.  It's not big to me.
Although 'yY' is faster than ('y', 'Y').

In order to try to reduce the lines where you raise an error
(in __init__)
you could change 'sequence of ... must be X items long' to
'... must have/contain X items'.

Generally, it would be nice to make sure none of the lines
are over 72-79 chars (see PEP 8).

Instead of doing:
    newlist = list(orig)
    newlist.append('')
    x = tuple(newlist)

you could do:
    x = tuple(orig[:])
or something like that.  Perhaps a helper function?

In __init__ do you want to check the params against 'is None'
If someone passes a non-sequence that doesn't evaluate
to False, the __init__ won't raise a TypeError which it
probably should.

What is the magic date used in __calc_weekday()?
  (1999/3/15+ 22:44:55)  is this significant, should there
be a comment?
  (magic dates are used elsewhere too, e.g., __calc_month,
__calc_am_pm, many more)

__calc_month() doesn't seem to take leap year into account?
  (not sure if this is a problem or not)
In __calc_date_time(), you use date_time[offset] repetatively,
  couldn't you start the loop with something like dto =
date_time[offset] and then use dto
  (dto is not a good name, I'm just making an example)

Are you supposed to use __init__ when deriving from
built-ins (TimeRE(dict)) or __new__?
  (sorry, I don't remember the answer)

In __tupleToRE.sorter(), instead of the last 3 lines, you
can do:
  return cmp(b_length, a_length)

Note:  you can do x = y = z = -1, instead of x = -1 ; y = -1
; z = -1

It could be problematic to compare x is -1.  You should
probably just use ==.
It would be a problem if NSMALLPOSINTS or NSMALLNEGINTS
were not defined in Objects/intobject.c.

This docstring seems backwards:
def gregToJulian(year, month, day):
    """Calculate the Gregorian date from the Julian date."""
I know a lot of these things seem like a pain.
And it's not that bad now, but the problem is maintaining
the code.  It will be easier for everyone else if the code
is similar to the rest.

BTW, protocol on python-dev is pretty loose and friendly. :-)

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 15:33

Message:
Logged In: YES 
user_id=357491

Thanks for being so prompt with your response, Skip.

I found the problem with your %c.  If you look at your
output you will notice that the day of the month is '4', but
if you look at the docs for time.strftime() you will notice
that is specifies the day of the month (%d) as being in the
range [01,31].  The regex for %d (simplified) is
'(3[0-1])|([0-2]\d)'; not being represented by 2 digits
caused the regex to fail.

Now the question becomes do we follow the spec and chaulk
this up to a non-standard strftime() implementation, or do
we adapt strptime to deal with possible improper output from
strftime()?  Changing the regexes should not be a big issue
since I could just tack on '\d' as the last option for all
numerical regexes. 

As for the test error from time.strptime(), I don't know
what is causing it.  If you look at the test you will notice
that all it basically does is parsetime(time.strftime("%Z"),
"%Z").  Now how that can fail I don't know.  The docs do say
that strptime() tends to be buggy, so perhaps this is a case
of this.

One last thing.  Should I wait until the bugs are worked out
before I post to python-dev asking to either add this as a
module to the standard library or change time to a Python
stub and rename timemodule.c?  Should I ask now to get the
ball rolling?  Since I just joined python-dev literally this
morning I don't know what the protocol is.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-03 22:55

Message:
Logged In: YES 
user_id=44345

Here ya go...

% ./python
Python 2.3a0 (#185, Jun  1 2002, 23:19:40) 
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> now = time.localtime(time.time())
>>> now
(2002, 6, 4, 0, 53, 39, 1, 155, 1)
>>> time.strftime("%c", now)
'Tue Jun  4 00:53:39 2002'
>>> time.tzname
('CST', 'CDT')
>>> time.strftime("%Z", now)
'CDT'


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-03 22:35

Message:
Logged In: YES 
user_id=357491

I have uploaded a verision 2.0.1 which fixes the %b format
bug (stupid typo on a variable name).

As for the %c directive, I pass that test.  Can you please
send the output of strftime and the time tuple used to
generate it?

As for the time.strptime() failure, I don't have
time.strptime() on any system available to me, so could you
please send me the output you have for strftime('%Z'), and
time.tzname?

I don't know how much %Z should be worried about since its
use is deprecated (according to the time module's
documentation).  Perhaps strptime() should take the
initiative and not support it?

-Brett

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-03 21:52

Message:
Logged In: YES 
user_id=44345

Brett,

Please see the drastically shortened test_strptime.py.  (Basically all I'm
interested in here is whether or not strptime.strptime and time.strptime
will pass the tests.)  Near the top are two lines, one commented out:

  parsetime = time.strptime
  #parsetime = strptime.strptime

Regardless which version of parsetime I get, I get some errors.  If 
parsetime == time.strptime I get

======================================================================
ERROR: Test timezone directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 69, in test_timezone
    strp_output = parsetime(strf_output, "%Z")
ValueError: unconverted data remains: 'CDT'

If parsetime == strptime.strptime I get

ERROR: *** Test %c directive. ***
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 75, in test_date_time
    self.helper('c', position)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 380, in strptime
    found_dict = found.groupdict()
AttributeError: NoneType object has no attribute 'groupdict'

======================================================================
ERROR: Test for month directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 31, in test_month
    self.helper(directive, 1)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 393, in strptime
    month = list(locale_time.f_month).index(found_dict['b'])
ValueError: list.index(x): x not in list

This is with a very recent interpreter (updated from CVS in the past 
day) running on Mandrake Linux 8.1.

Can you reproduce either or both problems?  Got fixes for the 
strptime.strptime problems?

Thx,

Skip


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-02 00:44

Message:
Logged In: YES 
user_id=357491

I'm afraid you looked at the wrong patch!  My fault since I
accidentally forgot to add a description for my patch.  So
the file with no description is the newest one and
completely supercedes the older file.  I am very sorry about
that.  Trust me, the new version is much better.

I realized the other day that since the time module is a C
extension file, would getting this accepted require getting
BDFL approval to add this as a separate module into the
standard library?  Would the time module have to have a
Python interface module where this is put and all other
methods in the module just pass directly to the extension file?

As for the suggestions, here are my replies to the ones that
still apply to the new file:
* strings are sequences, so instead of if found in ('y',
'Y') you can do if found in 'yY'
-> True, but I personally find it easier to read using the
tuple.  If it is standard practice in the standard library
to do it the suggested way, I will change it.

* daylight should use the new bools True, False (this also
applies to any other flags)
-> Oops.  Since I wrote this under Python 2.2.1 I didn't
think about it.  I will go through the code and look for
places where True and False should be used.

-Brett C.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-01 06:46

Message:
Logged In: YES 
user_id=33168

Overall, the patch looks pretty good.  
I didn't check for completeness or consistency, though.

 * You don't need: from exceptions import Exception
 * The comment "from strptime import * will only export
strptime()" is not correct.
 * I'm not sure what should be included for the license.
 * Why do you need success flag in CheckIntegrity, you raise
an exception?
    (You don't need to return anything, raise an exception,
else it's ok)
 * In return_time(), could you change xrange(9) to
range(len(temp_time))
    this removes a dependancy.
 * strings are sequences, so instead of if found in ('y', 'Y')
    you can do if found in 'yY'
 * daylight should use the new bools True, False
   (this also applies to any other flags) * The formatting
doesn't follow the standard (see PEP 8)
    (specifically, spaces after commas, =, binary ops,
comparisons, etc)
 * Long lines should be broken up
The test looks pretty good too.  I didn't check it for
completeness.
The URL is wrong (too high up), the test can be found here:
 http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/code/Python/Scripts/test_strptime.py
I noticed a spelling mistake in the test: anme -> name.

Also, note that PEP 42 has a comment about a python strptime.
So if this gets implemented, we need to update PEP 42.
Thanks.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-05-27 14:38

Message:
Logged In: YES 
user_id=357491

Version 2 of strptime() has now been uploaded.  This nearly
complete rewrite includes the removal of the need to input
locale-specific time info.  All need locale info is gleaned
from time.strftime().  This makes it able to behave exactly
like time.strptime().

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-24 15:15

Message:
Logged In: YES 
user_id=35752

Go ahead and reuse this item.  I'll wait for the updated
version.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-03-24 15:01

Message:
Logged In: YES 
user_id=357491

Oops.  I thought I had removed the clause.  Feel free to
remove it.

I am going to be cleaning up the module, though, so if you
would rather not bother reviewing this version and wait on
the cleaned-up one, go ahead.

Speaking of which, should I just reply to this bugfix when I
get around to the update, or start a new patch?

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-23 14:41

Message:
Logged In: YES 
user_id=35752

I'm pretty sure this code needs a different license before
it can be accepted.  The current license contains the
"BSD advertising clause".  See
http://www.gnu.org/philosophy/bsd.html.



----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470