re beginner

Mon Jun 5 08:30:06 EDT 2006

John Machin wrote:
> On 5/06/2006 10:38 AM, Bruno Desthuilliers wrote:
> 
>> SuperHik wrote:
>>
>>> hi all,
>>>
(snip)

>>> I have an old(er) script with the 
>>> following task - takes a string I copy-pasted and which always has the 
>>> same format:
>>>
(snip)
 >>>
>> def to_dict(items):
>>     items = items.replace('\t', '\n').split('\n')
> 
> 
> In case there are leading/trailing spaces on the keys:

There aren't. Test passes.

(snip)

> Fantastic -- at least for the OP's carefully copied-and-pasted input.

That was the spec, and my code passes the test.

> Meanwhile back in the real world,

The "real world" is mostly defined by the customer's test set (is that the 
correct translation for "jeu d'essai"?). Code passes the test. Period.

> there might be problems with multiple 
> tabs used for 'prettiness' instead of 1 tab, non-integer values, etc etc.

Which means that the spec and the customer's test set are wrong. Not my 
responsibility. Anyway, I refuse to change anything in the parsing 
algorithm before having another test set.

> In that case a loop approach that validated as it went and was able to 
> report the position and contents of any invalid input might be better.

One doesn't know what *will* be better without actual facts. You may be 
right (and, from my experience, you probably are !-), *but* you may be 
wrong as well. Until you have a correct spec and a test data set on which 
the code fails, writing any other code is a waste of time. Better to 
work on other parts of the system, and come back to this if and when the 
need arises.
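
For the record, if such a test set ever shows up, the loop approach John 
describes could look roughly like the sketch below. It is only an 
illustration: the name to_dict_checked is made up, and it assumes one key, 
one tab and one integer value per line, which the snipped spec does not 
confirm.

```python
def to_dict_checked(text):
    """Parse 'key<TAB>value' lines, reporting where invalid input occurs.

    Assumed format (not confirmed by the original spec): one key, a single
    tab, and an integer value on each line.
    """
    result = {}
    for lineno, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split('\t')
        if len(fields) != 2:
            # Catches missing values and runs of tabs used for 'prettiness'
            raise ValueError(
                "line %d: expected 2 tab-separated fields, got %d: %r"
                % (lineno, len(fields), line)
            )
        key, value = fields
        try:
            result[key.strip()] = int(value)
        except ValueError:
            raise ValueError(
                "line %d: non-integer value %r" % (lineno, value)
            )
    return result
```

Whether raising on the first bad line is the right behaviour (rather than 
collecting all errors) is exactly the kind of thing a real spec would have 
to settle.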

<ot>
Kind of reminds me of a former employer that paid me for 2 full months to 
work on a very hairy data migration script (the original data set was so 
f... up and incoherent that even a human parser could barely make any sense 
of it), before discovering that none of the users of the old system was 
interested in migrating that part of the data. Talk about a waste of 
time and money...
</ot>

Now FWIW, there's actually something else bugging me about this code: it 
loads the whole data set into memory. That's OK for a few lines, but 
obviously wrong if one is to parse huge files. *That* would be the first 
thing I would change. It takes a couple of minutes, so no real waste of 
time, but it obviously implies rethinking the API, which is better done 
now than after client code has been written.
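
A minimal sketch of what that change might look like, assuming (the spec 
being snipped here) one key, a tab and an integer value per line. The name 
iter_items is hypothetical. Instead of taking one big string and returning 
a dict, it takes any iterable of lines, such as an open file, and yields 
the pairs one at a time:

```python
import io


def iter_items(lines):
    """Yield (key, value) pairs lazily from any iterable of lines.

    Assumed format: 'key<TAB>integer' on each line. Only one line is held
    in memory at a time, so this also works on huge files.
    """
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue  # ignore blank lines
        key, value = line.split('\t')
        yield key, int(value)


# Works the same on an open file object or an in-memory buffer
sample = io.StringIO("apples\t3\npears\t5\n")
for key, value in iter_items(sample):
    print(key, value)
```

Client code that still wants everything in memory can call 
dict(iter_items(f)), so nothing is lost by making the lazy version the 
basic API.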

My 2 cents....