Pattern Matching Given # of Characters and no String Input; use RegularExpressions?
tiissa
tiissa at nonfree.fr
Mon Apr 18 15:00:23 EDT 2005
Synonymous wrote:
> tiissa <tiissa at nonfree.fr> wrote in message news:<42623ba8$0$10322$636a15ce at news.free.fr>...
>
>>tiissa wrote:
>>
>>>If you know the number of characters to match can't you just compare
>>>slices?
>>
>>If you don't, you can still do it by hand:
>>
>>In [7]: def cmp(s1,s2):
>>  ....:     diff_map=[chr(s1[i]!=s2[i]) for i in range(min(len(s1), len(s2)))]
>>  ....:     diff_index=''.join(diff_map).find(chr(True))
>>  ....:     if -1==diff_index:
>>  ....:         return min(len(s1), len(s2))
>>  ....:     else:
>>  ....:         return diff_index
>>  ....:
>
> I will look at that, although if i have 300 images i dont want to type
> all the comparisons (In [9]: cmp('ccc','cccap')) by hand, it would
> just be easier to sort them then :).
I didn't mean you had to type it by hand. I was thinking of writing a
small script (as opposed to using something from the standard tools). It
might look like:
In [22]: def make_group(L):
   ....:     root,res='',[]
   ....:     for i in range(1,len(L)):
   ....:         if ''==root:
   ....:             root=L[i][:cmp(L[i-1],L[i])]
   ....:             if ''==root:
   ....:                 res.append((L[i-1],[L[i-1]]))
   ....:             else:
   ....:                 res.append((root,[L[i-1],L[i]]))
   ....:         elif len(root)==cmp(root,L[i]):
   ....:             res[-1][1].append(L[i])
   ....:         else:
   ....:             root=''
   ....:     if ''==root:
   ....:         res.append((L[-1],[L[-1]]))
   ....:     return res
   ....:
In [23]: L=['cccat','cccap','cccan','dddfa','dddfg','dddfz']
In [24]: L.sort()
In [25]: make_group(L)
Out[25]: [('ccca', ['cccan', 'cccap', 'cccat']), ('dddf', ['dddfa',
'dddfg', 'dddfz'])]
However I guarantee no optimality in the number of classes (but, hey,
that's when you don't specify the size of the prefix).
(Actually, I guarantee nothing at all ;p)
In particular, a file can end up singled out in a group of its own:
In [26]: make_group(['cccan','cccap','cccat','cccb'])
Out[26]: [('ccca', ['cccan', 'cccap', 'cccat']), ('cccb', ['cccb'])]
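For anyone who would rather save this as a script than retype the session, here is a rough standalone sketch of the same idea. The name common_prefix_len is my own (chosen so it does not shadow Python 2's built-in cmp), and the guard for an empty list is an addition that the session above does not have. Like the original, it assumes the list is already sorted.

```python
def common_prefix_len(s1, s2):
    """Return how many leading characters s1 and s2 have in common."""
    n = 0
    for a, b in zip(s1, s2):
        if a != b:
            break
        n += 1
    return n


def make_group(names):
    """Group an already sorted list of names by shared prefix.

    Returns a list of (prefix, members) tuples. A name that shares
    nothing with its neighbour ends up alone, keyed by its full name.
    """
    root, res = '', []
    for i in range(1, len(names)):
        if root == '':
            # No open group: try to start one from the previous pair.
            root = names[i][:common_prefix_len(names[i - 1], names[i])]
            if root == '':
                res.append((names[i - 1], [names[i - 1]]))
            else:
                res.append((root, [names[i - 1], names[i]]))
        elif len(root) == common_prefix_len(root, names[i]):
            # names[i] starts with the whole current prefix.
            res[-1][1].append(names[i])
        else:
            # Prefix broken: close the group, names[i] is handled next round.
            root = ''
    # Assumption beyond the original: skip this step for an empty list.
    if names and root == '':
        res.append((names[-1], [names[-1]]))
    return res


if __name__ == '__main__':
    files = sorted(['cccat', 'cccap', 'cccan', 'dddfa', 'dddfg', 'dddfz'])
    for prefix, members in make_group(files):
        print(prefix, members)
```

The prefix of a group is fixed by its first two members, which is why 'cccb' is not merged into the 'ccca' group above.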
It is a matter of choice: either you want to specify the size of the
prefix by hand, in which case you'd rather look at itertools as pointed
out by Kent, or you don't, and a variation on the above code might do
the job.
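For the fixed-size case, something along these lines should work with itertools.groupby. This is only my guess at what the itertools route could look like, not Kent's actual code; group_by_prefix and its size parameter are names made up for the illustration.

```python
from itertools import groupby


def group_by_prefix(names, size):
    """Group names by their first `size` characters.

    groupby only merges adjacent items, so the names are sorted first;
    sorting puts all names with the same prefix next to each other.
    """
    def key(name):
        return name[:size]

    return [(prefix, list(members))
            for prefix, members in groupby(sorted(names), key=key)]


if __name__ == '__main__':
    files = ['cccat', 'cccap', 'cccan', 'dddfa', 'dddfg', 'dddfz']
    for prefix, members in group_by_prefix(files, 4):
        print(prefix, members)
```

With a prefix size of 3 instead of 4, 'cccb' from the earlier example would land in the same group as the 'ccca...' files, which is the trade-off between the two approaches.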