Help with Pyrex: how to declare char arrays?
Tim Churches
tchur at optushome.com.au
Mon Dec 30 14:59:03 EST 2002
I have been experimenting with Greg Ewing's fabulous Pyrex module, but I
am a complete idiot when it comes to C (I pulled K&R's "The C
programming language" off my shelf, but I am not much wiser after
re-acquainting myself with the horrors, err, extreme flexibility of
C...). How should I go about declaring the various string variables in
the following Pyrex programme? As it is, Pyrex converts the code to a C
extension module successfully, and the module produces results identical
to those of the equivalent Python code, which is encouraging, but it runs
slightly slower.
#########################################
def jaro(str1, str2):
    """Return approximate string comparator measure (between 0.0 and 1.0)

    USAGE:
      score = jaro(str1, str2)

    ARGUMENTS:
      str1  The first string
      str2  The second string

    DESCRIPTION:
      As described in 'An Application of the Fellegi-Sunter Model of
      Record Linkage to the 1990 U.S. Decennial Census' by William
      Winkler and Yves Thibaudeau.

    COPYRIGHT:
      Peter Christen and Tim Churches 2002. Available under an open source
      license at http://datamining.anu.edu.au/projects/linkage.html
    """
    cdef int len1
    cdef int len2
    cdef int halflen
    cdef int common1
    cdef int common2
    cdef int start
    cdef int end
    cdef int index
    cdef float transposition
    cdef float w

    # Quick check if the strings are the same
    #
    if (str1 == str2):
        return 1.0

    len1 = len(str1)
    len2 = len(str2)
    halflen = max(len1, len2) / 2 + 1

    ass1 = ''  # Characters assigned in str1
    ass2 = ''  # Characters assigned in str2
    workstr1 = str1  # Copy of original string
    workstr2 = str2
    common1 = 0  # Number of common characters
    common2 = 0

    # Analyse the first string
    #
    for i in range(len1):
        start = max(0,i-halflen)
        end = min(i+halflen+1,len2)
        index = workstr2.find(str1[i],start,end)
        if (index > -1):  # Found common character
            common1 = common1 + 1
            ass1 = ass1+str1[i]
            workstr2 = workstr2[:index]+'*'+workstr2[index+1:]

    # Analyse the second string
    #
    for i in range(len2):
        start = max(0,i-halflen)
        end = min(i+halflen+1,len1)
        index = workstr1.find(str2[i],start,end)
        if (index > -1):  # Found common character
            common2 = common2 + 1
            ass2 = ass2 + str2[i]
            workstr1 = workstr1[:index]+'*'+workstr1[index+1:]

    if (common1 != common2):
        common1 = float(common1+common2) / 2.0
    if (common1 == 0):
        return 0.0

    # Compute number of transpositions
    #
    transposition = 0
    for i in range(len(ass1)):
        if (ass1[i] != ass2[i]):
            transposition = transposition + 1
    transposition = transposition / 2.0

    common1 = float(common1)
    w = 1./3.*(common1 / float(len1) + common1 / float(len2) + \
               (common1-transposition) / common1)
    return w
############################################
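For anyone wanting to reproduce the comparison, a rough sketch of how the "identical results, slightly slower" observation might be checked is given here. It is only an illustration: compare_implementations, py_version and ext_version are hypothetical names, and the two stand-in functions simply take the place of the pure-Python jaro and the jaro imported from the compiled Pyrex module.

```python
import time


def compare_implementations(py_func, ext_func, pairs, repeats=1000):
    """Check that two implementations agree on every pair, and time each one."""
    identical = all(py_func(a, b) == ext_func(a, b) for a, b in pairs)

    def total_time(func):
        # Wall-clock time for `repeats` passes over all string pairs.
        started = time.perf_counter()
        for _ in range(repeats):
            for a, b in pairs:
                func(a, b)
        return time.perf_counter() - started

    return identical, total_time(py_func), total_time(ext_func)


# Hypothetical stand-ins (assumptions, not the real comparators): in practice
# these would be the pure-Python jaro and the Pyrex-compiled jaro.
def py_version(s1, s2):
    return 1.0 if s1 == s2 else 0.0


def ext_version(s1, s2):
    return float(s1 == s2)


pairs = [("martha", "marhta"), ("dixon", "dixon"), ("", "a")]
identical, t_py, t_ext = compare_implementations(py_version, ext_version, pairs)
print("identical results:", identical)
print("python: %.6fs  extension: %.6fs" % (t_py, t_ext))
```

Timings from a harness like this vary from run to run, so small differences between the two versions should be read with care.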
Tim C