Help with Pyrex: how to declare char arrays?

Tim Churches tchur at optushome.com.au
Mon Dec 30 14:59:03 EST 2002


I have been experimenting with Greg Ewing's fabulous Pyrex module, but I
am a complete idiot when it comes to C (I pulled K&R's "The C
Programming Language" off my shelf, but I am not much wiser after
re-acquainting myself with the horrors, err, extreme flexibility of
C...). How should I go about declaring the various string variables in
the following Pyrex programme? As it is, Pyrex converts the code to a C
extension module successfully, and the module produces results identical
to those of the equivalent Python code, which is encouraging. However, it
runs slightly slower than the pure Python version.
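In C a string is just an array of char that can be modified in place. That makes the step in jaro() that rebuilds workstr2 with a '*' on every match the part that maps most directly onto a char array. In Pyrex, variables without a cdef declaration stay ordinary Python objects, so that slicing and concatenation still allocate a new Python string on each match, which is plausibly part of why the compiled module gains nothing. Below is a minimal pure-Python sketch of that one step. It uses a bytearray as a stand-in for a mutable C char buffer. The helper name match_chars and the bytes-only (ASCII) input are assumptions for illustration; this is not Pyrex syntax.

```python
def match_chars(s1: bytes, s2: bytes, halflen: int) -> bytes:
    """Return the characters of s1 that have a match in s2 within the window.

    Hypothetical helper mirroring the 'Analyse the first string' loop of
    jaro(): each matched character in the working copy of s2 is overwritten
    with b'*' in place instead of building a new string.
    """
    work2 = bytearray(s2)      # mutable working copy, like a C char buffer
    assigned = bytearray()     # characters of s1 assigned to a match
    len2 = len(s2)
    for i in range(len(s1)):
        start = max(0, i - halflen)
        end = min(i + halflen + 1, len2)
        index = work2.find(s1[i:i + 1], start, end)
        if index > -1:               # found a common character
            assigned.append(s1[i])   # s1[i] is an int byte value
            work2[index] = ord('*')  # mark the position as used, in place
    return bytes(assigned)
```

For example, with halflen = 6 // 2 + 1 = 4, match_chars(b"MARTHA", b"MARHTA", 4) returns b"MARTHA".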

#########################################
def jaro(str1, str2):
  """Return approximate string comparator measure (between 0.0 and 1.0)
  USAGE:
    score = jaro(str1, str2)
  ARGUMENTS:
    str1  The first string
    str2  The second string
  DESCRIPTION:
    As described in 'An Application of the Fellegi-Sunter Model of
    Record Linkage to the 1990 U.S. Decennial Census' by William
    Winkler and Yves Thibaudeau.
  COPYRIGHT:
    Peter Christen and Tim Churches 2002. Available under an open source
    license at http://datamining.anu.edu.au/projects/linkage.html
  """
  
  cdef int len1
  cdef int len2
  cdef int halflen
  cdef int common1
  cdef int common2
  cdef int start
  cdef int end
  cdef int index
  cdef float transposition
  cdef float w
  
  # Quick check if the strings are the same
  #
  if (str1 == str2):
    return 1.0

  len1 = len(str1)
  len2 = len(str2)
  halflen = max(len1, len2) / 2 + 1

  ass1 = ''  # Characters assigned in str1
  ass2 = ''  # Characters assigned in str2

  workstr1 = str1  # Copy of original string
  workstr2 = str2

  common1 = 0  # Number of common characters
  common2 = 0

  # Analyse the first string 
  #
  for i in range(len1):
    start = max(0,i-halflen)
    end   = min(i+halflen+1,len2)
    index = workstr2.find(str1[i],start,end)
    if (index > -1):  # Found common character
      common1 = common1 + 1
      ass1 = ass1+str1[i]
      workstr2 = workstr2[:index]+'*'+workstr2[index+1:]

  # Analyse the second string
  #
  for i in range(len2):
    start = max(0,i-halflen)
    end   = min(i+halflen+1,len1)
    index = workstr1.find(str2[i],start,end)
    if (index > -1):  # Found common character
      common2 = common2 + 1
      ass2 = ass2 + str2[i]
      workstr1 = workstr1[:index]+'*'+workstr1[index+1:]

  if (common1 != common2):
    # common1 is declared cdef int, so this average is truncated to an int
    common1 = float(common1+common2) / 2.0

  if (common1 == 0):
    return 0.0

  # Compute number of transpositions
  #
  transposition = 0
  for i in range(len(ass1)):
    if (ass1[i] != ass2[i]):
      transposition = transposition + 1
  transposition = transposition / 2.0

  common1 = float(common1)
  w = 1./3.*(common1 / float(len1) + common1 / float(len2) + \
           (common1-transposition) / common1)

  return w
############################################
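The score returned on the last lines of jaro() is the Jaro measure. Here c is common1, the number of common characters (averaged if common1 and common2 differ), t is transposition (half the number of positions at which ass1 and ass2 disagree), and |s_1|, |s_2| are len1 and len2. Worked through for the strings MARTHA and MARHTA, chosen here only as an illustration, the code gives c = 6 and t = 1:

```latex
w = \frac{1}{3}\left(\frac{c}{|s_1|} + \frac{c}{|s_2|} + \frac{c - t}{c}\right)

w_{\mathrm{MARTHA,MARHTA}} = \frac{1}{3}\left(\frac{6}{6} + \frac{6}{6} + \frac{6 - 1}{6}\right) = \frac{17}{18} \approx 0.944
```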

Tim C






More information about the Python-list mailing list