[Tutor] Re: Tutor Digest, Vol 8, Issue 67

Fri Oct 22 12:03:28 CEST 2004

A few tweaks for your current algorithm:

- Don't do the sqrt in get_distance(), you don't need the actual distance. 
Change Sequence.distance to 0.005**2 and everything will work the same.

- Keep the dict keys as floats so you don't have to convert every time in 
get_distance(), the conversion is likely to be expensive.

- This is gross abuse of a dictionary! You are getting the keys as a list 
and searching the list!!
         if s.start in endpoints.keys():
             endpoints[s.start].append(s)

instead use
         if s.start in endpoints:
             endpoints[s.start].append(s)

or my preference which only looks up s.start once:
       try:
             endpoints[s.start].append(s)
       except KeyError: pass

Not sure why you have this. s.start will only be in endpoints once:
>         if self.points[0] == self.points[-1]:
>             try: endpoints.pop(s.start)
>             except KeyError: pass # I don't understand why this exc. is 
> called

Kent

At 09:49 AM 10/22/2004 +0200, Michael Dunn wrote:
> > Message: 9
> > Date: Thu, 21 Oct 2004 20:29:42 -0400
> > From: Kent Johnson <kent_johnson at skillsoft.com>
> > Subject: Re: [Tutor] Re: processing lines and polygons
> >
> > Michael,
> >
> > The problem with snap-to-grid is you can still have two points that are
> > close but snap to different grid points. Maybe you could put all the points
> > into a sorted list and go through it looking for points that are close? If
> > the list contained the point and the list containing the point, you could
> > merge them...
> >
> > Kent
>
>Hi Kent,
>
>Yeah, the problem with points snapping to different grid points occurred to me
>before I got too far, so I went with just iterating through the list of
>endpoints searching for near points (I specifically don't want to join any
>lines by their middles, which makes the task easier). Current problems:
>
>1. This search is very slow, which is only a bit of a problem, since I don't
>need to run this script many times once it's working. However, maybe you (or
>anybody else interested) could take a look at the get_distance function. There
>must be some way of sorting the list of points so I don't have to iterate
>through the entire thing.
>
>2. The endpoint matching routine fails if a segment should be matched at both
>ends. I think I can get around this by filtering the data through the script a
>few times (i.e. until no further changes are made) before doing the format
>conversion.
>
>Here's where things stand:
>
>#!/usr/bin/env python
>"""Take matlab/splus or mapgen vector and join broken lines
>Input data from: http://rimmer.ngdc.noaa.gov/mgg/coast/getcoast.html"""
>import re, sys
>from math import sqrt
>
>class Sequence:
>     _trace = False # append list of contributing input sequences to output
>     output_format = "matlab" # matlab, mapgen, mif (MapInfo Interchange 
> Format)
>     distance = 0.005 # join open ends <= Sequence.distance
>
>     def __init__(self, segment):
>         self.points = [tuple(point.split("\t")) for point\
>                 in segment.strip().split("\n")]
>         self.start = self.points[0]
>         self.end = self.points[-1]
>         self._history = []
>         return
>
>     def append(self, other):
>         # only called if self is already in the output_sequences list
>         self._history.extend(other._history)
>         # reverse the candidate sequence if necessary
>         if get_distance(self.start, other.start) <= Sequence.distance or\
>                 get_distance(self.end, other.end) <= Sequence.distance:
>             other.points.reverse()
>             other.start, other.end = other.end, other.start
>         # do the join
>         add_key, remove_key = None, None
>         if self.start == other.end:
>             self.points = other.points[:-1] + self.points
>             add_key, remove_key = other.start, self.start
>         elif get_distance(self.start, other.end) <= Sequence.distance:
>             self.points = other.points + self.points
>             add_key, remove_key = other.start, self.start
>         elif self.end == other.start:
>             self.points = self.points + other.points[1:]
>             add_key, remove_key = other.end, self.end
>         elif get_distance(self.end, other.start) <= Sequence.distance:
>             self.points = self.points + other.points
>             add_key, remove_key = other.end, self.end
>         # close up near-polygons
>         if get_distance(self.points[0], self.points[-1]) <= 
> Sequence.distance and\
>                 self.points[0] != self.points[-1]: # close but not coincident
>             self.points.append(self.points[0])
>         # update endpoints dictionary
>         if self.points[0] == self.points[-1]:
>             try: endpoints.pop(s.start)
>             except KeyError: pass # I don't understand why this exc. is 
> called
>         if remove_key:
>             try: endpoints.pop(remove_key)
>             except KeyError: pass # Likewise...
>         if add_key:
>             endpoints[add_key] = self
>         self.start = self.points[0]
>         self.end = self.points[-1]
>         return
>
>     def __str__(self):
>         if Sequence.output_format == "mif":
>             divider = "PLine\n   %s" % len(self.points)
>         elif Sequence.output_format == "matlab":
>             divider = "NA NA"
>         elif Sequence.output_format == "mapgen":
>             divider = "# -b"
>         if Sequence._trace:
>             divider = "%s <-- %s" % (divider, str(self._history))
>         s = [divider]
>         for point in self.points:
>             s.append("%s\t%s" % point)
>         return "\n".join(s)
>
>def get_distance(a, b):
>     (ax, ay), (bx, by) = a, b
>     return sqrt((float(ax)-float(bx))**2 + (float(ay)-float(by))**2)
>
>def get_nearest(value, points):
>     # print >> sys.stderr, "Checking for nearest point to", value
>     try: nearest_point = points[0]
>     except IndexError: return (0,0)
>     distance = get_distance(value, nearest_point)
>     for point in points[1:]:
>         if get_distance(value, point) < distance:
>             nearest_point = point
>             distance = get_distance(value, point)
>     return nearest_point
>
>endpoints = {}
>output_sequences = []
>mif_header = """version 3
>Coordsys earth projection 1,104
>Columns 1
>   SEG_ID integer
>Data
>"""
>
>if __name__ == "__main__":
>     f = file(sys.argv[1], "rU")
>     data = f.read()
>     f.close()
>     print >> sys.stderr, "Processing sequences"
>     input_sequences = re.split("NA NA|# -b", data.strip())
>     for i, s in enumerate(input_sequences[1:]):
>         s = Sequence(s)
>         s._history.append(i)
>         start_snap, end_snap = None, None
>         if s.start in endpoints.keys():
>             endpoints[s.start].append(s)
>         elif s.end in endpoints.keys():
>             endpoints[s.end].append(s)
>         else:
>             start_snap = get_nearest(s.start, endpoints.keys())
>             end_snap = get_nearest(s.end, endpoints.keys())
>         # this fails if a new sequence should be joined at both ends
>         # a possible workaround would be to filter the data through the 
> program
>         # a few times before converting to .mif
>         if start_snap and get_distance(s.start, start_snap) <= 
> Sequence.distance:
>             endpoints[start_snap].append(s)
>         elif end_snap and get_distance(s.end, end_snap) <= Sequence.distance:
>             endpoints[end_snap].append(s)
>         else:
>             endpoints[s.start] = s
>             endpoints[s.end] = s
>             output_sequences.append(s)
>     # print mif_header
>     # Sequence.output_format = "mif"
>     for s in output_sequences:
>         print s
>     print >> sys.stderr, "Original: %s sequences\nJoined: %s sequences" %\
>             (len(input_sequences), len(output_sequences))
>
>
>_______________________________________________
>Tutor maillist  -  Tutor at python.org
>http://mail.python.org/mailman/listinfo/tutor