[Tutor] Re: Tutor Digest, Vol 8, Issue 67
Kent Johnson
kent_johnson at skillsoft.com
Fri Oct 22 12:03:28 CEST 2004
A few tweaks for your current algorithm:
- Don't do the sqrt in get_distance(), you don't need the actual distance.
Change Sequence.distance to 0.005**2 and everything will work the same.
- Keep the dict keys as floats so you don't have to convert every time in
get_distance(), the conversion is likely to be expensive.
- This is gross abuse of a dictionary! You are getting the keys as a list
and searching the list!!
if s.start in endpoints.keys():
endpoints[s.start].append(s)
instead use
if s.start in endpoints:
endpoints[s.start].append(s)
or my preference which only looks up s.start once:
try:
endpoints[s.start].append(s)
except KeyError: pass
Not sure why you have this. s.start will only be in endpoints once:
> if self.points[0] == self.points[-1]:
> try: endpoints.pop(s.start)
> except KeyError: pass # I don't understand why this exc. is
> called
Kent
At 09:49 AM 10/22/2004 +0200, Michael Dunn wrote:
> > Message: 9
> > Date: Thu, 21 Oct 2004 20:29:42 -0400
> > From: Kent Johnson <kent_johnson at skillsoft.com>
> > Subject: Re: [Tutor] Re: processing lines and polygons
> >
> > Michael,
> >
> > The problem with snap-to-grid is you can still have two points that are
> > close but snap to different grid points. Maybe you could put all the points
> > into a sorted list and go through it looking for points that are close? If
> > the list contained the point and the list containing the point, you could
> > merge them...
> >
> > Kent
>
>Hi Kent,
>
>Yeah, the problem with points snapping to different grid points occurred to me
>before I got too far, so I went with just iterating through the list of
>endpoints searching for near points (I specifically don't want to join any
>lines by their middles, which makes the task easier). Current problems:
>
>1. This search is very slow, which is only a bit of a problem, since I don't
>need to run this script many times once it's working. However, maybe you (or
>anybody else interested) could take a look at the get_distance function. There
>must be some way of sorting the list of points so I don't have to iterate
>through the entire thing.
>
>2. The endpoint matching routine fails if a segment should be matched at both
>ends. I think I can get around this by filtering the data through the script a
>few times (i.e. until no further changes are made) before doing the format
>conversion.
>
>Here's where things stand:
>
>#!/usr/bin/env python
>"""Take matlab/splus or mapgen vector and join broken lines
>Input data from: http://rimmer.ngdc.noaa.gov/mgg/coast/getcoast.html"""
>import re, sys
>from math import sqrt
>
>class Sequence:
> _trace = False # append list of contributing input sequences to output
> output_format = "matlab" # matlab, mapgen, mif (MapInfo Interchange
> Format)
> distance = 0.005 # join open ends <= Sequence.distance
>
> def __init__(self, segment):
> self.points = [tuple(point.split("\t")) for point\
> in segment.strip().split("\n")]
> self.start = self.points[0]
> self.end = self.points[-1]
> self._history = []
> return
>
> def append(self, other):
> # only called if self is already in the output_sequences list
> self._history.extend(other._history)
> # reverse the candidate sequence if necessary
> if get_distance(self.start, other.start) <= Sequence.distance or\
> get_distance(self.end, other.end) <= Sequence.distance:
> other.points.reverse()
> other.start, other.end = other.end, other.start
> # do the join
> add_key, remove_key = None, None
> if self.start == other.end:
> self.points = other.points[:-1] + self.points
> add_key, remove_key = other.start, self.start
> elif get_distance(self.start, other.end) <= Sequence.distance:
> self.points = other.points + self.points
> add_key, remove_key = other.start, self.start
> elif self.end == other.start:
> self.points = self.points + other.points[1:]
> add_key, remove_key = other.end, self.end
> elif get_distance(self.end, other.start) <= Sequence.distance:
> self.points = self.points + other.points
> add_key, remove_key = other.end, self.end
> # close up near-polygons
> if get_distance(self.points[0], self.points[-1]) <=
> Sequence.distance and\
> self.points[0] != self.points[-1]: # close but not coincident
> self.points.append(self.points[0])
> # update endpoints dictionary
> if self.points[0] == self.points[-1]:
> try: endpoints.pop(s.start)
> except KeyError: pass # I don't understand why this exc. is
> called
> if remove_key:
> try: endpoints.pop(remove_key)
> except KeyError: pass # Likewise...
> if add_key:
> endpoints[add_key] = self
> self.start = self.points[0]
> self.end = self.points[-1]
> return
>
> def __str__(self):
> if Sequence.output_format == "mif":
> divider = "PLine\n %s" % len(self.points)
> elif Sequence.output_format == "matlab":
> divider = "NA NA"
> elif Sequence.output_format == "mapgen":
> divider = "# -b"
> if Sequence._trace:
> divider = "%s <-- %s" % (divider, str(self._history))
> s = [divider]
> for point in self.points:
> s.append("%s\t%s" % point)
> return "\n".join(s)
>
>def get_distance(a, b):
> (ax, ay), (bx, by) = a, b
> return sqrt((float(ax)-float(bx))**2 + (float(ay)-float(by))**2)
>
>def get_nearest(value, points):
> # print >> sys.stderr, "Checking for nearest point to", value
> try: nearest_point = points[0]
> except IndexError: return (0,0)
> distance = get_distance(value, nearest_point)
> for point in points[1:]:
> if get_distance(value, point) < distance:
> nearest_point = point
> distance = get_distance(value, point)
> return nearest_point
>
>endpoints = {}
>output_sequences = []
>mif_header = """version 3
>Coordsys earth projection 1,104
>Columns 1
> SEG_ID integer
>Data
>"""
>
>if __name__ == "__main__":
> f = file(sys.argv[1], "rU")
> data = f.read()
> f.close()
> print >> sys.stderr, "Processing sequences"
> input_sequences = re.split("NA NA|# -b", data.strip())
> for i, s in enumerate(input_sequences[1:]):
> s = Sequence(s)
> s._history.append(i)
> start_snap, end_snap = None, None
> if s.start in endpoints.keys():
> endpoints[s.start].append(s)
> elif s.end in endpoints.keys():
> endpoints[s.end].append(s)
> else:
> start_snap = get_nearest(s.start, endpoints.keys())
> end_snap = get_nearest(s.end, endpoints.keys())
> # this fails if a new sequence should be joined at both ends
> # a possible workaround would be to filter the data through the
> program
> # a few times before converting to .mif
> if start_snap and get_distance(s.start, start_snap) <=
> Sequence.distance:
> endpoints[start_snap].append(s)
> elif end_snap and get_distance(s.end, end_snap) <= Sequence.distance:
> endpoints[end_snap].append(s)
> else:
> endpoints[s.start] = s
> endpoints[s.end] = s
> output_sequences.append(s)
> # print mif_header
> # Sequence.output_format = "mif"
> for s in output_sequences:
> print s
> print >> sys.stderr, "Original: %s sequences\nJoined: %s sequences" %\
> (len(input_sequences), len(output_sequences))
>
>
>_______________________________________________
>Tutor maillist - Tutor at python.org
>http://mail.python.org/mailman/listinfo/tutor
More information about the Tutor
mailing list