[Python-checkins] r83448 - in python/branches/release31-maint: Lib/test/test_robotparser.py Lib/urllib/robotparser.py Misc/NEWS

georg.brandl python-checkins at python.org
Sun Aug 1 22:57:27 CEST 2010


Author: georg.brandl
Date: Sun Aug  1 22:57:27 2010
New Revision: 83448

Log:
Merged revisions 83238 via svnmerge from 
svn+ssh://svn.python.org/python/branches/py3k

........
  r83238 | georg.brandl | 2010-07-29 19:55:01 +0200 (Thu, 29 Jul 2010) | 1 line
  
  #4108: the first default entry (User-agent: *) wins.
........

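The effect of the change can be reproduced through the public API. The following is a minimal sketch, assuming an interpreter whose urllib.robotparser includes this fix; the robots.txt body mirrors test 15 in the diff, while the agent name "ExampleBot" and the host www.example.com are placeholders chosen for illustration and do not appear in the commit.

```python
import urllib.robotparser

# Two default entries, as in test 15 of this commit.
ROBOTS_TXT = """\
User-agent: *
Disallow: /some/path

User-agent: *
Disallow: /another/path
"""

parser = urllib.robotparser.RobotFileParser()
# parse() takes an iterable of lines, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" is a hypothetical agent with no entry of its own,
# so the default (User-agent: *) entry decides the outcome.
some_allowed = parser.can_fetch("ExampleBot", "http://www.example.com/some/path")
another_allowed = parser.can_fetch("ExampleBot", "http://www.example.com/another/path")

print("/some/path allowed:", some_allowed)
print("/another/path allowed:", another_allowed)
```

With the first default entry winning, /some/path should be refused and /another/path allowed, because the second "User-agent: *" block is ignored.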

Modified:
   python/branches/release31-maint/   (props changed)
   python/branches/release31-maint/Lib/test/test_robotparser.py
   python/branches/release31-maint/Lib/urllib/robotparser.py
   python/branches/release31-maint/Misc/NEWS

Modified: python/branches/release31-maint/Lib/test/test_robotparser.py
==============================================================================
--- python/branches/release31-maint/Lib/test/test_robotparser.py	(original)
+++ python/branches/release31-maint/Lib/test/test_robotparser.py	Sun Aug  1 22:57:27 2010
@@ -216,6 +216,20 @@
 
 RobotTest(14, doc, good, bad)
 
+# 15. For issue #4108 (obey first * entry)
+doc = """
+User-agent: *
+Disallow: /some/path
+
+User-agent: *
+Disallow: /another/path
+"""
+
+good = ['/another/path']
+bad = ['/some/path']
+
+RobotTest(15, doc, good, bad)
+
 
 class NetworkTestCase(unittest.TestCase):
 

Modified: python/branches/release31-maint/Lib/urllib/robotparser.py
==============================================================================
--- python/branches/release31-maint/Lib/urllib/robotparser.py	(original)
+++ python/branches/release31-maint/Lib/urllib/robotparser.py	Sun Aug  1 22:57:27 2010
@@ -66,7 +66,9 @@
     def _add_entry(self, entry):
         if "*" in entry.useragents:
             # the default entry is considered last
-            self.default_entry = entry
+            if self.default_entry is None:
+                # the first default entry wins
+                self.default_entry = entry
         else:
             self.entries.append(entry)
 
@@ -118,7 +120,7 @@
                         entry.rulelines.append(RuleLine(line[1], True))
                         state = 2
         if state == 2:
-            self.entries.append(entry)
+            self._add_entry(entry)
 
 
     def can_fetch(self, useragent, url):

Modified: python/branches/release31-maint/Misc/NEWS
==============================================================================
--- python/branches/release31-maint/Misc/NEWS	(original)
+++ python/branches/release31-maint/Misc/NEWS	Sun Aug  1 22:57:27 2010
@@ -84,6 +84,9 @@
 Library
 -------
 
+- Issue #4108: In urllib.robotparser, if there are multiple 'User-agent: *'
+  entries, consider the first one.
+
 - Issue #8397: Raise an error when attempting to mix iteration and regular
   reads on a BZ2File object, rather than returning incorrect results.
 

