[Python-checkins] bpo-41048: mimetypes should read the rule file using UTF-8, not the locale encoding (GH-20998)

Srinivas Reddy Thatiparthy (శ్రీనివాస్ రెడ్డి తాటిపర్తి) webhook-mailer at python.org
Mon Jun 29 04:37:08 EDT 2020


https://github.com/python/cpython/commit/7f569c9bc0079906012b3034d30fe8abc742e7fc
commit: 7f569c9bc0079906012b3034d30fe8abc742e7fc
branch: master
author: Srinivas Reddy Thatiparthy (శ్రీనివాస్  రెడ్డి తాటిపర్తి) <thatiparthysreenivas at gmail.com>
committer: GitHub <noreply at github.com>
date: 2020-06-29T11:36:48+03:00
summary:

bpo-41048: mimetypes should read the rule file using UTF-8, not the locale encoding (GH-20998)

files:
A Misc/NEWS.d/next/Library/2020-06-20-10-16-57.bpo-41048.hEXB-B.rst
M Lib/mimetypes.py
M Lib/test/test_mimetypes.py
M Misc/ACKS

diff --git a/Lib/mimetypes.py b/Lib/mimetypes.py
index 61bfff1635911..f3343c805452d 100644
--- a/Lib/mimetypes.py
+++ b/Lib/mimetypes.py
@@ -372,7 +372,7 @@ def init(files=None):
 
 def read_mime_types(file):
     try:
-        f = open(file)
+        f = open(file, encoding='utf-8')
     except OSError:
         return None
     with f:
diff --git a/Lib/test/test_mimetypes.py b/Lib/test/test_mimetypes.py
index 9cac6ce0225e1..683d393fdb491 100644
--- a/Lib/test/test_mimetypes.py
+++ b/Lib/test/test_mimetypes.py
@@ -67,6 +67,18 @@ def test_read_mime_types(self):
             mime_dict = mimetypes.read_mime_types(file)
             eq(mime_dict[".pyunit"], "x-application/x-unittest")
 
+        # bpo-41048: read_mime_types should read the rule file as UTF-8,
+        # not with the locale encoding. _bootlocale is imported and patched
+        # because io.open(...) uses it to pick the default encoding.
+        with support.temp_dir() as directory:
+            data = "application/no-mans-land  Fran\u00E7ais"
+            file = pathlib.Path(directory, "sample.mimetype")
+            file.write_text(data, encoding='utf-8')
+            import _bootlocale
+            with support.swap_attr(_bootlocale, 'getpreferredencoding', lambda do_setlocale=True: 'ASCII'):
+                mime_dict = mimetypes.read_mime_types(file)
+            eq(mime_dict[".Français"], "application/no-mans-land")
+
     def test_non_standard_types(self):
         eq = self.assertEqual
         # First try strict
diff --git a/Misc/ACKS b/Misc/ACKS
index 87f0dede365c2..641ef0cace00e 100644
--- a/Misc/ACKS
+++ b/Misc/ACKS
@@ -1706,6 +1706,7 @@ Mikhail Terekhov
 Victor Terrón
 Pablo Galindo
 Richard M. Tew
+Srinivas Reddy Thatiparthy
 Tobias Thelen
 Christian Theune
 Févry Thibault
diff --git a/Misc/NEWS.d/next/Library/2020-06-20-10-16-57.bpo-41048.hEXB-B.rst b/Misc/NEWS.d/next/Library/2020-06-20-10-16-57.bpo-41048.hEXB-B.rst
new file mode 100644
index 0000000000000..2595900137d69
--- /dev/null
+++ b/Misc/NEWS.d/next/Library/2020-06-20-10-16-57.bpo-41048.hEXB-B.rst
@@ -0,0 +1,2 @@
+:func:`mimetypes.read_mime_types` function reads the rule file using UTF-8 encoding, not the locale encoding.
+Patch by Srinivas Reddy Thatiparthy.
\ No newline at end of file
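
The behaviour this change targets can be tried outside the test suite. The sketch below is not part of the commit. It assumes an interpreter that already includes this fix (or one whose locale encoding happens to be UTF-8). The helper name load_rules and the file name sample.mimetype are illustrative only; the sample rule is the one used in the new test.

```python
import mimetypes
import pathlib
import tempfile


def load_rules(text):
    """Write `text` to a temporary UTF-8 file and parse it with read_mime_types.

    Hypothetical helper for illustration; it is not part of the mimetypes module.
    """
    with tempfile.TemporaryDirectory() as directory:
        path = pathlib.Path(directory, "sample.mimetype")
        path.write_text(text, encoding="utf-8")
        # With the fix applied, the file is decoded as UTF-8 regardless of locale.
        return mimetypes.read_mime_types(path)


# Same rule as in the new test: one MIME type mapped to a non-ASCII suffix.
rules = load_rules("application/no-mans-land  Fran\u00E7ais\n")
print(rules[".Fran\u00E7ais"])
```

Before the change, the same call relied on the locale encoding, so a rule file containing non-ASCII suffixes could fail to decode under an ASCII locale.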


