SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

zljubisicmob at gmail.com
Sat May 9 06:31:08 EDT 2015


Steven,

please take a look at the code below:

# C:\Users\zoran\PycharmProjects\mm_align\hrt3.cfg contents
# [Dir]
# ROOTDIR = C:\Users\zoran\hrt


import os
import shutil
import configparser
import requests
import re

Config = configparser.ConfigParser()
Config.optionxform = str # preserve case in ini file
cfg_file = 'C:\\Users\\zoran\\PycharmProjects\\mm_align\\hrt3.cfg'
Config.read(cfg_file)



ROOTDIR = Config.get('Dir', 'ROOTDIR')

print(ROOTDIR)

html = requests.get("http://radio.hrt.hr/prvi-program/arhiva/ujutro-prvi-poligraf-politicki-grafikon/118/").text

art_html = re.search('<article id="aod_0">(.+?)</article>', html, re.DOTALL).group(1)
for p_tag in re.finditer(r'<p>(.*?)</p>', art_html, re.DOTALL):
    if '<strong>' not in p_tag.group(1):
        title = p_tag.group(1)

title = title[:232]
title = title.replace(" ", "_").replace("/", "_").replace("!", "_").replace("?", "_")\
                    .replace('"', "_").replace(':', "_").replace(',', "_")\
                    .replace('\n', '_').replace('&#39', '')

print(title)

src_file = os.path.join(ROOTDIR, 'src_' + title + '.txt')
dst_file = os.path.join(ROOTDIR, 'des_' + title + '.txt')

print(len(src_file), src_file)
print(len(dst_file), dst_file)

with open(src_file, mode='w', encoding='utf-8') as s_file:
    s_file.write('test')


shutil.move(src_file, dst_file)

It works, but if you change title = title[:232] to title = title[:233], you will get "FileNotFoundError: [Errno 2] No such file or directory".
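The script already prints len(src_file) and len(dst_file), so the total path length is worth checking against the 232/233 boundary. The sketch below does that arithmetic under stated assumptions: ROOTDIR is exactly C:\Users\zoran\hrt (as in hrt3.cfg above), the sliced title keeps all of its characters after the replace() calls, and the limit in question is the classic Windows MAX_PATH of 260 characters, which counts the terminating NUL. The MAX_PATH figure is general Windows background, not something stated in the post, and build_path and fits_max_path are hypothetical helpers. ntpath is used so the Windows join rules apply on any OS.

```python
import ntpath  # Windows path rules, importable on any OS

# Assumption: the classic Windows MAX_PATH limit of 260 characters, which
# counts the terminating NUL, leaving at most 259 usable characters
# (unless long-path support is enabled).
MAX_PATH = 260

# Value of ROOTDIR from hrt3.cfg as shown in the post.
ROOTDIR = r'C:\Users\zoran\hrt'


def build_path(rootdir: str, prefix: str, title: str) -> str:
    """Hypothetical helper mirroring os.path.join(ROOTDIR, prefix + title + '.txt')."""
    return ntpath.join(rootdir, prefix + title + '.txt')


def fits_max_path(path: str) -> bool:
    """Hypothetical helper: True if path leaves room for the terminating NUL."""
    return len(path) < MAX_PATH


for n in (232, 233):
    title = 'x' * n  # placeholder title of exactly n characters
    path = build_path(ROOTDIR, 'src_', title)
    print(n, len(path), fits_max_path(path))
```

This prints "232 259 True" and then "233 260 False": 18 characters of ROOTDIR, one separator, "src_" or "des_", the title, and ".txt" add up to 27 plus the title length. Under the assumptions above, the boundary lines up exactly with MAX_PATH, which would explain the error on a system without long-path support.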
As you can see, ROOTDIR contains \U.
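The \U suspicion can be tested on its own. Python only processes escape sequences such as \UXXXXXXXX when it compiles a string literal in source code; a value read from a file at runtime is plain text. The sketch below assumes the hrt3.cfg contents shown in the post, feeds them to configparser through read_string instead of a file on disk, and then compiles a plain (non-raw) literal to reproduce the SyntaxError from the subject line. CFG_TEXT, bad_source, and literal_fails are names chosen for this illustration.

```python
import configparser

# Contents of hrt3.cfg as shown in the post. A raw string keeps this
# sketch itself from applying escape processing to the backslashes.
CFG_TEXT = r"""
[Dir]
ROOTDIR = C:\Users\zoran\hrt
"""

config = configparser.ConfigParser()
config.optionxform = str  # preserve case, as in the original script
config.read_string(CFG_TEXT)
rootdir = config.get('Dir', 'ROOTDIR')

# Values read at runtime are never escape-processed: '\U' stays two
# ordinary characters, a backslash followed by 'U'.
print(rootdir, len(rootdir))

# Escape processing happens only when Python compiles a string literal.
# This source text contains a single backslash before 'U', so compiling
# it raises the "truncated \UXXXXXXXX escape" SyntaxError.
bad_source = "p = 'C:\\Users'"
try:
    compile(bad_source, '<demo>', 'exec')
    literal_fails = False
except SyntaxError:
    literal_fails = True
print('plain literal raises SyntaxError:', literal_fails)
```

The value read from the config file comes back unchanged (18 characters), which suggests the \U in ROOTDIR is unlikely to be what changes between 232 and 233 characters. The escape only matters for paths typed into the script itself, and the cfg_file literal in the script already avoids it.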

Regards.


