SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
zljubisicmob at gmail.com
Sat May 9 06:31:08 EDT 2015
Steven,
please take a look at the code below:
# C:\Users\zoran\PycharmProjects\mm_align\hrt3.cfg contents
# [Dir]
# ROOTDIR = C:\Users\zoran\hrt
import os
import shutil
import configparser
import requests
import re
Config = configparser.ConfigParser()
Config.optionxform = str # preserve case in ini file
cfg_file = os.path.join('C:\\Users\\zoran\\PycharmProjects\\mm_align\\hrt3.cfg' )
Config.read(cfg_file)
ROOTDIR = Config.get('Dir', 'ROOTDIR')
print(ROOTDIR)
html = requests.get("http://radio.hrt.hr/prvi-program/arhiva/ujutro-prvi-poligraf-politicki-grafikon/118/").text
art_html = re.search('<article id="aod_0">(.+?)</article>', html, re.DOTALL).group(1)
for p_tag in re.finditer(r'<p>(.*?)</p>', art_html, re.DOTALL):
    if '<strong>' not in p_tag.group(1):
        title = p_tag.group(1)
title = title[:232]
title = title.replace(" ", "_").replace("/", "_").replace("!", "_").replace("?", "_")\
             .replace('"', "_").replace(':', "_").replace(',', "_").replace('"', '')\
             .replace('\n', '_').replace("'", '')
print(title)
src_file = os.path.join(ROOTDIR, 'src_' + title + '.txt')
dst_file = os.path.join(ROOTDIR, 'des_' + title + '.txt')
print(len(src_file), src_file)
print(len(dst_file), dst_file)
with open(src_file, mode='w', encoding='utf-8') as s_file:
    s_file.write('test')
shutil.move(src_file, dst_file)
It works, but if you change title = title[:232] to title = title[:233], you will get "FileNotFoundError: [Errno 2] No such file or directory".
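A likely explanation for the 232/233 boundary, which this message does not state itself, is the classic Windows MAX_PATH limit of 260 characters (counting the terminating NUL, so 259 usable characters). The sketch below only does the arithmetic for the ROOTDIR shown above. The helper name full_path_length and the placeholder title are made up for illustration, and ntpath is used so the calculation behaves the same on any platform.

```python
import ntpath  # Windows path rules, importable on any platform

ROOTDIR = r"C:\Users\zoran\hrt"   # value from hrt3.cfg in the post
MAX_PATH = 260                    # classic Windows limit, including the terminating NUL


def full_path_length(title_len, prefix="src_"):
    # Hypothetical helper: length of the full path for a title of title_len characters.
    title = "x" * title_len       # placeholder title; only its length matters here
    return len(ntpath.join(ROOTDIR, prefix + title + ".txt"))


for n in (232, 233):
    length = full_path_length(n)
    print(n, length, "fits" if length < MAX_PATH else "too long")
```

Both prefixes ('src_' and 'des_') have the same length, so the source and destination paths cross the limit at the same title length.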
As you can see, ROOTDIR contains \U.
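To illustrate the point about \U, here is a minimal sketch that assumes the config text is supplied as a string rather than read from hrt3.cfg. configparser returns the value verbatim, so the backslashes cause no trouble. The unicodeescape error from the subject line only appears when the same path is written as a non-raw string literal in source code, which the compile() call demonstrates. The names CFG_TEXT, root_dir and literal_fails are introduced here for the example.

```python
import configparser

# Same section and key as hrt3.cfg in the post; a raw string keeps the
# backslashes exactly as they would appear in the file on disk.
CFG_TEXT = r"""
[Dir]
ROOTDIR = C:\Users\zoran\hrt
"""

config = configparser.ConfigParser()
config.optionxform = str          # preserve key case, as in the post
config.read_string(CFG_TEXT)
root_dir = config.get('Dir', 'ROOTDIR')
print(root_dir, len(root_dir))

# The same path as a non-raw literal in source code is rejected at compile time.
try:
    compile(r"p = 'C:\Users\zoran\hrt'", "<demo>", "exec")
    literal_fails = False
except SyntaxError as exc:
    literal_fails = True
    print("SyntaxError:", exc.msg)
```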
Regards.