From george.sakkis at gmail.com Thu Dec 3 01:49:02 2009
From: george.sakkis at gmail.com (George Sakkis)
Date: Thu, 3 Dec 2009 02:49:02 +0200
Subject: [PyAthens] Filename corruption
Message-ID: <91ad5bf80912021649u79b22198ua7461f57335e7f55@mail.gmail.com>

??? ????? ??????? ?? Python ?? ????????, ???? ???? ??? ????? ???????? ??? encodings ??? unicode ???? ????????? ????????? ???? ?? ???? ?? ??????? ??? ??????.

???? ??? ???????? ???????? ??????? ??? ??? ????? ?? ???? (NTFS ??? ?? ???) ???? Linux ?????? ?? ?????? ?? charset ??? mount ?? ?????????? ?????? ?????? ?? ?????????? ?????????? ?? ?? ??????? ?????. ?????? ??????? ???????????????? ?? ??????????? ('?'), ???? ?? ?????? ??????? ??? ???? ?????? ???????? (?? ????? ??? ????? ??? ?? ???? ???? ?? ????? ?? ??? ?? copy ??? ????? ??? ?? ??? ??? ?????? ?? mounting options ? ???? ???? ???????). ???????? ??? ???????????????? ?? ??????????? ??? ????? ???????????? ???? ?? ????? ?? ???????? ?? ???????? ?? ???????. ???????? ?? ??????? ?? encoding ?? ??? Universal Encoding Detector (http://chardet.feedparser.org/) ??? ??? ????? utf-8 ??? ?? ???????????, ??? ??????? ?? ?????? ???????????. ???'??? ???? ?? ??????? ?????.

???????? ????????? ????? ?????? ??? ??? ??????? directory (???? ????? ???? extract ???? ??? ????? ???? ???? ???????). ?? ??????? ???????? ?? ????????????????? ?? ?????????? ??????? ???? free ?????????/?????/????? ??? ??????? meeting ;-)

>>> import os
>>> words = sorted(set(f.split()[0] for f in os.listdir('.')))
>>> words
['00021.wmv',
 'KET\xe2\x95\xac\xd0\xa8',
 '\xce\x95\xcf\x80\xce\xb9\xcf\x83\xce\xba\xce\xb5\xcf\x80\xcf\x84\xce\xae\xcf\x81\xce\xb9\xce\xbf',
 '\xe2\x95\xac\xd0\xb0\xe2\x95\xac\xe2\x94\x90\xe2\x95\xac\xe2\x95\x97\xe2\x95\xac\xe2\x95\xa3\xe2\x95\xa7\xd0\x94\xe2\x95\xac\xe2\x95\xa1\xe2\x95\xac\xd0\xbf\xe2\x95\xac\xe2\x96\x92',
 '\xe2\x95\xac\xd0\xb3\xe2\x95\xa7\xd0\x94\xe2\x95\xac\xe2\x96\x92']
>>> import chardet
>>> for w in words: print w, chardet.detect(w)
...
00021.wmv {'confidence': 1.0, 'encoding': 'ascii'}
KET╬Ш {'confidence': 0.8191677051323929, 'encoding': 'IBM855'}
Επισκεπτήριο {'confidence': 0.98999999999999999, 'encoding': 'utf-8'}
╬а╬┐╬╗╬╣╧Д╬╡╬п╬▒ {'confidence': 0.98999999999999999, 'encoding': 'utf-8'}
╬г╧Д╬▒ {'confidence': 0.98999999999999999, 'encoding': 'utf-8'}

From tzot at sil-tec.gr Fri Dec 4 14:51:11 2009
From: tzot at sil-tec.gr (Christos Georgiou)
Date: Fri, 04 Dec 2009 15:51:11 +0200
Subject: [PyAthens] Filename corruption
In-Reply-To: <91ad5bf80912021649u79b22198ua7461f57335e7f55@mail.gmail.com>
References: <91ad5bf80912021649u79b22198ua7461f57335e7f55@mail.gmail.com>
Message-ID: <4B1913CF.3000000@sil-tec.gr>

George Sakkis wrote:
> ??? ????? ??????? ?? Python ?? ????????, ???? ???? ??? ????? ????????
> ??? encodings ??? unicode ???? ????????? ????????? ???? ?? ???? ??
> ??????? ??? ??????.

> ???????? ????????? ????? ?????? ??? ??? ??????? directory (???? ?????
> ???? extract ???? ??? ????? ???? ???? ???????). ?? ??????? ???????? ??
> ????????????????? ?? ?????????? ??????? ???? free
> ?????????/?????/????? ??? ??????? meeting ;-)

?? ?????? ??? ?????? ????, ?? ?????!

> >>> import os
> >>> words = sorted(set(f.split()[0] for f in os.listdir('.')))
> >>> words
> ['00021.wmv',
> 'KET\xe2\x95\xac\xd0\xa8',
> '\xce\x95\xcf\x80\xce\xb9\xcf\x83\xce\xba\xce\xb5\xcf\x80\xcf\x84\xce\xae\xcf\x81\xce\xb9\xce\xbf',
> '\xe2\x95\xac\xd0\xb0\xe2\x95\xac\xe2\x94\x90\xe2\x95\xac\xe2\x95\x97\xe2\x95\xac\xe2\x95\xa3\xe2\x95\xa7\xd0\x94\xe2\x95\xac\xe2\x95\xa1\xe2\x95\xac\xd0\xbf\xe2\x95\xac\xe2\x96\x92',
> '\xe2\x95\xac\xd0\xb3\xe2\x95\xa7\xd0\x94\xe2\x95\xac\xe2\x96\x92']
>

??? ?? ?????? ??? ???? ??? ????? ????? mount ?? UTF-8, ??? ?? ?????? ???
???????? ?????????????? ???? UTF-8, ?,?? ?? ?? ????? ??? ????.

? ????? ??? ????? ????? ????? ?????? ??????, ?????? ??? ?????? ???? ??????
?????? ????? ????????? ?????????????? ???? ??? ???? filename?
??? ??????????, ??
'\xe2\x95\xac\xd0\xb0\xe2\x95\xac\xe2\x94\x90\xe2\x95\xac\xe2\x95\x97\xe2\x95\xac\xe2\x95\xa3\xe2\x95\xa7\xd0\x94\xe2\x95\xac\xe2\x95\xa1\xe2\x95\xac\xd0\xbf\xe2\x95\xac\xe2\x96\x92'
?????

>>> print s.decode('utf_8').encode('cp855', 'replace').decode('utf_8', 'replace')
?????????

????? ?? ?????, ?? ??????, ??? ?? ??? ????? ???????????? ?? filenames (?.?. ?? ?????????????????? ??? ?????? ?? ????? "???????? ??????? ???????? ??? ?????? ?????? ??? ???? ????????????? ??????.wmv"). ???? ??? ?? ???? ?? ??? ??? ?????? ??? ?????? ??? ???? ????? mount ?? UTF_8, ??? ???? ??? find . -print | gzip -9 >/tmp/filenames.gz , ?? ????? /tmp/filenames.gz ??????? ?? ????? ??? ?????? ??? ?? link.

> >>> import chardet
> >>> for w in words: print w, chardet.detect(w)
> ...
> 00021.wmv {'confidence': 1.0, 'encoding': 'ascii'}
> KET╬Ш {'confidence': 0.8191677051323929, 'encoding': 'IBM855'}
> Επισκεπτήριο {'confidence': 0.98999999999999999, 'encoding': 'utf-8'}
> ╬а╬┐╬╗╬╣╧Д╬╡╬п╬▒ {'confidence': 0.98999999999999999, 'encoding': 'utf-8'}
> ╬г╧Д╬▒ {'confidence': 0.98999999999999999, 'encoding': 'utf-8'}
> _______________________________________________

From dan at car.gr Fri Dec 4 15:49:42 2009
From: dan at car.gr (Daniel Dourvaris)
Date: Fri, 4 Dec 2009 16:49:42 +0200
Subject: [PyAthens] Filename corruption
In-Reply-To: <4B1913CF.3000000@sil-tec.gr>
References: <91ad5bf80912021649u79b22198ua7461f57335e7f55@mail.gmail.com> <4B1913CF.3000000@sil-tec.gr>
Message-ID:

2009/12/4 Christos Georgiou :
> George Sakkis wrote:
> ??? ?? ?????? ??? ???? ??? ????? ????? mount ?? UTF-8, ??? ?? ?????? ???
> ???????? ?????????????? ???? UTF-8, ?,?? ?? ?? ????? ??? ????.
>
> ? ????? ??? ????? ????? ????? ?????? ??????, ?????? ??? ?????? ???? ??????
> ?????? ????? ????????? ?????????????? ???? ??? ???? filename?
> ??? ??????????, ??
> '\xe2\x95\xac\xd0\xb0\xe2\x95\xac\xe2\x94\x90\xe2\x95\xac\xe2\x95\x97\xe2\x95\xac\xe2\x95\xa3\xe2\x95\xa7\xd0\x94\xe2\x95\xac\xe2\x95\xa1\xe2\x95\xac\xd0\xbf\xe2\x95\xac\xe2\x96\x92'
> ?????
>
> >>> print s.decode('utf_8').encode('cp855', 'replace').decode('utf_8', 'replace')
> ?????????

cp866

From andreoua at gmail.com Fri Dec 4 19:17:58 2009
From: andreoua at gmail.com (Andreas Andreou)
Date: Fri, 4 Dec 2009 20:17:58 +0200
Subject: [PyAthens] Συζήτηση για 3η Συνάντηση (Discussion about the 3rd meeting)
In-Reply-To: <4B0FBA99.1060607@gmail.com>
References: <4B0FBA99.1060607@gmail.com>
Message-ID: <745369620912041017h3c93f762if11e4656c04b42f@mail.gmail.com>

So is it still on for tomorrow?

2009/11/27 Aravanis Konstantinos :
> ?? ?? ?????? ? 3? ????????? ?? ????? ?? ??????? 5/12...?!
> --
> Aravanis Kostas / sbos-x
> My web page: www.AravanisKostas.com
> An easy way to learn Python: www.TasPython.eu
>
> _______________________________________________
> PyAthens mailing list
> PyAthens at python.org
> http://mail.python.org/mailman/listinfo/pyathens
>

--
Andreas Andreou - andyhot at apache.org - http://blog.andyhot.gr
Tapestry / Tacos developer
Open Source / JEE Consulting

From george.sakkis at gmail.com Fri Dec 4 20:43:00 2009
From: george.sakkis at gmail.com (George Sakkis)
Date: Fri, 4 Dec 2009 21:43:00 +0200
Subject: [PyAthens] Filename corruption
In-Reply-To: <4B1913CF.3000000@sil-tec.gr>
References: <91ad5bf80912021649u79b22198ua7461f57335e7f55@mail.gmail.com> <4B1913CF.3000000@sil-tec.gr>
Message-ID: <91ad5bf80912041143u377237e2k18eec6a897d6ffd5@mail.gmail.com>

2009/12/4 Christos Georgiou :
> ? ????? ??? ????? ????? ????? ?????? ??????, ?????? ??? ?????? ???? ??????
> ?????? ????? ????????? ?????????????? ???? ??? ???? filename?
> ??? ??????????, ??
> '\xe2\x95\xac\xd0\xb0\xe2\x95\xac\xe2\x94\x90\xe2\x95\xac\xe2\x95\x97\xe2\x95\xac\xe2\x95\xa3\xe2\x95\xa7\xd0\x94\xe2\x95\xac\xe2\x95\xa1\xe2\x95\xac\xd0\xbf\xe2\x95\xac\xe2\x96\x92'
> ?????
>
> >>> print s.decode('utf_8').encode('cp855', 'replace').decode('utf_8', 'replace')
> ?????????

? ???????, ???? ????????? ?? ????????.

> ????? ?? ?????, ?? ??????, ??? ?? ??? ????? ???????????? ?? filenames (?.?.
> ?? ?????????????????? ??? ?????? ?? ????? "???????? ??????? ???????? ???
> ?????? ?????? ??? ???? ????????????? ??????.wmv"). ???? ??? ?? ???? ?? ???
> ??? ?????? ??? ?????? ??? ???? ????? mount ?? UTF_8, ??? ???? ??? find .
> -print | gzip -9 >/tmp/filenames.gz , ?? ????? /tmp/filenames.gz ??????? ??
> ????? ??? ?????? ??? ?? link.

??? ????, ??? ?? ????? ???? ???????????? filename ??? ?????????? ??????? ;-) ?????? ??????? ??? http://www.datafilehost.com/download-754f9873.html ?? ?????? ?? ?? non ascii filenames (???? basenames ????? extension) ??? ?? ????????? ??? ?? ????????. ?????? ????? ????????? (???? ??? 1-2 ???????) ???? ?? ??????????? ????? (? ???? ??????) ????????.

Btw ?? ?????? ??? ????? mission critical ???????, ????? ?? ????, ????? ????? ???? ?? ??? ????? ?????? ???????? ?? ?????? (? ??????? ??? ??? ?????? ???? ;-)).
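
The round trip discussed in this thread (decode the name as UTF-8, re-encode it with the code page that was wrongly applied, then decode the result as UTF-8 again) can be sketched as follows. This is only an illustration, written for Python 3 rather than the Python 2 sessions quoted above. The helper name `repair` and its fallback behaviour are assumptions, not anything posted to the list. The byte string is the one quoted in the thread, and `cp866` is the code page suggested in the one-word reply.

```python
# Hypothetical helper, not from the thread: undo one round of
# "UTF-8 bytes misread as a DOS code page and re-encoded as UTF-8".

# The garbled name quoted in the thread, as raw bytes.
garbled = (b'\xe2\x95\xac\xd0\xb0\xe2\x95\xac\xe2\x94\x90\xe2\x95\xac\xe2\x95\x97'
           b'\xe2\x95\xac\xe2\x95\xa3\xe2\x95\xa7\xd0\x94\xe2\x95\xac\xe2\x95\xa1'
           b'\xe2\x95\xac\xd0\xbf\xe2\x95\xac\xe2\x96\x92')


def repair(name, wrong_codec='cp866'):
    """Return the repaired text, or the plain UTF-8 decoding if the
    round trip does not apply (for example, the name was never garbled)."""
    try:
        # Step 1: bytes -> the box-drawing/Cyrillic text seen on screen.
        # Step 2: that text -> the original bytes, via the wrong code page.
        # Step 3: original bytes -> the intended text.
        return name.decode('utf-8').encode(wrong_codec).decode('utf-8')
    except UnicodeError:
        # Covers both encode failures (characters missing from wrong_codec)
        # and decode failures (round trip did not yield valid UTF-8).
        return name.decode('utf-8', 'replace')


print(repair(garbled))  # Πολιτεία
```

Names that were already correct UTF-8 Greek fall through to the `except` branch, because Greek letters cannot be encoded in `cp866`, and pure ASCII names pass through unchanged. With `'cp855'` some of the characters involved apparently have no mapping or sit at different byte values, which is presumably why the earlier attempt needed `'replace'` and recovered the name only partially.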