From ed at pythoncharmers.com  Sun Mar  1 18:55:05 2020
From: ed at pythoncharmers.com (Ed Schofield)
Date: Mon, 2 Mar 2020 10:55:05 +1100
Subject: [melbourne-pug] Melbourne Python meetup: tonight (Monday 2nd March)
Message-ID:

Hi everyone!

We're looking forward to our March meeting of the Melbourne Python user
group tonight, Monday 2nd March. Do come and join us!

We have these three talks planned:

*1. Duy Trinh: Visualization of weather and cancer data sets*

Duy will present how she uses *Pandas* and *OpenPyXL* to organise complex
data to develop interactive dashboards that allow visualisation of weather
and cancer data sets.

*2. Ed Schofield: Linear regression in Python*

Linear regression is the simplest and most foundational machine learning
technique. It is more powerful than people often assume. In this talk I'll
present an overview of linear regression and how it's implemented in the
Python packages *Statsmodels* and *Scikit-Learn*.

*3. Announcements and pizza*

*When:* 5.45pm for mingling; talks from 6pm; pizza afterwards

*Where:* Level 2, 17 Hardware Lane, Melbourne CBD

*How to get there:* Walk 8 minutes from Flinders Street station or 5
minutes from Melbourne Central

*Sponsorship:* many thanks to Outcome Life for providing the venue, Biarri
for sponsoring pizzas, and Python Charmers for organisation and meetup
sponsorship.

*RSVP:* Please respond accurately on Meetup.com so we can track numbers:
https://www.meetup.com/Melbourne-Python-Meetup-Group/

Do come along! We hope to see you there! :-)

Best wishes,
Ed

-- 
Dr. Edward Schofield
Python Charmers
+61 (0)405 676 229
http://pythoncharmers.com

From miked at dewhirst.com.au  Sun Mar  8 01:30:18 2020
From: miked at dewhirst.com.au (Mike Dewhirst)
Date: Sun, 8 Mar 2020 17:30:18 +1100
Subject: [melbourne-pug] Superscript chars are a pain
Message-ID: <07304156-61a3-8e41-b7a7-af5726957ac3@dewhirst.com.au>

I'm now exclusively Python 3.6+ thank heavens but ...

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb3 in position
6500: invalid start byte

It just so happens that is the superscript 3 character. It also happens
that superscript 3 displays correctly and works properly on Windows 10 but
causes the above error on Ubuntu 18.04. I'm not paid enough to understand
why - hence this email if anyone can help.

My current pain is because I'm pumping data into a database (PostgreSQL)
which needs such measures as 5µg/m³ and Python hates me.

I think there is a valid argument for the Python utf-8 codec to
"special-case" subscript and superscript numeral unicode collisions with
ASCII or whatever Windows 10 uses. That would cover maths and chemistry
both. And save me a lot of pain.

Thanks for any sympathy and many, many thanks for help on getting past
this.

Cheers

Mike

PS: I use superscript and subscript numbers all the time because I'm
involved with chemical data. Here is how I usually deal with it ...


from django.utils.encoding import smart_text
from django.utils.safestring import mark_safe


def subscript_to_ascii(raw=None):
    """Swap subscript unicode chars into ordinary numbers for
    synonym searches.
    """
    formula = ""
    clear = True
    if raw is not None:
        # for char in str(raw):
        for char in raw:
            if char == "[":
                clear = False  # permits [1] footnote references
            elif char == "]":
                clear = True
            if clear:
                if char == "\u2082":
                    char = "2"
                elif char == "\u2083":
                    char = "3"
                elif char == "\u2084":
                    char = "4"
                elif char == "\u2085":
                    char = "5"
                elif char == "\u2086":
                    char = "6"
                elif char == "\u2087":
                    char = "7"
                elif char == "\u2088":
                    char = "8"
                elif char == "\u2089":
                    char = "9"
                elif char == "\u2081":
                    char = "1"
                elif char == "\u2080":
                    char = "0"
            formula += char
    return smart_text(formula)


def subscript(raw=None):
    """Swap ordinary numbers for subscript unicode chars."""
    formula = ""
    clear = True
    if raw is not None:
        for char in raw:
            if char == "[":
                clear = False  # permits [1] footnote references
            elif char == "]":
                clear = True
            if clear:
                if char == "2":
                    char = "\u2082"
                elif char == "3":
                    char = "\u2083"
                elif char == "4":
                    char = "\u2084"
                elif char == "5":
                    char = "\u2085"
                elif char == "6":
                    char = "\u2086"
                elif char == "7":
                    char = "\u2087"
                elif char == "8":
                    char = "\u2088"
                elif char == "9":
                    char = "\u2089"
                elif char == "1":
                    char = "\u2081"
                elif char == "0":
                    char = "\u2080"
            formula += char
    return smart_text(formula.encode("utf8"))


lc50 = subscript("LC50")
ld50 = subscript("LD50")


def safesubscript(raw=None, ascii=False):
    """Uses marksafe to subscript instead of unicode chars. This looks
    better on screen but cannot be used in places.
    """
    formula = ""
    clear = True
    if raw is not None:
        for char in raw:
            if char == "[":
                # don't process any more digits just add to formula
                clear = False  # permits [1] footnote references
            elif char == "]":
                # start processing again
                clear = True
            if clear:
                if char == "2" or char == "\u2082":
                    char = "<sub>2</sub>"
                elif char == "3" or char == "\u2083":
                    char = "<sub>3</sub>"
                elif char == "4" or char == "\u2084":
                    char = "<sub>4</sub>"
                elif char == "5" or char == "\u2085":
                    char = "<sub>5</sub>"
                elif char == "6" or char == "\u2086":
                    char = "<sub>6</sub>"
                elif char == "7" or char == "\u2087":
                    char = "<sub>7</sub>"
                elif char == "8" or char == "\u2088":
                    char = "<sub>8</sub>"
                elif char == "9" or char == "\u2089":
                    char = "<sub>9</sub>"
                elif char == "1" or char == "\u2081":
                    char = "<sub>1</sub>"
                elif char == "0" or char == "\u2080":
                    char = "<sub>0</sub>"
            formula += char
    if ascii:
        formula = formula.replace("<sub>", "").replace("</sub>", "")
    return mark_safe(smart_text(formula))


From miked at dewhirst.com.au  Sun Mar  8 01:45:14 2020
From: miked at dewhirst.com.au (Mike Dewhirst)
Date: Sun, 8 Mar 2020 17:45:14 +1100
Subject: [melbourne-pug] Superscript chars are a pain
In-Reply-To: <07304156-61a3-8e41-b7a7-af5726957ac3@dewhirst.com.au>
References: <07304156-61a3-8e41-b7a7-af5726957ac3@dewhirst.com.au>
Message-ID:

Oh well ... maybe it isn't Python's fault. I just looked at the data input
file and found the ³ character in all places had been turned into a box.
When I edited the boxes back into ³ it all went well.

I used Filezilla to get the input files across so I'll focus on that next.

Sorry to interrupt your long weekend.

Cheers

Mike

On 8/03/2020 5:30 pm, Mike Dewhirst wrote:
> I'm now exclusively Python 3.6+ thank heavens but ...
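The three helpers in Mike's PS swap digits to and from the Unicode subscript range U+2080 to U+2089 one character at a time. As a rough sketch only (this is not Mike's code), the same mapping could be built once with `str.maketrans` and applied with `str.translate`. The names `to_subscript` and `to_ascii` are made up for illustration, the Django wrappers are left out, and the `[1]` footnote handling is only approximated by skipping complete bracketed runs.

```python
import re

# Digits 0-9 and their Unicode subscript forms U+2080..U+2089.
ASCII_DIGITS = "0123456789"
SUB_DIGITS = "".join(chr(0x2080 + i) for i in range(10))

TO_SUB = str.maketrans(ASCII_DIGITS, SUB_DIGITS)
TO_ASCII = str.maketrans(SUB_DIGITS, ASCII_DIGITS)

# Bracketed runs such as "[1]" are footnote references and are left alone.
# Assumption: brackets are balanced; an unclosed "[" is not treated specially.
FOOTNOTE = re.compile(r"(\[[^\]]*\])")


def _translate_outside_brackets(raw, table):
    """Apply a translation table to everything except bracketed runs."""
    if raw is None:
        return ""
    parts = FOOTNOTE.split(raw)
    # With one capture group, the odd indexes hold the bracketed runs.
    return "".join(
        part if i % 2 else part.translate(table)
        for i, part in enumerate(parts)
    )


def to_subscript(raw):
    """Hypothetical counterpart of subscript(): ASCII digits to subscripts."""
    return _translate_outside_brackets(raw, TO_SUB)


def to_ascii(raw):
    """Hypothetical counterpart of subscript_to_ascii()."""
    return _translate_outside_brackets(raw, TO_ASCII)


# Usage: digits in the formula change, the footnote reference does not.
assert to_subscript("H2SO4 [1]") == "H\u2082SO\u2084 [1]"
assert to_ascii("H\u2082SO\u2084") == "H2SO4"
```

A translation table keeps the digit pairs in one place, so a superscript variant would only need a second pair of tables rather than another if/elif chain.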
> _______________________________________________
> melbourne-pug mailing list
> melbourne-pug at python.org
> https://mail.python.org/mailman/listinfo/melbourne-pug

From dave at montagesoftware.com.au  Sun Mar  8 04:31:34 2020
From: dave at montagesoftware.com.au (David Micallef)
Date: Sun, 8 Mar 2020 19:31:34 +1100
Subject: [melbourne-pug] Superscript chars are a pain
In-Reply-To:
References: <07304156-61a3-8e41-b7a7-af5726957ac3@dewhirst.com.au>
Message-ID:

Hi Mike

I could be missing something, but is there an opportunity to set the
encoding when you're reading the file?
The default on Linux is usually UTF-8, though you can set the encoding to
the actual encoding of the file that you are reading. These files could be
ISO-8859-1 or another variant.

Cheers
Dave

On Sun, 8 Mar 2020 at 17:45, Mike Dewhirst wrote:

> Oh well ... maybe it isn't Python's fault. I just looked at the data input
> file and found the ³ character in all places had been turned into a box.
> When I edited the boxes back into ³ it all went well.
>
> I used Filezilla to get the input files across so I'll focus on that next.
>
> Sorry to interrupt your long weekend.
>
> Cheers
>
> Mike

From miked at dewhirst.com.au  Fri Mar 13 09:01:28 2020
From: miked at dewhirst.com.au (Mike Dewhirst)
Date: Sat, 14 Mar 2020 00:01:28 +1100
Subject: [melbourne-pug] Superscript chars are a pain
In-Reply-To:
Message-ID: <48f5S46plpznfRG@mail.python.org>

David

Thanks for responding. And sorry to all - I should have reported that
Javier solved my problem.

Windows 10 has some sort of gotcha with UTF-16 which it automatically
imposes if you don't watch closely. Just transferring a UTF-8 file using
FTP does it.

The solution was to edit the file after transfer to Linux and save again
as UTF-8.

Microsoft is the pain, not superscripts.

Cheers

Mike

PS: This Android phone top-posts automatically and I can't be bothered
figuring out how to defeat that.

-------- Original message --------
From: David Micallef
Date: 8/3/20 19:31 (GMT+10:00)
To: Melbourne Python Users Group
Subject: Re: [melbourne-pug] Superscript chars are a pain

Hi Mike

I could be missing something, but is there an opportunity to set the
encoding when you're reading the file? The default on Linux is usually
UTF-8, though you can set the encoding to the actual encoding of the file
that you are reading. These files could be ISO-8859-1 or another variant.

Cheers
Dave
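For reference, the byte in the traceback and the two remedies raised in this thread (David's explicit encoding, and re-saving the file as UTF-8) can be sketched as below. This is a minimal illustration under assumptions, not what Mike or Javier actually ran: the helper name `resave_as_utf8` and the file names are hypothetical, and ISO-8859-1 is used as the source encoding only because a lone byte 0xb3 is how superscript 3 is stored in ISO-8859-1 and Windows-1252. If the file really is UTF-16, pass `"utf-16"` instead.

```python
import os
import tempfile

# 0xb3 is a UTF-8 continuation byte, so it cannot start a character:
# decoding it as UTF-8 raises the kind of error quoted in the thread.
try:
    b"mg/m\xb3".decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)

# The same bytes decode cleanly as ISO-8859-1 (Latin-1): 0xb3 is U+00B3.
assert b"mg/m\xb3".decode("iso-8859-1") == "mg/m\u00b3"


def resave_as_utf8(src_path, dst_path, source_encoding):
    """Read src_path using source_encoding and write it back out as UTF-8.

    source_encoding must be supplied by the caller; nothing is guessed.
    """
    # newline="" keeps the original line endings untouched in both directions.
    with open(src_path, encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
    return text


# Usage with throwaway files standing in for the transferred data file.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "measures_latin1.txt")
    dst = os.path.join(tmp, "measures_utf8.txt")
    with open(src, "wb") as f:
        f.write(b"5\xb5g/m\xb3\n")
    resave_as_utf8(src, dst, "iso-8859-1")
    with open(dst, encoding="utf-8") as f:
        assert f.read() == "5\u00b5g/m\u00b3\n"
```

Converting the file once keeps the loading code on the UTF-8 default; passing `encoding=` at read time, as David suggests, avoids the extra copy but has to be repeated wherever the file is opened.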

From ed at pythoncharmers.com  Tue Mar 17 23:49:56 2020
From: ed at pythoncharmers.com (Ed Schofield)
Date: Wed, 18 Mar 2020 14:49:56 +1100
Subject: [melbourne-pug] April Python meetup cancelled
Message-ID:

Hi everyone!

We're going to cancel the next Python meetup (which was scheduled for
Monday 6th April) as a precaution against Covid-19.

We'll monitor the situation and keep you posted about future meetups.

We hope to see you again soon!

Stay well,
Ed

-- 
Dr. Edward Schofield
Python Charmers
+61 (0)405 676 229
http://pythoncharmers.com