From jeffrey.fischer at gmail.com Fri Sep 16 16:07:46 2022
From: jeffrey.fischer at gmail.com (Jeff Fischer)
Date: Fri, 16 Sep 2022 13:07:46 -0700
Subject: [Baypiggies] BayPiggies meeting next Thursday (Sept 22): Debugging, Scraping, and NLP
Message-ID:

BayPiggies
Sept 22, 2022, 7:00 pm - 8:30 pm PDT (online)

This month, we'll have a lightning talk from Ryan Kuhl on debugging and a full talk from Stephen McInerney on Web scraping and NLP. We hope that you can join us!

*Lightning Talk: Debugging with ipdb*

*Speaker:* Ryan Kuhl

*Speaker Bio:* Ryan is a Miami-based software engineer at Tatari, co-founder of Public Sector ML, and a student at the Georgia Institute of Technology. Ryan has been programming professionally in Python for 9 years and loves to build performant APIs and chunky SQL queries! When not programming for work, he studies machine learning and quantum computing. Connect with Ryan via email at ryan at kuhl.dev, on LinkedIn at linkedin.com/in/kuhl, or on GitHub at GitHub.com/lame.

*Main Talk: NLP, Topic Modeling and Scraping of conference talks to find which topics are hot and not*

*Speaker:* Stephen McInerney

NLP (Natural Language Processing) and Topic Modeling are subdomains of Machine Learning and core technologies for Python data scientists. The automated collection of data by Scraping (in a TOS-compliant, ethical way), by contrast, is a rarely discussed practice.
Outline:
- Review the basic steps, present a typical pipeline for Scraping+NLP+Topic Modeling, and cover the packages used.
- As a motivating example, we investigate changes in Python conference topics from 2016 to 2022 and statistically extract conclusions on what's hot and not as of 2022.
- We also handle foreign-language abstracts and outline how machine translation can be used for Topic Modeling.
- We illustrate best practices in Scraping text data, maximally preserving it and augmenting it with metadata.
- Review the basic steps and present a typical pipeline (segmentation, Unicode handling, Levenshtein distance, word vectors, Transformers, NER, IE).
- Overview of related NLP/ML/Deep Learning packages we use for both prototyping and production.
- Topic Modeling using LDA is a highly iterative clustering process to "learn" which topics seem to be similar/related/identical/different.
- In this specific case, we augment conference abstracts with whatever metadata helps topic modeling, e.g. speaker interests, affiliation, links to Twitter.
- Example: "token" means an entirely different topic when it co-occurs with "crypto"/"blockchain"/"web3" than when it co-occurs with "API"/"authentication"/"appsec"/"2FA"/"OAuth". But how do we automatically learn hundreds, and then thousands, of such cases?

*Speaker Bio:* Stephen McInerney has been a data scientist and NLP specialist for over a decade, specializing in domain-specific (biotech/legal/financial) and multilingual NLP at both startups and large companies. Kaggle competitor; has led "Kaggle Together" classes. Former Data Science co-chair of SF Bay Area ACM and organizer of multiple Data Science Camps. Passionate about open source. www.linkedin.com/in/stephenmcinerney

*RSVP*
We will hold the meeting on Zoom. To RSVP, go to https://www.meetup.com/baypiggies/events/288471326/. When you RSVP "Yes" to this event, the link to the Zoom meeting will become visible in Meetup.
*Code of Conduct*
https://baypiggies.net/pages/code_of_conduct.html

Interactions online have less nuance than in-person interactions. Please be Open, Considerate and Respectful. Also, please refrain from discussing topics unrelated to the Python community or the technical content of the meeting.

From jeffrey.fischer at gmail.com Mon Sep 19 16:55:37 2022
From: jeffrey.fischer at gmail.com (Jeff Fischer)
Date: Mon, 19 Sep 2022 13:55:37 -0700
Subject: [Baypiggies] An upcoming talk on "productionizing Pandas"
Message-ID:

Hi everyone,

Tomorrow (Tuesday) night, an engineering team from my company (C3.ai) is giving a talk on technology they developed to use Pandas as the basis of production machine learning (rather than rewriting everything in something like SparkSQL). If you are interested, here is the link on Meetup: https://www.meetup.com/c3-ai-enterprise-ai/events/288213225/

Thanks,
Jeff

From glen at glenjarvis.com Fri Sep 23 00:30:40 2022
From: glen at glenjarvis.com (Glen Jarvis)
Date: Fri, 23 Sep 2022 04:30:40 +0000
Subject: [Baypiggies] Zoom bombing / apologies
Message-ID:

Someone Zoom-bombed our meeting tonight. I'm usually good at muting stray microphones, kicking bad users (usually before they get disruptive), spotlighting the speakers so their cameras show on the video, etc.

But whoever was doing this Zoom bomb was able to elude me, unfortunately. They masked their activity as another user (so it was harder to kick them), they were able to get audio when it was disabled, etc. I was also removing the screen annotations as soon as they were put up, but they kept putting them back up.

I want to apologize deeply because, at least once, someone wrote something with a Zoom annotation that wasn't just juvenile but offensive. We ended the meeting early.
Why don't we use Webinar format?

Back when we met in the physical world, many of our members did not like the idea of registering their identity or signing NDAs just to attend a meeting, and I've been trying to respect that as much as possible in the virtual world. Regular Zoom meetings are also more interactive and engaging than webinars.

However, because of this event, we may be forced to require registration and go back to Webinar format. I have an open ticket with Zoom support (#15460891) asking for a root cause analysis and security suggestions.

It is always a struggle to strike the right balance between a completely open environment and enforcing good behavior. Some of our original open source systems were high-trust and assumed good behavior. Rarely was that assumption wrong.

Kindest Regards,
Glen Jarvis

From glen at glenjarvis.com Fri Sep 23 14:08:58 2022
From: glen at glenjarvis.com (Glen Jarvis)
Date: Fri, 23 Sep 2022 18:08:58 +0000
Subject: [Baypiggies] Zoom bombing / apologies
In-Reply-To:
References:
Message-ID: <3Adspyq4BSGTGXKT8jSogYrGGYEBEa2pH3rhKLMCJWgkidOCY7ZW8FI2brX_wD1Q0BJG8nLg9A6axyrgAMaz89aPqshDalBLlLewsSIxAQc=@glenjarvis.com>

After this was all over last night (making sure the speakers were okay, making sure the other organizers were okay, making sure the audience was okay, reviewing security settings, opening tickets with Zoom, etc.), I realized that I wasn't feeling so hot myself.

This morning, I picked up an old classic that I love, "Daring Greatly", and suddenly remembered "The Man in the Arena" speech. I err and come up short again and again. But I do it daring greatly :)

I've modified it to be gender neutral:

> "It is not the critic who counts, not the one who points out how the strong person stumbled or how the doer of deeds might have done them better.
> The credit belongs to the one who is actually in the arena, whose face is marred with sweat and dust and blood; who strives valiantly; who errs and comes up short again and again; who knows the great enthusiasms, the great devotions, and spends oneself in a worthy cause; who, if he or she wins, knows the triumph of high achievement; and who, if he or she fails, at least fails while daring greatly, so that his or her place shall never be with those cold and timid souls who know neither victory nor defeat."

Kindest Regards,
Glen