How to test characters of a string

2QdxY4RzWzUUiLuE at potatochowder.com 2QdxY4RzWzUUiLuE at potatochowder.com
Wed Jun 8 14:13:16 EDT 2022


On 2022-06-09 at 03:18:56 +1000,
Chris Angelico <rosuav at gmail.com> wrote:

> On Thu, 9 Jun 2022 at 03:15, <2QdxY4RzWzUUiLuE at potatochowder.com> wrote:
> >
> > On 2022-06-08 at 08:07:40 -0000,
> > De ongekruisigde <ongekruisigde at news.eternal-september.org> wrote:
> >
> > > Depending on the problem a regular expression may be the much simpler
> > > solution. I love them for e.g. text parsing and use them all the time.
> > > Unrivaled when e.g. parts of text have to be extracted, e.g. from lines
> > > like these:
> > >
> > >   root:x:0:0:System administrator:/root:/run/current-system/sw/bin/bash
> > >   dhcpcd:x:995:991::/var/empty:/run/current-system/sw/bin/nologin
> > >   nm-iodine:x:996:57::/var/empty:/run/current-system/sw/bin/nologin
> > >   avahi:x:997:996:avahi-daemon privilege separation user:/var/empty:/run/current-system/sw/bin/nologin
> > >   sshd:x:998:993:SSH privilege separation user:/var/empty:/run/current-system/sw/bin/nologin
> > >   geoclue:x:999:998:Geoinformation service:/var/lib/geoclue:/run/current-system/sw/bin/nologin
> > >
> > > Compare a regexp solution like this:
> > >
> > >   >>> g = re.search(r'([^:]*):([^:]*):(\d+):(\d+):([^:]*):([^:]*):(.*)$' , s)
> > >   >>> print(g.groups())
> > >   ('geoclue', 'x', '999', '998', 'Geoinformation service', '/var/lib/geoclue', '/run/current-system/sw/bin/nologin')
> > >
> > > to the code one would require to process it manually, with all the edge
> > > cases. The regexp surely reads much simpler (?).
> >
> > Uh...
> >
> >     >>> import pwd # https://docs.python.org/3/library/pwd.html
> >     >>> [x for x in pwd.getpwall() if x[0] == 'geoclue']
> >     [pwd.struct_passwd(pw_name='geoclue', pw_passwd='x', pw_uid=992, pw_gid=992, pw_gecos='Geoinformation service', pw_dir='/var/lib/geoclue', pw_shell='/sbin/nologin')]
> 
> That's great if the lines are specifically coming from your system's
> own /etc/passwd, but not so much if you're trying to compare passwd
> files from different systems, where you simply have the files
> themselves.

In addition to getpwent to read entries from the local password
database, C libraries such as glibc have fgetpwent to read entries from
a stream that looks like /etc/passwd (getpwent is in POSIX; fgetpwent is
an SVr4 extension).  So even the C library agrees that if you think you
have to process this data manually, you're doing it wrong.  Python
exposes neither function directly (at least not in the pwd module or the
os module; I didn't dig around or check PyPI).

IMO, higher-level functions to process such data are way better than a
[insert your own adjective/expletive here] regular expression that
collects the pieces into numbered groups rather than labeled fields.
Readability counts.
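
For files copied from other systems, here is a minimal sketch of what
"labeled fields" could look like without a regular expression.  The
names PasswdEntry and parse_passwd_line are made up for illustration;
the field names are borrowed from pwd.struct_passwd, and the sketch
assumes the usual seven colon-separated fields.

```python
from typing import NamedTuple


class PasswdEntry(NamedTuple):
    """Hypothetical record type; field names mirror pwd.struct_passwd."""
    pw_name: str
    pw_passwd: str
    pw_uid: int
    pw_gid: int
    pw_gecos: str
    pw_dir: str
    pw_shell: str


def parse_passwd_line(line: str) -> PasswdEntry:
    """Split one passwd-style line into labeled fields (assumes 7 fields)."""
    fields = line.rstrip("\n").split(":")
    if len(fields) != 7:
        # Refuse to guess when the line does not have the expected shape.
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    name, passwd, uid, gid, gecos, home, shell = fields
    # int() raises ValueError if uid or gid is not numeric.
    return PasswdEntry(name, passwd, int(uid), int(gid), gecos, home, shell)


entry = parse_passwd_line(
    "geoclue:x:999:998:Geoinformation service:/var/lib/geoclue:"
    "/run/current-system/sw/bin/nologin"
)
print(entry.pw_name, entry.pw_uid, entry.pw_shell)
```

Callers then write entry.pw_gecos instead of g.group(5), and a
malformed line raises an exception instead of producing a partial
match.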

Yes, absolutely, use a regular expression when all else fails.  Don't
forget to handle all the edge cases!  (I assume that sane OSes preclude
colons in paths that are likely to come up in the local password
database, but I don't know what happens, e.g., when there's a reason for
GECOS to contain a colon.)
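
To make that edge case concrete, here is a small sketch that runs the
pattern quoted above against a made-up line with a colon in the GECOS
field.  The line is invented for illustration; I am not claiming any
real system produces it, only showing what the pattern does with it.

```python
import re

# The pattern quoted earlier in this thread, unchanged.
PATTERN = re.compile(r'([^:]*):([^:]*):(\d+):(\d+):([^:]*):([^:]*):(.*)$')

# Made-up line: note the extra colon inside the GECOS field.
line = "geoclue:x:999:998:Geo: service:/var/lib/geoclue:/sbin/nologin"

# The pattern still matches, but the last three fields are shifted.
groups = PATTERN.search(line).groups()
print(groups)

# A plain split at least makes the problem detectable by counting fields.
fields = line.split(":")
print(len(fields))
```

The regular expression matches without complaint and reports ' service'
as the home directory, while counting the fields after a split reveals
eight where seven were expected.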
