Newbie: Check first two non-whitespace characters

Jussi Piitulainen harvesting at is.invalid
Fri Jan 1 03:16:29 EST 2016


otaksoftspamtrap at gmail.com writes:

> I need to check a string over which I have no control for the first 2
> non-white space characters (which should be '[{').
>
> The string would ideally be: '[{...' but could also be something like 
> '  [  {  ....'.
>
> Best to use re and how? Something else?

No comment on whether re is good for your use case, but a comment on
how to use it. First, some test data:

  >>> import re
  >>> data = '\r\n  {\r\n\t[ "etc" ]}\n\n\n'

Then the actual comment - there's a special regex escape, \S, that
matches a single non-whitespace character, and a function, re.finditer,
that produces matches lazily, on demand:

  >>> black = re.compile(r'\S')
  >>> matches = re.finditer(black, data)

Then the demonstration. This accesses the first, then second, then third
match:

  >>> empty = re.match('', '')
  >>> next(matches, empty).group()
  '{'
  >>> next(matches, empty).group()
  '['
  >>> next(matches, empty).group()
  '"'

The empty match object provides an appropriate .group() when there is no
first or second (and so on) non-whitespace character in the data:

  >>> matches = re.finditer(black, '\r\t\n')
  >>> next(matches, empty).group()
  ''
  >>> next(matches, empty).group()
  ''
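
For the original question (are the first two non-whitespace characters
'[{'?), the same idea can be wrapped in a small helper. This is only a
sketch: the names first_nonspace_chars and looks_like_list_of_dicts are
made up for illustration, and itertools.islice stands in for the
repeated next() calls above.

```python
import re
from itertools import islice

# Same pattern as above: one non-whitespace character per match.
NON_WHITESPACE = re.compile(r'\S')

def first_nonspace_chars(text, count=2):
    """Return up to `count` leading non-whitespace characters as one string.

    Hypothetical helper. finditer is lazy, so scanning stops as soon as
    `count` matches have been taken.
    """
    matches = NON_WHITESPACE.finditer(text)
    return ''.join(m.group() for m in islice(matches, count))

def looks_like_list_of_dicts(text):
    """True if the first two non-whitespace characters are '[' then '{'."""
    return first_nonspace_chars(text) == '[{'
```

With this, looks_like_list_of_dicts('  [  {  ....') is true, while the
test data above, which starts with '{' and then '[', is rejected. A
string with fewer than two non-whitespace characters yields a shorter
string, so no empty match object is needed as a default.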


