Is there a way to subtract 3 from every digit of a number?

Stestagg stestagg at gmail.com
Sun Feb 21 11:02:39 EST 2021


With numpy and pandas, it's almost always best to start off with the
simple, obvious solution.

In your case, I would recommend defining a function, and calling
`Series.map`, passing in that function.
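
As a minimal sketch of that simple approach (the name sub3_digits is just
for illustration, and I'm assuming non-negative integers; returning a string
is a choice made here so that leading zeros survive):

```python
import pandas as pd


def sub3_digits(value):
    # Subtract 3 from each decimal digit, wrapping around modulo 10.
    # The result stays a string so that leading zeros are not lost.
    return "".join(str((int(ch) - 3) % 10) for ch in str(value))


numbers = pd.Series([123456789, 305, 7])
result = numbers.map(sub3_digits)
print(result.tolist())  # ['890123456', '072', '4']
```

Wrap the return value in int() if you want numbers back and don't mind
losing leading zeros.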

Sometimes, however, with large datasets, it's possible to use some
pandas/numpy tricks to significantly speed up execution.

Here's an example that runs about 5x faster than the lambda technique shown
further down on a 1-million item array (non-scientific testing).  The
downside of this approach is that you have to do some mucking about with
numpy's data types, which can end up with hard-to-read solutions:

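import numpy as np
import pandas as pd

def sub3(series):
  # Fixed-width numpy string array, zero-padded to the longest entry
  chars = series.values.astype(str)
  # Zero-copy view of the raw bytes behind the strings
  intvals = chars.view('int8')
  subbed = np.mod(intvals - ord('0') - 3, 10) + ord('0')
  # Leave the zero bytes untouched, then view the bytes as strings again
  subbed_chars = np.where(intvals, subbed, intvals).view(chars.dtype)
  return pd.Series(subbed_chars)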

For example, let's create a series with 1 million items:

num_lengths = np.power(10, np.random.randint(1, 7, size=1_000_000))
test_series = pd.Series(
    np.random.randint(0, num_lengths, size=1_000_000)
).astype(str)

Calling sub3(test_series) takes ~600ms and returns the converted series (as
strings).

Whereas the lambda technique takes 3s for me:

test_series.map(
    lambda a: int("".join(map(lambda x: str((int(x) - 3) % 10), list(str(a)))))
)
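
If you want to check that the two approaches agree on your own data,
something like the following sketch should do. The names sub3_numpy and
sub3_map are made up for this example, sub3_numpy is a slight variant of
sub3 that calls to_numpy() so the string conversion always happens in numpy,
and the timings it prints will vary by machine:

```python
import time

import numpy as np
import pandas as pd


def sub3_numpy(series):
    # Variant of sub3 from this message, repeated so the snippet runs alone.
    chars = series.to_numpy().astype(str)
    intvals = chars.view("int8")
    subbed = np.mod(intvals - ord("0") - 3, 10) + ord("0")
    return pd.Series(np.where(intvals, subbed, intvals).view(chars.dtype))


def sub3_map(series):
    # The per-element technique: one Python-level call per item.
    return series.map(
        lambda a: int("".join(str((int(x) - 3) % 10) for x in str(a)))
    )


rng = np.random.default_rng(0)
sample = pd.Series(rng.integers(0, 1_000_000, size=10_000)).astype(str)

start = time.perf_counter()
fast = sub3_numpy(sample)
numpy_seconds = time.perf_counter() - start

start = time.perf_counter()
slow = sub3_map(sample)
map_seconds = time.perf_counter() - start

# sub3_numpy returns strings (leading zeros kept); sub3_map returns ints,
# so compare after converting the strings to ints.
agree = [int(s) for s in fast] == slow.tolist()
print(agree, f"numpy: {numpy_seconds:.4f}s, map: {map_seconds:.4f}s")
```

Note the difference in return types: the numpy version keeps '072' as a
string, while the lambda version turns it into the integer 72.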

How the numpy approach works:

1. series.values returns a numpy array of the series data
2. .astype(str) - makes numpy turn the array into a native string array (as
opposed to an array of python string objects).  Numpy string arrays are
fixed-width (the width is chosen dynamically to hold the largest string in
the array), so each element is zero-byte padded to be the correct width.  In
this case, this is fine, as we're dealing with numbers
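3. chars.view('int8') - creates a zero-copy view of the array, with each
byte as its own element. I'm assuming you're using ascii numbers here, so
this is safe.  For example the digit '1' will be represented as an array
element with value 49 (ascii '1').  Note that numpy stores each character of
a str array in 4 bytes, so every digit also comes with three zero bytes,
which are handled the same way as the padding in step 7.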
4. To convert, say, the value 49 (ascii '1') to the number 1, we can
subtract the ascii value for '0' (48) from each element.  Most mathematical
operations in numpy are performed element-wise, so for example
'my_array - 3' subtracts 3 from each item
5. The actual per-character maths is done using element-wise operations,
before we add the ord('0') back to each number to convert the decimal value
back to equivalent ascii character.
6. Now we have a large 1-d array of the correctly adjusted digits, but we
need to reconstruct the original joined-up numbers.
7. np.where(intvals, a, b) is a really nice numpy builtin:
https://numpy.org/doc/stable/reference/generated/numpy.where.html.  It's
used here to put back the 'zero byte' padding values from the original
numpy array
8. .view(chars.dtype) creates a fixed-size string view, of the correct
dimensions based on the original chars array.
9. Finally convert the numpy array back to a pandas series object and
return.
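
To make the steps above a bit more concrete, here is a rough walk-through on
a two-element series, printing some of the intermediate sizes. The variable
names are only for illustration, and I've used to_numpy() rather than
.values so that the input is a plain numpy array before the string
conversion:

```python
import numpy as np
import pandas as pd

series = pd.Series(["12", "305"])

# Steps 1-2: fixed-width numpy string array; "12" is padded to 3 characters.
chars = series.to_numpy().astype(str)
print(chars.dtype.kind, chars.dtype.itemsize)  # U 12  (3 chars * 4 bytes)

# Step 3: zero-copy view of the raw bytes, one element per byte.
intvals = chars.view("int8")
print(intvals.size)  # 24  (2 strings * 12 bytes)

# Steps 4-5: ascii code -> digit, subtract 3 modulo 10, digit -> ascii code.
subbed = np.mod(intvals - ord("0") - 3, 10) + ord("0")

# Step 7: wherever the original byte was zero, keep it zero.
restored = np.where(intvals, subbed, intvals)

# Steps 8-9: view the bytes as fixed-width strings and wrap in a Series.
result = pd.Series(restored.view(chars.dtype))
print(result.tolist())  # ['89', '072']
```

The np.where call is what stops the zero bytes from being turned into digits
by the arithmetic in the previous step.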

On Sun, Feb 21, 2021 at 3:37 PM Avi Gross via Python-list <
python-list at python.org> wrote:

> Ah, that is interesting, Mike, but not an informative answer. My
> question is where the specific problem came from.  Yes, someone used to R
> and coming to Python might work at adjusting to what is different and how
> to get things done. I do that all the time as one hobby is learning lots of
> languages and assessing them against each other.
>
> So care to share your solution in R which I would assume you could do
> easily?
>
> My quick and dirty attempt, of no interest to the python community, using
> the R form of integer that has a maximum, and the pipe operator that will
> not be in standard R for another iteration, is this:
>
> ## R code using pipes to convert an integer
> ## to another integer by subtracting 3
> ## from each digit and wrapping around from
> ## 2 to 9 and so on, meaning modulo 10
>
> ## Load libraries to be used
> library(dplyr)
>
> ## define function to return subtraction by N mod 10
> rotdown <- function(dig, by=3) (dig -by) %% 10
>
> start <- 123456789L
>
> ## Using pipes that send output between operators
>
> start %>%
>   as.character %>%
>   strsplit(split="") %>%
>   unlist %>%
>   as.integer %>%
>   rotdown %>%
>   as.character %>%
>   paste(collapse="") %>%
>   as.integer
>
> When run:
>
>   > start %>%
>   +   as.character %>%
>   +   strsplit(split="") %>%
>   +   unlist %>%
>   +   as.integer %>%
>   +   rotdown %>%
>   +   as.character %>%
>   +   paste(collapse="") %>%
>   +   as.integer
> [1] 890123456
>
> The above is not meant to be efficient and I could do better if I take more
> than a few minutes but is straightforward and uses the vectorized approach
> so no obvious loops are needed.
>
> -----Original Message-----
> From: Python-list <python-list-bounces+avigross=verizon.net at python.org> On
> Behalf Of C W
> Sent: Sunday, February 21, 2021 9:48 AM
> To: Chris Angelico <rosuav at gmail.com>
> Cc: Python <python-list at python.org>
> Subject: Re: Is there a way to subtract 3 from every digit of a number?
>
> Hey Avi,
>
> I am a long time R user now using Python. So, this is my attempt to master
> the language.
>
> The problem for me is that I often have an idea about how things are done
> in R, but am not sure what functions are available in Python.
>
> I hope that clears up some confusion.
>
> Cheer!
>
> On Sun, Feb 21, 2021 at 9:44 AM Chris Angelico <rosuav at gmail.com> wrote:
>
> > On Mon, Feb 22, 2021 at 1:39 AM Avi Gross via Python-list
> > <python-list at python.org> wrote:
> > > But you just moved the goalpost by talking about using a data.frame as
> > > that (and I assume numpy and pandas) are not very basic Python.
> >
> > Given that the original post mentioned a pd.Series, I don't know how
> > far the goalposts actually moved :)
> >
> > ChrisA
> > --
> > https://mail.python.org/mailman/listinfo/python-list

