[Pandas-dev] pandas new infrastructure (OVH donation)

Marc Garcia garcia.marc at gmail.com
Tue Nov 15 03:05:34 EST 2022


Quick update about the new infrastructure.

- The new hosting for the website seems to be working fine, with no issues
detected. I just stopped nginx on the old server, so that if anything there
is still being used we will hopefully notice. If there are no issues and no
objections, I'll switch off the server in a few days.

- We should be able to start using dedicated hardware for the benchmarks
from our OVH cloud account in December. It'll work like regular cloud
instances, but on dedicated servers. We'll be running some tests to try to
get more stability in the benchmarks, and hopefully we can get something
even better than what we have had so far once the OVH hardware is ready.
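
One way stability tests like these could be summarized is the coefficient of
variation (standard deviation divided by mean) of repeated timings of the
same benchmark. The following is only a minimal sketch: the function names,
the sample timings and the 5% threshold are all hypothetical, and none of
them come from the actual OVH tests.

```python
import statistics


def coefficient_of_variation(timings):
    """Return sample stdev / mean for repeated timings of one benchmark."""
    mean = statistics.fmean(timings)
    if mean == 0:
        raise ValueError("mean timing is zero")
    return statistics.stdev(timings) / mean


def is_stable(timings, threshold=0.05):
    """True if run-to-run variation stays within the (assumed) threshold."""
    return coefficient_of_variation(timings) <= threshold


# Hypothetical timings in seconds for the same benchmark, five runs each.
vm_runs = [1.00, 1.20, 0.85, 1.35, 0.95]
dedicated_runs = [1.00, 1.01, 0.99, 1.00, 1.02]

for label, runs in [("vm", vm_runs), ("dedicated", dedicated_runs)]:
    cv = coefficient_of_variation(runs)
    print(f"{label}: cv={cv:.3f} stable={is_stable(runs)}")
```

A lower value means a regression of a given size is easier to tell apart
from noise; the threshold to use would depend on how small a regression we
want to detect.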

On Thu, Nov 10, 2022 at 9:31 PM Marc Garcia <garcia.marc at gmail.com> wrote:

> Oh, I forgot we were not using the rendered asv website from the old
> server. We're using nginx, so I can easily make pandas.pydata.org/speed
> show the content from that URL. But I guess we can also check the results
> directly at the GitHub Pages URL; I'm not sure it makes a difference.
>
> Let me know if it's useful, and I'll set it up. Thanks for the info!
>
> On Thu, Nov 10, 2022, 21:18 Richard Shadrach <rhshadrach at gmail.com> wrote:
>
>> > Besides the open PR, the only thing missing is the benchmarks at (
>> pandas.pydata.org/speed). The link is not working now, since I haven't
>> moved the benchmarks yet. But before moving them, we should also make the
>> changes in the benchmarks repo, so benchmark results start to synchronize
>> with the new server. Can someone with access to the server take care of it,
>> please? (DM me for the new server info.)
>>
>> The link https://asv-runner.github.io/asv-collection/pandas/ is being
>> automatically updated. Can we point to this URL for now, given that we may
>> be changing how the benchmarks are run? If it's desirable to have the
>> benchmark results on the docs server and our current solution is deemed to
>> be the long-term one, I can work on the synchronization. However, I'm
>> reluctant to put in that work if it's just going to go away given the
>> easier solution.
>>
>> Best,
>> Richard
>>
>>
>> On Wed, Nov 9, 2022 at 11:50 PM Marc Garcia <garcia.marc at gmail.com>
>> wrote:
>>
>>> Some updates (the ones shared in yesterday's call, and some new ones).
>>>
>>> The cloud (bucket) storage didn't seem convenient, for several reasons,
>>> so I moved forward with a regular Ubuntu instance (the cheapest: 2 cores,
>>> 7 GB RAM, 24 EUR/month). I have now moved all the traffic to the new
>>> instance, and since we only serve static files, the instance seems to be
>>> more than enough to handle our traffic (I didn't see CPU or RAM exceed 4%
>>> usage in the time I've been monitoring the resources). I've got a PR open
>>> (#49614) to start syncing our web/docs with the new server. In a few hours
>>> I'll stop nginx on the old server (I confirmed there is already no
>>> traffic; since we use Cloudflare, our DNS changes are immediate). And in a
>>> few days I'll switch off the instance in Rackspace.
>>>
>>> Besides the open PR, the only thing missing is the benchmarks at (
>>> pandas.pydata.org/speed). The link is not working now, since I haven't
>>> moved the benchmarks yet. But before moving them, we should also make the
>>> changes in the benchmarks repo, so benchmark results start to synchronize
>>> with the new server. Can someone with access to the server take care of it,
>>> please? (DM me for the new server info.)
>>>
>>> On running the benchmarks in OVH, the VM instances don't seem to be
>>> stable enough to keep track of performance over time, as was likely.
>>> Full results of the tests I did are in this repo:
>>> https://gitlab.com/datapythonista/pandas_ovh_benchmarks . OVH is
>>> checking the best way to give us access to dedicated hardware, and I'll
>>> continue with that once we've got it. In parallel, I'm planning to do
>>> some tests to see whether it is feasible to use valgrind's cachegrind (or
>>> equivalent) so that, instead of monitoring time, we monitor CPU cycles.
>>> That should make benchmarking much easier and faster, as any hardware
>>> would work, and benchmarks could be run in parallel. With a dedicated
>>> server we're likely to only be able to use a single core to have stable
>>> results, which means that we can only run one benchmark suite per server
>>> every 3 hours. But implementing the cachegrind approach can be tricky.
>>>
>>> About CIrun, as you say, Joris, it's like a middleman between our
>>> hardware (the OVH OpenStack API to create/delete instances) and GitHub
>>> Actions. We need to add an extra YAML file with the CIrun configuration,
>>> and other than that we should be able to use OVH hardware directly from our
>>> current CI jobs without changes (except, I assume, for one entry saying
>>> which instance we want to use for the jobs running in OVH).
>>>
>>> Please let me know if you have any feedback, in particular if you see any
>>> problem with our website that could be caused by the migration.
>>>
>>> Cheers,
>>>
>>> On Thu, Nov 10, 2022 at 12:43 AM Joris Van den Bossche <
>>> jorisvandenbossche at gmail.com> wrote:
>>>
>>>>
>>>>
>>>> On Sat, 5 Nov 2022 at 15:24, Marc Garcia <garcia.marc at gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> pandas has received a donation from OVHcloud
>>>>> <https://www.ovhcloud.com/> to support the project infrastructure,
>>>>> in the form of OVHcloud public cloud credits (an initial amount of
>>>>> 10,000 EUR for a period of one year). OVH is open to sponsoring longer
>>>>> term, and also other projects in the ecosystem (or NumFOCUS as a whole),
>>>>> but we started with this to get feedback at a smaller scale first.
>>>>>
>>>>> The credits will be used initially for:
>>>>> - Hosting of the pandas website
>>>>> - Running the pandas benchmarks
>>>>> - Speeding up the project CI
>>>>>
>>>>> Below I detail what I have in mind to set up for each. If anyone is
>>>>> interested in getting involved, or has ideas or comments, please let me
>>>>> know. I'll publish updates here as there is progress on this.
>>>>>
>>>>>
>>>>> - Website: I'm planning to experiment with splitting the website in two
>>>>> (it'll be transparent for users). The website and the stable docs, which
>>>>> receive most of the traffic, can probably be stored in Cloudflare Pages.
>>>>> We're already using Cloudflare as a CDN, so instead of using it as a cache,
>>>>> we can publish the documents there. The rest of the docs (old versions and
>>>>> the dev version) can be hosted in OVHcloud bucket storage. Response
>>>>> times may be a bit slower, but our website is bigger than the Cloudflare
>>>>> quota, and keeping rarely accessed old docs in a CDN seems unnecessary
>>>>> anyway.
>>>>>
>>>>
>>>> Splitting like that makes sense! (_if_ it is within quota, we could
>>>> maybe consider keeping the dev docs, and only move old docs to bucket
>>>> storage?)
>>>>
>>>>
>>>>>
>>>>> - Benchmarks: OVHcloud instances have guaranteed hardware, and we'll
>>>>> be checking if this is enough for the results of the benchmarks to be
>>>>> consistent across runs, or if there is too much variability and we need
>>>>> dedicated hardware. If consistency is good enough that would be great,
>>>>> since our benchmarks mostly use one core, and using dedicated hardware is
>>>>> likely to be a considerable waste of resources, since most servers will
>>>>> likely have 16 cores or more. We'll discuss with OVH if dedicated hardware
>>>>> is needed, as at the moment their public cloud doesn't offer it (there is
>>>>> an alpha for providing dedicated instances, but we need to check with them).
>>>>>
>>>>> - Faster CI: Our GitHub runners are small, and most builds take around
>>>>> one hour or more to finish. We should be able to use bigger OVH instances
>>>>> for our existing CI pretty easily, via their OpenStack API and CIrun.
>>>>>
>>>>
>>>> I am not familiar with CIrun, but from a quick look, that would
>>>> basically be using our current GitHub Actions but through their
>>>> "self-hosted" runner feature?
>>>>
>>>>
>>>>> _______________________________________________
>>>>> Pandas-dev mailing list
>>>>> Pandas-dev at python.org
>>>>> https://mail.python.org/mailman/listinfo/pandas-dev
>>>>>