[adelie-project] Re: Proposal: Replacing Integricloud with Scaleway

From: Christine Dodrill <chrissycadey_at_icloud.com>
Date: Sat, 13 Jul 2019 06:09:24 -0400

This sounds reasonable. I approve of this plan.

> On Jul 13, 2019, at 05:56, A. Wilcox <awilfox(a)adelielinux.org> wrote:
>
> == Table of Contents ==
>
> * Executive Summary
>
> * How Did We Get Here?
>
> * Reliability Is Not Availability
>
> * Enter Scaleway
>
> * Hard Numbers
>
> * Conclusion
>
> * References
>
>
>
> == Executive Summary ==
>
> This is a formal proposal to retire the dedicated server we have with
> Integricloud and replace it with a set of virtual servers from Scaleway.
>
> We originally chose Integricloud's dedicated server offering primarily
> for reliability and security. While it has proven secure, and the
> hardware itself is reliable, its availability leaves something to be
> desired.
>
> Scaleway offers a similar level of reliability, and has a higher level
> of availability based on our current account with them. They
> additionally offer servers that are not based on the x86 architecture,
> so we are still protected from the numerous issues that plague x86.
>
> This will also reduce our hosting costs by almost 90%, and should reduce
> downtime by nearly 100%.
>
>
> == How Did We Get Here? ==
>
> In early January 2019, we were notified that both of our dedicated
> servers at Rack911 were being retired, with very little notice. For
> additional information, see the adelie-devel post with message
> ID <ba35ebd3-54b4-f18f-b65f-d327e9d0af80(a)adelielinux.org>
> (archived at [1]).
>
> After our sponsorship was pulled in October 2018, we had done a bit of
> investigation into replacement hosting providers in the event that this
> would happen. Our requirements at the time were:
>
> * non-x86 based (due to the plethora of x86 bugs being discovered)
>
> * at least 8 GB RAM
>
> * dedicated hardware preferred
>
> * at least 3 IPv4 addresses
>
> We evaluated Packet.net for ARM64 based systems[2] and Integricloud for
> PPC64 based systems[3]. We found Integricloud to be approximately 60%
> of the cost of Packet.net[4]. Additionally, we had a professional
> working relationship with their parent company, Raptor Engineering, who
> make the Talos and Blackbird family of computers. In fact, the
> Integricloud system we were offered was to be a rack-mounted Talos II.
> Since we already had a Talos II in use as a build server, we felt this
> would be close to ideal, as any hardware oddities would already have been
> worked out.
>
> We chose their 4-core (16-thread) PowerPC system with 8 GB RAM and 2 x 1
> TB NVMe disk storage. One 1 TB NVMe disk is dedicated to
> mirrormaster.adelielinux.org. The other 1 TB NVMe disk is an LVM group,
> shared between the various KVM-based virtual servers run on it.
>
>
> == Reliability Is Not Availability ==
>
> The Integricloud dedicated server, chloe.adelielinux.org, has had no
> hardware issues in over eight months of service. The hardware itself
> has been fast, stable, and very reliable. However, there have been
> multiple issues regarding availability.
>
> Integricloud has a single-homed fibre infrastructure; per a public
> looking glass, it is run via Mediacom[5]. This has caused an unforeseen
> and consistent issue regarding availability.
>
> Down               Back up            Duration
> 2019-04-16 13:17   2019-04-16 22:24   9 hours, 7 minutes
> 2019-04-17 00:10   2019-04-17 12:29   12 hours, 19 minutes
> 2019-07-09 06:25   2019-07-09 20:01   13 hours, 37 minutes
> 2019-07-10 15:14   2019-07-10 15:39   25 minutes
> 2019-07-12 16:35   2019-07-12 16:43   8 minutes
>
> This has resulted in a 97% uptime for April, and a 98% uptime for July -
> and we are only 13 days into July, so this number could go down further.
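
The uptime percentages above can be reproduced from the listed outage durations. This is a minimal sketch, assuming (the message does not say so) that each percentage is measured against the full calendar month; the names `OUTAGES`, `DAYS_IN_MONTH`, and `uptime_percent` are illustrative only.

```python
from datetime import timedelta

# Outage durations exactly as listed above, grouped by month.
OUTAGES = {
    "2019-04": [timedelta(hours=9, minutes=7), timedelta(hours=12, minutes=19)],
    "2019-07": [
        timedelta(hours=13, minutes=37),
        timedelta(minutes=25),
        timedelta(minutes=8),
    ],
}

# Assumption: uptime is measured against the whole calendar month.
DAYS_IN_MONTH = {"2019-04": 30, "2019-07": 31}


def uptime_percent(month: str) -> float:
    """Return the percentage of the month not covered by listed outages."""
    total = timedelta(days=DAYS_IN_MONTH[month])
    down = sum(OUTAGES[month], timedelta())
    return 100.0 * (1 - down / total)


for month in OUTAGES:
    print(f"{month}: {uptime_percent(month):.2f}% uptime")
```

Under that assumption the sketch gives roughly 97.0% for April and 98.1% for July, the latter only if July sees no further outages.
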
>
> Additionally, many ISPs are not accepting Mediacom's IPv6 route
> announcements. This has caused mirrormaster to be inaccessible to many
> of our users, and even one of the members of our own Infra Team[6].
>
> Finally, while yours truly was trying to show an Adélie Web page to
> someone while on public Wi-Fi at a well-known place in Broken Arrow, OK,
> I was greeted with an error page[7]:
>
>
> Sonicwall Network Security Appliance
>
> This site has been blocked by the network administrator.
>
> Block reason: Gateway GEO-IP Filter Alert
>
> IP address: 23.155.224.64
>
> Connection initiated towards country: Unknown
>
>
> If a car dealership's firewall is blocking us, who knows what other
> firewalls are blocking us? How many people are unable to discover us, and
> how many corporate sponsors are we missing out on, because they can't
> even connect to our Web site? And why can they not connect to our Web
> site? It could be the IPv6 peering issue, or a firewall blocking our
> IPv4 space, or because Mediacom has suffered another "fibre cut".
>
>
> == Enter Scaleway ==
>
> We have had a working relationship with Scaleway for almost a year and a
> half. We launched our 32-bit ARM builder on the Scaleway ARM cloud in
> March 2018, and have had no downtime in that time:
>
> awilcox on erin [pts/0 Sat 13 9:33] ~: uptime
> 09:33:02 up 489 days, 5:59, load average: 0.00, 0.00, 0.00
>
> The network has never suffered any outages, either. Since the Scaleway
> cloud features ARM servers, we would additionally still be able to avoid
> the x86 architecture and all of its failings.
>
> We have continually been limited by our lack of IPv4 space at
> Integricloud. Currently, we "proxy" every server via athdheise, a
> virtual server on our Integricloud dedicated system that has both an
> IPv4 and IPv6 address. All of our main systems are IPv6-only (wiki,
> bts, next, etc), and when an IPv4 system attempts to connect to any of
> these services, the connection has to be proxied via athdheise.
>
> If we use Scaleway virtual servers, every system gets its own dedicated
> IPv4 address, which drastically simplifies our administration.
>
> Additionally, we would receive a lot more RAM per virtual server.
> Currently, athdheise - the aforementioned Web server and proxy - has 256
> MB RAM. It has 34 MB of available RAM. When documentation changes are
> made and the Git hook runs to cause athdheise to rebuild the
> documentation site (at help.adelielinux.org), sometimes the process runs
> out of memory. This means one of us has to log in, stop the web server,
> run the make process, and then restart the web server. The minimum RAM
> at Scaleway is 2 GB per virtual server. This is a generous amount of
> headroom, and would even allow us to play with memcached (or other
> caching solutions) to reduce latency across our infrastructure.
>
> Finally, we would save a dramatic amount of money. We currently pay
> 225$/mo pre-tax for Integricloud.
>
>
> == Hard Numbers ==
>
> The current systems we run on Integricloud are:
>
> System                          RAM        Disk
> enfys (postgresql)               768 MB     30 GB
> rarity (these mailing lists)    1536 MB     30 GB
> mirrormaster                     256 MB      1 TB
> bts (Bugzilla issue tracking)    512 MB      8 GB
> athdheise (Web server/proxy)     256 MB      4 GB
> wiki                             512 MB      8 GB
> annwyn (Nextcloud)               512 MB    100 GB
> chatterbox (Quassel IRC)         512 MB     40 GB
>
>
>
> Since Scaleway tops out at 500 GB disk, we will need to consider
> alternate hosting for mirrormaster. I believe we can run this on the
> Hetzner dedicated server that is being sponsored by Alyx at Leuhta Labs.
>
>
>
> And this is what we could pay per virtual system on Scaleway:
>
> 4 ARM CPUs, 2 GB RAM, 50 GB disk - 2.99€/mo
>
> 6 ARM CPUs, 4 GB RAM, 100 GB disk - 5.99€/mo
>
> 8 ARM CPUs, 8 GB RAM, 200 GB disk - 11.99€/mo
>
>
>
> By my approximation, we would be able to put every single system except
> annwyn on the smallest server, and annwyn on the second-smallest.
>
> 6× 2.99€ = 17.94€ per month
>
> 1× 5.99€ + 17.94€ = 23.93€ per month total cost, or approximately
> 26.94$. This is a savings of nearly 90% after tax.
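
The sizing above can be checked mechanically. This is a rough sketch, not the project's actual tooling: for each current system it picks the cheapest listed Scaleway tier whose RAM and disk are at least what the system has today, using the monthly prices exactly as they appear in the tier list (2.99€, 5.99€, 11.99€). mirrormaster is left out because the proposal hosts it elsewhere. The names `TIERS`, `SYSTEMS`, and `cheapest_tier` are invented for illustration.

```python
from decimal import Decimal

# (RAM in MB, disk in GB, price in EUR per month), as listed above.
TIERS = [
    (2048, 50, Decimal("2.99")),
    (4096, 100, Decimal("5.99")),
    (8192, 200, Decimal("11.99")),
]

# Current allocations (RAM in MB, disk in GB); mirrormaster excluded.
SYSTEMS = {
    "enfys": (768, 30),
    "rarity": (1536, 30),
    "bts": (512, 8),
    "athdheise": (256, 4),
    "wiki": (512, 8),
    "annwyn": (512, 100),
    "chatterbox": (512, 40),
}


def cheapest_tier(ram_mb, disk_gb):
    """Return the cheapest tier that covers the given RAM and disk."""
    fitting = [t for t in TIERS if t[0] >= ram_mb and t[1] >= disk_gb]
    if not fitting:
        raise ValueError("no listed tier is large enough")
    return min(fitting, key=lambda t: t[2])


total = Decimal("0")
for name, (ram, disk) in SYSTEMS.items():
    price = cheapest_tier(ram, disk)[2]
    total += price
    print(f"{name:<12}{price:>6} EUR/mo")
print(f"{'total':<12}{total:>6} EUR/mo")
```

With the prices as listed, six systems land on the smallest tier, and annwyn, which needs 100 GB of disk, lands on the second-smallest.
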
>
>
> == Conclusion ==
>
> I believe that retiring our Integricloud dedicated server and replacing
> it with Scaleway virtual ARM servers makes business sense. It will
> allow us to spend less time down, dramatically improve the architecture
> of our infrastructure, and reach more people. This will allow us to
> have an even greater reach, and allow us to grow into a larger, more
> healthy Linux distribution that can genuinely improve the world.
>
>
> I do not want to leave this proposal without a separate smaller proposal
> for how this could be effected easily. I believe that we can simply
> start by migrating the wiki server, since it is the least used service.
> We can feel out Scaleway's ARM offering for a while, and make sure that
> it will genuinely work for our needs. After we are satisfied, we can
> change the DNS for the wiki and begin work on another server. Assuming
> all goes well, we will eventually be able to quietly power off the
> Integricloud dedicated system with zero further downtime.
>
>
> Thank you so much for reading this proposal. I welcome any comments or
> questions you may have. You may respond here or poke me on IRC. I'll
> post a summary email in response with any important notes from IRC.
>
> Best,
> --arw
>
>
> == References ==
>
> [1]:
> https://lists.adelielinux.org/hyperkitty/list/adelie-devel(a)lists.adelielinux.org/thread/5QZCLXCVL7H2DOCDUOURWRVTZ52CMRPS/
>
> [2]: https://www.packet.com/cloud/servers/c1-large-arm/
>
> [3]: https://www.integricloud.com/
>
> [4]: The Packet.net ARM box runs at 360$/mo. Integricloud is 220$/mo.
>
> [5]: https://bgp.he.net/AS46246
>
> [6]:
>
> <aranea> awilfox: Looks like my routing issues are Mediacom's (that's
> Raptor's only upstream) fault. I doubt I'll have any success contacting
> them; this needs to come from a customer. I'll try contacting tpearson
> again with more details; if he doesn't respond, I may have to ask you to
> file an outage report or sth.
> <aranea> Short version: Mediacom doesn't follow some standard industry
> practices, and thus many of their peers aren't accepting the routes they
> announce on behalf of their customers (and guess what, Raptor is their
> only IPv6 customer.)
>
> [7]: https://i.imgur.com/khmebJ5.png
>
> --
> A. Wilcox (awilfox)
> Project Lead, Adélie Linux
> https://www.adelielinux.org
>
> _______________________________________________
> Adélie Open Governance mailing list -- adelie-project(a)lists.adelielinux.org
> To unsubscribe send an email to adelie-project-leave(a)lists.adelielinux.org
Received on Sat Jul 13 2019 - 10:28:31 UTC
