ASL3 performance on VPS vs. dedicated servers

N2DYI · August 5, 2024, 5:10am

Greetings:

I run a multi-mode system, which currently incorporates seven regional nodes. Three of these are on rPi4’s running HamVoIP, and the rest are older ASL versions.
Two of the rPi4’s (the ones with the heaviest usage) are in data centers. One is on a multi-gigabit fiber connection in someone’s apartment, though it doesn’t carry nearly the weight of the other two.
The central point, at the moment, is a Linode Nanode.
I’ve been using Linode (now Linode/Akamai) VPSs for a long time, and they’ve generally been OK for VoIP stuff, at least until your physical host gets oversold, and so long as you avoid the Atlanta data center. However, I’m not hosting large numbers of connections on these VPSs anymore. For about 1.5 years, everything was on a single ASL 1.01 node running on a Linode VPS, not the bottom tier, but a couple up from that.
On this setup, things started to get pretty flaky after maybe 50 connections or so.
Thus, I got a bunch of VPSs in different parts of the country and world, and spread people out by region. This helped some, but moving the most active regions to dedicated rPi4’s helped even more.

One issue I’ve had with hosting VoIP on VPSs is the noisy neighbor problem, wherein resources are shared, and if other hosts are going wild, your VoIP traffic could suffer.
I am currently experiencing this to a small degree with my setup, where all the nodes are connected together using a low-end Linode VPS, and there is almost always a tiny drop-out of audio at the top of every hour for about a second or two, most likely caused by cronjobs running on other guests on the same physical host.
I am hoping to procure another couple of well-placed rPi’s to perhaps mitigate this problem for my own system.
However, this still leads to the question:
With ASL3, what has the general experience been running high connection hubs on VPSs compared to older versions of ASL? What providers in the United States have people found to work best?
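
For the top-of-the-hour dropouts mentioned above, one way to check the cronjob theory is to log the time of each glitch and see whether the events really cluster around :00. Here is a minimal sketch in Python. It assumes you already have a list of dropout timestamps from somewhere (how you detect a dropout is up to you), and the function names are made up for illustration, not part of ASL or Asterisk.

```python
from collections import Counter
from datetime import datetime


def dropouts_by_minute(timestamps):
    """Count dropout events per minute-of-hour (0-59)."""
    return Counter(ts.minute for ts in timestamps)


def share_near_top_of_hour(timestamps, window_minutes=1):
    """Return the fraction of events within window_minutes either side of :00."""
    if not timestamps:
        return 0.0
    near = sum(
        1
        for ts in timestamps
        if ts.minute < window_minutes or ts.minute >= 60 - window_minutes
    )
    return near / len(timestamps)


# Hypothetical sample data: three glitches near the top of the hour, one not.
sample = [
    datetime(2024, 8, 5, 5, 0, 2),
    datetime(2024, 8, 5, 6, 0, 1),
    datetime(2024, 8, 5, 7, 59, 58),
    datetime(2024, 8, 5, 8, 30, 0),
]
print(dropouts_by_minute(sample))
print(share_near_top_of_hour(sample))
```

With a two-minute window out of sixty, evenly spread glitches would give a share of roughly 0.03, so a value well above that over a few days of logs would support the idea that something scheduled on the hour is responsible.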

A club I sometimes consult with currently has their repeater on an rPi4 running HamVoIP with some internal node balancing, along with Supermon 7.4 and a bunch of connections. This is stressing the poor thing out. Due to routing from their gigabit AT&T U-Verse connection, some people in the area, particularly those on wireless ISPs, don’t have good routes to their server, especially at night. However, they can be rerouted through one of my Linodes, or even a physical box 3000 miles away, and everything is just fine. So, we’re looking at solutions to improve throughput, latency, and resource usage there as well. Another rPi, or even something better, could be placed in the vault to strictly host public connections, but that won’t necessarily do anything for the routing.
They’re thinking of using a VPS running ASL3, then allowing the rPi to only maintain the physical connection to the repeater. In this case, it’s a club in Northern California, so something in the Bay Area would probably be preferred. I’m not sure if they would go for the cost of a dedicated server, though that’s what I would do if I could, most likely.

Has anyone majorly stress-tested an ASL3 hub with no other multi-connection hubs connected to it in various configurations to see how much it can take before audio falls off a cliff, and things start dying?

Just looking for thoughts in general on this topic.

K6CRS · August 6, 2024, 4:32pm

CrownCloud has done me well for ASL nodes and Asterisk PBX instances for more than 5 years. I’ve never had an issue with them. They even have a unique “snapshot” function as part of their UI, so in case I really screw something up, I can roll back to my last snapshot and everything’s rosy.

I think I pay $25/year for a couple of them (got them on special a year or so ago) and $50/year for another. Either way, it’s cheap for a good server.

As far as your ASL loading issues, have you actually gone in when it’s exhibiting problems and looked with top to see what the actual loading on the CPUs is, and who is taking the most cycles?

Making a Pi a hub is stressful enough, but then hanging it on a consumer data connection is begging for trouble, gigabit or not. The junk router AT&T gives out for consumer connections isn’t going to hold a candle to the carrier-class switches you’ll be hooked up to in a datacenter, even if you are 1 of 35 instances sharing the same physical iron.

So, more to think about.

N2DYI · August 6, 2024, 7:03pm

Hi Carl:

Yeah, avoiding consumer connections is kind of what I’m after. The repeater vault in question uses a PFSense router, but it still goes through AT&T’s thing in DMZ mode.
On one of the nets on this system, this Pi has sometimes hosted around 100 connections, not including adjacent nodes, as well as running Supermon and simpleusb, which is really more than should be allowed on such hardware. I ultimately want to separate the repeater controller and hub, maybe even going so far as to connect the two using USRP instead of IAX to further ensure local stability no matter what is happening on the other side. My thought is that, with lots of IAX traffic even to a hub not connected directly to the repeater, a chance of local repeater audio breaking exists. So, to avoid that, take IAX out of the picture entirely, as far as the repeater is concerned, at least to the outside world.

To do this, I would connect a private node hosting the repeater’s URI to another private node on the same system, then connect that private node to the remote hub via USRP in and out. Then, theoretically, the worst can happen to the hub, and the repeater itself won’t care. This might add a little delay, though.

This would still leave options to drop the hub if needed, or, as is needed a few times a year, put internet connections in monitor only mode, but the mechanism would be different. Still scriptable using includes (modern Asterisk calls that something different now) to set parameters of the USRP channel driver, then restarting Asterisk.

As for your question about resource load on existing VPSs under high connection counts, I can’t really answer that, since the traffic to those on my system is pretty light compared to the rPi’s, which I can tell you from experience bottleneck on CPU at around 141 connections. That’s why I’m looking for someone who has really loaded down a single ASL3 instance, without a bunch of nodes connected to other adjacent nodes, on any kind of x64 platform.

On ASL 1.01, with the bottom tier Linode plan, I started seeing a lot of core dumps when connection counts got above 40 or so. Increasing the RAM and CPU made that happen less often, but it still did. I’m pretty sure that will be less of a problem now, at any rate.

I’m far less concerned about internet routing and throughput on just about any VPS than the unpredictable nature of a bunch of systems running on hypervisors. Even on a mostly idle VPS, with other guests on the same system, you can look at top and watch available system resources change when activity is high on those other systems, not yours, at least in my experience.
I hate those small audio glitches that inevitably happen with VoIP. While there is no way to completely avoid them (you can’t control every part of the path), I am definitely interested in minimizing the ones caused by something that can be controlled. When my system was hosted all on a single VPS, this was much more of a problem than it is now.
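
One way to put a number on the hypervisor contention described above is CPU steal time, which Linux reports as the eighth value on the aggregate `cpu` line of `/proc/stat` (it is the `st` figure in top). Below is a rough Python sketch that compares two samples of that line and reports what percentage of the interval was stolen by the hypervisor. The function names are my own, and the sample lines are invented numbers for illustration, not measurements from any real host.

```python
def parse_cpu_line(line):
    """Return the first eight counters (user..steal) from the aggregate cpu line."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    values = [int(v) for v in fields[1:9]]
    if len(values) < 8:
        raise ValueError("kernel does not report a steal counter")
    return values


def steal_percent(before, after):
    """Percentage of CPU time stolen between two samples of the cpu line."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total


def read_cpu_line(path="/proc/stat"):
    """Read the aggregate cpu line (the first line of /proc/stat) on Linux."""
    with open(path) as f:
        return f.readline()


# Invented sample lines, for illustration only.
before = "cpu  100 0 50 800 10 0 5 35 0 0"
after = "cpu  150 0 70 1500 20 0 10 50 0 0"
print(round(steal_percent(before, after), 3))

# On a real VPS, something like this (not run here):
#   first = read_cpu_line(); time.sleep(5); second = read_cpu_line()
#   print(steal_percent(first, second))
```

Sampling this every few seconds and logging it next to the times of audio glitches would show whether the glitches line up with spikes in steal, which is about as close as a guest can get to seeing what the neighbors are doing.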

BTW, Linode also has a snapshot feature, which has saved me more than once. Definitely a fan.