Linode AllStarLink Node 100% CPU for 2 hours

I have an AllStarLink node set up on a Linode Debian 12 machine. Almost 100% of the time, the node is just idling with at most 1 or 2 users. Periodically, I get an email from Linode: “Your Linode has exceeded the notification threshold (90) for CPU Usage by averaging 100.0% for the last 2 hours.” This has also occurred when zero users were connected. I get this high-CPU alert on the minimal shared instance 1 to 4 times per month. As far as I know, there’s almost no activity on the AllStarLink node almost all of the time, yet I keep periodically receiving the high CPU alerts from Linode. I wonder whether anyone else who runs a Linode AllStarLink node on the smallest shared Debian 12 instance, with almost zero AllStarLink usage, also receives these sustained 100% CPU alerts. What could be causing the extreme CPU usage? Is there any way to adjust Debian 12, or ASL3, so that the intermittent extreme CPU conditions do not occur? If some kind of periodic computing process is responsible, perhaps its CPU priority level could be lowered.

Many years ago, I experienced extreme CPU load while transcoding video files on a home Windows computer. The load was so intense that the mouse and keyboard could not be used at all. Windows has 7 CPU priority levels, and the video transcoding process was set to the mid-range priority. To fix the problem, I changed the transcoding process from normal priority to low priority. That fixed the keyboard and mouse lockup, and it only made the transcoding take a little longer. It took just one line of code to change the priority.

Later, I ran into the same problem at work administering Windows servers. My company’s application ran a daily batch report that locked out all of the servers’ users for a long period while it ran. I gave our developers the one line of code to lower the batch job’s priority. After they lowered it, the lockup problem ended and never recurred. This fixed the problem on 100 Windows servers running the company’s custom application. The change made the daily report take about 1/3 longer, but that didn’t matter because it was a daily report, and it immediately fixed hundreds of end-user complaints.

Perhaps there is a periodic batch-job-type process running in Debian 12, or within the ASL3 application, that intermittently spikes CPU consumption for two hours. If that’s the cause of the extended CPU consumption, it may be possible to fix it by lowering the CPU priority level of that job.
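On Debian, the equivalent one-liner would be nice/renice, or a systemd drop-in for a service. This is only a sketch; the PID and the service name below are placeholders, since we don't yet know which process, if any, is responsible:

sudo renice -n 19 -p 1234               # lower the priority of an already-running process (1234 is a placeholder PID)
sudo systemctl edit example.service     # for a service, add Nice=19 (and optionally CPUWeight=20) under [Service]
sudo systemctl restart example.service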

Is the /var/log/asterisk/messages.log file REALLY big?

Here are the /var/log/asterisk/messages.log files. Most are around 5,000 bytes, but two of them are 146,335 and 171,871 bytes.

root@localhost:/var/log/asterisk# ls -l messages*
-rw-r----- 1 asterisk asterisk  12361 Jan 19 15:41 messages.log
-rw-r----- 1 asterisk asterisk 146335 Jan 18 23:56 messages.log.1
-rw-r----- 1 asterisk asterisk   5421 Nov 16 23:50 messages.log.10.gz
-rw-r----- 1 asterisk asterisk   5389 Nov  9 23:55 messages.log.11.gz
-rw-r----- 1 asterisk asterisk   5242 Nov  2 23:45 messages.log.12.gz
-rw-r----- 1 asterisk asterisk   5322 Oct 26 23:55 messages.log.13.gz
-rw-r----- 1 asterisk asterisk   5195 Oct 19 23:55 messages.log.14.gz
-rw-r----- 1 asterisk asterisk   5282 Oct 12 23:55 messages.log.15.gz
-rw-r----- 1 asterisk asterisk   5288 Oct  5 23:40 messages.log.16.gz
-rw-r----- 1 asterisk asterisk   5417 Sep 28 23:55 messages.log.17.gz
-rw-r----- 1 asterisk asterisk   5124 Sep 21 23:55 messages.log.18.gz
-rw-r----- 1 asterisk asterisk   5251 Sep 14 23:55 messages.log.19.gz
-rw-r----- 1 asterisk asterisk   5799 Sep  7 23:55 messages.log.20.gz
-rw-r----- 1 asterisk asterisk 171871 Jan 11 23:51 messages.log.2.gz
-rw-r----- 1 asterisk asterisk   5428 Jan  4 23:56 messages.log.3.gz
-rw-r----- 1 asterisk asterisk   5251 Dec 28 23:51 messages.log.4.gz
-rw-r----- 1 asterisk asterisk   5254 Dec 21 23:56 messages.log.5.gz
-rw-r----- 1 asterisk asterisk   6375 Dec 14 23:51 messages.log.6.gz
-rw-r----- 1 asterisk asterisk   5334 Dec  7 23:50 messages.log.7.gz
-rw-r----- 1 asterisk asterisk   5361 Nov 30 23:55 messages.log.8.gz
-rw-r----- 1 asterisk asterisk   5335 Nov 23 23:55 messages.log.9.gz

Given that the two much larger files are compressed, I suspect your node ran into GitHub app_rpt issue #420. Our next release includes a fix.

Even with bug #420, I’ve never seen it create CPU spikes.

If you run top -d1 and watch it for a while, do you ever see the asterisk process take more than 15% in the %CPU column? You can also capture a long-running vmstat 1 > /tmp/vmstat and then look under “cpu” at the “us” and “sy” fields. That will tell you whether the box is actually busy overall.

Finally, what does the output of uptime say, specifically the “load average” numbers?
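For reference, a minimal way to run all three checks (the output path is just an example):

top -d 1                    # watch the %CPU column for the asterisk process
vmstat 1 > /tmp/vmstat      # leave it running for a while, then Ctrl-C and review the file
uptime                      # prints the 1-, 5-, and 15-minute load averages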

Could this just be logrotate sucking up CPU while compressing the files?
Do the file timestamps line up with the CPU spike?
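One way to check, assuming rotation is driven by Debian 12’s stock logrotate.timer:

systemctl list-timers logrotate.timer                         # when logrotate last ran / will next run
journalctl -u logrotate.service --since "7 days ago"          # recent rotation runs
ls -l --time-style=full-iso /var/log/asterisk/messages.log*   # compare timestamps against the alert times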

I can’t really say it’s true in this case, but I have seen CPU utilization reported incorrectly many times.

Look at/use

top

The asterisk process is generally between 0.9% and 1.9%.
In vmstat, the “us” field ranges from 7 to 55 and “sy” from 17 to 82.
I’m wondering if a possible solution is to delete the log files and then reboot.
I think I could schedule a cron job to do that on a daily or weekly basis. The 610480 AllStarLink node is running on a Linode Nanode 1 GB, the smallest shared Linode cloud instance: 1 GB RAM, 1 shared CPU, 25 GB storage, 1 TB transfer, and 40 Gbps in / 1 Gbps out networking. I receive the two-hour high-CPU email alert from Linode 1 to 4 times per month.
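If I do go that route, a rough sketch of the cron entry might look like the following; the schedule, and whether a reboot is needed at all, are assumptions rather than a recommendation:

sudo crontab -e
# example entry: every Sunday at 03:00, delete rotated Asterisk logs, then reboot
0 3 * * 0 find /var/log/asterisk -name 'messages.log.*' -delete && /sbin/reboot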

Here are the screenshots for the top -d1 command:

and the vmstat 1 command:

Your problem isn’t Asterisk/ASL … it is whatever “Quantar” is. But that’s not an ASL program.

The logs files are a complete red herring.

Not trying to advertise, but for less than $19 a year you can get a machine from RackNerd that will run ASL 3 just fine.

It appears that the Quantar+ software, likely used to program the radio, continues to run in the background, perhaps as a monitoring or log service?

Find the PID and kill it if it is not needed.

My bet is it will reload on reboot, so you need to find and disable the service and/or uninstall/remove it.
Killing the PID would be a logical first step.
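A rough sketch of those steps (the “quantar” name pattern is an assumption based on what shows up in top):

ps aux | grep -i quantar                       # find the PID
sudo kill <PID>                                # replace <PID> with the number found above
systemctl list-units --all | grep -i quantar   # check whether a systemd unit will restart it
sudo systemctl disable --now <unit>            # stop and disable that unit if one exists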

Quantar Bridge is part of DVSwitch Server. You don’t need it unless you are interfacing to a Quantar repeater.
You can safely disable the service:
sudo systemctl stop quantar_bridge
sudo systemctl disable quantar_bridge

How is Racknerd about overselling their hosts?
I have a RackNerd box and find it has some strange network routes and behavior compared to my Linodes. I don’t run ASL on it. Almost my entire stack runs on dedicated servers these days, since I was getting really tired of the oversold shared resources of VPSes. It’s less of a problem when you move to higher price brackets, but hosting a high number of connections, I found the bottom-tier plans struggled a bit when noisy neighbors on the same host were doing heavy tasks. Since CPU is allocated dynamically, you don’t always have the same resources available: a spike on someone else’s virtual machine decreases what’s available to your own. You can see this fluctuate while running a task that pegs the CPU at a specific point.
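One way to see this from inside the guest is CPU “steal” time, which both vmstat and top report:

vmstat 1 5    # the "st" column is CPU time taken by the hypervisor for other guests
top -d 1      # "st" at the end of the %Cpu(s) summary line is the same figure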

I tried running ASL on Linode, Digital Ocean and Vultr, and saw the same thing over and over again.
Lots of tasks running on neighboring virtual hosts every 15 minutes, especially at the top of the hour, would cause audible jitter. Sometimes moving to another physical host would fix it for a while, until that host eventually became saturated. Moving up to the next price bracket usually helped, because there are fewer guest machines running on the hosts the higher up in price you go.

I do still use a couple of Linode Nanodes for light traffic, and they are OK for that.

Racknerd is definitely oversold. There’s no other way to offer a VPS for $20/year. The owners are also felons. Do with that what you will.

The issue in this topic is certainly not caused by an oversold rack.
Perhaps start a new thread for that topic.