ASL3 EchoLink: Asterisk Crashing

At 3 PM and 5 PM we host a net with 30 to 60+ nodes. We had EchoLink on the cloud nodes. This is happening on several nodes, so it's not isolated. The nodes were rebuilt by two different people. My next experiment was to put EchoLink on my home node. I connected to EchoLink during the 5 PM net and watched Asterisk crash in Supermon. Again, this has happened on other cloud hubs. I had connected on and off over the last two days with no problem until lots of nodes were connected. This was a fresh build, too. I spun up an ASL 2.0 node last night but haven't put it online or moved EchoLink to it yet. My cloud hub has 4 cores and 6 GB of RAM. The home node is a Wyse 3040 with an SA818 USB.

New users can't upload, and the Asterisk log is long, which makes this more interesting.

Dave
N3DMC

I adjusted your account so you should be able to upload. Are you asking a question about ASL2 or ASL3? There has been a ton of work around chan_echolink in ASL3 and many people are running it successfully with large user bases without issue.

This is ASL3, on both the hub and the home nodes. This is the Asterisk log of the EchoLink connection during the net. I tried a couple of times and had to stop. This log is from home node 512950.

I understand all the work that has gone into this. Thank you for everything done on ASL3 and EchoLink; it's a great improvement. Let me know if you need more logs or have suggestions.

Asterisk log .pdf (231.3 KB)

Have you run updates recently? Your log is reporting a very old bug that was fixed a long time ago.
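apt saying "All packages are up to date" doesn't show which builds are actually installed. A quick way to check is to list the ASL3 packages and the Asterisk version directly. This is a minimal sketch. It assumes the ASL3 package names start with `asl3`, as `asl3-asterisk` does later in this thread. The function name `show_asl_versions` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print ASL3 package versions and the Asterisk build.
# Assumption: ASL3 packages are named asl3* (e.g. asl3-asterisk).
show_asl_versions() {
  # dpkg-query -W prints each package dpkg knows that matches the pattern,
  # followed by its version.
  if ! dpkg-query -W -f='${Package}\t${Version}\n' 'asl3*' 2>/dev/null; then
    echo "no asl3 packages found"
  fi
  # asterisk -V prints the version string of the installed binary.
  if command -v asterisk >/dev/null 2>&1; then
    asterisk -V
  fi
}

show_asl_versions
```

Running it on each node makes it easy to compare builds side by side when one node crashes and another doesn't.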

root@node512950:~# apt update && apt upgrade -y
Hit:1 Index of /debian bookworm InRelease
Get:2 Index of /debian bookworm-updates InRelease [55.4 kB]
Get:3 Index of /debian/bookworm/ bookworm InRelease [36.9 kB]
Get:4 Index of /debian-security bookworm-security InRelease [48.0 kB]
Hit:5 Index of /DVSwitch_Repository bookworm InRelease
Hit:6 Index of /public/ bookworm InRelease
Get:7 Index of /debian-security bookworm-security/main Sources [152 kB]
Fetched 292 kB in 3s (109 kB/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
All packages are up to date.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
The following packages were automatically installed and are no longer required:
libnl-3-200 libnl-genl-3-200 libnl-route-3-200
Use 'apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

root@node512950:~# sudo apt update
sudo apt upgrade -y
Hit:1 Index of /debian-security bookworm-security InRelease
Get:2 Index of /debian/bookworm/ bookworm InRelease [36.9 kB]
Hit:3 Index of /debian bookworm InRelease
Hit:4 Index of /debian bookworm-updates InRelease
Hit:5 Index of /DVSwitch_Repository bookworm InRelease
Hit:6 Index of /public/ bookworm InRelease
Fetched 36.9 kB in 2s (15.8 kB/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
All packages are up to date.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
The following packages were automatically installed and are no longer required:
libnl-3-200 libnl-genl-3-200 libnl-route-3-200
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

#6 was the AllStarLink repo.

It looks like you have more than just a standard node, since you installed DVSwitch. Does this behavior exist on a standard Pi Appliance with a basic node configuration and an EchoLink channel defined? The error you're seeing is usually generated by a misconfigured special integration to IAX2 or PJSIP that is "working but not really working".

Hey Jason, understanding why you asked if it's a standard node, and just as an "aside"...

I saw an oddball crash under some load of "something" this week, but I haven't dug into the logs yet... and yes, it was on an ASL3 node with DVSwitch installed... latest code on all...

I'll see if I can do some log digging... I was away from the house when it did it... Net going on, probably 3-5 users (couldn't look at the time, rural/bad cellular) via EchoLink, no AllStarLink users, and the one local USRP link to DVSwitch doing BM/DMR, with ASL handling a link radio via simpleusb... Pi 4...

How the crash acted: users were chatting away from both the DMR and radio sides during the net... "something" crashed hard (I know, I know, useless without logs!)... both the EchoLink audio and the DMR link went silent while radio traffic on the analog side continued (where the majority of this Net's users are)... but then, VERY interestingly, ALL recovered completely in about a minute and a half... while I was attempting to log in to my stuff remotely from one bar of beautiful Verizon LTE in rural America... gave up and hurried thru my rural neighborhood to try to look... hahaha...

Just adding this comment for "it is interesting but don't worry too much without logs... I don't even know which modules crashed and recovered!" info... and personal stuff has me so busy I likely won't get time to dig too hard unless it happens again/becomes repetitive...

Node has handled a number of Nets in roughly this usage and number of users, recently updated to the (somewhat newly released) ASL3 stuff on the repo... likely the first Net since then, is the only reason this crash is even mildly interesting...

If time permits I'll try to reproduce it, but getting that many EchoLink users on for testing outside of that Net... difficult... no idea how I can stress test it for a regression test between the versions... I definitely don't have that many EchoLink logins! Hahahahaha...

This aside brought to you by nearly worthless troubleshooting reports, and the $0.01 worth of coffee sipped while catching up on posts this week... hahaha...

(That it recovered at all, and in such a long-ish timeframe was truly weird. If it was packet loss it would have come back quicker... I was texting with another interested party of the Net right before it happened that I was about to pull in my 'hood and carry an armload of crap inside... and that convo switched to I would drop it all on the counter or garage, and go log in and see what croaked...

... when the mobile Echolink app and DMR rig I could hear down the hall had their "miraculous" restart of audio!

hahaha... annoying...

And yeah, I need to give that other interested party remote access to see what's going on... so he could have looked instead of me flying down my dirt road and driveway to try to look! LOL...

Nate WY0X

p.s. For those reading along, almost 100% chance this is unrelated to the above IAX misconfig and unrelated to pretty much everything, carry on! Disregard all after "good morning", over!

I couldn't tell you about a Pi; I moved away from Pi setups. My cloud hub has the same issue, and it has no DVSwitch server. I just checked that it was up to date. I can put it back on the server. I would have to reload a Pi shortly. I will put the -R on the Pi. Let's give it a try, but it will be Monday before I know.


Interesting. Mine hasn't crashed using DVSwitch. Anyway, I will watch for your logs to see what you get.

Well, or a Dell 3040 or a VPS without DVSwitch, Supermon, or anything else non-standard installed. Just want to see if you're having that problem due to something with Echolink or something external is triggering it. It could be an app_rpt problem, but we need to narrow down where it's coming from.

It's moved to the VPS. No DVSwitch or bridges. We have one of the big nets at 5. It's a holiday weekend, so I don't expect it to be big enough. We will see shortly. I just connected to test that it's working.

Here it is from the VPS.
ASL3 503600.pdf (228.9 KB)

I did a fresh build: Debian 12, ASL3, and EchoLink, with no dashboards, on a 3040. It auto-connects via rpt to the main hub. No scripts or crontab. I can't strip it down much more. It could take the main hub down when it does it. I'll capture its logs Monday too.

Asterisk will also go unresponsive in cases of severe congestion; it will appear to "crash" but will generate no core file. Eventually the system will respond once the network traffic subsides. This log resembles what I have seen in that case.
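You can tell a stall from a real crash by checking how long the Asterisk process has been running. If it only went unresponsive during congestion, the same process is still running and its elapsed time predates the outage. If it crashed and the service was restarted, the elapsed time will be short.

This is a sketch. It assumes the process is named `asterisk` and that procps `ps` is installed, since its `etimes` field gives elapsed seconds. `proc_uptime_secs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: seconds since the given PID started.
proc_uptime_secs() {
  local pid="$1"
  # -o etimes= prints elapsed seconds with no header; tr strips padding spaces.
  ps -o etimes= -p "$pid" | tr -d ' '
}

# Usage on a node (assumes the process name is "asterisk").
if pid="$(pidof -s asterisk 2>/dev/null)"; then
  echo "asterisk PID $pid has been up $(proc_uptime_secs "$pid") seconds"
else
  echo "asterisk is not running"
fi
```

This asks the kernel rather than Asterisk on purpose. An Asterisk that is stalled may not answer `asterisk -rx "core show uptime"` at all.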

We have tried separating the nodes to take the load off, and this has given no relief. The nodes do fine until someone connects via EchoLink. I had already looked at the same issue and was thinking the same way.

I'm not challenging your input, just asking: what's the fix? It seems that no matter where EchoLink goes, it causes the issue. We have separate nodes. We moved EchoLink. From what I've observed, the issue follows EchoLink. My only answer is to shut down EchoLink during the nets.

Dave N3DMC

The fix is to rearchitect the system or fix bottlenecks if you have a bandwidth problem. I'm not saying this log is identical to what I saw, I'm saying it resembles my log. What your log shows for sure is asterisk dumping all its connections without necessarily restarting.

The salient point is that your crash may not be a crash. To determine this, check for core files, in this case /var/lib/asterisk/core.
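A core file only means something if it was written during the outage you're looking at; an old one may be left over from an earlier crash. The sketch below compares the core file's timestamp with the time the problem started. It assumes the path given above (`/var/lib/asterisk/core`) and GNU coreutils (`stat -c`, `date -d`), as on Debian 12. `core_since` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: was a core file written after a given time?
# Assumes the core path named in this thread and GNU stat/date.
core_since() {
  local since="$1" core="${2:-/var/lib/asterisk/core}"
  if [ ! -f "$core" ]; then
    # Missing core: a stall rather than a crash, or core dumps not enabled.
    echo "no core file at $core"
    return 1
  fi
  local core_ts since_ts
  core_ts="$(stat -c %Y "$core")"     # core file mtime, seconds since epoch
  since_ts="$(date -d "$since" +%s)"  # outage start, seconds since epoch
  if [ "$core_ts" -ge "$since_ts" ]; then
    echo "core written $(date -d "@$core_ts"): points to a real crash"
  else
    echo "core file predates $since: left over from an earlier crash"
    return 1
  fi
}

# Example: the net started at 5 PM today.
# core_since "17:00"
```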

So I have one node that didn't crash during the net today and two that did. I'm waiting to hear from Steve NU5D which node it's on. I have access to pull logs, but we have a lot of nodes here. N3DMC is the very basic, stripped-down node I built last night. W5ZDN hasn't been updated and never crashed. I grabbed what might help. I do run an SSH key script that auto-updates keys from my GitHub. I can give you temporary access if you have a public key to share. The stripped node doesn't have the script, just password access. I don't plan on keeping it online, but you can certainly look around.

N3DMC-L Node.txt (262.5 KB)
W5ZDN-R Node.txt (7.6 KB)

I did notice audio issues on W5ZDN-R.

You can just type ls -l /var/lib/asterisk/core from the command prompt, and if the file's there, it will have a timestamp from the approximate time of the crash.

N3DMC:
-rw------- 1 asterisk asterisk 101670912 Apr 20 17:48 /var/lib/asterisk/core

W5ZDN:
ls: cannot access '/var/lib/asterisk/core': No such file or directory
W5ZDN is working, with issues.

Hmm. Let's find out what you're running... can you run asterisk -V on both machines and let us know? Also, if you can run the following two commands on the N3DMC node and attach (please don't paste) the output files from /tmp to your response, it would be appreciated:

sudo apt install asl3-asterisk-dbgsym asl3-asterisk-modules-dbgsym
sudo /var/lib/asterisk/scripts/ast_coredumper /var/lib/asterisk/core
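The two commands above can be wrapped so that ast_coredumper only runs when there is actually a core file. The wrapper then lists the reports to attach.

This is a sketch. It assumes the script path and /tmp output location described above, and that it runs as root or as a user who can stat files in /var/lib/asterisk. `collect_core_reports` is a hypothetical name. To avoid guessing the report file names, it lists every file in /tmp newer than the core file.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the two commands above (ASL3 paths from this thread).
collect_core_reports() {
  local core="${1:-/var/lib/asterisk/core}"
  if [ ! -f "$core" ]; then
    echo "no core file at $core; nothing to analyze" >&2
    return 1
  fi
  # Debug symbols let the backtraces show function names instead of raw addresses.
  sudo apt install -y asl3-asterisk-dbgsym asl3-asterisk-modules-dbgsym || return 1
  sudo /var/lib/asterisk/scripts/ast_coredumper "$core" || return 1
  # Files in /tmp written after the core file: the reports to attach.
  find /tmp -maxdepth 1 -type f -newer "$core" -print
}
```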