I tried the dev version of ASL3 for a few weeks. I noticed that quite frequently many of the clients connected to our hub node would simultaneously drop out of the connected list on allmon3 and then reconnect (those with 'permanent' connections, anyway). Not all of them dropped at once, though; it seems they would cycle in batches.
This would happen at seemingly random times, sometimes several times in a row a few minutes apart, sometimes 3 hours apart.
I was not able to correlate this with anything in particular. I did not see any suspicious messages in the asterisk cli or in the asterisk log file.
Our hub node typically hosts 20-29 clients, and runs on a pi3.
I had to revert to version 3.3.0 since this was too disruptive. So far, with 15 hours of uptime since going back to 3.3.0, I have not seen the problem again.
Probably not the most helpful report, since I won't be able to test 3.4.3+ on this particular hub at this time. I just want to put this out there if anyone else has seen this on similar client count and hardware.
Clearly, we want ASL3 nodes to be rock solid. Any information you can collect and report back to us will be helpful.
Things to check for would be crash reports (e.g. does systemctl status asterisk show a different PID, and/or do you have a /var/lib/asterisk/core file?).
Things that would be good to know include whether you are using an arm64 (e.g. Raspberry Pi) or amd64 (Intel, AMD) system, and whether the node is SimpleUSB, USBRadio, "hub", ... Do you have any insight into what type(s) of connections are possibly triggering the failures (WT connections, DVSM/IAX, EchoLink, ...)? What extras do you have running on this system?
Also, if you have logs, configurations, or crash reports to share then it would be best to create a GitHub issue (app_rpt) and "attach" the files.
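For anyone who wants to script the crash checks mentioned above, here is a rough Python sketch. It assumes the systemd unit is named asterisk and that core dumps, if any, land in /var/lib/asterisk as a file named core or core.<something>. The core.* pattern and both function names are assumptions made for illustration, not something ASL3 provides.

```python
import subprocess
from pathlib import Path


def find_core_files(directory="/var/lib/asterisk"):
    """Return files named 'core' or 'core.*' in the directory (naming is an assumption)."""
    base = Path(directory)
    if not base.is_dir():
        return []
    return sorted(
        p for p in base.iterdir()
        if p.is_file() and (p.name == "core" or p.name.startswith("core."))
    )


def asterisk_main_pid(unit="asterisk"):
    """Ask systemd for the unit's main PID; return None if it cannot be determined."""
    try:
        out = subprocess.run(
            ["systemctl", "show", "--property=MainPID", "--value", unit],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None
    # systemd reports 0 when the unit has no running main process
    return int(out) if out.isdigit() and out != "0" else None
```

Calling asterisk_main_pid() periodically (from cron, say) and noting when the value changes would show whether asterisk restarted around the time of a dropout, and find_core_files() would show whether a core file was left behind.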
There were no core files to go along with this. I did reboot last night after the downgrade, so my logs are gone. This node is a 'hub' node.
One thing that seemed odd was that one node never seemed to disconnect or reset its connection. I know for a fact that node is running ASL3 3.3.0.
One other node on the local LAN, which acts as a client for our USB dongle, also never disconnected or reset its connection. That node is running on a separate pi3, and is running ASL3 3.4.3.
I would suspect that most of the nodes that connect to this hub node are still on hamvoip.
I also noticed that ASL3 3.4.3 on this hub node, per the output of 'top', used about 10-25 percent more cpu when in keyup than ASL3 3.3.0 on the pi3.
I have used up all of my capital trying out 3.4.3, so I will probably have to give this particular node some time on 3.3.0.
Thank you for your reply, I will refer back in the future to provide more useful info. I am going to let 3.3.0 go for a few days and see if anything similar happens.
node 452381 is the hub running latest dev of ASL3, on a pi3.
node 452384 is connected to the hub and has a simpleusb hotspot running dev of ASL3, on a pi3.
node 2495 is also connected to provide some traffic, and that node is running ASL3 3.3.0, on a pi3.
24 other nodes running hamvoip are connected to the hub, set up as private nodes in the range 1975-1998, on pi3s.
I saw a bunch of the nodes disconnect/reconnect in sequence. It looked like what I had observed previously.
I have attached the printable output from the hub node 452381 from putty, from ASL3 asterisk console set to debug=5.
The last part of the putty output is a dump of the /var/log/asterisk file.
The disconnect/reconnect sequences started at about 12:36 PM in the asterisk console output. There was TX activity being generated by connected node 2495 at that time as well.
I also looked at allmon3 on the connected node 2495 mentioned above. That node's allmon showed my hub node 452381 as 'LAST RECV' at 12:36:30, even though neither my hub node nor any of the test nodes connected to it was ever purposely put into TX.
I have started another console dump and tcpdump from my hub node 452381 and one of the test nodes, and I am waiting to see if this happens again.
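To help line up the disconnect/reconnect batches with the tcpdump capture, a small Python sketch like the one below could group event times into bursts. It does not parse the asterisk log (no log format is assumed here); it just takes a list of datetime values that you have pulled out by hand or with your own script. The function name and the 30-second gap are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta


def group_into_bursts(event_times, max_gap_seconds=30):
    """Group timestamps so that consecutive events no more than
    max_gap_seconds apart end up in the same burst."""
    bursts = []
    for t in sorted(event_times):
        if bursts and (t - bursts[-1][-1]) <= timedelta(seconds=max_gap_seconds):
            bursts[-1].append(t)   # close enough to the previous event
        else:
            bursts.append([t])     # start a new burst
    return bursts
```

Each burst's first and last timestamp then gives a window to search for in the tcpdump output and the debug console log.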