RTCM will not stay connected to ASL3 box

I am continuing to troubleshoot the archivedir problem, whereby the audio stream is being sampled too slowly, so I set up a new Debian 12 box and installed ASL3 from source so I can do some code debugging. I followed this procedure exactly to install and compile the source code: Source-Based Installation - AllStarLink Manual.

I overwrote four of the /etc/asterisk/xxxx.conf files with those from my production ASL3 box and pointed the RTCM to the new ASL3 box. The RTCM attaches right away but then the connection disconnects, reconnects, disconnects, etc. It is “flapping”. If I point the RTCM back to the production ASL3 box it works just fine; no flapping.

Does anyone know what might be causing this? I had the identical problem once when I tried to use an older kernel with a newer ASL3 build. I’m pretty certain the problem has something to do with the code I compiled on the new ASL3 box, but I’m not sure how to troubleshoot it. I went into the Asterisk CLI and turned on rpt debugging, but I’m not finding anything useful.

Here is what is showing on the Asterisk CLI console screen:

Host Connection established (Pri) (207.154.x.y)
ERROR! Host response timeout
Host Connection Lost (Pri) (207.154.x.y)
Host Connection established (Pri) (207.154.x.y)
ERROR! Host response timeout
Host Connection Lost (Pri) (207.154.x.y)
.
.
.

I’ve had this happen a couple of times, but on ASL 2. I did a full reboot of the server and that RTCM stayed connected just fine. I’ve heard another way to fix it is to reboot the host and then reboot the RTCMs one by one.

Since this setup is for debugging I’ve kept it as simple as possible. It is just one RTCM. I’ve rebooted both the new ASL3 box as well as the RTCM to no avail.

Did you change the port to 1667 as specified in the manual?

Yes, I did. Thank you.

Well, that is very interesting. I just launched two servers this last weekend on ASL3, both running multiple RTCMs, with no problems whatsoever. I’m assuming the host and client are on the same LAN?

I’ve seen it before when doing a firmware update on networking equipment, changing UDP ports, changing IP addresses, changing ASL hosts, etc. There is weirdness in the VOTER implementation and/or how modern networking equipment handles it. The issue you are describing is not new in ASL3.

Reboot your network router(s), firewall(s), and switch(es), your ASL host, and finally your VOTERs/RTCMs.

Host and client are not on the same LAN. That would be ideal, but all my RTCMs are deployed in remote locations. The RTCM had been connected with no errors to my ASL3 where ASL3 was installed via the package system. I then wiped that machine and re-loaded Debian 12 and compiled ASL3 from source code and then this problem started. Everything else in the network is identical; only the way ASL3 was installed on the computer was changed.

I tried rebooting everything other than the switch where the repeater is since it is at 4000’ and takes a long time to get there and conditions are non-ideal right now due to rain last week. So, I don’t want to reboot any networking gear on the hill at the moment (and the switch is the only thing in between other than a microwave radio). I did, however, reboot the ASL3 box, my local firewall/router/switch and the RTCM on the hill but the problem did not go away.

I’m guessing the problem has to do with having built ASL3 from source. I’ll try to dig up a spare machine at work tomorrow and load ASL3 on it from package system so I have two boxes sitting next to each other (one built from package and one from source) and can switch the ethernet cable between the two to see what happens. If the problem only happens when the RTCM is attached to the ASL3 box built from source and stops when I plug it into the one built from a package, I’ll assume I’ve got a software issue on the ASL3 box (which is my gut feeling at the moment).

Still curious if there is some debugging I could do to “see” what is happening.

I have VOTERs on LAN and on WAN as well. Every time I update the firmware on the Unifi Gateway in front of the ASL server, I lose my connections and no amount of rebooting or rebuilding on the server side of the network will fix it. I have to go and physically reboot the microwave PtP radios for the LAN sites and the Internet modems for the WAN sites. It makes no sense; there is some kind of sessionization happening with the VOTER protocol that is undocumented. I do not understand it, I just live with it.

Unfortunately the debugging info in the VOTER system is not very good. There are some useful messages printed to the console of the VOTER/RTCM client, but they are not saved or recorded anywhere, so the console must be actively opened via an RS232 or Telnet connection to capture them.
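Since the client console output is not saved anywhere, one option is to pipe whatever you capture from it through a small script that timestamps each line and counts the flap events. This is only a rough sketch: the three message strings are taken from the console output quoted earlier in this thread, the function names and log file argument are made up for illustration, and getting the console text onto stdin (serial or Telnet) is left to whatever terminal tool you already use.

```python
import sys
from datetime import datetime, timezone
from typing import Dict, Iterable, Optional, TextIO


def stamp(line: str, now: Optional[datetime] = None) -> str:
    """Prefix a console line with a UTC timestamp (millisecond resolution)."""
    now = now or datetime.now(timezone.utc)
    return f"{now.isoformat(timespec='milliseconds')} {line}"


def classify(line: str) -> Optional[str]:
    """Map a console line to an event name, or None if it is something else.

    The substrings below are copied from the output quoted in this thread;
    other firmware versions may word them differently (assumption).
    """
    if "Host Connection established" in line:
        return "established"
    if "Host Connection Lost" in line:
        return "lost"
    if "Host response timeout" in line:
        return "timeout"
    return None


def process(lines: Iterable[str], out: TextIO) -> Dict[str, int]:
    """Write timestamped copies of non-empty lines to `out`; return event counts."""
    counts = {"established": 0, "lost": 0, "timeout": 0}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        kind = classify(line)
        if kind is not None:
            counts[kind] += 1
        out.write(stamp(line) + "\n")
        out.flush()  # keep the log current even if the session dies
    return counts


# Hypothetical usage: <console capture> | python3 rtcm_log.py rtcm-console.log
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "a", encoding="utf-8") as log:
        totals = process(sys.stdin, log)
    print(totals, file=sys.stderr)
```

With timestamps in the log you can at least see how long each connection lasts before the "Host response timeout" fires, and whether that interval is constant.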

The server side debugging in chan_voter is even worse. Master and slave client disconnections are logged to the Asterisk logger, but that’s it. It does not show a reason for disconnection, there is no info about out of bounds or out of sequence packets, etc. The best thing you can do is turn up the logging level until you see every raw packet coming in from all of your VOTER clients, but that information is impossible to read or make sense of. There are several improvements needed in the source code by someone who knows C and knows what they are doing.
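As an alternative to reading raw packet dumps in the Asterisk log, you could watch the UDP traffic from outside Asterisk and only look at the timing. The sketch below assumes you feed it the text output of `tcpdump -tt -n -l udp port 1667` (epoch timestamps, no name resolution, line-buffered) and that the lines look like the usual IPv4 form `<epoch> IP <src>.<port> > <dst>.<port>: UDP, length N`. The 2 second threshold and the function name are arbitrary choices for illustration, not anything from chan_voter.

```python
import re
import sys
from typing import Dict, Iterable, List, Tuple

# Matches e.g. "1700000000.123456 IP 10.0.0.5.1667 > 10.0.0.1.1667: UDP, length 40"
LINE_RE = re.compile(r"^(\d+\.\d+) IP (\S+) > (\S+?): ")


def find_gaps(lines: Iterable[str], threshold: float = 2.0) -> List[Tuple[str, float, float]]:
    """Return (source, time_of_previous_packet, gap_seconds) for every pause
    longer than `threshold` between consecutive packets from the same source."""
    last_seen: Dict[str, float] = {}
    gaps: List[Tuple[str, float, float]] = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a packet line in the assumed format
        ts = float(match.group(1))
        src = match.group(2)  # "address.port" exactly as tcpdump prints it
        prev = last_seen.get(src)
        if prev is not None and ts - prev > threshold:
            gaps.append((src, prev, ts - prev))
        last_seen[src] = ts
    return gaps


# Hypothetical usage:
#   tcpdump -tt -n -l udp port 1667 | python3 voter_gaps.py 2.0
# (the report is printed once tcpdump is stopped and stdin closes)
if __name__ == "__main__" and len(sys.argv) == 2:
    for src, prev, gap in find_gaps(sys.stdin, float(sys.argv[1])):
        print(f"{src}: silent for {gap:.3f}s after {prev:.6f}")
```

This will not tell you why the host stops answering, but it does show which direction goes quiet first (RTCM to host or host to RTCM), which narrows down whether the source-built box or the network path is dropping the traffic.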