Permanent/persistent connections, an example

Hi Ken,

I have 4 private nodes in my network. To simplify link management, I added a hub node. The hub node brings up permanent connections to all the radio-interface nodes that should stay linked. My rpt.conf has the following link control commands defined:

811 = ilink,11 ; Disconnect a previously permanently connected link
812 = ilink,12 ; Permanently connect specified link – monitor only
813 = ilink,13 ; Permanently connect specified link – transceive

My hub node (1300) has the following startup macro to connect to 1315, 1310, 1301, and 1308:

startup_macro=*8131315 *8131310 *8131301 *8131308
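For anyone maintaining a longer node list, the macro line can be generated rather than typed by hand. This is only a rough sketch: the helper name `build_startup_macro` is made up for illustration, and the `*813` prefix assumes the permanent transceive command is mapped to 813 as in the functions above.

```python
def build_startup_macro(nodes, prefix="*813", separator=" "):
    """Return a startup_macro line with one connect command per node.

    prefix    -- DTMF command placed in front of each node number
                 (assumed here to be *813, permanent transceive)
    separator -- text placed between commands (a single space by default)
    """
    commands = [f"{prefix}{node}" for node in nodes]
    return "startup_macro=" + separator.join(commands)


# Hub node 1300 connecting to the four nodes from the post.
macro_line = build_startup_macro([1315, 1310, 1301, 1308])
print(macro_line)
```

The printed line can be pasted into the hub node's stanza in rpt.conf. The separator is a parameter so the commands can also be written back to back.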

We use private microwave for our connections, so uptime is much better than the Internet normally offers. We’ve not seen issues with connect times up to about 2 weeks. Past the 2-week mark, we’ve seen some odd audio path behavior that will need troubleshooting someday. In the meantime, rebooting all the nodes every week keeps things stable.

Let me know if you have questions.

73,

Dave
WA1JHK

I have a similar setup using a hub node to bridge my other nodes. I’m not using ilink,13 because all the nodes are on the same server as the 2530 hub. I also don’t find the spaces in the startup_macro necessary.

startup_macro = *32501*32503*32521*32523*32525*32526

I see a similar issue after around 400 hours of connect time. Most nodes never see connect times this long (without retries) for a number of reasons, including poor IP links. However, in cases like yours, where the IP link is stable, various issues may arise. In my case the long connect times happen between nodes on the same server. I’ve resorted to rebooting Asterisk on the 1st and 15th of each month.
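A restart on the 1st and 15th could be driven by a small script run once a day from a scheduler. The following is only a sketch under assumptions: the function name is hypothetical, and the CLI command string is a guess that depends on the Asterisk version installed, so check it against your own system before using it.

```python
import datetime

# Days of the month on which a restart is wanted (from the post: 1st and 15th).
RESTART_DAYS = (1, 15)

# ASSUMPTION: the exact restart command differs between Asterisk versions.
RESTART_COMMAND = ["asterisk", "-rx", "core restart now"]


def restart_command_for(today, restart_days=RESTART_DAYS):
    """Return the restart command if today is a scheduled day, else None."""
    if today.day in restart_days:
        return list(RESTART_COMMAND)
    return None


# The caller decides what to do with the result, for example passing it
# to subprocess.run(). Nothing is executed here.
command = restart_command_for(datetime.date.today())
print(command if command else "no restart scheduled today")
```

Keeping the decision separate from the execution makes it easy to dry-run the schedule before letting it touch a live node.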

The failure varies, I think because different parts of memory get trashed. I’ve seen problems including the connect time going negative, bad audio, and Asterisk crashes.

I’ve reported this issue to Jim Dixon and others and submitted it on GitHub. So far no one can find the bug. It’s a hard bug to duplicate given the long time between failures and the fact that it’s intermittent. Sometimes no problems are detected after many weeks.

I have noticed this as well. I agree that something must be going on in memory.
At times I cannot disconnect a permanent connection at all, among other symptoms.

By chance, I found that an Asterisk ‘reload’, rather than a restart, every 2 weeks is enough.
Node 29999 has currently been running for 1100 hrs without error.
But my first big find was hacks that were slipping under the radar, so to speak.
And that probably would not affect anyone who has not enabled SIP & Echolink.

My radio node server is now on solar and ‘clean power’ has made all the difference there because many memory issues are created by ‘unclean’ power. But I do a lot of telephony with it so it gets restarted a bunch at times when I start editing my call blocking/forwarding stuff. Might move that stuff to a VPS.

Lots of nodes run one or both of those. Anything you want to share about those hacks?

@wd6awp I have been trying to make enough sense of it to take notes and work out solutions.

I will send you a separate PM and you can test what I have done thus far.
Many things I do not want to post in public.

I just had a loop on 29999 from an attempted IAXrpt connect:
[May 26 12:46:25] WARNING[575]: chan_iax2.c:3101 __attempt_transmit: Max retries exceeded to host ###### on IAX2/iaxrpt-9794 (type = 6, subclass = 11, ts=449991, seqno=99)
I did try to transmit before it was fully connected (I was not paying attention to the slowness).
But the loop prevented any control.
So… there are strange things going on in memory with time.
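To spot this kind of retry loop without watching the console, the warning quoted above could be picked out of the Asterisk log with a regular expression. This is a minimal sketch: the pattern is built from that one sample line only, so other Asterisk versions or log settings may format the message differently, and `parse_retry_warning` is a made-up helper name.

```python
import re

# Pattern derived from the single sample warning in this thread.
RETRY_PATTERN = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+WARNING\[\d+\]:\s+chan_iax2\.c:\d+\s+"
    r"__attempt_transmit:\s+Max retries exceeded to host (?P<host>\S+) "
    r"on (?P<channel>\S+)"
)


def parse_retry_warning(line):
    """Return timestamp, host and channel for a retry warning, else None."""
    match = RETRY_PATTERN.match(line)
    if match is None:
        return None
    return match.groupdict()


sample = (
    "[May 26 12:46:25] WARNING[575]: chan_iax2.c:3101 __attempt_transmit: "
    "Max retries exceeded to host ###### on IAX2/iaxrpt-9794 "
    "(type = 6, subclass = 11, ts=449991, seqno=99)"
)
print(parse_retry_warning(sample))
```

Counting how often the same channel shows up within a short window would be one way to tell a single failed connect from a loop.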