Tracing a segfault

Amid working on 12 other things, I spun up an x86 ASL install to play with while I learn more about Quantars.

All seems OK, but I saw this frequently recurring segfault.
This is a Lenovo M93 mini running an i5 with 16 GB of RAM and a 120 GB SSD.
I do hear a bit of gurgle in the audio here and there, and I had been attributing it to network
issues on the far end,
but then I saw the system log getting bigger and bigger.
The segfaults appear random, but they do seem to run in clusters.
I am just not smart enough to trace this.

Aug 12 10:32:13 x86repeater kernel: asterisk[5690]: segfault at c8 ip 00007fe28d4dc118 sp 00007fe276d8b2a0 error 4 in app_rpt.so[7fe28d4cf000+39000]
Aug 12 10:32:13 x86repeater kernel: Code: 3e 3d ff ff 48 8d 54 24 18 be 02 00 20 00 48 89 df 31 c0 e8 fa 39 ff ff 48 89 df e8 b2 31 ff ff e8 0d 31 ff ff 48 8b 44 24 18 <81> 38 c8 00 00 00 74 40 4c 8d 05 a9 14 03 00 48 8d 0d 4a 48 03 00
Aug 12 10:32:44 x86repeater kernel: asterisk[5706]: segfault at c8 ip 00007fe28d4dc118 sp 00007fe276d8b2a0 error 4 in app_rpt.so[7fe28d4cf000+39000]
Aug 12 10:32:44 x86repeater kernel: Code: 3e 3d ff ff 48 8d 54 24 18 be 02 00 20 00 48 89 df 31 c0 e8 fa 39 ff ff 48 89 df e8 b2 31 ff ff e8 0d 31 ff ff 48 8b 44 24 18 <81> 38 c8 00 00 00 74 40 4c 8d 05 a9 14 03 00 48 8d 0d 4a 48 03 00
Aug 12 10:32:46 x86repeater kernel: asterisk[5708]: segfault at c8 ip 00007fe28d4dc118 sp 00007fe276d8b2a0 error 4 in app_rpt.so[7fe28d4cf000+39000]
Aug 12 10:32:46 x86repeater kernel: Code: 3e 3d ff ff 48 8d 54 24 18 be 02 00 20 00 48 89 df 31 c0 e8 fa 39 ff ff 48 89 df e8 b2 31 ff ff e8 0d 31 ff ff 48 8b 44 24 18 <81> 38 c8 00 00 00 74 40 4c 8d 05 a9 14 03 00 48 8d 0d 4a 48 03 00
Aug 12 10:33:17 x86repeater kernel: asterisk[5710]: segfault at c8 ip 00007fe28d4dc118 sp 00007fe276d8b2a0 error 4 in app_rpt.so[7fe28d4cf000+39000]
Aug 12 10:33:17 x86repeater kernel: Code: 3e 3d ff ff 48 8d 54 24 18 be 02 00 20 00 48 89 df 31 c0 e8 fa 39 ff ff 48 89 df e8 b2 31 ff ff e8 0d 31 ff ff 48 8b 44 24 18 <81> 38 c8 00 00 00 74 40 4c 8d 05 a9 14 03 00 48 8d 0d 4a 48 03 00
Aug 12 10:33:19 x86repeater kernel: asterisk[5712]: segfault at c8 ip 00007fe28d4dc118 sp 00007fe276d8b2a0 error 4 in app_rpt.so[7fe28d4cf000+39000]
Aug 12 10:33:19 x86repeater kernel: Code: 3e 3d ff ff 48 8d 54 24 18 be 02 00 20 00 48 89 df 31 c0 e8 fa 39 ff ff 48 89 df e8 b2 31 ff ff e8 0d 31 ff ff 48 8b 44 24 18 <81> 38 c8 00 00 00 74 40 4c 8d 05 a9 14 03 00 48 8d 0d 4a 48 03 00
Aug 12 10:33:50 x86repeater kernel: asterisk[5728]: segfault at c8 ip 00007fe28d4dc118 sp 00007fe276d8b2a0 error 4 in app_rpt.so[7fe28d4cf000+39000]
Aug 12 10:33:50 x86repeater kernel: Code: 3e 3d ff ff 48 8d 54 24 18 be 02 00 20 00 48 89 df 31 c0 e8 fa 39 ff ff 48 89 df e8 b2 31 ff ff e8 0d 31 ff ff 48 8b 44 24 18 <81> 38 c8 00 00 00 74 40 4c 8d 05 a9 14 03 00 48 8d 0d 4a 48 03 00

I guess I could do a fresh install,
but has anybody seen this? Or does anyone have a clue how to trace it?
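
One way to start tracing this is to pull the useful numbers out of the kernel line itself. The sketch below is only an illustration (the function name `decode_segfault` is made up for this post). It subtracts the mapping base from the instruction pointer to get an offset inside app_rpt.so, and decodes the x86 page-fault error bits (bit 0 = protection violation vs. page not present, bit 1 = write vs. read, bit 2 = user vs. kernel mode). It assumes the log format shown above; newer kernels may append extra fields.

```python
import re

# Matches lines such as:
#   asterisk[5690]: segfault at c8 ip 00007fe2... sp 00007fe2... error 4 in app_rpt.so[7fe28d4cf000+39000]
SEGFAULT_RE = re.compile(
    r"(?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<module>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return a dict describing a kernel segfault log line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)          # the kernel prints the error code in hex
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    return {
        "program": m["prog"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        "module": m["module"],
        # Offset of the faulting instruction within the mapped region of the module.
        "offset": ip - base,
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
        "cause": "protection violation" if err & 1 else "page not present",
    }
```

For the lines above this reports a user-mode read of an unmapped page at address 0xc8, from offset 0xd118 into the app_rpt.so mapping. A fault address that small usually points at a null or near-null pointer being dereferenced. If app_rpt.so was built with debug symbols, something like `addr2line -e app_rpt.so 0xd118` may map the offset to a source line, although the offset can need adjusting depending on how the library is mapped.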

de k9wkj

ohh
this is running the full DSP channel driver

I see this was noted before, early last year,

and someone was poking at the code.
It looked like there was a commit, or at least a request.
Is this not rolled up in the current distribution?
Can I just scab the new app_rpt.so in, or will it want a recompile?
ugggg

de k9wkj

I did as Tim suggested last year.
The temporary workaround is to comment out each occurrence of this line in rpt.conf:
statpost_url=http://stats.allstarlink.org/uhandler.php ; Status updates
That seems to have stopped the near-flood of segfault reports.
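
For anyone with several node stanzas, the workaround can be sketched as a small script. This is only an illustration: the function name `comment_out_statpost` is made up, and it relies on `;` being the comment character in Asterisk config files. It works on the file contents as a string, so reading and writing rpt.conf (and keeping a backup) is left to you.

```python
def comment_out_statpost(conf_text):
    """Return conf_text with every active statpost_url line commented out with ';'."""
    out = []
    for line in conf_text.splitlines():
        # Lines that are already commented start with ';' and are left alone.
        if line.lstrip().startswith("statpost_url"):
            line = ";" + line
        out.append(line)
    return "\n".join(out) + "\n"
```

Asterisk will most likely need a restart or reload before the change takes effect.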

will wait to see if this gets any more love

btw this is not affecting the RPi3 install of 2.0B6 that we have in the field

de k9wkj

Bump, same issue. VM on Proxmox; tried Debian 9 and 10, and tried various CPU and network settings.

Code: 3e 3d ff ff 48 8d 54 24 18 be 02 00 20 00 48 89 df 31 c0 e8 fa 39 ff ff 48 89 df e8 b2 31 ff ff e8 0d 31 ff ff 48 8b 44 24 18 <81> 38 c8 00 00 00 74 40 4c 8d 05 a9 14 03 00 48 8d 0d 4a 48 03 00

Side note: it adds [ AstP: 4572 ; bindport and bindaddr may be specified ]
and will not poll the weather.
It did work on one of the installs, but only for a few seconds, roughly once, and then it quits and gives the same error.

I am seeing 2 issues:

  1. My node sometimes loses network connectivity after running fine for > 1 day.
  2. Similar segfault messages as described above.

I see there’s a PR that should fix the segfault issue, and there are a couple other open PRs that look helpful also …

Node is running ASL 2.0.0-beta.6. PC is a Dell 3040 Intel Atom thin client with 2 GB RAM and 16 GB eMMC flash. The node works fine in general; however, I have noticed several times now that it lost IP connectivity. After it had run for a full day with no issues, in the morning I would not be able to connect to it by HTTP or SSH.

The first couple of times I just assumed it was a ‘glitch’ on my home network, restarted the node, and then everything was fine. Today I plugged in a monitor and keyboard to see what was going on. If I tried to do anything relating to networking, such as pinging the router, it would give a “Network is unreachable” error. I looked in the syslog and could not find any obvious cause. If anyone has any ideas on how to further debug that, please LMK.
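
One thing worth checking when this happens: “Network is unreachable” generally means the kernel has no route to the destination, so a quick test is whether a default route still exists. The sketch below parses the text of /proc/net/route, where addresses are little-endian hex. The function name `default_routes` is made up for illustration, and this is only a starting point for narrowing things down, not a diagnosis.

```python
import socket
import struct


def default_routes(route_table_text):
    """Return (interface, gateway_ip) pairs for default routes found in /proc/net/route text."""
    routes = []
    lines = route_table_text.strip().splitlines()
    for row in lines[1:]:  # skip the header row
        fields = row.split()
        if len(fields) < 8:
            continue
        iface, dest, gateway, mask = fields[0], fields[1], fields[2], fields[7]
        # A default route has destination 0.0.0.0 and netmask 0.0.0.0.
        if dest == "00000000" and mask == "00000000":
            # The gateway is stored as little-endian hex, e.g. 0101A8C0 is 192.168.1.1.
            gw_ip = socket.inet_ntoa(struct.pack("<I", int(gateway, 16)))
            routes.append((iface, gw_ip))
    return routes
```

On the node itself it could be called as `default_routes(open("/proc/net/route").read())`. An empty list while the node is in the failed state would suggest the interface lost its DHCP-assigned configuration rather than a problem elsewhere on the LAN.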

The segfault messages do not seem to cause any issue, and I don’t think they are related to the loss of networking, but I thought I would mention them.

My bigger concern is why Debian would be losing network connectivity after ~1 day. My home network is standard DHCP. I have many other devices on the network, such as VoIP phones and a NAS drive, that never lose connectivity. I’ve never had to set up any special network configuration for anything. I have a router at 192.168.1.1 in standard DHCP configuration.

p.s. - To get the latest code and some PRs that look helpful I did the following on this node:

git clone git@github.com:AllStarLink/ASL-Asterisk.git ASL-all-PRs
cd ASL-all-PRs/
git pull origin pull/34/head
git pull origin pull/41/head
git pull origin pull/42/head

Then I compiled and installed. Likely none of this has anything to do with the node losing its IP address after ~1 day, but at least I’ll be running the latest stuff.

UPDATE: I resolved the DHCP issue. See this bug report for details: “Invalid command in /etc/network/if-up.d/firewall causes networking service to exit” (Issue #40, AllStarLink/ASL-Live-Build on GitHub).