I’m using SayAlpha with custom functions, but that audio plays back over connected nodes (such as dvlink). Is it possible to have those play locally only?
I know that I can do that with static audio files, but I would really like to retain the use of SayAlpha, for example to speak an IP address or confirm a DMR TG.
SayAlpha is a function of Asterisk.
Asterisk doesn’t really distinguish between connection types, since they are all conference call extensions.
While there may be a way to send it to a particular extension (node), I’m not sure how that works within Asterisk 20+.
Understanding this, Google may be your friend.
But in any case, I would turn off Allison and any other telemetry on DVSwitch nodes.
asl3-tts is very slow; I didn’t want every response to take a minimum of 7 seconds to generate.
To get around this, I started by creating a little tool that is a wrapper over asl3-tts and caches results, prunes the cache, etc. It works well if responses are fairly static.
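The core of it is just a content-addressed cache in front of the synthesizer. Here’s a minimal sketch of that layer (the asl3-tts flags and cache path are illustrative, not the tool’s actual interface, so check `asl3-tts --help` on your system):

```python
#!/usr/bin/env python3
"""Minimal sketch of a caching wrapper over asl3-tts."""
import hashlib
import subprocess
import time
from pathlib import Path

CACHE_DIR = Path("/var/cache/tts-wrapper")  # hypothetical location
MAX_AGE_S = 7 * 24 * 3600                   # prune entries older than a week

def speak(text: str) -> Path:
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(text.encode()).hexdigest()
    wav = CACHE_DIR / f"{key}.wav"
    if not wav.exists():
        # Cache miss: pay the full Piper synthesis cost once.
        # NOTE: assumed flags; the real asl3-tts interface may differ.
        subprocess.run(["asl3-tts", "-t", text, "-f", str(wav)], check=True)
    else:
        # Cache hit: refresh mtime so popular phrases survive pruning.
        wav.touch()
    return wav

def prune() -> None:
    cutoff = time.time() - MAX_AGE_S
    for f in CACHE_DIR.glob("*.wav"):
        if f.stat().st_mtime < cutoff:
            f.unlink()
```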
I then had an idea to combine the existing sound files together to form more complex responses on the fly. That ended up going a bit overboard as I added on-demand TTS, phrase matching, phonetics, etc. Now I can generate responses on the fly in <500 ms as long as words/phrases can be reused.
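The concatenation itself is simpler than it sounds when every clip comes from the same Piper voice, since the sample rate, width, and channel count all match. A rough sketch using just the standard-library `wave` module (file names are hypothetical):

```python
import wave
from pathlib import Path

def concat_wavs(clips: list[Path], out: Path) -> None:
    """Append cached clips into one response; assumes identical formats."""
    with wave.open(str(out), "wb") as dst:
        for i, clip in enumerate(clips):
            with wave.open(str(clip), "rb") as src:
                if i == 0:
                    dst.setparams(src.getparams())  # copy rate/width/channels
                dst.writeframes(src.readframes(src.getnframes()))

# e.g. reuse "talk group" plus digits already sitting in the cache
concat_wavs(
    [Path("cache/talk_group.wav"), Path("cache/three.wav"), Path("cache/one.wav")],
    Path("/tmp/response.wav"),
)
```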
Maybe it’ll help someone; it’s still very experimental and probably overkill. I used it mostly as a way to play with Python dev, as it’s not my native tongue.
This is interesting. I’m curious what your benchmark system was for the initial response of Piper. I found it to be pretty responsive on an old fifth-generation Core i3 Intel NUC, and much less so on a Raspberry Pi 4. It generates audio in just milliseconds on an 8-core Xeon workstation that I’m using for one of my hubs.
Until this project came around, I had been using, and still sometimes use, a version of DECtalk, which is formant-based and uses no samples, so it’s super responsive on even the slowest system. I wrote my own script similar to what asl3-tts does, but I customized the formants to sound even clearer on radios than it already does.
I am very familiar with DECtalk’s ARPABET/phoneme system, having used it in various capacities for over 25 years, so I can make it pronounce anything correctly, with just the right amount of stress on any given syllable as necessary.
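For anyone who hasn’t played with it, the inline-command syntax is what makes DECtalk so controllable. A tiny sketch of driving it from a script (the `[:phoneme on]` command and apostrophe stress marks are standard DECtalk; the `say` binary name and its flags are assumptions based on the open-source builds, so verify against your copy):

```python
import subprocess

# "[:phoneme on]" lets bracketed ARPABET pass through; the apostrophe
# before a vowel marks primary stress on that syllable.
TEXT = "[:phoneme on] My call sign is [k'ey w'ahn ey b'iy s'iy]."

# Assumed invocation; the open-source DECtalk `say` flags may differ.
subprocess.run(["say", "-a", TEXT, "-fo", "/tmp/callsign.wav"], check=True)
```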
As the author of asl3-tts, I can confirm it can be slow depending on the type of system powering it. An RPi3 will be a lot slower than an RPi5, and x86_64-based systems seem to be a lot faster regardless.
Any code you’d like to contribute would be welcome and, if you have a keen interest in it, the project would welcome your direct collaboration.
That’s on a stock Pi 3B. I’m sure it’s much better on faster hardware.
DECtalk is digging way back, good stuff! Do you have any scripts to share? It’d be neat to play with that on the Pi and see how it performs.
For basic announcements of call signs phonetically, numbers, and simple words/phrases, it’s nice to have something that is nearly instant. There are probably a ton of corner cases with my concatenation hack, but it works for those use cases at least.
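The phonetic call sign case is really just a table lookup feeding the concatenator above. A minimal sketch (the mapping and cache file names are illustrative):

```python
NATO = {
    "A": "alpha", "B": "bravo", "C": "charlie", "D": "delta", "E": "echo",
    "F": "foxtrot", "G": "golf", "H": "hotel", "I": "india", "J": "juliett",
    "K": "kilo", "L": "lima", "M": "mike", "N": "november", "O": "oscar",
    "P": "papa", "Q": "quebec", "R": "romeo", "S": "sierra", "T": "tango",
    "U": "uniform", "V": "victor", "W": "whiskey", "X": "xray",
    "Y": "yankee", "Z": "zulu",
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "niner",
}

def callsign_clips(callsign: str) -> list[str]:
    """Map each character to the cached WAV for its phonetic word."""
    return [f"cache/{NATO[c]}.wav" for c in callsign.upper() if c in NATO]

print(callsign_clips("N8EI"))  # ['cache/november.wav', 'cache/eight.wav', ...]
```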
@N8EI asl3-tts itself doesn’t seem to be the bottleneck, as you can see in the timed runs; Piper itself is the long pole in the tent. It would probably require significant optimizations for ARM instruction sets and such to squeeze out more, and it’s probably not going to hit something good enough for near real-time (let’s say <1 s).
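If anyone wants to reproduce the numbers, here’s roughly how to time Piper on its own, separate from any wrapper overhead (this assumes the `piper` CLI reads text on stdin and takes `--model`/`--output_file` as in the rhasspy/piper README; the model path is a placeholder):

```python
import subprocess
import time

start = time.perf_counter()
subprocess.run(
    ["piper", "--model", "en_US-lessac-medium.onnx",
     "--output_file", "/tmp/bench.wav"],
    input=b"Connected to node four two zero zero zero",
    check=True,
)
print(f"piper synthesis took {time.perf_counter() - start:.2f}s")
```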