I’m using SayAlpha with custom functions, but that audio plays back over connected nodes (such as dvlink). Is it possible to have those play locally only?
I know that I can do that with static audio files, but I would really like to retain the use of SayAlpha, for example to speak an IP address or confirm a DMR TG.
SayAlpha is a function of Asterisk.
Asterisk doesn’t really distinguish between connection types, since they are all conference call extensions.
While there may be a way to send it to a particular extension (node), I’m not sure how that works within Asterisk 20+.
Understanding this, Google may be your friend.
But in any case, I would turn off Allison and any other telemetry on DVSwitch nodes.
asl3-tts is very slow; I didn’t want every response to take a minimum of 7 seconds to generate.
To get around this, I started by creating a little tool that is a wrapper over asl3-tts and caches results, prunes the cache, etc. It works well if responses are fairly static.
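The core of it is just a content-addressed cache in front of the synthesizer. Here’s a minimal sketch of that layer (the asl3-tts flags and cache path are illustrative, not the tool’s actual interface, so check `asl3-tts --help` on your system):

```python
#!/usr/bin/env python3
"""Minimal sketch of a caching wrapper over asl3-tts."""
import hashlib
import subprocess
import time
from pathlib import Path

CACHE_DIR = Path("/var/cache/tts-wrapper")  # hypothetical location
MAX_AGE_S = 7 * 24 * 3600                   # prune entries older than a week

def speak(text: str) -> Path:
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(text.encode()).hexdigest()
    wav = CACHE_DIR / f"{key}.wav"
    if not wav.exists():
        # Cache miss: pay the full Piper synthesis cost once.
        # NOTE: assumed flags; the real asl3-tts interface may differ.
        subprocess.run(["asl3-tts", "-t", text, "-f", str(wav)], check=True)
    else:
        # Cache hit: refresh mtime so popular phrases survive pruning.
        wav.touch()
    return wav

def prune() -> None:
    cutoff = time.time() - MAX_AGE_S
    for f in CACHE_DIR.glob("*.wav"):
        if f.stat().st_mtime < cutoff:
            f.unlink()
```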
I then had an idea to combine the existing sound files together to form more complex responses on the fly. That ended up going a bit overboard as I added on-demand TTS, phrase matching, phonetics, etc. Now I can generate responses on the fly in <500 ms as long as words/phrases can be reused.
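The concatenation itself is simpler than it sounds when every clip comes from the same Piper voice, since the sample rate, width, and channel count all match. A rough sketch using just the standard-library `wave` module (file names are hypothetical):

```python
import wave
from pathlib import Path

def concat_wavs(clips: list[Path], out: Path) -> None:
    """Append cached clips into one response; assumes identical formats."""
    with wave.open(str(out), "wb") as dst:
        for i, clip in enumerate(clips):
            with wave.open(str(clip), "rb") as src:
                if i == 0:
                    dst.setparams(src.getparams())  # copy rate/width/channels
                dst.writeframes(src.readframes(src.getnframes()))

# e.g. reuse "talk group" plus digits already sitting in the cache
concat_wavs(
    [Path("cache/talk_group.wav"), Path("cache/three.wav"), Path("cache/one.wav")],
    Path("/tmp/response.wav"),
)
```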
Maybe it’ll help someone; it’s still very experimental and probably overkill. I used it mostly as a way to play with Python dev, as it’s not my native tongue.
This is interesting. I’m curious what your benchmark system was for the initial response of Piper. I found it to be pretty responsive on an old fifth-generation Core i3 Intel NUC, and much less so on a Raspberry Pi 4. It generates audio in just milliseconds on an 8-core Xeon workstation that I’m using for one of my hubs.
Until this project came around, I had been using, and still sometimes use, a version of DECtalk, which is formant-based and uses no samples, so it’s super responsive on even the slowest system. I wrote my own script similar to what asl3-tts does, but I customized the formants to sound even clearer on radios than it already does.
I am very familiar with DECtalk’s ARPABET/phoneme system, having used it in various capacities for over 25 years, so I can make it pronounce anything correctly, with just the right amount of stress on any given syllable as necessary.
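For anyone who hasn’t played with it, the inline-command syntax is what makes DECtalk so controllable. A tiny sketch of driving it from a script (the `[:phoneme on]` command and apostrophe stress marks are standard DECtalk; the `say` binary name and its flags are assumptions based on the open-source builds, so verify against your copy):

```python
import subprocess

# "[:phoneme on]" lets bracketed ARPABET pass through; the apostrophe
# before a vowel marks primary stress on that syllable.
TEXT = "[:phoneme on] My call sign is [k'ey w'ahn ey b'iy s'iy]."

# Assumed invocation; the open-source DECtalk `say` flags may differ.
subprocess.run(["say", "-a", TEXT, "-fo", "/tmp/callsign.wav"], check=True)
```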
As the author of asl3-tts, I can confirm it can be slow depending on the type of system powering it. An RPi3 will be a lot slower than an RPi5, and x86_64-based systems seem to be a lot faster regardless.
Any code you’d like to contribute would be welcome and, if you have a keen interest in it, the project would welcome your direct collaboration.
That’s on a stock Pi 3B. I’m sure it’s much better on faster hardware.
DECtalk is digging way back, good stuff! Do you have any scripts to share? It’d be neat to play with that on the Pi and see how it performs.
For basic announcements of call signs phonetically, numbers, and simple words/phrases, it’s nice to have something that is nearly instant. There are probably a ton of corner cases with my concatenation hack, but it works for those use cases at least.
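The phonetic call sign case is really just a table lookup feeding the concatenator above. A minimal sketch (the mapping and cache file names are illustrative):

```python
NATO = {
    "A": "alpha", "B": "bravo", "C": "charlie", "D": "delta", "E": "echo",
    "F": "foxtrot", "G": "golf", "H": "hotel", "I": "india", "J": "juliett",
    "K": "kilo", "L": "lima", "M": "mike", "N": "november", "O": "oscar",
    "P": "papa", "Q": "quebec", "R": "romeo", "S": "sierra", "T": "tango",
    "U": "uniform", "V": "victor", "W": "whiskey", "X": "xray",
    "Y": "yankee", "Z": "zulu",
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "niner",
}

def callsign_clips(callsign: str) -> list[str]:
    """Map each character to the cached WAV for its phonetic word."""
    return [f"cache/{NATO[c]}.wav" for c in callsign.upper() if c in NATO]

print(callsign_clips("N8EI"))  # ['cache/november.wav', 'cache/eight.wav', ...]
```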
@N8EI asl3-tts itself doesn’t seem to be the bottleneck, as you can see in the timed runs; Piper itself is the long pole in the tent. It would probably require significant optimizations for ARM instruction sets and such to squeeze out more, and it’s probably not going to hit something good enough for near real-time (let’s say <1 s).
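If anyone wants to reproduce the numbers, here’s roughly how to time Piper on its own, separate from any wrapper overhead (this assumes the `piper` CLI reads text on stdin and takes `--model`/`--output_file` as in the rhasspy/piper README; the model path is a placeholder):

```python
import subprocess
import time

start = time.perf_counter()
subprocess.run(
    ["piper", "--model", "en_US-lessac-medium.onnx",
     "--output_file", "/tmp/bench.wav"],
    input=b"Connected to node four two zero zero zero",
    check=True,
)
print(f"piper synthesis took {time.perf_counter() - start:.2f}s")
```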