

This is circular.
if an attacker compromises the F-Droid app on your device, they can… load malicious apps onto your device
Could be rewritten:
if an attacker compromises your device, they can compromise your device
You’ve already lost when they put the first malicious app on.
I don’t have as much experience with HASS, but I did use Mycroft for quite a while (stopped only because I had multiple big moves, and ended up in a place small enough voice control didn’t really make sense any more). There were a few intent parsers used with/made for that:
https://github.com/MycroftAI/adapt https://github.com/MycroftAI/padatious https://github.com/MycroftAI/padaos
In my experience, Adapt was far and away the most reliable. If you go the route of rolling your own solution, I’d recommend checking that out, and using the absolute minimum number of words to design your intents. E.g. require “off” and an entity, and nothing else, so that “AC off,” “turn off the AC,” and “turn the AC off” all work. This reduces the number of words your STT has to transcribe correctly, and allows flexibility in command phrasing.
If you borrow a little more from Mycroft, they had “fallback” skills that were triggered when an intent couldn’t be matched. You could use the same idea, and use https://github.com/seatgeek/thefuzz to fuzzy match entities and keywords, to try to handle remaining cases where STT fails. I believe that is what this community made skill attempted to do: https://github.com/MycroftAI/skill-homeassistant (I think there were more than one HASS skill implementations, so I could be conflating this with another).
Another comment mentioned OVOS/Neon - those forked off of Mycroft, so you may see overlap if you investigate those as well.