Hi, I would like to use Monogatari for an educational, language-learning visual novel. Instead of typing text, the user would speak, and the recorded audio would be sent to a separate ASR (speech recognition) API that returns a text transcript. Then, when a character responds, I would like to generate the response dynamically and pass it to a TTS (speech synthesis) engine, which produces an audio file that plays as the character's 'voice'.
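To make the flow concrete, here is a rough sketch of one conversational turn as I picture it. Every name in it (`VoiceServices`, `runVoiceTurn`, and the five hooks) is hypothetical and made up for illustration; none of it is Monogatari API, and how to wire something like this into Monogatari is exactly what I am unsure about.

```typescript
// Hypothetical service hooks; none of these names come from Monogatari.
// `Clip` stands for whatever audio representation is used (e.g. a Blob in the browser).
interface VoiceServices<Clip> {
  recordAudio: () => Promise<Clip>;                      // capture the user's speech
  transcribe: (audio: Clip) => Promise<string>;          // external ASR API: audio -> text
  generateReply: (userText: string) => Promise<string>;  // dynamically generated character response
  synthesize: (replyText: string) => Promise<Clip>;      // external TTS API: text -> audio
  play: (audio: Clip) => Promise<void>;                  // play the character's 'voice'
}

interface TurnResult {
  userText: string;
  replyText: string;
}

// One turn: speech in, spoken reply out. Each step waits for the previous one.
async function runVoiceTurn<Clip>(services: VoiceServices<Clip>): Promise<TurnResult> {
  const recording = await services.recordAudio();
  const userText = await services.transcribe(recording);
  const replyText = await services.generateReply(userText);
  const voice = await services.synthesize(replyText);
  await services.play(voice);
  return { userText, replyText };
}
```

Keeping the recording, ASR, response generation, TTS, and playback steps behind separate hooks is just a guess at a convenient shape: each one could be swapped for a real API call or a stub without touching the rest.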
Right now my questions basically boil down to the following. How could I add these to Monogatari:
- Asking the user for audio input
- Passing audio or text from Monogatari to a separate API
Any advice is appreciated. I am not as familiar with JavaScript as I am with Python, but I think I would prefer Monogatari over other engines, since its output is an .html file that can be easily integrated into my envisioned end goal.