Integrating ASR and TTS

tigerninjaman

Hi, I would like to use Monogatari for an educational, language-based visual novel. Instead of typing text, the user would have to speak, and the recorded audio would be passed to a separate ASR (speech recognition) API to produce a text string. Then, when a character responds, I would like to generate the response dynamically and pass it to a TTS (speech synthesis) engine to produce an audio file that plays the character's 'voice' saying that response.

Right now my questions basically boil down to the following. How could I add these to Monogatari:

Asking the user for audio input
Passing audio or text from Monogatari to a separate API

Any advice is appreciated. I am not as familiar with JavaScript as I am with Python, but I think I would prefer Monogatari over other engines, since its output is an .html file that can be easily integrated into my envisioned end goal.

tigerninjaman

Posting here since this will show up on web searches: using the MediaStream Recording API (documented by Mozilla on MDN) is easy. I was able to essentially paste the web-dictaphone code into main.js as a custom Component and get the recording working. See this link: https://developer.mozilla.org/en-US/docs/Web/API/MediaStream_Recording_API/Using_the_MediaStream_Recording_API
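
For anyone wanting the whole round trip, here is a minimal sketch of how recording, ASR, and TTS could be chained in plain browser JavaScript. It is not the web-dictaphone code and not a Monogatari API. The two URLs, the "audio" form field, the { text } JSON shapes, and the function names recordAudio, transcribe, and speak are all assumptions made up for illustration, so adjust them to whatever your ASR/TTS service actually expects.

```javascript
// Hypothetical endpoints: replace with your own ASR/TTS service.
const ASR_URL = "https://example.com/asr";
const TTS_URL = "https://example.com/tts";

// Record from the microphone for `ms` milliseconds and resolve with a Blob.
// Browser only: uses getUserMedia and MediaRecorder.
async function recordAudio(ms) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks = [];
  recorder.ondataavailable = (event) => chunks.push(event.data);
  const stopped = new Promise((resolve) => { recorder.onstop = resolve; });

  recorder.start();
  setTimeout(() => recorder.stop(), ms);
  await stopped;

  // Release the microphone once recording is done.
  stream.getTracks().forEach((track) => track.stop());
  return new Blob(chunks, { type: recorder.mimeType });
}

// Upload the recording to the ASR service.
// Assumption: the service accepts multipart form data with an "audio"
// field and answers with JSON shaped like { "text": "..." }.
async function transcribe(audioBlob) {
  const form = new FormData();
  form.append("audio", audioBlob, "recording");
  const response = await fetch(ASR_URL, { method: "POST", body: form });
  if (!response.ok) {
    throw new Error(`ASR request failed: ${response.status}`);
  }
  const data = await response.json();
  return data.text;
}

// Send text to the TTS service and play the returned audio.
// Assumption: the service accepts JSON { "text": "..." } and responds
// with the raw bytes of an audio file. Browser only: uses Audio.
async function speak(text) {
  const response = await fetch(TTS_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  if (!response.ok) {
    throw new Error(`TTS request failed: ${response.status}`);
  }
  const url = URL.createObjectURL(await response.blob());
  const audio = new Audio(url);
  audio.onended = () => URL.revokeObjectURL(url);
  await audio.play();
}
```

A call sequence might look like: const text = await transcribe(await recordAudio(5000)); followed by await speak(reply); once the character's reply has been generated. Note that getUserMedia only works on secure origins (https or localhost), and browsers usually block audio playback until the user has interacted with the page.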