How to use the Speech to Text model in Ozeki AI Server
This guide demonstrates how to set up and use a Speech to Text model in Ozeki AI Server. You will learn how to download a compatible Whisper model from Hugging Face, configure it in Ozeki AI Studio, create an AI chatbot that uses it, and test the transcription by sending an audio file and receiving the converted text in response.
What is the Whisper Speech to Text model?
Whisper is an open-source speech recognition model developed by OpenAI, available in several sizes ranging from tiny to large. The original release was trained on about 680,000 hours of multilingual audio, which allows it to transcribe speech across a wide range of languages, accents, and recording conditions.
Steps to follow
- Download the Speech to Text model
- Create a Speech to Text model in AI Studio
- Create an AI chatbot and assign the model
- Test the Speech to Text transcription
Speech to Text model
You can use the following Speech to Text model:
https://huggingface.co/ggerganov/whisper.cpp
Test audio file
You can use the following audio file to test the AI speech to text model:
Required audio format:
- 16-bit samples
- 16 kHz sample rate
- Mono (single channel)
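Before sending a file for transcription, it can help to confirm that it matches the format listed above. The following is a minimal sketch, assuming the audio is stored as an uncompressed WAV file (this guide does not name the container format). It uses only Python's standard `wave` module. The function names are illustrative, and the ffmpeg command is only built, not run, on the assumption that ffmpeg is installed if you decide to use it.

```python
import wave

# Format listed in this guide: 16-bit samples, 16 kHz sample rate, mono.
REQUIRED_SAMPLE_WIDTH_BYTES = 2
REQUIRED_SAMPLE_RATE_HZ = 16000
REQUIRED_CHANNELS = 1


def check_wav_format(path):
    """Return a list of format mismatches; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != REQUIRED_SAMPLE_WIDTH_BYTES:
            problems.append(
                f"sample width is {wav.getsampwidth() * 8} bits, expected 16"
            )
        if wav.getframerate() != REQUIRED_SAMPLE_RATE_HZ:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected 16000"
            )
        if wav.getnchannels() != REQUIRED_CHANNELS:
            problems.append(
                f"channel count is {wav.getnchannels()}, expected 1"
            )
    return problems


def ffmpeg_convert_command(src, dst):
    """Build (but do not run) an ffmpeg command that converts src to the required format."""
    return [
        "ffmpeg",
        "-i", src,           # input file in any format ffmpeg can read
        "-ar", "16000",      # resample to 16 kHz
        "-ac", "1",          # downmix to one channel
        "-c:a", "pcm_s16le", # 16-bit little-endian PCM
        dst,
    ]
```

Calling `check_wav_format` with the path of your audio file returns an empty list when the file is ready to use. If it reports mismatches, the list returned by `ffmpeg_convert_command` can be passed to `subprocess.run` to produce a converted copy.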
How to download the Speech to Text model video
The following video shows how to download the Whisper Speech to Text model from Hugging Face and place it in the correct directory for Ozeki AI Studio to detect it.
Step 1 - Download the Speech to Text model
Navigate to the whisper.cpp model page on Hugging Face and download the model file of your choice. The repository publishes the models as ggml-*.bin files, for example ggml-base.bin. Larger variants offer better accuracy, while smaller ones are faster and require less memory (Figure 1).
Once the download is complete, copy the model file to the C:\AIModels folder on your system (Figure 2).
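The download and copy can also be scripted. The sketch below is one possible approach, not part of the Ozeki tooling. It assumes that the model file is named `ggml-base.bin` (pick whichever variant you want) and that the Hugging Face repository serves files under `/resolve/main/`. Both are assumptions, and the function names are illustrative. Only the repository URL and the C:\AIModels folder come from this guide.

```python
import os
import urllib.request

REPO_URL = "https://huggingface.co/ggerganov/whisper.cpp"  # from this guide
MODEL_DIR = r"C:\AIModels"                                 # from this guide
MODEL_FILE = "ggml-base.bin"  # assumption: substitute the variant you want


def model_url(file_name, repo_url=REPO_URL):
    """Build a direct download URL (assumes the /resolve/main/ layout)."""
    return f"{repo_url}/resolve/main/{file_name}"


def download_model(file_name=MODEL_FILE, model_dir=MODEL_DIR):
    """Download the model file into model_dir and return the saved path."""
    os.makedirs(model_dir, exist_ok=True)  # create the folder if it is missing
    destination = os.path.join(model_dir, file_name)
    urllib.request.urlretrieve(model_url(file_name), destination)
    return destination
```

Calling `download_model()` on the machine that runs Ozeki AI Server should leave the file in the folder used in the next step. The model files are large, so the call can take a while.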
Step 2 - Create a Speech to Text model in AI Studio
The following video shows how to create a Speech to Text model and an AI chatbot in Ozeki AI Studio and test it by submitting an audio file for transcription.
Open the Ozeki desktop and click the AI Studio icon to launch the AI model management interface. AI Studio is the central hub where you create, configure, and manage all AI models running on your Ozeki AI Server instance (Figure 3).
In AI Studio, click the AI Models button in the toolbar to open the models page, then click Create new AI Model. In the model type selector that appears on the right, select Speech2Txt to create a new Speech to Text model (Figure 4).
In the model configuration panel, locate the model file input field under the General tab and select the Whisper model file you copied to C:\AIModels in Step 1. Click OK to save the model configuration (Figure 5).
Step 3 - Create an AI chatbot and assign the model
Return to the AI Studio toolbar and click Chat bots, then click Create new Chat bot. Select AI Chat as the chatbot type to create a new AI-powered chat session that can interact with your configured models (Figure 6).
In the chatbot configuration panel, use the model dropdown to assign the Speech to Text model you created in Step 2. Disable the "Send welcome message." toggle, give the chatbot a descriptive name, and click OK to save (Figure 7).
Open the chatbot you just created to access its chat interface. This is where you will submit audio files for transcription and receive the converted text as a response (Figure 8).
Step 4 - Test the Speech to Text transcription
Download the test audio file provided at the top of this page and save it to a known location on your system (Figure 9).
Confirm that the audio file has been saved to your file system and note its full path. You will need to provide this path in the chatbot message to direct the model to the correct file (Figure 10).
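If you prefer to check the path from a script rather than the file manager, a small helper along these lines can confirm that the file exists and print the full path to paste into the chat. This is a minimal sketch, and the function name is illustrative.

```python
import os


def full_audio_path(path):
    """Return the absolute path of an existing file, or raise FileNotFoundError."""
    absolute = os.path.abspath(os.path.expanduser(path))
    if not os.path.isfile(absolute):
        raise FileNotFoundError(f"No audio file found at {absolute}")
    return absolute
```

Passing the location where you saved the audio file returns the absolute path, which is the value to send as the chatbot message in the next step.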
In the chatbot interface, send the full file path of the audio file as a message. The chatbot will pass the file to the Speech to Text model for processing (Figure 11).
The Whisper model will process the audio and return the transcribed text as a response in the chat window. A successful result confirms that the Speech to Text model is correctly configured and ready for use in your Ozeki AI Server workflows (Figure 12).
Conclusion
You have successfully set up a Whisper Speech to Text model in Ozeki AI Server, configured it in AI Studio, and verified that it can accurately transcribe audio files into text. This model can now be combined with other components in Ozeki AI Server to build more advanced voice-enabled AI pipelines for your organization.