How to use the Speech to Text model in Ozeki AI Server

This guide demonstrates how to set up and use a Speech to Text model in Ozeki AI Server. You will learn how to download a compatible Whisper model from Hugging Face, configure it in Ozeki AI Studio, create an AI chatbot that uses it, and test the transcription by sending an audio file and receiving the converted text in response.

What is the Whisper Speech to Text model?

Whisper is an open-source speech recognition model developed by OpenAI, available in several sizes ranging from tiny to large. It is trained on a broad dataset of multilingual audio and delivers accurate transcription across a wide range of languages, accents, and audio conditions.

Steps to follow

  1. Download the Speech to Text model
  2. Create a Speech to Text model in AI Studio
  3. Create an AI chatbot and assign the model
  4. Test the Speech to Text transcription

Speech to Text model

You can use the following Speech to Text model:

https://huggingface.co/ggerganov/whisper.cpp

Test audio file

You can use the following audio file to test the AI speech to text model:

icanunderstand.wav

Required audio format:
16-bit sample depth
16 kHz sample rate
Mono channel
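
The format requirements above can be checked before a file is sent to the chatbot. The following is a minimal sketch that uses Python's standard wave module. The function name check_wav_format is illustrative and is not part of Ozeki AI Server.

```python
import wave


def check_wav_format(path):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        # Sample width is reported in bytes: 2 bytes = 16 bits.
        if wav.getsampwidth() != 2:
            problems.append(
                f"sample depth is {wav.getsampwidth() * 8} bits, expected 16"
            )
        if wav.getframerate() != 16000:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected 16000"
            )
        if wav.getnchannels() != 1:
            problems.append(
                f"file has {wav.getnchannels()} channels, expected 1 (mono)"
            )
    return problems
```

Calling check_wav_format("icanunderstand.wav") on a file in the current folder returns an empty list when the file meets all three requirements. Otherwise it returns one message per mismatch.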

Video: How to download the Speech to Text model

The following video shows how to download the Whisper Speech to Text model from Hugging Face and place it in the correct directory for Ozeki AI Studio to detect it.

Step 1 - Download the Speech to Text model

Navigate to the Whisper.cpp model page on Hugging Face and download the GGUF model file of your choice. Larger variants are more accurate, while smaller ones run faster and need less memory (Figure 1).

Figure 1 - Download the Whisper GGUF model file from Hugging Face
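
The download can also be scripted instead of done in the browser. The sketch below assumes the repository serves files through Hugging Face's usual resolve/<revision>/<file> direct-download path. The helper names model_url and download_model are illustrative, and the file name you pass in must be copied from the repository's file list, since this guide does not name a specific file.

```python
import urllib.request
from pathlib import Path

# Repository address taken from this guide.
REPO_URL = "https://huggingface.co/ggerganov/whisper.cpp"


def model_url(file_name, revision="main"):
    """Build a direct-download URL (assumes the resolve/<revision>/ pattern)."""
    return f"{REPO_URL}/resolve/{revision}/{file_name}"


def download_model(file_name, target_dir="."):
    """Download one model file into target_dir and return its local path."""
    target = Path(target_dir) / file_name
    target.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(model_url(file_name), str(target))
    return target
```

For example, download_model("<file name from the repository>", "downloads") saves the chosen file into a downloads folder next to the script. Model files can be large, so the call may take several minutes.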

Once the download is complete, copy the model file to the C:\AIModels folder on your system (Figure 2).

Figure 2 - Copy the model file to the C:\AIModels folder
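
If you prefer to script the copy step, the sketch below shows one way to do it with Python's standard library. The function name install_model is illustrative. The default target folder is the C:\AIModels path from this guide, and the sketch creates the folder if it does not exist yet.

```python
import shutil
from pathlib import Path


def install_model(downloaded_file, models_dir=r"C:\AIModels"):
    """Copy a downloaded model file into the models folder and return the new path."""
    source = Path(downloaded_file)
    if not source.is_file():
        raise FileNotFoundError(f"Model file not found: {source}")
    destination_dir = Path(models_dir)
    # Create the folder on first use; do nothing if it is already there.
    destination_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, destination_dir / source.name))
```

The returned path is the one to select later in the model configuration panel.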

Step 2 - Create a Speech to Text model in AI Studio

The following video shows how to create a Speech to Text model and an AI chatbot in Ozeki AI Studio and test it by submitting an audio file for transcription.

Open the Ozeki desktop and click the AI Studio icon to launch the AI model management interface. AI Studio is the central hub where you create, configure, and manage all AI models running on your Ozeki AI Server instance (Figure 3).

Figure 3 - Open AI Studio from the Ozeki desktop

In AI Studio, click the AI Models button in the toolbar to open the models page, then click Create new AI Model. In the model type selector that appears on the right, select Speech2Txt to create a new Speech to Text model (Figure 4).

Figure 4 - Create a new Speech to Text model in AI Studio

In the model configuration panel, locate the model file input field under the General tab and select the Whisper model file you copied to C:\AIModels in Step 1. Click OK to save the model configuration (Figure 5).

Figure 5 - Select the Whisper model file

Step 3 - Create an AI chatbot and assign the model

Return to the AI Studio toolbar and click Chat bots, then click Create new Chat bot. Select AI Chat as the chatbot type to create a new AI-powered chat session that can interact with your configured models (Figure 6).

Figure 6 - Create a new AI chatbot

In the chatbot configuration panel, give the chatbot a descriptive name and use the model dropdown to assign the Speech to Text model you created in Step 2. Disable the "Send welcome message." toggle, then click OK to save (Figure 7).

Figure 7 - Assign the Speech to Text model to the chatbot

Open the chatbot you just created to access its chat interface. This is where you will submit audio files for transcription and receive the converted text as a response (Figure 8).

Figure 8 - Open the chatbot interface

Step 4 - Test the Speech to Text transcription

Download the test audio file (icanunderstand.wav) listed in the "Test audio file" section above and save it to a known location on your system (Figure 9).

Figure 9 - Download the test audio file

Confirm that the audio file has been saved to your file system and note its full path. You will need to provide this path in the chatbot message to direct the model to the correct file (Figure 10).

Figure 10 - Locate the audio file in the file system
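
If you are unsure of the exact path, a short script can confirm that the file exists and print the absolute path to paste into the chat. This is a minimal sketch. The function name full_audio_path is illustrative, and the folder argument is whichever location you saved the file to.

```python
from pathlib import Path


def full_audio_path(file_name, folder="."):
    """Return the absolute path of an existing audio file as a string."""
    path = (Path(folder) / file_name).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Audio file not found: {path}")
    return str(path)
```

For example, print(full_audio_path("icanunderstand.wav", folder)) prints the full path of the test file, where folder is the directory you chose when saving it.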

In the chatbot interface, send the full file path of the audio file as a message. The chatbot will pass the file to the Speech to Text model for processing (Figure 11).

Figure 11 - Send the audio file path to the chatbot

The Whisper model will process the audio and return the transcribed text as a response in the chat window. A successful result confirms that the Speech to Text model is correctly configured and ready for use in your Ozeki AI Server workflows (Figure 12).

Figure 12 - Transcribed text returned by the Speech to Text model

Conclusion

You have successfully set up a Whisper Speech to Text model in Ozeki AI Server, configured it in AI Studio, and verified that it can accurately transcribe audio files into text. This model can now be combined with other components in Ozeki AI Server to build more advanced voice-enabled AI pipelines for your organization.

