Additional audio transcribing support in TMS AI Studio v1.2.3.0 and more ...

Today

TMS Software Delphi  Components tmsaistudio


We are pleased to announce the availability of TMS AI Studio v1.2.3.0! 


More audio transcribing capabilities

We have expanded the support for audio transcribing in several ways. First of all, we have added two additional services:

  • Mistral : using the Mistral Voxtral audio transcribing model
  • Gemini : using the multimodal capability of gemini-2.5-flash

Simply set TMSMCPCloudAI.Service to either aiMistral or aiGemini and call TMSMCPCloudAI.Transcribe() with an existing MP3 file or a recorded sound buffer. The component then returns the text detected in the audio.
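
As a rough illustration of the steps above, here is a minimal sketch. It assumes a form (TForm1) with a TTMSMCPCloudAI component named TMSMCPCloudAI1 and an API key already configured for the chosen service. The helper name TranscribeFile is hypothetical, and passing a file name to Transcribe is an assumption based on the Translate overloads shown later in this post.

```pascal
// Sketch only: TranscribeFile is a hypothetical helper, not part of TMS AI Studio.
procedure TForm1.TranscribeFile(const AFileName: string);
begin
  // aiMistral and aiGemini are the two services added in this release
  TMSMCPCloudAI1.Service := aiMistral;
  // Assumption: Transcribe accepts the name of an existing MP3 file
  TMSMCPCloudAI1.Transcribe(AFileName);
  // The detected text is delivered through the OnTranscribeAudio event
end;
```

Switching to Gemini should then only be a matter of assigning aiGemini to the Service property instead.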

Note that for direct microphone audio recording, we offer a free Delphi audio library that you can get from our GitHub repository.

OpenAI audio translation and model configuration

OpenAI not only offers the whisper-1 model for audio transcribing, it has also introduced gpt-4o-mini-transcribe and gpt-4o-transcribe. You can now select the model via TMSMCPCloudAI.Settings.OpenAITranscribeModel.

In addition, OpenAI can transcribe audio directly into English, irrespective of the language spoken in the audio. This lets your software work with English text no matter which language the user spoke. It is available via a new method, Translate, that can be used with an MP3 file or an MP3 sound buffer:

procedure TTMSMCPCloudAI.Translate(SoundFile: string); overload;
procedure TTMSMCPCloudAI.Translate(SoundBuffer: TMemoryStream); overload;

Note that at this moment, this translation capability is only offered through OpenAI.
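
A call to the file-based overload could look like the sketch below. It again assumes a form with a component named TMSMCPCloudAI1. The helper name TranslateToEnglish is hypothetical, and aiOpenAI is assumed to be the service value for OpenAI, following the naming of aiMistral and aiGemini. How the English text is handed back is not covered in this post, so the sketch stops at the call itself.

```pascal
// Sketch only: TranslateToEnglish is a hypothetical helper.
procedure TForm1.TranslateToEnglish(const AFileName: string);
begin
  // Assumption: aiOpenAI selects OpenAI, the only service offering Translate
  TMSMCPCloudAI1.Service := aiOpenAI;
  // Uses the overload Translate(SoundFile: string) declared above
  TMSMCPCloudAI1.Translate(AFileName);
end;
```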

New TTS / STT demo

We added a new demo that shows OpenAI's speech-to-text and text-to-speech capabilities, together with the prompts used for translation. You can use this demo application as a spoken-word translator to the language of your choice.

First, we record the audio from the default computer microphone after a click on the green button. When the user clicks stop, we stop the recording, get the recorded sound as an MP3 stream and send it to OpenAI via TMSMCPCloudAI1.Transcribe(s):

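var
  s: TMemoryStream;
begin
  if not FIsSpeaking then
  begin
    // start a fresh recording
    ar.ClearRecordedData;
    ar.StartRecording;
  end
  else
  begin
    ar.StopRecording;
    // get the recording as an MP3 stream and send it for transcription
    s := ar.GetMP3Stream(20500);
    try
      s.Position := 0;
      TMSMCPCloudAI1.Transcribe(s);
    finally
      s.Free;
    end;
  end;
end;
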
Once OpenAI has transcribed the audio, the event TMSMCPCloudAI.OnTranscribeAudio is triggered. From this event, we can retrieve the spoken words as text and create a prompt to perform the translation to the selected language:

procedure TForm1.DoTranslate(Text, Language: string);
begin
  TMSMCPCloudAI1.AssistantRole.Text := 'You are a translator that literally translates this text to ' + Language;
  TMSMCPCloudAI1.Context.Text := Text;
  TMSMCPCloudAI1.Execute;
end;

When this prompt has been executed and the translated text is returned, we can invoke OpenAI's text-to-speech function to let the computer speak the translated text:

procedure TForm1.TMSMCPCloudAI1Executed(Sender: TObject;
  AResponse: TTMSMCPCloudAIResponse; AHttpStatusCode: Integer;
  AHttpResult: string);
begin
  if AHttpStatusCode div 100 = 2 then  // any 2xx HTTP status code = success
  begin
    memo2.Lines.Text := AResponse.Content.Text;
    TMSMCPCloudAI1.Speak(memo2.Lines.Text);
  end;
end;

That is how easy it is to let your computer speak for you in another language.

Usage tracking across requests

Using AI from services such as OpenAI, Gemini, Mistral, Claude, Grok, Perplexity, DeepSeek, ... isn't free. Typically, the cost is directly related to the number of tokens used. In the context of AI, a token is a small unit of text, such as a word, a part of a word or a punctuation mark, into which the model splits a piece of text.

Tokens are counted for the prompt text as well as for the resulting answer text produced by the LLM. In TMSMCPCloudAI, the number of tokens consumed by a prompt request can be retrieved from the response object:

procedure TForm1.TMSMCPCloudAI1Executed(Sender: TObject;
  AResponse: TTMSMCPCloudAIResponse; AHttpStatusCode: Integer;
  AHttpResult: string);
var
  Total, Prompt, Completion: Integer;
begin
  // here we can check the tokens used for the request:
  Total := AResponse.TotalTokens;            // sum of prompt and completion tokens
  Prompt := AResponse.PromptTokens;          // tokens used for the prompt
  Completion := AResponse.CompletionTokens;  // tokens used in the response text produced by the LLM
end;

We have now extended the TMSMCPCloudAI class to track the usage across multiple requests via TMSMCPCloudAI.Usage. As long as you do not call TMSMCPCloudAI.Usage.Reset, the token counts of each request are added up, so you can check the total number of tokens used during a session. This result can be retrieved via:

TMSMCPCloudAI.Usage.TotalTokens: integer
TMSMCPCloudAI.Usage.PromptTokens: integer
TMSMCPCloudAI.Usage.CompletionTokens: integer
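
To show how these counters might be used in practice, here is a small sketch that reports the session totals and starts a new count. The Usage properties and Reset come from the description above. The form, the component name TMSMCPCloudAI1, the memo Memo1 and the two helper methods are hypothetical names chosen for this example.

```pascal
// Requires System.SysUtils in the uses clause for Format.

// Hypothetical helper: writes the accumulated token counts to a memo.
procedure TForm1.ReportUsage;
begin
  Memo1.Lines.Add(Format('Session total: %d tokens (%d prompt, %d completion)',
    [TMSMCPCloudAI1.Usage.TotalTokens,
     TMSMCPCloudAI1.Usage.PromptTokens,
     TMSMCPCloudAI1.Usage.CompletionTokens]));
end;

// Hypothetical helper: clears the counters so a new session starts from zero.
procedure TForm1.StartNewSession;
begin
  TMSMCPCloudAI1.Usage.Reset;
end;
```

Calling ReportUsage after each completed request gives a running view of consumption, which can help when keeping an eye on cost.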


Get started

If you are new to integrating AI capabilities into your Delphi applications, download the fully functional TMS AI Studio trial version and discover how you can make your software more powerful with it. 
If you have an active TMS ALL-ACCESS subscription, you'll find this product automatically in your toolbox. If you are a student or teacher, we now also offer a free academic version of TMS AI Studio!

We are very eager to learn what amazing new functionality you'll build with TMS AI Studio in your apps, and what other AI-powered functionality you would like to see added in future versions of TMS AI Studio. Let us know in the blog comments below or via email!




Bruno Fierens



