


Connect to llama.cpp servers from Delphi with TMS AI Studio v1.6


We are pleased to announce the release of TMS AI Studio v1.6.0.0, introducing support for llama.cpp as a service in TTMSMCPCloudAI. With this update, you can seamlessly connect to a llama.cpp server and integrate it into your existing AI workflow, without changing your components or development model!
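To give an idea of what "without changing your components" means in practice, here is a minimal sketch of pointing an existing TTMSMCPCloudAI component at a local llama.cpp server. Note that the property, enum, and method names used here (aiLlamaCpp, Settings.BaseURL, Settings.Model, Execute) are illustrative assumptions, not the documented API; consult the TMS AI Studio documentation for the exact names.

```pascal
// Hypothetical sketch -- identifier names below are assumptions,
// not the documented TMS AI Studio API.
procedure TMainForm.UseLocalLlamaCpp;
begin
  // Switch the existing component from a cloud provider to the local server
  TMSMCPCloudAI1.Service := aiLlamaCpp;                       // assumed enum value
  TMSMCPCloudAI1.Settings.BaseURL := 'http://localhost:8080'; // llama-server default port
  TMSMCPCloudAI1.Settings.Model := 'my-local-model';          // whichever GGUF the server loaded
  TMSMCPCloudAI1.Execute('Summarize the attached release notes.'); // assumed call
end;
```

The rest of the workflow (prompts, tool calls, MCP integration) stays exactly the same as with any other service type.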

The Advantages of Running Models Locally

Local AI is no longer limited to “huge models or nothing”. Smaller and specialized models have improved significantly. For many application scenarios, such as summarization, rewriting, extraction, classification, or domain-specific assistance, a smaller model can be surprisingly capable, especially when combined with good prompt design and tool integration.

Running models locally is becoming a practical and strategic choice, because the benefits are clear:

  • Cost: Cloud AI services are powerful, but usage-based pricing can be difficult to predict in real applications, especially once AI features are used frequently by end users. Running models locally can eliminate these costs entirely and make AI usage easier to scale without surprises.
  • Privacy and control: Many applications handle sensitive user data, such as documents, internal notes, customer information, or proprietary knowledge. Even with trusted cloud providers, some customers and industries require that prompts and context never leave the local machine or internal network.
  • Offline: This matters not only for fully disconnected environments, but also for reliability. A local model keeps working even if a service is down, rate-limited, or temporarily unreachable.

Ollama or llama.cpp

Ollama uses llama.cpp internally, and TTMSMCPCloudAI already supports Ollama as an AI service. So why add llama.cpp as a separate service, and which one is better?

Our goal is simple: provide flexibility. There is no universal “better” option. Ollama focuses on convenience and ease of setup, while llama.cpp offers deeper configurability and often better performance.

Both Ollama and llama.cpp can run models purely on the CPU, without requiring a dedicated GPU. For many smaller workloads, this is already sufficient and avoids additional hardware investment. With a consumer GPU, models can run at least 5× faster, making local hosting suitable for chat, coding assistance, and larger workloads.

We ran some models ourselves on our Windows office machines to better understand the differences and see how far you can get with average hardware. We used the prebuilt llama.cpp binaries and observed the following patterns:


  • Speed winner: on CPU (Intel i7-9700), llama.cpp; on GPU (AMD 7900 XT), no consistent winner.
  • Speed difference: on CPU, llama.cpp was consistently 24% faster than Ollama; on GPU, the difference was 13–30% either way.
  • Stability: llama.cpp was stable in both cases; Ollama showed occasional freezes on CPU and occasional errors on GPU.
  • Impact of output length: on CPU, longer outputs reduced speed in both llama.cpp and Ollama; on GPU, output length had no impact.

The differences largely depend on the hardware, workload, and configuration. Systems running Linux with an NVIDIA GPU might deliver better results due to driver maturity and broader optimization support. However, the safest approach is to test both in your own environment and choose the one that fits your setup best!
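Testing both in your own environment is straightforward, because both runtimes expose an OpenAI-compatible HTTP endpoint (by default on port 8080 for a llama.cpp server started with, for example, llama-server -m model.gguf --port 8080, and on port 11434 for Ollama). As a sanity check outside of any component, you can talk to such a server with nothing but the Delphi RTL; in this sketch the model name and prompt are placeholders:

```pascal
uses
  System.Classes, System.SysUtils, System.Net.HttpClient, System.Net.URLClient;

// Send one chat request to a locally running llama.cpp server
// (assumes llama-server is listening on its default port 8080)
procedure AskLocalModel;
var
  Client: THTTPClient;
  Body: TStringStream;
  Resp: IHTTPResponse;
begin
  Client := THTTPClient.Create;
  try
    Body := TStringStream.Create(
      '{"model":"local","messages":[{"role":"user","content":"Hello"}]}',
      TEncoding.UTF8);
    try
      Resp := Client.Post('http://localhost:8080/v1/chat/completions', Body, nil,
        [TNetHeader.Create('Content-Type', 'application/json')]);
      Writeln(Resp.ContentAsString); // JSON response with the model's reply
    finally
      Body.Free;
    end;
  finally
    Client.Free;
  end;
end;
```

Because the same request works against both llama.cpp and Ollama, switching runtimes for a benchmark is mostly a matter of changing the port.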

Conclusion

By supporting multiple local runtimes, TMS AI Studio gives you greater control over deployment decisions, helping you address key requirements such as cost, data privacy, offline availability, and infrastructure independence. You can continue using the same workflow and component interface while gaining even more flexibility in how and where your models run.

As local AI continues to evolve, our goal remains the same: providing developers with practical, flexible, and production-ready tools to integrate AI into real-world applications.



Tunde Keller




