Breaking the language barrier on LLMs: TILDE’s Vision for a Multilingual Europe

Powered by AI-BOOST and European funding, the Latvian language technology leader TILDE achieved a major milestone by successfully training TildeOpen, a foundational large language model (LLM) designed specifically for European languages.

This breakthrough was made possible through the Large AI Grand Challenge, an initiative led by the AI-BOOST project in collaboration with the European Commission and the EuroHPC JU.

The problem: the "English-centric" wall

Despite the diversity of the European Union, most current multilingual LLMs are heavily skewed towards English. Over 90% of the training data in mainstream models is in English, leaving less than 10% for the rest of the world’s languages combined.

For Europe, this creates several critical issues:

  • Cultural & nuance gap: AI outputs in smaller languages often lack cultural depth and nuanced context.
  • Quality disparity: Users of Balto-Slavic and other European languages often receive lower-quality results from global AI tools.
  • Sovereignty & security: Strict European regulations and data concerns make it difficult for industries and the public sector to rely on commercial, closed-source models hosted outside the EU.

For Europe to be truly sovereign in AI, we cannot depend on English-centric models built outside our borders.

The solution: TildeOpen LLM

TILDE’s response to this challenge is TildeOpen, a foundational large language model with 30 billion parameters. Unlike general-purpose models, TildeOpen is built with “language equity” at its core. It focuses on equal representation for European languages, including those of candidate countries like Ukraine, Albania, and Serbia, as well as Nordic languages like Norwegian and Icelandic.

By using advanced tokenisation and an architecture adapted to morphology-rich languages, TILDE has created a model that, in specific linguistic contexts, is significantly more efficient and accurate for European speakers than global giants like GPT-4o or Llama 3.
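One common way to make the efficiency point concrete is "fertility": the average number of tokens a tokeniser produces per word. The sketch below is a toy illustration only, not TILDE's tokeniser. The greedy longest-match splitter, both vocabularies, and the Latvian sample phrase are assumptions chosen to show how a vocabulary that covers stems and endings yields fewer tokens per word than one that does not.

```python
def greedy_tokenize(word, vocab):
    """Split a word by greedy longest match against vocab.

    Characters not covered by any vocabulary entry become single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining piece first; fall back to one character.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens


def fertility(text, vocab):
    """Average number of tokens per whitespace-separated word."""
    words = text.split()
    n_tokens = sum(len(greedy_tokenize(w, vocab)) for w in words)
    return n_tokens / len(words)


# Hypothetical vocabularies, for illustration only.
english_centric_vocab = {"val", "as", "model"}
morphology_aware_vocab = {"valod", "as", "model", "iem"}

sample = "valodas modeliem"  # illustrative Latvian phrase

print(fertility(sample, english_centric_vocab))   # 4.0
print(fertility(sample, morphology_aware_vocab))  # 2.0
```

With these toy vocabularies the same two words cost eight tokens in one case and four in the other. Lower fertility means more text fits into the same context window and less compute is spent per sentence.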

The value: why TILDE is a game-changer

TILDE isn’t just building another chatbot; it is building the foundation for Europe’s Multilingual AI Ecosystem. Its value proposition lies in three key areas:

  • Linguistic excellence: They provide specialised support for Balto-Slavic languages, serving 155 million individuals who are often underserved by mainstream tech.
  • Sovereign & safe AI: TildeOpen enables the creation of sovereign AI applications for the public sector, healthcare, and finance – ensuring data stays under EU protection.
  • Open access: By releasing the model under an open-source license for non-commercial use, TILDE is empowering European researchers and SMEs to build their own custom solutions without starting from scratch.

The boost: the impact of AI-BOOST and F6S

Being part of the AI-BOOST project was the catalyst TILDE needed to turn an ambitious vision into reality. The Large AI Grand Challenge, managed via the F6S platform, provided the essential “rocket fuel” for the project:

  • Financial and technical support: As a winner, TILDE received €250k in prize money and 2 million GPU hours on the LUMI supercomputer in Finland.
  • Unprecedented speed: This massive computing power allowed TILDE to reduce training times from years to just months.
  • Strategic visibility: The competition identified TILDE as one of Europe’s most promising AI companies, providing the strategic backing of the European Commission.
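A back-of-the-envelope calculation shows how a GPU-hour budget translates into the jump from years to months. This is a rough sketch, not a description of TILDE's actual training setup: the GPU counts are hypothetical, and the calculation assumes the workload scales perfectly across GPUs, which real training runs do not achieve.

```python
def wall_clock_days(gpu_hours: float, n_gpus: int) -> float:
    """Days of continuous training if the GPU-hour budget is spread evenly over n_gpus."""
    return gpu_hours / n_gpus / 24


BUDGET_GPU_HOURS = 2_000_000  # allocation stated in the article

# Hypothetical cluster sizes, chosen only for illustration.
for n_gpus in (8, 1024):
    days = wall_clock_days(BUDGET_GPU_HOURS, n_gpus)
    print(f"{n_gpus:>5} GPUs: {days:,.0f} days (~{days / 365:.1f} years)")
```

Under these assumptions, the same 2 million GPU hours would take roughly 28.5 years on 8 GPUs but about 81 days on 1,024 GPUs running in parallel.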

This open-source release was only possible thanks to the AI-BOOST Large AI Grand Challenge, which gave us access to the EuroHPC LUMI supercomputer and enabled our team to create a resource for the whole European AI ecosystem. This participation enabled us to do really innovative work. TildeOpen can now serve as a real foundation for a safe, sovereign, and multilingual AI ecosystem across Europe.
