Multilingual AI Models for Europe | GPI Translation Curation Corner

Large language models (LLMs) have entered Europe’s digital sovereignty agenda with the launch of a new initiative called OpenEuroLLM. It aims to develop open-source LLMs for all 24 official EU languages, as well as the languages of countries in EU accession talks, such as Albania.

OpenEuroLLM involves about 20 organizations and is led by Jan Hajič of Charles University in Prague and Peter Sarlin from Finnish AI lab Silo AI.

The initiative aligns with Europe’s broader push for digital sovereignty, which aims to keep critical infrastructure and tools under local control. Major cloud providers are investing in local data management, and the EU has also secured an $11 billion deal to build a satellite network that will compete with Starlink.

Developing Multilingual AI Models for Europe

The primary aim of the project is to develop a series of foundation models that reflect the linguistic and cultural diversity of all EU languages. Deliverables are still being determined, but there may be a core multilingual model and smaller versions suited for various applications.

One challenge is ensuring that all languages receive equal development attention, particularly those with fewer digital resources. However, the data work carried out by the earlier High Performance Language Technologies (HPLT) project should provide valuable resources going forward.

Despite ongoing discussions about funding and the need for cooperation, OpenEuroLLM’s leaders believe that existing funds and infrastructure will ensure the project’s success. The project focuses on creating foundation models rather than consumer products, and its leaders believe that combining academic and corporate strengths will foster new developments in European AI. The ultimate goal is to establish a local, open-source AI foundation that contributes to Europe’s digital sovereignty.

To read the full article, please visit “Open source LLMs hit Europe’s digital sovereignty roadmap.”

