The Portuguese Prime Minister announced at the Web Summit the development of AMALIA (Automatic Multimodal Language Assistant with Artificial Intelligence), the first large language model (LLM) designed specifically for the Portuguese language.
With an investment of 5.5 million euros from the Recovery and Resilience Plan (PRR), this is a pioneering project. The team responsible for developing AMALIA includes members from the Instituto de Telecomunicações (a research center associated with Técnico), Unbabel (a spin-off company from Técnico), the NOVA University of Lisbon, and the Foundation for Science and Technology.
The Center for Responsible AI, established under the Next Gen project, will provide expert oversight, ensuring that best practices are applied in LLM development, with an emphasis on ethical and responsible AI. The beta version of AMALIA is scheduled for release in 2025, free of charge and in an open-source format.
The Agency for Administrative Modernization (AMA) will oversee the implementation and dissemination of the initiative across public and private entities. Meanwhile, the Foundation for Science and Technology (FCT) will manage the training and development of the model, working closely with research centers. Their tasks include securing the necessary infrastructure for training and hosting the LLM, as well as curating and processing the data required for its development.
AMALIA represents a cornerstone of the National Artificial Intelligence Agenda, set to be unveiled in the first quarter of 2025. This initiative places Portugal at the forefront of LLM development for the Portuguese language, aligning with global advancements in artificial intelligence.
Large language models have developed rapidly since 2018. Early models such as GPT and BERT, both launched that year, were trained on corpora of roughly one to a few billion words. For instance, the original GPT was trained on BookCorpus, which contains 985 million words, while BERT used a combination of BookCorpus and English Wikipedia, totaling 3.3 billion words. Since then, training datasets have grown by orders of magnitude, to hundreds of billions or even trillions of tokens.
André Martins, our researcher and head of the research area at Unbabel, told Jornal de Negócios that this model constitutes "an important step for Europe to ensure its sovereignty in the area of Artificial Intelligence (AI), reducing the innovation gap with the United States and China."
With AMALIA, Portugal is making a significant contribution to the global AI landscape, advancing accessibility and innovation in large-scale models tailored to the Portuguese-speaking community.