Published in Artificial Intelligence | 31 Oct 2025 - 6 min

In September 2025, Switzerland unveiled Apertus, a multilingual large language model (LLM) developed through a collaboration between ETH Zurich, EPFL, and the Swiss National Supercomputing Centre (CSCS). Unlike most commercial LLMs, Apertus was designed from the ground up as a fully open and transparent project, releasing not only its weights but every component of its creation, from data collection scripts to ethical alignment methods. Its goal is ambitious: to serve as a public, verifiable foundation for research, innovation, and trustworthy AI across Europe and beyond.

Strongest Data Compliance

From a data compliance perspective, the team behind Apertus has taken a particularly rigorous and transparent approach, setting a benchmark for responsible AI development. Their goal was not only to comply with current regulations but also to anticipate upcoming European requirements, such as the EU AI Act, which becomes fully applicable in August 2026.

  • Retroactive respect for opt-outs: While many AI projects respect website policies at the moment of data collection, Apertus goes further by applying these rules retroactively. If a website currently disallows the use of its content for AI training, Apertus removes all past snapshots of that site from its dataset, even if the content was collected at a time when such restrictions were not yet in place. This strict interpretation of consent protects publishers and content creators and sets a new ethical standard for dataset management.

  • Comprehensive filtering for safety and privacy: All training data undergoes a series of automated filtering processes to improve quality and mitigate risk. This includes the detection and removal of Personally Identifiable Information (PII), as well as the exclusion of toxic or harmful content. Notably, the team has also open-sourced the filtering tools themselves, making their approach auditable and reproducible (a rare move in the field, given the technical complexity of such filters).

  • Use of the Goldfish objective to prevent data memorization: Apertus is trained using a variant of the Goldfish objective, which excludes a deterministic subset of tokens from the training loss to limit the memorization and reproduction of training data. In other words, this choice discourages the LLM from learning passages "by heart", reducing the risk that text contained in the training set is reproduced verbatim in the LLM's answers.

  • Alignment with EU AI Act requirements: To prepare for upcoming regulation, the Apertus team has published detailed compliance documentation describing the dataset’s composition, data sources, filtering methods, and consent mechanisms. These documents are specifically designed to meet the reporting obligations of the EU AI Act, giving Apertus a strong position ahead of the law’s enforcement in 2026. This proactive approach demonstrates a commitment not only to technical excellence but also to legal and ethical stewardship.
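The retroactive opt-out policy described above can be made concrete with a small sketch: consult a site's *current* robots.txt, and if it blocks AI crawlers, discard every historical snapshot of that domain, even ones collected before the restriction existed. All names here (the agent tokens, `filter_snapshots`, the snapshot fields) are illustrative assumptions, not the actual Apertus pipeline.

```python
# Hypothetical sketch of retroactive opt-out filtering. The agent token
# list and parsing rules are simplified examples, not Apertus code.
AI_CRAWLER_TOKENS = {"GPTBot", "CCBot", "Google-Extended"}

def currently_opted_out(robots_txt: str) -> bool:
    """Rough check: is any known AI crawler fully disallowed today?"""
    blocked = False
    current_agents = set()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.lower().startswith("user-agent:"):
            current_agents = {line.split(":", 1)[1].strip()}
        elif line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            # A bare "Disallow: /" for an AI agent means a full opt-out.
            if path == "/" and current_agents & AI_CRAWLER_TOKENS:
                blocked = True
    return blocked

def filter_snapshots(snapshots, robots_by_domain):
    """Keep only snapshots whose domain does NOT opt out *today*,
    regardless of when each snapshot was originally crawled."""
    return [
        s for s in snapshots
        if not currently_opted_out(robots_by_domain.get(s["domain"], ""))
    ]
```

The key design point is that the check uses the domain's present-day policy, so a newly added restriction retroactively removes old crawls.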
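The PII-removal step mentioned above can be illustrated with a minimal rule-based scrubber that replaces detected spans with typed placeholders. The patterns below are deliberately simple assumptions for illustration; the filters Apertus actually open-sourced are far more sophisticated.

```python
import re

# Illustrative PII patterns only; real pipelines combine many detectors.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "PHONE": re.compile(r"\+\d{1,3}[\s-]?\d{2}(?:[\s-]?\d{2,4}){2,4}"),
}

def scrub_pii(text: str) -> str:
    """Replace each matched PII span with a typed placeholder token."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Replacing spans with typed placeholders (rather than deleting them) keeps the surrounding text grammatical, which matters when the scrubbed corpus is used for training.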
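The Goldfish objective mentioned above can be sketched as a masked cross-entropy: a deterministic, hash-based rule drops roughly 1/k of the target tokens from the loss, so the model never receives a learning signal on exactly those positions and cannot memorize the sequence verbatim. The mask ratio `k`, the window size, and the hashing scheme below are illustrative assumptions, not the exact Apertus configuration.

```python
import hashlib
import math

def goldfish_mask(token_ids, k=4):
    """Deterministically exclude ~1/k of positions from the loss.

    Each position is hashed together with a short context window, so
    the same text always masks the same tokens (hash-based variant of
    the Goldfish loss; k and the window size are illustrative).
    """
    mask = []
    for i in range(len(token_ids)):
        window = str(token_ids[max(0, i - 3): i + 1]).encode()
        h = int(hashlib.sha256(window).hexdigest(), 16)
        mask.append(h % k != 0)  # False for ~1/k of positions: skipped
    return mask

def goldfish_loss(token_probs, token_ids, k=4):
    """Mean negative log-likelihood over the *unmasked* positions only."""
    mask = goldfish_mask(token_ids, k)
    kept = [-math.log(p) for p, m in zip(token_probs, mask) if m]
    return sum(kept) / max(len(kept), 1)
```

Because the mask is a deterministic function of the text itself, repeated epochs over the same document keep dropping the same tokens, so memorization gaps cannot be filled in on later passes.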

Multilingual and Ethical

Apertus has been trained on text data covering more than 1,800 languages, positioning it among the most linguistically diverse large language models available today. This extensive coverage not only reflects Switzerland’s multilingual identity but also addresses the global need for AI systems that serve speakers of both widely spoken and underrepresented languages.

Beyond its impressive language reach, Apertus introduces a unique ethical framework that guides its behavior and responses. At the heart of this framework lies the Swiss AI Charter, a set of guiding principles inspired by Switzerland’s constitutional values (including neutrality, consensus-building, federalism, multilingualism, cultural diversity, and a strong commitment to privacy).

The Swiss AI Charter consists of 11 core principles that define how the model should interact with users and process information:

  1. Response Quality — Providing clear, accurate, and useful answers.

  2. Knowledge and Reasoning Standards — Relying on verified facts and sound reasoning.

  3. Respectful Communication — Ensuring fairness, courtesy, and accessibility.

  4. Preventing Harm — Protecting safety and rejecting harmful or illegal requests.

  5. Resolving Value Conflicts — Navigating trade-offs transparently while upholding core principles.

  6. Professional Competence Boundaries — Educating without offering regulated or licensed advice.

  7. Collective Decision-Making — Supporting fair and constructive group processes.

  8. Autonomy and Personal Boundaries — Respecting user choice, privacy, and consent.

  9. Long-term Orientation and Sustainability — Considering future risks and societal impacts.

  10. Human Agency — Keeping humans in control of decisions and outcomes.

  11. AI Identity and Limits — Being transparent about the model’s nature and constraints.

Each principle is further detailed through a set of clauses (typically three to nine per principle) that provide concrete guidance for the model’s behavior in practice.

Notably, these principles were not defined solely by researchers or policymakers. Instead, they were validated through a survey in which Swiss residents evaluated each principle on a scale, indicating how strongly they believed Apertus should adhere to it. This participatory approach grounds the model’s ethical alignment in democratic values and public consensus.

Truly Open-Source

When AI companies today describe their large language models as “open-source,” they usually mean open-weights: the final trained parameters are released, but the data, code, and processes that produced them remain closed and undisclosed. Apertus breaks decisively from this trend by embracing full-stack openness.

Every component of Apertus’ creation has been released to the public under permissive open-source licenses and is documented in the technical report:

  • Data collection and cleaning: all scripts used to gather, filter, and preprocess the training corpus are freely available.

  • Model architecture and training code: from the base model definition to optimization strategies, every line of code needed to train Apertus is open.

  • Validation and benchmarking tools: including evaluation datasets and metrics, allowing external researchers to verify performance claims.

  • Ethical alignment framework: the design and implementation of the Swiss AI Charter principles, including the surveys and fine-tuning methods used to embed these values into the model.

This level of transparency is unprecedented. It means that any researcher, policymaker, or developer can not only inspect the final model but also reproduce its entire lifecycle, from the raw data to the final aligned weights. Such radical openness enables true scientific reproducibility, a cornerstone of credible AI research, and provides the foundation for open science in machine learning.

The release of Apertus marks a pivotal moment for AI in Europe. While most discussions around language models focus on benchmark scores and performance metrics, Apertus represents something deeper: a commitment to open science, accountability, and AI sovereignty. By making every layer of its development open and reproducible, it sets a new standard for transparency in machine learning and provides a model for how nations and institutions can collaboratively build AI systems that reflect their cultural values and regulatory principles.

In doing so, Apertus challenges the status quo of opaque, corporate-controlled AI and opens the door to a future where foundational models are treated as public infrastructure: tools to be trusted, inspected, and improved by all.