CLARIN-CH Day 2025

This is the second edition of the CLARIN-CH Day. If you want to read more on the first edition, you can find information on its programme, order of events, book of abstracts, as well as the LinkedIn page of CLARIN-CH and more here:

Introduction

The CLARIN-CH Day 2025 continued the dialogue initiated during the 2024 edition. It brought together researchers, data specialists, and legal experts to explore the evolving landscape of Open Research Data in linguistics and related disciplines.

The theme of the event was “From challenges to progress: FAIR compliance and digital methods across the lifecycle of language data”.

Advancing FAIR compliance (Findable, Accessible, Interoperable, Reusable) for language data and tools, as well as integrating digital methods and tools to support researchers at every stage of the data lifecycle, were the central themes of the event, aiming for a holistic approach from data creation and curation to sharing and reuse.

The event revisited the key challenges identified in 2024 and provided a platform for researchers to present newly emerging challenges encountered in working with language data, both within and beyond linguistic research.

Through pitches and collaborative discussions, the CLARIN-CH Day 2025 aimed to deepen the exchange between communities, spotlight innovative tools and solutions, and strengthen the infrastructure and support network for responsible and sustainable data practices within the CLARIN-CH ecosystem.

Detailed programme and slides

Location: Anthropole, Chavannes-près-Renens, room: 3185

Time	Session	Content
9:30-10:00	Registration and Welcome Coffee	Arrival at UNIL; registration and coffee for all participants
10:00-10:10	Introduction	Welcome and introduction to the CLARIN-CH Day 2025
10:10-11:00	Keynote Talk	FAIR compliance and digital sovereignty, Krister Lindén, FIN-CLARIN (session chair: Cristina Grisot)
11:00-12:30	Pitches Session 1	Analytical Tools and FAIR compliance (session chair: Anita Auer; 10min + 5min per contribution) 1. Infrastructure for Small Languages: Open, FAIR, and Accessible Corpora with the LiRI Platform, Noah Bubenhofer & Not Battesta Soliva (University of Zurich) 2. GallRom: An Interconnected System for Linguistic & Philological Research, Nikolina Rajovic & Jonathan Schaber (University of Zurich) 3. From Keystrokes to Sentences: Processing Dynamic Writing Process Data, Margo Ulasik & Cerstin Mahlow (Zurich University of Applied Sciences) 4. Archiving and Dissemination of Data Through a Selective Approach. The case of the Equatoguinean Spanish Corpus, José Luis Losada Palenzuela & Sandra Schlumpf-Thurnherr (University of Basel) 5. Corv: a new free secure transcription tool, Hugo Hueber (University of Lausanne) General Discussion (15min)
12:30-13:30	Lunch onsite
13:30-15:15	Pitches Session 2	DMP and RDM practices (session chair: Joanna Blochowiak; 10min + 5min per contribution) 1. The CLARIN-CH FAIRification pipeline: concept and examples, Alexandru Craevschi (CLARIN-CH Technical Officer) 2. The Language Repository of Switzerland LaRS & Linguistic Research Infrastructure LiRI: a distributed CLARIN B-center, Christian Futter & Alexandru Craevschi (University of Zurich) 3. DMP and RDM practices and tools at UNIL, Carmen Jambé (University of Lausanne) 4. DMP documentation resources and data stewardship at UZH, Gorka Fraga Gonzalez (University of Zurich) 5. Data anonymization services at ZHAW, Tatiana Feketeova & Reto Bürgin (Zurich University of Applied Sciences) 6. Data stewardship services at UNIGE, Noémi Duperron (University of Geneva) Discussion (15min)
15:15-15:45	Coffee Break
15:45-17:00	Discussion	RDM practices for language data in the CLARIN-CH ecosystem (session chair: Cristina Grisot) 1. CLARIN-CH RDM WG: objectives and proposed actions 2. General discussion: comparative practices within different Swiss HEIs, challenges and synergies 3. Next steps

Keynote Talk

Dr. Krister Lindén, national coordinator of FIN-CLARIN and research director in the Department of Digital Humanities at the University of Helsinki, gave the keynote talk on data sovereignty and FAIR compliance. His research focuses on language technology applications and on language resources in research infrastructures.

Abstract

Data sovereignty aims to ensure that data is available in accordance with national legislation and the FAIR principles. In addition, personal data may need special legal and ethical considerations represented by the CARE principles. To harmonize and facilitate access to data across Europe, the European Union has enacted a number of directives and regulations aiming to create a single market. However, in many cases the use of data for research purposes was carved out as an exception with some general guidelines and left to the member states, resulting in a plethora of legal practices across member states. So, when talking about data sovereignty for EU and international research data used in cross-border projects, we may also need to consider whose data and whose sovereignty we are talking about. One of the goals of CLARIN as a Research Infrastructure is to harmonize access to research data for researchers of Social Sciences and Humanities in the CLARIN member countries.

The keynote introduced topics such as language resources, FAIR compliance, the CARE principles, and research infrastructures, before presenting how these have been implemented in practice in FIN-CLARIN through the CLARIN licensing scheme, in accordance with EU legislation such as the GDPR and the EU Data Mining Exception.

Dr. Krister Lindén is Research Director of Language Technology at the University of Helsinki. He is the national coordinator of CLARIN in Finland and the Chair of the CLARIN National Coordinators Forum.

Pitches

In line with the keynote speech, participants gave their own talks on FAIR compliance and the integration of digital methods and tools. These contributions took the form of 5-7 minute pitches on one of the two central themes:

1. Advancing FAIR compliance for language data and tools. FAIR compliance describes the extent to which a dataset matches the FAIR principles. According to the FAIR Guiding Principles for scientific data management and stewardship, data should be findable, accessible, interoperable, and reusable. To ensure that data is findable, the dataset should be accompanied by comprehensive metadata and deposited in a repository that assigns persistent identifiers. To make data more accessible, a data access statement can be written that describes who may use the data and under which conditions. A dataset is interoperable when it remains compatible with diverse software environments. Choosing a compatible license and providing a detailed description of the data ensure its reusability.
2. Integrating digital methods and tools for data management. This theme focuses on the methodological opportunities and hurdles, as well as on the tools that support a holistic approach at every stage of the data lifecycle. From data creation through transformation and curation to sharing and reuse, digital methods and tools accompany data every step of the way, which makes them integral to the work of researchers who create or steward data.
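The four FAIR aspects described in the first theme can be read as a simple checklist. The following Python sketch is only an illustration: the `DatasetRecord` fields, the `OPEN_FORMATS` set, and the `fair_checklist` function are hypothetical names chosen for this example and do not correspond to any CLARIN-CH tool or official FAIR assessment.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical, simplified description of a deposited dataset."""
    title: str = ""
    description: str = ""
    persistent_id: str = ""      # e.g. a DOI or handle assigned by the repository
    access_statement: str = ""   # who may use the data, and under which conditions
    file_formats: list[str] = field(default_factory=list)
    license: str = ""


# Illustrative set of open formats (an assumption for this sketch);
# a real project would maintain its own list.
OPEN_FORMATS = {"txt", "csv", "xml", "json"}


def fair_checklist(record: DatasetRecord) -> dict[str, bool]:
    """Return a rough yes/no answer for each of the four FAIR aspects."""
    return {
        # Findable: descriptive metadata plus a persistent identifier.
        "findable": bool(record.title and record.persistent_id),
        # Accessible: a data access statement exists.
        "accessible": bool(record.access_statement),
        # Interoperable: at least one file format, and all of them are open.
        "interoperable": bool(record.file_formats)
        and all(fmt.lower() in OPEN_FORMATS for fmt in record.file_formats),
        # Reusable: a license and a detailed description are provided.
        "reusable": bool(record.license and record.description),
    }


# Made-up example record with no license yet.
example = DatasetRecord(
    title="Example corpus",
    description="Transcribed interviews, lightly annotated.",
    persistent_id="hdl:00000/example",  # placeholder, not a real identifier
    access_statement="Available to academic users on request.",
    file_formats=["xml", "txt"],
)
print(fair_checklist(example))
```

For this made-up record, the checklist reports every aspect as satisfied except "reusable", because no license has been chosen. A real assessment would look at far more detail, such as metadata standards and vocabularies, than these four boolean checks.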

DMPonline and the Corv tool

In this section of the event, two new services developed and managed by research units at UNIL were presented.

To plan the lifecycle of research data, it is vital to create a Data Management Plan (DMP). The Information Resources and Archives Department (UNIRIS) of UNIL offers a service called DMPonline, which helps its community of researchers plan their data management (creation, collection, documentation, description, sharing and preservation). Additionally, the tool elaborates on the specific legal issues related to the (re-)use of data.

Sensitive personal information must be adequately protected. At UNIL, the Corv platform offered by the Scientific Computing and Research Support Unit (DCSR) provides the necessary protection for transcriptions. Through a specific deletion method, the platform ensures that the data is not traceable after download. In the future, Corv may protect further types of language data.

Discussion and conclusion

After the presentations, a 30-minute discussion gave participants the opportunity to ask questions about the pitches and presentations and to discuss the conclusions drawn by the speakers. Concluding remarks by Cristina Grisot rounded off the event.

RDM Working Group kick-off meeting

In early September 2025, the CLARIN-CH Working Group Research Data Management (RDM WG) for language data was launched. It aims to support and strengthen RDM practices across the Swiss linguistic research community. By leveraging the expertise of institutional data stewards and research support services, and by advancing the collaborative development of resources, training, and support, the RDM WG will focus on improving awareness, capacity, and consistency in managing language data.

The Working Group held its first on-site kick-off meeting at the end of the CLARIN-CH Day 2025. Everyone interested was invited to join the meeting.

Organizing committee

Anita Auer (UNIL)
Carmen Jambé (UNIL)
Joanna Blochowiak (CLARIN-CH)
Cristina Grisot (UZH, CLARIN-CH national coordinator)