CLARIN-CH Day 2025
Overview
This is the second edition of the CLARIN-CH Day. If you want to read more on the first edition, you can find information on its programme, order of events, book of abstracts, as well as the LinkedIn page of CLARIN-CH and more here:
Introduction
The CLARIN-CH Day 2025 continues the dialogue initiated during the 2024 edition. It brings together researchers, data specialists, and legal experts to explore the evolving landscape of Open Research Data in linguistics and related disciplines.
The theme of the event is “From challenges to progress: FAIR compliance and digital methods across the lifecycle of language data”.
The event has two central themes: advancing FAIR compliance (Findable, Accessible, Interoperable, Reusable) for language data and tools, and integrating digital methods and tools to support researchers at every stage of the data lifecycle. The aim is a holistic approach, from data creation and curation to sharing and reuse.
The event will revisit the key challenges identified in 2024 and will provide a platform for researchers to present newly emerging challenges encountered in working with language data, both within and beyond linguistic research.
Through pitches and collaborative discussions, CLARIN-CH Day 2025 aims to deepen the exchange between communities, spotlight innovative tools and solutions, and strengthen the infrastructure and support network for responsible and sustainable data practices within the CLARIN-CH ecosystem.
Detailed programme
Time | Session | Content |
---|---|---|
9:30-10:00 | Registration and Welcome Coffee | Arrival at UNIL; registration and coffee for all participants |
10:00-10:10 | Introduction | Welcome and introduction to the CLARIN-CH Day 2025 |
10:10-11:00 | Keynote Talk | FAIR compliance and digital sovereignty, Krister Lindén, FIN-CLARIN (session chair: Cristina Grisot) |
11:00-12:30 | Pitches Session 1 | Analytical Tools and FAIR compliance (session chair: Anita Auer; 10min + 5min per contribution) 1. Infrastructure for Small Languages: Open, FAIR, and Accessible Corpora with the LiRI Platform, Noah Bubenhofer & Not Battesta Soliva (University of Zurich) 2. GallRom: An Interconnected System for Linguistic & Philological Research, Nikolina Rajovic & Jonathan Schaber (University of Zurich) 3. From Keystrokes to Sentences: Processing Dynamic Writing Process Data, Margo Ulasik & Cerstin Mahlow (Zurich University of Applied Sciences) 4. Archiving and Dissemination of Data Through a Selective Approach. The case of the Equatoguinean Spanish Corpus, José Luis Losada Palenzuela & Sandra Schlumpf-Thurnherr (University of Basel) 5. Corv: a new free secure transcription tool, Hugo Hueber (University of Lausanne) General Discussion (15min) |
12:30-13:30 | Lunch onsite | |
13:30-15:15 | Pitches Session 2 | DMP and RDM practices (session chair: Joanna Blochowiak; 10min + 5min per contribution) 1. The CLARIN-CH FAIRification pipeline: concept and examples, Alexandru Craevschi (CLARIN-CH Technical Officer) 2. The Language Repository of Switzerland LaRS & Linguistic Research Infrastructure LiRI: a distributed CLARIN B-center, Christian Futter & Alexandru Craevschi (University of Zurich) 3. DMP and RDM practices and tools at UNIL, Carmen Jambé (University of Lausanne) 4. DMP documentation resources and data stewardship at UZH, Gorka Fraga Gonzalez (University of Zurich) 5. Data anonymization services at ZHAW, Tatiana Feketeova & Reto Bürgin (Zurich University of Applied Sciences) 6. Slot for presentation of data stewardship services at other HEI members of CLARIN-CH, TBA Discussion (15min) |
15:15-15:45 | Coffee Break | |
15:45-17:00 | Discussion | RDM practices for language data in the CLARIN-CH ecosystem (session chair: Cristina Grisot) 1. CLARIN-CH RDM WG: objectives and proposed actions 2. General discussion: comparative practices within different Swiss HEIs, challenges and synergies 3. Next steps |
Keynote Talk
Abstract
Data sovereignty aims to ensure that data is available in accordance with national legislation and the FAIR principles. In addition, personal data may need special legal and ethical considerations, as represented by the CARE principles. To harmonize and facilitate access to data across Europe, the European Union has enacted a number of directives and regulations aiming to create a single market. However, in many cases the use of data for research purposes was carved out as an exception with some general guidelines and left to the member states, resulting in a plethora of legal practices across member states. So when talking about data sovereignty for EU and international research data used in cross-border projects, we may also need to consider whose data and whose sovereignty we are talking about. One of the goals of CLARIN as a Research Infrastructure is to harmonize access to research data for researchers of Social Sciences and Humanities in the CLARIN member countries.
The keynote will briefly introduce topics such as language resources, FAIR compliance, CARE principles, and research infrastructures, before presenting how these have been implemented in practice in FIN-CLARIN through the CLARIN licensing scheme in accordance with EU legislation such as the GDPR and the EU Data Mining Exception.
Dr. Krister Lindén is Research Director of Language Technology at the University of Helsinki. He is the national coordinator of CLARIN in Finland and the Chair of the CLARIN National Coordinators Forum.
Pitches
- Advancing FAIR compliance for language data and tools concerns the extent to which a dataset matches the FAIR principles. The FAIR Guiding Principles for scientific data management and stewardship hold that data should be findable, accessible, interoperable, and reusable. To ensure that data is findable, the dataset should be accompanied by comprehensive metadata and deposited in a repository that assigns persistent identifiers. To make data more accessible, a data access statement can be written, describing who may use the data and under which conditions. A dataset is interoperable when it is guaranteed to remain compatible with diverse software environments. Choosing a compatible license and providing a detailed description of the data ensure its reusability.
- Integrating digital methods and tools for data management focuses on the methodological opportunities and hurdles, as well as the tools that support a holistic approach at every stage of the data lifecycle in the digital realm. From data creation through transformation and curation to sharing and reuse, digital methods and tools accompany data every step of the way, making them integral to the work of researchers involved with data, whether in its creation or its stewardship.
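As an illustration only, the four checks described in the first point above can be sketched as a small Python checklist. The `DatasetRecord` fields, the `OPEN_FORMATS` set, and the `fair_checklist` function are hypothetical names introduced here and are not part of any CLARIN-CH tool. Treating open file formats as a stand-in for interoperability is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical, simplified description of a deposited dataset."""
    title: str
    metadata: dict = field(default_factory=dict)
    persistent_id: str = ""      # e.g. a handle or DOI assigned by the repository
    access_statement: str = ""   # who may use the data and under which conditions
    file_formats: list = field(default_factory=list)
    license: str = ""
    description: str = ""


# Assumed set of open formats, chosen for illustration only.
OPEN_FORMATS = {"txt", "csv", "xml", "json"}


def fair_checklist(record: DatasetRecord) -> dict:
    """Return a rough yes/no answer for each of the four FAIR aspects."""
    return {
        # Findable: metadata is present and a persistent identifier is assigned.
        "findable": bool(record.metadata) and bool(record.persistent_id.strip()),
        # Accessible: a data access statement has been written.
        "accessible": bool(record.access_statement.strip()),
        # Interoperable (proxy): every file uses one of the assumed open formats.
        "interoperable": bool(record.file_formats)
        and all(fmt.lower() in OPEN_FORMATS for fmt in record.file_formats),
        # Reusable: a license is chosen and the data is described.
        "reusable": bool(record.license.strip()) and bool(record.description.strip()),
    }
```

A record with all fields filled in would pass every check, while an empty record would fail all four. A real assessment would look at the content and quality of the metadata, not only at whether it is present.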
If you would like to submit an abstract and present a pitch at the CLARIN-CH Day 2025, visit the pages below:
DMPonline and the Corv tool
Discussion and conclusion
After the presentations, there will be 30 minutes for questions on the pitches and presentations and for discussing the conclusions drawn by the speakers. At the end of the session, concluding remarks will round off the event.
RDM Working Group kick-off meeting
Early in September 2025, the new CLARIN-CH Working Group Research Data Management (RDM WG) for language data was launched. It aims to support and strengthen RDM practices across the Swiss linguistic research community. By leveraging the expertise of institutional data stewards and research support services, and by advancing the collaborative development of resources, training, and support, the RDM WG will focus on improving awareness, capacity, and consistency in managing language data.
The Working Group will hold its first on-site kick-off meeting at the end of the CLARIN-CH Day 2025. Anyone interested is invited to join the meeting.
The kick-off has been fully integrated into the CLARIN-CH Day. All interested colleagues are invited to join the CLARIN-CH Day.
Organizing committee
- Anita Auer (UNIL)
- Carmen Jambé (UNIL)
- Joanna Blochowiak (CLARIN-CH)
- Cristina Grisot (UZH, CLARIN-CH national coordinator)