CLARIN-CH Day 2025

Overview

This is the second edition of the CLARIN-CH Day. For more on the first edition, including its programme, order of events, and book of abstracts, as well as the LinkedIn page of CLARIN-CH, see here:

Introduction

The CLARIN-CH Day 2025 continues the dialogue initiated during the 2024 edition. It brings together researchers, data specialists, and legal experts to explore the evolving landscape of Open Research Data in linguistics and related disciplines.

The theme of the event is “From challenges to progress: FAIR compliance and digital methods across the lifecycle of language data”.

The event centres on two themes. The first is advancing FAIR compliance (Findable, Accessible, Interoperable, Reusable) for language data and tools. The second is integrating digital methods and tools that support researchers at every stage of the data lifecycle. Together they aim for a holistic approach that runs from data creation and curation to sharing and reuse.

The event will revisit the key challenges identified in 2024 and will provide a platform for researchers to present newly emerging challenges encountered in working with language data, both within and beyond linguistic research.

Through pitches and collaborative discussions, CLARIN-CH Day 2025 aims to deepen the exchange between communities, spotlight innovative tools and solutions, and strengthen the infrastructure and support network for responsible and sustainable data practices within the CLARIN-CH ecosystem.

Detailed programme

Time | Session | Content
9:30-10:00 | Registration and Welcome Coffee | Arriving at UNIL; registration and coffee for all participants
10:00-10:10 | Introduction | Welcome and introduction to the CLARIN-CH Day 2025
10:10-11:00 | Keynote Talk | FAIR compliance and digital sovereignty, Krister Lindén, FIN-CLARIN (session chair: Cristina Grisot)
11:00-12:30 | Pitches Session 1 | Analytical Tools and FAIR compliance (session chair: Anita Auer; 10 min + 5 min per contribution)
   1. Infrastructure for Small Languages: Open, FAIR, and Accessible Corpora with the LiRI Platform, Noah Bubenhofer & Not Battesta Soliva (University of Zurich)
   2. GallRom: An Interconnected System for Linguistic & Philological Research, Nikolina Rajovic & Jonathan Schaber (University of Zurich)
   3. From Keystrokes to Sentences: Processing Dynamic Writing Process Data, Margo Ulasik & Cerstin Mahlow (Zurich University of Applied Sciences)
   4. Archiving and Dissemination of Data Through a Selective Approach. The case of the Equatoguinean Spanish Corpus, José Luis Losada Palenzuela & Sandra Schlumpf-Thurnherr (University of Basel)
   5. Corv: a new free secure transcription tool, Hugo Hueber (University of Lausanne)
   General Discussion (15 min)
12:30-13:30 | Lunch onsite |
13:30-15:15 | Pitches Session 2 | DMP and RDM practices (session chair: Joanna Blochowiak; 10 min + 5 min per contribution)
   1. The CLARIN-CH FAIRification pipeline: concept and examples, Alexandru Craevschi (CLARIN-CH Technical Officer)
   2. The Language Repository of Switzerland LaRS & Linguistic Research Infrastructure LiRI: a distributed CLARIN B-center, Christian Futter & Alexandru Craevschi (University of Zurich)
   3. DMP and RDM practices and tools at UNIL, Carmen Jambé (University of Lausanne)
   4. DMP documentation resources and data stewardship at UZH, Gorka Fraga Gonzalez (University of Zurich)
   5. Data anonymization services at ZHAW, Tatiana Feketeova & Reto Bürgin (Zurich University of Applied Sciences)
   6. Slot for presentation of data stewardship services at other HEI members of CLARIN-CH, TBA
   Discussion (15 min)
15:15-15:45 | Coffee Break |
15:45-17:00 | Discussion | RDM practices for language data in the CLARIN-CH ecosystem (session chair: Cristina Grisot)
   1. CLARIN-CH RDM WG: objectives and proposed actions
   2. General discussion: comparative practices within different Swiss HEIs, challenges and synergies
   3. Next steps

Keynote Talk

Dr. Krister Lindén, national coordinator of FIN-CLARIN and research director at the Department of Digital Humanities of the University of Helsinki, will give a talk on data sovereignty and FAIR compliance. His research focuses on language technology applications and on language resources in research infrastructures.

Abstract

Data sovereignty aims to ensure that data is available in accordance with national legislation and the FAIR principles. In addition, personal data may need special legal and ethical considerations, as represented by the CARE principles. To harmonize and facilitate access to data across Europe, the European Union has enacted a number of directives and regulations aiming to create a single market. However, in many cases the use of data for research purposes was carved out as an exception with some general guidelines and left to the member states, resulting in a plethora of different legal practices across member states. So when talking about data sovereignty for EU and international research data used in cross-border projects, we may also need to consider whose data and whose sovereignty we are talking about. One of the goals of CLARIN as a Research Infrastructure is to harmonize access to research data for researchers in the Social Sciences and Humanities in the CLARIN member countries.

The keynote will briefly introduce topics like language resources, FAIR compliance, CARE principles, and research infrastructures, before presenting how this has been implemented in practice in FIN-CLARIN through the CLARIN licencing scheme in accordance with EU legislation such as the GDPR and the EU Data Mining Exception.

Dr. Krister Lindén is Research Director of Language Technology at the University of Helsinki. He is the national coordinator of CLARIN in Finland and the Chair of the CLARIN National Coordinators Forum.

Pitches

In line with the keynote speech, participants will present their own talks on FAIR compliance and the integration of digital methods and tools. These contributions will take the form of 5-7 minute pitches on one of the two central themes:

    1. Advancing FAIR compliance for language data and tools. FAIR compliance describes the extent to which a dataset meets the FAIR principles. According to the FAIR Guiding Principles for scientific data management and stewardship, data should be findable, accessible, interoperable, and reusable. To make data findable, the dataset should be accompanied by comprehensive metadata and deposited in a repository that assigns persistent identifiers. To make data more accessible, a data access statement can be written that describes who may use the data and under which conditions. A dataset is interoperable when it uses formats and vocabularies that allow it to be combined with other data and processed in diverse software environments. Choosing an appropriate license and providing a detailed description of the data ensure its reusability.
    2. Integrating digital methods and tools for data management. This theme focuses on methodological opportunities and hurdles, as well as on the tools that support a holistic approach at every stage of the data lifecycle in the digital realm. From data creation through transformation and curation to sharing and reuse, digital methods and tools accompany data at every step, which makes them integral to the work of researchers involved in data creation or data stewardship.

If you would like to submit an abstract and present a pitch at the CLARIN-CH Day 2025, visit the pages below:

DMPonline and the Corv tool

In this section of the event, two new services developed and managed by research units at UNIL will be presented.

To plan the lifecycle of research data, it is vital to create a Data Management Plan (DMP). The Information Resources and Archives Department (UNIRIS) of UNIL offers a service called DMPonline, which helps its community of researchers plan their data management (creation, collection, documentation, description, sharing, and preservation). The tool also addresses the specific legal issues related to the (re-)use of data.

Sensitive personal information must be adequately protected. At UNIL, the Corv platform offered by the Scientific Computing and Research Support Unit (DCSR) provides the necessary protection measures for transcriptions. Through a dedicated deletion method, the platform ensures that the data cannot be traced after download. In the future, Corv may also protect further types of language data.

Discussion and conclusion

After the presentations, there will be 30 minutes for questions on the pitches and presentations and for discussing the conclusions drawn by the speakers. Concluding remarks at the end of the session will round off the event.

RDM Working Group kick-off meeting

In early September 2025, the new CLARIN-CH Working Group on Research Data Management (RDM WG) for language data was launched. It aims to support and strengthen RDM practices across the Swiss linguistic research community. The RDM WG will focus on improving awareness, capacity, and consistency in managing language data. To do so, it will draw on the expertise of institutional data stewards and research support services and advance the collaborative development of resources, training, and support.

The Working Group will hold its first on-site kick-off meeting at the end of the CLARIN-CH Day 2025. The kick-off is fully integrated into the CLARIN-CH Day, and all interested colleagues are invited to join.

The organising committee looks forward to meeting you at the CLARIN-CH Day 2025. There is no participation fee, and catering will be provided onsite. Participants are asked to cover their own travel expenses.

Organising committee

  • Anita Auer (UNIL)
  • Carmen Jambé (UNIL)
  • Joanna Blochowiak (CLARIN-CH)
  • Cristina Grisot (UZH, CLARIN-CH national coordinator)