CLARIN-CH Day 2024: Open Research Data – Challenges and Opportunities

September 9th, University of Neuchâtel

The CLARIN-CH Day 2024 has already taken place. A summary and some impressions from the event can be found here:

📢 First CLARIN-CH Day in the books

🌐 CLARIN-CH on LinkedIn

The 2024 event was the first in a series of annual meetings of the CLARIN-CH community. It was organized by the CLARIN-CH consortium in cooperation with its member institutions and aimed to support the scientific community in meeting the challenges of Open Research Data. It sought to foster exchange and to bring researchers together with data management experts and legal experts.

The 2024 edition aimed to bring together experts and researchers to discuss challenges and opportunities, and to open the dialogue on standards and practices of open research data as well as the legal and ethical aspects of processing and sharing linguistic data. The event built on the work done by two CLARIN-CH Working Groups, which address essential topics related to Open Research Data.

The programme and the book of abstracts can be downloaded here:

📄 CLARIN-CH Day 2024 Programme

📄 CLARIN-CH Day 2024 Book of Abstracts

CLARIN-CH Day Flyer

The CLARIN-CH Day 2024 was divided into four parts:

1. ORD Project Presentations

The four recent ORD projects within the CLARIN-CH ecosystem were presented first:

🔎 UpLORD (📄 Go to slides)

🔎 FAIR-FI-LD (📄 Go to slides)

🔎 CHORD-talk-in-interaction (📄 Go to slides)

🔎 Swiss-AL (📄 Go to slides)

2. Data Pitches

Researchers then presented the challenges and opportunities they had encountered in handling their research data in line with Open Science principles, and received feedback from peers and experts.

📁 Data pitches: Presentation slides

Topics

Copyright describes the rights that creators have over their literary and artistic works, including their data. Researchers can encounter several challenges related to copyright when handling their data. These challenges can impact data sharing and reuse. Some key issues include questions around what can be shared, how to attribute sources, and whether specific data can be used freely or requires permission. Copyright considerations also come into play when deciding how to license the data for sharing.
When it comes to data protection and the management of personal and sensitive data, several critical issues arise. Researchers need to find the right balance between sharing data for research purposes and safeguarding individuals’ privacy. De-identification techniques can help here, but they come with their own risks: it is nearly impossible to render a dataset completely anonymous without also jeopardizing data utility; other datasets and additional information can potentially lead to the re-identification of individuals; and some types of linguistic data are less suited to these techniques. Security measures need to be taken to safeguard personal and sensitive data, which poses additional problems, e.g. in collaborative projects. Also, linguists collecting data in other countries or from specific target groups need to navigate a complex legal landscape to ensure that the management and sharing of linguistic data comply with all relevant laws and regulations.
With regard to data formats (e.g. audio, video, text) and their technical aspects, linguists can also encounter various challenges. These include integrating diverse data formats for comprehensive linguistic analyses; harmonizing multimodal data; annotating linguistic data consistently across formats; maintaining uniformity in transcription conventions, part-of-speech tagging and semantic labeling; storing large-scale data (especially video) efficiently with appropriate retrieval solutions; optimizing storage formats, indexing and query performance; and applying standardized formats across various systems.
To store and share their data, linguists further face the issue of selecting appropriate repositories that align with their data type (and the intended audience), of storing the data in appropriate (standardized) formats, and of working collaboratively and effectively with other team members. Additionally, they face the challenges of safeguarding sensitive and personal data while also making it accessible, of taking into account the expectations and value systems of participants, and of meeting the researchers’ responsibility towards the population from which data has been gathered (according to the CARE principles).

3. Keynote Speech

Suzanna Marazza, jurist and legal consultant at the CCdigitallaw center (USI), held a keynote speech presenting a case study from different theoretical perspectives regarding copyright and data protection:

From closed to open: how to deal with copyright law in linguistic research

The need to reuse texts, videos, interviews, and other works in the field of linguistic research is manifold, and all face the same obstacle: copyright law. My contribution aims to highlight the conflicts between the interests at stake in the cases discussed throughout the morning and to explore the possibilities offered by the Swiss legal system for addressing them in the safest and most satisfactory way possible.

📄 Slides Keynote Speech

4. World Café

Participants took part in discussions with the invited experts:

Brian Kleiner (FORS) - data protection
Simone Mäder (UNIBAS) - data protection
Suzanna Marazza (CCdigitallaw, USI) - copyright and other legal aspects
Gerold Schneider (LiRI, UZH) - data formats and technical issues
Teodora Vukovic (LiRI, UZH) - data formats and technical issues
Christian Futter (UB UZH, Open Science Services and the Language Repository of Switzerland) - data storing and sharing
Stefanie Strebel (UB UZH, Open Science Services and the Language Repository of Switzerland) - data storing and sharing

The world café format provided time for in-depth discussions, allowing participants to explore solutions to the challenges presented during the data pitches. This interactive setting was designed to foster meaningful exchanges and collaborative brainstorming.

Organizing committee:

Anita Auer (UNIL)
Cristina Grisot (UZH, CLARIN-CH national coordinator)
Martin Hilpert (UNINE)
Julia Krasselt (ZHAW)
Martin Luginbühl (UNIBAS)
Johanna Miecznikowski-Fünfschilling (USI)
Seraina Nadig (CLARIN-CH)
Melanie Röthlisberger (UZH)
Simon van Rekum (ZHAW)

The event was organized with the financial support of the CLARIN-CH Consortium, the Swiss Academy of Humanities and Social Sciences, and the Zurich University of Applied Sciences, and was hosted by the University of Neuchâtel.