Documentation Platform


Welcome to the CLARIN-CH Documentation Platform. Here you will find useful information for the different steps of your data life cycle, which are usually covered by the Data Management Plan.


How to share your language resources?

When sharing language data, the FAIR principles can guide you in making your resource available to other researchers in a useful way, which in turn facilitates knowledge discovery.

According to the FAIR principles, you should make your data Findable, Accessible, Interoperable and Reusable. You have several options to increase the FAIRness of your data:

For corpora

1️⃣ Publish and archive the data with SWISSUbase

2️⃣ Include the corpus in the Linguistic Corpus Platform (LCP)

3️⃣ Add a description and metadata about your corpus on the SSH Open Marketplace

4️⃣ Add a description and metadata about your corpus on the webpage of the CLARIN Resource Families

For tools

1️⃣ Add your tool to the CLARIN Switchboard

2️⃣ Add a description and metadata about your tool on the SSH Open Marketplace

3️⃣ Add a description and metadata about your tool on the webpage of the CLARIN Resource Families

For lexical resources

1️⃣ Add a description and metadata about your lexical resource on the SSH Open Marketplace

2️⃣ Add a description and metadata about your lexical resource on the webpage of the CLARIN Resource Families


What are the recommended standard data formats?

Using standardized formats ensures that your data can be read and processed with widely used software. This makes your data easier to integrate into existing linguistic analysis tools and workflows, which enhances its accessibility and utility.

Additionally, standardized data formats facilitate collaboration among researchers and institutions by reducing compatibility issues and promoting interoperability. This seamless exchange of linguistic data in a common format fosters a more open and collaborative research environment, accelerating the progress of linguistic studies and advancing our understanding of language in diverse contexts.

➡️ Researchers are encouraged to prioritize the use of standardized formats to maximize the impact of their work and contribute to the advancement of their field.

You can consult this CLARIN page on format recommendations to check whether you are using one of the standardized formats.


How to deal with legal aspects when it comes to linguistic data?

From the perspective of sharing language data in the spirit of Open Science, the legal issues can be divided into two groups: intellectual property (chiefly copyright and related rights) and personal data protection.

To learn about these issues, please check the CLARIN ERIC documentation platform and read the documentation produced by the CLARIN Legal and Ethical issues Committee:

To learn how these issues apply to concrete cases, such as using Twitter or other social media data as research data, we recommend reading the following articles:

To learn more about the CLARIN normative layer and the work of the CLARIN Committee for Legal and Ethical Issues, please read this article:


How to deal with sensitive and personal data when it comes to linguistic data?

CLARIN-CH event (September 29, 2023)
When it comes to Open Research Data, the management of sensitive and personal data can be very challenging. In this context, CLARIN-CH organised an event focusing on data collection, protection and preservation and their associated procedures, with respect to different types of linguistic data (e.g., multimodal, historical, experimental and sociolinguistic data, data from social media, data from different age groups).

  • Data collection: talk by Dagmar Jung - linguist specialized in the collection of naturalistic data in the field, metadata collection, secure file handling, and workflows useful for the archiving process. Access the slides and the recording*. You will learn that data collection is the key to successful management of sensitive data.
  • Data protection: talk by Violaine Michel Lange - data scientist, neurolinguist and NLP expert specialized in experimental data and in developing NLP pipelines for data protection. Access the slides and the recording*. You will find a discussion of the European GDPR vs. the new Federal Act on Data Protection (nFADP), and an example of an NLP pipeline for data anonymisation.
  • Data protection: talk by Miecznikowski-Fuenfschilling - professor at the Institute of Italian Studies and the Institute of Argumentation, Linguistics and Semiotics of USI Università della Svizzera italiana, and Nina Profazi - research assistant in the project “Data-sharing skills in corpus-based research on talk-in-interaction”, which is part of the ORD program funded by SwissUniversities. Access the slides and the recording*. You will find a discussion of the process of de-identification of data.
  • Data preservation: talk by Thomas Schmidt - computer scientist specialized in methodology and technology for working with audiovisual language data and in computer-assisted lexicography. Access the slides and the recording*. You will find a discussion of data management and preservation, a use case about working with sensitive data (the FOLK project), and a method for data anonymisation.

*The recordings are password protected, contact us if you are interested in getting access.
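
To give a rough idea of what rule-based de-identification of a transcript can look like, here is a minimal Python sketch. It is not taken from any of the talks above: the patterns, the placeholder labels and the function name `deidentify` are assumptions made purely for illustration. A real anonymisation pipeline would need much broader coverage (names, places, dates, indirect identifiers) and manual review.

```python
import re

# Hypothetical patterns for illustration only; they cover just two
# kinds of direct identifiers and are not a complete solution.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d ]{7,}\d"),
}


def deidentify(text: str) -> str:
    """Replace every match of each pattern with a bracketed placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    sample = "Write to anna@example.org or call +41 44 123 45 67."
    print(deidentify(sample))
```

Placeholders such as `[EMAIL]` keep the structure of the utterance readable while removing the identifying value; whether this is sufficient depends on the data type and on the applicable legal framework.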

DARIAH ELDAH Consent Form wizard
The DARIAH ELDAH Consent Form Wizard is an online tool that enables researchers to quickly generate a GDPR-compliant consent form for collecting personal data for research purposes. It can also be used, for example, for creating mailing lists or organizing academic events. Currently the tool is available in English, German, Italian and Croatian, and there are plans to translate it into other languages. The tool was created by members of the CLARIN Committee for Legal and Ethical Issues and of the DARIAH ELDAH (Ethics and Legality in Digital Arts and Humanities) Working Group.


documentation-platform/start.txt · Last modified: 2023/11/27 08:18 by Cristina Grisot
