Welcome to the CLARIN-CH Documentation Platform. Here you will find information relevant to the different steps of your data life cycle, which are usually covered by the Data Management Plan.
Overview of topics:
How to share your language resources?
When sharing language data, the FAIR principles can guide you in making your resource available to other researchers in a useful way, thereby facilitating knowledge discovery.
According to the FAIR principles, you should make your data Findable, Accessible, Interoperable and Reusable. You have several options to increase the FAIRness of your data:
What are the recommended standard data formats?
Using standardized formats ensures that the data can be read and processed with widely used software. This makes your data easier to integrate into existing linguistic analysis tools and workflows, enhancing its accessibility and utility.
Additionally, standardized data formats facilitate collaboration among researchers and institutions by reducing compatibility issues and promoting interoperability. This seamless exchange of linguistic data in a common format fosters a more open and collaborative research environment, accelerating the progress of linguistic studies and advancing our understanding of language in diverse contexts.
➡️ Researchers are encouraged to prioritize the use of standardized formats to maximize the impact of their work and contribute to the advancement of their field.
You can consult this CLARIN page on format recommendations to check whether you are using one of the standardized formats.
How to deal with legal aspects when it comes to linguistic data?
From the perspective of sharing language data in the spirit of Open Science, the legal issues can be divided into two groups: intellectual property (chiefly copyright and related rights) and personal data protection.
To learn about these issues, please check the CLARIN ERIC documentation platform and read the documentation produced by the CLARIN Legal and Ethical Issues Committee:
- Introduction to Copyright and Related Rights
To learn about the application of these issues to concrete cases, such as using Twitter or other social media data as research data, we recommend reading the following articles:
- Kamocki P., Hannesschläger V., Hoorn E., Kelli A., Kupietz M., Lindén K. & Puksas A. (2021) Legal issues related to the use of Twitter data in language research. In M. Monachini & M. Eskevich (eds), CLARIN Annual Conference Proceedings 2021. CLARIN Annual Conference Proceedings, CLARIN ERIC, Utrecht, pp. 150-153, CLARIN Annual Conference, 27/09/2021.
- Siegert, I., Varod, V.S., Carmi, N. and Kamocki, P., 2020. Personal data protection and academia: GDPR issues and multi-modal data-collections. Online Journal of Applied Knowledge Management (OJAKM), 8(1), pp.16-31.
- Kamocki P. & Witt A. (2020). Privacy by design and language resources. In Proceedings of the Twelfth Language Resources and Evaluation Conference (pp. 3423-3427).
To learn more about the CLARIN normative layer and the work of the CLARIN Committee for Legal and Ethical Issues, please read this article:
- Kamocki P., Kelli A. & Lindén K. (2022) The CLARIN Committee for Legal and Ethical Issues and the Normative Layer of the CLARIN Infrastructure. In CLARIN: The Infrastructure for Language Resources, edited by Darja Fišer and Andreas Witt, Berlin, Boston: De Gruyter, 2022, pp. 457-480.
How to deal with sensitive and personal data when it comes to linguistic data?
CLARIN-CH event (September 29, 2023)
When it comes to Open Research Data, the management of sensitive and personal data can be very challenging. In this context, CLARIN-CH organised an event focusing on data collection, protection and preservation and their associated procedures, with respect to different types of linguistic data (e.g., multimodal, historical, experimental and sociolinguistic data, data from social media, data from different age groups).
- Data collection: talk by Dagmar Jung - linguist specialized in the collection of naturalistic data in the field, metadata collection, secure file handling, and workflows useful for the archiving process. Access the slides and the recording*. You will learn why data collection is the key to successful management of sensitive data.
- Data protection: talk by Violaine Michel Lange - data scientist, neurolinguist and NLP expert specialized in experimental data and in developing NLP pipelines for data protection. Access the slides and the recording*. You will find a discussion about the European GDPR vs. the new Federal Act on Data Protection (nFADP), and an example of an NLP pipeline for data anonymisation.
- Data protection: talk by Miecznikowski-Fuenfschilling - professor at the Institute of Italian Studies and the Institute of Argumentation, Linguistics and Semiotics of USI Università della Svizzera italiana, and Nina Profazi - research assistant in the project “Data-sharing skills in corpus-based research on talk-in-interaction”, which is part of the ORD program funded by SwissUniversities. Access the slides and the recording*. You will find a discussion about the process of de-identification of data.
- Data preservation: talk by Thomas Schmidt - computer scientist specialized in the field of methodology and technology for working with audiovisual language data and in computer-assisted lexicography. Access the slides and the recording*. You will find a discussion about data management and preservation, a use-case about working with sensitive data (the FOLK project), and a method for data anonymisation.
*The recordings are password protected; contact us if you are interested in getting access.
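To give a rough idea of what a de-identification step like those discussed in the talks can look like, here is a minimal Python sketch. It is not taken from any of the talks: the function name `deidentify`, the two regular expressions and the placeholder labels are assumptions made for illustration only. A real project would need much broader patterns, named-entity recognition and manual review.

```python
import re

# Illustrative patterns only (assumptions, not a complete solution):
# they catch simple e-mail addresses and phone-like digit sequences.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s/-]{7,}\d")


def deidentify(text, pseudonyms):
    """Replace e-mails, phone numbers and known names with placeholders.

    `pseudonyms` is a hypothetical mapping from a real name to the
    label that should replace it, e.g. {"Anna Muster": "[SPK1]"}.
    """
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    # Replace longer names first so that a full name is not partly
    # overwritten by a shorter name it contains.
    for name in sorted(pseudonyms, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(name)}\b", pseudonyms[name], text)
    return text


sample = "Anna Muster wrote to anna@example.org, call +41 44 123 45 67."
print(deidentify(sample, {"Anna Muster": "[SPK1]"}))
```

Keeping the name-to-label mapping in a separate, access-restricted file is one way to let the original speakers be re-identified by the project team if needed, while the shared transcript only contains the labels.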
DARIAH ELDAH Consent Form wizard
The DARIAH ELDAH Consent Form Wizard is an online tool that enables researchers to quickly generate a GDPR-compliant consent form for collecting personal data for research purposes. It can also be used, for example, for creating mailing lists or organizing academic events. Currently the tool is available in English, German, Italian and Croatian, and there are plans to have it translated into other languages. The tool was created by members of the CLARIN Committee for Legal and Ethical Issues and of the DARIAH ELDAH (Ethics and Legality in Digital Arts and Humanities) Working Group.