Data sharing

How to share your language resources?

When sharing language data, the FAIR principles can serve you as a guide in the process of making your resource available to other researchers in a useful way and thereby contribute to facilitating knowledge discovery.

You have several options to increase the FAIR-ness of your data:

Sharing corpora

Publish and archive the data with LaRS@SWISSUbase

LaRS@SWISSUbase offers an easy-to-use and reliable platform for sharing your data. It has been established as a cross-disciplinary and FAIR-compliant national research data service in 2022. It includes a searchable catalogue with a growing number of studies and research data sets, for which SWISSUbase provides a solution for long-term storing.

Include the corpus on the Linguistic Corpus Platform (LCP)

The Linguistic Corpus Platform (LCP) is being developed at LiRI as a tool to make corpora searchable through a web interface:

The LCP can be accessed by all CLARIN-CH institutions and will offer the option to upload your own corpus for data exploration and analysis. The LCP uses its own query language which allows for powerful, complex queries on text data and time-aligned multimodal data, such as video recordings of sign language and interactional data.

If you want to find out more about how to use the LCP, have a look at the LCP documentation page.

Add the corpus to the SSH Open Marketplace

The SSH Open Marketplace is a European discovery platform for resources from the Social Sciences and Humanities (SSH) field.

In order to register your corpus, you can follow these steps (choose the dataset item category).

Add your corpus on the webpage of the CLARIN Resource Families

The CLARIN Resource Families website provides an overview of the available language resources in the CLARIN infrastructure per data type. The following types of corpora are listed:

Computer-Mediated Communication Corpora
Corpora of Academic Texts
Historical Corpora
L2 Learner Corpora
Legal Corpora
Literary Corpora
Manually Annotated Corpora
Multimodal Corpora
Newspaper Corpora
Oral History Corpora
Parallel Corpora
Parliamentary Corpora
Reference Corpora
Sign Language Resources
Spoken Corpora

You can contact us if you are interested in listing your corpus in one of these categories.

Sharing tools

Add your tool to the CLARIN Switchboard

The CLARIN Language Resource Switchboard is a tool that helps researchers to find a matching language processing web application for their data. After uploading a file or entering a URL, the Switchboard provides a list of available CLARIN tools to perform the task indicated by the researcher (e.g. Named Entity Recognition, lemmatization, POS-tagging).

Information on how to add your tool to the Switchboard Tool Registry is available on the GitHub page. See the CLARIN Switchboard website for a list of the currently available tools.

Add your tool to the SSH Open Marketplace

The SSH Open Marketplace is a European discovery platform for resources from the Social Sciences and Humanities (SSH) field.

In order to register your corpus, you can follow these steps (choose the Tools & services item category).

Add your tool on the webpage of the CLARIN Resource Families

The CLARIN Resource Families website provides an overview of the available language resources in the CLARIN infrastructure per data type. The following types of tools are currently listed:

Corpus Query Tools
Normalisation
Named Entity Recognition
Part-of-Speech Tagging and Lemmatisation
Tools for Sentiment Analysis

You can contact us if you are interested in listing your tool in one of these categories.

Sharing lexical resources

Add your lexical resource to the SSH Open Marketplace

The SSH Open Marketplace is a European discovery platform for resources from the Social Sciences and Humanities (SSH) field.

In order to register your corpus, you can follow these steps (choose the Tools & services item category).

Add your lexical resource on the webpage of the CLARIN Resource Families

The CLARIN Resource Families website provides an overview of the available language resources in the CLARIN infrastructure per data type. The following types of lexical resources are currently listed:

Language Models
Lexica
Dictionaries
Conceptual Resources
Glossaries
Wordlists

You can contact us if you are interested in listing your tool in one of these categories.

What are the recommended standard data formats?

You can consult this CLARIN page on format recommendations to check whether you are using one of the standardized formats. For converting data or file formats, consider the SSH Conversion Hub in order to find a suitable tool.

In addition, CLARIN-CH provides recommendations for data formats based on a community survey carried out in 2024. More information on standard data formats can be found here:

How can CLARIN-CH support me to increase the FAIRness of my resources?

Our engagement begins with personalized outreach. Researchers are contacted directly and invited to consider the above-mentioned options for making their data FAIR-compliant. Once a researcher expresses interest, we schedule a consultation to review their data and determine the best dissemination pathway. We evaluate the state of the dataset, including format, completeness of metadata, and potential need for preprocessing. For example, preparing a dataset for the LCP may require transforming files into a specific format of a set of relational CSV files, a task that can be challenging for researchers unfamiliar with tools such as Python or R. Where needed, CLARIN-CH provides direct assistance in formatting, metadata enrichment, and the technical steps required for depositing.

In some cases, — especially for concluded projects where researchers lack the time or capacity, — we manage the entire FAIR-ification process on their behalf to avoid the loss of a resource or tool.