How to share your language resources?

When sharing language data, you can use the FAIR principles (Findable, Accessible, Interoperable, Reusable) as a guide to making your resource available to other researchers in a useful way, which in turn facilitates knowledge discovery.

You have several options to increase the FAIRness of your data:

Sharing corpora

LaRS@SWISSUbase offers an easy-to-use and reliable platform for sharing your data. It was established in 2022 as a cross-disciplinary, FAIR-compliant national research data service. It includes a searchable catalogue with a growing number of studies and research data sets, for which SWISSUbase provides long-term storage.

The Linguistic Corpus Platform (LCP) is being developed at LiRI as a tool to make corpora searchable through a web interface.

The LCP can be accessed by all CLARIN-CH institutions and will offer the option to upload your own corpus for data exploration and analysis. The LCP uses its own query language, which allows for powerful, complex queries on text data and on time-aligned multimodal data, such as video recordings of sign language and interactional data.
If you want to find out more about how to use the LCP, have a look at the LCP documentation page.

The SSH Open Marketplace is a European discovery platform for resources from the Social Sciences and Humanities (SSH) field.

To register your corpus, you can follow these steps (choose the Dataset item category).

The CLARIN Resource Families website provides an overview of the available language resources in the CLARIN infrastructure per data type. The following types of corpora are listed:

  • Computer-Mediated Communication Corpora
  • Corpora of Academic Texts
  • Historical Corpora
  • L2 Learner Corpora
  • Legal Corpora
  • Literary Corpora
  • Manually Annotated Corpora
  • Multimodal Corpora
  • Newspaper Corpora
  • Oral History Corpora
  • Parallel Corpora
  • Parliamentary Corpora
  • Reference Corpora
  • Sign Language Resources
  • Spoken Corpora

You can contact us if you are interested in listing your corpus in one of these categories.

Sharing tools

The CLARIN Language Resource Switchboard is a tool that helps researchers to find a matching language processing web application for their data. After uploading a file or entering a URL, the Switchboard provides a list of available CLARIN tools to perform the task indicated by the researcher (e.g. Named Entity Recognition, lemmatization, POS-tagging).

Discover the CLARIN Switchboard

Information on how to add your tool to the Switchboard Tool Registry is available on the GitHub page. See the CLARIN Switchboard website for a list of the currently available tools.

The SSH Open Marketplace is a European discovery platform for resources from the Social Sciences and Humanities (SSH) field.

Discover the SSH Open Marketplace

To register your tool, you can follow these steps (choose the Tools & services item category).

The CLARIN Resource Families website provides an overview of the available language resources in the CLARIN infrastructure per data type. The following types of tools are currently listed:

  • Corpus Query Tools
  • Normalisation
  • Named Entity Recognition
  • Part-of-Speech Tagging and Lemmatisation
  • Tools for Sentiment Analysis

Discover the CLARIN Resource Families

You can contact us if you are interested in listing your tool in one of these categories.

Sharing lexical resources

The SSH Open Marketplace is a European discovery platform for resources from the Social Sciences and Humanities (SSH) field.

To register your lexical resource, you can follow these steps (choose the Dataset item category).

The CLARIN Resource Families website provides an overview of the available language resources in the CLARIN infrastructure per data type. The following types of lexical resources are currently listed:

  • Language Models
  • Lexica
  • Dictionaries
  • Conceptual Resources
  • Glossaries
  • Wordlists

Discover the CLARIN Resource Families

You can contact us if you are interested in listing your lexical resource in one of these categories.

What are the recommended standard data formats?

You can consult this CLARIN page on format recommendations to check whether you are using one of the standardized formats. If you need to convert data or file formats, the SSH Conversion Hub can help you find a suitable tool.

In addition, CLARIN-CH provides recommendations for data formats based on a community survey carried out in 2024. More information on standard data formats can be found here:

I want to share my data. How can I find a suitable repository?

While there are innumerable options for sharing research data, it makes sense to follow recommendations for repositories that ensure the FAIRness of your data and support open research data practices, such as this list given by the Swiss National Science Foundation (SNSF).

CLARIN-CH recommends the Language Repository of Switzerland (LaRS@SWISSUbase) and the Linguistic Corpus Platform (LCP), which are specifically tailored to linguistic data and free for members of CLARIN-CH institutions. Read more about suitable repositories for archiving language data in Switzerland: