How to share your language resources?

When sharing language data, the FAIR principles can serve you as a guide in the process of making your resource available to other researchers in a useful way and thereby contribute to facilitating knowledge discovery.

You have several options to increase the FAIR-ness of your data:

Sharing corpora

LaRS@SWISSUbase offers an easy-to-use and reliable platform for sharing your data. It has been established as a cross-disciplinary and FAIR-compliant national research data service in 2022. It includes a searchable catalogue with a growing number of studies and research data sets, for which SWISSUbase provides a solution for long-term storing.

The Linguistic Corpus Platform (LCP) is being developed at LiRI as a tool to make corpora searchable through a web interface:

The LCP can be accessed by all CLARIN-CH institutions and will offer the option to upload your own corpus for data exploration and analysis. The LCP uses its own query language which allows for powerful, complex queries on text data and time-aligned multimodal data, such as video recordings of sign language and interactional data.

If you want to find out more about how to use the LCP, have a look at the LCP documentation page.

The SSH Open Marketplace is a European discovery platform for resources from the Social Sciences and Humanities (SSH) field.

In order to register your corpus, you can follow these steps (choose the dataset item category).

The CLARIN Resource Families website provides an overview of the available language resources in the CLARIN infrastructure per data type. The following types of corpora are listed:

  • Computer-Mediated Communication Corpora
  • Corpora of Academic Texts
  • Historical Corpora
  • L2 Learner Corpora
  • Legal Corpora
  • Literary Corpora
  • Manually Annotated Corpora
  • Multimodal Corpora
  • Newspaper Corpora
  • Oral History Corpora
  • Parallel Corpora
  • Parliamentary Corpora
  • Reference Corpora
  • Sign Language Resources
  • Spoken Corpora

You can contact us if you are interested in listing your corpus in one of these categories.

Sharing tools

The CLARIN Language Resource Switchboard is a tool that helps researchers to find a matching language processing web application for their data. After uploading a file or entering a URL, the Switchboard provides a list of available CLARIN tools to perform the task indicated by the researcher (e.g. Named Entity Recognition, lemmatization, POS-tagging).

Information on how to add your tool to the Switchboard Tool Registry is available on the GitHub page. See the CLARIN Switchboard website for a list of the currently available tools.

The SSH Open Marketplace is a European discovery platform for resources from the Social Sciences and Humanities (SSH) field.

In order to register your corpus, you can follow these steps (choose the Tools & services item category).

The CLARIN Resource Families website provides an overview of the available language resources in the CLARIN infrastructure per data type. The following types of tools are currently listed:

  • Corpus Query Tools
  • Normalisation
  • Named Entity Recognition
  • Part-of-Speech Tagging and Lemmatisation
  • Tools for Sentiment Analysis

You can contact us if you are interested in listing your tool in one of these categories.

Sharing lexical resources

The SSH Open Marketplace is a European discovery platform for resources from the Social Sciences and Humanities (SSH) field.

In order to register your corpus, you can follow these steps (choose the Tools & services item category).

The CLARIN Resource Families website provides an overview of the available language resources in the CLARIN infrastructure per data type. The following types of lexical resources are currently listed:

  • Language Models
  • Lexica
  • Dictionaries
  • Conceptual Resources
  • Glossaries
  • Wordlists

You can contact us if you are interested in listing your tool in one of these categories.

What are the recommended standard data formats?

You can consult this CLARIN page on format recommendations to check whether you are using one of the standardized formats. For converting data or file formats, consider the SSH Conversion Hub in order to find a suitable tool.

In addition, CLARIN-CH provides recommendations for data formats based on a community survey carried out in 2024. More information on standard data formats can be found here:

How can CLARIN-CH support me to increase the FAIRness of my resources?

Our engagement begins with personalized outreach. Researchers are contacted directly and invited to consider the above-mentioned options for making their data FAIR-compliant. Once a researcher expresses interest, we schedule a consultation to review their data and determine the best dissemination pathway. We evaluate the state of the dataset, including format, completeness of metadata, and potential need for preprocessing. For example, preparing a dataset for the LCP may require transforming files into a specific format of a set of relational CSV files, a task that can be challenging for researchers unfamiliar with tools such as Python or R. Where needed, CLARIN-CH provides direct assistance in formatting, metadata enrichment, and the technical steps required for depositing.

In some cases, — especially for concluded projects where researchers lack the time or capacity, — we manage the entire FAIR-ification process on their behalf to avoid the loss of a resource or tool.