Data reuse

Not every research question requires new data. Reusing existing data is an effective way to build on what the research community has already produced, so the R in FAIR (Reusable) is worth keeping in mind when planning your project. Reuse lets you skip the time-consuming step of data collection and move straight to data analysis, which saves resources and acknowledges other researchers’ work.

Image: SangyaPundir via Wikimedia Commons, CC BY-SA 4.0 (altered: coloring)

Discovering data

To find language data that may be suitable for your research, CLARIN-CH recommends the following platforms:

  • SSH Open Marketplace

    The SSH Open Marketplace is a European discovery platform for resources from the Social Sciences and Humanities (SSH). It offers a wide range of language resources that can be browsed by category and keyword (e.g. L2 learner corpora, spoken corpora). It also includes the CLARIN Resource Families (see below).
  • CLARIN VLO

    The Virtual Language Observatory (VLO) is a search tool that provides over a million records of language resources hosted at CLARIN institutions. It harvests metadata from CLARIN centers and makes it accessible in a simple online tool for searching and filtering language resources.
  • LaRS @ SWISSUbase

    SWISSUbase has recently been established as a national repository and data sharing platform, which is operated in partnership between FORS and the Universities of Zurich and Lausanne. It hosts the Language Repository of Switzerland (LaRS), where researchers from CLARIN-CH institutions publish and archive their language data.

These platforms all provide a metadata description of the datasets and language resources including information on accessibility and licensing. If you prefer searching by category, the CLARIN Resource Families (categorized into corpora / language tools / lexical resources) may be useful. Have a look at the resources provided by CLARIN-CH institutions to find examples from the Swiss research ecosystem (grouped by institution).

LiRI, the technical center of CLARIN-CH, also provides resources that enable data reuse. One good example is the Swissdox database: Swissdox@LiRI is an easy-to-use interface that allows researchers and students to download media data from all major and a number of minor media outlets in Switzerland. It is updated daily with 5,000 to 6,000 articles and can be used by all members of the CLARIN-CH institutions.

To learn more about the process of data discovery and how to decide which data to use, have a look at Chapter 7 of the CESSDA Data Management Expert Guide. Further information on open data and data reuse can also be found on forschungsdaten.info.

Know your data

When reusing data, it is important to know precisely what the data describes (and what it does not). Documentation is therefore crucial: the better the documentation, the better you will understand the data. Metadata should be comprehensive and give you context on how the data was created (see Metadata standards). In addition, to make sure that you are allowed to use the data, check its terms and conditions (see Copyright).

The next step is familiarizing yourself with the data by looking at the individual data points and carrying out first analyses that could be relevant for your research project, e.g. descriptive statistics of a (sub)corpus you are using. It takes time to get to know the data, but the longer you work with the data, the more you will learn about it. It can be useful to keep a journal / logbook to write down insights you might have during the familiarization process.
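A first pass at such descriptive statistics could look like the following Python sketch. It is only an illustration: the function name `describe_corpus`, the sample texts, and the naive regex tokenization are assumptions made for this example, and real corpora usually call for a proper tokenizer suited to the language and data type.

```python
import re
from collections import Counter


def describe_corpus(texts):
    """Return basic descriptive statistics for a list of document strings."""
    # Naive tokenization (assumption): lowercase runs of word characters.
    tokens_per_doc = [re.findall(r"\w+", text.lower()) for text in texts]
    all_tokens = [tok for doc in tokens_per_doc for tok in doc]
    counts = Counter(all_tokens)
    n_tokens = len(all_tokens)
    return {
        "documents": len(texts),
        "tokens": n_tokens,
        "types": len(counts),  # number of distinct word forms
        "type_token_ratio": len(counts) / n_tokens if n_tokens else 0.0,
        "mean_tokens_per_doc": n_tokens / len(texts) if texts else 0.0,
        "most_common": counts.most_common(3),
    }


# Hypothetical two-document subcorpus, for illustration only.
sample = ["The cat sat on the mat.", "The dog barked."]
stats = describe_corpus(sample)
for name, value in stats.items():
    print(name, value)
```

Numbers like these are a good candidate for the logbook mentioned above, since they make it easy to notice later if a subcorpus has changed or was filtered differently.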

Citing data

When you reuse data, attributing the data to its authors is critical. FORS, the Swiss Centre of Expertise in the Social Sciences, has published a guide on Data Citation. Its most important recommendations for researchers reusing data are:

  • Cite all sources used in your publication or assignment (primary and secondary data), both in-text and in the reference list.
  • Cite the data directly, that is, where the data has been published and is accessible (for published data).
  • While citing the data, make sure that all the core components are included in the citation: data authors, publication year, title, version, data publisher, and persistent identifier. These are the minimal elements to be included.
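The core components listed above can be checked mechanically before a citation is written out. The Python sketch below is a minimal illustration, not the format prescribed by the FORS guide: the function name `format_data_citation`, the field names, the output layout, and the example record (including its DOI) are all hypothetical. Follow the citation style required by your publisher or institution.

```python
# Core components named in the recommendations above; the key names are assumptions.
CORE_COMPONENTS = ("authors", "year", "title", "version", "publisher", "pid")


def format_data_citation(record):
    """Build a citation string, refusing records that lack a core component."""
    missing = [key for key in CORE_COMPONENTS if not record.get(key)]
    if missing:
        raise ValueError("missing core components: " + ", ".join(missing))
    authors = " & ".join(record["authors"])
    # Illustrative layout only; adapt it to the citation style you have to use.
    return "{} ({}). {} (Version {}) [Data set]. {}. {}".format(
        authors,
        record["year"],
        record["title"],
        record["version"],
        record["publisher"],
        record["pid"],
    )


# Hypothetical dataset record; none of these values refer to a real resource.
example = {
    "authors": ["Muster, A.", "Beispiel, B."],
    "year": 2024,
    "title": "Example Spoken Corpus",
    "version": "1.0",
    "publisher": "Example Repository",
    "pid": "https://doi.org/10.0000/example",
}
citation = format_data_citation(example)
print(citation)
```

Raising an error on a missing component, rather than silently omitting it, makes incomplete metadata visible at the point where the reference list is assembled.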

Bornatici, C. & Fedrigo, N. (2023). Data Citation: How and Why Citing (Your Own) Data. FORS Guide No. 19, Version 1.0. Lausanne: Swiss Centre of Expertise in the Social Sciences FORS. doi:10.24449/FG-2023-00019

documentation-platform/data-reuse.txt · Last modified: 2024/01/12 14:20 (external edit)