FAIR data

What does FAIR stand for?

According to the FAIR principles, data should be findable, accessible, interoperable and reusable to maximise its reuse potential. The guiding principles behind FAIR can be summarized as follows.


Metadata and data documentation

To increase the findability and reusability of data, data should be shared with machine- and human-readable metadata. Metadata, also sometimes called data documentation, provides information about the context, the structure, the provenance and the content of a dataset with the aim of increasing its usefulness. In essence, good data documentation should answer questions about the Who, What, When, Where, Why and How of data creation.

Metadata can be maintained through a data archive/repository, where you describe the characteristics of the data according to the information the repository requires. Alternatively, you can create data documentation (a README file) that contains additional information for the reuse of your data. As a rule, both are recommended: the information in the data repository is machine-readable and can thus be used for meta-analyses, while the README file facilitates the further use of the data by humans.

Metadata / data documentation should reflect the standards of the respective research discipline. By using such standards in terminology (i.e. controlled vocabularies), creators of datasets can ensure that the data is interoperable with other datasets that use the same standard terminology. Such standards, including standards for linguistics, can be found on FAIRsharing.org.
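As an illustration of machine-readable data documentation, the following minimal Python sketch stores answers to the Who, What, When, Where, Why and How questions as JSON and reports which answers are still missing. The field names, the helper functions and the example values are assumptions made for this sketch; they do not follow any particular metadata standard, and a real deposit should use the schema required by the chosen repository.

```python
import json

# Hypothetical field names, chosen for illustration only. They mirror the
# Who/What/When/Where/Why/How questions and are not a metadata standard.
REQUIRED_FIELDS = ("who", "what", "when", "where", "why", "how")


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in the record."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


def to_json(record: dict) -> str:
    """Serialize the record as JSON with a stable key order."""
    return json.dumps(record, indent=2, sort_keys=True, ensure_ascii=False)


# Placeholder values, not a real dataset.
record = {
    "who": "Example Author",
    "what": "Transcribed interview corpus (placeholder description)",
    "when": "2023-05",
    "where": "Placeholder location",
    "why": "Collected for a study of spoken language",
    "how": "",  # left empty on purpose to show the check
}

print(missing_fields(record))
print(to_json(record))
```

The same information would normally also appear in prose in the README file, so that both machines and humans can make use of it.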

For more information on this topic, see the following page: Metadata standards


Persistent identifiers

A persistent identifier or PID is a permanent and unique reference link to a digital object, regardless of changes in the (online) location of that object. The services that provide such reference links and commit to keeping them permanently alive are called “resolver services”. More often than not, persistent identifiers are expressed as URLs that point to other URLs.

Persistent identifiers can be used not only for records and publications but also to uniquely reference individuals (e.g., with an ORCID iD: https://orcid.org). Having a persistent identifier allows datasets to be cited easily (the PID will always point to the data, even if its location changes) and thus increases their findability.

Example:
Hanigan, Ivan (2012): Monthly drought data for Australia 1890-2008 using the Hutchinson Drought Index. The Australian National University Australian Data Archive. http://doi.org/10.4225/13/50BBFD7E6727A
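The DOI in the example above is one kind of persistent identifier, and https://doi.org/ is its resolver service. The following minimal Python sketch shows how different spellings of the same DOI can be normalized to a single resolver URL. The function names and the list of accepted prefixes are assumptions made for this sketch, and the validity check is deliberately rough; the code does not contact the resolver.

```python
DOI_RESOLVER = "https://doi.org/"

# Common ways a DOI is written in citations (an illustrative, incomplete list).
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def bare_doi(identifier: str) -> str:
    """Strip a known prefix and return the DOI itself, e.g. '10.1234/abc'."""
    text = identifier.strip()
    lowered = text.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix):
            text = text[len(prefix):]
            break
    # Rough plausibility check only: DOIs start with '10.' and contain a '/'.
    if not text.startswith("10.") or "/" not in text:
        raise ValueError(f"not a DOI: {identifier!r}")
    return text


def resolver_url(identifier: str) -> str:
    """Return the resolver URL for a DOI given in any of the known spellings."""
    return DOI_RESOLVER + bare_doi(identifier)


print(resolver_url("http://doi.org/10.4225/13/50BBFD7E6727A"))
print(resolver_url("doi:10.4225/13/50BBFD7E6727A"))
```

Both calls produce the same URL, which is the point of a persistent identifier: the citation stays stable while the resolver takes care of the current location of the object.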


Open formats

To increase the interoperability of data, data should be usable regardless of software and operating system. To achieve this, you should share your data in an open, non-proprietary format. EPFL provides a current overview of appropriate, acceptable and closed formats by data type.

Note that open formats are important once you share your data. It is of course still possible to work with proprietary software while processing your data.
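As one possible illustration, tabular data that was processed in any tool can be exported to CSV, a plain-text format that does not depend on a particular program. The sketch below uses only the Python standard library; the function name, the column names and the example rows are placeholders invented for this sketch.

```python
import csv
import io


def rows_to_csv(rows: list, fieldnames: list) -> str:
    """Serialize a list of dictionaries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder data, not taken from a real dataset.
rows = [
    {"speaker": "A", "utterance": "hello"},
    {"speaker": "B", "utterance": "yes, indeed"},
]

csv_text = rows_to_csv(rows, ["speaker", "utterance"])
print(csv_text)
```

When saving the text to a file, an explicit encoding such as UTF-8 should be chosen and mentioned in the data documentation, so that others can open the file reliably.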

If you want to learn more about data formats, go to the following page: Standard data formats


Access control and licensing

To increase the reusability of data, data should be shared with a clear license that states the conditions under which it can be reused. Licenses are legal instruments that allow the owners of datasets to control how their data is reused. For instance, if someone shares their data under a CC BY license, other users know that they can download, use, adapt and remix the data for any purpose, and share the resulting dataset under a license of their choice. They only need to attribute (i.e. cite) the original author and indicate any changes made. When data is shared with a license, others no longer have to contact the original author or copyright holder to ask for permission to use the data; they can simply follow the license conditions.

For more information on this topic, see the following page: Copyright

In addition to licenses, data controllers can also specify under which conditions data can be accessed. That means that if there are legal, ethical or copyright-related constraints that might prevent you from sharing data openly, you can still make it FAIR by regulating access.

Generally, three levels of access are differentiated:

  • Open access: Anyone with access to the internet can access and download the data.
  • Registered access: Only users registered with the repository where the data is deposited can access and download the data.
  • Restricted access: Users who want to access the data need to ask for access via the repository. Usually, they have to provide reasons for wanting access, and the data controller (or data depositor) can then grant them access. Restricted access allows you to exert control over who has access to the data and under which conditions.

Additionally, data can also be made available under an embargo. That is, the data is already uploaded to the repository but will only be made publicly available after a certain amount of time has passed.
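The access levels and the embargo described above can be summarized as a small decision rule. The following Python sketch is only a simplified model for illustration: the names `AccessLevel` and `can_download` and all parameters are assumptions made for this sketch, and actual repositories implement access control in their own ways.

```python
from datetime import date
from enum import Enum
from typing import Optional


class AccessLevel(Enum):
    OPEN = "open"
    REGISTERED = "registered"
    RESTRICTED = "restricted"


def can_download(
    level: AccessLevel,
    today: date,
    embargo_until: Optional[date] = None,
    is_registered: bool = False,
    is_approved: bool = False,
) -> bool:
    """Decide whether a user may download the data (simplified model)."""
    # Assumption: during an embargo nobody can download, whatever the level.
    # The embargo ends on the day given by embargo_until.
    if embargo_until is not None and today < embargo_until:
        return False
    if level is AccessLevel.OPEN:
        return True
    if level is AccessLevel.REGISTERED:
        return is_registered
    # Restricted access: the data controller must have granted the request.
    return is_approved


today = date(2024, 1, 12)
print(can_download(AccessLevel.OPEN, today))
print(can_download(AccessLevel.OPEN, today, embargo_until=date(2025, 1, 1)))
print(can_download(AccessLevel.RESTRICTED, today, is_approved=True))
```

In this model the metadata is not covered at all, which matches the idea that metadata can stay accessible even when the data itself is not.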

If data cannot be shared for ethical, legal, copyright-related or other reasons, data can still be made FAIR by sharing metadata or data documentation only.


How to make your data FAIR?

One easy step to make your data FAIR is to upload it to a FAIR-compliant repository. FAIR-compliant repositories assign persistent identifiers, ask for rich and machine-readable metadata, require you to choose a license, ensure that the metadata is always accessible even if the data is not, and ensure long-term preservation.

The Swiss National Science Foundation provides a checklist to verify the FAIRness of repositories as well as a list of recommended repositories to choose from.

Other steps include:

  • Ensure that your data upload to the repository includes good data documentation that allows others to understand your data without having to read your paper
  • Use controlled vocabularies
  • Use an open format for data sharing
  • Choose an open license to allow the most reuse of your data

How does CLARIN enable FAIR data?

  • Through linked data (e.g. Virtual Language Observatory, Content Search or Virtual Collection Registry)
  • Via CLARIN Guidelines on Standards and Formats
  • Via Recommendations on licenses
  • Via rich vocabularies
  • Via metadata requirements
  • And more

More information can be found here: https://www.clarin.eu/fair


The CARE principles

The CARE principles complement the FAIR principles by providing guidelines on how to ensure ethically sound use of data from Indigenous communities. Formulated by the Global Indigenous Data Alliance, the CARE principles are as follows:

  • Collective Benefit: Data ecosystems must be devised and operate in a manner that facilitates Indigenous Peoples in gaining advantages from the data.
  • Authority to Control: Recognition of Indigenous Peoples' rights and stakes in Indigenous data is imperative, and their authority to govern such data should be strengthened.
  • Responsibility: Those engaged with Indigenous data bear a responsibility to disclose how these data are utilized to uphold Indigenous Peoples' self-determination and mutual advantages.
  • Ethics: The primary focus at all phases of the data life cycle and throughout the data ecosystem should be on the rights and well-being of Indigenous Peoples.

Resources

documentation-platform/fair-data.txt · Last modified: 2024/01/12 14:20 (external edit)