According to the FAIR principles, data should be findable, accessible, interoperable and reusable to maximise its reuse potential. The guiding principles behind FAIR can be summarised as follows.
Source: Open Science Training Handbook
To increase the findability and reusability of data, data should be shared with machine- and human-readable metadata. Metadata, also sometimes called data documentation, provides information about the context, the structure, the provenance and the content of a dataset with the aim to increase its usefulness. In essence, good data documentation should answer questions about the Who, What, When, Where, Why and How of data creation.
Metadata can be maintained in two ways. The first is through a data archive or repository, where you describe the characteristics of the data according to the information the repository requires from you. The second is a data documentation file (README file), which contains additional information for the reuse of your data. As a rule, both are recommended: the information in the data repository is machine-readable and can thus be used for meta-analyses, while the README file facilitates the further use of the data by humans.
Metadata / data documentation should reflect the standards of the respective research discipline. By using such standards in terminology (i.e. controlled vocabularies), creators of datasets can ensure that the data is interoperable with other datasets that use the same standard terminology. Such standards, including standards for linguistics, can be found on FAIRsharing.org.
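As an illustration of what machine-readable documentation can look like, the following minimal sketch stores the answers to the Who, What, When, Where, Why and How questions in a plain dictionary and serialises it to JSON. The field names and all values are hypothetical placeholders chosen for this example; an actual repository or disciplinary standard prescribes its own schema.

```python
import json

# Hypothetical field names, one per documentation question
# (Who, What, When, Where, Why, How). Real standards define their own.
REQUIRED_FIELDS = ("creator", "title", "date_created", "location", "purpose", "method")


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def to_json(record):
    """Serialise a complete record to JSON; refuse incomplete records."""
    missing = missing_fields(record)
    if missing:
        raise ValueError("missing metadata fields: " + ", ".join(missing))
    return json.dumps(record, indent=2, sort_keys=True, ensure_ascii=False)


# Placeholder values, invented purely for illustration.
record = {
    "creator": "Jane Doe",                        # Who
    "title": "Example interview corpus",          # What
    "date_created": "2020-01-31",                 # When
    "location": "Example City",                   # Where
    "purpose": "Illustrative placeholder study",  # Why
    "method": "Semi-structured interviews",       # How
}

metadata_json = to_json(record)
print(metadata_json)
```

The same information would normally also appear in the README file in prose form, so that both machines and humans can make sense of the dataset.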
For more information on this topic, see the following page: Metadata standards
A persistent identifier or PID is a permanent and unique reference link to a digital object, regardless of changes in the (online) location of that object. The services that provide such a reference link and which promise to keep these links permanently alive are so-called “resolver services”. More often than not, persistent identifiers are expressed as URLs that point to other URLs.
Persistent identifiers can be used not only for records and publications but also to uniquely reference individuals (e.g., with an ORCID iD: https://orcid.org). Having a persistent identifier allows datasets to be easily cited (the URL/PID will always point to where the data is located) and thus increases their findability.
Example:
Hanigan, Ivan (2012): Monthly drought data for Australia 1890-2008 using the Hutchinson Drought Index. The Australian National University Australian Data Archive. http://doi.org/10.4225/13/50BBFD7E6727A
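To make the idea of a resolver link concrete, here is a minimal sketch that turns a DOI, written in one of several common ways, into a single resolver URL. The regular expression is a common heuristic for the general shape of a DOI and is an assumption of this example, not an official validation rule; the function name `doi_to_url` is likewise made up for illustration. The code only builds the URL and does not contact the resolver.

```python
import re

# Heuristic shape of a DOI: "10.", a registrant code, a slash, a suffix.
# This is an assumption for illustration, not an official specification.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes that people commonly put in front of a DOI.
PREFIX_PATTERN = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)

RESOLVER = "https://doi.org/"


def doi_to_url(identifier):
    """Normalise a DOI string or DOI link into one resolver URL."""
    doi = PREFIX_PATTERN.sub("", identifier.strip())
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not recognised as a DOI: {identifier!r}")
    return RESOLVER + doi


# The DOI from the citation example above, written in two different ways.
print(doi_to_url("http://doi.org/10.4225/13/50BBFD7E6727A"))
print(doi_to_url("doi:10.4225/13/50BBFD7E6727A"))
```

Both calls produce the same URL, which is the point of a persistent identifier: the identifier stays stable, and the resolver service takes care of redirecting to the current location of the object.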
To increase interoperability of data, data should be usable regardless of software and operating system. To achieve this, you should share your data in an open, non-proprietary format. This overview by EPFL lists appropriate, acceptable and closed formats by data type.
Note that open formats are important once you share your data. It is of course still possible to work with proprietary software while processing your data.
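As a small illustration of sharing in an open format, the sketch below writes tabular data as CSV text using only Python's standard library. It assumes the data is already available as a list of dictionaries with identical keys; the function name `rows_to_csv` and the sample rows are placeholders invented for this example.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialise a list of dictionaries (same keys in each) to CSV text."""
    rows = list(rows)
    if not rows:
        raise ValueError("no rows to export")
    buffer = io.StringIO()
    # The keys of the first row define the column order.
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder rows, invented for illustration only.
sample_rows = [
    {"id": 1, "token": "Haus"},
    {"id": 2, "token": "Baum"},
]

csv_text = rows_to_csv(sample_rows)
print(csv_text)
```

The resulting text can be saved to a `.csv` file with UTF-8 encoding and opened on any operating system without special software.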
If you want to learn more about data formats, go to the following page: Standard data formats
To increase reusability of data, data should be shared with a clear license that states the conditions under which the data can be reused. Licenses are legal instruments that allow the owners of datasets to control how their data is reused. For instance, if someone shares their data under a CC-BY license, other users know that they can download, adapt and remix the data, use it for any purpose, and share the resulting dataset under a license of their choice. They only need to attribute (i.e. cite) the original author and indicate any changes made. By sharing one’s data with a license, others no longer have to contact the original author or copyright holder to ask for permission to use the data; they can simply follow the license conditions.
For more information on this topic, see the following page: Copyright
In addition to licenses, data controllers can also specify under which conditions data can be accessed. That means that if there are legal, ethical or copyright-related clauses that might prevent you from sharing data openly, you can still make it FAIR by regulating access.
Generally, three levels of access are differentiated:
If data cannot be shared for ethical, legal, copyright-related or other reasons, data can still be made FAIR by sharing metadata or data documentation only.
One easy step to make your data FAIR is to upload it to a FAIR-compliant repository. FAIR-compliant repositories assign persistent identifiers, ask for rich and machine-readable metadata, require you to choose a license, ensure that the metadata is always accessible even if the data is not, and ensure long-term preservation.
The Swiss National Science Foundation provides a checklist to verify the FAIRness of repositories as well as a list of recommended repositories to choose from.
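The repository criteria listed above can be written down as a simple checklist. The following sketch is only an illustration of that list and is not the checklist provided by the Swiss National Science Foundation; the class name, the attribute names and the example profile are all hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RepositoryProfile:
    """One yes/no answer per criterion named in the text above."""
    assigns_persistent_identifiers: bool
    requires_machine_readable_metadata: bool
    requires_license: bool
    keeps_metadata_accessible: bool
    ensures_long_term_preservation: bool


def unmet_criteria(profile):
    """Return the names of all criteria the repository does not meet."""
    return [f.name for f in fields(profile) if not getattr(profile, f.name)]


# Hypothetical repository that does not ask depositors to choose a license.
candidate = RepositoryProfile(
    assigns_persistent_identifiers=True,
    requires_machine_readable_metadata=True,
    requires_license=False,
    keeps_metadata_accessible=True,
    ensures_long_term_preservation=True,
)

print(unmet_criteria(candidate))
```

An empty list means that, on these five points, the repository matches the description of a FAIR-compliant repository given above.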
Other steps include:
More information can be found here: https://www.clarin.eu/fair
The CARE principles complement the FAIR principles by providing guidelines on how to ensure ethically sound use of data from indigenous communities. Formulated by the Global Indigenous Data Alliance, the CARE principles are as follows: