
Research based on language data often requires specialized software for data processing and analysis. Several tools have been developed within CLARIN-CH institutions and are available for use by other researchers; we recommend consulting their documentation before use. If you own a tool and are willing to share it with the community, please do not hesitate to contact us.

| Tool name | Tool type | Functionality | URL | CLARIN-CH institution |
|---|---|---|---|---|
| Nematus | Machine Translation Tools | Attention-based encoder-decoder model for neural machine translation, built in TensorFlow. | | University of Zurich |
| SwissBERT | Language Model | SwissBERT is a masked language model for processing Switzerland-related text. It has been trained on more than 21 million Swiss news articles retrieved from Swissdox@LiRI. | | University of Zurich |
| Subword-NMT | Word Segmentation | Unsupervised word segmentation for neural machine translation and text generation. | | University of Zurich |
| NMTScore | Text Similarity | NMTScore is a library of translation-based text similarity measures. It supports reference-free evaluation by scoring translations with neural machine translation models. | | University of Zurich |
| Zmorge | Morphological Analysis | Zmorge is a morphology tool that combines a lexicon automatically extracted from Wiktionary with a modified version of the finite-state morphological grammar SMOR. Because the extraction script is open source, new versions of the lexicon can be extracted from future, expanded versions of Wiktionary. | | University of Zurich |
| ParZu | Dependency Parsing Tools | ParZu is a dependency parser for German: it analyzes the syntactic structure of sentences and, among other things, identifies the subject and object(s) of a verb. | | University of Zurich |
| clevertagger | Part-of-Speech Tagging and Lemmatisation | clevertagger is a German part-of-speech tagger based on a CRF tool and SMOR. Its main component is a module that extracts features from SMOR's morphological analyses. | | University of Zurich |
| Bleualign | Sentence Alignment | Bleualign aligns parallel texts (i.e. a text and its translation) at the sentence level. | | University of Zurich |
| Swiss German POS model | Part-of-Speech Tagging/Dependency Parsing and Lemmatisation | swiss_german_pos_model is a part-of-speech tagging model for Swiss German, trained on Universal POS tags (upos). | | University of Zurich |
| Swiss German STTS POS Tagging Model | Part-of-Speech Tagging/Dependency Parsing and Lemmatisation | A part-of-speech tagging model for Swiss German, trained on STTS POS tags. A model trained on Universal POS tags (upos), swiss_german_pos_model, is also available. | | University of Zurich |
| Swiss German XLM-RoBERTa | Machine learning models | The xlm-roberta-base model (Conneau et al., ACL 2020), adapted to Swiss German text data through continued pre-training. | | University of Zurich |
| Swiss German CANINE-s model | Machine learning models | A CANINE model pretrained on Swiss German with a masked language modeling (MLM) objective. The CANINE architecture was introduced by Google in the paper "CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation". | | University of Zurich |
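The table describes Subword-NMT in a single line. The idea behind it is byte-pair encoding (BPE): starting from characters, repeatedly merge the most frequent adjacent symbol pair, then segment new words by replaying the learned merges. The following is a minimal, simplified re-implementation written from scratch for illustration, assuming a word-frequency table as input. It is not the subword-nmt code or API. The function names (`learn_bpe`, `segment`, `merge_symbols`) and the toy corpus are hypothetical, and details such as tie-breaking and the output format differ from the real tool (subword-nmt, for instance, marks word-internal boundaries with `@@`).

```python
from collections import Counter

# Hypothetical sketch of BPE merge learning; not the subword-nmt implementation.
# A word is a tuple of symbols, e.g. ("l", "o", "w", "</w>").
Symbols = tuple[str, ...]


def merge_symbols(symbols: Symbols, pair: tuple[str, str]) -> Symbols:
    """Replace each left-to-right occurrence of `pair` with the concatenated symbol."""
    out: list[str] = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` merge operations from a word-frequency table."""
    # Start from characters plus an end-of-word marker so suffixes stay distinct
    vocab: dict[Symbols, int] = {tuple(w) + ("</w>",): f for w, f in word_freqs.items()}
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency
        counts: Counter = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                counts[pair] += freq
        if not counts:
            break  # every word is already a single symbol
        # Most frequent pair; ties broken lexicographically (an assumption) for determinism
        best = min(counts, key=lambda p: (-counts[p], p))
        merges.append(best)
        # Apply the chosen merge to every word in the vocabulary
        new_vocab: dict[Symbols, int] = {}
        for symbols, freq in vocab.items():
            merged = merge_symbols(symbols, best)
            new_vocab[merged] = new_vocab.get(merged, 0) + freq
        vocab = new_vocab
    return merges


def segment(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    symbols: Symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_symbols(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    merges = learn_bpe(corpus, num_merges=5)
    print(merges)  # [('e', 's'), ('es', 't'), ('est', '</w>'), ('l', 'o'), ('lo', 'w')]
    print(segment("lowest", merges))  # ['low', 'est</w>']
```

The unseen word "lowest" is split into subwords learned from "low" and from "newest"/"widest". This is what lets a BPE-based system such as Nematus handle rare and unseen words with a fixed-size vocabulary.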

Additionally, CLARIN centers all over Europe offer a wide variety of tools that help researchers explore and analyse language data: