University of Basel


The University of Basel is represented in the CLARIN-CH Consortium by RISE (Research and Infrastructure Support).


The University of Basel community contributes language resources and expertise in language sciences to CLARIN-CH.

Language resources

1. The KompAS database contains 180 argumentative interactions with a total of 720 Swiss-German schoolchildren in grades 2 to 6. The data was collected, transcribed and annotated in the research project “Competence levels of oral argumentation among schoolchildren”. A visualisation of the annotations can be accessed here.

2. The It-Ist_CH corpus contains texts selected to represent, as broadly as possible, the diaphasic variability of the “Swiss institutional Italian” variety. The sample is balanced according to the quantitative consistency and communicative relevance of each textual genre and subgenre of this variety. The texts were collected manually, preserving the original graphic characteristics (bold, italics, paragraphing, etc.) as far as possible, and are cut to 1,500 words, with some justified exceptions such as the central texts of the legislative system, which were collected in their entirety. Publication dates are limited to the decade 2010-2020, again with some exceptions, such as laws that were adopted before 2010 but were still in force when the corpus was compiled. The corpus can be accessed here.

3. C-ORAL-ROM is a multilingual reference corpus of spontaneous speech for the main Romance languages (French, Italian, Portuguese and Spanish), recorded in free situations, with roughly 300,000 words for each language (50% informal speech and 50% formal speech, including media and telephone conversations). The corpus design ensures both that spontaneous speech is represented for each language and that the four Romance corpora are comparable. The corpus can be accessed here.

4. The MemTet Corpus (Corpus of texts published in the Orient between 1880 and 1930) was compiled in 2003-2004 within the framework of the SNF project “Between tradition and modernity: Judeo-Spanish in the Orient between 1880 and 1930”. It is a large, representative textual corpus of modern written Judeo-Spanish. It includes more than half a million words and consists of works of different types (stories and novels; theatre; humorous texts, jokes and anecdotes; administrative texts, statutes and regulations; speeches and addresses; conferences; journalistic texts) published in different cities (Salonika, Istanbul, Izmir, Jerusalem, Cairo, Sofia, Ruse and Sarajevo). The corpus is limited diamesically and diaphasically, as it is composed only of written texts (printed in Hebrew aljami) on non-religious subject matter, and chronologically, as it covers half a century, from 1880 to 1930. If you are interested in this corpus, please contact Dr. Ángel Berenguer Amador.

Faculties and Departments involved in CLARIN-CH

Faculty of Arts and Humanities

Seminar of Anglophone Linguistics and Literary Studies

Seminar of French Linguistics and Literary Studies

Seminar of German Linguistics and Literary Studies

Seminar of Ibero-Romance Studies

Seminar of Italian Studies

Seminar of Nordic Studies

Seminar of Slavic Studies

resources/unibas.txt · Last modified: 2024/01/24 16:01 by Seraina Nadig