University of Lausanne

The University of Lausanne is represented in the CLARIN-CH Consortium by Prof. Anita Auer, from the English Department.

The community from the University of Lausanne provides CLARIN-CH with language resources and expertise in language sciences.

Language resources

1. The Interrogation Programme & Supersenses Extraction (IPSE) was developed at the Italian Department. The IPSE programme was created to rethink the perception of the text within the so-called 'digital galaxy'. Its primary aim is to experiment with the programme in courses and seminars on Italian literature and linguistics. Once the characteristics of the texts have been discussed and rethought, the programme can serve as a tool for gathering information (collecting selected textual data) or as a source for statistical retrieval. To do so, IPSE does not merely Interrogate texts on the basis of a Programme that recognises sequences of characters; it also relies on the more recent semantic categories of 'Supersenses' and on the Extraction of general semantic and logical categories, in order to bring together multiple texts or textual data collected in digital form in a corpus. The automatic tagging of the texts is 63% reliable and is based on the 44 general categories of WordNet, a semantic lexicon developed by George Miller (Princeton). The programme, conceived for the interrogation of Italian texts, can also be applied to texts in other languages, such as English and French.

2. The Lyra database was developed at the Italian Department. It offers access to detailed descriptions of printed poetry collections published from the 16th to the 18th century. The description model is defined on the basis of specific philological and literary-historical interests and includes a bibliographic description of the individual books, followed by detailed information on the content of the texts (location, rubrics, incipit, metrical form and rhyme scheme), the validity of the attributions and the authors. The system can produce a summary of the occurrences of each text in the different collections. A search module allows users to query the data by cross-referencing the different indicators and returns the matching results.

3. The ENIAT database of digitized texts was developed at the Department of South Asian and Slavonic Languages and Cultures. It consists of digitized texts of Early New Indo-Aryan literature.

4. The FLORALE database of audio recordings, developed at the School of French as a Foreign Language, is useful for teaching and learning the comprehension of everyday spoken French. Florale consists of transcriptions of radio broadcasts from France and French-speaking Switzerland (documentaries recorded on the spot and interviews) and provides access to nearly 200 phenomena characteristic of spontaneous spoken French. Unlike most corpora, where the annotation is carried out automatically, the language features selected for the Florale database are annotated manually. Although much more time-consuming, this procedure makes it possible to guarantee the pedagogical reliability of the data and, at the same time, to highlight numerous linguistic features of spoken French in its phonetic, morphosyntactic, discursive and lexical dimensions, amounting to more than a thousand annotations per hour of recording.

5. The website consists of didactic resources for learning French: its pronunciation, its reading and writing, and the practice of its grammar. These resources irrigate the intermediary space between teachers and learners, where imagination grows and a common future in the language of letters and books takes root.

6. COSUIZA is the Corpus de documentos hispánicos de Suiza (Corpus of Hispanic documents in Switzerland). It consists of ancient non-literary manuscript texts in Spanish preserved in Swiss archives, edited by a team of linguists and students from the Spanish Section. The editions follow the criteria of the CHARTA network (Corpus Hispánico y Americano en la Red: Textos antiguos) and have been available on the TEITOK platform since 2021. The documents already in COSUIZA (60 in February 2022) are held in the following Swiss archives and libraries: Universitätsbibliothek Basel, Burgerbibliothek Bern, Bibliothèque de Genève, Archives de l'État du Valais, Archives cantonales vaudoises and Bibliothèque cantonale et universitaire de Lausanne. Some of these documents are in Switzerland because they were written in Switzerland or sent to people who lived there, but most of them come from Spanish archives that have ended up on Swiss soil for different reasons.

7. COLESfran (Corpus oral de la lengua Española en la Suiza francófona; Oral Corpus of the Spanish language in French-speaking Switzerland) was launched in 2013 and comprises (as of February 2022) a hundred interviews (1 to 2 hours in length) recorded on video and audio with first- and second-generation Spaniards and Spanish Americans living in French-speaking Switzerland. The aim of this corpus is to study these discourses from the point of view of the sociology of languages in the context of migration, and to analyse the different linguistic contact phenomena. Approximately 20% of the corpus has been transcribed and is expected to become progressively available online (from 2022).

8. The web app Dialectos del español is useful for the study of grammatical variation in the Spanish-speaking world today. It has been developed since 2019 by Mónica Castillo Lluch (UNIL), Miriam Bouzouita (Humboldt-Universität zu Berlin) and Enrique Pato (University of Montreal). Participants answer 26 questions and the app predicts their dialectal origin. The participants then provide their real geographical data, which is used to geolocate the recorded data (about 650,000 questionnaires have already been completed, and half of them have been correctly geolocated).

9. The website Fueros medievales presents philological, linguistic and bibliographical information for the linguistic study of the peninsular fueros from the 13th century. Online since August 2015, this website will be progressively enriched with the research on these medieval legal texts conducted by Mónica Castillo Lluch.

10. The website Mapa del español en Suiza, developed by Johannes Kabatek (UZH) and Mónica Castillo Lluch (UNIL), presents data about the Spanish language in Switzerland. A series of maps shows the number of Spanish speakers in each commune over the last ten years, the institutions where Spanish is taught (universities, university language centers, institutes and private academies), diplomatic offices and associations of Spanish speakers. This information is complemented by graphs with demographic and demolinguistic data and a mosaic of images of Hispanic personalities and cultural products in Switzerland. The website is hosted by the University of Zurich.

Faculties and Departments involved in CLARIN-CH

Faculty of Arts

English Department

German Department

Spanish Department

French Department

Italian Department

Department of South Asian and Slavonic Languages and Cultures

Department of Language and Information Sciences

School of French as a Foreign Language (FLE)

resources/unil.txt · Last modified: 2024/01/22 07:54 by Cristina Grisot