


The University of Geneva is represented in the CLARIN-CH Consortium by Prof. Eric Haeberli, of the English Department and the Linguistics Department.


The University of Geneva community contributes language resources and expertise in the language sciences to CLARIN-CH, and it is actively involved in research projects that use language resources.

Language resources

1. The Incremental Sigmoid Belief Network Dependency Parser (idp) is an NLP tool for synchronous syntactic dependency parsing and semantic role labeling in multiple languages. It was developed by Andrea Gesmundo and can be found here.

2. The parser based on Temporal Restricted Boltzmann Machines is an NLP tool for dependency parsing of natural-language sentences. It was developed by Andrea Gesmundo and can be found here.

3. The HadoopPerceptron annotated dataset can be used for training, prediction and evaluation as a Hadoop reference dataset. It was developed by Andrea Gesmundo and can be found here.

4. The SIWIS database comprises speech recordings of bilingual and trilingual speakers recorded at the University of Geneva. Each speaker utters about 170 prompts in two or three of the following languages: French, English, German and Italian. The database was developed during the SIWIS “Spoken Interaction with Interpretation in Switzerland” project, which focused on speech-to-speech translation. Such technology allows a person to speak to a machine in their native language and have their speech automatically recognised, translated and spoken in a different language. A characteristic of recent approaches is that the synthetic voice can sound like the original speaker rather than a generic speaker or robot. The release of 27.11.2015 includes 40 speakers. Access is available upon request.

5. The corpus CHEU-lex is a parallel and comparable corpus of Swiss and European Union (EU) legislation published in the three official languages of the Swiss Confederation (French, German and Italian). It comprises: 1) bilateral agreements entered into between Switzerland and the EU from 1972 to 2017; and 2) Swiss federal legislation representing the reception of these agreements. The corpus aims to provide a richly annotated multilingual resource for investigating the influence of EU drafting and translation practices on Swiss legislation. Its development is led by Prof. Annarita Felici as part of a project funded by a grant from the Ernest Boninchi Foundation. Owing to its structure, CHEU-lex datasets can be explored from a monolingual (e.g. bilateral agreements in a single language), parallel (e.g. bilateral agreements in the three languages), cross-textual (e.g. bilateral agreements and Swiss legislation in the same language), intratextual (e.g. by text subsection) or diachronic perspective to obtain information on frequency, concordances, parts of speech (POS) or syntactic features. The corpus is hosted on NoSketchEngine and can be browsed here.

6. The LETRINT corpora are four sets of trilingual textual datasets: one comparable corpus and three parallel corpora. Their scope and features are determined by the goals of the eponymous project LETRINT “Legal Translation in International Institutional Settings: Scope, Strategies and Quality Markers” (Prof. Fernando Prieto Ramos, Faculty of Translation and Interpretation), which was funded by a European Research Council (ERC) Consolidator Grant (2014-2022). The project was conducted in cooperation with the translation services of the institutions selected for this research, and with the support of IAMLADP through its Universities Contact Group (UCG). The corpora comprise documents published in English, French and Spanish in 2005, 2010 and 2015 by the four main European Union institutions (the Commission, the Council, the Parliament and the Court of Justice), the United Nations and its International Court of Justice, and the World Trade Organization. This infographic presents the composition and methodological details of each corpus.

7. LETRINT-Q is an open-source corpus query interface that enables users to explore the LETRINT 1 and LETRINT 1+ corpora (for further details, see Prieto Ramos, Cerutti & Guzmán 2019) through monolingual and parallel queries in English, French and Spanish. It was developed for the project on the basis of the corpus-querying application ParaVoz. Users can perform “basic” queries (i.e., by token, lexeme or grammatical tag) or use the CQP query language, and can filter results by the following parameters: organization, main legal function and functional sub-category of the text, year, textual genre, and document code (assigned during compilation). The platform renders results in several formats (e.g., lists or charts) and allows data to be downloaded as xlsx or tsv files. Access credentials may be requested here.

Faculties and Departments involved in CLARIN-CH

Faculty of Humanities

1. Linguistics Department

Areas of expertise:

  • Formal grammar
  • Corpus Linguistics
  • Computational Learning
  • Computational Linguistics
  • Psycholinguistics
  • Syntax

2. English Department

Areas of expertise in the field of Linguistics:

  • Corpus Linguistics
  • Historical syntax
  • Finno-Ugric languages
  • Syntactic variation and change
  • Syntax of early English (Old and Middle English)
  • Syntactically annotated corpora in the study of syntactic variation and change
  • Syntactic variation and change in Germanic from the perspective of generative syntactic theory
  • Syntax
  • Syntax-semantics interface
  • Quantification and negation in various languages, including English, French, Bellinzonese, and Hungarian

3. Department of German Language and Literature

Areas of expertise in the field of Linguistics:

  • Analysis of political communication
  • Argumentation
  • Cultural, pragmatic and textual linguistics
  • Inclusive language
  • Morphology
  • Word formation

4. Department of Romance Languages and Literatures: Unit of Italian

Areas of expertise in the field of Linguistics:

  • Diachronic syntax of Italian (word order, marked constructions)
  • Functional syntax of Italian (information structure of the utterance, syntax-pragmatics interface) and text linguistics
  • History of Italian language and linguistics (15th and 18th centuries)
  • Literary stylistics

5. School of French as a Foreign Language (FLE)

Areas of expertise in the field of Linguistics:

  • Acquisition of sociolinguistic competence in L2
  • Bilingual education
  • Classroom interactions
  • Didactics of plurilingualism and bilingual teaching
  • Discourse analysis and didactics of French as a foreign language
  • Discourse and interaction analysis
  • Exolingual interactions
  • Experimental methods and data analysis tools
  • Experimental psycholinguistics (perception and production of speech)
  • Didactics of French as a foreign language
  • Information structure
  • Intercomprehension between Romance languages
  • Integrated intercomprehension
  • Language minorities
  • Linguistics of language acquisition
  • Multimedia
  • Oral didactics
  • Oral corpora and data-driven learning, new technologies
  • Phonetics and phonology (L1 and L2)
  • Phonological acquisition in L2
  • Plurilingualism
  • Phonetics and phonology of L1 and L2 French
  • Prosody and periodic organization of discourse
  • Polyphony
  • Social representations
  • Sociolinguistics of language contact
  • Sociolinguistics, phonetic and socio-stylistic variation
  • Typology of languages and/or interaction
  • Training of teachers and trainers

6. Department of Mediterranean, Slavic and Oriental Studies

Areas of expertise in the field of Linguistics:

  • Contrastive textual linguistics
  • Phonetics and French-Russian contrastive grammar
  • Semantics

Faculty of Translation and Interpretation

1. Department of Translation Technology (TIM)

Areas of expertise:

  • Automatic translation from speech to sign language
  • Circulation of terms between specialised and general languages
  • Computerised Lexicography
  • Corpus linguistics, linguistics of specialised corpora
  • Corpus linguistics, tool linguistics and textual terminology
  • Deep learning
  • Determinologisation, especially in relation to general language neology and terminological variation
  • Ergonomics of translation and impact of language technologies
  • International Sign
  • Information and communication technologies
  • Language engineering
  • Lexical semantics
  • Linguistics of Italian, French and French-Swiss Sign Languages
  • Localization
  • Localization standards and XLIFF
  • Machine translation
  • Machine learning
  • Natural Language Processing
  • Pre- and post-editing (MTA)
  • Sign language
  • Speech recognition
  • Socioterminology and textual terminology
  • Terminology and expertise
  • Translation support tools
  • Terminography
  • Variation in specialised languages
  • Web and multimedia technologies
  • Web accessibility
  • XML and multilingual documents

2. Interpreting Department

Areas of expertise:

  • Ethical considerations affecting the interpreting process
  • Interpreting in conflict zones and scenarios
  • Interpreting in the context of international organisations
  • Interpreter training
  • Multilingual and multimodal processing

3. Department of Translation

Areas of expertise:

  • Collaboration between authors and translators
  • Dispute settlement (ICJ, law of the sea, arbitration)
  • Genetics of translated texts
  • History of translation
  • Language and education economics
  • Legal and institutional translation
  • Legal and corporate translation
  • Literary self-translation
  • Literary translation
  • Law of international organisations
  • International environmental law
  • International economic law
  • Management of ethnic, linguistic and cultural diversity
  • Multilingualism and language policy
  • Revision and quality assurance
  • Translation and society
  • Techno-pedagogy for remote translation
  • Translation policies in contexts of official multilingualism
  • Texts by multilingual authors

Faculty of Psychology and Education Sciences

1. Department of French Didactics

Areas of expertise:

  • French language training
  • Language didactics
  • Language development

2. Psycholinguistics and Speech Therapy

Language and Cognition Group
Areas of expertise:

  • Grammatical agreement
  • Hierarchical structure in artificial grammar learning
  • Syntactic representations and processes
  • The acquisition of word order and subordination
  • The role of executive control in the acquisition of syntax

Interaction and Training Group
Areas of expertise:

  • Adult education, language and work
  • Workplace learning

Current research projects

1. The project Disentangling linguistic intelligence: automatic generalisation of structure and meaning across languages (Prof. Paola Merlo, SNSF Advanced Grant) sets the challenging goal of achieving higher-level linguistic abilities in machines trained in more realistic settings, and studies whether current neural network architectures have the same properties of learning, generalisation and abstraction when processing language.

2. The project Controversial Discourses. Language history as contemporary history since 1990 (Prof. Juliane Schröter, Department of German, in collaboration with Prof. Noah Bubenhofer of the University of Zurich) aims to examine the most important topics of public and political debate since the unification of the two German states and to embed them in an overall history of discourse and language. In contrast to historiography, language history as contemporary history has so far been researched only in individual studies, for example on climate discourse or economic crisis discourse. Methodologically, the network, which is funded within the framework of the D-A-CH cooperation with the SNF, also aims to develop a common digital infrastructure for discourse history, an urgent linguistic desideratum (from the press release of the German Research Foundation of 10.12.2021). A total of five sub-projects are planned. Sub-project 1, “Participation and Egalitarianism - Discourses on Social Participation and Solidarity as well as Diversity and Equality since 1990”, is based in Switzerland and is being carried out at the University of Zurich and the University of Geneva.

3. The project BabelDr: Spoken Language Translation of Dialogues in the Medical Domain is carried out jointly with the Hôpitaux Universitaires de Genève (HUG), Geneva's largest hospital, where 52% of all patients are foreign nationals and more than 10% speak no French at all. In the context of the ongoing European refugee crisis, medical professionals at HUG, particularly in the emergency and immigrant health service departments, often find that they have no language in common with a patient. Particularly important languages are Tigrinya, Arabic and Farsi; as of September 2015, Eritreans, Syrians and Afghans made up about 60% of all new asylum seekers. Language barriers of this kind pose serious problems for the quality, safety and equity of health care, a phenomenon that has been investigated in detail by several teams over the last twenty years.

4. The project COPECO Collaborative Post-Editing Corpus in Pedagogical Context (Prof. Pierrette Bouillon, Faculty of Translation and Interpretation) is a joint project between the University of Geneva and the University of Liège, with three main objectives: 1) to collect post-edits produced by students together with teacher corrections, 2) to build an open-source student post-editing corpus, and 3) to help systematise the task of translation error annotation. It provides translation teachers with an online post-editing platform designed to help them annotate student post-editing tasks using a shared or personalised annotation scheme.

5. The project PASSAGE Automatic subtitling from Swiss-German to Standard German (Prof. Pierrette Bouillon, Faculty of Translation and Interpretation) aims to develop, in collaboration with Schweizer Radio und Fernsehen SRF and recapp IT, an automatic post-editing system to improve the quality of automatic German subtitling of Swiss German television programs. It has three main objectives: first, to compare different automatic post-editing methods for this task; second, to collect users' opinions on this type of transcription in order to generate subtitles that are as accessible as possible; and third, to share written and oral resources to promote research on Swiss German in the media domain. Automatic post-editing will be done using modern machine translation (MT) techniques. On the scientific level, the project contributes to research on low-resource languages, on automatic post-editing (especially with new neural methods), and on the comprehensibility of this type of transcription. It also has a strong societal impact, as it aims to make television more accessible by subtitling Swiss German programs in Standard German for people who do not speak Swiss German or who have a hearing impairment. The project therefore responds directly to the new legal requirements for accessibility in Europe and fits squarely within the multilingual context of Switzerland, offering solutions that promote linguistic diversity, multilingual cohesion and cultural exchange, which is also the primary mission of the media.

6. The project PROPICTO (French acronym for PRojection du langage Oral vers des unités PICTOgraphiques, i.e. projection of oral language onto pictographic units) is a French-Swiss bilateral project funded jointly by the French National Research Agency (ANR) and the Swiss National Science Foundation (SNF). This 4-year research program is a collaboration between the Department of Translation Technology (TIM) at the University of Geneva and the Study Group for Machine Translation and Automated Processing of Languages and Speech (GETALP) of the Grenoble Informatics Laboratory (LIG). The overall goal of PROPICTO is to create Speech-to-Pictograph translation systems that convert French speech input into pictograph sequences, thereby improving communication access for non-French speakers (allophones) and people with cognitive impairments. The first aim is to deploy these translation systems in emergency medical settings (more specifically at Geneva University Hospitals, HUG), with plans to extend them to other domains and environments.

7. The project First language acquisition and foreign language learning in French-speaking children with DLD: Targeting grammar through explicit intervention (Dr. Hélène Delage, Faculty of Psychology and Education Sciences, Psycholinguistics and Speech-Language Therapy Group) is funded by the SNSF and investigates the potential of explicit syntactic training to improve the syntactic abilities of children and adolescents with Developmental Language Disorder (DLD), both in their first language and in English as a foreign language. Explicit and implicit training will be compared to test the Procedural Deficit Hypothesis, which predicts that children with implicit learning difficulties will make more progress when intervention is explicit, since this allows them to circumvent their difficulties.

Last modified: 2023/01/20 15:49 by Cristina Grisot