Differences

This shows you the differences between two versions of the page.

resources:uzh [2024/01/20 16:52] Seraina Nadig
resources:uzh [2024/03/20 10:10] (current) Seraina Nadig
Line 7: Line 7:
 {{:uzh.png?nolink&200 |}}
 \\
-<fs small>The University of Zurich is represented in the CLARIN-CH Consortium by [[https://www.ds.uzh.ch/static/cms/pfs/personen.php?detail=22|Prof. Noah Bubenhofer]], from the Department of German and the [[https://www.linguistik.uzh.ch/en.html|Zurich Center for Linguistics]]. </fs>
+The University of Zurich is represented in the CLARIN-CH Consortium by [[https://www.ds.uzh.ch/static/cms/pfs/personen.php?detail=22|Prof. Noah Bubenhofer]], from the Department of German and the [[https://www.linguistik.uzh.ch/en.html|Zurich Center for Linguistics]].
 \\
 \\
 \\
-<fs small>The community from the University of Zurich provides CLARIN-CH with numerous **[[uzh#Language resources|language resources]]** and **[[uzh#Faculties and Departements involved in CLARIN-CH|expertise]]** in language sciences, and it is actively involved in **[[uzh#Current research projects|research projects]]** involving language resources.</fs>
+The community from the University of Zurich provides CLARIN-CH with numerous **language resources** and **expertise** in the language sciences:
  
 ==== Language resources ====
-++++ Corpora and databases: English |
+<WRAP round box 80%>
-<fs small>1. The [[https://www.es.uzh.ch/en/Subsites/Projects/dbank.html|BNC Dependency Bank]] corpus was created at the English Department. It contains 1'200'000'000 words. Access upon [[https://www.linguistik.uzh.ch/de/resources/access.html?typ=korpora_en&id=2|request]]. </fs>\\
+=== Corpora and databases ===
+++++ English |
+<fs small>1. The [[https://www.es.uzh.ch/en/Subsites/Projects/dbank.html|BNC Dependency Bank]] corpus was created at the English Department. It contains 1'200'000'000 words. </fs>\\
 \\
-<fs small>2. The [[https://ota.bodleian.ox.ac.uk/repository/xmlui/handle/20.500.12024/2506|GLBCC (Giessen-Long Beach Chaplin Corpus)]] was created at the English Department. It is a corpus of spoken language with approx. 155,000 words, in the form of audio files and transcriptions. Access upon [[https://www.linguistik.uzh.ch/de/resources/access.html?typ=korpora_en&id=8|request]]. </fs>\\
+<fs small>2. The [[https://ota.bodleian.ox.ac.uk/repository/xmlui/handle/20.500.12024/2506|GLBCC (Giessen-Long Beach Chaplin Corpus)]] was created at the English Department. It is a corpus of spoken language with approx. 155,000 words, in the form of audio files and transcriptions. </fs>\\
 \\
-<fs small>3. The [[https://www.es.uzh.ch/en/Subsites/Projects/zencorpus.html|ZEN (Zurich English Newspaper)]] corpus was created at the English Department. It is a diachronic (1661-1791) corpus of the first English newspapers with about 1.6 million words, searchable with Corpus Navigator. Access upon [[https://www.es.uzh.ch/en/Subsites/Projects/zencorpus/access.html|request]]. </fs>\\
+<fs small>3. The [[https://www.es.uzh.ch/en/Subsites/Projects/zencorpus.html|ZEN (Zurich English Newspaper)]] corpus was created at the English Department. It is a diachronic (1661-1791) corpus of the first English newspapers with about 1.6 million words, searchable with Corpus Navigator. </fs>\\
 \\
-<fs small>4. The [[https://www.ice-corpora.uzh.ch/en.html|International Corpus of English (ICE)]] began in 1990 with the primary aim of collecting material for comparative studies of English worldwide. Twenty-six research teams around the world are preparing electronic corpora of their own national or regional variety of English. Each ICE corpus consists of one million words of spoken and written English produced after 1989. For most participating countries, the ICE project is stimulating the first systematic investigation of the national variety. To ensure compatibility among the component corpora, each team is following a common corpus design, as well as a common scheme for grammatical annotation. Until 2016, the ICE corpora were distributed by Prof. Gerry Nelson at the Department of English, The Chinese University of Hong Kong. They are now coordinated by Prof. Marianne Hundt and hosted at the English Department of the University of Zurich. Access upon [[https://www.ice-corpora.uzh.ch/en/access.html|request]]. </fs>\\
+<fs small>4. The [[https://www.ice-corpora.uzh.ch/en.html|International Corpus of English (ICE)]] began in 1990 with the primary aim of collecting material for comparative studies of English worldwide. Twenty-six research teams around the world are preparing electronic corpora of their own national or regional variety of English. Each ICE corpus consists of one million words of spoken and written English produced after 1989. For most participating countries, the ICE project is stimulating the first systematic investigation of the national variety. To ensure compatibility among the component corpora, each team is following a common corpus design, as well as a common scheme for grammatical annotation. Until 2016, the ICE corpora were distributed by Prof. Gerry Nelson at the Department of English, The Chinese University of Hong Kong. They are now coordinated by Prof. Marianne Hundt and hosted at the English Department of the University of Zurich. Access upon [[https://www.ice-corpora.uzh.ch/en/access.html|request]]. </fs> ++++
-\\
+
-=== ==== Corpora and databases: French ==== ===
+++++ French |
 <fs small> 5. The [[https://www.rose.uzh.ch/phoenix/workspace/web/|PADLF (Les plus anciens documents linguistiques de la France)]] corpus was created at the Romance Department. The corpus is an electronic edition of 13th century documents from Lorraine dating back to 1133. The corpus is bibliographically annotated (especially places of writing) and partially lemmatized. Available [[https://www.rose.uzh.ch/phoenix/workspace/web/corpus.php|here]]. </fs>\\
 \\
 <fs small> 6. The [[https://www.zora.uzh.ch/id/eprint/42901/1/Meisner2010_FrenchNE_cmlf.pdf|Tzéro database]] was created at the Romance Department (Meisner, 2016). It is a database on negation in French, which contains approx. 2500 entries with French language data, transcribed in IPA from approx. 40 hours of conversation recordings of approx. 100 speakers from France and Switzerland. Access upon [[charlotte.meisner@uzh.ch|request]]. </fs> ++++
  
-++++ Corpora and databases: German and Swiss German |
+++++ German and Swiss German |
 \\
 <fs small>7. The [[http://www.sensorysemantics.ch|SenS-Korpus]] was created at the German Department. It consists of 34 hours of recordings of conversations of discussion groups that met as part of the Sensory Semantics project; 14 hours of these are transcribed (180’000 words). Access upon [[http://www.sensorysemantics.ch/de/kontakte/index.php?navanchor=1010016|request]]. </fs>\\
Line 34: Line 36:
 <fs small>8. The [[http://www.archimob.ch/|Archimob (Archives de la mobilisation)]] database was created at the German Department. In the oral history project Archimob, 555 video interviews were conducted with contemporary witnesses of the Second World War in Switzerland. Of these interviews, about 50 were selected for dialectological studies and 17 of them were transcribed. Available [[http://www.archimob.ch/arc/|here]]. </fs>\\
 \\
-<fs small>9. The Picture postcard corpus was created at the German Department. It is a corpus of currently approx. 6000 scanned postcards and 200’000 German words. Access upon [[https://www.linguistik.uzh.ch/de/resources/access.html?typ=korpora_en&id=0|request]]. </fs>\\
+<fs small>9. The [[https://www.ds.uzh.ch/de/projekte/ansichtskartenprojekt.html|Picture postcard corpus]] was created at the German Department. It is a corpus of currently approx. 6000 scanned postcards and 200’000 German words. </fs>\\
 \\
-<fs small>10. The [[https://www.linguistik.uzh.ch/static/davads/DAVADS_anleitung.html|DAVADS (Digital Audio/Video Archive)]] database was created at the German Department. It is a collection of approx. 700 broadcasts of Swiss television DRS in the period 1975-1999. The individual broadcasts are annotated (short description of the topics, broadcast dates, studio guests present, etc., linguistic features). 375 broadcasts are digitized. Access upon [[https://www.linguistik.uzh.ch/de/resources/access.html?typ=korpora_en&id=6|request]]. </fs>\\
+<fs small>10. The [[https://www.linguistik.uzh.ch/static/davads/DAVADS_anleitung.html|DAVADS (Digital Audio/Video Archive)]] database was created at the German Department. It is a collection of approx. 700 broadcasts of Swiss television DRS in the period 1975-1999. The individual broadcasts are annotated (short description of the topics, broadcast dates, studio guests present, etc., linguistic features). 375 broadcasts are digitized. </fs>\\
 \\
-<fs small>11. The Metalanguage Discourses corpus was created at the German Department. It is a collection of about 1800 media documents on meta-linguistic topics (mainly Anglicisms) from the period 1990-2001. About 1400 documents of the corpus are discourse-analytically annotated via a separate database. Access upon [[https://www.linguistik.uzh.ch/de/resources/access.html?typ=korpora_en&id=9|request]]. </fs>\\
+<fs small> 11. The Swissenker corpus was created at the German Department. It contains transcriptions of 700 Swiss Wenker questionnaires, collected in 1933/34. The Wenker questionnaire is a traditional dialectological questionnaire that requires the translation of 40 sentences from standard German into dialect. </fs>\\
 \\
-<fs small> 12. The Swissenker corpus was created at the German Department. It contains transcriptions of 700 Swiss Wenker questionnaires, collected in 1933/34. The Wenker questionnaire is a traditional dialectological questionnaire that requires the translation of 40 sentences from standard German into dialect. Access upon [[https://www.linguistik.uzh.ch/de/resources/access.html?typ=korpora_en&id=15|request]]. </fs>\\
+<fs small> 12. The [[https://www.uzh.ch/cosmov/edition/ssl-dir/V4/|Zurich summer 1968]] corpus was created at the German Department. It contains a total of 958 transcribed documents. Available [[https://www.uzh.ch/cosmov/edition/ssl-dir/V4/|here]]. </fs>\\
 \\
-<fs small> 13. The [[https://www.uzh.ch/cosmov/edition/ssl-dir/V4/|Zurich summer 1968]] corpus was created at the German Department. It contains a total of 958 transcribed documents. Available [[https://www.uzh.ch/cosmov/edition/ssl-dir/V4/|here]]. </fs>\\
+<fs small> 13. The [[https://www.cl.uzh.ch/en/texttechnologies/research/digital-humanities/hist-temporal-entities.html|Temporal entity extraction from historical texts]] project carried out at the Department of Computational Linguistics resulted in a gold standard of temporal annotations. The corpus contains 50 historical legal articles in Early New High German. It was annotated with a subset of the TimeML markup language for temporal annotation. The corpus contains about 34,000 tokens and is available [[https://pub.cl.uzh.ch/projects/hist-temporal-entities/gold_standard/|here]]. </fs>\\
 \\
-<fs small>14. The [[https://www.cl.uzh.ch/en/texttechnologies/research/digital-humanities/hist-temporal-entities.html|Temporal entity extraction from historical texts]] project carried out at the Department of Computational Linguistics resulted in a gold standard of temporal annotations. The corpus contains 50 historical legal articles in Early New High German. It was annotated with a subset of the TimeML markup language for temporal annotation. The corpus contains about 34,000 tokens and is available [[https://pub.cl.uzh.ch/projects/hist-temporal-entities/gold_standard/|here]]. </fs>\\
+<fs small> 14. The [[https://www.jakoblexikon.ch/|JAKOB lexicon]] was created at the Department of Computational Linguistics in collaboration with the Institute of Psychology. It represents a lexicon of psychological dimensions of German words. Access upon request [[https://www.jakoblexikon.ch/lexikon/|here]]. </fs>\\
 \\
-<fs small> 15. The [[https://www.jakoblexikon.ch/|JAKOB lexicon]] was created at the Department of Computational Linguistics in collaboration with the Institute of Psychology. It represents a lexicon of psychological dimensions of German words. Access upon request [[https://www.jakoblexikon.ch/lexikon/|here]]. </fs>\\
+<fs small> 15. The [[https://www.dialektsyntax.uzh.ch/de.html|SADS (Syntactic Atlas of German-speaking Switzerland)]] database was created at the German Department. It is a digital atlas describing the syntactic landscape of German-speaking Switzerland. Surveys on 54 syntactic variables were conducted in 383 places with altogether 3187 informants. Access upon [[https://www.dialektsyntax.uzh.ch/de/database.html|request]]. </fs> ++++
-\\
+
-<fs small> 16. The [[https://www.dialektsyntax.uzh.ch/de.html|SADS (Syntactic Atlas of German-speaking Switzerland)]] database was created at the German Department. It is a digital atlas describing the syntactic landscape of German-speaking Switzerland. Surveys on 54 syntactic variables were conducted in 383 places with altogether 3187 informants. Access upon [[https://www.dialektsyntax.uzh.ch/de/database.html|request]]. </fs> ++++
+
  
-++++ Corpora and databases: multilingual
+++++ Multilingual
-<fs small>17. The [[https://liri.linguistik.uzh.ch/wiki/langtech/swissdox/start|Swissdox@LiRI]] database with press articles was created in collaboration with the Schweizer Mediendatenbank AG. Swissdox@LiRI consists of 29 million media articles (press, online) from a wide range of Swiss media sources covering many decades. The database is updated daily with about 5'000 to 6'000 new articles from the German- and French-speaking parts of Switzerland. It is designed for big data analyses. Data can optionally be enriched, automatically processed and analyzed. Access upon [[https://www.liri.uzh.ch/en/services/swissdox.html|institutional subscription]]. </fs>\\
+<fs small>16. The [[https://liri.linguistik.uzh.ch/wiki/langtech/swissdox/start|Swissdox@LiRI]] database with press articles was created in collaboration with the Schweizer Mediendatenbank AG. Swissdox@LiRI consists of 29 million media articles (press, online) from a wide range of Swiss media sources covering many decades. The database is updated daily with about 5'000 to 6'000 new articles from the German- and French-speaking parts of Switzerland. It is designed for big data analyses. Data can optionally be enriched, automatically processed and analyzed. Access upon [[https://www.liri.uzh.ch/en/services/swissdox.html|institutional subscription]]. </fs>\\
 \\
-<fs small>18. The [[https://pub.cl.uzh.ch/projects/b4c/en/|bulletin4corpus]] corpus was created at the Department of Computational Linguistics. The corpus contains the Credit Suisse Bulletin, which has been published since 1895, partly in four languages: German, French, Italian and English. The magazine contains articles on economically and socially relevant topics and is therefore neither a banking magazine nor a traditional corporate magazine. This makes the Bulletin interesting as a training corpus for applications such as machine translation, since it provides access to another genre, which is useful for newspaper and magazine texts, for instance. Available [[https://pub.cl.uzh.ch/projects/b4c/en/corpora.php|here]]. </fs>\\
+<fs small>17. The [[https://pub.cl.uzh.ch/projects/b4c/en/|bulletin4corpus]] corpus was created at the Department of Computational Linguistics. The corpus contains the Credit Suisse Bulletin, which has been published since 1895, partly in four languages: German, French, Italian and English. The magazine contains articles on economically and socially relevant topics and is therefore neither a banking magazine nor a traditional corporate magazine. This makes the Bulletin interesting as a training corpus for applications such as machine translation, since it provides access to another genre, which is useful for newspaper and magazine texts, for instance. Available [[https://pub.cl.uzh.ch/projects/b4c/en/corpora.php|here]]. </fs>\\
 \\
-<fs small>19. The [[https://sms.linguistik.uzh.ch/start|sms4science]] corpus was created at the Romance Department. It consists of approx. 26,000 SMS messages in all four Swiss national languages. In addition to the original texts and a normalized version with general annotations, various sub-corpora are available, which doctoral students have enriched with specific annotations. Access [[https://sms.linguistik.uzh.ch/start|here]]. </fs>\\
+<fs small>18. The [[https://sms.linguistik.uzh.ch/start|sms4science]] corpus was created at the Romance Department. It consists of approx. 26,000 SMS messages in all four Swiss national languages. In addition to the original texts and a normalized version with general annotations, various sub-corpora are available, which doctoral students have enriched with specific annotations. Access [[https://sms.linguistik.uzh.ch/start|here]]. </fs>\\
 \\
-<fs small>20. The [[https://www.cl.uzh.ch/en/texttechnologies/research/corpus-linguistics/paralleltreebanks/smultron|SMULTRON (Stockholm Multilingual Treebank)]] corpus was created at the Department of Computational Linguistics. It is a parallel treebank with subcorpora, each containing texts of different genres (mainly non-fiction texts) in two or more languages; five languages in total: Swiss German, German, French, Italian, Rhaeto-Romanic (Romansh). Available [[https://www.cl.uzh.ch/en/texttechnologies/research/corpus-linguistics/paralleltreebanks/smultron|here]]. </fs>\\
+<fs small>19. The [[https://www.cl.uzh.ch/en/texttechnologies/research/corpus-linguistics/paralleltreebanks/smultron|SMULTRON (Stockholm Multilingual Treebank)]] corpus was created at the Department of Computational Linguistics. It is a parallel treebank with subcorpora, each containing texts of different genres (mainly non-fiction texts) in two or more languages; five languages in total: Swiss German, German, French, Italian, Rhaeto-Romanic (Romansh). Available [[https://www.cl.uzh.ch/en/texttechnologies/research/corpus-linguistics/paralleltreebanks/smultron|here]]. </fs>\\
 \\
-<fs small>21. The [[https://www.ssrq-sds-fds.ch/en/projects/swiss-law-sources-online/|eSSRQ (Electronic Collection of Swiss Legal Sources)]] corpus was created by the Legal Source Foundation of the Swiss Lawyers' Association. It is a collection of Swiss legal texts from the period 501 – 1882, in German, French, Italian, Rhaeto-Romanic (Romansh) and Latin. Available [[https://www.ssrq-sds-fds.ch/exist/apps/ssrq/|here]]. </fs>\\
+<fs small>20. The [[https://www.ssrq-sds-fds.ch/en/projects/swiss-law-sources-online/|eSSRQ (Electronic Collection of Swiss Legal Sources)]] corpus was created by the Legal Source Foundation of the Swiss Lawyers' Association. It is a collection of Swiss legal texts from the period 501 – 1882, in German, French, Italian, Rhaeto-Romanic (Romansh) and Latin. Available [[https://www.ssrq-sds-fds.ch/exist/apps/ssrq/|here]]. </fs>\\
 \\
-<fs small>22. The [[https://www.phonogrammarchiv.uzh.ch/en.html|Phonogram Archives]] corpus was created at the Department of Computational Linguistics. It is a collection of approx. 3500 sound recordings or carriers from all four Swiss national languages, corresponding to approx. 120 hours of processed sound material. This includes varieties of all major language areas in Switzerland, such as Swiss German dialects, Franco-Provençal "Patois", the Lombard dialects of Ticino and parts of the Canton of Grisons and also the Rhaeto-Romance idioms. Currently, all sound carriers are being digitized. Digital versions and transcriptions are already available for many sound carriers. Access upon [[dieterandreas.studer@uzh.ch|request]]. </fs>\\
+<fs small>21. The [[https://www.phonogrammarchiv.uzh.ch/en.html|Phonogram Archives]] corpus was created at the Department of Computational Linguistics. It is a collection of approx. 3500 sound recordings or carriers from all four Swiss national languages, corresponding to approx. 120 hours of processed sound material. This includes varieties of all major language areas in Switzerland, such as Swiss German dialects, Franco-Provençal "Patois", the Lombard dialects of Ticino and parts of the Canton of Grisons and also the Rhaeto-Romance idioms. Currently, all sound carriers are being digitized. Digital versions and transcriptions are already available for many sound carriers. Access upon [[phonogrammarchiv@cl.uzh.ch|request]]. </fs>\\
 \\
-<fs small>23. The [[http://textberg.ch/site/de/willkommen/|Text+Berg]] corpus was created at the Department of Computational Linguistics. It consists of the digitized volumes of "The yearbooks of the Swiss Alpine Club" from 1864 to 1923, the "Echo des Alpes" from 1872 to 1924 and the ALPEN from 1925 to 2011. The corpus currently comprises nearly 45 million words from more than 100,000 book pages and is variously annotated (text structure, part-of-speech, lemmas, toponyms, etc.). The following languages are represented: German, French, Italian, Rhaeto-Romanic (Romansh), Swiss German, English. Available [[http://textberg.ch/site/en/corpora/|here]]. </fs>\\
+<fs small>22. The [[http://textberg.ch/site/de/willkommen/|Text+Berg]] corpus was created at the Department of Computational Linguistics. It consists of the digitized volumes of "The yearbooks of the Swiss Alpine Club" from 1864 to 1923, the "Echo des Alpes" from 1872 to 1924 and the ALPEN from 1925 to 2011. The corpus currently comprises nearly 45 million words from more than 100,000 book pages and is variously annotated (text structure, part-of-speech, lemmas, toponyms, etc.). The following languages are represented: German, French, Italian, Rhaeto-Romanic (Romansh), Swiss German, English. Available [[http://textberg.ch/site/en/corpora/|here]]. </fs>\\
 \\
-<fs small>24. The [[https://www.cl.uzh.ch/en/texttechnologies/research/digital-humanities/Bullinger-Digital.html|Bullinger Digital]] corpus was created at the Department of Computational Linguistics. It is based on the preserved correspondence of Heinrich Bullinger: 2000 letters that he wrote and 10,000 letters that he received. The originals are kept in the Zurich State Archives and the Zurich Central Library. 80% of the letters are in Latin, most of the others in Early New High German. About 2900 letters have already been manually transcribed and edited. They can be searched [[http://teoirgsed.uzh.ch/|online]]. Another 5000 letters have been transcribed and are available as electronic texts. </fs>\\
+<fs small>23. The [[https://www.cl.uzh.ch/en/texttechnologies/research/digital-humanities/Bullinger-Digital.html|Bullinger Digital]] corpus was created at the Department of Computational Linguistics. It is based on the preserved correspondence of Heinrich Bullinger: 2000 letters that he wrote and 10,000 letters that he received. The originals are kept in the Zurich State Archives and the Zurich Central Library. 80% of the letters are in Latin, most of the others in Early New High German. About 2900 letters have already been manually transcribed and edited. They can be searched [[http://teoirgsed.uzh.ch/|online]]. Another 5000 letters have been transcribed and are available as electronic texts. </fs>\\
 \\
-<fs small>25. The [[https://github.com/ZurichNLP/CoNTra_corpora/tree/main/federal_gazette|CoNTra_corpora: the Federal Gazette]] was created at the Department of Computational Linguistics. The Federal Gazette is a journal published by the Swiss Government. The journal is a political newsletter concerned with resolutions and laws of the Swiss Confederation. This corpus contains the German-French and French-German parallel sentences mined with LASER from the digitized Federal Gazette. The heavily filtered corpus contains 1.3 million parallel sentence pairs in both directions. Available [[https://github.com/ZurichNLP/CoNTra_corpora/tree/main/federal_gazette|here]]. </fs>\\
+<fs small>24. The [[https://github.com/ZurichNLP/CoNTra_corpora/tree/main/federal_gazette|CoNTra_corpora: the Federal Gazette]] was created at the Department of Computational Linguistics. The Federal Gazette is a journal published by the Swiss Government. The journal is a political newsletter concerned with resolutions and laws of the Swiss Confederation. This corpus contains the German-French and French-German parallel sentences mined with LASER from the digitized Federal Gazette. The heavily filtered corpus contains 1.3 million parallel sentence pairs in both directions. Available [[https://github.com/ZurichNLP/CoNTra_corpora/tree/main/federal_gazette|here]]. </fs>\\
 \\
-<fs small> 26. The [[https://phoible.org/|PHOIBLE]] database was created at the Department of Comparative Linguistics. It is a repository of cross-linguistic phonological inventory data (more than 1000 languages), which have been extracted from source documents and tertiary databases and compiled into a single searchable convenience sample. Release 2.0 from 2019 includes 3020 inventories that contain 3183 segment types found in 2186 distinct languages. Available [[https://phoible.org/|here]]. </fs>\\
+<fs small> 25. The [[https://phoible.org/|PHOIBLE]] database was created at the Department of Comparative Linguistics. It is a repository of cross-linguistic phonological inventory data (more than 1000 languages), which have been extracted from source documents and tertiary databases and compiled into a single searchable convenience sample. Release 2.0 from 2019 includes 3020 inventories that contain 3183 segment types found in 2186 distinct languages. Available [[https://phoible.org/|here]]. </fs>\\
 \\
-<fs small> 27. The [[http://www.meta-net.eu/whitepapers/overview|European Language Grid resource collection for the languages in Switzerland]] was created at the Department of Comparative Linguistics on the occasion of its participation in the European Language Equality (ELE) project. It consists of over 100 resources: corpora (<60), applications (<40) and lexica. Many of the resources are multilingual: between French, German and Italian, as well as Romansh. Access can be requested by writing to [[arios@ifi.uzh.ch|Dr. Annette Rios]] or to the [[contact@clarin-ch.ch|CLARIN-CH Coordination Office]]. </fs>\\
+<fs small> 26. The [[http://www.meta-net.eu/whitepapers/overview|European Language Grid resource collection for the languages in Switzerland]] was created at the Department of Comparative Linguistics on the occasion of its participation in the European Language Equality (ELE) project. It consists of over 100 resources: corpora (<60), applications (<40) and lexica. Many of the resources are multilingual: between French, German and Italian, as well as Romansh. Access can be requested by writing to [[arios@ifi.uzh.ch|Dr. Annette Rios]] or to the [[contact@clarin-ch.ch|CLARIN-CH Coordination Office]]. </fs>\\
 \\
-<fs small>28. The [[https://www.whatsup-switzerland.ch/index.php/en/|What's up, Switzerland?]] corpus was created in a project funded by the SNSF and thanks to a collaboration among the Universities of Zurich, Bern, Neuchâtel and the University of Leipzig. The Swiss WhatsApp corpus is now available as an open access resource with more than 5 million tokens in all four national languages of Switzerland. You can find the documentation and access to the corpus [[https://www.whatsup-switzerland.ch/index.php/en/|here]]. </fs> ++++
+<fs small>27. The [[https://www.whatsup-switzerland.ch/index.php/en/|What's up, Switzerland?]] corpus was created in a project funded by the SNSF and thanks to a collaboration among the Universities of Zurich, Bern, Neuchâtel and the University of Leipzig. The Swiss WhatsApp corpus is now available as an open access resource with more than 5 million tokens in all four national languages of Switzerland. You can find the documentation and access to the corpus [[https://www.whatsup-switzerland.ch/index.php/en/|here]]. </fs> ++++
  
-++++ Corpora and databases: Other languages |
+++++ Other languages |
-<fs small>29. The [[https://www.uzh.ch/clrp/|CLC (Chintang Language Corpus)]] corpus was created at the Department of Comparative Language Science. It is a multimedia corpus of Chintang (Tibeto-Burman, Nepal, ca. 5000 speakers); ca. 300 hours (1.2 million words) transcribed, most translated into Nepali and English, and morphologically annotated (segments, functions, POS). It includes data from adults (responsibility Seminar for ASW) and longitudinal data on language acquisition (responsibility Psycholinguistics Unit). Access upon [[https://www.linguistik.uzh.ch/de/resources/access.html?typ=korpora_en&id=3|request]]. </fs>\\
+<fs small>28. The [[https://www.uzh.ch/clrp/|CLC (Chintang Language Corpus)]] corpus was created at the Department of Comparative Language Science. It is a multimedia corpus of Chintang (Tibeto-Burman, Nepal, ca. 5000 speakers); ca. 300 hours (1.2 million words) transcribed, most translated into Nepali and English, and morphologically annotated (segments, functions, POS). It includes data from adults (responsibility Seminar for ASW) and longitudinal data on language acquisition (responsibility Psycholinguistics Unit). </fs>\\
 \\
-<fs small> 30. The [[https://www.zora.uzh.ch/id/eprint/85666/|NNC (Nepali National Corpus)]] corpus was created at the Department of Comparative Language Science. It consists of Nepali texts from various genres, with 14’000’000 words. The majority of the texts are primarily written, with a small portion transcribed from recordings. Access upon [[https://www.linguistik.uzh.ch/de/resources/access.html?typ=korpora_en&id=11|request]]. </fs>\\
+<fs small> 29. The [[https://www.zora.uzh.ch/id/eprint/85666/|NNC (Nepali National Corpus)]] corpus was created at the Department of Comparative Language Science. It consists of Nepali texts from various genres, with 14’000’000 words. The majority of the texts are primarily written, with a small portion transcribed from recordings. </fs>\\
 \\
-<fs small> 31. The [[http://sealang.net/library/|SEAlang]] corpus was created at the Department of Comparative Language Science. It consists of audio recordings (conversations, elicitation, stories) and texts of Southeast Asian languages for linguistic purposes (language description, areal typology); some texts of literary interest (Southeast Asian traditions and beliefs); transcripts of audio partly in indigenous scripts, partly already in IPA, some with glosses and translation. Audio recordings in Mon amount to some 10 hours, Burmese about 8 hours, Karen (Pwo) and Nyahkur about 1 hour each. Transcripts of Mon texts (indigenous script and/or transcription) amount to an estimated 60,000 words (including literary texts), Burmese about 30,000 words (including e-mail communication), Karen (indigenous script, handwritten) and Nyahkur (indigenous script, handwritten) about 5000 words each. In total, 100,000 words (50 hours) are transcribed. Available [[http://sealang.net/library/|here]]. </fs>\\
+<fs small> 30. The [[http://sealang.net/library/|SEAlang]] corpus was created at the Department of Comparative Language Science. It consists of audio recordings (conversations, elicitation, stories) and texts of Southeast Asian languages for linguistic purposes (language description, areal typology); some texts of literary interest (Southeast Asian traditions and beliefs); transcripts of audio partly in indigenous scripts, partly already in IPA, some with glosses and translation. Audio recordings in Mon amount to some 10 hours, Burmese about 8 hours, Karen (Pwo) and Nyahkur about 1 hour each. Transcripts of Mon texts (indigenous script and/or transcription) amount to an estimated 60,000 words (including literary texts), Burmese about 30,000 words (including e-mail communication), Karen (indigenous script, handwritten) and Nyahkur (indigenous script, handwritten) about 5000 words each. In total, 100,000 words (50 hours) are transcribed. Available [[http://sealang.net/library/|here]]. </fs>\\
 \\
-<fs small>32. The [[https://www.mlat.uzh.ch/home|Corporum]] corpus was created at the Medieval Latin Seminar. It is a collection of medieval Latin texts. Available [[https://www.mlat.uzh.ch/browser?path=MLS/|here]].   </fs>\\+<fs small>31. The [[https://www.mlat.uzh.ch/home|Corporum]] corpus was created at the Medieval Latin Seminar. It is a collection of medieval Latin texts. Available [[https://www.mlat.uzh.ch/browser?path=MLS/|here]].   </fs>\\
 \\ \\
-<fs small>33. The [[https://www.ieu.uzh.ch/en/research/evolbiol/humangen_langdiv.html|GeLaTo (Genes and Languages Together)]] database was created at the Department of Comparative Language Science in collaboration with the Department of Evolutionary Biology and Environmental Studies. It is a new resource developed to link genomic data to cultural and linguistic identifiers and promote multidisciplinary research. Access upon [[chiara.barbieri@ieu.uzh.ch|request]].   </fs>\\+<fs small>32. The [[https://www.ieu.uzh.ch/en/research/evolbiol/humangen_langdiv.html|GeLaTo (Genes and Languages Together)]] database was created at the Department of Comparative Language Science in collaboration with the Department of Evolutionary Biology and Environmental Studies. It is a new resource developed to link genomic data to cultural and linguistic identifiers and promote multidisciplinary research. Access upon [[chiara.barbieri@ieu.uzh.ch|request]].   </fs>\\
 \\ \\
-<fs small> 34. The [[https://www.autotyp.uzh.ch/|AUTOTYP]] database was created at the Department of Comparative Language Science in collaboration with the University of California. It represents an international network of typological linguistic databases. AUTOTYP is a large-scale research program with goals in both quantitative and qualitative typology. Quantitative typology is interested in detecting and explaining geographical distributions of typological features and in producing statistical estimates of universal preferences as well as of genealogical inheritance and areal diffusion potentials. Qualitative typology aims at a systematic analysis of the kinds of variation found in various typological domains. Available [[https://www.autotyp.uzh.ch/available.html|here]].   </fs>\\+<fs small> 33. The [[https://www.autotyp.uzh.ch/|AUTOTYP]] database was created at the Department of Comparative Language Science in collaboration with the University of California. It represents an international network of typological linguistic databases. AUTOTYP is a large-scale research program with goals in both quantitative and qualitative typology. Quantitative typology is interested in detecting and explaining geographical distributions of typological features and in producing statistical estimates of universal preferences as well as of genealogical inheritance and areal diffusion potentials. Qualitative typology aims at a systematic analysis of the kinds of variation found in various typological domains. Available [[https://www.autotyp.uzh.ch/available.html|here]].   </fs>\\
 \\ \\
-<fs small> 35. The [[https://gitlab.uzh.ch/uzh-slavic-corpora|Zurich Corpora of Slavic Varieties (ZuCoSlaV)]] was created at the Department of Slavonic Studies. It consists of four corpora. 1. [[https://gitlab.uzh.ch/uzh-slavic-corpora/macedonian-dialect-corpus|Macedonian Spoken Corpus]], which comprises transcriptions of audio files collected in a series of field research trips in the Prespa, Bitola and Debar regions in 2012, 2014, 2016 and 2019. 2. [[https://gitlab.uzh.ch/uzh-slavic-corpora/pre-standardized-balkan-slavic-literature|Pre-Standardized Balkan Slavic Literature corpus]], which includes various Balkan Slavic texts from the 15th-19th century. The annotated section includes 20 shorter texts with full morphological and syntactic annotation (48k tokens). The raw section contains 14 sources digitized manually or automatically as a whole (ca. 1M tokens). 3. [[https://gitlab.uzh.ch/uzh-slavic-corpora/torlak|Torlak corpus]], which contains transcripts of interviews about traditional culture and history with speakers of Torlak from the Timok area. It comprises 500,697 tokens representing 80 h of recording. 4. [[https://gitlab.uzh.ch/uzh-slavic-corpora/serbian-forms-of-address|Serbian Forms of Address corpus]], which contains transcripts of interviews about forms of address that Serbian speakers use in colloquial and formal settings. It consists of 171,552 tokens, corresponding to about 19 h of recording. Available [[https://gitlab.uzh.ch/uzh-slavic-corpora|here]].   </fs> +++++<fs small> 34. The [[https://gitlab.uzh.ch/uzh-slavic-corpora|Zurich Corpora of Slavic Varieties (ZuCoSlaV)]] was created at the Department of Slavonic Studies. It consists of four corpora. 1. [[https://gitlab.uzh.ch/uzh-slavic-corpora/macedonian-dialect-corpus|Macedonian Spoken Corpus]], which comprises transcriptions of audio files collected in a series of field research trips in the Prespa, Bitola and Debar regions in 2012, 2014, 2016 and 2019. 2. 
[[https://gitlab.uzh.ch/uzh-slavic-corpora/pre-standardized-balkan-slavic-literature|Pre-Standardized Balkan Slavic Literature corpus]], which includes various Balkan Slavic texts from the 15th-19th century. The annotated section includes 20 shorter texts with full morphological and syntactic annotation (48k tokens). The raw section contains 14 sources digitized manually or automatically as a whole (ca. 1M tokens). 3. [[https://gitlab.uzh.ch/uzh-slavic-corpora/torlak|Torlak corpus]], which contains transcripts of interviews about traditional culture and history with speakers of Torlak from the Timok area. It comprises 500,697 tokens representing 80 h of recording. 4. [[https://gitlab.uzh.ch/uzh-slavic-corpora/serbian-forms-of-address|Serbian Forms of Address corpus]], which contains transcripts of interviews about forms of address that Serbian speakers use in colloquial and formal settings. It consists of 171,552 tokens, corresponding to about 19 h of recording. Available [[https://gitlab.uzh.ch/uzh-slavic-corpora|here]]. </fs> 
 +++++ 
 +</WRAP>
  
 ==== Faculties and Departements involved in CLARIN-CH ==== ==== Faculties and Departements involved in CLARIN-CH ====
 +
 <WRAP round box 80%> <WRAP round box 80%>
 === Faculty of Arts and Social Sciences === === Faculty of Arts and Social Sciences ===
 +<WRAP>
 ++++ Institute of German Studies | ++++ Institute of German Studies |
 <fs small>**//Areas of expertise in the field of Linguistics://**</fs> <fs small>**//Areas of expertise in the field of Linguistics://**</fs>
Line 106: Line 109:
   * <fs small>Linguistic discourse and cultural analysis </fs>   * <fs small>Linguistic discourse and cultural analysis </fs>
   * <fs small>Multimodal communication </fs>   * <fs small>Multimodal communication </fs>
-  * <fs small>Theoretical linguistics </fs> +++++  * <fs small>Theoretical linguistics </fs> 
 +++++ </WRAP>
  
-++++ English Department |+<WRAP> ++++ English Department |
 <fs small>**//Areas of expertise in the field of Linguistics://**</fs> <fs small>**//Areas of expertise in the field of Linguistics://**</fs>
   * <fs small>Contact linguistics </fs>   * <fs small>Contact linguistics </fs>
Line 119: Line 123:
   * <fs small>Second language acquisition in multi-lingual contexts </fs>   * <fs small>Second language acquisition in multi-lingual contexts </fs>
   * <fs small>Sociolinguistics (historical and contemporary) </fs>   * <fs small>Sociolinguistics (historical and contemporary) </fs>
-  * <fs small>World Englishes </fs> +++++  * <fs small>World Englishes </fs> 
 +++++ </WRAP>
  
-++++ Department of Romance Studies |+<WRAP> ++++ Department of Romance Studies |
 <fs small>**//Areas of expertise in the field of Linguistics://**</fs> <fs small>**//Areas of expertise in the field of Linguistics://**</fs>
   * <fs small>Bilingualism and second language acquisition </fs>   * <fs small>Bilingualism and second language acquisition </fs>
Line 132: Line 137:
   * <fs small>Sociolinguistics and language contact </fs>   * <fs small>Sociolinguistics and language contact </fs>
   * <fs small>Spoken language and description problems in variety linguistics; language use in new media </fs>   * <fs small>Spoken language and description problems in variety linguistics; language use in new media </fs>
-  * <fs small>Theoretical linguistics (phonology, morphology, syntax) </fs> +++++  * <fs small>Theoretical linguistics (phonology, morphology, syntax) </fs> 
 +++++ </WRAP>
  
-++++ Institute of Asian and Oriental Studies |+<WRAP> ++++ Institute of Asian and Oriental Studies |
 <fs small>**//Areas of expertise in the field of Linguistics://**</fs> <fs small>**//Areas of expertise in the field of Linguistics://**</fs>
   * <fs small>Phonology, etymology and palaeography of Old Chinese</fs>   * <fs small>Phonology, etymology and palaeography of Old Chinese</fs>
Line 144: Line 150:
   * <fs small>Philosophy of language</fs>   * <fs small>Philosophy of language</fs>
   * <fs small>Sanskrit-Chinese translation and lexicography </fs>   * <fs small>Sanskrit-Chinese translation and lexicography </fs>
-  * <fs small>Grammatology of complex non-alphabetic writing systems </fs> +++++  * <fs small>Grammatology of complex non-alphabetic writing systems </fs> 
 +++++ </WRAP>
  
-++++ Department of Comparative Language Studies |+<WRAP> ++++ Department of Comparative Language Studies |
 <fs small>**//Areas of expertise in the field of Linguistics://**</fs> <fs small>**//Areas of expertise in the field of Linguistics://**</fs>
   * <fs small>Auditory neurocognition</fs>   * <fs small>Auditory neurocognition</fs>
Line 157: Line 164:
   * <fs small>Language functions over the life span </fs>   * <fs small>Language functions over the life span </fs>
   * <fs small>Psycholinguistics (focus on processing and learning mechanisms of languages) </fs>   * <fs small>Psycholinguistics (focus on processing and learning mechanisms of languages) </fs>
-  * <fs small>Southeast Asian languages </fs> +++++  * <fs small>Southeast Asian languages </fs> 
 +++++ </WRAP>
  
-++++ Department of Computational Linguistics |+<WRAP> ++++ Department of Computational Linguistics |
 <fs small>**//Areas of expertise in the field of Linguistics://**</fs> <fs small>**//Areas of expertise in the field of Linguistics://**</fs>
   * <fs small>Computational Psycholinguistics </fs>   * <fs small>Computational Psycholinguistics </fs>
Line 170: Line 178:
   * <fs small>Neural Machine Translation (NMT) </fs>   * <fs small>Neural Machine Translation (NMT) </fs>
   * <fs small>Speech sciences and speech therapy </fs>   * <fs small>Speech sciences and speech therapy </fs>
-  * <fs small>Translating sign languages</fs> +++++  * <fs small>Translating sign languages</fs> 
 +++++ </WRAP>
  
-++++ Department of Slavonic Studies |+<WRAP> ++++ Department of Slavonic Studies |
 <fs small>**//Areas of expertise in the field of Linguistics://**</fs> <fs small>**//Areas of expertise in the field of Linguistics://**</fs>
   * <fs small>Corpus Linguistics </fs>   * <fs small>Corpus Linguistics </fs>
Line 179: Line 188:
   * <fs small>Interfaces: semantics-pragmatics, syntax-discourse </fs>   * <fs small>Interfaces: semantics-pragmatics, syntax-discourse </fs>
   * <fs small>Morphosyntax: monolingual, comparative & areal </fs>   * <fs small>Morphosyntax: monolingual, comparative & areal </fs>
-  * <fs small>Syntactic structures, synchronic & diachronic </fs> ++++ +  * <fs small>Syntactic structures, synchronic & diachronic </fs> 
-</WRAP>+++++ 
 + 
 +</WRAP></WRAP>
  
resources/uzh.1705765974.txt.gz · Last modified: 2024/01/20 16:52 by Seraina Nadig