Differences

This shows you the differences between two versions of the page.

resources:uzh [2024/03/08 10:37] Seraina Nadigresources:uzh [2024/03/20 10:10] (current) Seraina Nadig
Line 36: Line 36:
 <fs small>8. The [[http://www.archimob.ch/|Archimob (Archives de la mobilization)]] database was created at the German Department. In the oral history project Archimob, 555 video interviews were conducted with contemporary witnesses of the Second World War in Switzerland. Of these interviews, about 50 were selected for dialectological studies and 17 of them were transcribed. Available [[http://www.archimob.ch/arc/|here]].    </fs>\\ <fs small>8. The [[http://www.archimob.ch/|Archimob (Archives de la mobilization)]] database was created at the German Department. In the oral history project Archimob, 555 video interviews were conducted with contemporary witnesses of the Second World War in Switzerland. Of these interviews, about 50 were selected for dialectological studies and 17 of them were transcribed. Available [[http://www.archimob.ch/arc/|here]].    </fs>\\
 \\ \\
-<fs small>9. The Picture postcard corpus was created at the German Department. It is a corpus of currently approx. 6000 scanned postcards and 200’000 German words. </fs>\\+<fs small>9. The [[https://www.ds.uzh.ch/de/projekte/ansichtskartenprojekt.html|Picture postcard corpus]] was created at the German Department. It is a corpus of currently approx. 6000 scanned postcards and 200’000 German words. </fs>\\
 \\ \\
 <fs small>10. The [[https://www.linguistik.uzh.ch/static/davads/DAVADS_anleitung.html|DAVADS (Digital Audio/Video Archive)]] database was created at the German Department. It is a collection of approx. 700 broadcasts of Swiss television DRS in the period 1975-1999. The individual broadcasts are annotated (short description of the topics, broadcast dates, studio guests present, etc., linguistic features). 375 broadcasts are digitized. </fs>\\ <fs small>10. The [[https://www.linguistik.uzh.ch/static/davads/DAVADS_anleitung.html|DAVADS (Digital Audio/Video Archive)]] database was created at the German Department. It is a collection of approx. 700 broadcasts of Swiss television DRS in the period 1975-1999. The individual broadcasts are annotated (short description of the topics, broadcast dates, studio guests present, etc., linguistic features). 375 broadcasts are digitized. </fs>\\
 \\ \\
-<fs small>11. The Metalanguage Discourses corpus was created at the German Department. It is a collection of about 1800 media documents on meta-linguistic topics (mainly Anglicisms) in the period 1990-2001. About 1400 documents of the corpus are discourse-analytically annotated via a separate database. </fs>\\+<fs small> 11. The Swissenker corpus was created at the German Department. It contains transcriptions of 700 Swiss Wenker questionnaires, collected in 1933/34. The Wenker questionnaire is a traditional dialectological questionnaire that requires the translation of 40 sentences from standard German into dialect. </fs>\\
 \\ \\
-<fs small> 12. The Swissenker corpus was created at the German Department. It contains transcriptions of 700 Swiss Wenker questionnaires, collected in 1933/34. The Wenker questionnaire is a traditional dialectological questionnaire that requires the translation of 40 sentences from standard German into dialect. </fs>\\+<fs small> 12. The [[https://www.uzh.ch/cosmov/edition/ssl-dir/V4/|Zurich summer 1968]] corpus was created at the German Department. It contains a total of 958 transcribed documents. Available [[https://www.uzh.ch/cosmov/edition/ssl-dir/V4/|here]]. </fs>\\
 \\ \\
-<fs small> 13. The [[https://www.uzh.ch/cosmov/edition/ssl-dir/V4/|Zurich summer 1968]] corpus was created at the German Department. It contains a total of 958 transcribed documents. Available [[https://www.uzh.ch/cosmov/edition/ssl-dir/V4/|here]].  </fs>\\+<fs small> 13. The [[https://www.cl.uzh.ch/en/texttechnologies/research/digital-humanities/hist-temporal-entities.html|Temporal entity extraction from historical texts]] project carried out at the Department of Computational Linguistics resulted in a Gold Standard of temporal annotations. The corpus contains 50 historical legal articles in Early New High German. It was annotated with a subset of the TimeML markup language for temporal annotation. The corpus contains about 34,000 tokens and is available [[https://pub.cl.uzh.ch/projects/hist-temporal-entities/gold_standard/|here]].   </fs>\\
 \\ \\
-<fs small>14. The [[https://www.cl.uzh.ch/en/texttechnologies/research/digital-humanities/hist-temporal-entities.html|Temporal entity extraction from historical texts]] project carried out at the Department of Computational Linguistics resulted in a Gold Standard of temporal annotations. The corpus contains 50 historical legal articles in Early New High German. It was annotated with a subset of the TimeML markup language for temporal annotation. The corpus contains about 34,000 tokens and is available [[https://pub.cl.uzh.ch/projects/hist-temporal-entities/gold_standard/|here]].   </fs>\\+<fs small> 14. The [[https://www.jakoblexikon.ch/|JAKOB lexicon]] was created at the Department of Computational Linguistics in collaboration with the Institute of Psychology. It represents a lexicon of psychological dimensions of German words. Access upon request [[https://www.jakoblexikon.ch/lexikon/|here]].   </fs>\\
 \\ \\
-<fs small> 15. The [[https://www.jakoblexikon.ch/|JAKOB lexicon]] was created at the Department of Computational Linguistics in collaboration with the Institute of Psychology. It represents a lexicon of psychological dimensions of German words. Access upon request [[https://www.jakoblexikon.ch/lexikon/|here]].   </fs>\\ +<fs small> 15. The [[https://www.dialektsyntax.uzh.ch/de.html|SADS (Syntactic Atlas of German-speaking Switzerland)]] database was created at the German Department. It is a digital atlas describing the syntactic landscape of German-speaking Switzerland. Surveys on 54 syntactic variables were conducted in 383 places with altogether 3187 informants. Access upon [[https://www.dialektsyntax.uzh.ch/de/database.html|request]].  </fs> ++++
-\\ +
-<fs small> 16. The [[https://www.dialektsyntax.uzh.ch/de.html|SADS (Syntactic Atlas of German-speaking Switzerland)]] database was created at the German Department. It is a digital atlas describing the syntactic landscape of German-speaking Switzerland. Surveys on 54 syntactic variables were conducted in 383 places with altogether 3187 informants. Access upon [[https://www.dialektsyntax.uzh.ch/de/database.html|request]].  </fs> +++++
  
 ++++ Multilingual | ++++ Multilingual |
-<fs small>17. The [[https://liri.linguistik.uzh.ch/wiki/langtech/swissdox/start|Swissdox@LiRI]] database with press articles was created in collaboration with the Schweizer Mediendatenbank AG. Swissdox@LiRI consists of 29 million media articles (press, online) from a wide range of Swiss media sources covering many decades. The database is updated daily with about 5'000 to 6'000 new articles from the German and French speaking parts of Switzerland. It is designed for big data analyses. Data may be enriched optionally, automatically processed and analyzed. Access upon [[https://www.liri.uzh.ch/en/services/swissdox.html|institutional subscription]]. </fs>\\+<fs small>16. The [[https://liri.linguistik.uzh.ch/wiki/langtech/swissdox/start|Swissdox@LiRI]] database with press articles was created in collaboration with the Schweizer Mediendatenbank AG. Swissdox@LiRI consists of 29 million media articles (press, online) from a wide range of Swiss media sources covering many decades. The database is updated daily with about 5'000 to 6'000 new articles from the German and French speaking parts of Switzerland. It is designed for big data analyses. Data may be enriched optionally, automatically processed and analyzed. Access upon [[https://www.liri.uzh.ch/en/services/swissdox.html|institutional subscription]]. </fs>\\
 \\ \\
-<fs small>18. The [[https://pub.cl.uzh.ch/projects/b4c/en/|bulletin4corpus]] corpus was created at the Department of Computational Linguistics. The corpus contains the Credit Suisse Bulletin, which has been published since 1895, partially in four languages: German, French, Italian and English. The magazine contains articles on economic and socially relevant topics and is therefore neither a banking magazine nor a traditional corporate magazine. This makes the Bulletin interesting as a training corpus for applications such as machine translation since it provides access to another genre, which is suitable for newspapers and magazines for instance. Available [[https://pub.cl.uzh.ch/projects/b4c/en/corpora.php|here]].   </fs>\\+<fs small>17. The [[https://pub.cl.uzh.ch/projects/b4c/en/|bulletin4corpus]] corpus was created at the Department of Computational Linguistics. The corpus contains the Credit Suisse Bulletin, which has been published since 1895, partially in four languages: German, French, Italian and English. The magazine contains articles on economic and socially relevant topics and is therefore neither a banking magazine nor a traditional corporate magazine. This makes the Bulletin interesting as a training corpus for applications such as machine translation since it provides access to another genre, which is suitable for newspapers and magazines for instance. Available [[https://pub.cl.uzh.ch/projects/b4c/en/corpora.php|here]].   </fs>\\
 \\ \\
-<fs small>19. The [[https://sms.linguistik.uzh.ch/start|sms4science]] corpus was created at the Romance Department. It consists of approx. 26,000 SMS in all four Swiss national languages. In addition to the original texts and a normalized version with general annotations, various sub-corpora are available which are annotated with specific annotations by doctoral students. Available [[https://sms.linguistik.uzh.ch/start|here]].   </fs>\\+<fs small>18. The [[https://sms.linguistik.uzh.ch/start|sms4science]] corpus was created at the Romance Department. It consists of approx. 26,000 SMS in all four Swiss national languages. In addition to the original texts and a normalized version with general annotations, various sub-corpora are available which are annotated with specific annotations by doctoral students. Available [[https://sms.linguistik.uzh.ch/start|here]].   </fs>\\
 \\ \\
-<fs small>20. The [[https://www.cl.uzh.ch/en/texttechnologies/research/corpus-linguistics/paralleltreebanks/smultron|SMULTRON (Stockholm Multilingual Treebank)]] corpus was created at the Department of Computational Linguistics. It is a parallel treebank with subcorpora, each containing texts of different genres (mainly non-fiction texts) in two or more languages; five languages in total: Swiss German, German, French, Italian, Rhaeto-Romanic (Romansh). Available [[https://www.cl.uzh.ch/en/texttechnologies/research/corpus-linguistics/paralleltreebanks/smultron|here]].  </fs>\\+<fs small>19. The [[https://www.cl.uzh.ch/en/texttechnologies/research/corpus-linguistics/paralleltreebanks/smultron|SMULTRON (Stockholm Multilingual Treebank)]] corpus was created at the Department of Computational Linguistics. It is a parallel treebank with subcorpora, each containing texts of different genres (mainly non-fiction texts) in two or more languages; five languages in total: Swiss German, German, French, Italian, Rhaeto-Romanic (Romansh). Available [[https://www.cl.uzh.ch/en/texttechnologies/research/corpus-linguistics/paralleltreebanks/smultron|here]].  </fs>\\
 \\ \\
-<fs small>21. The [[https://www.ssrq-sds-fds.ch/en/projects/swiss-law-sources-online/|eSSRQ (Electronic Collection of Swiss Legal Sources)]] corpus was created by the Legal Source Foundation of the Swiss Lawyers' Association. It is a collection of Swiss legal texts from the period 501 – 1882, in German, French, Italian, Rhaeto-Romanic (Romansh) and Latin. Available [[https://www.ssrq-sds-fds.ch/exist/apps/ssrq/|here]].  </fs>\\+<fs small>20. The [[https://www.ssrq-sds-fds.ch/en/projects/swiss-law-sources-online/|eSSRQ (Electronic Collection of Swiss Legal Sources)]] corpus was created by the Legal Source Foundation of the Swiss Lawyers' Association. It is a collection of Swiss legal texts from the period 501 – 1882, in German, French, Italian, Rhaeto-Romanic (Romansh) and Latin. Available [[https://www.ssrq-sds-fds.ch/exist/apps/ssrq/|here]].  </fs>\\
 \\ \\
-<fs small>22. The [[https://www.phonogrammarchiv.uzh.ch/en.html|Phonogram Archives]] corpus was created at the Department of Computational Linguistics. It is a collection of approx. 3500 sound recordings or carriers from all four Swiss national languages, corresponding to approx. 120 hours of processed sound material. This includes varieties of all major language areas in Switzerland, such as Swiss German dialects, franco-provençal "Patois", the Lombard dialects of Ticino and parts of the Canton of Grisons and also the Rhaeto-Romance idioms. Currently, all sound carriers are being digitized. Digital versions and transcriptions are already available for many sound carriers. Access upon [[phonogrammarchiv@cl.uzh.ch|request]].  </fs>\\+<fs small>21. The [[https://www.phonogrammarchiv.uzh.ch/en.html|Phonogram Archives]] corpus was created at the Department of Computational Linguistics. It is a collection of approx. 3500 sound recordings or carriers from all four Swiss national languages, corresponding to approx. 120 hours of processed sound material. This includes varieties of all major language areas in Switzerland, such as Swiss German dialects, franco-provençal "Patois", the Lombard dialects of Ticino and parts of the Canton of Grisons and also the Rhaeto-Romance idioms. Currently, all sound carriers are being digitized. Digital versions and transcriptions are already available for many sound carriers. Access upon [[phonogrammarchiv@cl.uzh.ch|request]].  </fs>\\
 \\ \\
-<fs small>23. The [[http://textberg.ch/site/de/willkommen/|Text+Berg]] corpus was created at the Department of Computational Linguistics. It consists of the digitized volumes of "The yearbooks of the Swiss Alpine Club" from 1864 to 1923, the "Echo des Alpes" from 1872 to 1924, and the ALPEN from 1925 to 2011. The corpus currently comprises nearly 45 million words from more than 100,000 book pages and is variously annotated (text structure, part-of-speech, lemmas, toponyms, etc.). The following languages are represented: German, French, Italian, Rhaeto-Romanic (Romansh), Swiss German, English. Available [[http://textberg.ch/site/en/corpora/|here]].  </fs>\\+<fs small>22. The [[http://textberg.ch/site/de/willkommen/|Text+Berg]] corpus was created at the Department of Computational Linguistics. It consists of the digitized volumes of "The yearbooks of the Swiss Alpine Club" from 1864 to 1923, the "Echo des Alpes" from 1872 to 1924, and the ALPEN from 1925 to 2011. The corpus currently comprises nearly 45 million words from more than 100,000 book pages and is variously annotated (text structure, part-of-speech, lemmas, toponyms, etc.). The following languages are represented: German, French, Italian, Rhaeto-Romanic (Romansh), Swiss German, English. Available [[http://textberg.ch/site/en/corpora/|here]].  </fs>\\
 \\ \\
-<fs small>24. The [[https://www.cl.uzh.ch/en/texttechnologies/research/digital-humanities/Bullinger-Digital.html|Bullinger Digital]] corpus is created at the Department of Computational Linguistics. It consists of the preserved correspondence of Heinrich Bullinger: 2000 letters that he wrote and 10,000 letters that he received. The originals are kept in the Zurich State Archives and the Zurich Central Library. 80% of the letters are in Latin, most of the others in Early New High German. About 2900 letters have already been manually transcribed and edited. They can be searched [[http://teoirgsed.uzh.ch/|online]]. Another 5000 letters have been transcribed and are available as electronic texts.   </fs>\\+<fs small>23. The [[https://www.cl.uzh.ch/en/texttechnologies/research/digital-humanities/Bullinger-Digital.html|Bullinger Digital]] corpus is created at the Department of Computational Linguistics. It consists of the preserved correspondence of Heinrich Bullinger: 2000 letters that he wrote and 10,000 letters that he received. The originals are kept in the Zurich State Archives and the Zurich Central Library. 80% of the letters are in Latin, most of the others in Early New High German. About 2900 letters have already been manually transcribed and edited. They can be searched [[http://teoirgsed.uzh.ch/|online]]. Another 5000 letters have been transcribed and are available as electronic texts.   </fs>\\
 \\ \\
-<fs small>25. The [[https://github.com/ZurichNLP/CoNTra_corpora/tree/main/federal_gazette|CoNTra_corpora: the Federal Gazette]] was created at the Department of Computational Linguistics. The Federal Gazette is a journal published by the Swiss Government. The journal is a political newsletter concerned with resolutions and laws of the Swiss Confederation. This corpus contains the German-French and French-German parallel sentences mined with Laser from the digitized Federal Gazette. The heavily filtered corpus contains 1.3 million parallel sentence pairs in both directions. Available [[https://github.com/ZurichNLP/CoNTra_corpora/tree/main/federal_gazette|here]].  </fs>\\+<fs small>24. The [[https://github.com/ZurichNLP/CoNTra_corpora/tree/main/federal_gazette|CoNTra_corpora: the Federal Gazette]] was created at the Department of Computational Linguistics. The Federal Gazette is a journal published by the Swiss Government. The journal is a political newsletter concerned with resolutions and laws of the Swiss Confederation. This corpus contains the German-French and French-German parallel sentences mined with Laser from the digitized Federal Gazette. The heavily filtered corpus contains 1.3 million parallel sentence pairs in both directions. Available [[https://github.com/ZurichNLP/CoNTra_corpora/tree/main/federal_gazette|here]].  </fs>\\
 \\ \\
-<fs small> 26. The [[https://phoible.org/|PHOIBLE]] database was created at the Department of Comparative Linguistics. It is a repository of cross-linguistic phonological inventory data (more than 1000 languages), which have been extracted from source documents and tertiary databases and compiled into a single searchable convenience sample. Release 2.0 from 2019 includes 3020 inventories that contain 3183 segment types found in 2186 distinct languages. Available [[https://phoible.org/|here]].   </fs>\\+<fs small> 25. The [[https://phoible.org/|PHOIBLE]] database was created at the Department of Comparative Linguistics. It is a repository of cross-linguistic phonological inventory data (more than 1000 languages), which have been extracted from source documents and tertiary databases and compiled into a single searchable convenience sample. Release 2.0 from 2019 includes 3020 inventories that contain 3183 segment types found in 2186 distinct languages. Available [[https://phoible.org/|here]].   </fs>\\
 \\ \\
-<fs small> 27. The  [[http://www.meta-net.eu/whitepapers/overview|European Language Grid resource collection for the languages in Switzerland]] was created at the Department of Comparative Linguistics on the occasion of its participation in the European Language Equality (ELE) project. It consists of over 100 resources: corpora (<60), applications (<40) and  lexica. Many of the resources are multilingual, covering French, German and Italian, as well as Romansh. Access can be requested by writing to [[arios@ifi.uzh.ch|Dr. Annette Rios]] or to the  [[contact@clarin-ch.ch|CLARIN-CH Coordination Office]].  </fs>\\+<fs small> 26. The  [[http://www.meta-net.eu/whitepapers/overview|European Language Grid resource collection for the languages in Switzerland]] was created at the Department of Comparative Linguistics on the occasion of its participation in the European Language Equality (ELE) project. It consists of over 100 resources: corpora (<60), applications (<40) and  lexica. Many of the resources are multilingual, covering French, German and Italian, as well as Romansh. Access can be requested by writing to [[arios@ifi.uzh.ch|Dr. Annette Rios]] or to the  [[contact@clarin-ch.ch|CLARIN-CH Coordination Office]].  </fs>\\
 \\ \\
-<fs small>28. The [[https://www.whatsup-switzerland.ch/index.php/en/|What's up, Switzerland?]] corpus was created in a project funded by the SNSF and thanks to a collaboration among the Universities of Zurich, Bern, Neuchâtel and the University of Leipzig. The Swiss WhatsApp corpus is now available as an open access resource with more than 5 million tokens in all four national languages of Switzerland. The documentation and access to the corpus are available [[https://www.whatsup-switzerland.ch/index.php/en/|here]].  </fs> +++++<fs small>27. The [[https://www.whatsup-switzerland.ch/index.php/en/|What's up, Switzerland?]] corpus was created in a project funded by the SNSF and thanks to a collaboration among the Universities of Zurich, Bern, Neuchâtel and the University of Leipzig. The Swiss WhatsApp corpus is now available as an open access resource with more than 5 million tokens in all four national languages of Switzerland. The documentation and access to the corpus are available [[https://www.whatsup-switzerland.ch/index.php/en/|here]].  </fs> ++++
  
 ++++ Other languages | ++++ Other languages |
-<fs small>29. The [[https://www.uzh.ch/clrp/|CLC (Chintang Language Corpus)]] corpus was created at the Department of Comparative Language Science. It is a multimedia corpus of Chintang (Tibeto-Burman, Nepal, ca. 5000 speakers); ca. 300 hours (1.2 million words) transcribed, most translated into Nepali and English, and morphologically annotated (segments, functions, POS). It includes data from adults (responsibility: Seminar for ASW) and longitudinal data on language acquisition (responsibility: Psycholinguistics Unit). </fs>\\+<fs small>28. The [[https://www.uzh.ch/clrp/|CLC (Chintang Language Corpus)]] corpus was created at the Department of Comparative Language Science. It is a multimedia corpus of Chintang (Tibeto-Burman, Nepal, ca. 5000 speakers); ca. 300 hours (1.2 million words) transcribed, most translated into Nepali and English, and morphologically annotated (segments, functions, POS). It includes data from adults (responsibility: Seminar for ASW) and longitudinal data on language acquisition (responsibility: Psycholinguistics Unit). </fs>\\
 \\ \\
-<fs small> 30. The [[https://www.zora.uzh.ch/id/eprint/85666/|NNC (Nepali National Corpus)]] corpus was created at the Department of Comparative Language Science. It consists of Nepali texts from various genres, with 14’000’000 words. The majority of the texts are primarily written, with a small portion transcribed from recordings.  </fs>\\+<fs small> 29. The [[https://www.zora.uzh.ch/id/eprint/85666/|NNC (Nepali National Corpus)]] corpus was created at the Department of Comparative Language Science. It consists of Nepali texts from various genres, with 14’000’000 words. The majority of the texts are primarily written, with a small portion transcribed from recordings.  </fs>\\
 \\ \\
-<fs small> 31. The [[http://sealang.net/library/|SEAlang]] corpus was created at the Department of Comparative Language Science. It consists of audio recordings (conversations, elicitation, stories), texts of Southeast Asian languages for linguistic purposes (language description, areal typology); some texts of literary interest (Southeast Asian traditions and beliefs); transcript of audio partly in indigenous scripts, partly already IPA, some with glosses and translation. Audio recordings in Mon amount to some 10 hours, Burmese about 8 hours, Karen (Pwo) and Nyahkur about 1 hour each. Transcripts of Mon texts (indigenous script and/or transcription) estimated 60,000 words (including literary texts), Burmese about 30,000 words (including e-mail communication), Karen (indigenous script, hand written) and Nyahkur (indigenous script, hand written) about 5000 words each. A total of 100,000 words are transcribed (a total of 50 hours). Available [[http://sealang.net/library/|here]].   </fs>\\+<fs small> 30. The [[http://sealang.net/library/|SEAlang]] corpus was created at the Department of Comparative Language Science. It consists of audio recordings (conversations, elicitation, stories), texts of Southeast Asian languages for linguistic purposes (language description, areal typology); some texts of literary interest (Southeast Asian traditions and beliefs); transcript of audio partly in indigenous scripts, partly already IPA, some with glosses and translation. Audio recordings in Mon amount to some 10 hours, Burmese about 8 hours, Karen (Pwo) and Nyahkur about 1 hour each. Transcripts of Mon texts (indigenous script and/or transcription) estimated 60,000 words (including literary texts), Burmese about 30,000 words (including e-mail communication), Karen (indigenous script, hand written) and Nyahkur (indigenous script, hand written) about 5000 words each. A total of 100,000 words are transcribed (a total of 50 hours). Available [[http://sealang.net/library/|here]].   
</fs>\\
 \\ \\
-<fs small>32. The [[https://www.mlat.uzh.ch/home|Corporum]] corpus was created at the Medieval Latin Seminar. It is a collection of medieval Latin texts. Available [[https://www.mlat.uzh.ch/browser?path=MLS/|here]].   </fs>\\+<fs small>31. The [[https://www.mlat.uzh.ch/home|Corporum]] corpus was created at the Medieval Latin Seminar. It is a collection of medieval Latin texts. Available [[https://www.mlat.uzh.ch/browser?path=MLS/|here]].   </fs>\\
 \\ \\
-<fs small>33. The [[https://www.ieu.uzh.ch/en/research/evolbiol/humangen_langdiv.html|GeLaTo (Genes and Languages Together)]] database was created at the Department of Comparative Language Science in collaboration with the Department of Evolutionary Biology and Environmental Studies. It is a new resource developed to link genomic data to cultural and linguistic identifiers and promote multidisciplinary research. Access upon [[chiara.barbieri@ieu.uzh.ch|request]].   </fs>\\+<fs small>32. The [[https://www.ieu.uzh.ch/en/research/evolbiol/humangen_langdiv.html|GeLaTo (Genes and Languages Together)]] database was created at the Department of Comparative Language Science in collaboration with the Department of Evolutionary Biology and Environmental Studies. It is a new resource developed to link genomic data to cultural and linguistic identifiers and promote multidisciplinary research. Access upon [[chiara.barbieri@ieu.uzh.ch|request]].   </fs>\\
 \\ \\
-<fs small> 34. The [[https://www.autotyp.uzh.ch/|AUTOTYP]] database was created at the Department of Comparative Language Science in collaboration with the University of California. It represents an international network of typological linguistic databases. AUTOTYP is a large-scale research program with goals in both quantitative and qualitative typology. Quantitative typology is interested in detecting and explaining geographical distributions of typological features and in producing statistical estimates of universal preferences as well as of genealogical inheritance and areal diffusion potentials. Qualitative typology aims at a systematic analysis of the kinds of variation found in various typological domains. Available [[https://www.autotyp.uzh.ch/available.html|here]].   </fs>\\+<fs small> 33. The [[https://www.autotyp.uzh.ch/|AUTOTYP]] database was created at the Department of Comparative Language Science in collaboration with the University of California. It represents an international network of typological linguistic databases. AUTOTYP is a large-scale research program with goals in both quantitative and qualitative typology. Quantitative typology is interested in detecting and explaining geographical distributions of typological features and in producing statistical estimates of universal preferences as well as of genealogical inheritance and areal diffusion potentials. Qualitative typology aims at a systematic analysis of the kinds of variation found in various typological domains. Available [[https://www.autotyp.uzh.ch/available.html|here]].   </fs>\\
 \\ \\
-<fs small> 35. The [[https://gitlab.uzh.ch/uzh-slavic-corpora|Zurich Corpora of Slavic Varieties (ZuCoSlaV)]] was created at the Department of Slavonic Studies. It consists of four corpora. 1. [[https://gitlab.uzh.ch/uzh-slavic-corpora/macedonian-dialect-corpus|Macedonian Spoken Corpus]], which comprises transcriptions of audio files collected in a series of field research trips in the Prespa, Bitola and Debar regions in 2012, 2014, 2016 and 2019. 2. [[https://gitlab.uzh.ch/uzh-slavic-corpora/pre-standardized-balkan-slavic-literature|Pre-Standardized Balkan Slavic Literature corpus]], which includes various Balkan Slavic texts from the 15th-19th century. The annotated section includes 20 shorter texts with full morphological and syntactic annotation (48k tokens). The raw section contains 14 sources digitized manually or automatically as a whole (ca. 1M tokens). 3. [[https://gitlab.uzh.ch/uzh-slavic-corpora/torlak|Torlak corpus]], which contains transcripts of interviews about traditional culture and history with speakers of Torlak from the Timok area. It comprises 500,697 tokens representing 80 h of recording. 4. [[https://gitlab.uzh.ch/uzh-slavic-corpora/serbian-forms-of-address|Serbian Forms of Address corpus]], which contains transcripts of interviews about forms of address that Serbian speakers use in colloquial and formal settings. It consists of 171,552 tokens, corresponding to about 19 h of recording. Available [[https://gitlab.uzh.ch/uzh-slavic-corpora|here]]. </fs>+<fs small> 34. The [[https://gitlab.uzh.ch/uzh-slavic-corpora|Zurich Corpora of Slavic Varieties (ZuCoSlaV)]] was created at the Department of Slavonic Studies. It consists of four corpora. 1. [[https://gitlab.uzh.ch/uzh-slavic-corpora/macedonian-dialect-corpus|Macedonian Spoken Corpus]], which comprises transcriptions of audio files collected in a series of field research trips in the Prespa, Bitola and Debar regions in 2012, 2014, 2016 and 2019. 2. 
[[https://gitlab.uzh.ch/uzh-slavic-corpora/pre-standardized-balkan-slavic-literature|Pre-Standardized Balkan Slavic Literature corpus]], which includes various Balkan Slavic texts from the 15th-19th century. The annotated section includes 20 shorter texts with full morphological and syntactic annotation (48k tokens). The raw section contains 14 sources digitized manually or automatically as a whole (ca. 1M tokens). 3. [[https://gitlab.uzh.ch/uzh-slavic-corpora/torlak|Torlak corpus]], which contains transcripts of interviews about traditional culture and history with speakers of Torlak from the Timok area. It comprises 500,697 tokens representing 80 h of recording. 4. [[https://gitlab.uzh.ch/uzh-slavic-corpora/serbian-forms-of-address|Serbian Forms of Address corpus]], which contains transcripts of interviews about forms of address that Serbian speakers use in colloquial and formal settings. It consists of 171,552 tokens, corresponding to about 19 h of recording. Available [[https://gitlab.uzh.ch/uzh-slavic-corpora|here]]. </fs>
 ++++ ++++
 </WRAP> </WRAP>
resources/uzh.1709890662.txt.gz · Last modified: 2024/03/08 10:37 by Seraina Nadig