
Modelling of Multi-Modal Data in LiRI Corpus Platform and Beyond
April 28 @ 3:30 pm - 5:00 pm
While common solutions and standard practices exist for modelling text-only corpora, multimodal conversation corpora pose the distinct challenge of integrating textual transcripts, temporal information, and annotations that can relate to speech and non-speech modalities alike. The LiRI Corpus Platform (LCP) was designed to accommodate both textual and multimodal corpora: it models time-aligned annotations associated with multimedia files, so that users of the platform can efficiently browse and query multimodal corpora. LCP aims to become a reference solution for hosting multimodal corpora, and we especially encourage curators of such corpora to attend this session.
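To make the idea of a time-aligned annotation concrete, here is a minimal illustrative sketch in Python. The class and field names are our own invention for illustration, not LCP's internal schema; the point is simply that each annotation links a label (speech or non-speech) to a time span within a media file.

```python
# Illustrative sketch only: one way to think about a time-aligned
# annotation linking a transcript segment to a span of a media file.
# Class and field names are invented for this example, not LCP's schema.
from dataclasses import dataclass

@dataclass
class TimeAlignedAnnotation:
    label: str        # e.g. "utterance" or "gesture:pointing"
    media_file: str   # the associated audio/video file
    start: float      # onset in seconds within the media file
    end: float        # offset in seconds within the media file
    transcript: str   # transcribed text; empty for non-speech annotations

# A non-speech annotation and a speech annotation over the same recording:
gesture = TimeAlignedAnnotation("gesture:pointing", "session01.mp4", 12.4, 13.1, "")
speech = TimeAlignedAnnotation("utterance", "session01.mp4", 12.0, 14.2, "over there")
```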
In this session, we will report on the process of importing a multimodal conversation corpus into LCP as part of the FAIR FI-LD project. We will give an overview of the file formats involved in modelling the corpus and explain how we converged on the TEI/ISO format as our interoperable standard. Participants will then engage in a practical exercise, guided step by step through a simple Python script that imports a simplified sample corpus into LCP; a sketch of such a preprocessing step is shown below. We will close the session with a general discussion about setting up workflows for the curation of multimodal corpora and about the possibilities of automating the extraction and annotation of speech from videos.
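As a taste of the hands-on part, the following is a minimal sketch of what a TEI/ISO-to-tabular preprocessing step could look like in Python, assuming a TEI spoken-transcription file (ISO 24624) with a <timeline> of <when> anchors and <u> utterances. The element names follow the TEI spoken module, but the file names, the single-table output, and the simplified treatment of @interval as an offset from the timeline origin are our assumptions; this is not the script used in the session.

```python
# Hedged sketch: flatten a TEI/ISO spoken transcript into a time-aligned
# table of utterances. Assumes <when> anchors carry an @interval offset
# (a simplification; real timelines may use @since or @absolute instead).
import csv
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def parse_tei(path: str):
    root = ET.parse(path).getroot()

    # Map each timeline anchor id (referenced as "#T3") to seconds.
    anchors = {}
    for when in root.iter(f"{TEI}when"):
        anchors[when.get(XML_ID)] = float(when.get("interval", "0"))

    # One row per utterance: speaker, start/end time, transcribed text.
    rows = []
    for u in root.iter(f"{TEI}u"):
        start = anchors.get(u.get("start", "").lstrip("#"))
        end = anchors.get(u.get("end", "").lstrip("#"))
        text = " ".join("".join(u.itertext()).split())
        rows.append({"speaker": u.get("who", ""), "start": start,
                     "end": end, "text": text})
    return rows

if __name__ == "__main__":
    # File names below are placeholders for illustration.
    with open("utterances.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["speaker", "start", "end", "text"])
        writer.writeheader()
        writer.writerows(parse_tei("sample_corpus.tei.xml"))
```

In the exercise, a table of this kind would then be handed to LCP's import tooling together with the corresponding media files.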
Speakers: Johanna Miecznikowski-Fuenfschilling, Teodora Vuković and Jeremy Zehr
Please note that the Zoom link will be sent to participants upon registration: 📝 Register now