LECTURES
Digital Humanities and Small Languages: Expectations and Reality
Liina Lindström
Kristel Uiboaed
July 17, 9.50 - 11.20 (1 hr 30 min)
SUMMARY: Digital humanities are gradually becoming an inseparable part of all branches of humanities research. Methods used in other fields are increasingly applied in the humanities, and new methods are constantly being developed. On the one hand, this is a natural development in interdisciplinary research; on the other hand, it is an inevitable outcome of the massive amount of humanities data already available in digital form. Using digital data presupposes digital methods and the skills to handle the data. This creates new opportunities for more versatile research and opens the humanities to greater collaboration with other fields. Objectively, this should suffice to reinforce these positive trends in humanities research. However, there are constant obstacles. Technical skills and know-how for working with data are still generally not part of humanities curricula. Methods and tools developed in other fields cannot always be transferred directly to research questions in the humanities and often need to be adapted before they can be implemented. Additional problems may arise when the data come from a poorly resourced and less-studied language. Tools developed for English, for example, cannot be transferred directly to languages with more complex morphology. Furthermore, fewer people work in the field, so the workload cannot always be distributed as needed, i.e. researchers often have to do everything themselves. This restricts the range of research topics that can be studied exhaustively.
In this presentation we discuss problems that are common to digital humanities research, concentrating on the characteristics of research in small and complex languages like Estonian. We introduce the data and tools we have implemented in our DH projects at the University of Tartu and give an overview of the obstacles and problems we have faced in this process. As our background is in linguistics, we mostly discuss linguistic data and language technology tools. In the end, we summarize the main problems and propose some solutions to improve DH research and foster mutually beneficial interdisciplinary collaboration.
Liina Lindström is an associate professor of Estonian language at the University of Tartu. She has studied Estonian word order, the syntax of spoken Estonian and the syntax of Estonian dialects using linguistic corpora. She led the compilation and annotation of the Corpus of Estonian Dialects, a huge collection of spoken dialect data in digital form. More recently, she has been in charge of developing opportunities to teach and learn digital humanities at the University of Tartu.
Fredrik Norén
July 17, 11.35 – 12.35 (1 hr)
Rimvydas Laužikas
July 17, 12.35 – 13.35 (1 hr)
Kārlis Dagilis
July 18, 15.20 - 16.50 (1 hr 30 min)
SUMMARY: Some say that there are no bigger lies than statistics. However, for journalists, whose main task is to seek the truth, numbers are everything. Diving deep into complex data sets can help find a story or reveal a problem that would otherwise remain well hidden. While investigative journalists in the United States have been using open data sets for decades, now more than ever this method plays a significant role in producing groundbreaking stories about our life in the digital world. Moreover, it has expanded beyond the borders of journalism and communication science. Nowadays it is a highly interdisciplinary field that is impossible to imagine without computer science.
Kārlis Dagilis is a lecturer in Radio Journalism at the University of Latvia. For more than 17 years he has worked as a journalist for the Latvian national broadcasting organizations Latvijas Radio (LR) and Latvijas Televīzija (LTV). Kārlis is also a founder of Pieci.lv, a multimedia radio station for young people. During his Hubert H. Humphrey Fellowship in 2016/2017, he studied data journalism extensively at the University of Maryland, which has one of the leading journalism colleges in the United States. He also collaborated with the Pulitzer Prize-winning journalist Dana Priest of The Washington Post to shed light on Russia's interference in U.S. presidential elections. In fall 2018, Kārlis will begin his PhD studies at the Philip Merrill College of Journalism.
Elīna Lange-Ionatamišvili
July 20, 15.30 - 17.00 (1 hr 30 min)
Elīna Lange-Ionatamišvili is a Senior Expert at the NATO Strategic Communications Centre of Excellence (NATO StratCom COE) in Riga, Latvia. She holds an MA in Communications Science (2006) and has spent a large part of her career working for the Ministry of Defence of Latvia and NATO. A large part of Elīna’s work at the NATO StratCom COE concerns the analysis of Russia’s information confrontation in the context of new generation warfare, as well as strategic communications terminology. She is also a trainer in the Behavioural Dynamics Institute’s Target Audience Analysis methodology.
WORKSHOPS
Nika Aleksejeva
July 17, 14.20 - 18.20 (3 hr 45 min)
Ilze Auziņa
Baiba Saulīte
July 18, 9.30 - 15.00 (4 hr 30 min)
- The history of corpus linguistics
- Definition and content of a corpus
- Types of text corpora (general, specialized, monolingual, parallel, text, speech etc.)
- Quantitative data
- Corpora and computational linguistics
- The use of corpora for different purposes
- Corpus queries and regular expressions
- Freely available corpus tools
- Morphologically annotated corpora (part of speech analysis and tagging of a corpus)
- Searching in morphologically annotated corpora
Ilze Auziņa, PhD, is a leading researcher at the Institute of Mathematics and Computer Science, University of Latvia. Ilze is a Latvian linguist who defended her PhD thesis in computational phonology, investigating the syllable structure, grapheme-phoneme correspondences and phonotactics of Latvian. She co-authored “The Grammar of Modern Latvian”, writing on phonetics and phonology. Ilze has more than 20 years of experience in the phonological and phonetic analysis of Latvian. She also has experience in speech data processing and analysis and in the development of a speech synthesis system and an automatic phonetic transcription system. She has carried out several specialized corpus development projects (as project coordinator and leading researcher), for example, The Corpus of the Transcripts of the Saeima’s (Parliament of Latvia) Sessions, An annotated longitudinal Latvian children's speech corpus and The Latvian Speech Recognition Corpus.
Baiba Saulīte, PhD, is a leading researcher at the Institute of Mathematics and Computer Science, University of Latvia. Baiba is a Latvian linguist who defended her PhD thesis on word order and information structure in Latvian. She co-authored “The Grammar of Modern Latvian”, writing on syntax and information structure. Baiba has over 10 years of experience in the morphological, syntactic and semantic analysis of Latvian. Her research in computational linguistics focuses on multi-layered, semantically annotated language resources for Latvian (anchored in widely acknowledged multilingual representations such as AMR, PropBank, FrameNet and Universal Dependencies) needed for natural language processing. She is currently working on the analysis of deverbal derivatives in Latvian.
Jan Rybicki
July 19, 9.30 - 18.15 (7 hr 30 min)
If participants wish to work on their own computers, they are strongly recommended to download and install R and Gephi (and check that both run correctly on their computers).
- R: cran.r-project.org/
- Gephi: gephi.org/
- Download link for sample text collection for first analysis: https://1drv.ms
If participants plan to try out the new methods on their own texts, these should be in plain text (.txt) format, UTF-8 encoded. Preferably, the file names should follow the pattern author_title_date.txt (keep the underscores). It makes sense to bring texts by at least five authors, with at least two texts each (anything from a short story to a novel or a full-length play), but the more the merrier.
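Participants who want to check their own collection against these requirements before the workshop could use a short script along the following lines. This is only a minimal Python sketch, not part of the workshop materials: the function names are hypothetical, and it assumes that no author or title contains an underscore and that "date" is a four-digit year.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed naming rule (hypothetical interpretation of author_title_date.txt):
# exactly two underscores, none inside author or title, four-digit year as date.
FILENAME_PATTERN = re.compile(r"(?P<author>[^_]+)_(?P<title>[^_]+)_(?P<date>\d{4})\.txt")


def parse_filename(name):
    """Return (author, title, date) for a well-formed file name, else None."""
    match = FILENAME_PATTERN.fullmatch(name)
    if match is None:
        return None
    return match.group("author"), match.group("title"), match.group("date")


def is_utf8(path):
    """Return True if the whole file decodes as UTF-8 (strict decoding)."""
    try:
        Path(path).read_text(encoding="utf-8")
        return True
    except UnicodeDecodeError:
        return False


def check_collection(names, min_authors=5, min_texts_per_author=2):
    """Collect badly named files, count texts per author, and report whether
    at least `min_authors` authors have `min_texts_per_author` texts each."""
    bad = []
    counts = Counter()
    for name in names:
        parsed = parse_filename(name)
        if parsed is None:
            bad.append(name)
        else:
            counts[parsed[0]] += 1
    qualifying = sum(1 for n in counts.values() if n >= min_texts_per_author)
    return bad, counts, qualifying >= min_authors


# Example with a hypothetical list of file names:
bad, counts, enough = check_collection(
    ["Austen_Emma_1815.txt", "Austen_Persuasion_1817.txt", "notes.txt"]
)
print(bad, dict(counts), enough)  # ['notes.txt'] {'Austen': 2} False
```

For a real folder, the list of names could come from something like `[p.name for p in Path("my_texts").glob("*.txt")]`, where `my_texts` is a placeholder folder name, and `is_utf8` can be run on each file to catch texts saved in another encoding.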
Pim van Bree
Geert Kessels (LAB1100)
July 20, 9.30 - 15.00 (4 hr 30 min)
SUMMARY: A well-thought-out database for digital history projects allows for various modes of analysis, visualisation, and interconnectivity. Each database with historical data requires a thorough understanding of the underlying conceptual and logical data models. Moreover, the interface at hand has to be scrutinised as well. This workshop will deal with the following three distinct levels of any data modelling process:
1. Creating a conceptual data model
What are the types of information that can be identified in the research process, and how do they relate to one another?
2. Creating a logical data model
How will different kinds of information be stored and how to deal with vague / ambiguous / uncertain / contradictory / unique / irregular data?
3. Using a database application
Which options does the database application offer and how can the conceptualised data model best be implemented?
During the workshop, we will first focus on getting a good understanding of these three distinct levels and explore how they inform each other. After this, participants will be able to create or refine a data model of their own and learn how to implement it in nodegoat.
Requirements
No prior knowledge is required to attend this workshop. Participants are required to bring their own laptop to the workshop. No new software has to be installed, as you only need to use a (modern) web-browser.
LAB1100 is a research and development firm established in 2011 by Pim van Bree and Geert Kessels. Their joint skill set in new media, history, and software development allows them to conceptualise and develop complex software applications. Working together with universities, research institutes, and museums, LAB1100 has built the digital research platform nodegoat and produces interactive data visualisations.
Pim van Bree received his MA in New Media and Digital Culture at the University of Amsterdam. He graduated with a thesis on the actor network of transnational online dating, investigating the crossroads between the local, the national, the global, and the online assemblage. His work experience in the field of new media includes positions as a digital strategist at Tribal DDB Amsterdam and as a software developer at KIWA.
Geert Kessels received his BA in History from Radboud University Nijmegen and completed the research master's programme in History at the University of Amsterdam. He graduated with a thesis on the influences of German Idealism on the Slovak Romantic intellectual Ľudovít Štúr. During his studies he completed an internship at the Study Platform on Interlocking Nationalisms and worked as a project manager for EUROCLIO - The European Association of History Educators.