Workshops
Workshop 1-A (Python Beginners)
Python Essentials and Text Analysis with NLTK
Uldis Bojārs
In this introductory session, students will become acquainted with the versatile Jupyter Notebooks platform for Python programming. We will begin by exploring Python's fundamental data structures, including lists, dictionaries, and sets. Next, we will delve into text analysis by learning how to read, filter, and convert text files, as well as work with folders. By introducing the Natural Language Toolkit (NLTK) library, students will gain hands-on experience in processing and analyzing textual data. We will also explore how ChatGPT can help with programming tasks. The first day will provide a strong foundation in Python and set the stage for more advanced topics on Day 2.
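The following is a small standard-library sketch (not taken from the workshop materials) of how the basic data structures and file handling mentioned above fit together. Lists hold tokens in order, dicts map words to counts, and sets collect distinct words. The function names and the `.txt` folder layout are hypothetical. NLTK offers richer equivalents such as `nltk.word_tokenize` and `nltk.FreqDist`.

```python
from __future__ import annotations

import re
from collections import Counter
from pathlib import Path


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens, returned as a list."""
    return re.findall(r"\w+", text.lower())


def filter_tokens(tokens: list[str], min_length: int = 3) -> list[str]:
    """Keep only tokens with at least `min_length` characters."""
    return [t for t in tokens if len(t) >= min_length]


def word_counts(tokens: list[str]) -> dict[str, int]:
    """Map each word to its frequency, returned as a plain dict."""
    return dict(Counter(tokens))


def vocabulary(tokens: list[str]) -> set[str]:
    """Return the distinct words as a set."""
    return set(tokens)


def read_folder(folder: str) -> dict[str, str]:
    """Read every .txt file in `folder` into a {file name: text} dict."""
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(Path(folder).glob("*.txt"))
    }


if __name__ == "__main__":
    tokens = tokenize("The cat sat. The dog sat too.")
    print(tokens)
    print(word_counts(filter_tokens(tokens)))
    print(sorted(vocabulary(tokens)))
```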
Uldis Bojārs is a computer scientist interested in the Semantic Web, Open Data and Digital Libraries. He holds a Ph.D. in Computer Science from the National University of Ireland, Galway, with a focus on the Semantic Web and its applications. Uldis is an Associate Professor at the Faculty of Computing, University of Latvia, and a Data Semantics Development Manager at the National Library of Latvia, where he works on library linked data projects that improve information accessibility and resource sharing. At the University of Latvia, Uldis teaches the Python programming language, emphasizing its practical applications in diverse areas, including natural language processing.
Workshop 1-B (Python Intermediate)
Introduction to Text Content Analysis and Preprocessing
Valdis Saulespurēns
The first day is dedicated to preparing text corpora for analysis. Participants will practice essential preprocessing techniques, including text cleaning, tokenization, stemming, and lemmatization, using prominent Python libraries such as Pandas and spaCy. The session will also introduce AI assistance, with a focus on Large Language Models (LLMs), as a way to make text analysis workflows more efficient.
This session targets individuals with a basic understanding of Python, aiming to equip them with the necessary skills for digital humanities research. The workshop is structured to provide insights into transforming textual data for in-depth analysis and exploring the application of AI tools in research projects.
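To illustrate the preprocessing steps named above, here is a minimal standard-library sketch of a cleaning, tokenization, stopword-removal and normalization pipeline. It contrasts stemming (rule-based suffix stripping) with lemmatization (mapping a word to its dictionary form). The stopword list, lemma table and suffix rules are hypothetical toy resources, and `crude_stem` is not the Porter algorithm. In the workshop, spaCy's trained pipelines provide the equivalent information through token attributes such as `token.lemma_` and `token.is_stop`.

```python
from __future__ import annotations

import re

# Hypothetical, deliberately tiny resources for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "were", "was"}
LEMMA_TABLE = {"mice": "mouse", "ran": "run", "studies": "study"}
SUFFIXES = ("ing", "ies", "es", "s", "ed")  # tried in this order


def clean(text: str) -> str:
    """Remove HTML-like tags, collapse whitespace, and lowercase."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def tokenize(text: str) -> list[str]:
    """Keep runs of letters (including non-ASCII letters); drop digits and punctuation."""
    return re.findall(r"[^\W\d_]+", text)


def remove_stopwords(tokens: list[str]) -> list[str]:
    return [t for t in tokens if t not in STOPWORDS]


def crude_stem(token: str) -> str:
    """Strip the first matching suffix if at least 3 characters remain (a toy stemmer)."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def lemmatize(token: str) -> str:
    """Dictionary lookup; unknown words are returned unchanged."""
    return LEMMA_TABLE.get(token, token)


def preprocess(text: str, mode: str = "lemma") -> list[str]:
    """Run the full pipeline, normalizing with lemmas (default) or stems."""
    tokens = remove_stopwords(tokenize(clean(text)))
    normalize = lemmatize if mode == "lemma" else crude_stem
    return [normalize(t) for t in tokens]


if __name__ == "__main__":
    doc = "<p>The mice ran; the studies were short.</p>"
    print(preprocess(doc))               # lemmatized tokens
    print(preprocess(doc, mode="stem"))  # stemmed tokens
```

The stemmed output keeps irregular forms such as "mice" unchanged and turns "studies" into the non-word "stud", which is why lemmatization is usually preferred when a good lemmatizer exists for the language.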
Valdis Saulespurēns works as a researcher and developer at the National Library of Latvia. He is also a lecturer at Riga Technical University, where he teaches Python, JavaScript, and other computer science subjects. Valdis specializes in Machine Learning and Data Analysis, and he enjoys transforming disordered data into structured knowledge. With more than 30 years of programming experience, Valdis began his professional career by writing programs for quantum scientists at the University of California, Santa Barbara. Before moving into teaching, he developed software for a radio broadcast equipment manufacturer. Valdis holds a Master's degree in Computer Science from the University of Latvia. When not working or spending time with his family, Valdis enjoys biking and playing chess, sometimes even at the same time.
Workshop 2-B (Python Intermediate)
Advanced Discourse Analysis, Machine Learning, and Visualization
Valdis Saulespurēns
On the second day, participants will work with intermediate analytical techniques, using Python libraries for topic modeling and sentiment analysis. Participants will explore how to extract themes and gauge sentiment in large text datasets with libraries designed for these tasks. The agenda includes practical exercises with APIs from leading technology companies such as OpenAI and/or Google, giving hands-on experience in using external tools to enrich the analysis. This session aims to broaden participants' text analysis skills, demonstrating how combining Python with cutting-edge technology platforms can uncover deeper insights in digital humanities research.
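As a deliberately simple illustration of sentiment scoring, the sketch below uses a lexicon-based approach. Each word carries a polarity weight, a directly preceding negator flips its sign, and the sum is divided by the number of tokens. The lexicon, negator list and threshold are hypothetical values chosen for illustration. The libraries and API-based models used in the workshop (for example NLTK's VADER `SentimentIntensityAnalyzer`) are considerably more nuanced, and topic modeling is not covered here.

```python
import re

# Hypothetical mini-lexicon: word -> polarity weight.
LEXICON = {
    "good": 1.0, "great": 2.0, "excellent": 2.0,
    "bad": -1.0, "terrible": -2.0, "boring": -1.0,
}
NEGATORS = {"not", "never", "no"}


def sentiment_score(text: str) -> float:
    """Average polarity per token; a word right after a negator has its weight flipped."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    total = 0.0
    for i, token in enumerate(tokens):
        weight = LEXICON.get(token, 0.0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            weight = -weight
        total += weight
    return total / len(tokens)


def label(score: float, threshold: float = 0.05) -> str:
    """Turn a score into a coarse label; the threshold is an arbitrary choice."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sentence in ["The book was not good", "A great and excellent read"]:
        score = sentiment_score(sentence)
        print(sentence, round(score, 2), label(score))
```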
Workshop 2-A (Python Beginners)
Data Manipulation and Visualization with Pandas, Matplotlib and Plotly
Uldis Bojārs
Building upon the foundation established on Day 1, this session will focus on data manipulation and visualization using the powerful Pandas, Matplotlib and Plotly libraries. Students will learn how to load, filter, and transform data using Pandas DataFrames, as well as perform basic statistical analysis. We will then explore data visualization techniques, such as bar charts, scatter plots, and line graphs, by leveraging the capabilities of Matplotlib and Plotly. By the end of Day 2, participants will have gained valuable skills in Python programming, enabling them to analyze and visualize real-world data sets with confidence and ease.
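The load, filter, transform and summarize steps described above can be sketched with the standard library alone. This makes explicit what pandas does in a single call each: `pd.read_csv`, boolean indexing such as `df[df["pages"] >= 200]`, and `df.groupby("decade")["pages"].mean()`. The column names and the inline data are hypothetical stand-ins for a real CSV file.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical inline data standing in for a CSV file on disk.
CSV_TEXT = """title,year,pages
A,1901,120
B,1905,340
C,1923,210
D,1927,95
"""


def load(text):
    """Load rows as dicts and convert numeric columns (pandas infers types itself)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["year"] = int(row["year"])
        row["pages"] = int(row["pages"])
    return rows


def filter_rows(rows, min_pages):
    """Filter: keep rows that satisfy a condition."""
    return [row for row in rows if row["pages"] >= min_pages]


def add_decade(rows):
    """Transform: derive a new 'decade' column from 'year'."""
    return [{**row, "decade": row["year"] // 10 * 10} for row in rows]


def mean_pages_by_decade(rows):
    """Aggregate: mean page count per decade, sorted by decade."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["decade"]].append(row["pages"])
    return {decade: statistics.mean(values) for decade, values in sorted(groups.items())}


if __name__ == "__main__":
    rows = add_decade(load(CSV_TEXT))
    print([row["title"] for row in filter_rows(rows, 200)])
    print(mean_pages_by_decade(rows))
```

The resulting per-decade averages are the kind of table that would then be passed to a Matplotlib or Plotly bar chart.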
Workshop 3
Rapidly annotating and analyzing textual and visual data with zero-shot LLMs
Andres Karjus
The increasing capacities of instructable large language models (LLMs) present an unprecedented opportunity to scale up data analytics in the humanities, cultural studies and social sciences, and to automate qualitative tasks that have typically required human labor. Of particular interest here is the capacity to use LLMs as zero-shot classifiers and inference engines, that is, models that perform a task from instructions alone, without task-specific training examples. While classifying texts or images for various properties has long been possible with supervised learning, the need to train such models (or even fine-tune pretrained models) on sufficiently large sets of laboriously labeled examples has arguably hampered their adoption. We will look into recent applications of LLMs to analytics, discuss transparency and replicability, and run some experiments of our own.
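To make the zero-shot idea concrete: the model receives only an instruction and the set of allowed labels, with no labeled training examples. The sketch below assumes a generic text-completion function. It builds such a prompt and maps the model's free-text reply back onto a fixed label set. `complete` is a hypothetical placeholder for whatever LLM API is used, and no real client library is called. Keeping prompt construction and answer parsing in plain, versionable functions is one simple way to support the transparency and replicability concerns mentioned above.

```python
from typing import Callable, List, Optional, Tuple

LABELS: Tuple[str, ...] = ("positive", "negative", "neutral")


def build_prompt(text: str, labels: Tuple[str, ...] = LABELS) -> str:
    """Zero-shot prompt: an instruction plus the label set, no labeled examples."""
    options = ", ".join(labels)
    return (
        "Classify the sentiment of the following text.\n"
        f"Answer with exactly one of: {options}.\n\n"
        f"Text: {text}\nLabel:"
    )


def parse_label(reply: str, labels: Tuple[str, ...] = LABELS) -> Optional[str]:
    """Map a free-text reply to exactly one allowed label; None if none or several match."""
    reply = reply.strip().lower()
    matches = [lab for lab in labels if lab in reply]
    return matches[0] if len(matches) == 1 else None


def classify(texts: List[str], complete: Callable[[str], str]) -> List[Optional[str]]:
    """Send each prompt to `complete` (a hypothetical LLM call) and parse the replies."""
    return [parse_label(complete(build_prompt(text))) for text in texts]


if __name__ == "__main__":
    # A stand-in "model" so the sketch runs offline; swap in a real API call.
    def fake_model(prompt: str) -> str:
        return "Positive." if "love" in prompt else "Neutral"

    print(classify(["I love this edition.", "The letter is dated 1888."], fake_model))
```

Returning `None` for unusable replies, rather than guessing, keeps such cases visible so they can be counted and reviewed by hand.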
Andres Karjus is a multidisciplinary scientist with a background in linguistics (University of Edinburgh, University of Tartu) and artificial intelligence (KU Leuven). His research focuses on the study of language, media, and culture, particularly changes over time and the evolutionary processes that drive them. He employs methods ranging from machine learning and statistics to cognitive experiments. In addition to his academic work, Andres works as an instructor and consultant in digital skills and AI (more at https://andreskarjus.github.io).
Workshop 4
Creating and Analysing Multilingually Comparable Text Corpora
Normunds Grūzītis, Artūrs Znotiņš
In this workshop, participants will explore the process of transforming an unstructured text collection into a grammatically annotated text corpus. Using Python and established multilingual NLP libraries, attendees will learn how to process text documents: segmenting them into sentences, words and other tokens, lemmatizing the words, and annotating each token with a part-of-speech tag and a syntactic dependency relation following Universal Dependencies (UD), which keeps the annotation consistent across languages. The workshop will emphasize UD as a uniform annotation scheme that facilitates queries and linguistic analysis across multilingual corpora. Participants will prepare the annotated corpora in a data format suitable for importing, indexing and querying in an open-source corpus platform such as NoSketch Engine or Korp.
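UD-annotated text is commonly exchanged in the CoNLL-U format: ten tab-separated columns per token, comment lines starting with `#`, and blank lines between sentences. Corpus platforms such as NoSketch Engine import a "vertical" format instead, with one token per line and structures such as `<doc>` and `<s>` written as XML-like tags. The sketch below converts one into the other. The selection and order of exported columns (form, lemma, UPOS, dependency relation) and the `doc_id` value are assumptions. In practice they have to match the positional attributes declared in the corpus configuration.

```python
from typing import Dict, List, Sequence

# Column names of a CoNLL-U token line, in the order defined by Universal Dependencies.
CONLLU_FIELDS = ("id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc")

SAMPLE = (
    "# sent_id = 1\n"
    "# text = Cats sleep.\n"
    "1\tCats\tcat\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tsleep\tsleep\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)


def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Return sentences as lists of token dicts.

    Comment lines, multiword-token ranges (IDs like '1-2') and empty nodes
    (IDs like '5.1') are skipped, so only syntactic words remain.
    """
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        line = line.rstrip()
        if not line:  # a blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        current.append(dict(zip(CONLLU_FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


def to_vertical(sentences: List[List[Dict[str, str]]], doc_id: str,
                attrs: Sequence[str] = ("form", "lemma", "upos", "deprel")) -> str:
    """Write one <doc> in vertical format: one token per line, selected columns tab-separated."""
    lines = [f'<doc id="{doc_id}">']
    for sentence in sentences:
        lines.append("<s>")
        for token in sentence:
            lines.append("\t".join(token[a] for a in attrs))
        lines.append("</s>")
    lines.append("</doc>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_vertical(parse_conllu(SAMPLE), doc_id="d1"))
```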
Normunds Grūzītis is an associate professor at the Faculty of Computing, University of Latvia, and a lead researcher at the AI Lab, Institute of Mathematics and Computer Science, University of Latvia (IMCS). He has 20 years of experience in language technology and digital humanities and has coordinated several research and innovation projects on natural language processing and the creation of advanced language resources.
Artūrs Znotiņš is a researcher and a lead software engineer at AI Lab, IMCS, University of Latvia. He is a PhD candidate in computer science. Artūrs has 15 years of experience in language technology and machine learning. He has developed state-of-the-art language models for text and speech processing in Latvian.
Workshop 6
ChatGPT for Humanities Research
Līva Rotkale
This workshop introduces humanities scholars to ChatGPT, an artificial intelligence program that generates human-like text in response to user input. The workshop begins with a demonstration of the workflow and practical tips for using both the free and paid versions of ChatGPT effectively. With a foundational understanding of the tool’s capabilities and limitations, participants will then work on hands-on tasks using only the free version (GPT-3.5). These tasks include formulating precise prompts, generating text summaries, creating reading comprehension questions, translating and paraphrasing texts, analyzing arguments, brainstorming research topics, structuring essays, and formatting references. The workshop concludes with a Q&A session where attendees can share their insights and discuss challenges.
Workshop 5
Navigating Visual Collections using Image Embeddings
Mar Canet Solà
Lectures
Artificial Intelligence at the National Library of Norway
Javier de la Rosa
The integration of Artificial Intelligence technologies has profoundly affected many sectors, including libraries and cultural institutions. The National Library of Norway (Nasjonalbiblioteket) stands at the forefront of this digital transformation, using AI to transform its operations, services, and user experiences. Through real-world examples, we will look at innovative applications of AI at Norway's national repository of knowledge and cultural heritage, from enhancing cataloging to improving accessibility. The library's pioneering work includes developing AI models for the languages of Norway, including Sámi, with an emphasis on accuracy and inclusivity. We hope to showcase AI's transformative potential for advancing cultural preservation and public access.
Javier de la Rosa is a Senior Research Scientist at the Artificial Intelligence Lab at the National Library of Norway. A former Postdoctoral Fellow at the UNED Digital Humanities Innovation Lab, he holds a PhD in Hispanic Studies with a specialisation in Digital Humanities from the University of Western Ontario and a Master's in Artificial Intelligence from the University of Seville. Javier has previously worked as a Research Engineer at the Stanford University Center for Interdisciplinary Digital Research and as the Technical Lead at the University of Western Ontario CulturePlex Lab for Cultural Complexity. He is interested in Natural Language Processing applied to historical and literary text, with a special focus on large language models.
Latgalian Language Corpora and Other Digital Resources in the Context of European Lesser-used Languages
Sanita Martena
Ilga Šuplinska
Antra Kļavinska
Corpora of lesser-used languages play an important role in documenting and developing a language and are also a valuable resource for preparing linguistic tools and teaching materials. In the first part of the session, we will give insights into the ongoing development of two corpora (written and spoken) and a comprehensive electronic dictionary of Latgalian, a regional language in Eastern Latvia. We will explain the sources and principles used to compile these resources and compare the corpus of written Latgalian (MuLa) with corpora of other lesser-used languages of Europe. We will also share our experience of using these resources in research and education, as well as in raising language awareness among speakers and everyone interested in Latgalian. In the concluding part, we will also point out the challenges.
In the second part (workshop), participants will explore and work hands-on with examples from digital platforms and tools such as www.lingvistiskakarte.lv, www.tavaklase.lv, www.futureofmuseums.eu, https://iepazisimies.rta.lv/, the Latgalian vowel pronunciation trainer and the digital e-book “Minority Languages” (2023). Participants will work through the unit on Latgalian via interactive reading, fill-in tasks and videos, and thereby learn about the language, culture and Latgalian community living in Latvia and abroad. Other units of the book allow participants to explore other endangered languages of Europe (South Saami, West Frisian or Mirandese) independently later on.
Dr Sanita Martena is a Professor of Applied Linguistics. Her research interests include language and educational policies, multilingualism, lesser-used languages, Latgalian language corpora, language education planning in schools, and family language policy.
Dr Ilga Šuplinska is a literary scholar, Head of the Latgalian Culture Society, and the concept and content creator for educational computer games and other digital resources. Her research interests include approaches and methods in literary studies and education, gender issues, textual reception, and Latgalian cultural concepts and language.
Dr Antra Kļavinska is a lead researcher and docent in Baltic linguistics at Rēzekne Academy of Technologies (Latvia). Her research interests and academic work relate to ethnolinguistics, onomastics, corpus linguistics and the promotion of corpus literacy.