Workshops

Workshop 1-A (Python Beginners).

Python Essentials and Text Analysis with NLTK

Uldis Bojārs

In this introductory session, students will become acquainted with the versatile Jupyter Notebooks platform for Python programming. We will begin by exploring Python's fundamental data structures, including lists, dictionaries, and sets. Next, we will turn to text analysis, learning how to read, filter, and convert text files and how to work with folders. An introduction to the Natural Language Toolkit (NLTK) library will give students hands-on experience in processing and analyzing textual data. We will also explore how ChatGPT can help with programming tasks. The first day will provide a strong foundation in Python and set the stage for more advanced topics on Day 2.
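The reading, filtering, and counting steps described above can be sketched with Python's standard library alone. This is a minimal illustration rather than workshop material: the stopword set and the folder name `texts` are hypothetical, and the regular-expression tokenizer is a simplified stand-in for NLTK's tokenizers (such as `nltk.word_tokenize`), which handle punctuation and contractions more carefully. It also shows two of the data structures mentioned above in use: a set for fast stopword lookup and a `Counter` (a dictionary subclass) for word frequencies.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical stopword set; NLTK provides fuller lists via nltk.corpus.stopwords.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}


def tokenize(text):
    """Lowercase text and return runs of letters (Unicode-aware, so ā, š, ž are kept)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def read_texts(folder):
    """Yield the contents of every .txt file in a folder, in file-name order."""
    for path in sorted(Path(folder).glob("*.txt")):
        yield path.read_text(encoding="utf-8")


def count_words(texts):
    """Count non-stopword tokens across an iterable of strings."""
    counts = Counter()  # a dict subclass mapping word -> count
    for text in texts:
        counts.update(t for t in tokenize(text) if t not in STOPWORDS)
    return counts
```

For a folder of plain-text files, `count_words(read_texts("texts")).most_common(10)` would return the ten most frequent non-stopword words with their counts.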

Uldis Bojārs is a computer scientist interested in the fields of Semantic Web, Open Data and Digital Libraries. He has a Ph.D. in Computer Science from the National University of Ireland, Galway, focusing on the Semantic Web and its applications. Uldis is an Associate Professor at the Faculty of Computing, University of Latvia, and a Data Semantics Development Manager at the National Library of Latvia, where he works on library linked data projects that enhance information accessibility and resource sharing. At the University of Latvia, Uldis teaches the Python programming language, emphasizing its practical applications in diverse areas, including natural language processing.

Workshop 1-B (Python Intermediate).

Introduction to Text Content Analysis and Preprocessing

Valdis Saulespurēns

The first day is dedicated to preparing text corpora for analysis. Participants will practice essential preprocessing techniques, including text cleaning, tokenization, stemming, and lemmatization, using prominent Python libraries such as Pandas and spaCy. The session will also introduce AI assistance, with a focus on Large Language Models (LLMs), as a way to make text analysis workflows more efficient.
This session targets participants with a basic understanding of Python and aims to equip them with the skills needed for digital humanities research. The workshop is structured to show how textual data can be transformed for in-depth analysis and how AI tools can be applied in research projects.
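To make the preprocessing steps above concrete, here is a toy pipeline of cleaning, tokenization, and stemming using only the standard library. It is an illustrative sketch, not the workshop's code: the suffix list is a hypothetical, deliberately crude stand-in for real stemmers such as NLTK's `PorterStemmer`. Lemmatization, which maps words to dictionary forms instead of chopping suffixes, is what spaCy exposes through each token's `lemma_` attribute.

```python
import re

# Hypothetical, English-only suffix list, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def clean(text):
    """Remove HTML-like tags, collapse whitespace, and lowercase."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def tokenize(text):
    """Split cleaned text into runs of letters."""
    return re.findall(r"[^\W\d_]+", text)


def crude_stem(word, min_stem=3):
    """Strip the first matching suffix if at least min_stem letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Clean, tokenize, and stem a raw string."""
    return [crude_stem(t) for t in tokenize(clean(text))]
```

Here `preprocess("<p>Reading  the   printed Books</p>")` gives `["read", "the", "print", "book"]`. Stems need not be real words, though: `crude_stem("houses")` returns `"hous"`, which is one reason lemmatization is often preferred when readable base forms matter.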

Valdis Saulespurēns works as a researcher and developer at the National Library of Latvia. He is also a lecturer at Riga Technical University, where he teaches Python, JavaScript, and other computer science subjects. Valdis specializes in machine learning and data analysis, and he enjoys transforming disordered data into structured knowledge. He has more than 30 years of programming experience and began his professional career writing programs for quantum scientists at the University of California, Santa Barbara. Before moving into teaching, he developed software for a radio broadcast equipment manufacturer. Valdis holds a Master's degree in Computer Science from the University of Latvia. When not working or spending time with his family, Valdis enjoys biking and playing chess, sometimes even at the same time.

Workshop 2-B (Python Intermediate).

Advanced Discourse Analysis, Machine Learning, and Visualization

Valdis Saulespurēns

On the second day, participants will work with intermediate analytical techniques, using Python libraries for topic modeling and sentiment analysis. Participants will explore how to extract themes and gauge sentiment in large text datasets with libraries designed for these tasks. The agenda includes practical exercises with APIs from leading technology companies such as OpenAI and/or Google, offering hands-on experience in using external tools to enrich analysis. This session aims to broaden participants' skills in text analysis and to show how integrating Python with cutting-edge technology platforms can uncover deeper insights in digital humanities research.
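One of the simplest approaches to gauging sentiment is lexicon-based scoring: each word carries a polarity weight and a text's score aggregates the weights it contains. The sketch below uses a tiny hypothetical lexicon and an arbitrary threshold purely for illustration; it is not the workshop's own code. Real lexicon-based tools, such as NLTK's VADER analyzer (`nltk.sentiment.SentimentIntensityAnalyzer`), use far larger lexicons and also handle negation and intensifiers.

```python
import re

# Hypothetical toy lexicon mapping words to polarity weights.
LEXICON = {
    "good": 1.0, "great": 2.0, "love": 2.0,
    "bad": -1.0, "boring": -1.5, "terrible": -2.0,
}


def sentiment_score(text):
    """Average polarity of lexicon words found in text; 0.0 if none occur."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def sentiment_label(score, threshold=0.25):
    """Map a score to a coarse label; the threshold is an arbitrary choice."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```

For example, `sentiment_score("I love this great novel")` is `2.0`, labeled `"positive"`. The toy scorer ignores negation ("not good" still scores as positive), which illustrates why purpose-built libraries are preferable for real datasets.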

Workshop 2-A (Python Beginners).

Data Manipulation and Visualization with Pandas, Matplotlib and Plotly

Uldis Bojārs 

Building upon the foundation established on Day 1, this session will focus on data manipulation and visualization using the powerful Pandas, Matplotlib and Plotly libraries. Students will learn how to load, filter, and transform data using Pandas DataFrames, as well as perform basic statistical analysis. We will then explore data visualization techniques, such as bar charts, scatter plots, and line graphs, by leveraging the capabilities of Matplotlib and Plotly. By the end of Day 2, participants will have gained valuable skills in Python programming, enabling them to analyze and visualize real-world data sets with confidence and ease.

Workshop 3

Rapidly annotating and analyzing textual and visual data with zero-shot LLMs

Andres Karjus

The increasing capabilities of instructable large language models (LLMs) present an unprecedented opportunity to scale up data analytics in the humanities, cultural studies, and social sciences, and to automate qualitative tasks that were previously allocated to human labor. Of particular interest here is the capacity to use LLMs as zero-shot classifiers and inference engines, that is, to have them label data from instructions alone, without task-specific training examples. While classifying texts or images for various properties has long been possible with supervised learning, the need to train such models (or even fine-tune pretrained models) on sufficiently large sets of laboriously labeled examples has arguably hampered their adoption. We will look into recent applications of LLMs to analytics, discuss transparency and replicability, and run some experiments of our own.
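The zero-shot setup can be sketched without committing to any particular model provider: the prompt lists the candidate labels but includes no labeled examples, and each reply is checked against the label set so that malformed answers are flagged rather than silently accepted. This is a minimal sketch under stated assumptions, not the workshop's method: the prompt wording is illustrative, `ask_llm` stands for a hypothetical function wrapping whatever model API is used, and `fake_llm` is a hypothetical offline stub that only makes the example runnable.

```python
from typing import Callable, List, Optional, Sequence


def build_prompt(text: str, labels: Sequence[str]) -> str:
    """Zero-shot prompt: instructions and the label set only, no labeled examples."""
    return (
        "Classify the following text into exactly one of these categories: "
        f"{', '.join(labels)}.\n"
        "Answer with the category name only.\n\n"
        f"Text: {text}"
    )


def parse_label(reply: str, labels: Sequence[str]) -> Optional[str]:
    """Return the label the reply names exactly (ignoring case and a final period), else None."""
    cleaned = reply.strip().rstrip(".").strip().lower()
    for label in labels:
        if label.lower() == cleaned:
            return label
    return None


def classify(texts: Sequence[str], labels: Sequence[str],
             ask_llm: Callable[[str], str]) -> List[Optional[str]]:
    """Annotate each text; ask_llm is a hypothetical prompt -> reply function."""
    return [parse_label(ask_llm(build_prompt(t, labels)), labels) for t in texts]


def fake_llm(prompt: str) -> str:
    """Hypothetical offline stand-in for a real model call."""
    return "Positive." if "wonderful" in prompt else "I am not sure."
```

With the stub, `classify(["A wonderful exhibition.", "Closed on Mondays."], ["positive", "negative", "neutral"], fake_llm)` returns `["positive", None]`; the `None` marks a reply that needs human review. Keeping prompt construction in one function also makes the exact prompt easy to log alongside the model name, which helps with the transparency and replicability concerns mentioned above.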

Andres Karjus is a multidisciplinary scientist with a background in linguistics (University of Edinburgh, University of Tartu) and artificial intelligence (KU Leuven). His research focuses on the study of language, media, and culture, particularly changes over time and the evolutionary processes that drive them. He employs methods ranging from machine learning and statistics to cognitive experiments. In addition to his academic work, Andres also works as an instructor and consultant in digital skills and AI (more at https://andreskarjus.github.io).

Workshop 4

Creating and Analysing Multilingually Comparable Text Corpora

Normunds Grūzītis, Artūrs Znotiņš

In this workshop, participants will explore the process of transforming an unstructured text collection into a grammatically annotated text corpus. Using Python and established multilingual NLP libraries, attendees will learn how to process text documents: segmenting them into sentences, words, and other tokens, lemmatizing the words, and annotating each token with a part-of-speech tag and a syntactic dependency relation following the Universal Dependencies (UD) framework to ensure consistency across languages. The workshop will emphasize the use of UD for uniform annotations, facilitating queries and linguistic analysis across multilingual corpora. Participants will prepare the annotated corpora in a data format suitable for importing, indexing, and querying in an open-source corpus platform such as NoSketch Engine or Korp.
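Annotated output of this kind is commonly stored in CoNLL-U, the tab-separated format used by Universal Dependencies, with ten columns per token (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC) and a blank line after each sentence. The sketch below serializes one Latvian example sentence in that format and also as a simple one-token-per-line "vertical" file of the kind corpus platforms index. The annotations are written by hand for illustration rather than produced by a parser, unused columns are filled with `_`, and the choice and order of attributes in the vertical output is an assumption that must match the target corpus configuration; this is not the workshop's own pipeline.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    form: str
    lemma: str
    upos: str    # Universal POS tag
    head: int    # 1-based index of the syntactic head; 0 marks the root
    deprel: str  # Universal Dependencies relation label


# "Es lasu grāmatu." (Latvian: "I am reading a book"), annotated by hand for illustration.
EXAMPLE = [
    Token("Es", "es", "PRON", 2, "nsubj"),
    Token("lasu", "lasīt", "VERB", 0, "root"),
    Token("grāmatu", "grāmata", "NOUN", 2, "obj"),
    Token(".", ".", "PUNCT", 2, "punct"),
]


def to_conllu(sent_id: str, text: str, tokens: List[Token]) -> str:
    """Serialize one sentence as CoNLL-U: 10 tab-separated columns, '_' for empty ones."""
    lines = [f"# sent_id = {sent_id}", f"# text = {text}"]
    for i, t in enumerate(tokens, start=1):
        cols = [str(i), t.form, t.lemma, t.upos, "_", "_", str(t.head), t.deprel, "_", "_"]
        lines.append("\t".join(cols))
    return "\n".join(lines) + "\n\n"  # a blank line terminates the sentence


def to_vertical(tokens: List[Token]) -> str:
    """One token per line (word, lemma, tag, relation) inside an <s> structure."""
    rows = [f"{t.form}\t{t.lemma}\t{t.upos}\t{t.deprel}" for t in tokens]
    return "\n".join(["<s>", *rows, "</s>"]) + "\n"
```

Because the UD tags and relation labels are shared across languages, the same query (for example, all `obj` dependents of `VERB` tokens) can be run unchanged over corpora in different languages.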

Normunds Grūzītis is an associate professor at the University of Latvia, Faculty of Computing, and a lead researcher at the AI Lab, Institute of Mathematics and Computer Science, University of Latvia. He has 20 years of experience in language technology and digital humanities. He has coordinated several research and innovation projects on natural language processing and the creation of advanced language resources.

Artūrs Znotiņš is a researcher and a lead software engineer at the AI Lab, IMCS, University of Latvia. He is a PhD candidate in computer science. Artūrs has 15 years of experience in language technology and machine learning. He has developed state-of-the-art language models for text and speech processing in Latvian.

Workshop 6

ChatGPT for Humanities Research

Līva Rotkale

This workshop introduces humanities scholars to ChatGPT, an artificial intelligence program adept at generating human-like text responses from user input. The workshop begins with a demonstration detailing the workflow and offering practical tips for using both the free and paid versions of ChatGPT effectively. With a foundational understanding of the tool’s capabilities and limitations, participants will then embark on hands-on tasks exclusively using the free version (GPT-3.5). These tasks encompass a range of activities: formulating precise prompts, generating text summaries, creating reading comprehension questions, translating and paraphrasing texts, analyzing arguments, brainstorming innovative topics, structuring essays, and formatting references. The workshop concludes with a Q&A session, allowing attendees to share their insights and address challenges.

Līva Rotkale is an analytic philosopher who pursues metaphysics, likes logic, and loves Aristotle. With more than ten years of teaching experience at the University of Latvia, Līva has developed and taught courses on logic, metaphysics, ancient philosophy, philosophy of language, meta-ethics, and aesthetics. Her published research is primarily devoted to mereology (the theory of part and whole), specifically in relation to Aristotle’s conception of matter and form, ontology of kinds, and semantic ambiguity. In addition, she has contributed to the Latvian translation and publication of Aristotle’s Rhetoric. Currently, Līva is interested in the opportunities and risks created by large language models and in their usability in humanities research and education.

Workshop 5

Navigating Visual Collections using Image Embeddings

Mar Canet Solà

This practical workshop delves into image embeddings and the Collection Space Navigator (CSN) tool developed by researchers of the CUDAN research group (https://cudan.tlu.ee). The session is designed to acquaint participants with the fundamentals of image embeddings as a starting point for navigating large visual datasets. The workshop will begin with a brief introduction to various visual embeddings, followed by a step-by-step demonstration of how to prepare embeddings for visual datasets. Some time will be dedicated to discussing the features and capabilities of the CSN tool, so that participants know how to use it in their own research projects. Participants are encouraged to bring their own visual datasets (image collections and metadata); if a dataset is large, a subset may be more manageable within the scope of this workshop. For those without their own datasets, we will provide data for the hands-on exercises. Given the novelty of the CSN tool, we greatly value your feedback and would appreciate participants completing a brief survey about their experience with the tool. Participants should bring their laptops and download the open-source CSN tool (https://github.com/Collection-Space-Navigator/CSN) ahead of the workshop.
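At its core, navigating a collection by embeddings means comparing vectors: images whose embedding vectors point in similar directions are treated as visually similar. The sketch below ranks images by cosine similarity to a query image. It is an illustration of the general idea, not CSN's own code: the file names and four-dimensional vectors are made up, whereas real image embeddings (for example, from a CLIP-style model) typically have hundreds of dimensions.

```python
import math

# Hypothetical toy embeddings: image file name -> vector.
EMBEDDINGS = {
    "portrait_01.jpg": [0.9, 0.1, 0.0, 0.2],
    "portrait_02.jpg": [0.8, 0.2, 0.1, 0.3],
    "landscape_01.jpg": [0.1, 0.9, 0.4, 0.0],
    "landscape_02.jpg": [0.0, 0.8, 0.5, 0.1],
}


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, embeddings, k=2):
    """Return the k images most similar to the query image, most similar first."""
    q = embeddings[query]
    scored = [(cosine(q, v), name) for name, v in embeddings.items() if name != query]
    return [name for _, name in sorted(scored, reverse=True)[:k]]
```

Here `nearest("portrait_01.jpg", EMBEDDINGS, k=1)` returns `["portrait_02.jpg"]`. Cosine similarity ignores vector length and compares only direction, which is a common choice for embedding spaces; for very large collections, approximate nearest-neighbor indexes replace this exhaustive comparison.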

Lectures

Artificial Intelligence at the National Library of Norway

Javier de la Rosa

The integration of Artificial Intelligence technologies has profoundly affected many sectors, including libraries and cultural institutions. The National Library of Norway (Nasjonalbiblioteket) stands at the forefront of this digital transformation, leveraging AI to revolutionize its operations, services, and user experiences. Through real-world examples, we will look at innovative applications of AI at Norway's national repository of knowledge and cultural heritage, from enhancing cataloging to improving accessibility. The library's pioneering work includes developing AI models for the languages of Norway, including Sámi, ensuring accuracy and inclusivity. We hope to showcase AI's transformative potential in advancing cultural preservation and public access.

Javier de la Rosa is a Senior Research Scientist at the Artificial Intelligence Lab at the National Library of Norway. A former Postdoctoral Fellow at the UNED Digital Humanities Innovation Lab, he holds a PhD in Hispanic Studies with a specialisation in Digital Humanities from the University of Western Ontario, and a Master's in Artificial Intelligence from the University of Seville. Javier has previously worked as a Research Engineer at the Stanford University Center for Interdisciplinary Digital Research, and as the Technical Lead at the University of Western Ontario CulturePlex Lab for Cultural Complexity. He is interested in Natural Language Processing applied to historical and literary text, with a special focus on large language models.

Latgalian Language Corpora and Other Digital Resources in the Context of European Lesser-used Languages

Sanita Martena
Ilga Šuplinska
Antra Kļavinska

Corpora of lesser-used languages play an important role in documenting and developing a language, and they are also a valuable resource for preparing linguistic tools and teaching materials. In the first part of the session, we will give insights into the ongoing development of two corpora (written and spoken) and a comprehensive electronic dictionary of Latgalian, a regional language in Eastern Latvia. We will explain the sources and principles used to compile these resources and compare the corpus of written Latgalian (MuLa) with corpora of other lesser-used languages of Europe. We will further share our experience of using these resources in research and education, as well as for raising language awareness among speakers and anyone interested in Latgalian. In the concluding part, we will also outline the challenges.

In the second part (Workshop), we will invite participants to explore and work practically with examples from digital platforms and tools such as www.lingvistiskakarte.lv, www.tavaklase.lv, www.futureofmuseums.eu, https://iepazisimies.rta.lv/, a Latgalian vowel pronunciation trainer, and the digital e-book “Minority Languages” (2023). Participants will explore the unit on Latgalian through interactive reading, gap-filling tasks, and videos, and thereby learn about the language, culture, and Latgalian community living in Latvia and abroad. Other units of the book will later allow participants to explore other endangered languages of Europe independently (South Saami, West Frisian, or Mirandese).

Dr Sanita Martena is a Professor of Applied Linguistics. Her research interests include language and educational policies, multilingualism, lesser-used languages, language corpora of Latgalian, language education planning in schools, and family language policy.

Dr Ilga Šuplinska is a literary scholar, Head of the Latgalian Culture Society, and a concept and content creator for educational computer games and other digital resources. Her research interests include approaches and methods in literary studies and education, gender issues, textual reception, Latgalian cultural concepts, and language.

Dr Antra Kļavinska is a lead researcher and docent in Baltic linguistics at Rēzekne Academy of Technologies (Latvia). Her research interests and academic work are related to ethnolinguistics, onomastics, corpus linguistics, and the promotion of corpus literacy.