Workshops
Workshop 1 (Python).
Language Processing and Visualization with Python I
Uldis Bojārs
Valdis Saulespurēns
July 26
- Using Jupyter Notebooks for interactive coding
- Basics of the Python programming language: Python data structures, program flow, basic functions
- Importing data in Python
- Text pre-processing, tokenization, lemmatization
Valdis Saulespurēns is a programming instructor at Riga Coding School. He teaches Python and JavaScript, among other skills, to adult professionals who are new to programming. Valdis specializes in Data Analysis and Web Scraping. He enjoys wrangling unruly data into structured knowledge. Valdis has over 20 years of programming experience. He wrote his first professional programs for quantum scientists at the University of California at Santa Barbara. Prior to teaching, he wrote software for a radio broadcast equipment manufacturer. He holds a Master's degree in Computer Science from the University of Latvia. When not spending time with his family, Valdis likes to bike and play chess, sometimes at the same time.
Workshop 2 (Python).
Language Processing and Visualization with Python II
Uldis Bojārs
Valdis Saulespurēns
July 27
- Analyzing data, text analysis
- Elements of machine learning
- Data export and visualization
Workshop 1 (R).
Basics of R for Data Analysis and Visualization
Importing, exploring and visualizing multilingual data in R
Andres Karjus
July 26
Andres Karjus is a computational linguist and computational humanist (PhD, University of Edinburgh 2020). He studies language and culture using a combination of text corpora, computational simulations and human experiments. All these approaches produce lots of information, usually too much to analyze qualitatively. This is where careful application of machine learning and rigorous statistical modelling can help to make sense of (and make predictions based on) the data. He is currently engaged in a number of projects on changes in language in social and traditional media, on quantifying visual art complexity, and on the dynamics of television and film festival programming.
Workshop 2 (R).
Basics of R for Data Analysis and Visualization
Working with (non-English) text in R
Andres Karjus
July 27
Workshop 3.
Web Data Harvesting
Marija Isupova
July 28
Workshop 4.
Network Visualisation – Sense-Making Through Design and Aesthetics
Noemi Chow, Fidel Thomet
July 29
Noemi Chow is a scientific illustrator. In her research work in the field of Knowledge Visualisation at the Zurich University of the Arts, she deals with mediation through visualisation, immersive and three-dimensional media.
Fidel Thomet is an interaction designer working on data visualisation, investigative interfaces, and speculative futures. He is a research associate at uclab, a visualisation research group situated between design, computing, and the humanities at the University of Applied Sciences Potsdam. Before that, he worked for the City of Zurich’s statistical office and the Aargauer Kunsthaus, and as a Google News Lab Fellow for the Frankfurter Allgemeine Zeitung.
Lectures
Machine Learning to Read Yesterday’s News. How Semantic Enrichments Enhance the Study of Digitised Historical Newspapers
Marten Düring
July 26
Newspapers count among the most attractive sources for historical research. Following mass digitisation efforts over the past decades, researchers now face an overabundance of materials that can no longer be managed with keyword search and basic content filtering techniques alone, even though only a fraction of the overall archival record has actually been made available. This poses challenges for the contextualisation and critical assessment of these sources, which can be effectively addressed using semantic enrichments based on natural language processing techniques. In this lecture, we will discuss epistemological challenges in data exploration and interface design as well as opportunities in terms of source criticism and content exploration, based on the impresso interface.
Marten Düring is an Assistant Professor in Digital History at the Luxembourg Centre for Contemporary and Digital History (C²DH) and holds a PhD in contemporary history. His research is positioned at the intersection of historical thinking, novel computational methods, and software design. In his ongoing work, Prof. Düring coordinates the C²DH-based team of the impresso project for the exploration of semantically enriched historical newspapers, works as a founding editor on the Journal of Historical Network Research, coordinates the Historical Network Research Community, and contributes to the DHARPA project.
Fine-Tuning the Historian's Macroscope: Data Reuse and Medieval Korean Biographical Records in Neo4j
Javier Cha
July 27
Javier Cha is Associate Professor at Seoul National University and the principal investigator of the Big Data Studies Lab, which approaches data centers and the global telecommunications infrastructure similarly to how a medieval book historian would explore the material bibliography of manuscripts and libraries. As an intellectual historian of medieval Korea and a technologist, Cha has been active in the digital humanities community for fourteen years. Cha is the recipient of the prestigious Innovative and Pioneering Research Scheme, which provides financial support for his digital humanities research lab. He serves on the editorial boards of the International Journal of Humanities and Arts Computing and Cursor Mundi as well as the international nominations committee for Digital Humanities Awards.
Digital History Between Measuring and Interpreting
Jani Marjanen
July 27
Working with digitized data is liberating. We can suddenly do things that would have been either impossible or too time-consuming before. Still, in our liberated state, we need to be extra careful in thinking about what our new methods actually measure and how we can interpret those results. In this talk, I will present three case studies that use historical digitized newspapers to make historical arguments. The first of them uses topic models, the second is based on word embeddings, and the third on simple bigram counts. Through these cases, I will discuss the transparency of different methods, and how they make it more or less difficult to communicate to a reader what is being measured and where humanistic interpretation starts. I will argue that machine-learning methods are sometimes better for exploration and for identifying themes for qualitative analysis, whereas count-based methods can be more useful for analyzing quantitative trends in data.
Dr. Jani Marjanen is a senior researcher at the University of Helsinki. He is a historian specializing in the history of patriotism and nationalism, the history of ideology, the history of newspapers and book printing, and the theory and methodology of conceptual history. In his work, he combines traditional historical inquiry with new methods from the digital humanities. A list of his publications can be found here: https://researchportal.helsinki.fi/en/persons/jani-marjanen/publications/
There is No Journalism Without Data Journalism
Raivis Vilūns
July 28