Tutorials

Six tutorials are offered online on Monday, July 26, at two different times. The registration fee is a flat 30 EUR (reductions apply for participants from lower-income countries), regardless of whether you participate in one or two tutorials.

Starting at 10:00 UTC+2 | Instructors | Duration
From NLP to CSS: Practical Guide to Transformers | Christopher Klamm, Moritz Laurer & Elliott Ash | 5 h
Understanding Group Membership from Language Use | Elahe Naserianhanzaei & Miriam Koschate-Reis | 3 h
Drafting and Publishing Scientific Articles with R Markdown | Julia Schulte-Cloos | 1.5 h

Starting at 16:00 UTC+2 | Instructors | Duration
PyNetworkshop: Hands-on Tutorial on Analyzing Social Network Data in Jupyter Python | Rezvaneh Rezapour, Samin Aref, Ly Dinh & Jana Diesner | 6 h
Dealing with Bias and Fairness in Data Science Systems | Kit Rodolfa, Pedro Saleiro & Rayid Ghani | 3 h
Transparent CSS Research: Post-Project Sharing of Human Participant Data | Dessi Kirilova, Diana Kapiszewski & Colin Elman | 3 h

From NLP to CSS: Practical Guide to Transformers

Christopher Klamm, Moritz Laurer & Elliott Ash

‘Transformers’ have revolutionised Natural Language Processing (NLP) since 2018. From text classification to text summarization to translation, the state of the art for most NLP tasks today is dominated by this new type of deep learning model architecture. This tutorial will introduce participants to this new type of model (Part I), demonstrate how thousands of pre-trained models can easily be used with only five (!) lines of code (Part II), outline how they can be used as powerful measurement tools in social science using concrete examples (Part III), and dive deeper into the technical ingredients of transformers (Part IV). A key part of the tutorial will be the HuggingFace Transformers library and its open-source community. We hope to spark the text-as-data community's enthusiasm for contributing to and benefiting from open-source transformers, and to create closer ties with the NLP community.
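To give a flavour of the five-lines-of-code claim, here is a minimal sketch using the Hugging Face pipeline interface; the zero-shot classification task, default model, and example sentence are illustrative assumptions, not taken from the tutorial materials.

```python
# Minimal sketch of the Hugging Face 'pipeline' interface (task and example are assumptions).
from transformers import pipeline

# Loads a default pre-trained model for zero-shot classification on first use.
classifier = pipeline("zero-shot-classification")

result = classifier(
    "The government announced new emission targets for 2030.",
    candidate_labels=["environment", "economy", "health"],
)
# For a single input, the pipeline returns a dict with ranked labels and scores.
print(result["labels"][0], round(result["scores"][0], 3))
```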


Understanding Group Membership from Language Use

Elahe Naserianhanzaei & Miriam Koschate-Reis

Groups are at the heart of much of our social lives and are therefore fundamental to the social sciences and humanities. Different groups affect individuals at different times through their norms and values. Computational techniques, and linguistic analysis in particular, give researchers the means to understand which group affects an individual at a particular point in time, and what a group stands for. This tutorial will introduce a toolkit for building a classifier, based on linguistic indicators, that can distinguish which group membership affects an individual at the point of writing. First, it will provide the required theoretical background in social identity theory and outline potential applications of the toolkit. Second, it will provide hands-on examples of how to adapt the method to different groups of interest. Third, it will introduce best practices for ensuring the quality of the tool, such as combining social media data and experimental data for psychometric validation. The tutorial will also cover techniques for applying the tool to different domains (domain adaptation) and, finally, discuss how to map similarities and differences between groups using multidimensional scaling. Basic experience with Python and a good working knowledge of basic statistical techniques (e.g., logistic regression) will be an advantage, particularly for the hands-on part of the tutorial.
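As a rough illustration of the kind of classifier involved (a generic sketch, not the presenters' toolkit), a logistic regression over simple word-count features can be set up in a few lines of scikit-learn; the example texts, identity labels, and feature choice are all invented for illustration.

```python
# Minimal sketch (not the tutorial's actual toolkit): a logistic-regression classifier
# that predicts which group identity is salient at the time of writing, using simple
# word-count features. The example texts and labels are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

texts = [
    "Our lab meeting ran late again, but the results look promising.",
    "Proud to cheer for the team at the stadium this weekend!",
    "Reviewing the manuscript before the conference deadline.",
    "Match day with the supporters club, scarves and chants all round.",
]
labels = ["work", "fan", "work", "fan"]  # which identity is salient in each post

model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
scores = cross_val_score(model, texts, labels, cv=2)
print("cross-validated accuracy:", scores.mean())
```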


Drafting and Publishing Scientific Articles with R Markdown

Julia Schulte-Cloos

This hands-on tutorial provides participants with an in-depth understanding of how to build on R Markdown to make their research workflows fully reproducible. Acknowledging the diversity of programming languages (e.g., R, Python, SQL) and types of research outputs (e.g., manuscripts, blog posts) that are common in computational social science, the tutorial showcases an integrated and automated R Markdown research project workflow. It familiarizes participants with the basic logic of R Markdown for research outputs, while extending this basic logic for scientific use cases by relying on the power of Pandoc and Lua.


PyNetworkshop: Hands-on Tutorial on Analyzing Social Network Data in Jupyter Python

Rezvaneh Rezapour, Samin Aref, Ly Dinh & Jana Diesner

PyNetworkshop is a hands-on tutorial on using network libraries in Jupyter for analyzing the structure of social networks. Social network analysis is a long-standing methods toolbox for examining the structure of relations between social entities, which can represent individuals, groups, or organizations, among other entity types. After covering general preliminaries and essentials, this tutorial focuses on methods for analyzing the structure of signed directed networks. Existing network metrics and models are flexible in that they can detect structural dynamics at three fundamental levels of analysis, namely the micro, meso, and macro levels of networks. While several open-source tools for analyzing networks are available for Python, there is a need for a pipeline that guides scholars through a multilevel analysis of networks. This tutorial is based on recent methodological advancements at the intersection of social network analysis and graph optimization. The intended audience is researchers who use networks or plan to start using networks in their work. We do not assume any prior knowledge other than a basic level of mathematics and basic familiarity with Python in Jupyter (being able to run “Hello World!” in a Jupyter notebook).
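As a taste of this kind of analysis, here is a minimal sketch that builds a small signed directed network and computes a few micro- and macro-level descriptives; NetworkX is assumed as the library and the edge list is invented, so this does not reproduce the tutorial's actual pipeline.

```python
# Minimal sketch: a small signed directed network with a few micro- and macro-level
# descriptives. NetworkX is assumed here as one of the open-source Python network
# libraries; the edge list is invented for illustration.
import networkx as nx

edges = [
    ("alice", "bob", +1),
    ("bob", "carol", -1),
    ("carol", "alice", +1),
    ("alice", "dave", -1),
]

G = nx.DiGraph()
for source, target, sign in edges:
    G.add_edge(source, target, sign=sign)

# Micro level: node degrees.
print(dict(G.in_degree()), dict(G.out_degree()))

# Macro level: density and the share of positive ties.
positive = sum(1 for _, _, d in G.edges(data=True) if d["sign"] > 0)
print("density:", nx.density(G))
print("positive tie share:", positive / G.number_of_edges())
```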


Dealing with Bias and Fairness in Data Science Systems

Kit Rodolfa, Pedro Saleiro & Rayid Ghani

Tackling issues of bias and fairness when building and deploying data science systems has received increased attention from the research community in recent years, yet much of this research has focused on theoretical aspects with a very limited set of application areas and data sets. Today, treating bias and fairness as primary metrics of interest, and building, selecting, and validating models using those metrics, is not standard practice for data scientists. There is a lack of 1) practical training materials, 2) methodologies, and 3) tools for researchers and developers working on real-world algorithmic decision-making systems to deal with issues of bias and fairness. In this tutorial we will try to bridge the gap between research and practice by diving deep into algorithmic fairness, from metrics and definitions to practical case studies, including bias audits with the Aequitas toolkit. By the end of this hands-on tutorial, the audience will be comfortable thinking through issues of bias in a practical project, selecting appropriate metrics, and applying bias audit and mitigation frameworks and tools to inform design decisions when developing a real-world data science system. This tutorial is aimed at data scientists and practitioners from both the public and private sectors, and at PhD students. Prerequisites: basic knowledge of classification models and evaluation methodologies.
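To make the idea of a bias audit concrete without misstating the Aequitas API, here is a generic sketch of the kind of group-level error-rate comparison such an audit performs; the column names, data, and reference-group choice are invented for illustration.

```python
# Generic sketch of a group-level bias audit of the kind the Aequitas toolkit automates.
# Aequitas's own API is deliberately not reproduced here; all column names and data are invented.
import pandas as pd

df = pd.DataFrame({
    "score":       [1, 0, 1, 1, 0, 1, 0, 0],   # model decision (1 = flagged for intervention)
    "label_value": [1, 0, 0, 1, 0, 1, 1, 0],   # observed outcome (ground truth)
    "group":       ["A", "A", "A", "A", "B", "B", "B", "B"],  # protected attribute
})

# False positive rate per group: share of true negatives that the model flags.
negatives = df[df["label_value"] == 0]
fpr = negatives.groupby("group")["score"].mean()
print(fpr)

# Disparity relative to a reference group ("A" is an arbitrary choice here).
print(fpr / fpr.loc["A"])
```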


Transparent CSS Research: Post-Project Sharing of Human Participant Data

Dessi Kirilova, Diana Kapiszewski & Colin Elman

Data generated through human encounters can be richly informative. Sharing such data at the article-publishing stage allows researchers to inform readers about the empirical basis of their published findings and allows other scholars to use those data to answer new questions. Over the last few decades, socio-technical infrastructure has been developed that empowers researchers both to protect their human participants and to share with the scholarly community some of the information those participants conveyed. In this tutorial, you will be encouraged to discuss the challenges you have faced, or anticipate facing, in sharing the data that your interactions with human participants generate. You will also be invited to discuss how you have approached mitigating those challenges, and how you and other CSS scholars could work to mitigate them. The conveners will also share recommended practices and recent ideas from the perspective of social science data repositories.