Quantitative Text Analysis I

Prof. Sagarzazu's extensive use of illustrative examples was extremely helpful. They helped me learn and understand the different methods as well as how to apply them in practice. (Participant from South Korea)

This course provides participants with an introduction to quantitative text analysis methods that allow them to systematically extract information from texts. The course starts with a brief introduction to more traditional approaches before quickly moving on to more recent methodological advances that treat words as data. After covering important concepts in content analysis, such as content validity and inter-coder reliability, the course takes a closer look at manual hand-coding approaches. From there, it turns to computer-assisted, dictionary-based text analysis techniques. The second half of the course focuses on sentiment analysis and the scaling technique Wordscores, two content analysis approaches that allow social scientists to automatically extract information from text. The course combines theoretical sessions with practical exercises that allow participants to immediately practice and apply their newly acquired skills.

This course is the first part in a two-course sequence. Part two (cf. Quantitative Text Analysis II) covers more advanced topics, such as unsupervised scaling and topic coding.


This course was offered in 2017 and 2018.


Iñaki Sagarzazu, Texas Tech University

Detailed Description

This course provides participants with an applied introduction to basic methods of quantitative text analysis that allow them to systematically extract information from texts. The course starts with more traditional approaches, such as manual hand-coding, but quickly moves on to recent advances in social science methodology that treat words and text as data.

The course begins with important concepts in content analysis, such as content validity and inter-coder reliability. Afterwards, it takes a closer look at manual hand-coding approaches as employed by many well-known research projects, e.g., the Comparative Manifesto Project, that have relied on human coders to code the content of a wide variety of texts according to a predefined category scheme. From there, the course moves to computer-assisted, dictionary-based text analysis techniques that employ computers to code the content of documents by relying on previously devised codebooks, which assign individual words to specific thematic categories. In the next step, participants are introduced to various refinements of the dictionary approach, such as sentiment analysis and Wordscores. While the former allows for the study of attitudes or emotions in texts, the latter allows social scientists to automatically extract policy positions from political texts, such as election manifestos or speeches.
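To make the two computer-assisted ideas above more concrete, the following is a minimal sketch, written in Python purely for illustration and not taken from the course materials. It assumes a toy codebook (`DICTIONARY`), toy reference texts, and made-up reference positions; all names and values are hypothetical. The first part counts how many words of a document fall into each thematic category. The second part follows the basic Wordscores logic of Laver, Benoit, and Garry (2003): each word receives a score from reference texts with known positions, and a new text is scored as the frequency-weighted mean of its word scores. The rescaling and uncertainty estimates discussed in that article are left out.

```python
import re
from collections import Counter

# Hypothetical toy codebook: thematic category -> words assigned to it.
DICTIONARY = {
    "economy": {"tax", "taxes", "budget", "jobs"},
    "environment": {"climate", "emissions", "pollution"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def code_document(text, dictionary):
    """Count how many tokens of the text fall into each dictionary category."""
    tokens = tokenize(text)
    return {
        category: sum(1 for token in tokens if token in words)
        for category, words in dictionary.items()
    }


def relative_frequencies(tokens):
    """Return each word's share of all tokens in the list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def word_scores(reference_texts, reference_positions):
    """Score each word by the reference positions, weighted by the
    probability of being in each reference text given the word."""
    freqs = {
        name: relative_frequencies(tokenize(text))
        for name, text in reference_texts.items()
    }
    vocabulary = set().union(*freqs.values())
    scores = {}
    for word in vocabulary:
        total = sum(f.get(word, 0.0) for f in freqs.values())
        scores[word] = sum(
            f.get(word, 0.0) / total * reference_positions[name]
            for name, f in freqs.items()
        )
    return scores


def score_text(text, scores):
    """Score a new text as the frequency-weighted mean of its word scores.
    Words that do not occur in any reference text are ignored."""
    tokens = [token for token in tokenize(text) if token in scores]
    if not tokens:
        return None
    return sum(f * scores[w] for w, f in relative_frequencies(tokens).items())


# Dictionary coding of a made-up sentence.
speech = "We will cut taxes, protect jobs and reduce emissions."
print(code_document(speech, DICTIONARY))  # {'economy': 2, 'environment': 1}

# Wordscores with two made-up reference texts at positions -1 and +1.
references = {"left": "welfare welfare spending", "right": "tax cuts cuts"}
positions = {"left": -1.0, "right": 1.0}
scores = word_scores(references, positions)
print(score_text("welfare and tax cuts", scores))
```

Sentiment analysis can be sketched with the same `code_document` function by using categories such as "positive" and "negative" in place of thematic ones. In applied work, the codebook and the reference texts would come from validated sources rather than from toy examples like these.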

This is an applied course for beginners and intermediate users of content analysis. It gives participants an overview of the theoretical foundations of quantitative text analysis, but it is mainly practical, so that participants learn how to use these methods in their own research. It combines theoretical sessions with practical exercises that allow participants to immediately apply the presented techniques.

This course is the first part in a two-course sequence. More advanced techniques of unsupervised scaling and topic coding are covered by the advanced Quantitative Text Analysis II course.


While there are no formal prerequisites, it would be beneficial if participants had some experience with the statistical software R and were familiar with basic statistical concepts. However, participants unfamiliar with these concepts and tools will still be able to participate effectively in the course.


Participants are expected to bring a WiFi-enabled laptop computer. Access to data, temporary licenses for the course software, and installation support will be provided by the Methods School.

Core Readings

Krippendorff, Klaus H. 2013. Content Analysis: An Introduction to Its Methodology. 3rd edition. Thousand Oaks, CA: Sage Publications.

Klüver, Heike. 2009. Measuring Interest Group Influence Using Quantitative Text Analysis. European Union Politics 10: 535–549.

Laver, Michael, and John Garry. 2000. Estimating Policy Positions from Political Texts. American Journal of Political Science 44: 619–634.

Hu, Minqing, and Bing Liu. 2004. Mining and Summarizing Customer Reviews. Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 168–177.

Hart, Roderick P., and Jay P. Childers. 2005. The Evolution of Candidate Bush. A Rhetorical Analysis. American Behavioral Scientist 49: 180–197.

Laver, Michael, Kenneth Benoit, and John Garry. 2003. Extracting Policy Positions from Political Texts Using Words as Data. American Political Science Review 97: 311–331.

Klemmensen, Robert, Sara Binzer Hobolt, and Martin E. Hansen. 2007. Estimating Policy Positions Using Political Texts: An Evaluation of the Wordscores Approach. Electoral Studies 26: 746–755.

Benoit, Kenneth, and Michael Laver. 2003. Estimating Irish Party Policy Positions Using Computer Wordscoring. Irish Political Studies 18: 97–107.

Suggested Readings

Neuendorf, Kimberly A. 2002. The Content Analysis Guidebook. Thousand Oaks, CA: Sage Publications.

Alexa, Melina, and Cornelia Züll. 2000. Text Analysis Software: Commonalities, Differences and Limitations: The Results of a Review. Quality and Quantity 34: 299–321.

Feinerer, Ingo, Kurt Hornik, and David Meyer. 2008. Text Mining Infrastructure in R. Journal of Statistical Software 25: 1–54.

Lowe, Will, Kenneth Benoit, Slava Mikhaylov, and Michael Laver. 2011. Scaling Policy Preferences from Coded Political Texts. Legislative Studies Quarterly 36: 123–155.

Veen, Tim. 2011. Positions and Salience in European Union Politics: Estimation and Validation of a New Dataset. European Union Politics 12: 267–288.