Data mining in social science

2014-08-09 #data mining

Course Description

Data mining has grown rapidly in recent years because it gives people powerful tools for extracting meaningful information from a sea of data. However, harnessing the data deluge can be challenging for social scientists because data mining is interdisciplinary by nature.

This course covers the entire process of scientific data mining, including data collection, analysis, and visualization. Students will get hands-on Python programming practice on each of these tasks. The overall goal is to equip social science students with the programming skills to complete research projects independently.

Course Completion

Students work in small groups of 2-3 members, and each group is required to complete a small data mining project based on its members' own research interests.

Background Requirements

This course is open to both undergraduate and graduate students. Students with no programming background are welcome.

Outline

Introduction

1.1 Beautiful Data and Human Behavior

1.2 An Introduction to Python Programming

Data Collection

2.1 Connecting to the Twitter API

2.2 Scraping Articles from The New York Times

2.3 Processing the Large Dataset of Stack Exchange

2.4 Retrieving Raw Data from Statistical Figures

Data Analysis

3.1 Analyzing the Sentiment from Street Harassment Stories

3.2 Clustering Countries by National Constitutions

3.3 Determining Influential Papers in Citation Networks

3.4 Measuring the Difficulty of Questions in Q&A Sites

3.5 Discovering the Global Value Chain behind Transaction Networks

3.6 Modeling the Growth of Cities Using Satellite Images

Data Visualization

4.1 Statistical Figures: Scatter Plot, Histogram, Time Series, Heat Map

4.2 Networks

4.3 Text and Maps

References

Programming Collective Intelligence: Building Smart Web 2.0 Applications

Building Machine Learning Systems with Python

Web Social Science: Concepts, Data and Tools for Social Scientists in the Digital Age