Course Description
Data mining has grown rapidly in recent years because it provides powerful tools for extracting meaningful information from a sea of data. However, harnessing the data deluge can be challenging for social scientists because the field is interdisciplinary.
This course covers the entire process of scientific data mining, including data collection, analysis, and visualization. Students will get hands-on Python programming practice on each of these tasks. The overall goal is to equip social science students with the programming skills to complete research projects independently.
Course Completion
Students work in small groups of 2-3 members and are required to complete small data mining projects based on their own research interests.
Background Requirements
This course is open to both undergraduate and graduate students. Students with no programming background are welcome.
Outline
- Introduction
1.1 Beautiful Data and Human Behavior
1.2 An Introduction to Python Programming
- Data Collection
2.1 Connecting to Twitter API
2.2 Scraping Articles from The New York Times
2.3 Processing the Large Dataset of Stack Exchange
2.4 Retrieving Raw Data from Statistical Figures
- Data Analysis
3.1 Analyzing the Sentiment from Street Harassment Stories
3.2 Clustering Countries by National Constitutions
3.3 Determining Influential Papers in Citation Networks
3.4 Measuring the Difficulty of Questions in Q&A Sites
3.5 Discovering the Global Value Chain behind Transaction Networks
3.6 Modeling the Growth of Cities Using Satellite Images
- Data Visualization
4.1 Statistical Figures: Scatter Plot, Histogram, Time Series, Heat Map
4.2 Networks
4.3 Text and Maps
References
Programming Collective Intelligence: Building Smart Web 2.0 Applications
Building Machine Learning Systems with Python
Web Social Science: Concepts, Data and Tools for Social Scientists in the Digital Age