We will continue to add relevant course descriptions as they are made available. Below you will find some of the core and elective classes that will be offered by our Computation faculty in 2016-17.

## Fall Quarter 2016

**Perspectives on Computational Analysis**

(MACS 30000). *Rick Evans and Benjamin Soltoff, M/W 11:30-12:50 p.m. Weekly Lab Tuesdays 5-5:50 p.m.*

Massive digital traces of human behavior and ubiquitous computation have both extended and altered classical social science inquiry. This course surveys successful social science applications of computational approaches to the representation of complex data, information visualization, and model construction and estimation. We will reexamine the scientific method in the social sciences in the context of both theory development and testing, exploring how computation and digital data enable new answers to classic investigations, the posing of novel questions, and new ethical challenges and opportunities. Students will review fundamental research designs such as observational studies and experiments, statistical summaries, visualization of data, and how computational opportunities can enhance them. The focus of the course is on exploring the wide range of contemporary approaches to computational social science, with practical programming assignments to train with these approaches.

**Computing for the Social Sciences**

(MACS 30500). *Benjamin Soltoff, M/W 1:30-2:50 p.m. Weekly Lab Wednesdays 5-5:50 p.m.*

This is an applied course for social scientists with little programming experience who wish to use computational analysis in their research. After completion of this course, students will be able to write basic programs that fulfill their own research needs. Major topics to be covered include data wrangling, data exploration, functional programming, statistical modeling, and reproducible research. Students will also learn how to parse text files, scrape data from other sources, create and query relational databases, implement parallel processes, and manage digital projects. Class meetings will be a combination of lecture and laboratory sessions, and students will complete weekly programming assignments as well as a final research project. Assignments will be completed primarily using the open-source R and Python programming languages and the version control software Git.

**Computer Science with Applications - 1**

(CAPP 30121). *Anne Rogers and Borja Sotomayor, M/W/F 9-9:50 a.m. Weekly Lab Mondays 6-7:20 p.m.*

This three-quarter sequence teaches computational thinking and skills to students who are majoring in the sciences, mathematics, and economics. Lectures cover topics in (1) programming, such as recursion, abstract data types, and processing data; (2) computer science, such as clustering methods, event-driven simulation, and theory of computation; and to a lesser extent (3) numerical computation, such as approximating functions and their derivatives and integrals, solving systems of linear equations, and simple Monte Carlo techniques. Applications from a wide variety of fields serve both as examples in lectures and as the basis for programming assignments. In recent offerings, students have written programs to evaluate betting strategies, determine the number of machines needed at a polling place, and predict the size of extinct marsupials. Students learn Java, Python, R and C++.

**Linear Algebra**

(MATH 19620). *Instructor TBD, Tu/Th 9-10:20 a.m.*

This course takes a concrete approach to the basic topics of linear algebra. Topics include vector geometry, systems of linear equations, vector spaces, matrices and determinants, and eigenvalue problems.

**Statistical Theory and Methods - 1**

(STAT 24400). *Instructor TBD, T/Th 9-10:20 a.m. PQ: Multivariate calculus. Some previous experience with statistics and/or probability helpful but not required.*

This course is the first quarter of a two-quarter systematic introduction to the principles and techniques of statistics, as well as to practical considerations in the analysis of data, with emphasis on the analysis of experimental data. This course covers tools from probability and the elements of statistical theory. Topics include the definitions of probability and random variables, binomial and other discrete probability distributions, normal and other continuous probability distributions, joint probability distributions and the transformation of random variables, principles of inference (including Bayesian inference), maximum likelihood estimation, hypothesis testing and confidence intervals, likelihood ratio tests, multinomial distributions, and chi-square tests. Examples are drawn from the social, physical, and biological sciences. The coverage of topics in probability is limited and brief, so students who have taken a course in probability find reinforcement rather than redundancy. Students who have already taken STAT 25100 may choose to take STAT 24410 (if offered) instead of STAT 24400.

**Analysis in Rn I**

(MATH 20300). *Instructor TBD, M/W/F 10:30-11:20. PQ: MATH 16300 or MATH 15910 or MATH 15900 or MATH 19900.*

For students concentrating in Computational Economics who need exposure to real analysis. Students must be proficient in linear algebra. This course covers the construction of the real numbers, the topology of R^n including the Bolzano-Weierstrass and Heine-Borel theorems, and a detailed treatment of abstract metric spaces, including convergence and completeness, compact sets, continuous mappings, and more.

**Mathematical Methods for Biological Sciences - 1**

(PSYC 36210). *Dmitry Kondrashov, T/Th 1:30-2:50. Weekly Lab Fridays 3-4:50 p.m.*

This course builds on the introduction to modeling course biology students take in the first year (BIOS 20151 or 152). It begins with a review of one-variable ordinary differential equations as models for biological processes changing with time, and proceeds to develop basic dynamical systems theory. Analytic skills include stability analysis, phase portraits, limit cycles, and bifurcations. Linear algebra concepts are introduced and developed, and Fourier methods are applied to data analysis. The methods are applied to diverse areas of biology, such as ecology, neuroscience, regulatory networks, and molecular structure. Students learn computational methods to implement the models in MATLAB.

**Introduction to Spatial Data Science**

(MACS 54000). *Luc Anselin, M/W 1:30-2:50 p.m.*

Spatial data science is an evolving field that can be thought of as a collection of concepts and methods drawn from both statistics and computer science. These techniques deal with accessing, transforming, manipulating, visualizing, exploring and reasoning about data where the locational component is important (spatial data). The course introduces the types of spatial data relevant in social science inquiry and reviews a range of methods to explore these data. The types of data considered include observations at the point level (e.g., locations of crimes, commercial establishments, traffic accidents), data gathered for aggregate units, such as census tracts or counties (e.g., unemployment rates, disease rates by area, crime rates), and data measured at spatially located sampling points (such as air quality monitoring stations and urban sensors). Specific topics covered include the implementation of formal spatial data structures, geovisualization and visual analytics, spatial autocorrelation analysis, variogram analysis, cluster detection, regionalization, point pattern analysis and spatial data mining. An important aspect of the course is to learn and apply open source geospatial software tools, such as R and GeoDa.

**Economic Policy Analysis with Overlapping Generation Models**

(MACS 40000). *Rick Evans, T/Th 10:30-11:50 a.m.*

This course will study economic policy questions ideally addressed by the overlapping generations (OG) dynamic general equilibrium framework. OG models represent a rich class of macroeconomic general equilibrium models that is extremely useful for answering questions in which inequality, demographics, and individual heterogeneity are important. OG models are used extensively by the Joint Committee on Taxation, Congressional Budget Office, and Department of the Treasury. This course will train students to set up and solve OG models. The standard nonlinear global solution method for these models, time path iteration, is a fixed point method that resembles value function iteration but differs from it in significant ways. This course will take students through progressively richer versions of the model, which will include endogenous labor supply, nontrivial demographics, bequests, stochastic income, multiple industries, non-balanced government budget constraint, and household tax structure.

**Nonparametric Inference**

(STAT 37400). *John Lafferty, T/Th 10:30-11:50 a.m.*

Nonparametric inference is about developing statistical methods and models that make weak assumptions. A typical nonparametric approach estimates a nonlinear function from an infinite dimensional space, rather than a linear model from a finite dimensional space. This course gives an introduction to nonparametric inference, with a focus on density estimation, regression, confidence sets, orthogonal functions, random processes, and kernels. The course treats nonparametric methodology and its use, together with theory that explains the statistical properties of the methods.

**Theoretical Neuroscience: Single Neuron Dynamics and Computation**

(CPNS 35510). *Nicolas Brunel, T/Th 9-10:20 a.m.*

This course is the first part of a three-quarter sequence in theoretical/computational neuroscience. It will focus on mathematical models of single neurons. Topics will include: basic biophysical properties of neurons; Hodgkin-Huxley model for action potential generation; 2D models, phase-plane analysis and bifurcations leading to action potential generation; integrate-and-fire-type models; noise; characterization of neuronal activity with stochastic inputs; spatially extended models; models of synaptic currents and synaptic plasticity; unsupervised learning; supervised learning; reinforcement learning.

**Big Data**

(MPCS 53013). *Michael Spertus, time TBD. PQ: core programming, very basic programming skills in Java, and basic Linux IT. Placement exam given on September 14th and 15th. Complete Course Request Form.*

In this course, we will cover both the theory and practice of Big Data. We will use technologies such as HDFS, Kafka, Storm, Cassandra, Pig, Thrift, MapReduce, and more to implement a running Big Data web application correlating all of the weather and flight delay information in the United States over the last decade to explore the relationship between weather and flight performance. To develop a sound understanding of the theory of Big Data, we will use Marz and Warren's Big Data textbook providing a conceptual architecture for Big Data systems. We will also cover important additional topics that invariably arise in real world applications of Big Data, such as budgeting, compliance, etc. Students are required to bring a laptop to class every week.

**Advanced Data Analytics**

(MPCS 53112). *Amitabh Chaudhary, time TBD. PQ: Students should satisfy prerequisites for Machine Learning. Equivalent courses or experience accepted with instructor permission. Further, students are expected to know or teach themselves programming in R and hash tables. Placement exam given on September 14th and 15th. Complete Course Request Form.*

This course explores selected advanced themes in data mining and analytics. These include the recent “model-free” techniques for mining massive datasets, foundations of natural language processing, and time series analysis. Topics include frameworks such as MapReduce; algorithmic ideas such as locality-sensitive hashing, Bloom filters, random walks, and competitive analysis; and applications such as link analysis, social-network analysis, recommendation systems, streaming data, and advertising on the web. In natural language processing, the course introduces fundamentals of language models, text classification, and information retrieval and extraction. In time series analysis, the course examines stationary processes and the ARIMA and GARCH models.

**Algorithms**

(MPCS 55001). *Geraldine Brady, time TBD. PQ: immersion math (MPCS 50103) or placement. Immersion programming (MPCS 50101) or programming waiver, or core programming (MPCS 51036 or 51040), or instructor consent. Placement exam given on September 14th and 15th. Complete Course Request Form.*

The course is an introduction to the design and analysis of efficient algorithms, with emphasis on developing techniques for the design and rigorous analysis of algorithms rather than on implementation. Algorithmic problems include sorting and searching, discrete optimization, and algorithmic graph theory. Design techniques include divide-and-conquer methods, dynamic programming, greedy methods, graph search, as well as the design of efficient data structures. Methods of algorithm analysis include asymptotic notation, evaluation of recurrences, and the concepts of polynomial-time algorithms. NP-completeness is introduced toward the end of the course. Students who complete the course will have demonstrated the ability to use divide-and-conquer methods, dynamic programming methods, and greedy methods, when an algorithmic design problem calls for such a method. They will have learned the design strategies employed by the major sorting algorithms and the major graph algorithms, and will have demonstrated the ability to use these design strategies or modify such algorithms to solve algorithmic problems when appropriate. They will have derived and solved recurrences describing the performance of divide-and-conquer algorithms, have analyzed the time and space complexity of dynamic programming algorithms, and have analyzed the efficiency of the major graph algorithms, using asymptotic analysis.

**Databases**

(MPCS 53001). *Zachary Freeman, time TBD. PQ: core programming (completed or currently enrolled). Placement exam given on September 14th and 15th. Complete Course Request Form.*

In this course students will learn database design and development and will build a simple but complete web application powered by a relational database. We start by gathering requirements and showing how to model a relational database using an Entity-Relationship Diagram (ERD). Concepts covered include entity sets and relationships, using keys as a unique identifier for each object in an entity set, one-one, many-one, and many-many relationships, as well as translational rules from conceptual modeling (ERD) to relational table definitions. We will examine the relational model and functional dependencies along with their application to the methods for improving database design: normal forms and normalization. After designing and modeling their database, students will learn the universal language of relational databases: SQL (Structured Query Language). We will first introduce relational algebra, the theoretical foundation of SQL, and then examine in detail the two main aspects of SQL: data definition language (DDL) and data manipulation language (DML). Concepts covered include subqueries, aggregation, various types of joins, functions, triggers and stored procedures. Students will then learn about web connectivity, as they build a simple front-end for their application in order to interact with their database online. Finally, we will provide an overview of related topics such as data warehousing, big data, NoSQL and NewSQL databases.

**Computational Social Science Workshop**

(MACS 50000). *James Evans, Th 5-6:30 p.m. Saieh 247. PQ: Computation students must register for an R. Other faculty and graduate students welcome.*

High performance and cloud computing, massive digital traces of human behavior from ubiquitous sensors, and a growing suite of efficient model estimation, machine learning and simulation tools are not just extending classical social science inquiry, but transforming it to pose novel questions at larger and smaller scales. The Computational Social Science (CSS) Workshop is a weekly event that features this work, highlights associated skills and data, and explores the use of CSS in the world. The CSS Workshop alternates weekly between research workshops and professional workshops. The research workshops feature new CSS work from top faculty and advanced graduate students from UChicago and around the world, while professional workshops highlight useful skills and data (e.g., machine learning with Python’s scikit-learn; the Twitter firehose API) and showcase practitioners using CSS in the government, industry and nonprofit sectors. Each quarter, the CSS Workshop also hosts a distinguished lecture, debate and dinner, and a student conference.

## Winter Quarter 2017

**Perspectives on Computational Modeling**

(MACS 30100). *Rick Evans and Benjamin Soltoff, M/W 11:30-12:50 p.m. Weekly Lab Wednesdays 5-6 p.m.*

Students are often well trained in the details of specific models relevant to their respective fields. This course presents a generic definition of a model in the social sciences as well as a taxonomy of the wide range of different types of models used. We then cover principles of model building, including static versus dynamic models, linear versus nonlinear, simple versus complicated, and identification versus overfitting. Major types of models implemented in this course include linear and nonlinear regression, machine learning (e.g., parametric, Bayesian and nonparametric), agent-based and structural models. We will also explore the wide range of computational strategies used to estimate models from data and make statistical and causal inferences. Students will study both good examples and bad examples of modeling and estimation and will have the opportunity to build their own model in their field of interest.

**Computing for the Social Sciences**

(MACS 30500). *Benjamin Soltoff, M/W 1:30-2:50 p.m. Weekly Lab Mondays 4:30-5:20 p.m.*

This is an applied course for social scientists with little programming experience who wish to use computational analysis in their research. After completion of this course, students will be able to write basic programs that fulfill their own research needs. Major topics to be covered include data wrangling, data exploration, functional programming, statistical modeling, and reproducible research. Students will also learn how to parse text files, scrape data from other sources, create and query relational databases, implement parallel processes, and manage digital projects. Class meetings will be a combination of lecture and laboratory sessions, and students will complete weekly programming assignments as well as a final research project. Assignments will be completed primarily using the open-source R and Python programming languages and the version control software Git.

**Computer Science with Applications - 2**

(CAPP 30122). *Amitabh Chaudhary, M/W/F 9:30-10:20 a.m. Weekly Lab Tuesday 3-4:20, 4:30-5:50, or 6-7:20 p.m.*

This course is the second in a three-quarter sequence that teaches computational thinking and skills to students in the sciences, mathematics, economics, etc. Lectures cover topics in (1) data representation, (2) basics of relational databases, (3) shell scripting, (4) data analysis algorithms, such as clustering and decision trees, and (5) data structures, such as hash tables and heaps. Applications and datasets from a wide variety of fields serve both as examples in lectures and as the basis for programming assignments. In recent offerings, students have written a course search engine and a system to do speaker identification. Students will program in Python and do a quarter-long programming project.

**Structural Estimation**

(MACS 40200). *Rick Evans, T/Th 12-1:20 p.m.*

Structural estimation refers to the estimation of model parameters by taking a theoretical model directly to the data. (This is in contrast to reduced form estimation, which often entails estimating a linear model that is either explicitly or implicitly a simplified, linear version of a related theoretical model.) This class will survey a range of structural models, then teach students estimation approaches including the generalized method of moments approach and maximum likelihood estimation. We will then examine the strengths and weaknesses of both approaches in a series of examples from the fields of economics, political science, and sociology. We will also learn the simulated method of moments approach. We will explore applications across the social sciences.

**Statistical Theory and Methods - 1**

(STAT 24400). *Instructor TBD, T/Th 9-10:20 a.m. PQ: Multivariate calculus. Some previous experience with statistics and/or probability helpful but not required.*

This course is the first quarter of a two-quarter systematic introduction to the principles and techniques of statistics, as well as to practical considerations in the analysis of data, with emphasis on the analysis of experimental data. This course covers tools from probability and the elements of statistical theory. Topics include the definitions of probability and random variables, binomial and other discrete probability distributions, normal and other continuous probability distributions, joint probability distributions and the transformation of random variables, principles of inference (including Bayesian inference), maximum likelihood estimation, hypothesis testing and confidence intervals, likelihood ratio tests, multinomial distributions, and chi-square tests. Examples are drawn from the social, physical, and biological sciences. The coverage of topics in probability is limited and brief, so students who have taken a course in probability find reinforcement rather than redundancy. Students who have already taken STAT 25100 may choose to take STAT 24410 (if offered) instead of STAT 24400.

**Statistical Theory and Methods - 2**

(STAT 24500). *Instructor TBD, T/Th 9-10:20 a.m. PQ: Multivariate calculus and linear algebra and STAT 24400 or STAT 24410.*

This course is the second quarter of a two-quarter systematic introduction to the principles and techniques of statistics, as well as to practical considerations in the analysis of data, with emphasis on the analysis of experimental data. This course continues from either STAT 24400 or STAT 24410 and covers statistical methodology, including the analysis of variance, regression, correlation, and some multivariate analysis. Some principles of data analysis are introduced, and an attempt is made to present the analysis of variance and regression in a unified framework. Statistical software is used.

**Analysis in Rn I**

(MATH 20300). *Instructor TBD, M/W/F 10:30-11:20. PQ: MATH 16300 or MATH 15910 or MATH 15900 or MATH 19900.*

For students concentrating in Computational Economics who need exposure to real analysis. Students must be proficient in linear algebra. This course covers the construction of the real numbers, the topology of R^n including the Bolzano-Weierstrass and Heine-Borel theorems, and a detailed treatment of abstract metric spaces, including convergence and completeness, compact sets, continuous mappings, and more.

**Analysis in Rn II**

(MATH 20400). *Instructor TBD. M/W/F 10:30-11:20. PQ: MATH 20700 OR MATH 20300 AND MATH 20250 or STAT 24300.*

For students concentrating in Computational Economics who have taken MATH 20300. This course covers differentiation in R^n including partial derivatives, gradients, the total derivative, the Chain Rule, optimization problems, vector-valued functions, and the Inverse and Implicit Function Theorems.

**Introduction to Causal Inference**

(MACS 51000). *Guanglei Hong, Kazuo Yamaguchi, and Fang Yang, Tuesdays 1:30-4:20 p.m.*

This course is designed for graduate students and advanced undergraduate students from the social sciences, education, public health science, public policy, social service administration, and statistics who are involved in quantitative research and are interested in studying causality. The goal of this course is to equip students with basic knowledge of and analytic skills in causal inference. Topics for the course will include the potential outcomes framework for causal inference; experimental and observational studies; identification assumptions for causal parameters; potential pitfalls of using ANCOVA to estimate a causal effect; propensity score based methods including matching, stratification, inverse-probability-of-treatment-weighting (IPTW), marginal mean weighting through stratification (MMWS), and doubly robust estimation; the instrumental variable (IV) method; regression discontinuity design (RDD) including sharp RDD and fuzzy RDD; difference in difference (DID) and generalized DID methods for cross-section and panel data; and fixed effects models. Intermediate Statistics or equivalent such as STAT 224 is a prerequisite. This course is a prerequisite for “Advanced Topics in Causal Inference” and “Mediation, moderation, and spillover effects.”

**Computational Content Analysis**

(MACS 60000). *James Evans. Fridays 1-3:50 p.m.*

A vast expanse of information about what people do, know, think, and feel lies embedded in text, and more of the contemporary social world lives natively within electronic text than ever before. These textual traces range from collective activity on the web, social media, instant messaging and automatically transcribed YouTube videos to online transactions, medical records, digitized libraries and government intelligence. This supply of text has elicited demand for natural language processing and machine learning tools to filter, search, and translate text into valuable data. The course will survey and practically apply many of the most exciting computational approaches to text analysis, highlighting both supervised methods that extend old theories to new data and unsupervised techniques that discover hidden regularities worth theorizing. These will be examined and evaluated on their own merits, and relative to the validity and reliability concerns of classical content analysis, the interpretive concerns of qualitative content analysis, and the interactional concerns of conversation analysis. We will also consider how these approaches can be adapted to content beyond text, including audio, images, and video. We will simultaneously review recent research that uses these approaches to develop social insight by exploring (a) collective attention and reasoning through the content of communication; (b) social relationships through the process of communication; and (c) social states, roles, and moves identified through heterogeneous signals within communication.

The course is structured around gaining an understanding of and experimenting with text analytical tools, deploying those tools and interpreting their output in the context of individual research projects, and assessing contemporary research within this domain. Class discussion and assignments will focus on how to use, interpret, and combine computational techniques in the context of compelling social science research investigations.

**Mathematical Methods for Biological Sciences - 2**

(PSYC 36211). *Dmitry Kondrashov, T/Th 1:30-2:50 p.m. Weekly Lab Fridays 3-5 p.m. PQ: PSYC 36210.*

This course is a continuation of PSYC 36210. The topics start with optimization problems, such as nonlinear least squares fitting, principal component analysis and sequence alignment. Stochastic models are introduced, such as Markov chains, birth-death processes, and diffusion processes, with applications including hidden Markov models, tumor population modeling, and networks of chemical reactions. In computer labs, students learn optimization methods and stochastic algorithms, e.g., Markov chain Monte Carlo and the Gillespie algorithm. Students complete an independent project on a topic of their interest.

**Methods in Computational Neuroscience**

(CPNS 34231). *Sliman Bensmaia, T/Th 3:30-4:50, F 1:30-2:30. Weekly Lab Wednesdays or Fridays 9:30-11:20. PQ: PSYC 36210 and PSYC 36211 which must be taken concurrently, or consent of instructor.*

Topics include (but are not limited to): Hodgkin-Huxley equations, cable theory, single neuron models, information theory, signal detection theory, reverse correlation, relating neural responses to behavior, and rate vs. temporal codes.

**Theoretical Neuroscience: Network Dynamics and Computation**

(CPNS 35520). *Nicolas Brunel, T/Th 9-10:20 a.m.*

This course is the second part of a three-quarter sequence in theoretical/computational neuroscience. It will focus on mathematical models of networks of neurons. Topics will include: firing rate models for populations of neurons; spatially extended firing rate models; models of visual cortex; models of brain networks at different levels; characterization of properties of specific brain networks; models of networks of binary neurons, mean rates, correlations, reductions to rate models; learning in networks of binary neurons, associative memory models; models of networks of spiking neurons: asynchronous vs synchronous states; oscillations in networks of spiking neurons; learning in networks of spiking neurons; models of working memory; models of decision-making.

**Machine Learning**

(CMSC 25400). *Risi Kondor, M/W 3-4:20 p.m.*

This course offers a practical, problem-centered introduction to machine learning. Topics covered include the Perceptron and other online algorithms; boosting; graphical models and message passing; dimensionality reduction and manifold learning; SVMs and other kernel methods; and a short introduction to statistical learning theory. Weekly programming assignments give students the opportunity to try out each learning algorithm on real world datasets.

**Advanced Machine Learning for Public Policy**

(CAPP 30255). *Amitabh Chaudhary, M/W 1:30-2:50 p.m.*

In this course we apply advanced machine learning techniques to application areas common in policy analysis, focusing on text mining and network analysis, and emphasizing computational efficiency and software development. In text mining, we use Markov chains to succinctly represent language models, term frequency and inverse document frequency to find the most relevant documents for information retrieval, regular expressions to extract specific information in a document, and specialized supervised and unsupervised techniques for sentiment analysis. In network analysis we see, e.g., how random walks are used to rank nodes based on their “importance” (Google’s PageRank), how graph clustering helps identify sub-communities in a social network, and how the Apriori algorithm can quickly discover frequently occurring subgraphs. In addition, we study a couple of other advanced machine learning topics based on student interest. Possible choices are optimization, dimensionality reduction, online learning, reinforcement learning, artificial neural networks, and large-scale data mining. A major component of the course is a quarter-long project in which students build a prototype system for solving a real-world problem.

**Machine Learning and Public Policy**

(CAPP 30256). *Rayid Ghani, Tuesdays 2-4:50 p.m.*

This is a hands-on lab course where students will work on real-world policy projects using machine learning and data science methods. The projects will be in collaboration with government agencies and non-profits spanning policy areas including education, health, environment, economic development, public safety, and criminal justice. Students will work in small teams and will meet 1-2 times/week to present updates and discuss challenges. This course will build on materials covered in the Machine Learning and Public Policy course and will be run in parallel to the Advanced Machine Learning for Public Policy course.

**Foundations of Computational Data Analysis**

(MPCS 53110). *Geraldine Brady, Time TBD. PQ: Core Programming. B or better in MPCS 50103 or passing score on math placement exam. Non-MPCS students must meet prerequisites and complete Course Request Form.*

This course covers basic statistics and linear algebra, and programming in R. Topics in statistics include discrete and continuous random variables, discrete and continuous probability distributions, variance, covariance, correlation, sampling and distribution of the mean and standard deviation of a sample, central limit theorem, confidence intervals, maximum likelihood estimators, hypothesis testing, linear and multiple regression. Topics in linear algebra include Gaussian elimination, matrix transpose and matrix inverse, eigenvectors and eigenvalues, singular value decomposition.

**Computational Social Science Workshop**

(MACS 50000). *James Evans, Thursdays 5-6:30 p.m. Saieh 247. PQ: Computation students must register for an R. Other faculty and graduate students welcome.*

High performance and cloud computing, massive digital traces of human behavior from ubiquitous sensors, and a growing suite of efficient model estimation, machine learning and simulation tools are not just extending classical social science inquiry, but transforming it to pose novel questions at larger and smaller scales. The Computational Social Science (CSS) Workshop is a weekly event that features this work, highlights associated skills and data, and explores the use of CSS in the world. The CSS Workshop alternates weekly between research workshops and professional workshops. The research workshops feature new CSS work from top faculty and advanced graduate students from UChicago and around the world, while professional workshops highlight useful skills and data (e.g., machine learning with Python’s scikit-learn; the Twitter firehose API) and showcase practitioners using CSS in the government, industry and nonprofit sectors. Each quarter, the CSS Workshop also hosts a distinguished lecture, debate and dinner, and a student conference.

## Spring Quarter 2017

**Perspectives on Computational Research**

(MACS 30200). *Rick Evans and Benjamin Soltoff, M/W 11:30-12:50 p.m.*

This course focuses on applying computational methods to conducting social scientific research through a student-developed research project. Students will identify a research question of their own interest, collect data, develop, apply, and interpret statistical learning models, and generate a fully reproducible research paper. We will identify how computational methods can be used throughout the research process, from data collection and tidying, to exploration, visualization and modeling, to the final communication of results. The course will include modules on theoretical and practical considerations, including topics such as epistemological questions about research design, identifying data sources, and IRB review.

**Computer Science with Applications-3**

(CAPP 30123). *Matthew Wachs. M/W/F 9:30-10:20 a.m. Weekly Lab Mondays 3-4:20 or 4:30-5:50.*

This three-quarter sequence teaches computational thinking and skills to students who are majoring in the sciences, mathematics, and economics. Lectures cover topics in (1) programming, such as recursion, abstract data types, and processing data; (2) computer science, such as clustering methods, event-driven simulation, and theory of computation; and to a lesser extent (3) numerical computation, such as approximating functions and their derivatives and integrals, solving systems of linear equations, and simple Monte Carlo techniques. Applications from a wide variety of fields serve both as examples in lectures and as the basis for programming assignments. In recent offerings, students have written programs to evaluate betting strategies, determine the number of machines needed at a polling place, and predict the size of extinct marsupials. Students learn Java, Python, R and C++.

**Data Visualization**

(MACS 40700). *Benjamin Soltoff, M/W 1:30-2:50 p.m.*

Social scientists frequently wish to convey information to a broader audience in a cohesive and interpretable manner. Visualizations are an excellent method to summarize information and report analysis and conclusions in a compelling format. This course introduces the theory and applications of data visualization. Students will learn techniques and methods for developing rich, informative and interactive, web-facing visualizations based on principles from graphic design and perceptual psychology. Students will practice these techniques on many types of social science data, including multivariate, temporal, geospatial, text, hierarchical, and network data. These techniques will be developed using a variety of software implementations such as R, ggplot2, D3, and Tableau.

**Statistical Theory and Methods - 2**

(STAT 24500). *Chao Gao, T/Th 9-10:20 a.m. PQ: STAT 24400 w/ grade of B- or better, or STAT 24410, w/ grade of C+ or better; and MATH 19620 or 20250 or 25500 or 25800 or STAT 24300.*

This course is the second quarter of a two-quarter systematic introduction to the principles and techniques of statistics, as well as to practical considerations in the analysis of data, with emphasis on the analysis of experimental data. This course continues from either STAT 24400 or STAT 24410 and covers statistical methodology, including the analysis of variance, regression, correlation, and some multivariate analysis. Some principles of data analysis are introduced, and an attempt is made to present the analysis of variance and regression in a unified framework. Statistical software is used.

**Advanced Topics in Causal Inference**

(MACS 52000). *Guanglei Hong, Kazuo Yamaguchi, and Fang Yang, Tuesdays 1:30-4:20 p.m.*

This course provides an in-depth discussion of selected topics in causal inference that are beyond what is covered in the introduction to causal inference course. The course is intended for graduate students and advanced undergraduate students who have taken the intro course and want to extend their knowledge in causal inference. Topics include (1) alternative matching methods, randomization inference for testing hypotheses, and sensitivity analysis; (2) marginal structural models and structural nested models for time-varying treatment; (3) the Rubin Causal Model (RCM) and Heckman’s scientific model of causality; (4) latent class treatment variable; (5) measurement error in the covariates; (6) the M-estimation for the standard error of the treatment effect for the use of IPW; (7) the local average treatment effect (LATE) and its problems, sensitivity analysis to examine the impact of plausible departure from the IV assumptions, and identification issues of multiple IVs for multiple/one treatments; (8) multilevel data for treatment evaluation for multilevel experimental designs and observational designs, and spillover effects; (9) nonignorable missingness and informative censoring issues.

**Optimization**

(STAT 28000). *Lek-Heng Lim, T/Th 3-4:20 p.m.*

This is an introductory course on optimization that will cover the rudiments of unconstrained and constrained optimization of a real-valued multivariate function. The focus is on the settings where this function is, respectively, linear, quadratic, convex, or differentiable. Time permitting, topics such as nonsmooth, integer, vector, and dynamic optimization may be briefly addressed. Materials will include basic duality theory, optimality conditions, and intractability results, as well as algorithms and applications.

**Analysis in Rn-1**

(MATH 20300). *Daniil Rudenko, M/W/F 11:30-12:20 p.m. PQ: MATH 16300 or MATH 15910 or MATH 15900 or MATH 19900.*

For students concentrating in Computational Economics with no prior exposure to Real Analysis. Both theoretical and problem solving aspects of multivariable calculus are treated carefully. This course covers the construction of the real numbers, the topology of R^n including the Bolzano-Weierstrass and Heine-Borel theorems, and a detailed treatment of abstract metric spaces, including convergence and completeness, compact sets, continuous mappings, and more.

**Analysis in Rn-2**

(MATH 20400). *Marco Mendez Guaraco, M/W/F 11:30-12:20 p.m. PQ: MATH 20700 OR MATH 20300 AND MATH 20250 or STAT 24300.*

For students concentrating in Computational Economics who have taken MATH 20300 or who have prior exposure to Real Analysis. This course covers differentiation in R^n including partial derivatives, gradients, the total derivative, the Chain Rule, optimization problems, vector-valued functions, and the Inverse and Implicit Function Theorems.

**Analysis in Rn-3**

(MATH 20500). *Marco Mendez Guaraco. M/W/F 10:30-11:20 a.m. PQ: MATH 20400 or MATH 20800.*

For students concentrating in Computational Economics with excellent exposure to Real Analysis. This course covers integration in R^n including Fubini's Theorem and iterated integration, line and surface integrals, differential forms, and the theorems of Green, Gauss, and Stokes.

**Spatial Regression Analysis**

(MACS 55000). *Luc Anselin, M/W 1:30-2:50 p.m. PQ: graduate level econometrics or multivariate regression, matrix algebra.*

This course covers statistical and econometric methods specifically geared to the problems of spatial dependence and spatial heterogeneity in cross-sectional data. The main objective of the course is to gain insight into the scope of spatial regression methods, to be able to apply them in an empirical setting, and to properly interpret the results of spatial regression analysis. While the focus is on spatial aspects, the types of methods covered have general validity in statistical practice. The course covers the specification of spatial regression models in order to incorporate spatial dependence and spatial heterogeneity, as well as different estimation methods and specification tests to detect the presence of spatial autocorrelation and spatial heterogeneity. Special attention is paid to the application to spatial models of generic statistical paradigms, such as Maximum Likelihood, Generalized Method of Moments and the Bayesian perspective. An important aspect of the course is the application of open source software tools such as R, GeoDa and PySAL to solve empirical problems.

**Machine Learning and Large Scale Data Analysis**

(CMSC 25025). *John Lafferty, T/Th 1:30-2:50 p.m. Weekly Lab Wednesdays 1:30-2:50, 3-4:20, 4:30-5:50. PQ: CMSC 15400 or CMSC 12200 and STAT 22000 or STAT 23400, or by consent.*

This course is an introduction to machine learning and the analysis of large data sets using distributed computation and storage infrastructure. Basic machine learning methodology and relevant statistical theory will be presented in lectures. Homework exercises will give students hands-on experience with the methods on different types of data. Methods include algorithms for clustering, binary classification, and hierarchical Bayesian modeling. Data types include images, archives of scientific articles, online ad clickthrough logs, and public records of the City of Chicago. Programming will be based on Python and R, but previous exposure to these languages is not assumed.

**Machine Learning for Public Policy**

(CAPP 30254). *Rayid Ghani, T/Th 10:30-11:50 a.m. Weekly Lab Wednesdays 3-4:20, 4:30-5:50. PQ: PPHA 31100 or PPHA 31300 and CAPP 30122 or PPHA 30550.*

This course will be an introduction to machine learning and how it can be applied to public policy problems. It’s designed for students who are interested in learning how to use modern, scalable, computational data analysis methods and tools, and apply them to social and policy problems. This course will teach students: what role machine learning can play in designing, implementing, evaluating, and improving public policy; machine learning methods and tools; and how to solve policy problems using machine learning methods and tools. This is a hands-on course where students will be expected to use Python (as well as other computational tools) to implement solutions to various policy problems. We will cover supervised and unsupervised learning algorithms and will learn how to use them with data from a variety of public policy problems in areas such as education, public health, sustainability, economic development, and public safety.

**Machine Learning**

(CMSC 35400). *Imre Kondor. T/Th 3-4:20 p.m. PQ: Instructor consent.*

This course provides hands-on experience with a range of contemporary machine learning algorithms, as well as an introduction to the theoretical aspects of the subject. Topics covered include: the PAC framework, Bayesian learning, graphical models, clustering, dimensionality reduction, kernel methods including SVMs, matrix completion, neural networks, and an introduction to statistical learning theory.

**Databases for Public Policy**

(CAPP 30235). *Aaron Elmore. T/Th 1:30-2:50 p.m. PQ: CAPP 30122.*

The course will cover the foundations of Database Management Systems (DBMS). This includes data models, database design, SQL, core database system components (e.g. transactions, recovery, query processing), distributed databases, NewSQL/NoSQL, and systems for data analytics (e.g. column-oriented databases, data warehouses). The goals for this class are for you to have the ability to model and design a database, an understanding of the core components of a database management system, the ability to write SQL, and an understanding of the differences between databases and data models.

**Computational Approaches to Cognitive Neuroscience**

(PSYC 34410). *Nicholas Hatsopoulos, T/Th 3:30-4:50 p.m. PQ: BIOS 24231 or CPNS 33100.*

This course is concerned with the relationship of the nervous system to higher order behaviors (e.g., perception, object recognition, action, attention, learning, memory, and decision making). Psychophysical, functional imaging, and electrophysiological methods are introduced. Mathematical and statistical methods (e.g. neural networks and algorithms for studying neural encoding in individual neurons and decoding in populations of neurons) are discussed. Weekly lab sections allow students to program cognitive neuroscientific experiments and simulations.

**Modeling and Signal Analysis for Neuroscientists**

(CPNS 32111). *Wim Van Drongelen, M/W 1:30-2:50 p.m. Weekly Lab TBD. PQ: BIOS 26210, BIOS 26211 or instructor approval.*

The course provides an introduction to signal analysis and modeling for neuroscientists. We cover linear and nonlinear techniques and model both single neurons and neuronal networks. The goal is to provide students with the mathematical background to understand the literature in this field, the principles of analysis and simulation software, and allow them to construct their own tools. Several of the 90-minute lectures include demonstrations and/or exercises in Matlab.

**Algorithms**

(MPCS 55001). *Geraldine Brady, Tuesday 5:30-8:30 p.m. PQ: immersion math (MPCS 50103) or placement. Immersion programming (MPCS 50101) or programming waiver, or core programming (MPCS 51036 or 51040), or instructor consent.*

The course is an introduction to the design and analysis of efficient algorithms, with emphasis on developing techniques for the design and rigorous analysis of algorithms rather than on implementation. Algorithmic problems include sorting and searching, discrete optimization, and algorithmic graph theory. Design techniques include divide-and-conquer methods, dynamic programming, greedy methods, graph search, as well as the design of efficient data structures. Methods of algorithm analysis include asymptotic notation, evaluation of recurrences, and the concepts of polynomial-time algorithms. NP-completeness is introduced toward the end of the course. Students who complete the course will have demonstrated the ability to use divide-and-conquer methods, dynamic programming methods, and greedy methods, when an algorithmic design problem calls for such a method. They will have learned the design strategies employed by the major sorting algorithms and the major graph algorithms, and will have demonstrated the ability to use these design strategies or modify such algorithms to solve algorithm problems when appropriate. They will have derived and solved recurrences describing the performance of divide-and-conquer algorithms, have analyzed the time and space complexity of dynamic programming algorithms, and have analyzed the efficiency of the major graph algorithms, using asymptotic analysis.

**Databases**

(MPCS 53001). *Zachary Freeman, Thursday 5:30-8:30 p.m. PQ: MPCS 51036 or 51040 or 51100 (completed or concurrently enrolled). Non-MPCS students must meet prerequisites and complete Course Request Form.*

Students will learn database design and development and will build a simple but complete web application powered by a relational database. We start by showing how to model relational databases using the prevailing technique for conceptual modeling -- Entity-Relationship Diagrams (ERD). Concepts covered include entity sets and relationships, entity key as a unique identifier for each object in an entity set, one-one, many-one, and many-many relationships as well as translational rules from conceptual modeling (ERD) to relational table definitions. We also examine the relational model and functional dependencies and their application to the methods for improving database design: normal forms and normalization. After design and modeling, students will learn the universal language of relational databases: SQL (Structured Query Language). We start by introducing relational algebra -- the theoretical foundation of SQL. Then we examine in detail the two aspects of SQL: data definition language (DDL) and the data manipulation language (DML). Concepts covered include subqueries (correlated and uncorrelated), aggregation, various types of joins including outer joins and syntax alternatives. Students will gain significant experience with writing and reading SQL queries throughout the course in the detailed discussions in class, online homework, and the real-world individual project.

**Machine Learning**

(MPCS 53111). *Amitabh Chaudhary, Friday 5:30-8:30 p.m. PQ: B+ or above MPCS 51036 or 51040 or 51100. B or above in MPCS 55001. B or above in MPCS 53110 (or placement exam). If concurrently taking MPCS 55001 then a B+ or better in MPCS 50103. If your grades in the above classes do not meet the minimum requirements set above, please contact the instructor to discuss your background.*

This course introduces the fundamental concepts and techniques in data mining, machine learning, and statistical modeling, and the practical know-how to apply them to real-world data through Python-based software. The course examines in detail topics in both supervised and unsupervised learning. These include linear and logistic regression and regularization; classification using decision trees, nearest neighbors, naive Bayes, boosting, random trees, and artificial neural networks; clustering using k-means, expectation-maximization, hierarchical approaches, and density-based techniques; and dimensionality reduction through PCA and SVD. Students use Python and Python libraries such as NumPy, SciPy, matplotlib, and pandas for implementing algorithms and analyzing data.

**Time Series Analysis and Stochastic Processes**

(MPCS 58020). *Andrew Siegel, Monday 5:30-8:30 p.m. PQ: MPCS 51036 or 51040 or 51100. Non-MPCS students must meet prerequisites and complete Course Request Form.*

Stochastic processes are driven by random events. They can be used to model phenomena in a broad range of disciplines, including science/engineering (e.g. computational physics, chemistry, and biology), business/finance (e.g. investment models and operations research), and computer systems (e.g. client/server workloads and resilience modeling). In many cases relatively simple stochastic simulations can provide estimates for problems that are difficult or impossible to model with closed-form equations. In this class we focus on the rudimentary ideas and techniques that underlie stochastic time series analysis, discrete events modeling, and Monte Carlo simulations. Course lectures will focus on the basic principles of probability theory, their efficient implementation on modern computers, and examples of their application to real world problems. Upon completion of the course, students should have an adequate background to quickly learn in depth specific Monte Carlo approaches in their chosen field of interest.

**Computational Social Science Workshop**

(MACS 50000). *James Evans, Th 5-6:30 p.m. Saieh 247. PQ: Computation students must register for an R. Other faculty and graduate students welcome.*

High performance and cloud computing, massive digital traces of human behavior from ubiquitous sensors, and a growing suite of efficient model estimation, machine learning and simulation tools are not just extending classical social science inquiry, but transforming it to pose novel questions at larger and smaller scales. The Computational Social Science (CSS) Workshop is a weekly event that features this work, highlights associated skills and data, and explores the use of CSS in the world. The CSS Workshop alternates weekly between research workshops and professional workshops. The research workshops feature new CSS work from top faculty and advanced graduate students from UChicago and around the world, while professional workshops highlight useful skills and data (e.g., machine learning with Python’s scikit-learn; the Twitter firehose API) and showcase practitioners using CSS in the government, industry and nonprofit sectors. Each quarter, the CSS Workshop also hosts a distinguished lecture, debate and dinner, and a student conference.