MIDS
Course curriculum from the Berkeley MIDS program. I’m trying to quantify how much of it I already know through:
- my undergraduate (MechEngg) and graduate (MSCS) classes
- work experience
- Coursera classes
- self-learning
(my comments are in monospace)
Research Design and Application for Data and Analysis
SKILL SETS Research design / Question formulation / Data and decision making / Understanding cognitive bias / Data for persuasion and action / Integrating data and domain knowledge / Storytelling with data
DESIGNED BY Steve Weber and Andy Brooks
This course introduces students to the burgeoning data science landscape, with a particular focus on learning how to apply data science reasoning techniques to uncover, enrich, and answer questions facing decision makers across a variety of industries and organizations today. After an introduction to data science and an overview of the program, students will explore how individuals and organizations assess options and make decisions, and will probe the emerging role of big data in guiding both tactical and strategic decisions. Lectures, readings, discussions, and assignments will teach students how to apply disciplined, creative methods to ask better questions, gather data efficiently, interpret results, and convey findings to various audiences in order to change minds and behaviors. The emphasis throughout is on making practical contributions to the decisions that organizations will and should make. Industries explored include sports management, finance, energy, journalism, intelligence, healthcare, and media/entertainment.
Statistics for Data Science
SKILL SETS Research design / Statistical analysis
TOOLS R
DESIGNED BY Coye Cheshire and Paul Laskowski
The goal of this course is to provide students with an introduction to many different types of quantitative research methods and statistical techniques for analyzing data. We begin with a focus on measurement, inferential statistics, and causal inference. Then we explore a range of statistical techniques and methods using the open-source statistics language R, analyzing and visualizing data with a focus on applying this knowledge to real-world data problems. Topics in quantitative techniques include descriptive and inferential statistics, sampling, experimental design, parametric and non-parametric tests of difference, ordinary least squares regression, and logistic regression.
I have finished [R Programming](https://www.coursera.org/learn/r-programming) on Coursera.
Fundamentals of Data Engineering
SKILL SETS Analytics Solution Architectures / Data at Scale Concerns and Tradeoffs / Distributed Data Processing / Relational Databases / Graph Databases / Streaming Data Applications / Cube Technology
TOOLS Python / Relational databases / Hadoop / MapReduce / Spark / Cloud Computing (AWS)
DESIGNED BY Mark Mims and Taylor Martin
Storing, managing, and processing datasets are foundational to both applied computer science and data science. Indeed, successful deployment of data science in any organization is closely tied to how data is stored and processed. This course introduces the fundamentals of data storage, retrieval, and processing systems in the context of common data analytics processing needs. As these fundamentals are introduced, representative technologies will be used to illustrate how to construct storage and processing architectures. This course aims to provide a set of “building blocks” by which one can construct a complete architecture for storing and processing data. The course will examine how technical architectures vary depending on the problem to be solved and the reliability and freshness of the result.
The course considers the full breadth of technology choices. The content ranges from traditional databases and business warehouse architectures, through so-called big-data architectures, to streaming analytics solutions and graph processing. Students will work with both small and large datasets, because both are important and each calls for different trade-offs. Exercises and examples will cover both simple and complex data structures, as well as data that is clean and structured and data that is dirty and unstructured.
Applied Machine Learning
SKILL SETS Experimental design / Working with machine learning algorithms / Feature engineering / Prediction vs. explanation / Network analysis / Collaborative filtering
TOOLS Python / Python libraries for linear algebra, plotting, and machine learning: NumPy, matplotlib, scikit-learn / GitHub for submitting project code
DESIGNED BY Josh Blumenstock and Dan Gillick
Machine learning is a rapidly growing field at the intersection of computer science and statistics that is concerned with finding patterns in data. It is responsible for tremendous advances in technology, from personalized product recommendations to speech recognition on cell phones. The goal of this course is to provide a broad introduction to the key ideas in machine learning. The emphasis will be on intuition and practical examples rather than theoretical results, though some experience with probability, statistics, and linear algebra will be important. Through a variety of lecture examples and programming projects, students will learn how to apply powerful machine learning techniques to new problems, run evaluations and interpret results, and think about scaling up from thousands of data points to billions.
Data Visualization
SKILL SETS Exploratory data analysis / Effective written communication / Effective visual presentation of data / Design for human perception
TOOLS Tableau / JavaScript / D3 / Illustrator / R/ggplot2 / Highcharts / VisIt
DESIGNED BY Annette Greiner and Christopher Arnold
Data visualization supports both exploratory analysis and efficient communication of data results. This course focuses on the design of visual representations of data in order to discover patterns, answer questions, convey findings, drive decisions, and provide persuasive evidence. The goal is to give you the practical knowledge you need to create effective tools for both exploring and explaining your data. Exercises throughout the course provide hands-on experience with relevant programming libraries and software tools, applying the research and design concepts learned.
Synthetic Capstone
SKILL SETS Project scoping, planning and management / Data acquisition and analysis / Communication / Teamwork / Influence in organizations / Design thinking for data science
DESIGNED BY Alex Marrs
In this capstone class, students will combine technical, analytic, interpretive, and social dimensions to design and execute a full data science project, developing their skills as data scientists with a focus on real-world applications and situations. The final project provides an opportunity to integrate all of the core skills and concepts learned throughout the program, and prepares students for long-term professional success in the field. It provides experience in formulating and carrying out a sustained, coherent, and influential course of work resulting in a tangible data science project using real-world data. Students are evaluated on their ability to collaboratively develop and communicate their work in both written and oral form.
The capstone is completed as a group/team project (3-4 students), and each project will focus on open, pre-existing secondary data. A comprehensive list of open datasets will be made available before the capstone course begins.