MIDS

Course curriculum from the Berkeley MIDS program. I’m trying to quantify how much of this I already know through:

  1. my undergrad (MechEngg) and grad classes (MSCS)
  2. work experience
  3. Coursera classes
  4. self learning

(my comments are in monospace)

Research Design and Application for Data and Analysis

SKILL SETS Research design / Question formulation / Data and decision making / Understanding cognitive bias / Data for persuasion and action / Integrating data and domain knowledge / Storytelling with data

DESIGNED BY Steve Weber and Andy Brooks

This course introduces students to the burgeoning data sciences landscape, with a particular focus on learning how to apply data science reasoning techniques to uncover, enrich, and answer questions facing decision makers across a variety of industries and organizations today. After an introduction to data science and an overview of the program, students will explore how individuals and organizations assess options and make decisions, and will probe the emerging role of big data in guiding both tactical and strategic decisions. Lectures, readings, discussions, and assignments will teach students how to apply disciplined, creative methods in order to ask better questions, efficiently gather data, interpret results, and convey findings to various audiences in order to change minds and behaviors. The emphasis throughout is on making practical contributions to decisions that organizations will and should make. Industries explored include sports management, finance, energy, journalism, intelligence, healthcare, and media/entertainment.

Statistics for Data Science

SKILL SETS Research design / Statistical analysis

TOOLS R

DESIGNED BY Coye Cheshire and Paul Laskowski

The goal of this course is to provide students with an introduction to many different types of quantitative research methods and statistical techniques for analyzing data. We begin with a focus on measurement, inferential statistics, and causal inference. Then, we will explore a range of statistical techniques and methods using the open-source statistics language, R. We will use many different statistics and techniques for analyzing and viewing data, with a focus on applying this knowledge to real-world data problems. Topics in quantitative techniques include: descriptive and inferential statistics, sampling, experimental design, parametric and non-parametric tests of difference, ordinary least squares regression, and logistic regression.

I have finished [R programming](https://www.coursera.org/learn/r-programming) on Coursera

Fundamentals of Data Engineering

SKILL SETS Analytics Solution Architectures / Data at Scale Concerns and Tradeoffs / Distributed Data Processing / Relational Databases / Graph Databases / Streaming Data Applications / Cube Technology

TOOLS Python / Relational databases / Hadoop / MapReduce / Spark / Cloud Computing (AWS)

DESIGNED BY Mark Mims and Taylor Martin

Storing, managing, and processing datasets are foundational to both applied computer science and data science. Indeed, successful deployment of data science in any organization is closely tied to how data is stored and processed. This course introduces the fundamentals of data storage, retrieval, and processing systems in the context of common data analytics processing needs. As these fundamentals are introduced, representative technologies will be used to illustrate how to construct storage and processing architectures. This course aims to provide a set of “building blocks” by which one can construct a complete architecture for storing and processing data. The course will examine how technical architectures vary depending on the problem to be solved and the reliability and freshness of the result.

The course considers the complete breadth of technology choices. The content ranges from traditional databases and business warehouse architectures, through so-called big-data architectures, to streaming analytics solutions and graph processing. Students will consider both small and large datasets because both are equally important and each justifies different trade-offs. Exercises and examples will consider both simple and complex data structures, as well as data that is clean and structured and data that is dirty and unstructured.

Applied Machine Learning

SKILL SETS Experimental design / Working with machine learning algorithms / Feature engineering / Prediction vs. explanation / Network analysis / Collaborative filtering

TOOLS Python / Python libraries for linear algebra, plotting, machine learning: numpy, matplotlib, scikit-learn / GitHub for submitting project code

DESIGNED BY Josh Blumenstock and Dan Gillick

Machine learning is a rapidly growing field at the intersection of computer science and statistics that is concerned with finding patterns in data. It is responsible for tremendous advances in technology, from personalized product recommendations to speech recognition in cell phones. The goal of this course is to provide a broad introduction to the key ideas in machine learning. The emphasis will be on intuition and practical examples rather than theoretical results, though some experience with probability, statistics, and linear algebra will be important. Through a variety of lecture examples and programming projects, students will learn how to apply powerful machine learning techniques to new problems, run evaluations and interpret results, and think about scaling up from thousands of data points to billions.

Data Visualization

SKILL SETS Exploratory data analysis / Effective written communication / Effective visual presentation of data / Design for human perception

TOOLS Tableau / JavaScript / D3 / Illustrator / R/ggplot2 / Highcharts / Visit

DESIGNED BY Annette Greiner and Christopher Arnold

Data Visualization enhances exploratory analysis as well as efficient communication of data results. This course focuses on the design of visual representations of data in order to discover patterns, answer questions, convey findings, drive decisions, and provide persuasive evidence. The goal is to give you the practical knowledge you need to create effective tools for both exploring and explaining your data. Exercises throughout the course provide a hands-on experience using relevant programming libraries and software tools to apply research and design concepts learned.

Synthetic Capstone

SKILL SETS Project scoping, planning and management / Data acquisition and analysis / Communication / Teamwork / Influence in organizations / Design thinking for data science

DESIGNED BY Alex Marrs

In this capstone class, students will combine technical, analytic, interpretive, and social dimensions to design and execute a full data science project, developing their skills as data scientists with a focus on real-world applications and situations. The final project provides an opportunity to integrate all of the core skills and concepts learned throughout the program, and prepares students for long-term professional success in the field. It provides experience in formulating and carrying out a sustained, coherent, and influential course of work resulting in a tangible data science project using real-world data. Students are evaluated on their ability to collaboratively develop and communicate their work in both written and oral form.

The capstone is completed as a group/team project (3-4 students), and each project will focus on open, pre-existing secondary data. A robust listing of open datasets will be made available before the capstone course begins.