Data Science
A Subjective and Anecdotal FAQ on Becoming a Data Scientist  tdhopper.com
Learning resources
 Example Machine Learning Notebook.ipynb – a Jupyter notebook. See also Zeppelin – a notebook for interactive data analytics.
 Continually updated data science Python notebooks: deep learning (TensorFlow, Theano, Caffe), scikit-learn, Kaggle, Spark, Hadoop MapReduce, HDFS, matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command-line tools.

Data Science for Doofuses: What Toolbox to Use  CyberSmashup: a little primer on what tools to use.

Foundations of Data Science Boot Camp  Simons Institute for the Theory of Computing
Courses
Data Analysis

BE/Bi 103: Data Analysis in the Biological Sciences, Caltech, Fall term 2018. Uses Python, Jupyter, etc. Looks good!

Stat405: Introduction to data analysis, by Hadley Wickham of ggplot2 fame. Fall 2012, Rice University: lecture notes, R code, data sets.
 Data Science Courses  Harvard Extension online and on-campus courses
Data Visualization

Stat645: Data visualisation, at Rice by Hadley Wickham: reading papers, code-heavy assignments and projects, heavy use of ggplot2.

Hon322f: Escape from flatland, at Rice by Hadley Wickham: explore data visually, think in more than 3 dimensions, interactive graphics software.

Applied statistical computing, at Rice by Hadley Wickham: how to deal with complex, messy, real data; use graphics to explore and understand data; gain familiarity with basic data collection, storage and manipulation; fluently reshape data into the most convenient form for analysis or reporting. Uses Excel, R and SAS.

… and a lot of short courses by Hadley Wickham
 Presenting Data and Information workshops by Tufte [PAID]
Best practices
Tutorials
 Data Analytics for Beginners: Parts 1, 2 and 3, using the Titanic dataset from Kaggle.
Tools
 Emacs for Data Science

GraphLab: an interactive Python console with pandas, NumPy and the GraphLab engine in a hosted environment. The GraphLab engine implements clustering, CV, graphical models, graph analysis, etc.

Datamash – a command-line program that performs basic numeric, textual and statistical operations on input text files. Datamash manual.
Libraries
 Deedle: Exploratory data library for .NET
 Seaborn: Improved matplotlib for statistical data visualization (like ggplot2)
Reading List
 Building data science teams – DJ Patil
 Some ideas on communicating risks to the general public
 My Amazon wishlist of data science books
Papers
 Tidy data (pdf): how to create tidy datasets and how to deal with untidy ones.
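As a rough illustration of what "dealing with an untidy dataset" can mean, here is a minimal sketch in plain Python that reshapes a wide table (one column per treatment) into a long one (one row per observation). The data, the column names and the `melt` helper are all hypothetical and are not taken from the paper; in practice a library function such as `pandas.melt` does the same job.

```python
# Hypothetical "wide" table: one row per person, one column per treatment.
wide = [
    {"person": "A", "treatment_a": 10, "treatment_b": 12},
    {"person": "B", "treatment_a": 7, "treatment_b": None},
]


def melt(rows, id_key, var_name, value_name):
    """Turn every (row, non-id column) pair into its own observation row."""
    tidy_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_key:
                continue  # the identifier is repeated on each output row
            tidy_rows.append(
                {id_key: row[id_key], var_name: column, value_name: value}
            )
    return tidy_rows


# Long form: each row is one observation, each column is one variable.
tidy = melt(wide, id_key="person", var_name="treatment", value_name="result")
for record in tidy:
    print(record)
```

The long form is usually easier to filter, group and plot, because every variable lives in exactly one column.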
List of related things
 CS 19416 – an exhaustive list of data science resources.
 Data science glossary
Blogs
 The Unofficial Google Data Science Blog
 correlation  What happens if the explanatory and response variables are sorted independently before regression?  Cross Validated
 Sean McClure’s answer to Why are there so many fake data scientists and machine learning engineers?  Quora
 Chris Albon  Data Science, Machine Learning, and Artificial Intelligence
 Towards Data Science
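The Cross Validated question listed above, about sorting the explanatory and response variables independently before regression, can be tried out with a small simulation. This is only a sketch under assumed conditions: the two variables are simulated as independent standard normals, the seed and sample size are arbitrary, and `pearson` is a hand-written helper rather than a library call.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumption: x and y are generated independently, so they are unrelated.
x = [rng.gauss(0, 1) for _ in range(1000)]
y = [rng.gauss(0, 1) for _ in range(1000)]

r_raw = pearson(x, y)

# Sorting each variable on its own destroys the original pairing and
# pairs smallest with smallest, largest with largest.
r_sorted = pearson(sorted(x), sorted(y))

print(f"correlation of original pairs:      {r_raw:.3f}")
print(f"correlation after independent sort: {r_sorted:.3f}")
```

Because the sort pairs values by rank rather than by observation, the second correlation comes out close to 1 even though the variables were generated independently, which is why a regression fitted to such data says nothing about the real relationship.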
