- Google FlatBuffers as a faster, smaller, better-typed serialization format.
Distributed data processing
- Apache Flink: Scalable Stream and Batch Data Processing. See Flink for notes.
- Apache Beam: Portable and Parallel Data Processing (Google Cloud Next ’17), talk by Frances Perry on YouTube.
- Smile - Statistical Machine Intelligence and Learning Engine
Graph data processing
- Large-Scale Graph Analytics in Aster 6: Bringing Context to Big Data Discovery (PDF). VLDB 2014.
- Grail: The Case Against Specialized Graph Analytics Engines (PDF).
- Impact of Social Sciences – Big data problems we face today can be traced to the social ordering practices of the 19th century.
- The Rise of the Data Engineer – Medium; Jan 2017.
- Jane Street Tech Blog - How to shuffle a big dataset; Sept 2018.
- Uber’s Big Data Platform: 100+ Petabytes with Minute Latency – Uber Engineering Blog.
- Edwin Chen’s Blog – good explanations of many ML concepts, e.g., Introduction to Latent Dirichlet Allocation.