dataflow

Created: by Pradeep Gowda Updated: Sep 22, 2023 Tagged: dataflow

Hydro - Build Software for Every Scale; via HN

> The Hydro Project at UC Berkeley is developing cloud-native programming models that allow anyone to develop scalable and resilient distributed applications that take full advantage of cloud elasticity. Our research spans across databases, distributed systems, and programming languages to deliver a modern, end-to-end stack for cloud programming.

“what is it? and what can I do with it?”

> in contrast with Beam, Spark, etc. where the runtime deployment management/coordination is perhaps the biggest, most important part of the product. Hydroflow aims to be a lot lower-level than that, with no opinions on coordination and networking. For example, we’d want it to be possible to implement the coordination mechanisms of one of those systems (Beam, Spark, MapReduce, Materialize) in Hydroflow. We do have a tool, Hydro Deploy, to set up and manage Hydro cluster deployment on GCP (other clouds later), but it’s mainly for running experiments and not long-running applications. The long-term idea is that the Hydro stack will determine how to distribute and scale programs as part of the compilation process. Some of that will be rewriting single-node Hydroflow programs into multi-node ones.
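To make "lower-level, with no opinions on coordination and networking" concrete, here is a minimal sketch of a single-node dataflow pipeline in plain Python. This is not the Hydroflow API (Hydroflow is a Rust library with its own syntax); the `Flow` class and its `map`, `filter`, and `run` methods are hypothetical names invented for illustration. The point is only that the graph of operators knows nothing about deployment, clusters, or messaging, which is the layer systems like Beam or Spark bundle in.

```python
from typing import Any, Callable, Iterable, Iterator, List


class Flow:
    """Hypothetical single-node dataflow: a source plus a chain of operators.

    There is no scheduler, network, or coordinator here; those concerns
    would have to be built on top of (or around) a graph like this one.
    """

    def __init__(self, source: Iterable[Any]) -> None:
        self._source = source
        # Each operator turns one lazy stream into another lazy stream.
        self._ops: List[Callable[[Iterator[Any]], Iterator[Any]]] = []

    def map(self, fn: Callable[[Any], Any]) -> "Flow":
        self._ops.append(lambda items: (fn(x) for x in items))
        return self

    def filter(self, pred: Callable[[Any], bool]) -> "Flow":
        self._ops.append(lambda items: (x for x in items if pred(x)))
        return self

    def run(self) -> List[Any]:
        # Wire the operators together in order, then pull everything through.
        stream: Iterator[Any] = iter(self._source)
        for op in self._ops:
            stream = op(stream)
        return list(stream)


def build_pipeline(numbers: Iterable[int]) -> Flow:
    """Keep the even numbers, then square them."""
    return Flow(numbers).filter(lambda n: n % 2 == 0).map(lambda n: n * n)


result = build_pipeline(range(6)).run()
print(result)  # [0, 4, 16]
```

Turning a single-node graph like this into a multi-node one (the rewriting step the quote mentions) would mean splitting the operator chain across machines and inserting communication between the pieces; nothing in this sketch attempts that.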

Related tech: Spring Reactor, Apache Beam.

Papers

  • Murray, Derek G., Frank McSherry, Rebecca Isaacs, Michael Isard, Paul Barham, and Martín Abadi. “Naiad: A Timely Dataflow System,” 2013.