Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters

TitleDiscretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters
Publication TypeConference Paper
Year of Publication2012
AuthorsZaharia, M., Das T., Li H., Shenker S. J., & Stoica I.
Page(s)1-6
Other Numbers3352
Abstract

Many important “big data” applications need to processdata arriving in real time. However, current programmingmodels for distributed stream processing are relativelylow-level, often leaving the user to worry aboutconsistency of state across the system and fault recovery.Furthermore, the models that provide fault recoverydo so in an expensive manner, requiring either hot replicationor long recovery times. We propose a new programmingmodel, discretized streams (D-Streams), thatoffers a high-level functional programming API, strongconsistency, and efficient fault recovery. D-Streams supporta new recovery mechanism that improves efficiencyover the traditional replication and upstream backup solutionsin streaming databases: parallel recovery of loststate across the cluster. We have prototyped D-Streams inan extension to the Spark cluster computing frameworkcalled Spark Streaming, which lets users seamlessly intermixstreaming, batch and interactive queries.

Acknowledgment

This work was partially supported by funding provided to ICSI by an NSF CISE Expeditions award, gifts from Google, SAP, Amazon WebServices, Blue Goji, Cisco, Cloudera, Ericsson, General Electric, Hewlett Packard, Huawei, Intel, Mark-Logic, Microsoft, NetApp, Oracle, Quanta, Splunk, and VMware, as well as by funding provided to ICSI by the U.S. Defense Advanced Research Projects Agency (DARPA). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funders.

URLhttp://www.icsi.berkeley.edu/pubs/networking/ICSI_discretizedstreams12.pdf
Bibliographic Notes

Proceedings of the Fourth USENIX Conference on Hot Topics in Cloud Computing (HotCloud ’12), pp. 1-6, Boston, Massachusetts

Abbreviated Authors

M. Zaharia, T. Das, Haoyuan Li, S. Shenker and I. Stoica

ICSI Research Group

Networking and Security

ICSI Publication Type

Article in conference proceedings