Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters
Title | Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters |
Publication Type | Conference Paper |
Year of Publication | 2012 |
Authors | Zaharia, M., Das, T., Li, H., Shenker, S. J., & Stoica, I. |
Page(s) | 1-6 |
Other Numbers | 3352 |
Abstract | Many important big data applications need to process data arriving in real time. However, current programming models for distributed stream processing are relatively low-level, often leaving the user to worry about consistency of state across the system and fault recovery. Furthermore, the models that provide fault recovery do so in an expensive manner, requiring either hot replication or long recovery times. We propose a new programming model, discretized streams (D-Streams), that offers a high-level functional programming API, strong consistency, and efficient fault recovery. D-Streams support a new recovery mechanism that improves efficiency over the traditional replication and upstream backup solutions in streaming databases: parallel recovery of lost state across the cluster. We have prototyped D-Streams in an extension to the Spark cluster computing framework called Spark Streaming, which lets users seamlessly intermix streaming, batch and interactive queries. |
Acknowledgment | This work was partially supported by funding provided to ICSI by an NSF CISE Expeditions award, gifts from Google, SAP, Amazon Web Services, Blue Goji, Cisco, Cloudera, Ericsson, General Electric, Hewlett Packard, Huawei, Intel, MarkLogic, Microsoft, NetApp, Oracle, Quanta, Splunk, and VMware, as well as by funding provided to ICSI by the U.S. Defense Advanced Research Projects Agency (DARPA). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funders. |
URL | http://www.icsi.berkeley.edu/pubs/networking/ICSI_discretizedstreams12.pdf |
Bibliographic Notes | Proceedings of the Fourth USENIX Conference on Hot Topics in Cloud Computing (HotCloud 12), pp. 1-6, Boston, Massachusetts |
Abbreviated Authors | M. Zaharia, T. Das, Haoyuan Li, S. Shenker and I. Stoica |
ICSI Research Group | Networking and Security |
ICSI Publication Type | Article in conference proceedings |