Spark: Cluster Computing with Working Sets
Title | Spark: Cluster Computing with Working Sets |
Publication Type | Conference Paper |
Year of Publication | 2010 |
Authors | Zaharia, M., Chowdhury M., Franklin M. J., Shenker S. J., & Stoica I. |
Page(s) | 1-7 |
Other Numbers | 3385 |
Abstract | MapReduce and its variants have been highly successful in implementing large-scale data-intensive applications on commodity clusters. However, most of these systems are built around an acyclic data flow model that is not suitable for other popular applications. This paper focuses on one such class of applications: those that reuse a working set of data across multiple parallel operations. This includes many iterative machine learning algorithms, as well as interactive data analysis tools. We propose a new framework called Spark that supports these applications while retaining the scalability and fault tolerance of MapReduce. To achieve these goals, Spark introduces an abstraction called resilient distributed datasets (RDDs). An RDD is a read-only collection of objects partitioned across a set of machines that can be rebuilt if a partition is lost. Spark can outperform Hadoop by 10x in iterative machine learning jobs, and can be used to interactively query a 39 GB dataset with sub-second response time. |
Acknowledgment | We thank Ali Ghodsi for his feedback on this paper. This research was supported by California MICRO, California Discovery, the Natural Sciences and Engineering Research Council of Canada, as well as the following Berkeley RAD Lab sponsors: Sun Microsystems, Google, Microsoft, Amazon, Cisco, Cloudera, eBay, Facebook, Fujitsu, HP, Intel, NetApp, SAP, VMware, and Yahoo!. |
URL | http://www.icsi.berkeley.edu/pubs/networking/ICSI_sparkclustercomputing10.pdf |
Bibliographic Notes | Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing (HotCloud'10), pp. 1-7, Boston, Massachusetts |
Abbreviated Authors | M. Zaharia, M. Chowdhury, M.J. Franklin, S. Shenker and I. Stoica |
ICSI Research Group | Networking and Security |
ICSI Publication Type | Article in conference proceedings |