Shark: SQL and Rich Analytics at Scale
Title | Shark: SQL and Rich Analytics at Scale |
Publication Type | Technical Report |
Year of Publication | 2012 |
Authors | Xin, R., Rosen, J., Zaharia, M., Franklin, M. J., Shenker, S. J., & Stoica, I. |
Other Numbers | 3422 |
Abstract | Shark is a new data analysis system that marries query processing with complex analytics on large clusters. It leverages a novel distributed memory abstraction to provide a unified engine that can run SQL queries and sophisticated analytics functions (e.g., iterative machine learning) at scale, and efficiently recovers from failures mid-query. This allows Shark to run SQL queries up to 100× faster than Apache Hive, and machine learning programs up to 100× faster than Hadoop. Unlike previous systems, Shark shows that it is possible to achieve these speedups while retaining a MapReduce-like execution engine, and the fine-grained fault tolerance properties that such engines provide. It extends such an engine in several ways, including column-oriented in-memory storage and dynamic mid-query replanning, to effectively execute SQL. The result is a system that matches the speedups reported for MPP analytic databases over MapReduce, while offering fault tolerance properties and complex analytics capabilities that they lack. |
Acknowledgment | We thank Cliff Engle, Harvey Feng, Shivaram Venkataraman, Ram Sriharsha, Denny Britz, Antonio Lupher, Patrick Wendell, and Paul Ruan for their work on Shark. This research is supported in part by NSF CISE Expeditions award CCF-1139158, gifts from Amazon Web Services, Google, SAP, Blue Goji, Cisco, Cloudera, Ericsson, General Electric, Hewlett Packard, Huawei, Intel, Microsoft, NetApp, Oracle, Quanta, Splunk, VMware and by DARPA (contract #FA8650-11-C-7136). |
URL | https://www.icsi.berkeley.edu/pubs/networking/ICSI_sharksql12.pdf |
Bibliographic Notes | Technical Report, UCB/EECS-2012-214, University of California at Berkeley, Department of Electrical Engineering and Computer Science, arXiv:1211.6176 [cs.DB] |
Abbreviated Authors | R. Xin, J. Rosen, M. Zaharia, M. J. Franklin, S. Shenker, and I. Stoica |
ICSI Research Group | Networking and Security |
ICSI Publication Type | Technical Report |