Shark: Fast Data Analysis Using Coarse-grained Distributed Memory

Title: Shark: Fast Data Analysis Using Coarse-grained Distributed Memory
Publication Type: Conference Paper
Year of Publication: 2012
Authors: Engle, C., Lupher A., Xin R., Zaharia M., Franklin M. J., Shenker S., & Stoica I.
Other Numbers: 3383

Abstract

Shark is a research data analysis system built on a novel coarse-grained distributed shared-memory abstraction. Shark marries query processing with deep data analysis, providing a unified system for easy data manipulation using SQL and pushing sophisticated analysis closer to data. It scales to thousands of nodes in a fault-tolerant manner. Shark can answer queries 40X faster than Apache Hive and run machine learning programs 25X faster than MapReduce programs in Apache Hadoop on large datasets.

Acknowledgment

We would like to thank Peter Alvaro, Eric Yi Liu, Tim Kraska, Gene Pang, and Andrew Wang for feedback. This research is supported in part by gifts from Google, SAP, Amazon Web Services, Blue Goji, Cloudera, Ericsson, General Electric, Hewlett Packard, Huawei, IBM, Intel, MarkLogic, Microsoft, NEC Labs, NetApp, Oracle, Quanta, Splunk, VMware and by DARPA (contract #FA8650-11-C-7136).

URL: http://www.icsi.berkeley.edu/pubs/networking/ICSI_sharkfastdata12.pdf

Bibliographic Notes

Demo, ACM SIGMOD/PODS Conference, Scottsdale, Arizona

Abbreviated Authors

C. Engle, A. Lupher, R. Xin, M. Zaharia, M. Franklin, S. Shenker, and I. Stoica

ICSI Research Group

Networking and Security

ICSI Publication Type

Article in conference proceedings