Monotasks: Architecting for Performance Clarity in Data Analytics Frameworks

Title: Monotasks: Architecting for Performance Clarity in Data Analytics Frameworks
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Ousterhout, K., Canel, C., Ratnasamy, S., & Shenker, S.
Published in: Proceedings of the 26th Symposium on Operating Systems Principles (SOSP '17)
Abstract

In today's data analytics frameworks, many users struggle to reason about the performance of their workloads. Without an understanding of what factors are most important to performance, users can't determine what configuration parameters to set and what hardware to use to optimize runtime. This paper explores a system architecture designed to make it easy for users to reason about performance bottlenecks. Rather than breaking jobs into tasks that pipeline many resources, as in today's frameworks, we propose breaking jobs into monotasks: units of work that each use a single resource. We demonstrate that explicitly separating the use of different resources simplifies reasoning about performance without sacrificing performance. Monotasks provide job completion times within 9% of Apache Spark for typical scenarios, and lead to a model for job completion time that predicts runtime under different hardware and software configurations with at most 28% error. Furthermore, separating the use of different resources allows for new optimizations to improve performance.
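To make the idea of a per-resource runtime model concrete, here is a minimal sketch. It is not taken from the paper: it assumes, purely for illustration, that a stage's runtime can be approximated by the time spent on its most heavily loaded resource, and that each monotask reports how much work it places on the single resource it uses. The names `Monotask` and `predict_stage_time`, the resource labels, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Monotask:
    """Hypothetical record: one unit of work that uses exactly one resource."""
    resource: str  # e.g. "cpu", "disk", or "network"
    work: float    # core-seconds for CPU, megabytes for disk and network


def predict_stage_time(monotasks, capacity):
    """Estimate stage runtime as the time needed on the bottleneck resource.

    Assumption (not from the source): work on different resources overlaps
    perfectly, so the slowest resource alone determines the runtime.
    `capacity` maps each resource to its throughput (cores, or MB/s).
    """
    totals = {}
    for task in monotasks:
        totals[task.resource] = totals.get(task.resource, 0.0) + task.work
    per_resource = {r: w / capacity[r] for r, w in totals.items()}
    bottleneck = max(per_resource, key=per_resource.get)
    return per_resource[bottleneck], bottleneck


# Made-up workload: 80 core-seconds of compute, 4000 MB of disk I/O,
# and 1000 MB of network traffic.
tasks = [
    Monotask("cpu", 80.0),
    Monotask("disk", 4000.0),
    Monotask("network", 1000.0),
]

# Made-up hardware: 8 cores, 100 MB/s disk, 125 MB/s network.
capacity = {"cpu": 8.0, "disk": 100.0, "network": 125.0}

seconds, bottleneck = predict_stage_time(tasks, capacity)
print(f"predicted {seconds:.1f} s, bottleneck: {bottleneck}")

# "What if" question: how would a disk that is twice as fast change things?
faster_disk = {**capacity, "disk": 200.0}
print(predict_stage_time(tasks, faster_disk))
```

Because each unit of work is attributed to exactly one resource, answering a hardware question in this sketch reduces to changing one entry in `capacity` and recomputing. The paper's actual model and its reported error bounds should be taken from the paper itself.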

Acknowledgment

We are indebted to Shivaram Venkataraman for discussions during the tiny tasks project [24] that led to the idea of breaking jobs into small, single-resource units of work, and to Max Wolffe for helping to implement disk optimization features in early versions of MonoSpark. We thank Aurojit Panda, Eddie Kohler, and Patrick Wendell for providing helpful feedback on earlier drafts of this paper. Finally, we thank our shepherd, Miguel Castro, for helping to shape the final version of this paper. This research was supported in part by a Hertz Foundation Fellowship, a Google PhD Fellowship, and Intel and other sponsors of UC Berkeley’s NetSys Lab.

URL: https://dl.acm.org/ft_gateway.cfm?id=3132766&ftid=1913915&dwn=1&CFID=166403858&CFTOKEN=408bbb27495e8d4d-634F5679-B57D-06E5-5332283D22547DE5
ICSI Research Group: Networking and Security