The Case for Tiny Tasks in Compute Clusters

Authors: Ousterhout, K., Panda, A., Rosen, J., Venkataraman, S., Xin, R., Ratnasamy, S., Shenker, S. J., & Stoica, I.
We argue for breaking data-parallel jobs in compute clusters into tiny tasks that each complete in hundreds of milliseconds. Tiny tasks avoid the need for complex skew mitigation techniques: by breaking a large job into millions of tiny tasks, work will be evenly spread over available resources by the scheduler. Furthermore, tiny tasks alleviate long wait times seen in today’s clusters for interactive jobs: even large batch jobs can be split into small tasks that finish quickly. We demonstrate a 5.2x improvement in response times due to the use of smaller tasks. In current data-parallel computing frameworks, high task launch overheads and scalability limitations prevent users from running short tasks. Recent research has addressed many of these bottlenecks; we discuss remaining challenges and propose a task execution framework that can efficiently support tiny tasks.
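
The skew argument above can be illustrated with a toy simulation. The following is a minimal sketch, not the framework proposed in the paper: it assumes a greedy scheduler that hands each task to whichever worker frees up first, ignores task launch overhead entirely, and uses made-up task durations. The helper names `makespan` and `split_into_tiny` are hypothetical and chosen only for this example.

```python
import heapq


def makespan(task_ms, num_workers):
    """Return the job completion time (ms) under greedy scheduling.

    Each task is assigned to the worker that becomes free earliest.
    Launch overhead is assumed to be zero (an idealization).
    """
    free_at = [0] * num_workers  # time at which each worker becomes idle
    heapq.heapify(free_at)
    for duration in task_ms:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + duration)
    return max(free_at)


def split_into_tiny(task_ms, chunk_ms):
    """Split every task into chunks of at most chunk_ms milliseconds."""
    tiny = []
    for duration in task_ms:
        full, rest = divmod(duration, chunk_ms)
        tiny.extend([chunk_ms] * full)
        if rest:
            tiny.append(rest)
    return tiny


# Illustrative, skewed job: one task holds half of the total work.
coarse = [8000, 4000, 2000, 2000]    # durations in ms (made-up values)
tiny = split_into_tiny(coarse, 100)  # 160 tasks of 100 ms each

coarse_ms = makespan(coarse, 4)
tiny_ms = makespan(tiny, 4)
print(coarse_ms, tiny_ms)  # prints "8000 4000"
```

With four workers, the coarse job is bound by its single 8000 ms straggler, while the same work split into 100 ms tasks finishes in 4000 ms because every worker stays busy. These figures are a property of the invented inputs and are unrelated to the 5.2x improvement reported above; a real system would also pay a per-task launch cost, which is the bottleneck the abstract goes on to discuss.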


We thank Matei Zaharia, Colin Scott, John Ousterhout, and Patrick Wendell for useful feedback on earlier drafts of this paper. This research is supported in part by NSF CISE Expeditions award CCF-1139158 and DARPA XData Award FA8750-12-2-0331; gifts from Amazon Web Services, Google, SAP, Blue Goji, Cisco, Clearstory Data, Cloudera, Ericsson, Facebook, General Electric, Hortonworks, Huawei, Intel, Microsoft, NetApp, Oracle, Quanta, Samsung, Splunk, VMware and Yahoo!; and a Hertz Foundation Fellowship.

