A Hardware Evaluation of Cache Partitioning to Improve Utilization and Energy-Efficiency while Preserving Responsiveness

Publication Type: Conference Paper
Year of Publication: 2013
Authors: Cook, H., Moretó, M., Bird, S., Dao, K., Patterson, D., & Asanović, K.
Other Numbers: 3463

Computing workloads often contain a mix of interactive, latency-sensitive foreground applications and recurring background computations. To guarantee responsiveness, interactive and batch applications are often run on disjoint sets of resources, but this incurs additional energy, power, and capital costs. In this paper, we evaluate the potential of hardware cache partitioning mechanisms and policies to improve efficiency by allowing background applications to run simultaneously with interactive foreground applications while avoiding degradation in interactive responsiveness. We evaluate these tradeoffs using commercial x86 multicore hardware that supports cache partitioning, and find that real hardware measurements with full applications yield different observations than past simulation-based evaluations. Compared to running tasks separately, co-scheduling applications without last-level cache (LLC) partitioning yields a 10% energy improvement and an average throughput improvement of 54%, but can degrade foreground performance by up to 34%, with an average degradation of 6%. With optimal static LLC partitioning, the average energy improvement increases to 12% and the average throughput improvement to 60%, while the worst-case slowdown drops to 7% and the average slowdown to only 2%. We also evaluate a practical, low-overhead dynamic algorithm that controls partition sizes; it realizes the potential performance guarantees of the optimal static approach while increasing background throughput by an additional 19%.
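To make the idea of a static LLC partition concrete, the following is a minimal sketch, not the paper's method. It assumes way-based partitioning in which each side receives a contiguous bitmask of cache ways, and it assumes "best" means the split with the highest background throughput whose foreground slowdown stays under a chosen bound. The names SplitProfile, way_masks, and best_static_split, and the selection criterion itself, are hypothetical and are not taken from the paper's definition of "optimal".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SplitProfile:
    """Hypothetical measurements for one static way split (illustrative only)."""
    fg_ways: int          # LLC ways reserved for the foreground application
    fg_slowdown: float    # foreground runtime / runtime when running alone (1.0 = none)
    bg_throughput: float  # background work completed per unit time under this split


def way_masks(fg_ways: int, total_ways: int) -> tuple[int, int]:
    """Return contiguous, non-overlapping bitmasks (foreground, background).

    Assumption: the foreground takes the low-order ways and the background
    takes all remaining high-order ways.
    """
    if not 0 < fg_ways < total_ways:
        raise ValueError("each side needs at least one way")
    fg_mask = (1 << fg_ways) - 1
    bg_mask = ((1 << total_ways) - 1) & ~fg_mask
    return fg_mask, bg_mask


def best_static_split(profiles: list[SplitProfile],
                      max_slowdown: float) -> SplitProfile | None:
    """Among splits whose foreground slowdown is within max_slowdown, pick the
    one with the highest background throughput; return None if none qualify."""
    feasible = [p for p in profiles if p.fg_slowdown <= max_slowdown]
    if not feasible:
        return None
    return max(feasible, key=lambda p: p.bg_throughput)
```

Given one profile per candidate split (measured offline, for example), best_static_split picks an allocation and way_masks turns it into the two masks a partitioning mechanism would be programmed with. A dynamic policy like the one the abstract mentions would instead revisit this choice at run time.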


We would especially like to thank everyone at Intel who made it possible for us to use the cache-partitioning machine in this paper, including Opher Kahn, Andrew Herdrich, Ravi Iyer, Gans Srinivasa, Mark Rowland, Ian Steiner and Henry Gabb. We would also like to thank Scott Beamer, Chris Celio, Shoaib Kamil, Leo Meyerovich, and David Sheffield for allowing us to study their applications. Additionally, we would like to thank our colleagues in the Par Lab for their continual advice, support, and feedback. Research supported by Microsoft (Award 024263) and Intel (Award 024894) funding and by matching funding by U.C. Discovery (Award DIG07-10227). Additional support comes from Par Lab affiliates Nokia, NVIDIA, Oracle, and Samsung. M. Moretó was supported by the Spanish Ministry of Science under contract TIN2012-34557, a MEC/Fulbright Fellowship, and by an AGAUR award (BE-DGR 2010).

Bibliographic Notes

Proceedings of the International Symposium on Computer Architecture (ISCA-2013), Tel Aviv, Israel

Abbreviated Authors

H. Cook, M. Moretó, S. Bird, K. Dao, D. Patterson, and K. Asanović


ICSI Publication Type

Article in conference proceedings