Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies

Title: Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies
Publication Type: Miscellaneous
Year of Publication: 2016
Authors: Gittens, A., Devarakonda A., Racah E., Ringenburg M., Gerhardt L., Kottalam J., Liu J., Maschhoff K., Canon S., Chhugani J., Sharma P., Yang J., Demmel J., Harrell J., Krishnamurthy V., Mahoney M. W., & Prabhat

We explore the trade-offs of performing linear algebra using Apache Spark, compared to traditional C and MPI implementations on HPC platforms. Spark is designed for data analytics on cluster computing platforms with access to local disks and is optimized for data-parallel tasks. We examine three widely used and important matrix factorizations: NMF (for physical plausibility), PCA (for its ubiquity), and CX (for data interpretability). We apply these methods to TB-sized problems in particle physics, climate modeling, and bioimaging. The data matrices are tall and skinny, which enables the algorithms to map conveniently onto Spark's data-parallel model. We perform scaling experiments on up to 1600 Cray XC40 nodes, describe the sources of slowdowns, and provide tuning guidance to obtain high performance.


This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. We would like to thank Doug Jacobsen, Woo-Sun Yang, Tina Declerck and Rebecca Hartman-Baker for assistance with the large-scale runs at NERSC. We thank Edgar Solomonik, Penporn Koanantakool and Evangelos Georganas for helpful comments and suggestions on tuning the MPI codes. We would like to acknowledge Craig Tull, Ben Bowen and Michael Wehner for providing the scientific data sets used in the study. This research is partially funded by DARPA Award Number HR0011-12-2-0016, the Center for Future Architecture Research, a member of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA, and ASPIRE Lab industrial sponsors and affiliates Intel, Google, Hewlett-Packard, Huawei, LGE, NVIDIA, Oracle, and Samsung. This work is supported by Cray, Inc., the Defense Advanced Research Projects Agency XDATA program and DOE Office of Science grants DE-SC0010200, DE-SC0008700, and DE-SC0008699. AD is supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE 1106400. Any opinions, findings, conclusions, or recommendations in this paper are solely those of the authors and do not necessarily reflect the position or the policy of the sponsors.

ICSI Research Group: Big Data