Robust Machine Learning for Security

Presented by Yizheng Chen

Friday, August 30, 2019
2:30 p.m.
ICSI Room 6A


Building robust machine learning models has long been a cat-and-mouse game, with new attacks constantly devised to defeat each defense. Recently, a new paradigm has emerged for training verifiably robust machine learning models for image classification tasks. To end the cat-and-mouse game, verifiably robust training equips the ML model with robustness properties that can be formally verified against any attacker within a given bound.

Verifiably robust training minimizes an over-estimated attack success rate computed by a sound over-approximation method. Because ML models differ fundamentally from traditional software, new sound over-approximation methods have been proposed to prove robustness properties. Here, soundness means that if the analysis finds no successful attack, then none exists. If we can apply this training technique to security-relevant classifiers, we can train ML models whose robustness properties hold for worst-case behavior, even when adversaries adapt their attacks after learning the defense.
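To make the soundness idea concrete, here is a minimal sketch of one common over-approximation technique, interval bound propagation, applied to a small feed-forward network. This is an illustrative example only; the talk's actual methods may differ, and all function names here are invented for the sketch. The key property: if `verified_robust` returns True, no input within the epsilon ball can change the prediction; if it returns False, an attack may or may not exist (sound but incomplete).

```python
import numpy as np

def interval_linear(lo, hi, W, b):
    """Propagate an interval [lo, hi] through x -> W @ x + b.
    Splitting W by sign keeps the output bounds sound."""
    W_pos = np.maximum(W, 0.0)
    W_neg = np.minimum(W, 0.0)
    return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

def interval_relu(lo, hi):
    # ReLU is monotone, so applying it to both bounds stays sound.
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

def verified_robust(x, eps, layers, true_label):
    """True if every input within L-inf distance eps of x is provably
    classified as true_label. Sound: True means no attack exists."""
    lo, hi = x - eps, x + eps
    for i, (W, b) in enumerate(layers):
        lo, hi = interval_linear(lo, hi, W, b)
        if i < len(layers) - 1:
            lo, hi = interval_relu(lo, hi)
    # Robust iff the true logit's lower bound beats every other
    # logit's upper bound (worst-case margin over the interval box).
    worst_margin = lo[true_label] - max(
        hi[j] for j in range(len(hi)) if j != true_label)
    return worst_margin > 0
```

Verifiably robust training then minimizes a loss on these worst-case bounds rather than on clean predictions, driving the over-estimated attack success rate down.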

In this talk, I will first show how we design and implement MixTrain to efficiently train classifiers with different robustness properties. We are the first to scale verifiably robust training to the ImageNet-200 dataset, the largest dataset/model used for such training to date. Then, I will describe how to train PDF malware classifiers with verifiable robustness properties. For instance, a robustness property can require that no matter how many pages from benign documents are inserted into a malicious PDF, the classifier must still classify it as malicious. By training PDF malware classifiers to be verifiably robust against such building-block attacks, we make it harder for attackers to construct more sophisticated attacks.
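The page-insertion property above can be sketched as a bound check. In this hedged example, a tiny linear scorer stands in for the PDF malware classifier; the feature names, the threshold, and the assumption that inserting benign pages only changes the page-count feature are all illustrative, not the talk's actual model or feature set.

```python
import numpy as np

# Illustrative features for a toy linear PDF scorer:
# [page_count, javascript_count, embedded_file_count]
def verify_page_insertion(w, bias, x, max_pages):
    """Verify the property: for every page_count in [x[0], max_pages],
    the score stays above the 'malicious' threshold of 0.
    Inserting benign pages only raises feature 0 in this sketch."""
    lo = x.astype(float).copy()
    hi = x.astype(float).copy()
    hi[0] = max_pages  # attacker may pad the PDF up to max_pages
    # Worst-case (minimum) score over the interval box:
    w_pos = np.maximum(w, 0.0)
    w_neg = np.minimum(w, 0.0)
    worst_score = w_pos @ lo + w_neg @ hi + bias
    return worst_score > 0.0  # True: always flagged malicious
```

If the scorer penalizes page count, padding eventually flips the label and the property fails to verify; training against the worst-case bound pushes the model toward weights for which the check passes.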

Yizheng Chen is a Postdoctoral Researcher at Columbia University. She received her Ph.D. degree in Computer Science from Georgia Institute of Technology. She is interested in designing and implementing secure machine learning systems, and applying machine learning and graphical models to solve security problems.