Vision Projects

Representation Learning

We study learning mid-level representations from natural, non-curated data to achieve efficient and generalizable performance on downstream visual tasks such as recognition, segmentation, and detection. We exploit instance discrimination, instance grouping, model bias and variance analysis, pixel-to-segment contrastive learning, and visual memory to handle open-set recognition, long-tailed distributions, open compound domain adaptation, and unsupervised or weakly supervised recognition and segmentation.
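Instance discrimination treats every image as its own class: two views of the same image should be more similar to each other than to any other image. The sketch below shows a per-instance loss, assuming the common non-parametric softmax (InfoNCE-style) formulation over L2-normalized features with a temperature of 0.07. The function names and the temperature value are illustrative assumptions, not the group's code.

```python
import math

def l2_normalize(v):
    """Scale a feature vector to unit length so similarities are cosines."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def instance_discrimination_loss(query, positive, negatives, temperature=0.07):
    """Non-parametric softmax (InfoNCE-style) loss for a single instance.

    query:     feature of one augmented view of an image
    positive:  feature of another view of the SAME image (its own "class")
    negatives: features of OTHER images, each treated as a different class
    temperature: illustrative default, an assumption rather than a prescribed value
    """
    q = l2_normalize(query)
    keys = [l2_normalize(positive)] + [l2_normalize(k) for k in negatives]
    # One logit per instance: cosine similarity divided by the temperature.
    logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
    # Numerically stable log-sum-exp; index 0 holds the positive instance.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    # Negative log-probability of picking the positive among all instances.
    return log_z - logits[0]
```

The loss is small when the query aligns with its own positive and large when it aligns with another image instead, which is what pushes features of different instances apart.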

Complex-valued Deep Learning

Complex-valued data is ubiquitous in physics and signal processing applications, and complex-valued representations in deep learning have appealing theoretical properties. Although these aspects have long been recognized, complex-valued deep learning lags far behind its real-valued counterpart. Existing methods ignore the rich geometry of complex-valued data, instead applying the same techniques and architectures used for real-valued data, with undesirable consequences such as decreased robustness, larger model sizes, and poor generalization.
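One concrete piece of that geometry: multiplying a complex feature by e^{iθ} rotates its phase without changing its magnitude. The sketch below contrasts an activation that treats a complex number as two independent real channels (`split_relu`) with modReLU, an activation from the complex-valued RNN literature that acts only on the magnitude. modReLU and its bias of -0.5 are chosen here purely as an illustration; they are not presented as the group's method.

```python
import cmath

def split_relu(z):
    """ReLU applied separately to real and imaginary parts (treats C as R^2)."""
    return complex(max(z.real, 0.0), max(z.imag, 0.0))

def mod_relu(z, bias=-0.5):
    """modReLU: ReLU(|z| + bias) * z / |z|, which shrinks the magnitude and keeps the phase.

    The bias value is an illustrative assumption.
    """
    r = abs(z)
    if r == 0.0:
        return 0j
    return max(r + bias, 0.0) * z / r

def phase_rotate(z, theta):
    """Multiply by the unit-modulus complex number e^{i*theta}."""
    return z * cmath.exp(1j * theta)
```

Rotating the input and then applying `mod_relu` gives the same result as applying `mod_relu` and then rotating, so it is equivariant to phase shifts; `split_relu` breaks this symmetry because it depends on the arbitrary choice of real and imaginary axes.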

Sound and Vision Integration

Sound carries information complementary to vision and can aid scene understanding and navigation. We train a model to tell individual sounds apart without using labels; it can be used to accelerate subsequent training on supervised sound event classification and to explain how songbirds such as the zebra finch can develop communication without any external supervision. We also demonstrate a low-cost real-world system that learns echolocation and generates depth images from sound alone.

Implicit Deep Learning

Most modern neural networks are defined explicitly as a sequence of layers with various connections. Any desired property, such as translational equivariance, must be hard-coded into the architecture, which is inflexible and restrictive. In contrast, implicit models are defined as a set of constraints to satisfy or criteria to optimize at test time. This framework can express a large class of operations, such as test-time optimization, planning, dynamics, constraints, and feedback. Our research explores implicit models to integrate invariance and equivariance constraints in co

Sketch Recognition and Photo Synthesis

Sketches are rapidly executed freehand drawings that serve as an intuitive and powerful form of visual expression. Although they lack visual detail and contain spatial and geometric distortions, humans can effortlessly envision objects from sketches. We study translations between sketches, photos, and 3D models at both the object level and the scene level.

3D Point Cloud Parsing

Deep neural networks are widely used for understanding 3D point clouds. At each point convolution layer, features are computed from local neighborhoods of 3D points and combined for subsequent processing in order to extract semantic information. We study a novel approach that learns different non-rigid transformations of the input point cloud, so that an optimal local neighborhood can be adopted at each layer.
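The role of local neighborhoods can be made concrete with a small sketch: for each point, gather the features of its k nearest neighbors and aggregate them with a channel-wise max, a common permutation-invariant choice. The optional `transform` argument is a hypothetical stand-in for a learned transformation: neighborhoods are searched in the transformed coordinates, so changing the transformation changes which points are grouped. The fixed stretch used below is an assumption for illustration only; it does not reproduce how the transformations are actually learned.

```python
import math

def knn_indices(coords, center_idx, k):
    """Indices of the k points closest to coords[center_idx], including the point itself."""
    c = coords[center_idx]
    order = sorted(range(len(coords)), key=lambda j: math.dist(c, coords[j]))
    return order[:k]

def local_max_pool(points, features, k, transform=None):
    """Aggregate each point's k-NN neighborhood with a channel-wise max.

    transform: hypothetical per-point coordinate map; neighborhoods are found in
    the transformed space while the features themselves are left unchanged.
    """
    coords = [transform(p) for p in points] if transform else points
    out = []
    for i in range(len(points)):
        nbrs = knn_indices(coords, i, k)
        # Max over the neighborhood for every feature channel.
        out.append([max(features[j][c] for j in nbrs) for c in range(len(features[i]))])
    return out
```

With the points (0,0,0), (1,0,0), (0,2,0) and k = 2, the first point is grouped with (1,0,0); after stretching the x-axis by 10 it is grouped with (0,2,0) instead, and its pooled feature changes accordingly.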

Model Learning and Compression

Traditional deep neural network learning seeks an optimal model in a large model space. However, the resulting model contains substantial redundancy, which is then removed during model compression. We instead seek an optimal model in a reduced model space without sacrificing optimality. We study several techniques, such as tied block convolution (TBC), a low-cost orthogonality regularizer (OCNN), and the recurrent parameter generator (RPG), in which smaller, leaner models are optimized directly and can be deployed as-is, with greater robustness and better generalization.
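To see where the savings of tied block convolution come from, here is a parameter-count sketch, assuming TBC splits the input and output channels into B equal blocks that all reuse one shared, thinner filter bank. Under that assumption, an ordinary group convolution stores B separate filter banks, while TBC stores only one. The function names are hypothetical, biases are ignored, and the 1x1 forward pass is a toy illustration of the weight tying, not a full convolution.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def group_conv_params(c_in, c_out, k, groups):
    """Group convolution: each of `groups` blocks has its own filter bank."""
    assert c_in % groups == 0 and c_out % groups == 0
    return groups * (c_in // groups) * (c_out // groups) * k * k

def tied_block_conv_params(c_in, c_out, k, blocks):
    """Tied block convolution (as assumed above): one filter bank shared by all blocks."""
    assert c_in % blocks == 0 and c_out % blocks == 0
    return (c_in // blocks) * (c_out // blocks) * k * k

def tied_block_conv1x1(x, w, blocks):
    """Toy 1x1 TBC on a single pixel: apply the same weight matrix w to every channel block.

    x: list of input channel values; w: (c_out/B) x (c_in/B) weight matrix shared by all blocks.
    """
    assert len(x) % blocks == 0
    size = len(x) // blocks
    assert all(len(row) == size for row in w)
    out = []
    for b in range(blocks):
        block = x[b * size:(b + 1) * size]
        # The same rows of w are reused for every block: this is the weight tying.
        out.extend(sum(wi * xi for wi, xi in zip(row, block)) for row in w)
    return out
```

For a 3x3 layer with 256 input and 256 output channels and B = 4, this gives 589,824 weights for a standard convolution, 147,456 for a 4-group convolution, and 36,864 for the tied version, a factor of B² = 16 below the standard layer.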

Vision-Based Reinforcement Learning

Vision-based reinforcement learning (RL) has achieved notable success, but generalizing it to unseen test environments remains challenging. A policy must not only process high-dimensional visual inputs but also cope with significant variations in new test scenarios, e.g., color and texture changes or moving distractors.

Robotic Manipulation and Locomotion

Existing methods for robotic manipulation and locomotion overlook real-world constraints such as data availability, data efficiency, and data quality. We explore novel approaches that incorporate curriculum learning, latent space information extraction, and invariant states to improve the generalizability of learned control policies to changes in the environment and in robot configuration.

Machine Learning for Medical Applications

We explore unsupervised representation learning not only to reduce the labeling bias inherent in supervised image-based medical diagnosis, but also to allow data-driven discovery of novel pathological, physiological, and camera-related domains. We also use machine learning to generate realistic healthy and tumor medical scans for studying human visual perception in radiology.

Machine Learning in the Hyperbolic Space

We study how hyperbolic space can be used to facilitate the formation of hierarchical representations from natural data without any supervision. We demonstrate that hyperbolic neural networks outperform their standard Euclidean counterparts when their optimization is improved by restricting the feature space, resulting in higher classification accuracy, greater adversarial robustness, and better out-of-distribution detection.
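A minimal sketch of the geometry involved, assuming the Poincaré ball model with curvature -1 and assuming the "restricted feature space" amounts to clipping the Euclidean feature norm to a radius r before mapping it into the ball. The geodesic distance is d(x, y) = arcosh(1 + 2‖x − y‖² / ((1 − ‖x‖²)(1 − ‖y‖²))), and the exponential map at the origin sends a Euclidean vector v to tanh(‖v‖) v / ‖v‖. The function names and the default r = 1.0 are hypothetical.

```python
import math

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))

def clip_feature(v, r=1.0):
    """Restrict a Euclidean feature to the ball of radius r (r = 1.0 is an illustrative default)."""
    n = norm(v)
    return list(v) if n <= r else [x * r / n for x in v]

def expmap0(v):
    """Exponential map at the origin of the Poincare ball (curvature -1)."""
    n = norm(v)
    if n == 0.0:
        return [0.0 for _ in v]
    return [math.tanh(n) * x / n for x in v]

def poincare_distance(x, y):
    """Geodesic distance between two points strictly inside the unit Poincare ball."""
    diff = sum((a - b) ** 2 for a, b in zip(x, y))
    denom = (1 - norm(x) ** 2) * (1 - norm(y) ** 2)
    return math.acosh(1 + 2 * diff / denom)
```

Distances measured from the origin grow like 2‖v‖ for the mapped feature, so the ball has room for tree-like hierarchies. Without the clip, a feature with a large Euclidean norm saturates tanh and lands numerically on the boundary of the ball, where the denominator in the distance vanishes; clipping keeps every embedding a safe distance inside.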

Adversarial Robustness

We study how to improve both classification accuracy and adversarial robustness through image-wise and pixel-wise representation learning that incorporates perceptual organization.