Most traditional model-based feature learning approaches use a single layer of transformation from the data to a problem-specific feature space (so-called "shallow" models). They have proven effective in solving many well-constrained and well-formulated problems. Their concise architectures have led to rich theoretical and algorithmic results, and allow us to incorporate prior knowledge and intuition at the problem level. However, their limited representation power proves insufficient for modeling emerging Big Data. Moreover, most of their inference algorithms rely on iterative solutions, which suffer from efficiency and scalability bottlenecks.

Lately, deep learning has attracted great attention for its tremendous representation power, linear scalability, and low inference complexity. A deep feed-forward network adopts multiple layers of non-linear feature transformations and can be naturally tuned with a task-driven loss. However, generic deep architectures, sometimes referred to as "black-box" methods, largely ignore problem-specific formulations and prior knowledge. Instead, they rely on stacking somewhat ad-hoc modules, which makes it difficult to interpret their working mechanisms. Despite a few hypotheses and intuitions, it remains hard to understand why deep models work, how they work, how to generalize them, and how they relate to classical learning models.


I have been passionate about bridging "shallow" models, which emphasize problem-specific priors and facilitate interpretation and analysis, with deep models, which allow for larger learning capacity, in order to devise next-generation deep architectures that are:

  • Task-specific, namely, being optimized for the specific task by fully exploiting available prior knowledge and domain expertise, rather than applying generic data-driven models as "black boxes".
  • Interpretable, namely, being able to learn a representation which consists of disentangled and semantically sensible latent variables and to display more predictable behaviors.

My current work has connected a large family of classical regularized regression models, including principal component analysis (PCA) and sparse coding (SC), to the latest deep neural networks (DNNs) and convolutional neural networks (CNNs). I contribute to translating the analytic tools of "shallow" models to guide the architecture design, interpretation, and performance analysis of deep models. The resulting framework offers a principled interpretation of the empirical success of deep networks. Moreover, it inspires the design of more task-specific deep architectures for various applications, and also improves the performance of existing models via a novel pre-training strategy.
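As one minimal illustration of this kind of correspondence (a sketch of the well-known unfolding idea, not of the specific framework above): truncating the iterative soft-thresholding algorithm (ISTA) for sparse coding to a fixed number of iterations yields a feed-forward computation whose "layers" are a linear transform followed by a soft-thresholding non-linearity. The function names and parameter choices below are illustrative.

```python
import numpy as np

def soft_threshold(v, theta):
    # Elementwise soft-thresholding: plays the role of the layer's non-linearity.
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def ista_unfolded(x, D, lam=0.1, n_layers=5):
    """Run n_layers iterations of ISTA for min_z ||x - Dz||^2 / 2 + lam * ||z||_1.

    Each iteration reads as one network layer: z <- soft(W_e @ x + S @ z, theta),
    i.e. a linear transform plus a fixed non-linear activation.
    """
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the smooth part
    W_e = D.T / L                            # "encoder" weights
    S = np.eye(D.shape[1]) - (D.T @ D) / L   # "lateral"/recurrent weights
    theta = lam / L                          # threshold acts as activation parameter
    z = np.zeros(D.shape[1])
    for _ in range(n_layers):
        z = soft_threshold(W_e @ x + S @ z, theta)  # one unfolded "layer"
    return z
```

Learning W_e, S, and theta end-to-end instead of fixing them from D is what turns this truncated optimizer into a trainable, task-specific deep architecture.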

Zhangyang (Atlas) Wang