Feature Learning Architectures with GPU Acceleration


Stanford researchers in Andrew Ng’s group used GPUs and Jacket to speed up their work on feature learning architectures. They wanted to know why certain feature learning architectures with random, untrained weights perform so well on object recognition tasks. The complete write-up can be found in their ICML 2011 paper, “On Random Weights and Unsupervised Feature Learning.”

They decided to use GPUs and Jacket for this study because of “the need to quickly evaluate many architectures on thousands of images.” Jacket taps into the immense computing power of GPUs, speeding up research that processes large numbers of images.

This is the architecture used in the study:

 

They started by studying the basis of good performance for these systems and found that convolutional pooling architectures can be inherently frequency selective and translation invariant, even when initialized with random weights.
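As a rough illustration (this is not the authors’ code, and it uses NumPy/SciPy rather than Jacket), the sketch below builds one such stage: an image is filtered with random, untrained convolutional weights and the responses are max-pooled. The image size, filter count, filter size, and pooling window are illustrative assumptions.

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)

def random_conv_pool(image, num_filters=4, filter_size=8, pool=4):
    """One random-weight convolutional pooling stage (illustrative sizes)."""
    features = []
    for _ in range(num_filters):
        # Random, untrained filter weights.
        filt = rng.standard_normal((filter_size, filter_size))
        response = convolve2d(image, filt, mode='valid')
        # Non-overlapping max pooling over pool x pool regions: the output
        # records whether a filter responded somewhere in each region,
        # not exactly where.
        rh = response.shape[0] - response.shape[0] % pool
        rw = response.shape[1] - response.shape[1] % pool
        pooled = (response[:rh, :rw]
                  .reshape(rh // pool, pool, rw // pool, pool)
                  .max(axis=(1, 3)))
        features.append(pooled.ravel())
    return np.concatenate(features)

# Example: extract features from a random 32x32 "image".
feat = random_conv_pool(rng.standard_normal((32, 32)))
print(feat.shape)  # (144,) with the sizes above
```

Because pooling discards the exact location of each filter response, the representation changes little when a pattern shifts within a pooling region, which is consistent with the translation invariance the authors observed even with random weights.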

Further investigation showed that the key to good performance lies not only in improving the learning algorithms but also in searching for the most suitable architectures.

This picture from their study shows that, when embedded in this particular architecture, even random convolutional filters are selective for oriented features (edges) in the input image. The top row shows example random filters. The bottom rows show the input features preferred by this architecture; the rows differ in the type of convolution used.

This plot shows the correlation in classification performance between architectures with random weights and architectures with standard, trained weights. Each point indicates the performance of a particular architecture. As can be seen, the top-performing architectures with random weights are also the top-performing architectures with trained weights. This enables very fast architecture search: random-weights performance can be used to select a good architecture, which is then trained.
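Their correlation result suggests a simple search recipe, sketched below. The helper functions (build_architecture, evaluate_with_random_weights, train_and_evaluate) are hypothetical placeholders for whatever models, dataset, and classifier are actually used; only the control flow is the point.

```python
def fast_architecture_search(candidate_specs, build_architecture,
                             evaluate_with_random_weights, train_and_evaluate):
    """Pick an architecture by its cheap random-weights score, then train it.

    candidate_specs: e.g. combinations of filter size, pooling size, depth.
    The three callables are hypothetical stand-ins, not a real API.
    """
    scores = []
    for spec in candidate_specs:
        arch = build_architecture(spec)
        # Cheap proxy: classification accuracy with random, untrained weights.
        scores.append((evaluate_with_random_weights(arch), spec))
    # Because random-weights performance tracks trained performance,
    # the best random-weights candidate is the one worth training.
    best_spec = max(scores, key=lambda s: s[0])[1]
    return train_and_evaluate(build_architecture(best_spec))
```

Only a single full training run is needed at the end, which is why being able to evaluate many random-weight candidates quickly on the GPU pays off.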

 

In the end, a sizeable component of a system’s performance can come from the intrinsic properties of the architecture, and not from the learning system.

 

Special thanks to Andrew M. Saxe, Pang Wei Koh, Zhenghao Chen, Maneesh Bhand, Bipin Suresh, and Andrew Y. Ng for sharing their research. These guys are doing amazing work over at Stanford and we are patiently waiting to see what they come out with next.

We look forward to seeing more great applications of GPUs and Jacket from this group in the future.

 
