Researchers at the Stanford Artificial Intelligence Laboratory (SAIL), building on previous work, have had further success using Jacket to speed up their algorithm.
In a paper at CVPR 2011, entitled “Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis”, they explain how their unsupervised feature learning algorithm competes with algorithms that rely on hand-crafted features or on other learned features.
| | KTH | Hollywood2 | UCF | YouTube |
|---|---|---|---|---|
| Best published results | 92.1% | 50.9% | 85.6% | 71.2% |
| Stanford group results | 93.9% | 53.3% | 86.5% | 75.8% |
Tested on four well-known benchmark datasets, their algorithm outperformed the best previously published results on every one.
For training, they used a multi-layered, stacked convolutional ISA (independent subspace analysis) network. ISA learns features from image patches without supervision.
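To make the ISA idea concrete, here is a minimal sketch of how a single ISA layer responds to an input patch. The post does not give the formula, so this assumes the standard ISA formulation: linear filters `W` (learned), a squaring step, and a fixed pooling matrix `V` that sums the squared outputs within each subspace before taking a square root. The function name and the toy matrices are illustrative only, not the authors' code.

```python
import numpy as np

def isa_response(x, W, V):
    """Response of one ISA layer (standard formulation, assumed here).

    x: flattened input patch, shape (d,)
    W: filter weights, shape (k, d); learned in a real network
    V: fixed 0/1 pooling matrix, shape (m, k), grouping filters into subspaces
    """
    simple = W @ x                    # linear filter outputs
    return np.sqrt(V @ simple ** 2)   # pool squared outputs per subspace

# Toy setup (hypothetical values): 4 identity filters in 2 subspaces of size 2.
W = np.eye(4)
V = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])

p = isa_response(np.array([3.0, 4.0, 0.0, 0.0]), W, V)
q = isa_response(np.array([4.0, 3.0, 0.0, 0.0]), W, V)
print(p, q)
```

Both inputs give the same response because their energy within the first subspace is equal. This pooling within a subspace is what lets ISA features tolerate some variation in the input.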
The standard ISA algorithm, however, becomes computationally inefficient as the image patches grow larger. To overcome this problem, they developed a convolutional neural network that alternates PCA and ISA layers. The ISA network learned at a particular level was convolved with a larger region of the image. The results of this convolution step were fed into the PCA layer for pre-processing before being passed on to the next ISA layer.
To learn spatio-temporal features from video, they applied the model to 3D video blocks rather than 2D image patches.
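The stacking scheme can be sketched as a forward pass: slide a small first-layer ISA over a larger region, concatenate the responses, reduce them with a linear projection, and feed the result to a second ISA. This is only an illustration under stated assumptions. All sizes are made up, the weights are random rather than trained, and `P` is a random stand-in for a learned PCA projection. It uses a 2D region for brevity.

```python
import numpy as np

def isa_layer(x, W, V):
    """ISA response: square the filter outputs, pool per subspace, take sqrt."""
    return np.sqrt(V @ (W @ x) ** 2)

def pooling_matrix(n_units, group_size):
    """Fixed 0/1 matrix assigning consecutive units to non-overlapping subspaces."""
    n_groups = n_units // group_size
    V = np.zeros((n_groups, n_units))
    for g in range(n_groups):
        V[g, g * group_size:(g + 1) * group_size] = 1.0
    return V

def stacked_forward(region, W1, V1, P, W2, V2, small=4, stride=2):
    """Two-layer pass: layer-1 ISA over sub-windows, projection, layer-2 ISA."""
    h, w = region.shape
    responses = []
    for r in range(0, h - small + 1, stride):
        for c in range(0, w - small + 1, stride):
            patch = region[r:r + small, c:c + small].ravel()
            responses.append(isa_layer(patch, W1, V1))
    combined = np.concatenate(responses)  # all layer-1 responses for the region
    reduced = P @ combined                # stand-in for the PCA step
    return isa_layer(reduced, W2, V2)

rng = np.random.default_rng(0)
W1 = rng.standard_normal((8, 16))   # 8 filters over 4x4 windows (untrained)
V1 = pooling_matrix(8, 2)           # 4 outputs per window
P = rng.standard_normal((12, 36))   # 9 windows x 4 outputs -> 12 dimensions
W2 = rng.standard_normal((6, 12))   # second-layer filters (untrained)
V2 = pooling_matrix(6, 2)           # 3 outputs for the whole region

features = stacked_forward(rng.standard_normal((8, 8)), W1, V1, P, W2, V2)
print(features.shape)  # (3,)
```

The point of the design is that the first layer only ever sees small windows, so its cost stays low even as the overall region grows.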
The trained networks appear to have learned features that are robust to translation, but sensitive to frequency and rotation (at the first level). The features learned at the second level appear to represent more complex shapes such as corners.
As mentioned earlier, their algorithm not only beats the best published results in accuracy; it is also generally faster, in both training and testing. Feature-extraction throughput for their algorithm is given below, with speedup measured relative to HOG3D.
| Algorithm | Frames per second | Speedup |
|---|---|---|
| HOG3D | 4.54 | 1.0 |
| Stacked ISA (layer 1 only) | 7.14 | 1.6 |
| Stacked ISA (layers 1 and 2) | 2.27 | 0.5 |
| Stacked ISA with Jacket (layers 1 and 2) | 10 | 2.2 |
As you can see, running the algorithm (which is dominated by matrix-vector and convolution operations) with Jacket raised throughput from 2.27 to 10 frames per second, a 4.4X speedup over the CPU implementation!
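The two speedup figures use different baselines, which a quick calculation makes explicit. The snippet below simply divides the frames-per-second numbers reported above. The dictionary and helper name are illustrative.

```python
# Frames per second as reported in the post.
fps = {
    "HOG3D": 4.54,
    "Stacked ISA (layer 1 only)": 7.14,
    "Stacked ISA (layers 1 and 2)": 2.27,
    "Stacked ISA with Jacket (layers 1 and 2)": 10.0,
}

def speedup(name, baseline):
    """Ratio of throughput between two entries of the fps table."""
    return fps[name] / fps[baseline]

jacket = "Stacked ISA with Jacket (layers 1 and 2)"
vs_hog3d = speedup(jacket, "HOG3D")
vs_cpu = speedup(jacket, "Stacked ISA (layers 1 and 2)")
print(f"{vs_hog3d:.1f}x over HOG3D, {vs_cpu:.1f}x over CPU stacked ISA")
# prints "2.2x over HOG3D, 4.4x over CPU stacked ISA"
```

So the 2.2 is relative to HOG3D, while the 4.4X compares the same two-layer network with and without Jacket.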
Special thanks to Quoc V. Le, Will Y. Zou, Serena Y. Yeung and Andrew Y. Ng from SAIL for sharing their research. We have more success stories from their group. Keep an eye out for more blog posts!