Bi-Directional Ensemble Feature Reconstruction Network for Few-Shot Fine-Grained Classification

1 Lanzhou University of Technology, China; 2 Beijing University of Posts and Telecommunications, China; 3 SketchX, CVSSP, University of Surrey, United Kingdom

(a) The traditional metric-based method. (b) The existing method, FRN. (d) The method proposed in this paper, which combines (b) and (c). (b) helps the model increase inter-class variations and (c) helps it decrease intra-class variations, so the proposed method (d) increases inter-class variations and reduces intra-class variations at the same time through mutual support-to-query and query-to-support reconstruction.

Abstract

The main challenge in fine-grained few-shot image classification is to learn feature representations with higher inter-class and lower intra-class variations from only a few labelled samples. Conventional few-shot learning methods, however, cannot be naively adopted for this fine-grained setting: a quick pilot study reveals that they in fact push for the opposite (i.e., lower inter-class variations and higher intra-class variations). To alleviate this problem, prior works predominantly use a support set to reconstruct the query image and then utilize metric learning to determine its category. Upon careful inspection, we further reveal that such unidirectional reconstruction methods only help to increase inter-class variations and are not effective in tackling intra-class variations. In this paper, we introduce a bi-reconstruction mechanism that can simultaneously accommodate inter-class and intra-class variations. In addition to using the support set to reconstruct the query set for increasing inter-class variations, we further use the query set to reconstruct the support set for reducing intra-class variations. This design effectively helps the model to explore more subtle and discriminative features, which is key for the fine-grained problem at hand. Furthermore, we construct a self-reconstruction module to work alongside the bi-directional module to make the features even more discriminative. We also introduce the snapshot ensemble method into the episodic learning strategy, a simple trick to further improve model performance without increasing training costs. Experimental results on three widely used fine-grained image classification datasets, as well as general and cross-domain few-shot image datasets, consistently show considerable improvements compared with other methods.
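The snapshot ensemble idea mentioned in the abstract can be illustrated with a small sketch. This page does not say how the snapshots are combined, so the version below assumes the common choice of averaging class probabilities from several model snapshots saved during one training run. The function names and the logits are hypothetical and serve only as an illustration, not as the authors' implementation.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def snapshot_ensemble_predict(snapshot_logits):
    """Average class probabilities over snapshots and pick the best class.

    snapshot_logits: one list of class scores per saved snapshot,
    all for the same query image (assumed layout, for illustration).
    """
    probs = [softmax(logits) for logits in snapshot_logits]
    n_snapshots = len(probs)
    n_classes = len(probs[0])
    averaged = [
        sum(p[c] for p in probs) / n_snapshots for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=averaged.__getitem__)
    return averaged, predicted


# Hypothetical scores from three snapshots for one query in a 3-way episode.
snapshot_logits = [
    [2.0, 1.0, 0.0],
    [0.5, 1.5, 0.0],  # this snapshot alone would vote for class 1
    [2.5, 0.5, 0.0],
]
avg_probs, predicted = snapshot_ensemble_predict(snapshot_logits)
print(predicted)
```

Because the snapshots come from a single training run, combining them adds inference-time work but no extra training, which matches the "without increasing training costs" claim above.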

Architecture

The proposed bi-directional feature reconstruction network. FSRM refers to Feature Self-reconstruction Module and FMRM refers to Feature Mutual Reconstruction Module.
 
Feature self-reconstruction module (FSRM).
Feature mutual reconstruction module (FMRM).
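
The mutual reconstruction idea can be sketched in a few lines of NumPy. This page does not specify how the reconstruction is solved, so the sketch assumes a closed-form ridge regression in the style of FRN and treats each row as one local feature vector. The names `ridge_reconstruct` and `bidirectional_distance`, the regularizer `lam`, and the toy features are all hypothetical, and the two errors are simply summed, which may differ from the weighting used in the paper.

```python
import numpy as np


def ridge_reconstruct(source, target, lam=0.1):
    """Reconstruct each row of `target` as a linear combination of the
    rows of `source`, using closed-form ridge regression (an assumption)."""
    gram = source @ source.T  # (n_source, n_source)
    regularized = gram + lam * np.eye(gram.shape[0])
    # Solve for the weights instead of forming an explicit inverse.
    weights = np.linalg.solve(regularized, source @ target.T).T
    return weights @ source  # same shape as target


def bidirectional_distance(support, query, lam=0.1):
    """Sum of support-to-query and query-to-support reconstruction errors."""
    query_hat = ridge_reconstruct(support, query, lam)    # support -> query
    support_hat = ridge_reconstruct(query, support, lam)  # query -> support
    err_query = np.mean((query - query_hat) ** 2)
    err_support = np.mean((support - support_hat) ** 2)
    return err_query + err_support


# Toy features: class A lives in dimensions 0-2, class B in dimensions 4-6.
basis = np.eye(8)
support_a = basis[:3]
support_b = basis[4:7]
# Query features built from class A directions only.
query = support_a[[0, 1, 2]] + support_a[[1, 2, 0]]

dist_same = bidirectional_distance(support_a, query)
dist_other = bidirectional_distance(support_b, query)
print(dist_same < dist_other)
```

In this toy setup the query is reconstructed well from the class A support features and poorly from class B, so a metric-based classifier that picks the smallest distance would assign it to class A.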

Results

5-way few-shot classification performance on the CUB, Dogs and Cars datasets with a Conv-4 backbone.

5-way few-shot classification performance on the CUB, Dogs and Cars datasets with a ResNet-12 backbone.
Images recovered from different features by our bi-directional feature reconstruction network on the CUB dataset.

BibTeX

@article{wu2024bien,
  title={Bi-Directional Ensemble Feature Reconstruction Network for Few-Shot Fine-Grained Classification},
  author={Jijie Wu and Dongliang Chang and Aneeshan Sain and Xiaoxu Li and Zhanyu Ma and Jie Cao and Jun Guo and Yi-Zhe Song},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2024}
}

Copyright: © Jijie Wu | Last updated: 17 Mar 2024 | Template Credit: Nerfies