Improving Food Image Recognition with Noisy Vision Transformer

Food image recognition is a challenging task in computer vision due to the high variability and complexity of food images. In this study, we investigate the potential of Noisy Vision Transformers (NoisyViT) for improving food classification performance. By introducing noise into the learning process, NoisyViT reduces task complexity and adjusts the entropy of the system, leading to enhanced model accuracy. We fine-tune NoisyViT on three benchmark datasets: Food2K (2,000 categories, ~1M images), Food-101 (101 categories, ~100K images), and CNFOOD-241 (241 categories, ~190K images). The performance of NoisyViT is evaluated against state-of-the-art food recognition models. Our results demonstrate that NoisyViT achieves Top-1 accuracies of 95%, 99.5%, and 96.6% on Food2K, Food-101, and CNFOOD-241, respectively, significantly outperforming existing approaches. This study underscores the potential of NoisyViT for dietary assessment, nutritional monitoring, and healthcare applications, paving the way for future advancements in vision-based food computing. Code for reproducing NoisyViT for food recognition is available at NoisyViT_Food.

The code is available as a GitHub repository.
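As a rough illustration of the overall pipeline described above, the sketch below fine-tunes a pretrained ViT backbone on Food-101 while injecting Gaussian noise into the outputs of its transformer blocks during training. The choice of which blocks to perturb, the noise scale, and the training hyperparameters are illustrative assumptions and do not reproduce the exact NoisyViT configuration; refer to the repository for the authors' implementation.

```python
# Minimal sketch: fine-tune a pretrained ViT on Food-101 with additive noise
# injected into intermediate transformer blocks. Noise placement, scale, and
# hyperparameters below are assumptions, not the exact NoisyViT setup.
import torch
import timm
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pretrained ViT-B/16 backbone with a 101-way classification head for Food-101.
model = timm.create_model(
    "vit_base_patch16_224", pretrained=True, num_classes=101
).to(device)

def add_noise(module, inputs, output, scale=0.05):
    """Perturb a block's token embeddings with zero-mean Gaussian noise (training only)."""
    if module.training:
        return output + scale * torch.randn_like(output)
    return output

# Register the noise hook on every transformer block (an assumption; the paper
# may perturb only selected layers or use a different noise model).
for block in model.blocks:
    block.register_forward_hook(add_noise)

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
train_set = datasets.Food101(root="data", split="train",
                             transform=train_transform, download=True)
loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=4)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)
criterion = torch.nn.CrossEntropyLoss()

# One pass over the training set; in practice several epochs with a learning
# rate schedule would be used.
model.train()
for images, labels in loader:
    images, labels = images.to(device), labels.to(device)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```

At evaluation time `model.eval()` disables the noise injection in the hook above, so inference runs on the clean backbone.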
