Search Results: 1 - 10 of 100 matches for " "
All listed articles are free for downloading (OA Articles)
Fine-grained pose prediction, normalization, and recognition
Ning Zhang, Evan Shelhamer, Yang Gao, Trevor Darrell
Computer Science, 2015
Abstract: Pose variation and subtle differences in appearance are key challenges to fine-grained classification. While deep networks have markedly improved general recognition, many approaches to fine-grained recognition rely on anchoring networks to parts for better accuracy. Identifying parts to find correspondence discounts pose variation so that features can be tuned to appearance. To this end previous methods have examined how to find parts and extract pose-normalized features. These methods have generally separated fine-grained recognition into stages which first localize parts using hand-engineered and coarsely-localized proposal features, and then separately learn deep descriptors centered on inferred part positions. We unify these steps in an end-to-end trainable network supervised by keypoint locations and class labels that localizes parts by a fully convolutional network to focus the learning of feature representations for the fine-grained classification task. Experiments on the popular CUB200 dataset show that our method is state-of-the-art and suggest a continuing role for strong supervision.
Recognizing Fine-Grained and Composite Activities using Hand-Centric Features and Script Data
Marcus Rohrbach, Anna Rohrbach, Michaela Regneri, Sikandar Amin, Mykhaylo Andriluka, Manfred Pinkal, Bernt Schiele
Computer Science, 2015, DOI: 10.1007/s11263-015-0851-8
Abstract: Activity recognition has shown impressive progress in recent years. However, the challenges of detecting fine-grained activities and understanding how they are combined into composite activities have been largely overlooked. In this work we approach both tasks and present a dataset which provides detailed annotations to address them. The first challenge is to detect fine-grained activities, which are defined by low inter-class variability and are typically characterized by fine-grained body motions. We explore how human pose and hands can help to approach this challenge by comparing two pose-based and two hand-centric features with state-of-the-art holistic features. To attack the second challenge, recognizing composite activities, we leverage the fact that these activities are compositional and that the essential components of the activities can be obtained from textual descriptions or scripts. We show the benefits of our hand-centric approach for fine-grained activity classification and detection. For composite activity recognition we find that decomposition into attributes allows sharing information across composites and is essential to attack this hard task. Using script data we can recognize novel composites without having training data for them.
Fine-grained Recognition Datasets for Biodiversity Analysis
Erik Rodner, Marcel Simon, Gunnar Brehm, Stephanie Pietsch, J. Wolfgang Wägele, Joachim Denzler
Computer Science, 2015
Abstract: In the following paper, we present and discuss challenging applications for fine-grained visual classification (FGVC): biodiversity and species analysis. We not only give details about two challenging new datasets suitable for computer vision research with up to 675 highly similar classes, but also present first results with localized features using convolutional neural networks (CNN). We conclude with a list of challenging new research directions in the area of visual classification for biodiversity research.
The Unreasonable Effectiveness of Noisy Data for Fine-Grained Recognition
Jonathan Krause, Benjamin Sapp, Andrew Howard, Howard Zhou, Alexander Toshev, Tom Duerig, James Philbin, Li Fei-Fei
Computer Science, 2015
Abstract: While models of fine-grained recognition have made great progress in recent years, little work has focused on a key ingredient of making recognition work: data. We use publicly available, noisy data sources to train generic models which vastly improve upon state-of-the-art on fine-grained benchmarks. First, we present an active learning system using non-expert human raters, and improve upon state-of-the-art performance without any text or other metadata associated with the images. Second, we show that training on publicly-available noisy web image search results achieves even higher accuracies, without using any expert-annotated training data, while scaling to over ten thousand fine-grained categories. We analyze the behavior of our models and data and make a strong case for the importance of data over special-purpose modeling: using only an off-the-shelf CNN, we obtain top-1 accuracies of 92.8% on CUB-200-2011 Birds, 85.4% on Birdsnap, 95.9% on FGVC-Aircraft, and 82.6% on Stanford Dogs.
Fine-Grained Product Class Recognition for Assisted Shopping
Marian George, Dejan Mircic, Gábor Sörös, Christian Floerkemeier, Friedemann Mattern
Computer Science, 2015
Abstract: Assistive solutions for a better shopping experience can improve people's quality of life, in particular that of visually impaired shoppers. We present a system that visually recognizes the fine-grained product classes of items on a shopping list in shelf images taken with a smartphone in a grocery store. Our system consists of three components: (a) We automatically recognize useful text on product packaging, e.g., product name and brand, and build a mapping of words to product classes based on the large-scale GroceryProducts dataset. When the user populates the shopping list, we automatically infer the product class of each entered word. (b) We perform fine-grained product class recognition when the user is facing a shelf. We discover discriminative patches on product packaging to differentiate between visually similar product classes and to increase the robustness against continuous changes in product design. (c) We continuously improve the recognition accuracy through active learning. Our experiments show the robustness of the proposed method against cross-domain challenges, and the scalability to an increasing number of products with minimal re-training.
Bilinear CNN Models for Fine-grained Visual Recognition
Tsung-Yu Lin, Aruni RoyChowdhury, Subhransu Maji
Computer Science, 2015
Abstract: We propose bilinear models, a recognition architecture that consists of two feature extractors whose outputs are multiplied using outer product at each location of the image and pooled to obtain an image descriptor. This architecture can model local pairwise feature interactions in a translationally invariant manner which is particularly useful for fine-grained categorization. It also generalizes various orderless texture descriptors such as the Fisher vector, VLAD and O2P. We present experiments with bilinear models where the feature extractors are based on convolutional neural networks. The bilinear form simplifies gradient computation and allows end-to-end training of both networks using image labels only. Using networks initialized from the ImageNet dataset followed by domain specific fine-tuning we obtain 84.1% accuracy on the CUB-200-2011 dataset requiring only category labels at training time. We present experiments and visualizations that analyze the effects of fine-tuning and the choice of the two networks on the speed and accuracy of the models. Results show that the architecture compares favorably to the existing state of the art on a number of fine-grained datasets while being substantially simpler and easier to train. Moreover, our most accurate model is fairly efficient, running at 8 frames/sec on an NVIDIA Tesla K40 GPU. The source code for the complete system will be made available at http://vis-www.cs.umass.edu/bcnn.
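Illustrative note (not part of the abstract): the pooling step described above can be sketched in a few lines. This is a minimal sketch, not the authors' implementation. It assumes each extractor has already produced one feature vector per image location and that pooling is a plain sum over locations; the function name `bilinear_pool` and the toy inputs are hypothetical.

```python
from typing import List

Vector = List[float]


def bilinear_pool(feats_a: List[Vector], feats_b: List[Vector]) -> Vector:
    """Sum the outer products of paired per-location features.

    feats_a[k] and feats_b[k] are the outputs of two (hypothetical)
    feature extractors at image location k. Sum pooling is an assumption
    made for this sketch.
    """
    if not feats_a or len(feats_a) != len(feats_b):
        raise ValueError("need the same non-zero number of locations")
    dim_a, dim_b = len(feats_a[0]), len(feats_b[0])
    pooled = [[0.0] * dim_b for _ in range(dim_a)]
    for fa, fb in zip(feats_a, feats_b):
        # Outer product at this location, accumulated into the pooled matrix.
        for i, x in enumerate(fa):
            for j, y in enumerate(fb):
                pooled[i][j] += x * y
    # Flatten the dim_a x dim_b matrix into a single image descriptor.
    return [value for row in pooled for value in row]


# Toy example: two locations, a 2-d and a 3-d extractor.
feats_a = [[1.0, 2.0], [3.0, 4.0]]
feats_b = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
descriptor = bilinear_pool(feats_a, feats_b)
```

Because the sum runs over locations without regard to their order, visiting the locations in a different order yields the same descriptor, which is the orderless property the abstract refers to.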
A Coarse-to-Fine Model for 3D Pose Estimation and Sub-category Recognition
Roozbeh Mottaghi, Yu Xiang, Silvio Savarese
Computer Science, 2015
Abstract: Although object detection, 3D pose estimation, and sub-category recognition are highly correlated tasks, they are usually addressed independently of each other because of the huge space of parameters. To jointly model all of these tasks, we propose a coarse-to-fine hierarchical representation, where each level of the hierarchy represents objects at a different level of granularity. The hierarchical representation prevents performance loss, which is often caused by the increase in the number of parameters (as we consider more tasks to model), and the joint modelling enables resolving ambiguities that exist in independent modelling of these tasks. We augment the PASCAL3D+ dataset with annotations for these tasks and show that our hierarchical model is effective in joint modelling of object detection, 3D pose estimation, and sub-category recognition.
WHOI-Plankton- A Large Scale Fine Grained Visual Recognition Benchmark Dataset for Plankton Classification
Eric C. Orenstein, Oscar Beijbom, Emily E. Peacock, Heidi M. Sosik
Computer Science, 2015
Abstract: Planktonic organisms are of fundamental importance to marine ecosystems: they form the basis of the food web, provide the link between the atmosphere and the deep ocean, and influence global-scale biogeochemical cycles. Scientists are increasingly using imaging-based technologies to study these creatures in their natural habitat. Images from such systems provide a unique opportunity to model and understand plankton ecosystems, but the collected datasets can be enormous. The Imaging FlowCytobot (IFCB) at Woods Hole Oceanographic Institution, for example, is an in situ system that has been continuously imaging plankton since 2006. To date, it has generated more than 700 million samples. Manual classification of such a vast image collection is impractical due to the size of the data set. In addition, the annotation task is challenging due to the large space of relevant classes, intra-class variability, and inter-class similarity. Methods for automated classification exist, but the accuracy is often below that of human experts. Here we introduce WHOI-Plankton: a large scale, fine-grained visual recognition dataset for plankton classification, which comprises over 3.4 million expert-labeled images across 70 classes. The labeled image set is compiled from over 8 years of near continuous data collection with the IFCB at the Martha's Vineyard Coastal Observatory (MVCO). We discuss relevant metrics for evaluation of classification performance and provide results for a traditional method based on hand-engineered features and two methods based on convolutional neural networks.
Bird Species Categorization Using Pose Normalized Deep Convolutional Nets
Steve Branson, Grant Van Horn, Serge Belongie, Pietro Perona
Computer Science, 2014
Abstract: We propose an architecture for fine-grained visual categorization that approaches expert human performance in the classification of bird species. Our architecture first computes an estimate of the object's pose; this is used to compute local image features which are, in turn, used for classification. The features are computed by applying deep convolutional nets to image patches that are located and normalized by the pose. We perform an empirical study of a number of pose normalization schemes, including an investigation of higher order geometric warping functions. We propose a novel graph-based clustering algorithm for learning a compact pose normalization space. We perform a detailed investigation of state-of-the-art deep convolutional feature implementations and fine-tuning feature learning for fine-grained classification. We observe that a model that integrates lower-level feature layers with pose-normalized extraction routines and higher-level feature layers with unaligned image features works best. Our experiments advance state-of-the-art performance on bird species recognition, with a large improvement of correct classification rates over previous methods (75% vs. 55-65%).
Extended Report: Fine-grained Recognition of Abnormal Behaviors for Early Detection of Mild Cognitive Impairment
Daniele Riboni, Claudio Bettini, Gabriele Civitarese, Zaffar Haider Janjua, Rim Helaoui
Computer Science, 2015
Abstract: According to the World Health Organization, the proportion of people aged 60 or more is growing faster than any other age group in almost every country, and this trend is not going to change in the near future. Since senior citizens are at high risk of non-communicable diseases requiring long-term care, this trend will challenge the sustainability of the entire health system. Pervasive computing can provide innovative methods and tools for early detection of the onset of health issues. In this paper we propose a novel method to detect abnormal behaviors of elderly people living at home. The method relies on medical models, provided by cognitive neuroscience researchers, describing abnormal activity routines that may indicate the onset of early symptoms of mild cognitive impairment. A non-intrusive sensor-based infrastructure acquires low-level data about the interaction of the individual with home appliances and furniture, as well as data from environmental sensors. Based on those data, a novel hybrid statistical-symbolic technique is used to detect the abnormal behaviors of the patient, which are communicated to the medical center. Unlike related work, our method can detect abnormal behaviors at a fine-grained level, thus providing an important tool to support medical diagnosis. To evaluate our method, we have developed a prototype of the system and acquired a large dataset of abnormal behaviors carried out in an instrumented smart home. Experimental results show that our technique is able to detect most anomalies while generating a small number of false positives.
Copyright © 2008-2017 Open Access Library. All rights reserved.