Journal of Image and Graphics (中国图象图形学报), 2012
Fusing audio-words with visual features for adult video detection
Abstract:
Multi-modality-based adult video detection is an effective approach for filtering pornographic information. However, existing methods lack an accurate representation of audio semantics. This paper therefore presents a novel method that fuses audio-words with visual features for adult video detection. First, we propose a periodicity-based algorithm that segments audio streams into sequences of energy envelope (EE) units. Second, we present an audio semantics representation based on EE and BoW (Bag-of-Words) that describes the features of the EE units as the occurrence probabilities of audio-words. Integrated weighting methods are used to fuse the detection results of the audio-words and the visual features. Furthermore, we propose a periodicity-based decision algorithm for judging adult videos, which complements the preceding periodicity-based segmentation algorithm so that periodicity is exploited in both stages. Our experiments show that the approach remarkably improves detection performance compared with a method based on visual features alone. The true positive rate reaches 94.44%, with a false positive rate of 9.76%.
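
The audio-word representation and the score fusion described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the EE features, the codebook construction, or the weighting scheme, so everything below is an assumption made for illustration: the function names `audio_word_histogram` and `fuse_scores` are hypothetical, nearest-codeword assignment stands in for the paper's audio-word mapping, and a plain weighted sum stands in for the "integrated weighting methods".

```python
import numpy as np


def audio_word_histogram(segment_features, codebook):
    """Return the occurrence frequency of each audio-word in one audio stream.

    segment_features: (n_segments, dim) array, one feature vector per EE unit
                      (the feature definition is assumed, not from the paper).
    codebook:         (n_words, dim) array of audio-word centers
                      (assumed to be learned beforehand, e.g. by clustering).
    """
    segment_features = np.asarray(segment_features, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    if segment_features.shape[0] == 0:
        # No segments: return an all-zero histogram instead of dividing by zero.
        return np.zeros(len(codebook))
    # Squared Euclidean distance from every segment to every codeword.
    diffs = segment_features[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Assign each segment to its nearest audio-word.
    words = np.argmin(dists, axis=1)
    counts = np.bincount(words, minlength=len(codebook)).astype(float)
    # Normalize counts so the histogram sums to 1.
    return counts / counts.sum()


def fuse_scores(audio_score, visual_score, audio_weight=0.5):
    """Weighted late fusion of two detector scores (assumed to lie in [0, 1])."""
    return audio_weight * audio_score + (1.0 - audio_weight) * visual_score


# Toy usage with a two-word codebook and four segments.
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
segments = np.array([[0.0, 1.0], [1.0, 0.0], [9.0, 9.0], [0.0, 0.0]])
hist = audio_word_histogram(segments, codebook)
fused = fuse_scores(audio_score=0.8, visual_score=0.4, audio_weight=0.5)
```

In this toy example three of the four segments fall nearest to the first codeword, so the histogram places three quarters of its mass there. In a full system the histogram would feed an audio classifier whose score is then fused with the visual score; the weight value here is arbitrary.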