|
Principal Component Analysis and Neural Networks for Authorship Attribution

Keywords: principal components, authorship attribution, stylometry, text categorization, function words, stylistic features, syntactic characteristics, multilayer perceptron, artificial neural network

Abstract: A common problem in statistical pattern recognition is that of feature selection or feature extraction. Feature selection refers to a process whereby a data space is transformed into a feature space that, in theory, has exactly the same dimension as the original data space. However, the transformation is designed in such a way that the data set may be represented by a reduced number of "effective" features and yet retain most of the intrinsic information content of the data; in other words, the data set undergoes a dimensionality reduction.

In this paper, data collected by counting selected syntactic characteristics in around a thousand paragraphs of each of the sample books underwent a principal component analysis performed using neural networks. The first principal components are then used to distinguish the authors of the texts by means of multilayer perceptron artificial neural networks.
|
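As an illustration of what "principal component analysis performed using neural networks" can look like, the following is a minimal sketch that assumes Oja's Hebbian learning rule for a single linear neuron. The abstract does not say which neural PCA algorithm was used, so the choice of rule, the function names (`center`, `oja_first_component`, `project`), the learning rate, and the epoch count are all assumptions made for this example. Each input row stands in for the vector of syntactic feature counts of one paragraph.

```python
import math
import random


def center(rows):
    """Subtract the per-feature mean so every column has zero mean."""
    n = len(rows)
    dim = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    return [[r[j] - means[j] for j in range(dim)] for r in rows]


def oja_first_component(rows, lr=0.001, epochs=20, seed=0):
    """Estimate the first principal direction with Oja's rule (assumed here).

    A single linear neuron y = w . x is trained with the update
    w <- w + lr * y * (x - y * w), which drives w toward the
    unit-length leading eigenvector of the data covariance matrix.
    """
    rng = random.Random(seed)
    data = center(rows)  # centered copy; the caller's rows are not modified
    dim = len(data[0])
    w = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    for _ in range(epochs):
        rng.shuffle(data)  # present the samples in a new order each epoch
        for x in data:
            y = sum(wi * xi for wi, xi in zip(w, x))
            w = [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w


def project(rows, w):
    """Score each centered sample along the learned direction w."""
    return [sum(wi * xi for wi, xi in zip(w, x)) for x in center(rows)]


if __name__ == "__main__":
    # Hypothetical two-feature data with one dominant direction of variation.
    rng = random.Random(1)
    rows = []
    for _ in range(500):
        a, b = rng.gauss(0, 3.0), rng.gauss(0, 0.3)
        rows.append([5 + 0.6 * a - 0.8 * b, -2 + 0.8 * a + 0.6 * b])
    w = oja_first_component(rows)
    print("learned direction:", w)
    print("norm:", math.sqrt(sum(v * v for v in w)))
```

The scores returned by `project` would play the role of a reduced feature that a multilayer perceptron classifier could take as input. Extracting several components would need an extension such as Sanger's generalized Hebbian algorithm, which is not shown here.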