|
Smoothed Linear Modeling for Smooth Spectral Data
DOI: 10.1155/2013/604548

Abstract: Classification and prediction problems using spectral data lead to high-dimensional data sets. Spectral data differ, however, from most other high-dimensional data sets in that information usually varies smoothly with wavelength, suggesting that fitted models should also vary smoothly with wavelength. Functional data analysis, widely used in the analysis of spectral data, meets this objective by changing perspective from the raw spectra to approximations using smooth basis functions. This paper explores linear regression and linear discriminant analysis fitted directly to the spectral data, imposing penalties on the values and roughness of the fitted coefficients, and shows by example that this can lead to better fits than existing standard methodologies.

1. Introduction

There are a number of settings in which one wishes to predict some dependent variable from measurements of an optical spectrum. For example, Brown et al. [1] were concerned with the regression problem of predicting the composition of dough on the basis of measurements of the near infrared spectrum emitted from dough samples. Schomacker et al. [2] discussed the classification problem of deciding whether a colon polyp was benign or malignant on the basis of the optical spectrum emitted after illuminating the polyp with a laser. Both these applications involve electromagnetic spectra, but the scope of spectral data modeling is much broader: auditory spectra and chemical chromatography data also fit the same framework as the general problem considered here.

Many fields of science now deal with high-dimensional data sets: “large n” data sets have many cases, and “large p” data sets have many measurements per case. “Small n large p” problems are particularly challenging and have been the subject of much recent research.
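As a rough illustration of the penalized linear regression mentioned in the abstract, the sketch below fits least squares with one penalty on the coefficient values and one on their roughness. The specific penalty form (squared coefficients plus squared second differences), the function names, the tuning values, and the simulated wavelength grid are all assumptions made for illustration; the excerpt above does not state the paper's exact formulation.

```python
import numpy as np

def fit_smooth_linear(X, y, lam_value, lam_rough):
    """Minimize ||y - X b||^2 + lam_value * ||b||^2 + lam_rough * ||D b||^2.

    D is the second-difference operator, so the last term penalizes
    roughness of b along the wavelength axis (an assumed penalty form).
    """
    p = X.shape[1]
    # Row j of D computes b[j] - 2*b[j+1] + b[j+2]; D has shape (p-2, p).
    D = np.diff(np.eye(p), n=2, axis=0)
    A = X.T @ X + lam_value * np.eye(p) + lam_rough * (D.T @ D)
    # Closed-form solution of the penalized normal equations.
    return np.linalg.solve(A, X.T @ y)

def roughness(b):
    """Sum of squared second differences of the coefficient vector."""
    return float(np.sum(np.diff(b, n=2) ** 2))

# Simulated "small n, large p" data with a smooth true coefficient curve.
rng = np.random.default_rng(0)
n, p = 30, 100
wavelengths = np.linspace(400.0, 700.0, p)  # hypothetical grid, in nm
beta_true = np.exp(-0.5 * ((wavelengths - 550.0) / 40.0) ** 2)
X = rng.normal(size=(n, p))
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Same value penalty, without and with the roughness penalty.
b_ridge = fit_smooth_linear(X, y, lam_value=1.0, lam_rough=0.0)
b_smooth = fit_smooth_linear(X, y, lam_value=1.0, lam_rough=100.0)
print(roughness(b_ridge), roughness(b_smooth))
```

Setting `lam_rough` to zero reduces this to ordinary ridge regression, so comparing the two fits shows what the roughness penalty contributes. In practice both tuning values would be chosen by something like cross-validation rather than fixed by hand.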
Spectral data are generally in the “large p” class and may also have “small n.” They are, however, different from “small n large p” problems in many other areas such as microarrays in that it is often expected that the regression models should be smooth: if the signal measured at 450 nm is predictive, then one would expect the signals measured at 449 nm and at 451 nm to be about equally predictive. This is the setting considered in this paper, where it is assumed that subject-matter knowledge motivates models in which the information varies smoothly with wavelength. This feature sets such spectral data apart from the typical statistical high-dimensional data set and leads to considering methods that fit models in which the coefficients are smooth functions of the wavelengths to which
|