SVM - Support Vector Machines
The SVM, or Support Vector Machine, is a machine learning method developed in the 1990s by Vladimir Naumovich Vapnik (Soviet Academy of Science, Stanford University, Royal Holloway College London, AT&T Bell Labs New Jersey, NEC Labs Princeton, Columbia University New York) and Alexey Jakovlevich Chervonenkis. The method was originally used primarily as a classification tool, but was later adapted also for regression and distribution density modeling.
SVM models make use of the theory of empirical risk and of the Vapnik-Chervonenkis (VC) dimension of the model. It has been proven that the following inequality holds with probability (1 − η):

R(α) ≤ R_emp(α) + √( [h (ln(2l/h) + 1) − ln(η/4)] / l )

where R(α) is the risk (the actual mean error of the model), l is the number of data rows, α is the vector of model parameters, R_emp(α) is the empirical risk and h is the non-negative integer VC-dimension of the model. The last term on the right-hand side (the square root) is called the VC-confidence.

The following SVM model types are covered:
SVM-C – SVM Classification models
SVM-R – SVM Regression models
SVM-OneClass – Distribution density
SVM-kernel transformations

Here we provide several simple examples to illustrate common SVM models and the use and meaning of their parameters. Although SVMs are usually employed on high-dimensional problems and rather extensive data sets, we restrict ourselves to small two-dimensional samples for easier visualization. For more detailed information see: Support Vector Machines - Pdf manual
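For readers who want to evaluate the bound above numerically, the following minimal sketch (Python with NumPy; not part of the original software, and the sample values are purely illustrative) computes the VC-confidence term and the resulting upper bound on the risk:

import numpy as np

def vc_confidence(l, h, eta):
    # VC-confidence term: sqrt((h*(ln(2l/h) + 1) - ln(eta/4)) / l)
    return np.sqrt((h * (np.log(2.0 * l / h) + 1.0) - np.log(eta / 4.0)) / l)

def risk_bound(emp_risk, l, h, eta=0.05):
    # Upper bound on the actual risk R(alpha), holding with probability 1 - eta
    return emp_risk + vc_confidence(l, h, eta)

# e.g. empirical risk 0.10 on l = 200 rows, model with VC-dimension h = 10
print(risk_bound(0.10, l=200, h=10, eta=0.05))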
Example 1 – Classification
For two continuous variables X and Y we have four possible categorical outputs: A, B, C, D. The different levels (values) of the categorical variable are not linearly separable in the plane (X, Y). This example shows the difference between a linear and an RBF-transformed SVM classification model. The models are trained on the data shown in the figures below. The plots show the separating hyperplanes (in this case ordinary lines) for the linear model (the first plot) and the separating non-linear hypersurfaces (in this case curves) for the RBF-SVM models with different values of the parameter γ, from γ = 0.01 to γ = 10. Misclass is the number of incorrectly classified cases. Too large a value of γ results in an overfitted model that depends strongly on the particular training data.
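A comparison of this kind can be sketched with scikit-learn's SVC (not the software described in this manual); the data set and parameter values below are illustrative only, not those used in the figures:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Illustrative 2-D data with four classes A, B, C, D that are not linearly separable
centers = {"A": (0, 0), "B": (3, 0), "C": (0, 3), "D": (2, 2)}
X = np.vstack([rng.normal(c, 0.8, size=(30, 2)) for c in centers.values()])
y = np.repeat(list(centers.keys()), 30)

# Linear model vs. RBF models with increasing gamma (0.01 ... 10)
models = {"linear": SVC(kernel="linear", C=1.0)}
for gamma in (0.01, 0.1, 1.0, 10.0):
    models[f"RBF, gamma={gamma}"] = SVC(kernel="rbf", C=1.0, gamma=gamma)

for name, model in models.items():
    model.fit(X, y)
    misclass = np.sum(model.predict(X) != y)   # "Misclass" in the plots
    print(f"{name}: misclassified {misclass} of {len(y)} training points")

With very large γ the training misclassification count drops toward zero, but only because the boundaries wrap tightly around the particular training points.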
Example 2 – Classical, robust and SVM-ε regression
The parameter ε sets the width of an acceptable band around the regression model. Decreasing this parameter at a constant value of γ increases the robustness of the model against values lying far from the regression model f(x). In SVM-regression, the data points outside the interval f(x) ± ε are considered outliers. With decreasing ε we can thus obtain models that are, in a certain sense, similar to robust regression (like regression M-estimates) and may be used to detect outliers and to filter contaminated data. The following plots illustrate the behavior of classical regression and SVM regression with varying ε and γ. SVM tries to “squeeze” as much data as possible into the band f(x) ± ε. A sufficiently low value of the parameter γ prevents the model from “going through all points”, as is (nearly) the case in plot (J) below.
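A comparable experiment can be sketched with scikit-learn's SVR; the data, the outlier and the (ε, γ) values below are assumptions made for illustration:

import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)

# Illustrative 1-D regression data with one gross outlier
x = np.linspace(0, 10, 40).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(0, 0.1, size=40)
y[25] += 3.0                                   # outlier far from f(x)

for eps in (0.5, 0.1, 0.01):                   # shrinking the epsilon-band
    model = SVR(kernel="rbf", C=10.0, gamma=0.5, epsilon=eps)
    model.fit(x, y)
    resid = np.abs(model.predict(x) - y)
    outliers = np.sum(resid > eps)             # points outside f(x) +/- eps
    print(f"epsilon={eps}: {outliers} points outside the band")

As ε shrinks, more points fall outside the band and can be flagged as potential outliers, while the fitted curve itself stays close to the bulk of the data.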
Example 3 – Unsupervised learning, distribution density, influence of γ and ν
The following table of plots illustrates the influence of γ (which can be viewed as a “stiffness”) and ν (the ratio of the “discarded” part of the distribution); roughly speaking, the model will describe the 100(1 − ν)% of the distribution with the highest density. Observe the following plots to understand the role of the two parameters.
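The joint effect of ν and γ can be sketched with scikit-learn's OneClassSVM (again an illustrative stand-in, with an assumed data sample and parameter grid):

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(2)
X = rng.normal(0, 1, size=(200, 2))               # illustrative 2-D sample

for nu in (0.05, 0.2, 0.5):                       # fraction of the sample left outside
    for gamma in (0.1, 1.0, 10.0):                # "stiffness" of the density description
        model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X)
        inside = np.mean(model.predict(X) == 1)   # +1 = inside the described region
        print(f"nu={nu}, gamma={gamma}: {inside:.0%} of points inside")

Roughly 100(1 − ν)% of the points end up inside the described region; larger γ makes the boundary follow the local clumps of the sample more tightly.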