SVM - Support Vector Machines
The SVM, or Support Vector Machine, is a machine learning method developed in the 1990s by Vladimir Naumovich Vapnik (Soviet Academy of Sciences, Stanford University, Royal Holloway College London, AT&T Bell Labs New Jersey, NEC Labs Princeton, Columbia University New York) and Alexey Jakovlevich Chervonenkis. The method was originally used primarily as a classification tool, but it was later adapted also for regression and distribution density modeling. SVM models make use of the theory of empirical risk and of the Vapnik-Chervonenkis (VC) dimension of the model. It has been proven that the following inequality holds with probability (1 − η):

$$ R(\alpha) \le R_{emp}(\alpha) + \sqrt{\frac{h\left(\ln\frac{2l}{h} + 1\right) - \ln\frac{\eta}{4}}{l}} $$

where R(α) is the risk (the actual mean error of the model), l is the number of data rows, α is the model parameters vector, R_emp(α) is the empirical risk and h is the non-negative integer VC-dimension of the model. The last term on the right-hand side (the square root) is called the VC-confidence.
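As a quick numerical illustration of the bound, the short sketch below evaluates the VC-confidence term for a few values of l and h; the particular values of l, h and η are purely illustrative assumptions, not taken from the manual.

```python
import math

def vc_confidence(l, h, eta=0.05):
    """VC-confidence term sqrt((h*(ln(2l/h) + 1) - ln(eta/4)) / l)."""
    return math.sqrt((h * (math.log(2 * l / h) + 1) - math.log(eta / 4)) / l)

# The bound R <= R_emp + VC-confidence holds with probability 1 - eta.
# The confidence term shrinks as the number of rows l grows and grows
# with the VC-dimension h of the model (illustrative values only).
for l in (100, 1000, 10000):
    for h in (5, 50):
        print(f"l={l:6d}  h={h:3d}  VC-confidence = {vc_confidence(l, h):.3f}")
```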

SVM-C – SVM Classification models
SVM-R – SVM Regression models
SVM-OneClass – Distribution density

SVM-kernel transformations

Here we provide several simple examples to illustrate common SVM models and the use and meaning of their parameters. Although SVMs are usually employed on high-dimensional problems and rather extensive data sets, we restrict ourselves to small two-dimensional samples for easier visualization.

For more detailed information see:
Support Vector Machines – PDF manual

Example 1 – Classification
For two continuous variables X and Y we have four possible categorical outputs: A, B, C, D. The different levels (values) of the categorical variable are not linearly separable in the X, Y plane. This example shows the difference between linear and RBF-transformed SVM classification models. The models are trained on the data shown in the figures below. The plots show the separating hyperplanes (in this case ordinary lines) for the linear model (the first plot) and the separating non-linear hypersurfaces (in this case curves) for the RBF-SVM models with different values of the parameter γ, from γ = 0.01 to γ = 10. Misclass is the number of incorrectly classified cases. Too large a value of γ will result in overfitted models strongly dependent on the particular training data.
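A comparison of this kind can be reproduced with any common SVM library; the following minimal sketch uses scikit-learn's SVC on a small synthetic two-dimensional, four-class sample (the data set and the γ values are illustrative assumptions, not the training data from the figures).

```python
import numpy as np
from sklearn.svm import SVC

# Small synthetic 2-D data set with four overlapping classes A, B, C, D
# (noisy clusters), so the training data are not perfectly linearly separable.
rng = np.random.default_rng(0)
centers = {"A": (0, 2), "B": (2, 0), "C": (0, -2), "D": (-2, 0)}
X = np.vstack([rng.normal(c, 0.9, size=(40, 2)) for c in centers.values()])
y = np.repeat(list(centers.keys()), 40)

# Linear model vs. RBF models with increasing gamma (cf. gamma = 0.01 ... 10).
models = {"linear": SVC(kernel="linear", C=1.0)}
for gamma in (0.01, 0.1, 1.0, 10.0):
    models[f"RBF, gamma={gamma}"] = SVC(kernel="rbf", C=1.0, gamma=gamma)

for name, model in models.items():
    model.fit(X, y)
    misclass = int((model.predict(X) != y).sum())  # "Misclass" in the plots
    print(f"{name:18s}  misclassified training cases: {misclass}")
```

With a very large γ the RBF model typically reproduces the training data almost perfectly, which corresponds to the overfitted, data-dependent behavior described above.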
Example 2 – Classical, robust and SVM-ε regression
The parameter ε sets the width of an acceptable band around the regression model. Decreasing this parameter at a constant value of γ will increase the robustness of the model against values lying far from the regression model f(x). In SVM regression, the data points outside the interval [f(x) − ε, f(x) + ε] are considered outliers. With decreasing ε, we can thus obtain models in a certain sense similar to robust regression (like regression M-estimates), which may be used to detect outliers and to filter contaminated data. The following plots illustrate the behavior of classical regression and SVM regression with varying ε and γ. SVM tries to “squeeze” as much data as possible into f(x) ± ε. A sufficiently low value of the parameter γ prevents the model from “going through all points”, as is (nearly) the case in plot (J) below.
(Figure: plots of classical regression and SVM-ε regression for various values of ε and γ.)
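The effect of ε can be sketched with scikit-learn's SVR; the data, the RBF kernel and the specific ε and γ values below are assumptions for illustration only, not those used in the plots.

```python
import numpy as np
from sklearn.svm import SVR

# Illustrative 1-D data: a smooth trend contaminated with a few gross outliers.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 60).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(0, 0.1, 60)
y[[10, 30, 50]] += 3.0                      # contaminate with outliers

# A wide epsilon-tube accepts most points as "fitting" the model; a narrow
# tube behaves more like robust regression and exposes the outliers.
for eps in (1.0, 0.2, 0.05):
    model = SVR(kernel="rbf", C=10.0, gamma=0.5, epsilon=eps).fit(x, y)
    resid = y - model.predict(x)
    outside = np.flatnonzero(np.abs(resid) > eps)   # outside f(x) +/- epsilon
    print(f"epsilon={eps:4.2f}  points outside the tube: {len(outside)}")
```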
Example 3 – Unsupervised learning, distribution density, influence of γ and ν
The following table of plots illustrates the influence of γ (which can be viewed as “stiffness”) and ν (the ratio of the “discarded” part of the distribution); roughly speaking, the model will describe the 100(1 − ν)% of the distribution with the highest density. Observe the following plots to understand the role of the two parameters.
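The roles of ν and γ can also be sketched with scikit-learn's OneClassSVM: roughly a fraction ν of the training points ends up outside the estimated high-density region, while γ controls how flexible the boundary of that region is. The data and parameter values below are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Illustrative 2-D sample drawn from two Gaussian clusters.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal((0, 0), 1.0, size=(150, 2)),
               rng.normal((4, 4), 0.7, size=(50, 2))])

# nu ~ fraction of the sample allowed outside the described region;
# gamma acts as "stiffness": larger gamma gives a more wiggly boundary.
for nu in (0.05, 0.2):
    for gamma in (0.1, 1.0):
        model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X)
        outside = int((model.predict(X) == -1).sum())  # -1 = outside region
        print(f"nu={nu:4.2f}  gamma={gamma:3.1f}  "
              f"points outside: {outside} of {len(X)} "
              f"(roughly {nu * len(X):.0f} expected)")
```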
