SVM – Support Vector Machines

The SVM, or Support Vector Machines, employs a machine learning method
developed in the 1990s by Vladimir Naumovich Vapnik (Soviet Academy of
Sciences, Stanford University, Royal Holloway College London, AT&T Bell Labs
New Jersey, NEC Labs Princeton, Columbia University New York) and Alexey
Jakovlevich Chervonenkis. This method was formerly used primarily as a
classification tool but was later adapted for regression and distribution
density modeling as well.
SVM models make use of the theory of empirical risk and the Vapnik-Chervonenkis (VC) dimension of the model. It has been proven that the following inequality holds with probability (1 − η):

$$R(\alpha) \le R_{emp}(\alpha) + \sqrt{\frac{h\left(\ln\frac{2l}{h} + 1\right) - \ln\frac{\eta}{4}}{l}}$$

where R(α) is the risk (or actual mean error of the model), l is the number of data rows, α is the vector of model parameters, R_emp(α) is the empirical risk and h is the non-negative integer VC-dimension of the model. The last term on the right-hand side (the square root) is called the VC-confidence.

SVM-C – SVM Classification models
SVM-R – SVM Regression models
SVM-OneClass – Distribution density
SVM-kernel transformations

Here we provide several simple examples to illustrate common SVM models and the use and meaning of their parameters. Although SVM models are usually employed in high-dimensional problems with rather extensive data sets, we restrict ourselves to small two-dimensional samples for easier visualization.

For more detailed information see: Support Vector Machines - Pdf manual
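As a numerical aside on the VC bound mentioned above, the following minimal sketch evaluates the VC-confidence term. It assumes the standard textbook form of the bound, in which the square-root term is sqrt((h(ln(2l/h) + 1) − ln(η/4)) / l); the function name `vc_confidence` and the sample values of h, l and η are illustrative assumptions, not part of the software described here.

```python
import math


def vc_confidence(h: int, l: int, eta: float) -> float:
    """VC-confidence term, assuming the standard form of the VC bound:

        sqrt((h * (ln(2l/h) + 1) - ln(eta/4)) / l)

    h   : VC-dimension of the model (positive integer here, to avoid 2l/0)
    l   : number of data rows
    eta : the bound holds with probability 1 - eta
    """
    if h < 1 or l < 1 or not 0.0 < eta < 1.0:
        raise ValueError("need h >= 1, l >= 1 and 0 < eta < 1")
    return math.sqrt((h * (math.log(2 * l / h) + 1) - math.log(eta / 4)) / l)


# Hypothetical settings: the term shrinks as more data rows become available
# and grows with the VC-dimension of the model.
for l in (100, 1000, 10000):
    print(f"h=3, l={l:>5}: VC-confidence = {vc_confidence(3, l, 0.05):.3f}")
```

The sketch only illustrates the trade-off expressed by the bound: a richer model (larger h) needs more data rows before the VC-confidence becomes small.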
Example 1 – Classification
For two continuous variables, X and Y, we have four possible categorical outputs: A, B, C, D. The different levels (values) of the categorical variable are not linearly separable in the X, Y plane. This example shows the difference between the linear and the RBF-transformed SVM classification model. The model is trained on the data shown in the figures below. The plots show the separating hyperplanes (in this case ordinary lines) for the linear model (the first plot) and the separating nonlinear hypersurfaces (in this case curves) for the RBF-SVM models with different values of the parameter γ, from γ=0.01 to γ=10. Misclass is the number of incorrectly classified cases. Too large a value of γ will result in overfitted models that depend strongly on the particular training data.
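To give a feeling for why a large γ makes the RBF model so dependent on individual training points, the sketch below evaluates the RBF kernel for one pair of points over the range of γ used in this example. It assumes the usual parameterisation k(p, q) = exp(−γ‖p − q‖²); the function name `rbf_kernel` and the two points are hypothetical.

```python
import math


def rbf_kernel(p, q, gamma):
    """RBF kernel exp(-gamma * ||p - q||^2), assuming the usual parameterisation."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-gamma * sq_dist)


# Two hypothetical points in the X, Y plane; their squared distance is 2.
p, q = (0.0, 0.0), (1.0, 1.0)

for gamma in (0.01, 0.1, 1.0, 10.0):
    print(f"gamma={gamma:<5} k(p, q) = {rbf_kernel(p, q, gamma):.6f}")
```

With a small γ the two points still look similar to the kernel, so the decision boundary is smooth. With a large γ the kernel value drops to almost zero, each training point influences only its immediate neighbourhood, and the boundary can wrap tightly around individual points.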
Example 2 – Classical, Robust and SVM-ε regression
The parameter ε sets the width of an acceptable band around the regression model. Decreasing this parameter at a constant value of γ will increase the robustness of the model against values that are outlying with respect to the regression model f(x). In SVM-regression, the data points outside the interval f(x) ± ε are considered outliers. With decreasing ε, we can thus obtain models that are in a certain sense similar to robust regression (like regression M-estimates) and that may be used to detect outliers and to filter contaminated data. The following plots illustrate the behavior of classical regression and SVM regression with varying ε and γ. SVM tries to “squeeze” as much data as possible into f(x) ± ε. A sufficiently low value of γ prevents the model from “going through all points”, as is (nearly) the case in plot (J) below.
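The role of the ε band can be sketched with the ε-insensitive loss commonly used in SVM regression: points inside f(x) ± ε contribute no loss, while points outside are penalised by their distance from the band. The model `f`, the data and the ε values below are hypothetical and serve only as an illustration; no model is fitted here.

```python
def eps_insensitive_loss(y, y_pred, eps):
    """Zero inside the band |y - y_pred| <= eps, linear outside it."""
    return max(0.0, abs(y - y_pred) - eps)


def f(x):
    """Hypothetical, already fitted regression model."""
    return 2.0 * x + 1.0


def points_outside_band(data, model, eps):
    """Return the (x, y) pairs lying outside model(x) +/- eps."""
    return [(x, y) for x, y in data
            if eps_insensitive_loss(y, model(x), eps) > 0.0]


# Hypothetical data: three points close to f(x) and one gross outlier.
data = [(0.0, 1.1), (1.0, 2.8), (2.0, 5.3), (3.0, 12.0)]

for eps in (0.5, 0.15):
    outside = points_outside_band(data, f, eps)
    print(f"eps={eps}: {len(outside)} point(s) outside the band: {outside}")
```

With the wider band only the gross outlier falls outside f(x) ± ε. Narrowing the band flags progressively more points, which is the sense in which a small ε can be used to detect outlying values.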
Example 3 – Unsupervised learning, distribution density, influence of γ and ν
The following table of plots illustrates the influence of γ (which can be viewed as “stiffness”) and ν (the ratio of the “discarded” part of the distribution). Roughly speaking, the model will describe the 100(1 − ν)% of the distribution with the highest density. Observe the following plots to understand the role of the two parameters.
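The interpretation of ν as the “discarded” fraction can be mimicked with a much simpler stand-in. The sketch below is not a one-class SVM: it scores each point by its mean RBF kernel value against all points (a crude density estimate controlled by γ) and then drops roughly the fraction ν of points with the lowest score. All names and data are hypothetical.

```python
import math


def rbf_density_scores(points, gamma):
    """Mean RBF kernel value of each point against all points (crude density)."""
    scores = []
    for p in points:
        total = sum(
            math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q)))
            for q in points
        )
        scores.append(total / len(points))
    return scores


def keep_densest(points, gamma, nu):
    """Discard roughly a fraction nu of the points with the lowest density score."""
    scores = rbf_density_scores(points, gamma)
    n_discard = math.floor(nu * len(points))
    order = sorted(range(len(points)), key=lambda i: scores[i])
    discarded = set(order[:n_discard])
    return [p for i, p in enumerate(points) if i not in discarded]


# Hypothetical sample: a tight cluster around the origin plus two remote points.
points = [(0.0, 0.0), (0.1, 0.2), (-0.2, 0.1), (0.2, -0.1), (-0.1, -0.2),
          (0.3, 0.3), (-0.3, 0.2), (0.1, -0.3), (3.0, 3.0), (-2.5, 2.8)]

kept = keep_densest(points, gamma=1.0, nu=0.2)
print(f"kept {len(kept)} of {len(points)} points")
```

In this stand-in, ν fixes how much of the sample is left out and γ controls how local the density score is. A real one-class SVM determines the retained region by solving an optimisation problem rather than by sorting scores, so the sketch only conveys the meaning of the two parameters.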


Last Updated ( 03.06.2013 ) 
