Machine learning has redefined the data science world. Today, we have a huge number of tools and techniques to classify and cluster the data. One of the important classification data technique is, the support Vector machine. The support Vector machine is the supervised learning method and used in classification and regression analysis of the data. This method was developed by Bell Labs.
The fundamental question is what are support vectors?
- The support vectors are data points close to the hyperplanes. These data points influence the position and orientation of the hyperplanes.
- Removing support vectors alter the shape of the hyperplanes.
Obviously,the next question is what is a hyperplane? .
These are the decision boundaries classifying data points.
classes are determined by data points on either side of the hyperplanes.
The number of features determine the dimensions of the hyperplanes. If the number of features is two, then, the nature of the hyperplane is line. If the number of features or inputs are three then nature of the hyperplane is two-dimensional plane, but when the number of features or input exceeds,then nature of the plane is difficult to imagine.
The second important question that a data scientist has to answer is what is the margin in the support Vector machine?
When the data points are very close. It is very difficult to classify the objects on the support vectors. Therefore the distance between the vectors on the data points is measured. the higher the data points distance with a vector then better is the support Vector machine is. Maximization of the distance between the nearest data points is called as margin. Once we have the knowledge about a hyperplane. Let us discuss how support vectors are prepared. The first step is each data point is plotted in the n dimensional space where n is the number of features of inputs.
second create a hyperplane. The biggest challenge is how do we select the right hyperplane? We come across many situations in machine learning. The first one is choosing the right hyperplane. The thumb rule says that select the hyperplane that segregates the two classes better, but there may be a situation where multiple hyperplanes do exist in this situation. The margins between the data points and Vector has to be taken into the consideration.
There may be a situation where hyper planes are having outliers. Some of the values are outside the dimensions and one has to eliminate such outliers before taking the decisions.
what are the advantages of support Vector machines.
one the support Vector machines are sufficient in high dimensional spaces. They work. Well, even when a number of dimensional space is greater than the sample size.
The drawbacks of the vector machines are all input data should be labeled and it avoids estimating the probability on the finite data.
What are the applications of support Vector machines?
First the data scientists can do text and Hyper text categorization.
Second, image classification and finally handwritten data record mission.