Machine learning: supervised and unsupervised

List of supervised machine learning algorithms; supervised, unsupervised, and reinforcement machine learning
Dr.GordenMorse, France, Professional
Published Date: 22-07-2017
Big Data Analytics CSCI 4030

- High dim. data: Locality sensitive hashing, Clustering, Dimensionality reduction
- Graph data: PageRank, SimRank; Community Detection; Spam Detection
- Infinite data: Filtering data streams, Web advertising, Queries on streams
- Machine learning: SVM, Decision Trees, Perceptron, kNN
- Apps: Recommender systems, Association Rules, Duplicate document detection

Many algorithms today are classified as machine learning algorithms.
- These algorithms extract information from data.
- They produce a summary of the data, from which a decision is made.
- Machine learning algorithms learn a model (or classifier) from the data.
- They discover something about data that will be seen in the future.

For example, clustering algorithms allow us to classify future data into one of the clusters.
- Machine learning enthusiasts call this unsupervised learning.
- Unsupervised means that the input data does not tell the clustering algorithm what the clusters should be.

In supervised machine learning:
- The available data includes information about the correct way to classify at least some of the data.
- The data that is already classified is called the training set.
- This is the main subject of today’s lecture.

We would like to do prediction: estimate a function f(x) so that y = f(x).
- Where y can be:
  - Real number: regression
  - Categorical: classification
  - Complex object: ranking of items, etc.
- Data is labeled:
  - Have many pairs (x, y)
  - x … vector of binary, categorical, real-valued features
  - y … class (+1 or -1), or a real number
- Training and test set:
  - Estimate y = f(x) on X, Y.
  - Hope that the same f(x) also works on unseen X’, Y’.

Plot the height and weight of dogs in three classes: Beagles, Chihuahuas, and Dachshunds.
- Each pair (x, y) in the training set consists of:
  - Feature vector x of the form [height, weight].
  - The associated label y is the variety of the dog.
- An example of a training-set pair would be ([5 inches, 2 pounds], Chihuahua).

The horizontal line represents a height of 7 inches and separates Beagles from Chihuahuas and Dachshunds.
- The vertical line represents a weight of 3 pounds and separates Chihuahuas from Beagles and Dachshunds.

The algorithm that implements function f follows from these two lines: if the height is greater than 7 inches, the dog is a Beagle; otherwise, if the weight is less than 3 pounds, it is a Chihuahua; otherwise it is a Dachshund.
- Is it supervised or unsupervised learning?
- Here, we are performing supervised learning with the same data (weight and height) augmented by classifications (variety) for the training data.

Depending on the type of the label y:
- y is a real number. The ML problem is called regression.
- y is a Boolean value, true or false (+1 and −1). The problem is binary classification.
- y is a member of some finite set (classes). The problem is multiclass classification.
- y is a member of some potentially infinite set.

Assume four data points: (1, 2), (2, 1), (3, 4) and (4, 3).

Let these points be a training dataset, where the feature vectors are one-dimensional.
- That is, the point (1, 2) can be thought of as a pair (x, y), where 1 is the feature vector x and 2 is the associated label y.
- The other points are interpreted accordingly.

Suppose we want to learn the linear function f(x) = ax + b that best represents the points of the training set.
- What are the appropriate values of a and b?
- A natural interpretation of "best" is that the root-mean-square error (RMSE) of the value of f(x) compared with the given value of y is minimized.
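To make the error measure concrete, the following is a minimal sketch that scores a candidate line f(x) = ax + b against the four training points. The function names `sum_squared_error` and `rmse` and the candidate values a = 1, b = 0 are illustrative assumptions, not part of the lecture.

```python
import math

# The four training points from the example, as (x, y) pairs.
POINTS = [(1, 2), (2, 1), (3, 4), (4, 3)]


def sum_squared_error(a, b, points=POINTS):
    """Sum of squared differences between f(x) = a*x + b and the labels y."""
    return sum((a * x + b - y) ** 2 for x, y in points)


def rmse(a, b, points=POINTS):
    """Root-mean-square error: square root of the mean squared difference."""
    return math.sqrt(sum_squared_error(a, b, points) / len(points))


# Candidate line f(x) = x (a = 1, b = 0), chosen only for illustration.
print(sum_squared_error(1.0, 0.0))  # 4.0
print(rmse(1.0, 0.0))               # 1.0
```

Because dividing by the number of points and taking the square root both preserve ordering, the values of a and b that minimize the sum of squared errors also minimize the RMSE.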
That is, we want to minimize the sum of squared errors (which also minimizes the RMSE):
  Σ_{x=1}^{4} (ax + b − y_x)^2, where y_x is the y-value associated with x
- This sum is:
  (a + b − 2)^2 + (2a + b − 1)^2 + (3a + b − 4)^2 + (4a + b − 3)^2
- Simplifying, the sum is:
  30a^2 + 4b^2 + 20ab − 56a − 20b + 30

If we then take the derivatives with respect to a and b and set them to 0, we get:
  60a + 20b − 56 = 0
  20a + 8b − 20 = 0
- Therefore, a = 3/5 and b = 1, i.e., f(x) = (3/5)x + 1.
- For these values the sum of squared errors is 3.2, so the RMSE is √(3.2/4) ≈ 0.894.

We will talk about the following methods:
- Decision trees
- Perceptrons
- Support Vector Machines
- Neural nets (Neural Networks)
- Instance-based learning
Main question: How to efficiently train (build a model/find model parameters)?

Decision trees: the form of function f is a tree.
- Each node of the tree has a function of x that determines to which child or children the search must proceed.
- Decision trees are suitable for binary and multiclass classification.
  - Especially when the dimension of the feature vector is not too large.
  - Large numbers of features can lead to overfitting.
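As a rough illustration of a function f in the form of a tree, the sketch below hand-codes a two-node decision tree for the dog example, using the 7-inch height and 3-pound weight boundaries described earlier. The function name `classify_dog` is hypothetical, and the reading of which side of each boundary belongs to which variety (Beagles taller than 7 inches, Chihuahuas lighter than 3 pounds) is an assumption inferred from the example pair (5 inches, 2 pounds, Chihuahua). The tree is written by hand here, not learned from data.

```python
def classify_dog(height_in, weight_lb):
    """Hand-built decision tree for the three dog varieties.

    Each `if` is an internal node that tests one feature of x = [height, weight]
    and decides which child to follow; each `return` is a leaf with a class label.
    """
    # Root node: the horizontal line at a height of 7 inches.
    if height_in > 7:
        return "Beagle"
    # Second node: the vertical line at a weight of 3 pounds.
    if weight_lb < 3:
        return "Chihuahua"
    return "Dachshund"


# The training-set pair from the example: 5 inches, 2 pounds.
print(classify_dog(5, 2))  # Chihuahua
```

A learned decision tree would choose the tests and thresholds from the training set instead of having them fixed in code.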