Question? Leave a message!

Machine Learning (ML) and Knowledge Discovery in Databases (KDD)

Machine Learning (ML) and Knowledge Discovery in Databases (KDD) 11
Machine Learning (ML) and Knowledge Knowledge Discovery Discovery in in Databases Databases (KDD) (KDD) Instructor: Rich MaclinCourse Information • Class web page: – Syllabus – Lecture notes – Useful links – Programming assignments • Methods for contact: – Email: (best option) – Offi Office: 315 315 HH HH – Phone: 7268256 • Textbooks: – Machine Learning, Mitchell • Notes based on Mitchell’s Lecture Notes CS 8751 ML KDD Chapter 1 Introduction 2Course Objectives • Specific knowledge of the fields of Machine Learningg g and Knowledge Discovery y in Databases (Data Mining) – Experience with a variety of algorithms – Experience with experimental methodology • Indepth knowledge of several research papers • Programming/implementation practice • Presentation practice CS 8751 ML KDD Chapter 1 Introduction 3Course Components • 2 Midterms, 1 Final – Midterm 1 (150), February 18 – Midterm 2 (150), April 1 – Final (300), Thursday, May 14, 14:0015:55 (comprehensive) • Programming Programming assignments assignments (100), (100), 3 3 (C (C++ or or Java, Java, maybe maybe in in Weka) • Homework assignments (100), 5 • Research Paper Implementation (100) • Research Paper Writeup – Web Page (50) • R Research h P Paper O Oral l P Present tat ti ion (50) (50) • Grading based on percentage (90 gets an A, 80 B) – Minimum Minimum E Effort ffort Requirement Requirement CS 8751 ML KDD Chapter 1 Introduction 4Course Outline • Introduction Mitchell Chapter 1 – Basics/Version Spp aces M2 – ML Terminology and Statistics M5 • Concept/Classification Learning – Decision Trees M3 – Neural Networks M4 – Instance Based Learning M8 – Genetic Algorithms M9 – R Rul le L Learni ing M10 M10 CS 8751 ML KDD Chapter 1 Introduction 5Course Outline (cont) • Unsupervised Learning – Clustering g Jain et al. review p pap per • Reinforcement Learning M13 • Learningy g Theory – Bayesian Methods M6, Russell Norvig Chapter – PAC Learning M7 • Support Vector Methods Burges tutorial •Hy ybrid Methods M12 • Ensembles Opitz Maclin paper, WF7.4 • Mining g Association Rules Appriori ppapper CS 8751 ML KDD Chapter 1 Introduction 6What is Learning Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the same task or taskkd s drawn ffrom thhe same popullatiion more effffectiivelly thhe next time. Simon, 1983 Learning is making useful changes in our minds. Minsky, 1985 Learning is constructing or modifying representations of what is being experienced. McCarthy, 1968 Learning is improving automatically with experience. Mitchell Mitchell, 1997 1997 CS 8751 ML KDD Chapter 1 Introduction 7Why Machine Learning • Data, Data, DATA – Examples • World wide web • Human genome project • Business data (WalMart sales “baskets”) – Idea: sift heap of data for nuggets of knowledge • Some tasks beyond programming –El Example: ddriiviing – Idea: learn by doing/watching/practicing (like humans) • Customizing Customizing software software – Example: web browsing for news information – Idea: observe user tendencies and incorporate CS 8751 ML KDD Chapter 1 Introduction 8Typical Data Analysis Task Data: PatientId=103: EmergencyCSection=yes Age=23, Time=53, FirstPreg=no, Anemia=no, Diabetes=no, PrevPremBirth=no, Ult UltraS Sound d= , El Electi tiveC CSt Sectiion= Age=23, Time=105, FirstPreg=no, Anemia=no, Diabetes=yes, PrevPremBirth=no, UltraSound=abnormal, ElectiveCSection=no Age=23, Time=125, FirstPreg=no, Anemia=no, Diabetes=yes, PrevPremBirth=no, Ult UltraS Sound d= , El Electi tiveC CSSt ectiion=no PatientId=231: EmergencyCSection=no Age=31, Time=30, FirstPreg=yes, Anemia=no, Diabetes=no, PrevPremBirth=no, UltraSound=, ElectiveCSection= Age=31, Time=91, FirstPreg=yes, Anemia=no, Diabetes=no, PrevPremBirth=no, UltraSound=normal, ElectiveCSection=no … Given – 9714 patient records, each describing a pregnancy and a birth – Each patient record contains 215 features (some are unknown) Learn to predict: – Characteristics Characteristics of of patients patients at at high high risk risk for for Emergency Emergency C CSection Section CS 8751 ML KDD Chapter 1 Introduction 9Credit Risk Analysis Data: ProfitableCustomer=No, CustId=103, YearsCredit=9, LoanBalance=2400, Income=52,000, OwnHouse=Yes, OtherDelinqAccts=2, MaxBillingCyclesLate=3 ProfitableCustomer=Yes, CustId=231, YearsCredit=3, LoanBalance=500, Income=36,000, OwnHouse=No, OtherDelinqAccts=0, MaxBillingy gCyclesLate=1 ProfitableCustomer=Yes, CustId=42, YearsCredit=15, LoanBalance=0, Income=90,000, OwnHouse=Yes, OtherDelinqAccts=0, MaxBillingCyclesLate=0 … Rules that might be learned from data: IF OtherDelinquentAccounts 2, AND NumberDelinqquentBillinggCyycles 1 THEN ProfitableCustomer = No Deny Credit Application IF OtherDelinquentAccounts == 0, AND ((Income 30K) OR (YearsofCredit 3)) THEN THEN Profitable ProfitableCCusstomer tomer = Yes Yes Accept Accept Application Application CS 8751 ML KDD Chapter 1 Introduction 10Analysis/Prediction Problems • What kind of direct mail customers buy • What What products products will/won will/won’t t customers customers buy buy • What changes will cause a customer to leave a bank bank • What are the characteristics of a gene •Does a piii cture contain an objbject (d(does a piicture off space contain a metereorite especially one hd headiing ttowardds us)) •… Lots more CS 8751 ML KDD Chapter 1 Introduction 11Tasks too Hard to Program ALVINN Pomerleau drives 70 70 MPH MPH on h hi igh hways CS 8751 ML KDD Chapter 1 Introduction 12Software that Customizes to User CS 8751 ML KDD Chapter 1 Introduction 13Defining a Learning Problem Learning = improving with experience at some task – improve over task T – with respect to performance measure P – based on experience E Ex Ex 1: 1: Learn Learn to to play play checkers checkers T: play checkers P: of g games won E: opportunity to play self Ex 2: Sell more CDs T: sell CDs P: of CDs sold E E: : different different locations/prices locations/prices of of CD CD CS 8751 ML KDD Chapter 1 Introduction 14Key Questions T: play checkers, sell CDs P: P: g games ames won, won, CDs CDs sold sold To generate machine learner need to know: – What experience • Direct or indirect • Learner controlled • Is the experience representative – What exactly should be learned – How to represent the learning function – What algorithm used to learn the learning function CS 8751 ML KDD Chapter 1 Introduction 15Types of Training Experience Direct or indirect Di Direct t obb bservablle, measurablble – sometimes difficult to obtain • Checkers is a move the best move for a situation – sometimes straightforward • Sell CDs how many CDs sold on a day (look at receipts) Indirect must be inferred from what is measurable – Checkers value moves based on outcome of game – Credit Credit assignment assignment p problem roblem CS 8751 ML KDD Chapter 1 Introduction 16Types of Training Experience (cont) Who controls – Learner what is best move at each p point (Exploitation/Exploration) – Teacher is teacher’s move the best (Do we want to j just st em emulate late the the teachers teachers mo moves) es) BIG Question: is experience representative of performance goal – If Checkers learner only plays itself will it be able to pllh ay humans – What if results from CD seller influenced by factors not measured ( (holiday y shopp pping, g, weather,, etc.) ) CS 8751 ML KDD Chapter 1 Introduction 17Choosing Target Function Checkers what does learner do make moves ChooseMove select move based on board ChooseMove : Board→ Move V : Board→ℜ ChooseMove(b) ChooseMove(b): : from from b b pick pick move move with with highest highest value value But how do we define V(b) for boards b Possible definition: V(b) = 100 if b is a final board state of a win V(b) = 100 if b is a final board state of a loss V(() b) = 0 if b is a final board state of a draw if b not final state, V(b) =V(b´) where b´ is best final board reached by starting at b and playing optimally from there Correct, but not operational CS 8751 ML KDD Chapter 1 Introduction 18Representation of Target Function • Collection of rules IF double jjp ump available THEN make double jump • • Neural Neural network network • Polynomial function of problem features w+ w blackPieces(b)+ w redPieces(b)+ 0 1 2 w w blackKings blackKings( (b b) )++ w w redKings redKings( (b b) )++ 3 3 4 4 w redThreatened(b)+ w blackThreatened(b) 5 6 CS 8751 ML KDD Chapter 1 Introduction 19Obtaining Training Examples V (b) : the true target function ˆ V (b) : the learned function V V ( (b b) ) : : the the training training value value train One rule for estimating training values: ˆ V V ( (b b) )←←V V ( (Successor Successor( (b b)) )) train CS 8751 ML KDD Chapter 1 Introduction 20Choose Weight Tuning Rule LMS Weight update rule: Do Do repeatedly repeatedly: : Select a training example b at random 1 1. Compute Compute error error( (b b) ) : : ˆ error(b)= V (b)−V (b) train 2. For each board feature f , update weight w : i i w← w+ c× f×error(b) i i i c is some small constant, say 0.1, to moderate rate of learning CS 8751 ML KDD Chapter 1 Introduction 21Design Choices Determining Type of Training Experience GGi ames againstt expertt TTb ablle off correctt moves Games against self Determining Target Function Board Move Board Value Determining Representation of Learned Function Linear function of features Neural Network Determining Learningg g Algorithm Gradient Descent Linear Programming Completed Design CS 8751 ML KDD Chapter 1 Introduction 22Some Areas of Machine Learning • Inductive Learning: inferring new knowledge from observations (not guaranteed correct) – Concept/Classification Learning identify characteristics of class members (e.g., what makes a CS class fun, what makes a customer buy, etc.) – Unsupervised Learning examine data to infer new characteristics (e.g., break chemicals into similar groups, groups, i infer nfer new new m mathematical athematical rule, rule, etc.) etc.) – Reinforcement Learning learn appropriate moves to achieve delayed goal (e.g., win a game of Checkers, perform perform a a robot robot task task, etc etc.) ) • Deductive Learning: recombine existing knowledgy ge to more effectively solve p problems CS 8751 ML KDD Chapter 1 Introduction 23Classification/Concept Learning • What characteristic(s) predict a smile – Variation Variation on on Sesame Sesame Street Street game: game: why why are are these these things things a a lot lot like like the others (or not) • ML Approach: infer model (characteristics that indicate) of why a face is/is not smiling CS 8751 ML KDD Chapter 1 Introduction 24Unsupervised Learning • Clustering g gp group ppoints into “classes” • Other ideas: – look for mathematical relationships between features – l look k f for anomali lies i in d dat ta b bases (d (dat ta th that t d does not t fit) fit) CS 8751 ML KDD Chapter 1 Introduction 25Reinforcement Learning Problem Policy S start G G G goal Possible actions: up left S down right • • Problem: Problem: feedback feedback (reinforcements) (reinforcements) are are delayed delayed how how to to value intermediate (no goal states) • Idea: online dynamic programming to produce policy function • Policy: action taken leads to highest future reinforcement (if (if policy policy followed) followed) CS 8751 ML KDD Chapter 1 Introduction 26Analytical Learning Problem Backtrack S1 S2 S3 Init S4 S5 S6 Goal S7 S8 S9 S0 • During search processes (planning, etc.) remember work involved in solving tough problems • Reuse the acquired knowledge when presented with similar problems in the future (avoid bad decisions) CS 8751 ML KDD Chapter 1 Introduction 27The Present in Machine Learning The tip of the iceberg: • • First Firstgeneration generation algorithms: algorithms: neural neural nets nets, decision decision trees, regression, support vector machines, kl kernel met thhod ds, Bi Bayesian nettworkks,… • Composite algorithms ensembles • Significant work on assessing effectiveness, limits • Applied Applied t to o s simple imple d data ata b bases ases • Budding industry (especially in data mining) CS 8751 ML KDD Chapter 1 Introduction 28The Future of Machine Learning Lots of areas of impact: • Learn Learn across across multiple multiple data data bases bases, a as s w well ell a as s w web eb and news feeds • • Learn Learn across across multi multimedia media data data • Cumulative, lifelong learning •Ai Agents with h l learniing embbeddddedd • Programming languages with learning embedded • Learning by active experimentation CS 8751 ML KDD Chapter 1 Introduction 29What is Knowledge Discovery in Databases Databases ( (i i.e e., Data Data Mining) Mining) • Depends on who you ask • General General idea: idea: the the analysis analysis of of large large amounts amounts of of data data (and therefore efficiency is an issue) • Interfaces several areas,,y notably machine learning g and database systems • Lots of pp perspectives: – ML: learning where efficiency matters – DBMS: extended techniques for analysis of raw data, automatiid c productiion off kknowlleddge • What is all the hubbub – C Compani ies mak ke l lot ts of f money with ith it it ( (e.g., W WalM lMart) t) CS 8751 ML KDD Chapter 1 Introduction 30Related Disciplines • Artificial Intelligence • Statistics • Psychology and neurobiology • Bioinformatics and Medical Informatics • Philosophy • Computational complexity theory • Control theory • Information theory • Database Systems • ... CS 8751 ML KDD Chapter 1 Introduction 31Issues in Machine Learning • What algorithms can approximate functions well (and when) • How does number of training examples influence accuracy •Hd How does compl lexitity off hhypoththesiis representtatition impact it • How How does does noisy noisy data data influence influence accuracy accuracy • What are the theoretical limits of learnability • How How can can prior prior knowledge knowledge o of f l learner earner help help • What clues can we get from biological learning systems CS 8751 ML KDD Chapter 1 Introduction 32
Website URL