Machine Learning (ML) and
Knowledge Discovery in Databases (KDD)
Instructor: Rich Maclin

Course Information
• Class web page: http://www.d.umn.edu/rmaclin/cs8751/
– Lecture notes
– Useful links
– Programming assignments
• Methods for contact:
– Email: rmaclin@d.umn.edu (best option)
– Office: 315 HH
– Phone: 726-8256
• Text: Machine Learning, Mitchell
• Notes based on Mitchell’s Lecture Notes
CS 8751 ML & KDD Chapter 1 Introduction

Course Objectives
• Specific knowledge of the fields of Machine
Learning and Knowledge Discovery in Databases
– Experience with a variety of algorithms
– Experience with experimental methodology
• In-depth knowledge of several research papers
• Programming/implementation practice
• Presentation practice
Course Components
• 2 Midterms, 1 Final
– Midterm 1 (150), February 18
– Midterm 2 (150), April 1
– Final (300), Thursday, May 14, 14:00-15:55 (comprehensive)
• Programming assignments (100), 3 (C++ or Java, maybe in
• Homework assignments (100), 5
• Research Paper Implementation (100)
• Research Paper Writeup – Web Page (50)
• Research Paper Oral Presentation (50)
• Grading based on percentage (90% gets an A-, 80% B-)
– Minimum Effort Requirement
Course Outline
• Introduction Mitchell Chapter 1
– Basics/Version Spaces M2
– ML Terminology and Statistics M5
• Concept/Classification Learning
– Decision Trees M3
– Neural Networks M4
– Instance Based Learning M8
– Genetic Algorithms M9
– Rule Learning M10
Course Outline (cont)
• Unsupervised Learning
– Clustering Jain et al. review paper
• Reinforcement Learning M13
• Learning Theory
– Bayesian Methods M6, Russell & Norvig Chapter
– PAC Learning M7
• Support Vector Methods Burges tutorial
• Hybrid Methods M12
• Ensembles Opitz & Maclin paper, WF7.4
• Mining Association Rules Apriori paper
What is Learning?
Learning denotes changes in the system that are adaptive in
the sense that they enable the system to do the same task or
tasks drawn from the same population more effectively the
next time. Simon, 1983
Learning is making useful changes in our minds. Minsky,
Learning is constructing or modifying representations of
what is being experienced. McCarthy, 1968
Learning is improving automatically with experience.
Mitchell, 1997
Why Machine Learning?
• Data, Data, DATA
• World wide web
• Human genome project
• Business data (WalMart sales “baskets”)
– Idea: sift heap of data for nuggets of knowledge
• Some tasks beyond programming
– Example: driving
– Idea: learn by doing/watching/practicing (like humans)
• Customizing software
– Example: web browsing for news information
– Idea: observe user tendencies and incorporate
Typical Data Analysis Task
Age=23, Time=53, FirstPreg?=no, Anemia=no, Diabetes=no, PrevPremBirth=no,
UltraSound=?, ElectiveC-Section=?
Age=23, Time=105, FirstPreg?=no, Anemia=no, Diabetes=yes, PrevPremBirth=no,
Age=23, Time=125, FirstPreg?=no, Anemia=no, Diabetes=yes, PrevPremBirth=no,
UltraSound=?, ElectiveC-Section=no
Age=31, Time=30, FirstPreg?=yes, Anemia=no, Diabetes=no, PrevPremBirth=no,
Age=31, Time=91, FirstPreg?=yes, Anemia=no, Diabetes=no, PrevPremBirth=no,
– 9714 patient records, each describing a pregnancy and a birth
– Each patient record contains 215 features (some are unknown)
Learn to predict:
– Characteristics of patients at high risk for Emergency C-Section
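Each record above is a flat list of attribute=value pairs, with "?" marking an unknown feature. As an illustrative sketch (the function name and dictionary representation are my own, not from the course), such lines can be read into Python dictionaries:

```python
def parse_record(line):
    """Parse 'Name=value, Name=value, ...' into a dict; '?' means unknown."""
    features = {}
    for pair in line.split(","):
        name, _, value = pair.strip().partition("=")
        features[name] = None if value == "?" else value
    return features

record = parse_record("Age=23, Time=53, FirstPreg?=no, UltraSound=?")
print(record["Age"])         # prints: 23
print(record["UltraSound"])  # prints: None
```

Mapping "?" to None keeps unknown values distinct from any legitimate feature value, which matters later when a learner must decide how to handle missing data.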
Credit Risk Analysis
ProfitableCustomer=No, CustId=103, YearsCredit=9, LoanBalance=2400,
Income=52,000, OwnHouse=Yes, OtherDelinqAccts=2,
ProfitableCustomer=Yes, CustId=231, YearsCredit=3, LoanBalance=500,
Income=36,000, OwnHouse=No, OtherDelinqAccts=0,
ProfitableCustomer=Yes, CustId=42, YearsCredit=15, LoanBalance=0,
Income=90,000, OwnHouse=Yes, OtherDelinqAccts=0,
Rules that might be learned from data:
IF Other-Delinquent-Accounts > 2, AND
THEN Profitable-Customer? = No Deny Credit Application
IF Other-Delinquent-Accounts == 0, AND
((Income > 30K) OR (Years-of-Credit > 3))
THEN Profitable-Customer? = Yes Accept Application
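As a hedged illustration, rules like these are directly executable once learned; the sketch below transcribes them into ordinary code (the dict keys mirror the record fields shown above, and the first rule's second condition, truncated on the slide, is omitted):

```python
def profitable_customer(c):
    """Apply the learned rules to a customer-feature dict.

    Returns False (deny), True (accept), or None when neither rule fires.
    Only the delinquent-accounts test of the first rule appears here,
    since its second condition is truncated on the slide.
    """
    if c["OtherDelinqAccts"] > 2:
        return False                                  # deny credit application
    if c["OtherDelinqAccts"] == 0 and (
            c["Income"] > 30_000 or c["YearsCredit"] > 3):
        return True                                   # accept application
    return None

# CustId=231 from the data above: 0 delinquent accounts, income 36K
print(profitable_customer({"OtherDelinqAccts": 0, "Income": 36_000,
                           "YearsCredit": 3}))        # prints: True
```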
Analysis/Prediction Problems
• What kinds of direct mail do customers buy?
• What products will/won’t customers buy?
• What changes will cause a customer to leave a
• What are the characteristics of a gene?
• Does a picture contain an object (does a picture of
space contain a meteorite, especially one
heading towards us)?
•… Lots more
Tasks too Hard to Program
ALVINN (Pomerleau) drives 70 MPH on highways
Software that Customizes to User
Defining a Learning Problem
Learning = improving with experience at some task
– improve over task T
– with respect to performance measure P
– based on experience E
Ex 1: Learn to play checkers
T: play checkers
P: % of games won
E: opportunity to play self
Ex 2: Sell more CDs
T: sell CDs
P: # of CDs sold
E: different locations/prices of CDs
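One way to keep the three components straight is to record them explicitly. This is purely an illustrative sketch (the class and field names are my own, not course code):

```python
from dataclasses import dataclass

@dataclass
class LearningProblem:
    task: str          # T: what the system should do
    performance: str   # P: how improvement is measured
    experience: str    # E: what the learner observes

checkers = LearningProblem(
    task="play checkers",
    performance="% of games won",
    experience="opportunity to play self",
)
print(checkers.performance)  # prints: % of games won
```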
Key Questions
T: play checkers, sell CDs
P: % games won, CDs sold
To generate machine learner need to know:
– What experience?
• Direct or indirect?
• Learner controlled?
• Is the experience representative?
– What exactly should be learned?
– How to represent the learning function?
– What algorithm used to learn the learning function?
Types of Training Experience
Direct or indirect?
Direct - observable, measurable
– sometimes difficult to obtain
• Checkers - is a move the best move for a situation?
– sometimes straightforward
• Sell CDs - how many CDs sold on a day? (look at receipts)
Indirect - must be inferred from what is measurable
– Checkers - value moves based on outcome of game
– Credit assignment problem
Types of Training Experience (cont)
– Learner - what is best move at each point?
– Teacher - is teacher’s move the best? (Do we want to
just emulate the teacher’s moves?)
BIG Question: is experience representative of
– If Checkers learner only plays itself, will it be able to
play humans?
– What if results from CD seller influenced by factors not
measured (holiday shopping, weather, etc.)?
Choosing Target Function
Checkers - what does learner do - make moves
ChooseMove - select move based on board
ChooseMove : Board → Move
V : Board → ℜ
ChooseMove(b): from b pick move with highest value
But how do we define V(b) for boards b?
V(b) = 100 if b is a final board state of a win
V(b) = -100 if b is a final board state of a loss
V(b) = 0 if b is a final board state of a draw
if b not final state, V(b) =V(b´) where b´ is best final board
reached by starting at b and playing optimally from there
Correct, but not operational
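The case analysis above can be sketched directly. Since no checkers board representation is given here, the example below runs on a hand-made toy game tree (all names are placeholders); the recursion in the non-final case, which amounts to searching the entire game tree, is exactly why this definition is correct but not operational:

```python
# A "board" here is a node in a toy game tree: leaves carry the true
# outcome value, internal nodes list their successor boards.
toy_tree = {
    "start": ["a", "b"],       # our move: choose a or b
    "a": ["win", "loss"],      # opponent moves next
    "b": ["draw1", "draw2"],
}
final_values = {"win": 100, "loss": -100, "draw1": 0, "draw2": 0}

def V(b, our_move=True):
    """+/-100 or 0 at final boards; otherwise the value under optimal play."""
    if b in final_values:
        return final_values[b]
    values = [V(s, not our_move) for s in toy_tree[b]]
    return max(values) if our_move else min(values)

# From "a" the opponent can force -100, so optimal play picks "b".
print(V("start"))  # prints: 0
```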
Representation of Target Function
• Collection of rules?
IF double jump available THEN
make double jump
• Neural network?
• Polynomial function of problem features?
w0 + w1·blackPieces(b) + w2·redPieces(b) +
w3·blackKings(b) + w4·redKings(b) +
w5·redThreatened(b) + w6·blackThreatened(b)
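The weighted-feature representation reduces to a short dot product in code. In this sketch the feature names follow the formula above, while the weight values and the board dict are made up purely for illustration:

```python
def v_hat(board, w):
    """Linear evaluation: w[0] + sum of w[i+1] * feature_i(board)."""
    features = [
        board["blackPieces"], board["redPieces"],
        board["blackKings"], board["redKings"],
        board["redThreatened"], board["blackThreatened"],
    ]
    return w[0] + sum(wi * f for wi, f in zip(w[1:], features))

weights = [0.0, 1.0, -1.0, 3.0, -3.0, 0.5, -0.5]   # made-up values
opening = {"blackPieces": 12, "redPieces": 12, "blackKings": 0,
           "redKings": 0, "redThreatened": 0, "blackThreatened": 0}
print(v_hat(opening, weights))  # prints: 0.0  (symmetric position)
```

Learning then amounts to adjusting the seven weights from training examples; the representation itself stays fixed.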
Obtaining Training Examples
V(b) : the true target function
V̂(b) : the learned function
Vtrain(b) : the training value
One rule for estimating training values:
Vtrain(b) ← V̂(Successor(b))
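The estimation rule can be sketched over a recorded game trace, using a made-up stand-in for the learned function V̂ (here just a piece-count difference); every name below is a placeholder, not course code:

```python
def v_hat(board):
    """Made-up stand-in for the learned function: piece-count difference."""
    return board["blackPieces"] - board["redPieces"]

def training_values(trace, final_outcome):
    """Assign Vtrain(b) <- V̂(Successor(b)) for each non-final board in a
    game trace; the final board gets its known true value."""
    targets = [v_hat(trace[i + 1]) for i in range(len(trace) - 1)]
    targets.append(final_outcome)
    return targets

trace = [{"blackPieces": 12, "redPieces": 12},   # opening
         {"blackPieces": 12, "redPieces": 11},
         {"blackPieces": 12, "redPieces": 0}]    # final board: black won
print(training_values(trace, 100))  # prints: [1, 12, 100]
```

The rule looks circular (the learned function supplies its own targets), yet it works in practice because the final board's true value anchors the estimates and propagates backward as training repeats.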