Supervised and Unsupervised Learning

JadenNorton,United States,Researcher
Published Date:14-07-2017
Supervised and Unsupervised Learning
Ciro Donalek
Ay/Bi 199 – April 2011
Summary
 •  KDD and Data Mining Tasks
 •  Finding the optimal approach
 •  Supervised Models
    –  Neural Networks
    –  Multi Layer Perceptron
    –  Decision Trees
 •  Unsupervised Models
    –  Different Types of Clustering
    –  Distances and Normalization
    –  K-means
    –  Self Organizing Maps
 •  Combining different models
    –  Committee Machines
    –  Introducing a Priori Knowledge
    –  Sleeping Expert Framework

Knowledge Discovery in Databases
 •  KDD may be defined as: "The non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data".
 •  KDD is an interactive and iterative process involving several steps.
You got your data: what's next?
 What kind of analysis do you need? Which model is more appropriate for it? …
Clean your data
 •  Data preprocessing transforms the raw data into a format that will be more easily and effectively processed for the purpose of the user.
 •  Some tasks:
    –  sampling: selects a representative subset from a large population of data;
    –  noise treatment;
    –  strategies to handle missing data: sometimes your rows will be incomplete, because not all parameters are measured for all samples;
    –  normalization;
    –  feature extraction: pulls out specified data that is significant in some particular context.
 •  Use standard formats.
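Two of the preprocessing tasks above, sampling and normalization, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say which sampling scheme or normalization they have in mind, so simple random sampling and min-max rescaling are chosen here as examples, and the function names and toy data are hypothetical.

```python
import random

def sample_rows(rows, k, seed=0):
    """Draw k rows at random without replacement (simple random sampling)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return rng.sample(rows, k)

def min_max_normalize(rows):
    """Rescale every column to [0, 1]; a constant column maps to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

# Toy table: two attributes measured on very different scales (made-up values).
data = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0], [4.0, 800.0]]
subset = sample_rows(data, k=2)
scaled = min_max_normalize(data)
```

After normalization both attributes span the same range, so neither dominates a distance computation merely because of its units.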

Missing Data
 •  Missing data are a part of almost all research, and we all have to decide how to deal with them.
 •  Complete Case Analysis: use only rows with all the values
 •  Available Case Analysis
 •  Substitution
    –  Mean Value: replace the missing value with the mean value for that particular attribute
    –  Regression Substitution: we can replace the missing value with historical value from similar cases
    –  Matching Imputation: for each unit with a missing y, find a unit with similar values of x in the observed data and take its y value
    –  Maximum Likelihood, EM, etc.
 •  Some DM models can deal with missing data better than others.
 •  Which technique to adopt really depends on your data.
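Three of the strategies listed above (complete case analysis, mean value substitution, and matching imputation) can be sketched as follows. This is a minimal illustration, not a recommended implementation: missing values are assumed to be encoded as `None`, every column is assumed to have at least one observed value, "similar" is taken to mean the closest x in absolute difference, and all function names and data are hypothetical.

```python
def complete_cases(rows):
    """Complete case analysis: keep only rows with no missing value."""
    return [r for r in rows if None not in r]

def mean_substitution(rows):
    """Replace each missing value with the mean of its column's observed values."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)]
            for r in rows]

def matching_imputation(xs, ys):
    """For each missing y, copy the y of the observed unit with the closest x."""
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    filled = []
    for x, y in zip(xs, ys):
        if y is None:
            donor = min(observed, key=lambda pair: abs(pair[0] - x))
            y = donor[1]
        filled.append(y)
    return filled

rows = [[1.0, 10.0], [2.0, None], [3.0, 30.0]]
kept = complete_cases(rows)        # drops the incomplete middle row
filled = mean_substitution(rows)   # middle row becomes [2.0, 20.0]
matched = matching_imputation([1.0, 2.1, 3.0], [10.0, None, 30.0])
```

Complete case analysis shrinks the sample, while the two substitution methods keep every row at the price of introducing values that were never measured.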
Data Mining
 •  Crucial task within the KDD process
 •  Data Mining is about automating the process of searching for patterns in the data.
 •  In more detail, the most relevant DM tasks are:
    –  association
    –  sequence or path analysis
    –  clustering
    –  classification
    –  regression
    –  visualization
Finding Solution via Purposes
 •  You have your data, what kind of analysis do you need?
 •  Regression
    –  predict new values based on the past, inference
    –  compute the new values for a dependent variable based on the values of one or more measured attributes
 •  Classification:
    –  divide samples into classes
    –  use a training set of previously labeled data
 •  Clustering
    –  partitioning of a data set into subsets (clusters) so that data in each subset ideally share some common characteristics
 •  Classification is in some way similar to clustering, but requires that the analyst know ahead of time how classes are defined.
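To make the role of previously labeled data concrete, here is a small sketch of a classifier trained on labeled samples. The nearest-centroid rule is used purely as an illustration (the slides do not prescribe it), and the function names, class names, and data points are hypothetical.

```python
import math
from collections import defaultdict

def fit_centroids(samples, labels):
    """Compute one centroid (mean point) per class from labeled samples."""
    groups = defaultdict(list)
    for sample, label in zip(samples, labels):
        groups[label].append(sample)
    return {
        label: [sum(col) / len(points) for col in zip(*points)]
        for label, points in groups.items()
    }

def classify(sample, centroids):
    """Assign the sample to the class whose centroid is nearest (Euclidean)."""
    return min(centroids, key=lambda label: math.dist(sample, centroids[label]))

# The classes "a" and "b" are defined ahead of time by the labels.
train = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [4.8, 5.2]]
labels = ["a", "a", "b", "b"]
centroids = fit_centroids(train, labels)
first = classify([0.9, 1.1], centroids)
second = classify([5.1, 4.9], centroids)
```

A clustering method would receive only `train`, without `labels`, and would have to discover the two groups on its own.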
Cluster Analysis
 How many clusters do you expect?
 Search for Outliers
Classification
 •  Data mining technique used to predict group membership for data instances. There are two ways to assign a new value to a given class.
 •  Crisp classification
    –  given an input, the classifier returns its label
 •  Probabilistic classification
    –  given an input, the classifier returns the probability that it belongs to each class
    –  useful when some mistakes can be more costly than others (give me only data >90%)
    –  winner take all and other rules
       •  assign the object to the class with the highest probability (WTA)
       •  …but only if its probability is greater than 40% (WTA with thresholds)
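The winner-take-all rule and its thresholded variant can be sketched as below. The function name, the class names, and the probability values are illustrative assumptions; only the 40% threshold comes from the slide.

```python
def winner_take_all(probabilities, threshold=0.0):
    """Return the most probable class, or None if it does not exceed the threshold.

    probabilities: dict mapping class name -> probability.
    With threshold=0.0 this is plain WTA; a positive threshold rejects
    objects whose best class is not confident enough.
    """
    label = max(probabilities, key=probabilities.get)
    if probabilities[label] > threshold:
        return label
    return None

confident = {"star": 0.55, "galaxy": 0.35, "quasar": 0.10}
ambiguous = {"star": 0.38, "galaxy": 0.35, "quasar": 0.27}

plain = winner_take_all(ambiguous)                    # "star"
accepted = winner_take_all(confident, threshold=0.4)  # "star"
rejected = winner_take_all(ambiguous, threshold=0.4)  # None
```

Returning `None` leaves the object unclassified, which is often preferable when a wrong label is more costly than no label.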
Regression / Forecasting
 •  Data table statistical correlation
    –  mapping without any prior assumption on the functional form of the data distribution;
    –  machine learning algorithms well suited for this.
 •  Curve fitting
    –  find a well defined and known function underlying your data;
    –  theory / expertise can help.
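As an illustration of curve fitting with a known functional form, the sketch below assumes the underlying function is a straight line, y = slope * x + intercept, and estimates its two parameters by ordinary least squares. The choice of a line, the function name, and the data are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Noise-free toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
forecast = slope * 4.0 + intercept  # extrapolate to a new x
```

When no such functional form is known in advance, a machine learning model that learns the mapping from the data is the alternative the slide describes.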
Machine Learning
 •  To learn: to get knowledge of by study, experience, or being taught.
 •  Types of Learning
    •  Supervised
    •  Unsupervised
Unsupervised Learning
 •  The model is not provided with the correct results during the training.
 •  Can be used to cluster the input data in classes on the basis of their statistical properties only.
 •  Cluster significance and labeling.
 •  The labeling can be carried out even if the labels are only available for a small number of objects representative of the desired classes.
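The idea of clustering first and labeling afterwards can be sketched as follows. K-means is used here only as one possible clustering method, with starting centroids supplied by hand, and each cluster is named by majority vote among the few labeled objects that fall into it. The function names, data, and labels are all hypothetical.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate point assignment and centroid update."""
    centroids = [list(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid; no labels are used.
        assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment

def label_clusters(assignment, known_labels):
    """Name each cluster by majority vote among its few labeled objects.

    known_labels: dict mapping point index -> label.
    """
    votes = {}
    for index, label in known_labels.items():
        votes.setdefault(assignment[index], []).append(label)
    return {k: max(set(v), key=v.count) for k, v in votes.items()}

points = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
          [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
centroids, assignment = kmeans(points, centroids=[points[0], points[3]])

# Only two of the six objects carry a label.
names = label_clusters(assignment, {0: "class A", 3: "class B"})
predicted = [names.get(a) for a in assignment]
```

A cluster that contains no labeled object gets no name, so `names.get` returns `None` for its members.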