SUBJECT: DIGITAL IMAGE AND SPEECH PROCESSING
SUBJECT CODE: ECS-702, BRANCH: EL&TCE
Module I (12 hours)
1. Different stages of Image processing & Analysis Scheme. Components of
Image Processing System, Multiprocessor Interconnections.
2. A Review of various Mathematical Transforms.
3. Image Formation: Geometric Model, Photometric Model.
4. Image Digitization : A review of Sampling and quantization processes. A
Module II (12 Hours)
5. Image Enhancement: Contrast Intensification, Smoothing, Image
6. Restoration : Minimum Mean Square Error Restoration by Homomorphic
7. Image Compression : Schematic diagram of Data Compression Procedure,
Lossless compression coding.
8. Multivalued Image Processing, Multispectral Image Processing, Processing
of color images.
Module III (8 Hours)
Digital Speech Processing
1. The Fundamentals of Digital Speech Processing.
A Review of Discrete-Time Signal & Systems , the Z-transform, the DFT,
Fundamental of Digital Filters, FIR system, IIR Systems.
2. Time Domain Methods for Speech Processing.
Time-Dependent Processing of speech, short-time energy and Average
Magnitude, Short time Average Zero- Crossing Rate.
3. Digital Representation of speech Waveform
Sampling speech signals, statistical model, Instantaneous quantization,
Instantaneous companding, quantization for optimum SNR, Adaptive
quantization, Feed-forward and Feedback adaptation.
Module IV (8 Hours)
Linear Predictive Coding of Speech
Block diagram of Simplified Model for Speech Production. Basic Principles of
Linear Predictive Analysis- The Auto Correlation Method. The Prediction Error
Signal. Digital Speech Processing for Man-Machine Communication by voice.
Speaker Recognition Systems- Speaker verification and Speaker Identification
Digital image processing deals with developing a digital system that
performs operations on a digital image.
An image is a two-dimensional signal. It is defined by the
mathematical function f(x,y), where x and y are the horizontal and
vertical coordinates, and the amplitude of f at any pair of coordinates (x,y) is called the
intensity or gray level of the image at that point.
When x, y and the amplitude values of f are all finite, discrete quantities, we call the
image a digital image. The field of digital image processing refers to the
processing of digital images by means of a digital computer.
A digital image is composed of a finite number of elements, each of which has a
particular location and value. These elements are referred to as picture elements,
image elements or pixels.
Motivation and Perspective
Digital image processing deals with the manipulation of digital images through a
digital computer. It is a subfield of signals and systems that focuses particularly on
images. DIP focuses on developing a computer system that is able to perform
processing on an image. The input of that system is a digital image; the system
processes that image using efficient algorithms and gives an image as output. The
most common example is Adobe Photoshop, one of the most widely used
applications for processing digital images.
Some of the major fields in which digital image processing is widely used are:
1. Gamma-ray imaging: nuclear medicine and astronomical observations.
2. X-ray imaging: X-rays of the body.
3. Ultraviolet band: lithography, industrial inspection, microscopy, lasers.
4. Visible and infrared bands: remote sensing.
5. Microwave band: radar imaging.
Components of Image Processing System
i) Image Sensors
With reference to sensing, two elements are required to acquire digital
images. The first is a physical device that is sensitive to the energy
radiated by the object we wish to image; the second, called a digitizer,
converts the output of the physical sensing device into digital form.
ii) Specialized image processing hardware
It consists of the digitizer just mentioned, plus hardware that performs other
primitive operations, such as an arithmetic logic unit (ALU), which performs
arithmetic (such as addition and subtraction) and logical operations in parallel on
entire images.
iii) Computer
It is a general purpose computer and can range from a PC to a
supercomputer depending on the application. In dedicated applications,
specially designed computers are sometimes used to achieve a required level of
performance.
iv) Software
It consists of specialized modules that perform specific tasks. A well designed
package also includes the capability for the user to write code that, as a minimum,
utilizes the specialized modules. More sophisticated software packages allow
the integration of these modules.
v) Mass storage
This capability is a must in image processing applications. An image of size
1024 x 1024 pixels, in which the intensity of each pixel is an 8-bit quantity,
requires one megabyte of storage space if the image is not compressed.
Storage for image processing applications falls into three principal categories:
i) Short term storage for use during processing
ii) On line storage for relatively fast retrieval
iii) Archival storage such as magnetic tapes and disks
vi) Image displays
Image displays in use today are mainly color TV monitors. These monitors are
driven by the outputs of image and graphics display cards that are an
integral part of the computer system.
vii) Hardcopy devices
The devices for recording images include laser printers, film cameras, heat
sensitive devices, inkjet units and digital units such as optical and CD-ROM disks.
Film provides the highest possible resolution, but paper is the obvious medium
of choice for written material.
viii) Networking
It is almost a default function in any computer system in use today. Because of
the large amount of data inherent in image processing applications, the key
consideration in image transmission is bandwidth.
Elements of Visual Perception
Structure of the human Eye
The eye is nearly a sphere, with an average diameter of approximately 20 mm. The eye is
enclosed by three membranes:
a) The cornea and sclera: the cornea is a tough, transparent tissue that covers the anterior
surface of the eye. The rest of the optic globe is covered by the sclera.
b) The choroid: it contains a network of blood vessels that serve as the major source
of nutrition to the eye. It helps to reduce the amount of extraneous light entering the eye.
It has two parts:
(1) Iris diaphragm: it contracts or expands to control the amount of light that
enters the eye.
(2) Ciliary body
c) Retina: it is the innermost membrane of the eye. When the eye is properly focused, light
from an object outside the eye is imaged on the retina. There are various light
receptors over the surface of the retina.
The two major classes of receptors are:
1) Cones: there are about 6 to 7 million of them. They are located primarily in the
central portion of the retina, called the fovea, and are highly sensitive to
color. Humans can resolve fine details with the cones because each one is
connected to its own nerve end. Cone vision is called photopic or bright-light vision.
2) Rods: they are far more numerous, from 75 to 150 million, and are
distributed over the entire retinal surface. The large area of distribution and
the fact that several rods are connected to a single nerve end mean that rods give only a general overall
picture of the field of view. They are not involved in color vision and
are sensitive to low levels of illumination. Rod vision is called scotopic or
dim-light vision.
The region of the retina where receptors are absent is called the blind spot.
Image Formation in the Eye
The major difference between the lens of the eye and an ordinary optical lens is that the
former is flexible.
The shape of the lens of the eye is controlled by tension in the fibers of the ciliary body. To
focus on distant objects, the controlling muscles cause the lens to be relatively flattened; to
focus on objects near the eye, they allow the lens to become thicker.
The distance between the center of the lens and the retina is called the focal length, and it
varies from 17 mm to 14 mm as the refractive power of the lens increases from its minimum to
its maximum.
When the eye focuses on an object farther away than about 3 m, the lens exhibits its lowest
refractive power. When the eye focuses on a nearby object, the lens is most strongly
refractive.
The retinal image is focused primarily on the area of the fovea. Perception then takes
place by the relative excitation of light receptors, which transform radiant energy into
electrical impulses that are ultimately decoded by the brain.
Brightness Adaptation and Discrimination
Digital images are displayed as a discrete set of intensities. The range of light intensity
levels to which the human visual system can adapt is enormous, on the order of $10^{10}$,
from the scotopic threshold to the glare limit. Experimental evidence indicates that subjective
brightness is a logarithmic function of the light intensity incident on the eye.
The curve represents the range of intensities to which the visual system can adapt. But the
visual system cannot operate over such a dynamic range simultaneously. Rather, it
accomplishes this large variation by changes in its overall sensitivity, a phenomenon called brightness adaptation.
For any given set of conditions, the current sensitivity level of the visual system
is called the brightness adaptation level, $B_a$ in the curve. The small intersecting curve
represents the range of subjective brightness that the eye can perceive when adapted to this
level. It is restricted at level $B_b$, at and below which all stimuli are perceived as
indistinguishable blacks. The upper portion of the curve is not actually restricted, but higher intensities
would simply raise the adaptation level higher than $B_a$.
The ability of the eye to discriminate between changes in light intensity at any specific
adaptation level is also of considerable interest.
Take a flat, uniformly illuminated area large enough to occupy the entire field of view of
the subject. It may be a diffuser, such as opaque glass, that is illuminated from behind
by a light source whose intensity, I, can be varied. To this field is added an increment of
illumination, $\Delta I$, in the form of a short duration flash that appears as a circle in the center of
the uniformly illuminated field. If $\Delta I$ is not bright enough, the subject cannot see any
perceivable change.
As $\Delta I$ gets stronger, the subject may indicate a perceived change. $\Delta I_c$ is the increment of
illumination discernible 50% of the time with background illumination I. The quantity $\Delta I_c / I$ is
called the Weber ratio.
A small value of the Weber ratio means that a small percentage change in intensity is discernible, representing
good brightness discrimination.
A large value of the Weber ratio means that a large percentage change in intensity is required,
representing poor brightness discrimination.
In optical illusions, the eye fills in non-existing information or wrongly perceives the geometrical
properties of objects.
Fundamental Steps in Digital Image Processing
The steps involved in image processing fall into two categories:
1. Methods whose inputs and outputs are images.
2. Methods whose inputs may be images but whose outputs are attributes extracted from those images.
[Figure: Fundamental Steps in DIP. Block diagram with the stages image acquisition, image enhancement, image restoration, color image processing, wavelets and multiresolution processing, compression, morphological processing, segmentation, representation and description, and object recognition, all linked to a central knowledge base.]
i) Image acquisition
It could be as simple as being given an image that is already in digital form. Generally the
image acquisition stage involves preprocessing such as scaling.
ii) Image Enhancement
It is among the simplest and most appealing areas of digital image processing. The idea
behind this is to bring out details that are obscured, or simply to highlight certain
features of interest in an image. Image enhancement is a very subjective area of image
processing.
iii) Image Restoration
It also deals with improving the appearance of an image, but it is an objective approach, in the
sense that restoration techniques tend to be based on mathematical or probabilistic
models of image degradation. Enhancement, on the other hand, is based on human
subjective preferences regarding what constitutes a good enhancement result.
iv) Color image processing
It is an area that has been gaining importance because of the use of digital images over the
internet. Color image processing deals basically with color models and their implementation
in image processing applications.
v) Wavelets and Multiresolution Processing
These are the foundation for representing images in various degrees of resolution.
vi) Compression
It deals with techniques for reducing the storage required to save an image, or the
bandwidth required to transmit it over a network. It has two major approaches:
a) Lossless Compression
b) Lossy Compression
vii) Morphological processing
It deals with tools for extracting image components that are useful in the representation and
description of the shape and boundary of objects. It is widely used in automated inspection
applications.
viii) Representation and Description
It almost always follows the output of a segmentation step, that is, raw pixel data constituting either
the boundary of a region or all the points in the region itself. In either case, converting the data to
a form suitable for computer processing is necessary.
ix) Recognition
It is the process that assigns a label to an object based on its descriptors. It is the last step of
image processing and may use artificial intelligence software.
x) Knowledge base
Knowledge about a problem domain is coded into an image processing system in the form
of a knowledge base. This knowledge may be as simple as detailing regions of an image
where the information of interest is known to be located, thus limiting the search that has
to be conducted in seeking that information. The knowledge base can also be quite complex,
such as an interrelated list of all major possible defects in a materials inspection problem, or
an image database containing high resolution satellite images of a region in connection with
a change detection application.
A Simple Image Model
An image is denoted by a two-dimensional function of the form f(x,y). The value or
amplitude of f at spatial coordinates (x,y) is a positive scalar quantity whose physical
meaning is determined by the source of the image. When an image is generated by a
physical process, its values are proportional to the energy radiated by a physical source. As a
consequence, f(x,y) must be nonzero and finite; that is,
$0 < f(x,y) < \infty$
The function f(x,y) may be characterized by two components:
· The amount of source illumination incident on the scene being viewed.
· The amount of illumination reflected by the objects in the scene.
These are called the illumination and reflectance components and are denoted by i(x,y) and
r(x,y) respectively. The functions combine as a product to form f(x,y):
$f(x,y) = i(x,y)\, r(x,y)$
We call the intensity of a monochrome image at any coordinate (x,y) the gray level (l) of
the image at that point:
$l = f(x,y), \quad L_{min} \le l \le L_{max}$
$L_{min}$ is required to be positive and $L_{max}$ must be finite, with
$L_{min} = i_{min}\, r_{min}$
$L_{max} = i_{max}\, r_{max}$
The interval $[L_{min}, L_{max}]$ is called the gray scale. Common practice is to shift this interval
numerically to the interval [0, L-1], where l = 0 is considered black and l = L-1 is considered
white on the gray scale. All intermediate values are shades of gray varying from black to
white.
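As a rough illustration of the image model, the following Python sketch forms f(x,y) as the product of an illumination array and a reflectance array and then shifts the result to the interval [0, L-1]. The function names, the sample values and the linear rescaling are assumptions made for this example; the notes do not prescribe a particular mapping.

```python
import numpy as np

def form_image(illumination, reflectance):
    """Combine illumination i(x,y) and reflectance r(x,y) as a product."""
    i = np.asarray(illumination, dtype=float)
    r = np.asarray(reflectance, dtype=float)
    return i * r

def shift_to_gray_scale(f, levels=256):
    """Linearly map [Lmin, Lmax] onto the integers 0 .. levels-1 (assumed mapping)."""
    f = np.asarray(f, dtype=float)
    l_min, l_max = f.min(), f.max()
    if l_max == l_min:                      # flat image: map everything to black
        return np.zeros(f.shape, dtype=int)
    scaled = (f - l_min) / (l_max - l_min) * (levels - 1)
    return np.rint(scaled).astype(int)

# Hypothetical 2x2 scene: illumination values and reflectances in (0, 1)
i = [[100.0, 200.0], [300.0, 400.0]]
r = [[0.1, 0.5], [0.5, 0.9]]
f = form_image(i, r)
g = shift_to_gray_scale(f)      # darkest pixel becomes 0, brightest becomes 255
```

After the shift, the smallest value of f corresponds to black (0) and the largest to white (L-1).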
To create a digital image, we need to convert the continuous sensed data into digital form.
This involves two processes: sampling and quantization. An image may be continuous
with respect to the x and y coordinates and also in amplitude. To convert it into digital
form we have to sample the function in both coordinates and in amplitude.
Digitizing the coordinate values is called sampling.
Digitizing the amplitude values is called quantization.
Consider the gray level values of a continuous image along a line segment AB.
To sample this function, we take equally spaced samples along line AB. The location of
each sample is given by a vertical tick mark in the bottom part of the figure. The samples are
shown as small squares superimposed on the function; the set of these discrete locations gives
the sampled function.
In order to form a digital image, the gray level values must also be converted (quantized) into
discrete quantities. So we divide the gray level scale into eight discrete levels ranging from
black to white. The vertical tick marks indicate the specific value assigned to each of the
eight gray levels.
The continuous gray levels are quantized simply by assigning one of the eight discrete
gray levels to each sample. The assignment is made depending on the vertical proximity of a
sample to a vertical tick mark.
Starting at the top of the image and carrying out this procedure line by line produces a two
dimensional digital image.
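The two steps can be sketched for a single scan line as follows. This is a minimal illustration, not the notes' own procedure: the intensity profile, the function names and the choice of equally spaced levels on [0, 1] are assumptions.

```python
import numpy as np

def sample_line(f, a, b, num_samples):
    """Take equally spaced samples of the continuous function f along [a, b]."""
    xs = np.linspace(a, b, num_samples)
    return xs, f(xs)

def quantize(values, levels=8, v_min=0.0, v_max=1.0):
    """Assign each sample the index of the nearest of `levels` equally spaced gray levels."""
    step = (v_max - v_min) / (levels - 1)
    clipped = np.clip(values, v_min, v_max)
    return np.rint((clipped - v_min) / step).astype(int)

def profile(x):
    """Hypothetical continuous intensity along the scan line AB, in [0, 1]."""
    return 0.5 + 0.5 * np.sin(2 * np.pi * x)

xs, samples = sample_line(profile, 0.0, 1.0, 9)   # sampling
codes = quantize(samples, levels=8)               # quantization to 8 levels
```

Repeating this for every scan line of the image, top to bottom, yields the two-dimensional array of integer gray levels.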
Digital Image Definition
A digital image f[m,n] described in a 2D discrete space is derived from an analog
image f(x,y) in a 2D continuous space through a sampling process that is frequently
referred to as digitization. Some basic definitions associated with the digital image follow.
The 2D continuous image f(x,y) is divided into N rows and M columns. The intersection of a
row and a column is termed a pixel. The value assigned to the integer
coordinates [m,n] with m = 0,1,2,...,M-1 and n = 0,1,2,...,N-1 is f[m,n]. In fact, in most cases
f(x,y) is actually a function of many variables including depth (d), color (µ) and time (t).
There are three types of computerized processes in the processing of images:
1) Low level processes: these involve primitive operations such as image preprocessing to reduce
noise, contrast enhancement and image sharpening. These kinds of processes are
characterized by the fact that both inputs and outputs are images.
2) Mid level processes: these involve tasks like segmentation, description of the segmented
objects to reduce them to a form suitable for computer processing, and classification of
individual objects. The inputs to these processes are generally images, but the outputs are attributes
extracted from images.
3) High level processes: these involve making sense of an ensemble of recognized objects,
as in image analysis, and performing the cognitive functions normally associated with
vision.
Representing Digital Images
The result of sampling and quantization is a matrix of real numbers. Assume that an image
f(x,y) is sampled so that the resulting digital image has M rows and N columns. The
values of the coordinates (x,y) now become discrete quantities. Thus the values of the
coordinates at the origin are (x,y) = (0,0). The next coordinate values along the first
row are (x,y) = (0,1), which signifies the second sample along the first row. It does not mean that these are the actual values of
physical coordinates when the image was sampled. The complete M x N digital image can be written as
$$f(x,y) = \begin{bmatrix} f(0,0) & f(0,1) & \cdots & f(0,N-1) \\ f(1,0) & f(1,1) & \cdots & f(1,N-1) \\ \vdots & \vdots & & \vdots \\ f(M-1,0) & f(M-1,1) & \cdots & f(M-1,N-1) \end{bmatrix}$$
The right side of this equation is a digital image; each element of the matrix is called an image element, picture element, pixel or pel. The matrix can be represented in traditional matrix notation as well, $A = [a_{i,j}]$ with $a_{i,j} = f(i,j)$.
The sampling process may be viewed as partitioning the x-y plane into a grid, with the
coordinates of the center of each grid cell being a pair of elements from the Cartesian
product $Z^2$, which is the set of all ordered pairs of elements $(z_i, z_j)$ with $z_i$ and $z_j$ being
integers from Z.
Hence f(x,y) is a digital image if (x,y) are integers from $Z^2$ and f is a function that assigns a gray level (that is, a real number from the set of real
numbers R) to each distinct pair of coordinates (x,y). This functional assignment is the
quantization process. If the gray levels are also integers, Z replaces R, and a digital
image becomes a 2D function whose coordinates and amplitude values are integers.
Due to processing, storage and hardware considerations, the number of gray levels
typically is an integer power of 2:
$L = 2^k$
Then the number, b, of bits required to store a digital image is
$b = M \times N \times k$
When M = N, the equation becomes
$b = N^2 k$
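The storage requirement is simple arithmetic, sketched below in Python. The function names are illustrative, and one megabyte is taken here as 2**20 bytes, which is an assumption about the unit used in the mass storage example (a 1024 x 1024 image with 8 bits per pixel).

```python
def gray_levels(k):
    """Number of gray levels L = 2**k for a k-bit image."""
    return 2 ** k

def storage_bits(rows, cols, k):
    """Bits needed to store an uncompressed rows x cols image with k bits per pixel."""
    return rows * cols * k

bits = storage_bits(1024, 1024, 8)
megabytes = bits / 8 / 2**20    # 1 megabyte taken as 2**20 bytes here
```

With these numbers the result is exactly one megabyte, which agrees with the figure quoted for mass storage.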
When an image can have $2^k$ gray levels, it is referred to as a k-bit image. An image with 256
possible gray levels is called an 8-bit image (because $256 = 2^8$).
Spatial and Gray Level Resolution
Spatial resolution is the smallest discernible detail in an image. Suppose a chart
is constructed with vertical lines of width W, with the space between them also having
width W, so that a line pair consists of one such line and its adjacent space. The width of
a line pair is then 2W and there are 1/2W line pairs per unit distance. Resolution is simply the
smallest number of discernible line pairs per unit distance.
Gray level resolution refers to the smallest discernible change in gray level.
Measuring discernible changes in gray level is a highly subjective process. Reducing the
number of bits k while keeping the spatial resolution constant creates the problem of false
contouring. It is caused by the use of an insufficient number of gray levels in the
smooth areas of a digital image. It is called so because the ridges resemble
topographic contours in a map. It is generally quite visible in images displayed using 16 or
fewer uniformly spaced gray levels.
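False contouring can be reproduced by requantizing a smooth 8-bit image to fewer gray levels. The sketch below assumes an 8-bit input and simply drops low-order bits; the function name and the test ramp are illustrative choices, not part of the notes.

```python
import numpy as np

def reduce_gray_levels(image, k):
    """Requantize an 8-bit image to 2**k uniformly spaced gray levels."""
    if not 1 <= k <= 8:
        raise ValueError("k must be between 1 and 8")
    image = np.asarray(image, dtype=np.uint8)
    shift = 8 - k
    return (image >> shift) << shift      # drop the (8 - k) low-order bits

# A smooth horizontal ramp is where false contours show up most clearly
ramp = np.tile(np.arange(256, dtype=np.uint8), (32, 1))
coarse = reduce_gray_levels(ramp, 4)      # 16 gray levels: visible bands
```

Displaying `coarse` next to `ramp` shows the smooth gradient broken into 16 flat bands, which is the contouring effect described above.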
Iso Preference Curves
To see the effect of varying N and k simultaneously, three pictures are taken having
low, medium and high levels of detail.
Different images were generated by varying N and k, and observers were then asked to
rank the results according to their subjective quality. Results were summarized in the
form of iso-preference curves in the N-k plane.
The iso-preference curves tend to shift right and upward, but their shapes in each of the
three image categories are as shown in the figure. A shift up and right in the curves simply
means larger values for N and k, which implies better picture quality.
The results show that iso-preference curves tend to become more vertical as the detail in the
image increases. This suggests that for images with a large amount of detail only a
few gray levels may be needed. For a fixed value of N, the perceived quality for this type
of image is nearly independent of the number of gray levels used.
Pixel Relationships
Neighbors of a pixel
A pixel p at coordinates (x,y) has four horizontal and vertical neighbors whose coordinates
are given by
(x+1, y), (x-1, y), (x, y+1), (x, y-1)
This set of pixels is called the 4-neighbors of p and is denoted by N4(p). Each pixel is at a
unit distance from (x,y), and some of the neighbors of p lie outside the digital image if (x,y)
is on the border of the image.
The four diagonal neighbors of p have coordinates
(x+1, y+1), (x+1, y-1), (x-1, y+1), (x-1, y-1)
and are denoted by ND(p). These points, together with the 4-neighbors, are called the
8-neighbors of p, denoted by N8(p).
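The neighbor sets translate directly into code. The following Python sketch uses illustrative function names and adds a hypothetical helper, `inside`, that discards neighbors falling outside the image when the pixel lies on the border.

```python
def n4(x, y):
    """The four horizontal and vertical neighbours of (x, y)."""
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

def nd(x, y):
    """The four diagonal neighbours of (x, y)."""
    return [(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)]

def n8(x, y):
    """The 8-neighbours: 4-neighbours together with the diagonal neighbours."""
    return n4(x, y) + nd(x, y)

def inside(points, rows, cols):
    """Keep only the neighbours that fall inside a rows x cols image."""
    return [(x, y) for (x, y) in points if 0 <= x < rows and 0 <= y < cols]

corner = inside(n8(0, 0), rows=3, cols=3)   # a border pixel loses some neighbours
```

For the corner pixel of a 3 x 3 image only three of the eight neighbors remain.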
Let V be the set of gray level values used to define adjacency. In a binary image, V = {1} if
we are referring to adjacency of pixels with value 1. Three types of adjacency occur:
4-adjacency: two pixels p and q with values from V are 4-adjacent if q is in the set N4(p).
8-adjacency: two pixels p and q with values from V are 8-adjacent if q is in the set N8(p).
m-adjacency: two pixels p and q with values from V are m-adjacent if
· q is in N4(p), or
· q is in ND(p) and the set N4(p) ∩ N4(q) has no pixels whose values are from V.
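A minimal Python sketch of the three adjacency tests follows. It assumes the image is a list of rows indexed as image[x][y], that V is a set of values, and that m-adjacency requires the shared 4-neighbors of a diagonal pair to hold no values from V; all function names and the sample image are illustrative.

```python
def n4(p):
    x, y = p
    return {(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)}

def nd(p):
    x, y = p
    return {(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)}

def has_value_in(image, p, V):
    """True if p lies inside the image and its value belongs to V."""
    x, y = p
    return 0 <= x < len(image) and 0 <= y < len(image[0]) and image[x][y] in V

def adjacent_4(image, p, q, V):
    return has_value_in(image, p, V) and has_value_in(image, q, V) and q in n4(p)

def adjacent_8(image, p, q, V):
    return (has_value_in(image, p, V) and has_value_in(image, q, V)
            and q in (n4(p) | nd(p)))

def adjacent_m(image, p, q, V):
    if not (has_value_in(image, p, V) and has_value_in(image, q, V)):
        return False
    if q in n4(p):
        return True
    # Diagonal neighbours count only if no shared 4-neighbour has a value from V
    shared = n4(p) & n4(q)
    return q in nd(p) and not any(has_value_in(image, s, V) for s in shared)

image = [[0, 1, 1],
         [0, 1, 0],
         [0, 0, 1]]
V = {1}
```

In this image the centre pixel (1,1) and the pixel (0,2) are 8-adjacent but not m-adjacent, because they share the 4-neighbor (0,1), whose value is in V. This is how m-adjacency removes the ambiguous double paths that 8-adjacency allows.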
For pixels p, q and z with coordinates (x,y), (s,t) and (v,w) respectively, D is a distance
function or metric if
$D(p,q) \ge 0$, with $D(p,q) = 0$ iff $p = q$,
$D(p,q) = D(q,p)$, and
$D(p,z) \le D(p,q) + D(q,z)$
The Euclidean distance between p and q is defined as
$D_e(p,q) = \left[ (x-s)^2 + (y-t)^2 \right]^{1/2}$
The D4 (city-block) distance between p and q is defined as
$D_4(p,q) = |x-s| + |y-t|$
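Both distance measures are computed directly from the pixel coordinates. The short Python sketch below uses illustrative function names and represents a pixel as a coordinate pair.

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance: square root of (x-s)^2 + (y-t)^2."""
    (x, y), (s, t) = p, q
    return math.sqrt((x - s) ** 2 + (y - t) ** 2)

def d4_distance(p, q):
    """D4 (city-block) distance: |x-s| + |y-t|."""
    (x, y), (s, t) = p, q
    return abs(x - s) + abs(y - t)

p, q = (0, 0), (3, 4)
de = euclidean_distance(p, q)
d4 = d4_distance(p, q)
```

For these two points the Euclidean distance is 5 while the city-block distance is 7, since the D4 measure only allows horizontal and vertical steps.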
IMAGE ENHANCEMENT IN SPATIAL DOMAIN
The principal objective of enhancement is to process an image so that the result is more suitable
than the original image for a specific application. Image enhancement approaches fall into two
broad categories:
· Spatial domain methods
· Frequency domain methods
The term spatial domain refers to the image plane itself, and approaches in this category are
based on direct manipulation of the pixels in an image.
Spatial domain processes are denoted by the expression
$g(x,y) = T[f(x,y)]$
where f(x,y) is the input image, g(x,y) is the processed image, and T is an operator on f, defined over some neighborhood of (x,y).
The neighborhood of a point (x,y) is usually defined as a square or rectangular subimage area
centered at (x,y).
The center of the subimage is moved from pixel to pixel, starting at the top left corner. The operator T
is applied at each location (x,y) to find the output g at that location. The process utilizes only the
pixels in the area of the image spanned by the neighborhood.
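The sliding-neighborhood procedure can be sketched as a generic Python function that takes the operator T as an argument. The function name, the square neighborhood, and the choice to clip the neighborhood at the image border (so that only pixels inside the image are used) are assumptions of this example.

```python
import numpy as np

def apply_neighborhood(image, operator, size=3):
    """Compute g(x,y) = T[f(x,y)] with T applied to a size x size neighbourhood."""
    f = np.asarray(image)
    half = size // 2
    rows, cols = f.shape
    g = np.empty((rows, cols), dtype=float)
    for x in range(rows):                       # move from pixel to pixel,
        for y in range(cols):                   # starting at the top left corner
            window = f[max(0, x - half):x + half + 1,
                       max(0, y - half):y + half + 1]
            g[x, y] = operator(window)
    return g

f = np.arange(9).reshape(3, 3)
g = apply_neighborhood(f, np.mean)              # T = local average (smoothing)
```

Passing `np.mean` gives a simple smoothing filter; with `size=1` the neighborhood shrinks to the pixel itself and T reduces to a point operation.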
Basic Gray Level Transformation Functions
It is the simplest form of transformation, when the neighborhood is of size 1 x 1. In this case g
depends only on the value of f at (x,y), and T becomes a gray level transformation function of the
form
$s = T(r)$
where
r denotes the gray level of f(x,y)
s denotes the gray level of g(x,y) at any point (x,y)
Because enhancement at any point in an image depends only on the gray level at that point,
techniques in this category are referred to as point processing.
There are basically three kinds of functions in gray level transformation
Contrast stretching -
It produces an image of higher contrast than the original one.
The operation is performed by darkening the levels below m and brightening the levels above m in
the original image.
In this technique the values of r below m are compressed by the transformation function into a
narrow range of s towards black. The opposite effect takes place for the values of r above m.
Thresholding -
It is a limiting case where T(r) produces a two-level (binary) image.
The values below m are transformed to black and those above m are transformed to white.
Basic Gray Level Transformation
These are the simplest image enhancement techniques.
Image negatives -
The negative of an image with gray levels in the range [0, L-1] is obtained by using the negative
transformation. The expression of the transformation is
$s = L - 1 - r$
Reversing the intensity levels of an image in this manner produces the equivalent of a
photographic negative. This type of processing is particularly suited for enhancing white or gray
details embedded in dark regions of an image, especially when the black areas are dominant in
size.
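The negative transformation s = L - 1 - r is a one-line point operation. The sketch below assumes an image with L = 256 gray levels by default; the function name is illustrative.

```python
import numpy as np

def negative(image, levels=256):
    """Apply s = (L - 1) - r to every pixel."""
    r = np.asarray(image, dtype=int)
    return (levels - 1) - r

r = np.array([0, 100, 255])
s = negative(r)
```

Applying the transformation twice returns the original image, since (L-1) - ((L-1) - r) = r.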
Log transformations -
The general form of the log transformation is
$s = c \log(1 + r)$
where c is a constant and $r \ge 0$.
This transformation maps a narrow range of low gray level values in the input image into a wider
range of output gray levels. The opposite is true for higher values of input levels. We would use
this transformation to expand the values of dark pixels in an image while compressing the higher
level values. The opposite is true for the inverse log transformation.
The log transformation function has the important characteristic that it compresses the dynamic
range of images with large variations in pixel values.
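A small Python sketch of the log transformation s = c log(1 + r) is given below. The notes leave the constant c open; here it is chosen, as an assumption, so that the largest input level L-1 maps to L-1.

```python
import numpy as np

def log_transform(image, levels=256):
    """Apply s = c * log(1 + r), with c chosen so that r = L-1 maps to s = L-1."""
    r = np.asarray(image, dtype=float)
    c = (levels - 1) / np.log(levels)      # assumed scaling, not fixed by the notes
    return c * np.log1p(r)

s = log_transform([0, 15, 255])
```

With this scaling the dark input level 15 is lifted to roughly the middle of the output range, while the bright end is compressed.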
Power law transformation
The power law transformation has the basic form
$s = c\, r^{\gamma}$
where c and $\gamma$ are positive constants.
Power law curves with fractional values of $\gamma$ map a narrow range of dark input values into a wider
range of output values, with the opposite being true for higher values of input gray levels. We may
get various curves by varying the value of $\gamma$.
A variety of devices used for image capture, printing and display respond according to a power law. The process used to correct this power law response phenomenon is called gamma
correction.
For example, CRT devices have an intensity to voltage response that is a power function.
Gamma correction is important if displaying an image accurately on a computer screen is of
concern. Images that are not corrected properly can look either bleached out or too dark.
Color reproduction also uses this concept of gamma correction. It is becoming more popular due to
the use of images over the internet.
It is also important in general purpose contrast manipulation. To make an image darker we use $\gamma > 1$, and
$\gamma < 1$ to make it brighter.
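The power law transformation and its use for gamma correction can be sketched as follows. Intensities are assumed to be normalised to [0, 1], and the display exponent of 2.2 is only an illustrative value; neither assumption comes from the notes.

```python
import numpy as np

def power_law(r, gamma, c=1.0):
    """Apply s = c * r**gamma to intensities normalised to [0, 1]."""
    r = np.asarray(r, dtype=float)
    return c * np.power(r, gamma)

def gamma_correct(r, display_gamma=2.2):
    """Pre-compensate for a display whose response is r**display_gamma."""
    return power_law(r, 1.0 / display_gamma)

r = np.array([0.0, 0.25, 0.5, 1.0])
brighter = power_law(r, 0.5)               # gamma < 1 expands dark values
darker = power_law(r, 2.0)                 # gamma > 1 compresses dark values
shown = power_law(gamma_correct(r), 2.2)   # display response applied after correction
```

Because the correction uses the reciprocal exponent, the intensities that finally appear on the (assumed) display match the original values.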
Piecewise linear transformation functions -
The principal advantage of piecewise linear functions is that these functions can be arbitrarily
complex. But their specification requires considerably more user input.
Contrast stretching -
It is the simplest piecewise linear transformation function.
We may have various low contrast images, and these might result from various causes such as
lack of illumination, a problem in the imaging sensor or a wrong setting of the lens aperture during image
acquisition.
The idea behind contrast stretching is to increase the dynamic range of gray levels in the image
being processed.
The locations of the points (r1,s1) and (r2,s2) control the shape of the curve.
a) If r1 = s1 and r2 = s2, the transformation is a linear function that produces no change in gray
levels.
b) If r1 = r2, s1 = 0 and s2 = L-1, the transformation becomes a thresholding function that
creates a binary image.
c) Intermediate values of (r1,s1) and (r2,s2) produce various degrees of spread in the gray levels
of the output image, thus affecting its contrast.
Generally r1 ≤ r2 and s1 ≤ s2, so that the function is single valued and monotonically increasing.
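A piecewise linear contrast stretch through the control points (r1,s1) and (r2,s2) can be sketched with linear interpolation. The function name is illustrative, and this version assumes 0 < r1 < r2 < L-1, so the thresholding limit r1 = r2 is deliberately not handled.

```python
import numpy as np

def contrast_stretch(image, r1, s1, r2, s2, levels=256):
    """Piecewise linear map through (0,0), (r1,s1), (r2,s2) and (L-1, L-1)."""
    top = levels - 1
    if not (0 < r1 < r2 < top):
        raise ValueError("this sketch assumes 0 < r1 < r2 < L-1")
    if not (0 <= s1 <= s2 <= top):
        raise ValueError("need 0 <= s1 <= s2 <= L-1")
    r = np.asarray(image, dtype=float)
    return np.interp(r, [0, r1, r2, top], [0, s1, s2, top])

r = np.array([0, 64, 128, 192, 255])
stretched = contrast_stretch(r, 64, 32, 192, 224)   # steeper slope in the mid range
unchanged = contrast_stretch(r, 64, 64, 192, 192)   # r1 = s1, r2 = s2: identity
```

Choosing s1 below r1 and s2 above r2 steepens the middle segment, which spreads the mid-range gray levels over more of the output range.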
Gray Level Slicing-
Highlighting a specific range of gray levels in an image is often desirable,
for example when enhancing features such as masses of water in satellite images or enhancing
flaws in X-ray images.
There are two ways of doing this:
(1) One method is to display a high value for all gray levels in the range of interest and a low value
for all other gray levels.
(2) The second method is to brighten the desired range of gray levels but preserve the
background and gray level tonalities in the image.
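Both slicing approaches reduce to a masked assignment. In the Python sketch below the function names and the output values 255 and 0 are illustrative choices.

```python
import numpy as np

def slice_binary(image, low, high, on=255, off=0):
    """Approach 1: high value inside [low, high], low value everywhere else."""
    image = np.asarray(image)
    in_range = (image >= low) & (image <= high)
    return np.where(in_range, on, off)

def slice_preserving(image, low, high, on=255):
    """Approach 2: brighten [low, high] but keep all other gray levels unchanged."""
    image = np.asarray(image)
    in_range = (image >= low) & (image <= high)
    return np.where(in_range, on, image)

img = np.array([10, 100, 150, 200])
binary = slice_binary(img, 100, 150)
preserved = slice_preserving(img, 100, 150)
```

The first result is a binary image, while the second keeps the background tonalities outside the selected range.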
Bit Plane Slicing
Sometimes it is important to highlight the contribution made to the total image appearance by
specific bits. Suppose that each pixel is represented by 8 bits.
Imagine that the image is composed of eight 1-bit planes, ranging from bit plane 0 for the least
significant bit to bit plane 7 for the most significant bit. In terms of 8-bit bytes, plane 0 contains
all the lowest order bits in the image and plane 7 contains all the highest order bits.
The high order bits contain the majority of the visually significant data; the lower order planes contribute the more subtle
details in the image.
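Extracting a bit plane is a shift followed by a mask. The sketch below assumes an 8-bit image; the function names and the sample pixel values are illustrative.

```python
import numpy as np

def bit_plane(image, n):
    """Return plane n (0 = least significant, 7 = most significant) as a 0/1 array."""
    image = np.asarray(image, dtype=np.uint8)
    return (image >> n) & 1

def from_bit_planes(planes):
    """Rebuild the 8-bit image from its planes, ordered from plane 0 to plane 7."""
    image = np.zeros(planes[0].shape, dtype=np.uint8)
    for n, plane in enumerate(planes):
        image |= plane.astype(np.uint8) << n
    return image

img = np.array([[129, 64], [255, 0]], dtype=np.uint8)   # 129 = 10000001 in binary
planes = [bit_plane(img, n) for n in range(8)]
restored = from_bit_planes(planes)
```

Recombining all eight planes restores the image exactly; leaving out the lowest planes changes each pixel only slightly.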
Separating a digital image into its bit planes is useful for analyzing the relative importance