An Introduction to Signal Processing

in Chemical Analysis

An illustrated essay with software available for free download

Last updated July, 2011

PDF format: http://terpconnect.umd.edu/~toh/spectrum/IntroToSignalProcessing.pdf

Web format: http://terpconnect.umd.edu/~toh/spectrum/TOC.html

Tom O'Haver

Professor Emeritus

Department of Chemistry and Biochemistry

University of Maryland at College Park

E-mail: toh@umd.edu

Foreword

The interfacing of analytical instrumentation to small computers for the purpose of on-

line data acquisition has now become standard practice in the modern chemistry

laboratory. Using widely-available, low-cost microcomputers and off-the-shelf add-in

components, it is now easier than ever to acquire data quickly in digital form. In what

ways is on-line digital data acquisition superior to the old methods such as the chart

recorder? Some of the advantages are obvious, such as archival storage and retrieval of

data and post-run re-plotting with adjustable scale expansion. Even more important,

however, there is the possibility of performing post-run data analysis and signal

processing. There are a large number of computer-based numerical methods that can be

used to reduce noise, improve the resolution of overlapping peaks, compensate for

instrumental artifacts, test hypotheses, optimize measurement strategies, diagnose

measurement difficulties, and decompose complex signals into their component parts.

These techniques can often make difficult measurements easier by extracting more

information from the available data. Many of these techniques are based on laborious

mathematical procedures that were not practical before the advent of computerized

instrumentation. It is important for chemistry students to appreciate the capabilities and

the limitations of these modern signal processing techniques.

In the chemistry curriculum, signal processing may be covered as part of a course on

instrumental analysis (1, 2), electronics for chemists (3), laboratory interfacing (4), or

basic chemometrics (5). The purpose of this paper is to give a general introduction to

some of the most widely used signal processing techniques and to give illustrations of

their applications in analytical chemistry. This essay covers only elementary topics and is

limited to only basic mathematics. For more advanced topics and for a more rigorous

treatment of the underlying mathematics, refer to the extensive literature on

chemometrics.

This tutorial makes use of a freeware signal-processing program called SPECTRUM that

was used to produce many of the illustrations. Additional examples were developed in

Matlab, a high-performance commercial numerical computing environment and

programming language that is widely used in research. Paragraphs in gray at the end of

each section in this essay describe the related capabilities of each of these programs.

Signal arithmetic

The most basic signal processing functions are those that involve simple signal

arithmetic: point-by-point addition, subtraction, multiplication, or division of two signals

or of one signal and a constant. Despite their mathematical simplicity, these functions can

be very useful. For example, in the left part of Figure 1 (Window 1) the top curve is the

absorption spectrum of an extract of a sample of oil shale, a kind of rock that is a

source of petroleum.

Figure 1. A simple point-by-point subtraction of two signals allows the background

(bottom curve on the left) to be subtracted from a complex sample (top curve on the left),

resulting in a clearer picture of what is really in the sample (right).

This spectrum exhibits two absorption bands, at about 515 nm and 550 nm, that are due

to a class of molecular fossils of chlorophyll called porphyrins. (Porphyrins are used as

geomarkers in oil exploration.) These bands are superimposed on a background

absorption caused by the extracting solvents and by non-porphyrin compounds extracted

from the shale. The bottom curve is the spectrum of an extract of a non-porphyrin-bearing

shale, showing only the background absorption. To obtain the spectrum of the shale

extract without the background, the background (bottom curve) is simply subtracted from

the sample spectrum (top curve). The difference is shown on the right in Window 2 (note

the change in Y-axis scale). In this case the removal of the background is not perfect,

because the background spectrum is measured on a separate shale sample. However, it

works well enough that the two bands are now seen more clearly and it is easier to

measure precisely their absorbances and wavelengths.

In this example and the one below, the assumption is being made that the two signals in

Window 1 have the same x-axis values, that is, that both spectra are digitized at the same

set of wavelengths. Strictly speaking this operation would not be valid if two spectra were

digitized over different wavelength ranges or with different intervals between adjacent

points. The x-axis values must match up point for point. In practice, this is very often the

case with data sets acquired within one experiment on one instrument, but the

experimenter must take care if the instrument settings are changed or if data from two

experiments or two instruments are combined. (Note: It is possible to use the mathematical

technique of interpolation to change the number of points or the x-axis intervals of

signals; the results are only approximate but often close enough in practice).
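
The interpolation step mentioned in the note can be sketched in a general-purpose language. The following Python example is only an illustration under stated assumptions: the wavelength and absorbance values are invented, the function names interpolate and subtract are hypothetical, and simple linear interpolation is assumed (other interpolation schemes exist). It places a coarsely digitized background on the sample's wavelength axis and then subtracts point by point.

```python
from bisect import bisect_right


def interpolate(x_new, x_old, y_old):
    """Linearly interpolate the signal (x_old, y_old) at each point of x_new.

    Assumes x_old is sorted in ascending order and that every value in
    x_new lies within the range covered by x_old.
    """
    result = []
    for x in x_new:
        # Index of the right-hand neighbour, clamped so the end points work.
        j = min(max(bisect_right(x_old, x), 1), len(x_old) - 1)
        x0, x1 = x_old[j - 1], x_old[j]
        y0, y1 = y_old[j - 1], y_old[j]
        result.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return result


def subtract(a, b):
    """Point-by-point difference of two signals with matching x-axes."""
    if len(a) != len(b):
        raise ValueError("signals must have the same number of points")
    return [ai - bi for ai, bi in zip(a, b)]


# Hypothetical sample spectrum, digitized every 10 nm.
sample_nm = [500, 510, 520, 530, 540]
sample_abs = [0.30, 0.42, 0.50, 0.38, 0.33]

# Hypothetical background, digitized every 20 nm over the same range.
background_nm = [500, 520, 540]
background_abs = [0.20, 0.24, 0.28]

# Put the background on the sample's x-axis, then subtract point by point.
background_on_sample_axis = interpolate(sample_nm, background_nm, background_abs)
corrected = subtract(sample_abs, background_on_sample_axis)
print([round(v, 2) for v in corrected])
```

The length check in subtract guards only against a mismatched number of points; it cannot detect two signals that have the same length but different wavelength values, so that remains the experimenter's responsibility.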

Sometimes one needs to know whether two signals have the same shape, for example in

comparing the spectrum of an unknown to a stored reference spectrum. Most likely the

concentrations of the unknown and reference, and therefore the amplitudes of the spectra,

will be different. Therefore a direct overlay or subtraction of the two spectra will not be

useful. One possibility is to compute the point-by-point ratio of the two signals; if they

have the same shape, the ratio will be a constant. For example, examine Figure 2.

Figure 2. Do the two spectra on the left have the same shape? They certainly do not

look the same, but that may simply be due to the fact that one is much weaker than the

other. The ratio of the two spectra, shown in the right part (Window 2), is relatively

constant from 300 to 440 nm, with a value of 10 +/- 0.2. This means that the shape of

these two signals is very nearly identical over this wavelength range.

The left part (Window 1) shows two superimposed spectra, one of which is much weaker

than the other. But do they have the same shape? The ratio of the two spectra, shown in

the right part (Window 2), is relatively constant from 300 to 440 nm, with a value of 10

+/- 0.2. This means that the shape of these two signals is the same, within about +/-2 %,

over this wavelength range, and that the top curve is about 10 times more intense than the

bottom one. Above 440 nm the ratio is not even approximately constant; this is caused by

noise, which is the topic of the next section.
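
The ratio test described above can be sketched as follows. This is a minimal Python illustration, not the procedure used to produce Figure 2: the two short spectra are invented numbers, the function name ratio is hypothetical, and the relative standard deviation of the ratio is used here as one possible measure of how constant it is.

```python
from statistics import mean, stdev


def ratio(a, b):
    """Point-by-point ratio of two signals (b must contain no zeros)."""
    return [ai / bi for ai, bi in zip(a, b)]


# Hypothetical spectra: 'weak' is roughly one tenth of 'strong',
# with small deviations standing in for measurement error.
strong = [0.50, 0.80, 1.20, 0.90, 0.40]
weak = [0.0505, 0.0792, 0.1210, 0.0895, 0.0402]

r = ratio(strong, weak)
r_mean = mean(r)
r_rsd = 100 * stdev(r) / r_mean  # percent relative standard deviation

print(f"mean ratio = {r_mean:.2f}, relative spread = {r_rsd:.1f} %")
```

A small relative spread indicates that the two signals have nearly the same shape over the range tested, and the mean ratio estimates how much more intense one is than the other.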

Simple signal arithmetic operations such as these are easily done in a spreadsheet, any

general-purpose programming language, or a dedicated signal-processing program such

as SPECTRUM, which is available for free download.

SPECTRUM includes addition and multiplication of a signal with a constant; addition,

subtraction, multiplication, and division of two signals; normalization, and a large

number of other basic math functions (log, ln, antilog, square root, reciprocal, etc).

In Matlab, math operations on signals are especially powerful because the variables in

Matlab can be either scalar (single values), vector (like a row or a column in a

spreadsheet), representing one entire signal, spectrum or chromatogram, or matrix (like a

rectangular block of cells in a spreadsheet), representing a set of signals. For example, in

Matlab you could define two vectors a=[1 2 5 2 1] and b=[4 3 2 1 0]. Then

to subtract b from a you would just type a-b, which gives the result [-3 -1 3 1 1].

To multiply a times b point by point, you would type a.*b, which gives the result

[4 6 10 2 0]. If you have an entire spectrum in the variable a, you can plot it just by

typing plot(a). And if you also have a vector w of x-axis values (such as wavelengths),

you can plot a vs w by typing plot(w,a). The subtraction of two spectra a and b, as in

Figure 1, can be performed simply by writing a-b. To plot the difference, you would

write plot(a-b). Likewise, to plot the ratio of two spectra, as in Figure 2, you would

write plot(a./b). Moreover, Matlab is a programming language that can automate

complex sequences of operations by saving them in scripts and functions.
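
For readers without access to Matlab, the same point-by-point arithmetic can be written in any general-purpose language. The sketch below uses plain Python lists with the same two example vectors; the variable names difference and product are illustrative only.

```python
# The same two example signals used in the Matlab illustration.
a = [1, 2, 5, 2, 1]
b = [4, 3, 2, 1, 0]

# zip pairs up corresponding points, so each operation is point-by-point.
difference = [ai - bi for ai, bi in zip(a, b)]
product = [ai * bi for ai, bi in zip(a, b)]

print(difference)  # [-3, -1, 3, 1, 1]
print(product)     # [4, 6, 10, 2, 0]
```

Note that the last point of b is zero, so a point-by-point ratio of these particular vectors would need special handling: Matlab returns Inf for that point, whereas Python raises a ZeroDivisionError.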

Signals and noise

Experimental measurements are never perfect, even with sophisticated modern

instruments. Two main types of measurement error are recognized: systematic error, in

which every measurement is either less than or greater than the "correct" value by a fixed

percentage or amount, and random error, in which there are unpredictable variations in

the measured signal from moment to moment or from measurement to measurement. This

latter type of error is often called noise, by analogy to acoustic noise. There are many

sources of noise in physical measurements, such as building vibrations, air currents,

electric power fluctuations, stray radiation from nearby electrical apparatus, interference

from radio and TV transmissions, random thermal motion of molecules, and even the

basic quantum nature of matter and energy itself.

In spectroscopy, three fundamental types of noise are recognized: photon noise, detector

noise, and flicker (fluctuation) noise. Photon noise (often the limiting noise in

instruments that use photomultiplier detectors) is proportional to the square root of light

intensity, and therefore the signal-to-noise ratio (SNR) is proportional to the square root of light intensity and

directly proportional to the slit width. Detector noise (often the limiting noise in

instruments that use solid-state photodiode detectors) is independent of the light intensity

and therefore the detector SNR is directly proportional to the light intensity and to the

square of the monochromator slit width. Flicker noise, caused by light source instability,

vibration, sample cell positioning errors, sample turbulence, light scattering by suspended

particles, dust, bubbles, etc., is directly proportional to the light intensity, so the flicker

SNR is not improved by increasing the slit width. Flicker noise can usually be reduced or

eliminated by using specialized instrument designs such as double-beam, dual-wavelength,

diode array, and wavelength modulation.
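
The three proportionalities above can be checked with a few lines of arithmetic. The Python sketch below is an illustration only: all proportionality constants are arbitrarily set to 1, the function names are hypothetical, and light intensity is taken to vary as the square of the slit width, which is the relationship implied by the slit-width dependences stated in the text.

```python
import math


def relative_snr(noise_type, intensity):
    """Relative SNR (arbitrary units) for a signal equal to the intensity."""
    if noise_type == "photon":
        noise = math.sqrt(intensity)  # noise grows as the square root of I
    elif noise_type == "detector":
        noise = 1.0                   # noise is independent of I
    elif noise_type == "flicker":
        noise = intensity             # noise is directly proportional to I
    else:
        raise ValueError(f"unknown noise type: {noise_type}")
    return intensity / noise


def intensity_from_slit(slit_width):
    """Assumption: intensity is proportional to slit width squared."""
    return slit_width ** 2


# Effect of doubling the slit width on the SNR for each noise type.
gains = {}
for noise_type in ("photon", "detector", "flicker"):
    before = relative_snr(noise_type, intensity_from_slit(1.0))
    after = relative_snr(noise_type, intensity_from_slit(2.0))
    gains[noise_type] = after / before
    print(f"{noise_type:8s} SNR changes by a factor of {gains[noise_type]:.0f}")
```

Under these assumptions, doubling the slit width doubles the photon-limited SNR, quadruples the detector-limited SNR, and leaves the flicker-limited SNR unchanged.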

The quality of a signal is often expressed quantitatively as the signal-to-noise ratio (SNR),

which is the ratio of the true signal amplitude (e.g. the average amplitude or the peak

height) to the standard deviation of the noise. Signal-to-noise ratio is inversely

proportional to the relative standard deviation of the signal amplitude. Measuring the

signal-to-noise ratio usually requires that the noise be measured separately, in the absence

of signal. Depending on the type of experiment, it may be possible to acquire readings of

the noise alone, for example on a segment of the baseline before or after the occurrence of

the signal. However, if the magnitude of the noise depends on the level of the signal (as

in photon noise or flicker noise in spectroscopy), then the experimenter must try to

produce a constant signal level to allow measurement of the noise on the signal. In a

few cases, where it is possible to model the shape of the signal exactly by means of a

mathematical function, the noise may be estimated by subtracting the model signal from

the experimental signal.
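
The baseline approach described above can be sketched as follows. This Python example is a rough illustration with invented data and a hypothetical function name, signal_to_noise; it assumes that the noise does not depend on the signal level, that a signal-free baseline segment is available, and that the peak height can be taken as the maximum reading above the mean baseline.

```python
from statistics import mean, stdev


def signal_to_noise(signal, baseline_range, peak_range):
    """Estimate SNR as peak height divided by baseline standard deviation.

    baseline_range and peak_range are slices selecting a signal-free
    segment and the segment containing the peak, respectively.
    """
    baseline = signal[baseline_range]
    noise = stdev(baseline)                           # noise measured without signal
    height = max(signal[peak_range]) - mean(baseline)  # peak height above baseline
    return height / noise


# Hypothetical readings: eight baseline points followed by a small peak.
readings = [0.02, -0.01, 0.01, -0.02, 0.00, 0.01, -0.01, 0.00,
            0.25, 0.70, 1.00, 0.72, 0.24]

snr = signal_to_noise(readings, slice(0, 8), slice(8, 13))
print(f"estimated SNR = {snr:.0f}")
```

Taking the maximum reading as the peak height is biased upward when the noise is large compared with the peak, so this estimate is most useful when the SNR is already reasonably good.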

Figure 3. Window 1 (left) is a single measurement of a very noisy signal. There is

actually a broad peak near the center of this signal, but it is not possible to measure its

position, width, and height accurately because the signal-to-noise ratio is very poor (less

than 1). Window 2 (right) is the average of 9 repeated measurements of this signal,

clearly showing the peak emerging from the noise. The expected improvement in signal-

to-noise ratio is 3 (the square root of 9). Often it is possible to average hundreds of

measurements, resulting in a much more substantial improvement.

One of the fundamental problems in signal measurement is distinguishing the noise from

the signal. Sometimes the two can be partly distinguished on the basis of frequency

components: for example, the signal may contain mostly low-frequency components and

the noise may be located at higher frequencies. This is the basis of filtering and

smoothing. But the thing that really distinguishes signal from noise is that random noise

is not the same from one measurement of the signal to the next, whereas the genuine

signal is at least partially reproducible. So if the signal can be measured more than once,

use can be made of this fact by measuring the signal over and over again as fast as

practical and adding up all the measurements point-by-point. This is called ensemble

averaging, and it is one of the most powerful methods for improving signals, when it can

be applied. For this to work properly, the noise must be random and the signal must occur

at the same time in each repeat. An example is shown in Figure 3.
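
Ensemble averaging can be simulated in a few lines. The Python sketch below is an illustration under stated assumptions, not the procedure used to produce Figure 3: the peak shape, the number of points, the unit noise level, and the function name ensemble_average are all invented, and the sum is divided by the number of scans so that the result stays on the original amplitude scale.

```python
import math
import random
from statistics import stdev


def ensemble_average(scans):
    """Average repeated measurements of the same signal point by point."""
    n = len(scans)
    return [sum(points) / n for points in zip(*scans)]


random.seed(0)  # fixed seed so the simulation is repeatable
n_points = 2000
n_scans = 9

# Hypothetical reproducible signal: a broad peak of unit height.
true_signal = [math.exp(-((i - 1000) / 300) ** 2) for i in range(n_points)]

# Each scan is the same signal plus fresh random noise (standard deviation 1).
scans = [[s + random.gauss(0, 1) for s in true_signal] for _ in range(n_scans)]
averaged = ensemble_average(scans)

# The noise is what remains after removing the known underlying signal.
noise_single = stdev([y - s for y, s in zip(scans[0], true_signal)])
noise_averaged = stdev([y - s for y, s in zip(averaged, true_signal)])
improvement = noise_single / noise_averaged

print(f"noise reduced by a factor of {improvement:.2f} "
      f"(expected about {math.sqrt(n_scans):.0f})")
```

Because the peak height is unchanged by averaging while the noise falls by roughly the square root of the number of scans, the SNR improves by about that same factor.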

SPECTRUM includes several functions for measuring signals and noise, plus a signal-

generator that can be used to generate artificial signals with Gaussian and Lorentzian

bands, sine waves, and normally-distributed random noise. Matlab has built-in functions

that can be used for measuring and plotting signals and noise, such as mean, max, min,

range, std, plot, and hist. You can also create user-defined functions to automate

commonly-used algorithms. Some examples that you can download and use are these user-defined

functions to calculate typical peak shapes commonly encountered in analytical chemistry,

gaussian and lorentzian, and typical types of random noise (whitenoise, pinknoise), which

can be useful in modeling and simulating analytical signals and testing measurement

techniques. (If you are viewing this document on-line, you can Ctrl-click on these links to

inspect the code.) Once you have created or downloaded those functions, you can use

them to plot a simulated noisy peak such as in Figure 3 by typing

x=[1:256];plot(x,gaussian(x,128,64)+whitenoise(x)).
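
A comparable simulated noisy peak can be generated without Matlab. The Python sketch below is not a translation of the downloadable functions. It defines its own stand-ins with the same names, under the assumptions that the third argument of gaussian is the full width at half maximum of a unit-height peak and that whitenoise returns normally distributed noise with zero mean and unit standard deviation. Plotting is omitted so that the example needs only the standard library.

```python
import math
import random


def gaussian(x, pos, wid):
    """Unit-height Gaussian centered at pos.

    Assumption: wid is the full width at half maximum (FWHM).
    """
    c = wid / (2 * math.sqrt(math.log(2)))  # converts FWHM to the 1/e half-width
    return [math.exp(-((xi - pos) / c) ** 2) for xi in x]


def whitenoise(x):
    """Normally distributed random noise, one value per point of x.

    Assumption: zero mean and unit standard deviation.
    """
    return [random.gauss(0, 1) for _ in x]


random.seed(1)  # fixed seed so the simulated noise is repeatable

x = list(range(1, 257))       # 256 points, as in x=[1:256]
peak = gaussian(x, 128, 64)   # peak at x=128 with a width of 64
noisy = [p + n for p, n in zip(peak, whitenoise(x))]

print(f"peak maximum = {max(peak):.2f}, noisy maximum = {max(noisy):.2f}")
```

With unit-height signal and unit-standard-deviation noise, the SNR of this simulated measurement is about 1, which is why the peak is hard to see in a single scan.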
