Copyright © 2009. National Academy of Sciences. All rights reserved.
Neurobehavioral Tests:
Problems, Potential, and Prospects
J. Graham Beaumont
There seems to be general agreement that any monitoring of the
effects of environmental and occupational exposure to neurotoxins
should include behavioral measures. An important element in the
effects of known toxins is the response of the nervous system, including
peripheral sensory and motor components and higher central effects
upon the function of the forebrain. This response has clear
behavioral aspects following gross acute exposure and significant chronic
exposure to a range of neurotoxins. There are considered to be more
subtle behavioral effects of less severe acute exposure or of sustained
exposure to lower levels of the relevant substances.
The assessment of behavioral effects is considered to be the primary
approach to the systematic monitoring of neurotoxic exposure,
and where mass screening is considered for large populations at risk,
it may be the only practicable approach, at least for initial selection.
It is obvious that automated screening by the use of computer-based
assessment could contribute significantly to the development of
appropriate techniques.
The essential context for the adoption of acceptable assessment
techniques is that the potential behavioral changes should have been
identified and reliable measures of these changes should be available,
which have been demonstrated to be valid, and for which appropriate
normative data are available.
It may also be desirable that the test be stable under conditions of
repeated testing. Particularly when relatively subtle changes, with a
low base-rate in the population (as may be typical of mass screening),
are to be detected, it is essential that the validity (and therefore the
reliability) be exceptionally high.
None of this is in conflict with the preceding chapters; indeed,
there is remarkable agreement as to the current state of the field, the
methodological principles that apply, and the standards that should
be adopted. The areas of potential disagreement are whether the
currently available tests are sufficient for their purposes,
and whether the introduction of new test instruments is to be encouraged.
This chapter therefore concentrates principally upon those issues.
TESTS CURRENTLY IN USE
The last three chapters have covered the history and description of
current tests, with particularly helpful tabulations by Hanninen and
Anger, and it would be redundant to repeat much of this material.
It is worth, however, drawing attention to the version of the World
Health Organization's Neurobehavioral Core Test Battery (WHO-NCTB)
in a computer-based form developed by the Institute of Occupational
Health at the University of Milan. The battery is much as in its
original form except that the Santa Ana Rotation Test of the original
battery has, for pragmatic reasons, been replaced by a test which
assesses rather different cognitive functions, and the modality of the
Digit Span task has been changed in a way that is known to alter the
cognitive functions involved (Beaumont, 1985).
A preliminary study of the psychometric characteristics of this
implementation of the NCTB has been reported (Camerino, 1987).
This indicates that there are some serious questions concerning the
validity of these tests in terms of their suitability for the assessment
purposes under consideration.
A group of 30 volunteers (young, relatively well-educated adults)
were retested at weekly intervals on certain of the tests [excluding
Benton Visual Retention Test (VRT) and Aiming Pursuit], and estimates
of the reliability and validity of the measures were made from the
results. It is a little unclear what reliability should be expected from
an instrument that assesses mood "over the past week," given at
weekly intervals: the range of values of r from -0.24 to +0.87 on the
various individual scales is probably not remarkable. The reliabilities
on the cognitive tasks are more acceptable, being in the range 0.62 to
0.89, if reaction time (RT) variability and Digit Learning are excluded.
The reliabilities of the Digit Learning test at 0.40 and 0.19 (for
occasions 1-2, 2-3, respectively) are clearly quite inadequate and suggest
that the test should be abandoned as part of this assessment.
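The reliabilities just quoted can be read against a standard result of classical test theory, which is an assumption brought in here rather than something the chapter states: a test's correlation with any criterion cannot exceed the square root of its own reliability. The sketch below applies that bound to the figures reported above; the function name `validity_ceiling` is illustrative only.

```python
from math import sqrt

def validity_ceiling(reliability: float) -> float:
    """Upper bound on validity under classical test theory.

    Assumes a perfectly reliable criterion. Zero or negative
    reliabilities are treated as giving no usable ceiling (0.0).
    """
    return sqrt(reliability) if reliability > 0 else 0.0

# Test-retest reliabilities quoted in the text for the Milan implementation.
reported = {
    "cognitive tasks, lowest": 0.62,
    "cognitive tasks, highest": 0.89,
    "Digit Learning, occasions 1-2": 0.40,
    "Digit Learning, occasions 2-3": 0.19,
}

for label, r in reported.items():
    print(f"{label}: reliability {r:.2f}, validity ceiling {validity_ceiling(r):.2f}")
```

On this reading, the Digit Learning test could not exceed a validity of about 0.63 on the first retest interval, and the ceiling is lower still on the second.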
Correlations were also calculated with paper-and-pencil versions,
as a crude measure of construct validity. Correlations were modest,
ranging from 0.55 (Serial Digit) to 0.79 (Benton VRT). These values
are not atypical of values that might be expected on psychometric
tests of this type.
However, these results do raise certain doubts about the psychometric
suitability of these tests to the purposes for which they are being
employed. For purposes of debate, assume that the validity of the
measures is on average about 0.75. This is probably rather generous:
reliability limits the upper extent of validity, and reliabilities are in
some cases below this level. In addition, the sample employed was
likely to provide relatively high levels of reliability and validity. At
this level of validity, if we are trying to identify pathological effects
which are present in 50 percent of those tested, the best that the test
can theoretically achieve is 77 percent correct classification of the test
subjects. In practice, a much more unfavorable base-rate of the condition
is likely to apply in the test population. If the incidence to be detected
falls to 1 in 10, the theoretical maximum achievement of the test will
be 90 percent overall correct classification, but of those affected only
50 percent will be correctly identified. Of those achieving "positive"
results on the test, half will be misclassified because they are false
positives. As the base-rate or the validity falls, these statistics become
even more unacceptable. It should be clear that in psychometric
terms, these tests as implemented in the Milan study are insufficiently
powerful to allow any valid assessment of the neurobehavioral functions
under study.
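The chapter does not say how these classification figures were derived. One model that reproduces them, offered here as an assumption, treats the test score and the true condition as bivariate normal with correlation equal to the validity, and flags the same fraction of people as is actually affected. The sketch below works through that model with numerical integration; `joint_upper_tail` and `screening_outcomes` are illustrative names, not part of any battery.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def joint_upper_tail(rho: float, cut: float, steps: int = 4000) -> float:
    """P(test > cut and criterion > cut) for a standard bivariate normal.

    Integrates pdf(x) * P(criterion > cut | test = x) over x in
    [cut, cut + 8] with the midpoint rule. Requires -1 < rho < 1.
    """
    spread = sqrt(1.0 - rho * rho)
    width = 8.0 / steps
    total = 0.0
    for i in range(steps):
        x = cut + (i + 0.5) * width
        total += STD_NORMAL.pdf(x) * STD_NORMAL.cdf((rho * x - cut) / spread)
    return total * width

def screening_outcomes(validity: float, base_rate: float) -> dict:
    """Classification rates when the flagged fraction equals the base rate."""
    cut = STD_NORMAL.inv_cdf(1.0 - base_rate)
    true_pos = joint_upper_tail(validity, cut)
    false_pos = base_rate - true_pos  # flagged but unaffected
    false_neg = base_rate - true_pos  # affected but missed
    return {
        "accuracy": 1.0 - false_pos - false_neg,
        "sensitivity": true_pos / base_rate,
        "false_positive_share": false_pos / base_rate,
    }

for rate in (0.5, 0.1):
    outcome = screening_outcomes(validity=0.75, base_rate=rate)
    print(rate, {name: round(value, 2) for name, value in outcome.items()})
```

Under these assumptions the sketch gives roughly 77 percent correct classification at a 50 percent base rate. At a base rate of 1 in 10 it gives roughly 90 percent overall, with about half of the affected individuals detected and about half of the positives false, in line with the figures quoted above.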
This must be a serious concern because (quite reasonably) the NCTB
has been adopted in a number of centers around the world. Swedish
studies conducted at the National Board of Safety and Health (Iregren,
1986) have used these tasks among a battery of others administered
in both traditional and computer-based formats, as well as some other
automated modes. The computer-based tests include Memory
Reproduction (letter and digit sequences, rather like Digit Span), Simple
and Choice Reaction Time, and Color Word Vigilance, and others are
under development. Studies conducted with the full range of tests
have demonstrated some significant and interesting differences between criterion
groups selected for contrast on relevant variables, mostly relating to
solvent exposure. Some reliability data are reported by Iregren in
this volume. The methodological rigor of this approach is to be applauded,
and the data show some reliabilities for certain of the assessments
significantly higher than the Milan data for their battery. Nevertheless,
with assessments of higher cognitive functions and of affect, the
psychometric adequacy of the instruments remains a problem.
A battery that shares some provenance with the WHO-NCTB, although
it strictly just predates it, is Baker and Letz's Neurobehavioral
Evaluation System (NES). The NES has been adopted by a major
study being carried out by the Institute of Occupational Health in
Birmingham, United Kingdom (Spurgeon and Harrington, 1987). This
study will use the Clinical Interview Schedule together with the Hogstedt
Symptom Questionnaire, Stress and Arousal Checklist, Cognitive Failures
Questionnaire, and Prospective Memory Test, in addition to the NES
tests. At present only preliminary pilot data are available.
Of course there have been a large number of other studies published
in the literature which have employed a wide variety of tests. A
survey of the literature of the effects of lead on intelligence reveals
the WISC-R to be the most popular test in a traditional format to
have been employed in this research (Yule and Rutter, 1985). A great
variety of more specific tests of individual functions have also been
employed (Anger, 1985).
Further contributions concerning the use of computer-based assessment
in this domain are to be found in Branconnier (1985). A
useful collection of papers concerned more generally with the issues
raised by computer-based assessment appeared in Applied Psychology
(e.g., see Huba, 1987).
The preceding chapters seem to be in agreement that (1) there are
problems evident in the construction of various batteries, (2) most of
the tests currently in use are relatively inadequate, and (3) there is
poverty in the current psychological descriptions of neurotoxic syndromes.
SOME SPECIFIC POINTS
Some specific points made in the preceding chapters are highlighted
here.
Methods in Behavioral Toxicology (Hanninen)
The problems concerning the definition and description of the
neurotoxic deficit are well taken: this is clearly of crucial significance
for any advance in the field and emphasizes the need for more
fundamental research into the cognitive processes affected.
The suggestion that the in-depth study of individual patients might
be profitable is also a valuable one. There are now many good
single-case experimental designs that might be appropriately deployed in
this area, and they should be considered in order to further clarify
the description of the relative deficits.
The dilemma that Hanninen discusses between the "conservative"
and "progressive" approaches is a real one and, to some extent, is
fundamental to much of the discussion that follows. It is of central
importance to decide whether to make the best of the rather poor
tests that are currently in use, or whether to adopt a more radical
reevaluation of current tests and the potential new instruments that
might be created.
Current Status of Test Development (Williamson)
Williamson sensibly highlights the potential for tests that relate
explicitly to psychological theory (although the distinction between
those that relate to "cognitive structure" and those that are
"theory-based") may not be so easy to sustain). If it becomes possible to
elaborate our understanding of the psychological processes (and, perhaps,
as a contribution to that understanding), there is obvious merit in the
use of such tests.
The "potential barrier" of computer-based testing must be taken
seriously. There is clearly no value in developing computer-based
tests if they confer few advantages, introduce extraneous sources of
error, and hinder the wide application of tests. There may be benefits
from the application of computers that outweigh these disadvantages,
at least in parts of the world where they can practicably be used, but
it is important to be clear about the advantages in any given case.
The need for more basic research is again emphasized, and the
proposal that the adaptive nature of some of the changes which take
place be considered may be a particularly useful insight.
Human Neurobehavioral Tests (Anger)
Anger's useful and authoritative view is clear and correct about
the potential contribution that the test batteries may make in this
field. It is necessary, however, to ensure that this potential is realized
in practice. It is certainly possible that the relevant changes could be
detected. It is much less certain that current batteries are capable of
detecting the changes (and there is some reason to believe that they are not).
The case also has to be argued more clearly for the value of
cross-cultural data collection. It is naturally important, indeed essential,
that appropriate local norms be available. However, given that there
are inevitably differences among cultures in education, cognitive processes,
cultural experience, exposure to testing and test materials, and even
(some believe) in intelligence, test performance will differ in different
cultures and subcultural contexts. In this situation, differences underlying
test performance and exposure to toxins will undoubtedly be confounded.
The results will be difficult, perhaps impossible, to interpret, and
little will have been gained by international comparisons. The idea
of a worldwide pool of test results may be superficially attractive, yet
it is not based in the psychometric realities of the situation.
THE ADEQUACY OF CURRENT TESTS
The problems inherent in current assessment batteries appear to
be twofold. First, the tests employed have been selected on the basis
of their previous use in experimental studies of the effects of exposure
to neurotoxins. It is natural that, when a test has been shown to
distinguish between a criterion group of exposed individuals and a
control group, this test should be considered suitable for inclusion in
an assessment battery. This is, however, not necessarily the case. Only
if the test can be shown to have sufficient psychometric power for the
role of general screening can it be considered useful in this way. It is
important throughout to maintain a careful distinction between tests
that are useful for group experiments and those that may be used for
individual screening.
Second, there is a temptation to select tests that are generally considered
to be capable of indicating central nervous system (CNS)
dysfunction. Here the temptation has been to take tests that are
believed capable of revealing the effects of dementia, cerebral disease,
or gross trauma, and to adopt them for detection of the effects of
neurotoxins. This procedure is open to two misconceptions: that the
effects of neurotoxins will be the same (in cognitive terms) as the
effects of dementia, cerebral disease, or trauma, and that there are
tests capable of simply discriminating among these other disorders.
There seems to be little basis for accepting either of these proposals.
It is unlikely that CNS poisoning is similar in its effects to other
cerebral pathology, any more than there is similarity between, say, dementia
and trauma. The history of neuropsychology is littered with failed
attempts to identify, by means of a single measure or small group of
measures, general cerebral pathology. In particular, if the effects are
relatively diffuse, the problem is especially difficult. An example is
the difficulty of distinguishing, by cognitive measures alone, dementia
of the Alzheimer type in the elderly (at least in its early stages)
from either functional psychiatric illness or acute systemic illnesses.
Much the same problem must apply to the effects of neurotoxins.
It is therefore not surprising that the battery of tests now generally
employed is not of strong validity and is probably inadequate for the
general detection of the behavioral effects of neurotoxins. There is
simply insufficient power in the basic psychological instruments
being employed.
The critical problem is the psychometric power of the tests, and
the critical question is, Is the WHO-NCTB (including related batteries
such as the NES) adequate to the task? It is important at this point to
be clear as to what the task is: either to conduct group experiments
or to undertake individual screening.
If the task is to investigate the differences between criterion groups,
then the NCTB may be adequate to the task. Its psychometric power
is still weak, and there might well be better tools available. It is
probably, as a psychometric instrument, best described as "premature."
Nevertheless, the fact that it is available, and already quite widely
adopted, is of some importance, and it is clearly capable of
discriminating between carefully selected groups under favorable conditions.
Its use is certainly justified in this context, although efforts should be
made to dramatically increase the size of the standardization samples
available and to improve the basic reliability of the tests. In the
context of such studies using the NCTB, it might be that computers
are an impediment and that administration in the standard form is to
be preferred.
However, if the aim is to carry out screening for exposed and
affected individuals, the NCTB is likely to be quite inadequate on
psychometric grounds. As discussed above, the available data suggest
that the battery is not reliable enough to permit sufficiently accurate
classification of affected and nonaffected individuals.
This implies that if screening is a goal of the research (or if significant
improvements are to be made in the sensitivity of the tests for detecting
differences between criterion groups), then the whole basis of the
assessments currently employed needs to be reexamined. Better
fundamental research is needed to generate a psychological description
of the deficits and better models of the effects which can be related to
that description. In achieving this it may well be advantageous to
make better use of new developments in psychometrics and in the
explicit models of cognitive performance. It is at this point that computers
might well be introduced. One way in which this might be done is
described below.
SOME NEW DEVELOPMENTS IN
COMPUTER-BASED ASSESSMENT
It seems worth inquiring whether there are alternative approaches
that could potentially provide more satisfactory solutions to the assessment
of cognitive performance. There seem to be at least two potentially
fruitful avenues of exploration. One is rather better charted: the use
of adaptive testing systems, although it is not considered further here.
The other is through the explicit incorporation of cognitive models
into intelligent assessment systems. Such systems would not radically
overthrow the traditional psychometric approaches, but would
complement and extend such approaches so that the advantages of
both could contribute to the power inherent in the assessment procedure.
If an intelligent and powerful assessment system is to be developed,
it must incorporate appropriate psychometric models of the
reference domain as well as a psychological (cognitive) model of that
domain. The solution may well come from a progressive integration
of psychometric theory, together with selection of those methods with
greatest utility on the basis of empirical study. There is, after all, no
reason why more than one psychometric model should not be operated
concurrently, and the respective processes cross-referenced, as long
as the assumptions of each are properly respected.
Cognitive Componential Models
Functional models, increasingly explicit in the cognitive domain,
might allow an assessment system to possess an internal representation
of the function that is under examination. One of the fruits of
the growth of cognitive information-processing approaches into the
dominant zeitgeist of contemporary psychology has been the production
of explicit functional models. Some of these models are now presented
in a sufficiently well-articulated form to make them useful in the
description of functional status. Such descriptions can, in turn, be
used in the identification of dysfunctional elements in performance
and in the design and monitoring of instructional and remedial schemes.
Perhaps the most well known of these models relate to reading
ability. Here the interaction between the developmental study of normal
reading ability and neuropsychological investigation of the dysfunctions
to be observed in brain-injured patients has stimulated the production
of general models of reading competency. Over the past few
years the analyses of developmental dyslexia and of adult acquired
dyslexia have converged into a common view of the processes that
may be defective in reading failure.
The point about this and similar models is that each component is
capable of identification by manipulations in an explicit experimental
paradigm. The evidence is derived from studies on normal subjects
by which the processing components can be inferred and from study
of clinical patients in whom the failure of one component of the
system can be identified.
A number of models in a variety of domains (spelling, arithmetic
functions, algebra, reasoning, number-series identification, map
interpretation) illustrate how human abilities can be analyzed in terms
of componential subprocesses. The relationships among the subprocesses
are described in the model. The components in each model, both
functional elements and channels of information transfer, can be assessed
by experimental paradigms that are amenable to automated implementation.
A system which incorporated an explicit model about
the function under investigation should be capable of intelligently
describing the nature and level of that function in the psychological
domain within which the model has been created.
Inferential Systems
It remains to be shown how explicit cognitive functional models,
in association with adaptive testing systems technology, might be
incorporated into a practical and intelligent assessment system. The
way in which this might be achieved is through the use of an intelligent
knowledge-based systems approach.
The differences between traditional psychometrics and "expert systems"
are not as fundamental as might be supposed. Although expert systems
as commonly expressed within a rule-based programming environment
appear very different from a psychometric test instrument,
they have several fundamental constructs in common. The parallels
become more clear if the elements of each procedure are considered.
The objectives of the expert system are the test items of the conventional
test; the values, the responses; the questions and user interface are
equivalent to the administration procedures; the rules are represented
in the scoring norms; the inference engine is matched by the psychometric
model being employed. The goal that the expert system is set is, of
course, the test result of the conventional test instrument.
It is possible to establish the validity of these parallels. The author
has a demonstration system, created under a popular expert system
"shell," that administers the Mill Hill Vocabulary Test in a form
indistinguishable from a number of computer-based implementations
of that test which have been realized by procedural programming
systems that simply simulate the conventional administration of the
test. It may well not be the most efficient way to achieve this result,
and the use of the expert system shell may be to some degree artificial,
but it nonetheless provides evidence for the parallels that are
being proposed between these kinds of systems.
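The parallels listed above can be made concrete with a toy example. The sketch below is not the author's demonstration system and does not reproduce the Mill Hill Vocabulary Test: the items, answer key, and score bands are invented placeholders, and the tiny "inference engine" simply counts correct responses and fires the first matching rule.

```python
from typing import Callable, Dict, List, Tuple

# "Objectives" of the expert system = test items (hypothetical items and key).
ITEMS: Dict[str, str] = {
    "item_1": "b",
    "item_2": "d",
    "item_3": "a",
    "item_4": "c",
}

# "Rules" = scoring norms, as (condition on raw score, conclusion) pairs.
# The bands are invented for illustration only.
RULES: List[Tuple[Callable[[int], bool], str]] = [
    (lambda score: score >= 4, "above the hypothetical norm"),
    (lambda score: score >= 2, "within the hypothetical norm"),
    (lambda score: True, "below the hypothetical norm"),
]

def administer(ask: Callable[[str], str]) -> Dict[str, str]:
    """User interface / administration procedure: one value per objective."""
    return {item: ask(item) for item in ITEMS}

def infer(values: Dict[str, str]) -> str:
    """Inference engine / psychometric model: score, then fire the first matching rule."""
    score = sum(values.get(item) == key for item, key in ITEMS.items())
    for condition, conclusion in RULES:
        if condition(score):
            return conclusion
    raise RuntimeError("rule base has no default rule")

# Goal of the expert system = the test result.
scripted = {"item_1": "b", "item_2": "a", "item_3": "a", "item_4": "c"}
result = infer(administer(scripted.__getitem__))
print(result)
```

Swapping the scripted responses for an interactive prompt changes only the `ask` callable, which is the sense in which the user interface stands in for the administration procedure.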
Given these parallels, it is a short step to suggest that a cognitive
componential model might be explicitly incorporated within a
knowledge-based system to permit intelligent assessment of the cognitive
function modeled. This would simply require that the model be sufficiently
well articulated to be expressed in terms of the contents of a rule
base. A variety of procedures will, of course, also be defined which
permit data to be established pertinent to the rule-based inferences
that are to be made. These procedures may be prior values held
within the system, they may be the responses to questions put to the
test subject or to the test examiner, or they may be the results of
ancillary procedures (including independent subprocedures defined
within a procedural programming environment). The procedures may
operate at the level of individual test "items" or may refer to a higher
level of "subtest" investigation.
These subprocedural levels may reflect the structures that have
already been developed within the adaptive testing context. The
statistical procedures that have been derived for use within adaptive
testing systems may also operate at this level of the organization of
the system. Traditional psychometric (statistical) techniques may be
applied at this level, within the lower level subprocedures, or at the
level of the implementation of the rule base. The statistical procedures
may operate within the definition of the rules derived from the cognitive
model, or else be applied in parallel with the cognitive model, so that
estimates derived from each inferential process may be compared
and combined in generating the overall test outcome (see Huba, 1987).
This is, after all, no more than a formalization of what an expert
human examiner does in performing an assessment. Elements of the
assessment procedure are composed into the battery of tests to be
applied, according to some model (often implicit) that the examiner
maintains of the functions to be assessed. The individual tests are
then administered, often with some degree of selection and modification
of the battery, depending upon earlier test results. Statistical estimates
derived from the test are obtained and interpreted in line with hypotheses
generated from the functional model that the test examiner holds. A
psychological description (the "report") is generated which is relevant
to the assessment question being investigated.
The potential advantages of the kind of scheme envisaged above
are that the internal cognitive model is explicit and can be more
rigorously applied (and improved); the investigation of data relevant
to the inferences being tested is systematic and should therefore be
more efficient; and intelligence, in the form of the inferencing procedures,
is automatically and consistently applied to the problem. In addition,
the behavioral description generated from the system is inevitably
formulated in terms of the cognitive model being maintained: it
is a psychological and not a statistical description. It must therefore
be relevant to the application for which the test is being employed
and be more useful in response to questions about diagnosis,
management, treatment, selection, or adjustment.
IMPLICATIONS FOR NEUROBEHAVIORAL
ASSESSMENT
The adoption of techniques such as these implies that a number of
conditions should be met before the development of assessment
procedures in this area can advance.
The first is a better understanding of the psychological functions
affected by neurotoxins. A vague formulation in terms of effects
upon psychomotor performance, slowing of response, impaired
eye-hand coordination, diminished concentration, recent memory, and
affective state, is insufficient. A better model is needed of the physiological
vectors that are generating these effects, as well as a better-elaborated
description of the effects in behavioral terms.
Second, these functions should be summarized in the form of a
psychological description of the general dysfunctional state that follows
from exposure to neurotoxins. This needs to be sufficiently detailed
to allow a clear account of the psychological processes implicated in
these functions to be deduced.
Third, these processes should be formulated in terms of a cognitive
componential model of the relevant functions, in a sufficiently coherent
form to allow decomposition of the observed performance and analysis
of the functional status of the subject.
Fourth, this should be translated into an assessment system in terms
of the individual component elements of performance. These should
be assessed separately by testing routines (either criterion- or
norm-referenced) that will allow an intelligent computer-based diagnostic
analysis of performance.
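As a rough illustration of this fourth condition, each component of a model might be scored by its own routine, norm-referenced or criterion-referenced, and the results collected into a component-level profile. Everything named in the sketch below (component labels, norms, cut-offs, the -2.0 z-score threshold) is a hypothetical placeholder, not a value from the chapter or from any published battery.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Component:
    name: str
    norm_mean: Optional[float] = None  # set both norm fields for norm-referenced scoring
    norm_sd: Optional[float] = None
    criterion: Optional[float] = None  # set for criterion-referenced scoring

    def impaired(self, score: float, z_cutoff: float = -2.0) -> bool:
        """Flag the component using whichever reference it was given."""
        if self.criterion is not None:
            return score < self.criterion
        if self.norm_mean is None or not self.norm_sd:
            raise ValueError(f"{self.name}: no norms or criterion defined")
        return (score - self.norm_mean) / self.norm_sd < z_cutoff

# Hypothetical componential model: two norm-referenced parts, one criterion-referenced.
MODEL = [
    Component("visual encoding", norm_mean=50.0, norm_sd=10.0),
    Component("short-term storage", norm_mean=7.0, norm_sd=1.5),
    Component("response selection", criterion=0.90),
]

def profile(scores: Dict[str, float]) -> Dict[str, bool]:
    """Return, per component, whether the observed score is flagged."""
    return {c.name: c.impaired(scores[c.name]) for c in MODEL}

observed = {"visual encoding": 47.0, "short-term storage": 3.5, "response selection": 0.95}
report = profile(observed)
print(report)
```

The output is a description in terms of the model's components rather than a single summary score, which is the kind of result the scheme outlined above is meant to produce.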
This analysis will probably be dismissed by those whose immediate
concern is for an instrument that can be used now to address the
very real problems of assessing current levels of neurotoxic exposure.
However, if a valid assessment is needed, one is forced to conclude
that no adequate instrument is currently available. Radical improvements
must be made in our understanding of the target behavioral effects,
which must be based on more extensive fundamental research. These
can then be translated into effective assessment instruments. Such an
approach has already been shown to yield dividends in other
neuropsychological areas [particularly in assessing reading disorders
(Seymour, 1987) and errors in arithmetic processing], and could well
be profitably applied to the neurobehavioral testing of exposure to
toxins.
A final and unrelated thought: No one seems to have taken seriously
the possibility of the individual baseline testing of workers
potentially open to exposure. Such an approach could completely
transform what could be achieved by psychological assessment. Many
of our psychometric difficulties would be eliminated at a stroke if
data on each worker before exposure were available. Compare the
advances made in neuropsychology during World War II, largely
because of the availability of psychological data collected upon induction.
If a battery were administered on recruitment (and perhaps every
5 or 10 years subsequently), it would be possible to establish the
effects upon cognitive functioning in an individual worker with relative
ease and to dramatically improve our understanding of the relevant
processes in general. Even if legislation to introduce this is
unattainable, the introduction of such a system by a number of major employers
could at least make a worthwhile contribution. The suggestion is no
doubt naive, but of such immense potential value that it deserves to
be discussed.
REFERENCES
Anger, W. K. 1985. Neurobehavioral tests used in NIOSH-supported worksite studies,
1973-1983. Neurobehavioral Toxicology and Teratology 7:359-368.
Beaumont, J. G. 1985. The effect of microcomputer presentation and response medium
on digit span performance. International Journal of Man-Machine Studies 22:11-18.
Branconnier, R. J. 1985. Dementia in human populations exposed to neurotoxic agents:
A portable microcomputerized screening device. Neurobehavioral Toxicology and
Teratology 7:379-386.
Camerino, D. 1987. [...] of WHO-NCTB. Institute of Occupational Health,
University of Milan.
Huba, G. J. 1987. On probabilistic computer-based test interpretations and other
expert systems. Applied Psychology 36:35~3~.
Iregren, A. 1986. Effects of Industrial Solvent Interactions: Studies of Behavioral
Effects in Man. Arbete och Hälsa (ISSN 0346-7821), Solna, Sweden.
Seymour, P. H. K. 1987. Individual cognitive analysis of competent and impaired
reading. British Journal of Psychology 78:483-506.
Spurgeon, A., and J. M. Harrington. 1987. The Neuropsychological Effects of
Long-Term Exposure to Organic Solvents. Institute of Occupational Health,
University of Birmingham, U.K.
Yule, W., and M. Rutter. 1985. Effect of lead on children's behavior and cognitive
performance: A critical review. In Dietary and Environmental Lead: Human Health
Effects, K. R. Mahaffey, ed. Amsterdam: Elsevier.