Applied Regression Analysis and Other Multivariable Methods, 5th Edition
Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
16.7 Step 5: Evaluating Reliability with Split Samples: Prediction Goal
16.8 Example Analysis of Actual Data
16.9 Selecting the Most Valid Model
Problems
References
Index
Preface
This is the fourth revision of our second-level statistics text, originally published in 1978 and
revised in 1987, 1998, and 2008. As with previous versions, this text is intended primarily
for advanced undergraduates, graduate students, and working professionals in the health,
social, biological, and behavioral sciences who engage in applied research in their fields. The
text may also provide professional statisticians with some new insights into the application of
advanced statistical techniques to realistic research problems.
We have attempted in this revision to retain the basic structure and flavor of the earlier
editions, while at the same time making changes to keep pace with current analytic practices
and computer usage in applied research. Notable changes in this fifth edition, discussed in
more detail later, include
i. Clarification of content and/or terminology as suggested by reviewers and read-
ers, including revision of variable and subscript notation used for predictor vari-
ables and regression coefficients to provide consistency over different chapters.
ii. Expanded and updated coverage of some content areas (e.g., confounding and
interaction in regression in Chapter 11, selecting the best regression equation in
Chapter 16, sample size determination in Chapter 27).
iii. A new linear regression example that is carried through and expanded upon in
Chapters 5, 6, 8, 9, 11, 12, 13, and 16.
iv. Some new exercises at the end of selected chapters, including exercises related to
the new example described in item (iii) above.
v. Updated SAS computer output using SAS 9.3 that reflects improvements in out-
put styling.
vi. Two computer appendices on programming procedures for multiple linear regres-
sion models, logistic regression models, Poisson regression models, and mixed
linear models:
a. In-text: SAS
b. Online: SPSS, STATA, and R
In this fifth edition, as in our previous versions, we emphasize the intuitive logic and
assumptions that underlie the techniques covered, the purposes for which these techniques
are designed, the advantages and disadvantages of these techniques, and valid interpretations
based on these techniques. Although we describe the statistical calculations required for the
techniques we cover, we rely on computer output to provide the results of such calculations
so the reader can concentrate on how to apply a given technique rather than how to carry
out the calculations. The mathematical formulas that we do present require no more than
simple algebraic manipulations. Proofs are of secondary importance and are generally omit-
ted. Calculus is not explicitly used anywhere in the main text. We introduce matrix notation
to a limited extent in Chapters 25 and 26 because we believe that the use of matrices provides
a more convenient way to understand some of the complicated mathematical aspects of the
analysis of correlated data. We also have continued to include an appendix on matrices for
the interested reader.
This edition, as with the previous editions, is not intended to be a general reference
work dealing with all the statistical techniques available for analyzing data involving several
variables. Instead, we focus on the techniques we consider most essential for use in applied
research. We want the reader to understand the concepts and assumptions involved in these
techniques and how these techniques can be applied in practice, including how computer
packages can help make it easier to perform the analysis of one’s data.
The most notable features of this fifth edition, including the material that has not been
modified from the previous edition, are the following:
hazards and extended Cox models for survival data.) The computer appendix
will provide a quick and easy reference guide to help the reader avoid having to
spend a lot of time finding information from sometimes confusing help guides in
packages like SAS.
Acknowledgments
We wish to acknowledge several people who contributed to the development of this text,
including early editions as well as this fifth edition. Drs. Kleinbaum and Kupper continue to
be indebted to John Cassel and Bernard Greenberg, two mentors who have provided us with
inspiration and the professional and administrative guidance that enabled us at the begin-
ning of our careers to gain the broad experience necessary to write this text.
Dr. Kleinbaum also wishes to thank John Boring, former Chair of the Department of
Epidemiology at Emory University, for his strong support and encouragement during the
writing of the third and fourth editions and for his deep commitment to teaching excellence.
Dr. Kleinbaum also wishes to thank Dr. Mitch Klein of Emory’s Department of Epidemiology
for his colleagueship, including thoughtful suggestions on and review of previous editions.
Dr. Kleinbaum also thanks Dr. Viola Vaccarino, Chair of the Department of Epidemiology
at Emory University, for continued support and encouragement of his academic life at the
Rollins School of Public Health at Emory University.
Dr. Kupper will forever be indebted to Dr. William Mendenhall, founder and longtime
Chair of the University of Florida Department of Statistics. Dr. Mendenhall gave Dr. Kupper
his start in the field of statistics, and he served as a perfect example of an inspiring teacher
and a caring mentor.
Mr. Nizam wishes to thank Dr. Lance Waller, Chair of the Department of Biostatistics
and Bioinformatics at Emory University, for his strong support and Dr. John Spurrier of the
Department of Statistics at the University of South Carolina for being a wonderful teacher,
advisor, and mentor.
We thank Julia Labadie for her assistance in preparing SAS computer output for this
edition. We also thank Dr. Keith Muller for his contributions to earlier editions as one of
our coauthors.
We thank our spouses—Edna Kleinbaum, Sandy Martin, Janet Nizam, and Abby
Horowitz—for their encouragement and support during the writing of various revisions.
We thank our reviewers of the fifth edition for their helpful suggestions:
Joseph Glaz, University of Connecticut
Lynn Kuo, University of Connecticut
Robert Paige, Missouri University of Science and Technology
Debaraj Sen, Concordia University
Po Yang, DePaul University
We thank the Cengage Learning Statistics and Mathematics team, especially Molly
Taylor, Senior Product Manager, and Laura Wheel, Senior Content Developer, for guiding
us through the publication process for the fifth edition, as well as Jessica Rasile, Content
Project Manager, and Tania Andrabi, Production Manager.
David G. Kleinbaum
Lawrence L. Kupper
Azhar Nizam
Eli S. Rosenberg
1 Concepts and Examples of Research
1.1 Concepts
The purpose of most empirical research is to assess relationships among a set of vari-
ables, which are factors that are distinctly measured on observational units (or subjects).
Multivariable¹ techniques are concerned with the statistical analysis of such relationships,
particularly when at least three variables are involved. Regression analysis, our primary focus,
is one type of multivariable technique. Other techniques will also be described in this text.
Choosing an appropriate technique depends on the purpose of the research and on the types
of variables under investigation (a subject discussed in Chapter 2).
Research may be classified broadly into three types: experimental, quasi-experimental, or
observational. Multivariable techniques are applicable to all such types, yet the confidence
one may reasonably have in the results of a study can vary with the research type. In most
types, one variable is usually taken to be a response or dependent variable—that is, a variable
to be predicted from other variables. The other variables are called predictor or independent
variables.
If observational units (subjects) are randomly assigned to levels of important predictors,
the study is usually classified as an experiment. Experiments are the most controlled type of
study; they maximize the investigator’s ability to isolate the observed effect of the predictors
from the distorting effects of other (independent) variables that might also be related to the
response.
¹ The term multivariable is preferable to multivariate. Statisticians generally use the term multivariate analysis to
describe a method in which several dependent variables can be considered simultaneously. Researchers in the bio-
medical and health sciences who are not statisticians, however, use this term to describe any statistical technique
involving several variables, even if only one dependent variable is considered at a time. In this text, we prefer to avoid
the confusion by using the term multivariable analysis to denote the latter, more general description.
1.2 Examples
The examples that follow concern real problems from a variety of disciplines and involve
variables to which the methods described in this book can be applied. We shall return to
these examples later when illustrating various methods of multivariable analysis.
■ Example 1.2 Study of race and social influence in cooperative problem-solving dyads,
illustrating the use of analysis of variance and analysis of covariance.
James (1973) conducted an experiment on 140 seventh- and eighth-grade males to
investigate the effects of two factors—race of the experimenter (E) and race of the compari-
son norm (N)—on social influence behaviors in three types of dyads: white–white; black–
black; and white–black. Subjects played a game of strategy called Kill the Bull, in which
14 separate decisions must be made for proceeding toward a defined goal on a game board.
In the game, each pair of players (dyad) must reach a consensus on a direction at each deci-
sion step, after which they signal the E, who then rolls a die to determine how far they can
advance along their chosen path of six squares. Photographs of the current champion players
(N) (either two black youths [black norm] or two white youths [white norm]) were placed
above the game board.
Four measures of social influence activity were used as the outcome variables of inter-
est. One of these, called performance output, was a measure of the number of times a given
subject attempted to influence his dyad to move in a particular direction.
The major research question focused on the outcomes for biracial dyads. Previous
research of this type had used only white investigators and implicit white comparison
norms, and the results indicated that the white partner tended to dominate the decision
making. James’s study sought to determine whether such an “interaction disability,” previ-
ously attributed to blacks, would be maintained, removed, or reversed when the comparison
norm, the experimenter, or both were black. One approach to analyzing this problem was to
■ Example 1.3 Study of the relationship of cultural change to health, illustrating the use
of analysis of variance.
Patrick and others (1974) studied the effects of cultural change on health in the U.S.
Trust Territory island of Ponape. Medical and sociological data were obtained on a sample
of about 2,000 people by means of physical exams and a sociological questionnaire. This
Micronesian island has experienced rapid Westernization and modernization since American
occupation in 1945. The question of primary interest was whether rapid social and cultural
change caused increases in blood pressure and in the incidence of coronary heart disease. A
specific hypothesis guiding the research was that persons with high levels of cultural ambigu-
ity and incongruity and low levels of supportive affiliations with others have high levels of
blood pressure and are at high risk for coronary heart disease.
A preliminary step in the evaluation of this hypothesis involved measuring three vari-
ables: attitude toward modern life; preparation for modern life; and involvement in modern
life. Each of these variables was created by isolating specific questions from a sociological
questionnaire. Then a factor analysis² determined how best to combine the scores on specific
questions into a single overall score that defined the variable under consideration. Two
cultural incongruity variables were then defined. One involved the discrepancy between
attitude toward modern life and involvement in modern life; the other was defined as the
discrepancy between preparation for modern life and involvement in modern life.
These variables were then analyzed to determine their relationship, if any, to blood pres-
sure and coronary heart disease. Individuals with large positive or negative scores on either
of the two incongruity variables were hypothesized to have high blood pressure and to be at
high risk for coronary heart disease.
One approach to analysis involved categorizing both discrepancy scores into high and
low groups. Then a two-way analysis of variance could be performed using blood pressure
as the outcome variable. We will see later that this problem can also be described as a
regression problem. ■
² Factor analysis was described in Chapter 24 of the second edition of this text, but the topic is not included
in this (fifth) edition.
■ Example 1.4 Study of the association between alcohol consumption frequency and
body-mass index (BMI) in the Behavioral Risk Factor Surveillance System (BRFSS).
The BRFSS is a large and ongoing surveillance project managed by the U.S. Centers
for Disease Control and Prevention (CDC) and conducted by state health departments as
telephone-based interviews, based on random-digit dialing. Its purpose is to “generate information
about health risk behaviors, clinical preventive practices, and health care access and
use primarily related to chronic diseases and injury” (CDC 2012).
The unpublished example considered here examines the relationship between frequency
of alcohol use in the previous 30 days and the response variable of BMI, a common measure
of body fat defined as (weight in kg)/(height in m)². Dozens of studies have demonstrated
cardiovascular benefits of red wine consumption. Yet the relationship between alcohol con-
sumption and BMI, an important risk factor for numerous chronic diseases, is less clear.
An analysis of data from the National Health Interview Survey found a moderate reduction
in BMI associated with increasing drinking frequency, yet an increase in BMI with greater
drinking volume (Breslow and Smothers 2005). These relationships were different for males
and females (an example of interaction; see Chapter 11), who are known to metabolize alco-
hol differently.
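As a quick illustration of the BMI definition used in this example (weight in kilograms divided by the square of height in meters), here is a minimal Python sketch. The function name `bmi` and the sample weight and height are hypothetical and are not taken from the BRFSS data.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body-mass index: weight in kilograms divided by squared height in meters."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical subject (not from the BRFSS): 70 kg and 1.75 m tall.
example_bmi = bmi(70.0, 1.75)
print(round(example_bmi, 1))  # prints 22.9
```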
This analysis of drinking frequency and BMI considers females who live in the state of
Georgia and who consume nonheavy amounts of alcohol (for the 2010 BRFSS data collec-
tion year). Straight-line regression analysis is used to quantify the same negative association
between drinking frequency and BMI found by others. Multiple regression analysis and analy-
sis of covariance are used to additionally consider the effects of age and other health behaviors
(e.g., sleep quality, exercise, and tobacco use) that are known to be associated with BMI.
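The straight-line regression mentioned above can be sketched with ordinary least squares. The data below are invented solely for illustration (they are not BRFSS values), and the names `fit_line`, `drinking_days`, and `bmi_values` are hypothetical. This is a sketch of the general technique, not the analysis carried out in later chapters.

```python
from statistics import mean


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x, and sum of cross-products of deviations.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Invented data: drinking days in the past 30 days and BMI for six subjects.
drinking_days = [0, 2, 5, 10, 15, 20]
bmi_values = [28.0, 27.5, 27.1, 26.0, 25.2, 24.6]

b0, b1 = fit_line(drinking_days, bmi_values)
print(f"intercept = {b0:.2f}, slope = {b1:.3f}")
```

With these invented values the fitted slope is negative, which is the direction of association the example describes. A fitted slope from observational data of this kind describes an association only.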
This example is unique in that it provides key illustrations of the objectives of regres-
sion techniques for the analysis of public health surveillance data on a health outcome with
numerous determinants. These objectives can differ from those used for the analysis of
data emanating from more controlled health studies (such as randomized controlled clini-
cal trials). In particular, the large sample size associated with the BRFSS provides oppor-
tunities for the detection of statistically significant (and sometimes both unexpected and
meaningful) associations between certain determinants and BMI that might otherwise be
challenging to detect. Such hypothesis-generating regression findings can suggest avenues
for further research. It is important to mention that such surveillance studies limit causal
interpretations of the findings. These and related issues are discussed further in several
chapters that follow.
References
Breslow, R. A., and Smothers, B. A. 2005. “Drinking Patterns and Body Mass Index in Never Smokers:
National Health Interview Survey, 1997–2001.” American Journal of Epidemiology 161(4):
368–76.
Campbell, D. T., and Stanley, J. C. 1963. Experimental and Quasi-experimental Designs for Research.
Chicago: Rand McNally.
CDC Office of Surveillance, Epidemiology, and Laboratory Services. 2012. “Behavioral Risk Factor
Surveillance System: BRFSS Frequently Asked Questions (FAQs).” http://www.cdc.gov/brfss/
faqs.htm.
Hulka, B. S.; Kupper, L. L.; Cassel, J. C.; and Thompson, S. J. 1971. “A Method for Measuring
Physicians’ Awareness of Patients’ Concerns.” HSMHA Health Reports 86: 741–51.
James, S. A. 1973. “The Effects of the Race of Experimenter and Race of Comparison Norm on Social
Influence in Same Race and Biracial Problem-Solving Dyads.” Ph.D. dissertation, Department
of Clinical Psychology, Washington University, St. Louis, Mo.
Kleinbaum, D. G.; Kupper, L. L.; and Morgenstern, H. 1982. Epidemiologic Research. Belmont, Calif.:
Lifetime Learning Publications.
Patrick, R.; Cassel, J. C.; Tyroler, H. A.; Stanley, L.; and Wild, J. 1974. “The Ponape Study of
Health Effects of Cultural Change.” Paper presented at the annual meeting of the Society for
Epidemiologic Research, Berkeley, Calif.
Thompson, S. J. 1972. “The Doctor–Patient Relationship and Outcomes of Pregnancy.” Ph.D.
dissertation, Department of Epidemiology, University of North Carolina, Chapel Hill.
2 Classification of Variables and the Choice of Analysis
2.1.1 Gappiness
In the classification scheme we call gappiness, we determine whether gaps exist between
successively observed values of a variable (Figure 2.1). If gaps exist between observations, the
variable is said to be discrete; if no gaps exist, the variable is said to be continuous. To speak
more precisely, a variable is discrete if, between any two potentially observable values, a value
exists that is not possibly observable. A variable is continuous if, between any two potentially
observable values, another potentially observable value exists.
Examples of continuous variables are age, blood pressure, cholesterol level, height, and
weight. Discrete variables are often counts, such as the number of deaths or car accidents.
Additionally, nonnumeric information is often numerically coded in data sources using dis-
crete variables. Examples of this are sex (e.g., 0 if male and 1 if female), group identification
(e.g., 1 if group A and 2 if group B), and state of disease (e.g., 1 if a coronary heart disease
case and 0 if not a coronary heart disease case).
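The numeric coding of nonnumeric information described above might look like the following sketch. The records and the dictionary names are hypothetical; the codes follow the examples in the text (0 if male and 1 if female; 1 if a coronary heart disease case and 0 if not).

```python
# Code tables following the examples in the text.
SEX_CODES = {"male": 0, "female": 1}
CHD_CODES = {"not a case": 0, "case": 1}

# Hypothetical raw records as they might appear on a data collection form.
records = [
    {"sex": "female", "chd": "case"},
    {"sex": "male", "chd": "not a case"},
    {"sex": "female", "chd": "not a case"},
]

# Replace each label with its discrete numeric code.
coded = [
    {"sex": SEX_CODES[r["sex"]], "chd": CHD_CODES[r["chd"]]}
    for r in records
]
print(coded)
```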
FIGURE 2.1 Panels labeled “Gaps” and “No gaps” (© Cengage Learning)
FIGURE 2.2 (a) Histogram of a continuous variable; (b) Line chart of a discrete variable. Vertical axes: relative frequency (© Cengage Learning)
In analyses of actual data, the sampling frequency distributions for continuous variables
are represented differently from those for discrete variables. Data on a continuous variable
are usually grouped into class intervals, and a relative frequency distribution is determined
by counting the proportion of observations in each interval. Such a distribution is usually
represented by a histogram, as shown in Figure 2.2(a). Data on a discrete variable, on the
other hand, are usually not grouped but are represented instead by a line chart, as shown in
Figure 2.2(b).
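The two kinds of relative frequency distribution just described can be sketched as follows. The data are invented for illustration, and the function names, the class-interval start, and the interval width are assumptions rather than anything prescribed by the text.

```python
from collections import Counter


def relative_frequencies_discrete(values):
    """Proportion of observations at each distinct value (basis for a line chart)."""
    n = len(values)
    counts = Counter(values)
    return {v: counts[v] / n for v in sorted(counts)}


def relative_frequencies_grouped(values, start, width, n_classes):
    """Proportion of observations in each class interval [lower, upper)
    (basis for a histogram). Values outside all intervals are not counted."""
    n = len(values)
    counts = [0] * n_classes
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_classes:
            counts[k] += 1
    return {
        (start + k * width, start + (k + 1) * width): counts[k] / n
        for k in range(n_classes)
    }


# Invented data for illustration only.
num_children = [0, 1, 1, 2, 2, 2, 3, 0, 1, 2]            # discrete count
ages = [23.5, 31.2, 38.9, 44.1, 47.6, 52.3, 58.8, 61.0]  # continuous

print(relative_frequencies_discrete(num_children))
print(relative_frequencies_grouped(ages, start=20, width=10, n_classes=5))
```

The discrete version keeps every observed value as its own category, while the grouped version reports one proportion per class interval.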
Discrete variables can sometimes be treated for analysis purposes as continuous variables.
This is possible when the values of such a variable, even though discrete, are not far apart
and cover a wide range of numbers. In such a case, the possible values, although technically
gappy, show such small gaps between values that a visual representation would approximate
an interval (Figure 2.3).
Furthermore, a line chart, like the one in Figure 2.2(b), representing the frequency dis-
tribution of data on such a variable would probably show few frequencies greater than 1 and
thus would be uninformative. As an example, the variable “social class” is usually measured as
discrete; one measure of social class¹ takes on integer values between 11 and 77. When data
on this variable are grouped into classes (e.g., 11–15, 16–20, etc.), the resulting frequency
histogram gives a clearer picture of the characteristics of the variable than a line chart does.
Thus, in this case, treating social class as a continuous variable is sometimes more useful than
treating it as discrete.
Just as it is often useful to treat a discrete variable as continuous, some fundamentally
continuous variables may be grouped into categories and treated as discrete variables in a
given analysis. For example, the variable “age” can be made discrete by grouping its values
into two categories, “young” and “old.” Similarly, “blood pressure” becomes a discrete vari-
able if it is categorized into “low,” “medium,” and “high” groups or into deciles.
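Grouping a continuous variable into categories, as described above, can be sketched in a few lines. The ages, the blood pressure readings, and the age cut point of 45 are all invented for illustration. The decile cut points come from Python's `statistics.quantiles`; other quantile conventions can give slightly different cut points.

```python
from bisect import bisect_right
from statistics import quantiles

# Invented ages, split into two categories at a hypothetical cut point.
AGE_CUT = 45
ages = [22, 35, 41, 58, 63]
age_group = ["young" if a < AGE_CUT else "old" for a in ages]

# Invented systolic blood pressure readings (mm Hg), sorted for readability.
systolic = [98, 104, 110, 112, 115, 118, 120, 121, 124, 126,
            128, 130, 133, 135, 138, 142, 147, 151, 160, 172]

# Nine cut points that split the sample into ten groups (deciles).
cut_points = quantiles(systolic, n=10)

# Decile group 1 (lowest) through 10 (highest) for each reading.
decile_group = [bisect_right(cut_points, bp) + 1 for bp in systolic]

print(age_group)
print(decile_group)
```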
FIGURE 2.3 Discrete variable that may be treated as continuous (© Cengage Learning)
¹ Hollingshead’s “Two-Factor Index of Social Position,” a description of which can be found in Green (1970).
The decision to categorize a continuous variable into discrete levels is nuanced, requiring
consideration of both pros and cons. On the one hand, a discrete version of a variable might
make the data easier to collect and summarize. This often, in turn, aids in the presentation of
results to colleagues. Yet these advantages must be balanced against the loss of information
that comes with converting a continuous variable into a discrete one. The choice of variable
type often impacts the type of analysis that can ultimately be conducted, and the desire to use
a certain analysis technique may drive decisions about the treatment of variables.
A further consideration concerns when to categorize continuous data. One may catego-
rize a continuous variable either at the time of data collection or at the time of data analysis.
The former choice often allows cheaper, quicker, and/or less precise methodology for data
collection to be employed. Yet this may also introduce human error (e.g., when a clini-
cian is given the extra step of classifying a continuous reading into one of several groups).
Categorization at the time of analysis reduces the likelihood of human error and also allows
for multiple classification schemes to be later considered, since the original continuous data
have not been forfeited.
A related issue is that both continuous and discrete variables can be error-prone. Contin-
uous variables can be measured with error, and discrete variables can be misclassified. When
such error-prone variables are used in regression analyses, incorrect statistical conclusions can
be made (i.e., statistical validity can be compromised). In this textbook, it will be assumed
that variables to be considered are not subject to either measurement error or misclassifica-
tion error. A discussion of rigorous statistical methods for dealing with error-prone variables
in regression analyses is beyond the scope of this textbook, but Gustafson (2004) provides
numerous relevant references to such methods.
A variable that can give not only an ordering but also a meaningful measure of the
distance between categories is called an interval variable. To be interval, a variable must be
expressed in terms of some standard or well-accepted physical unit of measurement. Height,
weight, blood pressure, and number of deaths all satisfy this requirement, whereas subjective
measures such as perception of pregnancy, personality type, prestige, and social stress do not.
An interval variable that has a scale with a true zero is occasionally designated as a
ratio or ratio-scale variable. An example of a ratio-scale variable is the height of a person.
Temperature is commonly measured in degrees Celsius, an interval scale. Temperature
measured in kelvins, on a scale that begins at absolute zero, is a
ratio variable. An example of a ratio variable common in health studies is the concentration
of a substance (e.g., cholesterol) in the blood.
Ratio-scale variables often involve measurement errors that follow a nonnormal
distribution and are proportional to the size of the measurement. We will see in Chapter 5
that such proportional errors violate an important assumption of linear regression—namely,
equality of error variance for all observations. Hence, the presence of a ratio variable is a
signal to be on guard for a possible violation of this assumption. In Chapter 14 (on regression
diagnostics), we will describe methods for detecting and dealing with this problem.
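A small simulation may help make the point about proportional errors concrete. The sketch below is not from the text: it assumes, purely for simplicity, a normally distributed relative error with a standard deviation of 5%, and it addresses only the proportionality of the errors, not their nonnormality. The function name simulate_measurements and the true values 100, 200, and 400 are hypothetical.

```python
import random
from statistics import stdev


def simulate_measurements(true_value, relative_sd, n, rng):
    """Draw n readings whose error SD equals relative_sd * true_value."""
    return [true_value * (1 + rng.gauss(0, relative_sd)) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is reproducible

sd_by_level = {}
for true_value in (100, 200, 400):
    readings = simulate_measurements(true_value, 0.05, 2000, rng)
    sd_by_level[true_value] = stdev(readings)

for true_value, sd in sd_by_level.items():
    print(f"true value {true_value}: sample SD of readings = {sd:.2f}")
```

Under these assumptions the spread of the readings roughly doubles each time the true value doubles, so the error variance is clearly not the same for all observations.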
As with variables in other classification schemes, the same variable may be considered at
one level of measurement in one analysis and at a different level in another analysis. Thus,
“age” may be considered as interval in a regression analysis or, by being grouped into catego-
ries, as nominal in an analysis of variance.
The various levels of mathematical preciseness are cumulative. An ordinal scale possesses
all the properties of a nominal scale plus ordinality. An interval scale is also nominal and
ordinal. The cumulativeness of these levels allows the researcher to drop back one or more lev-
els of measurement in analyzing the data. Thus, an interval variable may be treated as nominal
or ordinal for a particular analysis, and an ordinal variable may be analyzed as nominal.
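The cumulative ordering of the levels can be expressed as a simple rule: a variable may be analyzed at its own level or at any lower one, but not at a higher one. The following sketch encodes that rule; the names LEVELS and can_treat_as are illustrative assumptions, and the ratio level is left out because the paragraph above discusses only nominal, ordinal, and interval scales.

```python
# Levels of measurement in increasing order of mathematical preciseness.
LEVELS = ("nominal", "ordinal", "interval")


def can_treat_as(actual, target):
    """Return True if a variable measured at `actual` may be analyzed at `target`.

    Dropping back to a lower level is allowed; moving up is not.
    """
    return LEVELS.index(target) <= LEVELS.index(actual)


print(can_treat_as("interval", "nominal"))   # dropping back two levels
print(can_treat_as("ordinal", "nominal"))    # dropping back one level
print(can_treat_as("ordinal", "interval"))   # moving up is not permitted
```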
² The term independent variable is a historical term meant to evoke the notion that these measured factors may freely
vary from subject to subject, whereas changes in the dependent variable are thought to depend on and be determined by
the values of a subject’s independent variables. This usage of the term independent differs from the statistical concept
of independence. Two variables are statistically independent when the statistical behavior of one variable is completely
unaffected by the statistical behavior of the other variable. When two variables are independent, they are uncorrelated,
although zero correlation does not imply independence. In most regression analysis situations, there are nonzero cor-
relations among the independent (or predictor) variables. Though not ideal terminology, the phrase independent variable
is still commonly used in practice to denote a predictor variable in regression analysis, and we use this standard termi-
nology in this textbook.
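A standard textbook example, not taken from this text, shows why zero correlation does not imply independence. Suppose X takes the values -1, 0, and 1 with equal probability and Y = X². The sketch below computes the covariance exactly with fractions and then compares a joint probability with the product of the corresponding marginal probabilities.

```python
from fractions import Fraction

# X is -1, 0, or 1 with probability 1/3 each; Y is completely determined by X.
p = Fraction(1, 3)
pairs = [(x, x * x) for x in (-1, 0, 1)]

mean_x = sum(p * x for x, _ in pairs)
mean_y = sum(p * y for _, y in pairs)
cov = sum(p * (x - mean_x) * (y - mean_y) for x, y in pairs)

# Independence would require P(X=0, Y=0) = P(X=0) * P(Y=0).
p_joint = sum(p for x, y in pairs if x == 0 and y == 0)
p_x0 = sum(p for x, _ in pairs if x == 0)
p_y0 = sum(p for _, y in pairs if y == 0)

print("covariance:", cov)
print("P(X=0, Y=0):", p_joint, "versus P(X=0) * P(Y=0):", p_x0 * p_y0)
```

The covariance is exactly zero, so X and Y are uncorrelated, yet the joint probability differs from the product of the marginals, so they are not independent.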
[Figure not reproduced; it appears to be the Figure 2.4 cited in the text. Recoverable labels: Perception (Worry, Desire, Birth), Informational Communication, Affective Communication, Satisfaction; control variables: Social Class, Age, Education, Parity. © Cengage Learning]
affect relationships among other independent variables and/or the dependent variables but
be of no intrinsic interest in a particular study. Such variables may be referred to as control or
nuisance variables or, in some contexts, as covariates or confounders.
For example, in Thompson’s (1972) study of the relationship between patient per-
ception of pregnancy and patient satisfaction with medical care, the perception variables
are independent variables (or regressors), and the satisfaction variable is the dependent
(or response) variable (Figure 2.4).
Usually, the distinction between independent and dependent variables is clear, as it is in
the examples we have given. Nevertheless, a variable considered as dependent for purposes
of evaluating one study objective may be considered as independent for purposes of evaluat-
ing a different objective. For example, in Thompson’s study, in addition to determining the
relationship of perceptions as independent variables to patient satisfaction, the researcher
sought to determine the relationships of social class, age, and education to perceptions
treated as dependent variables.
FIGURE 2.5 Overlap of variable classifications (© Cengage Learning)
[Diagram not reproduced. Recoverable labels: regions marked Nominal, Ordinal, and Interval; Continuous and Discrete; the variable “sex”; different representations of the variable “age”.]
and not of the variable itself. In reading the diagram, one should consider any variable as
being representable by some point within the triangle. If the point falls below the dashed
line within the triangle, it is classified as discrete; if it falls above that line, it is continuous.
Also, a point that falls into the area marked “interval” is classified as an interval variable, and
similarly for the other two levels of measurement.
As Figure 2.5 indicates, any nominal variable must be discrete, but a discrete variable
may be nominal, ordinal, or interval. Also, a continuous variable must be either ordinal
or interval, although ordinal or interval variables may exist that are not continuous. For
example, “sex” is nominal and discrete; “age” may be considered interval and continuous or,
if grouped into categories, nominal and discrete; and “social class,” depending on how it is
measured and on the viewpoint of the researcher, may be considered ordinal and continuous,
ordinal and discrete, or nominal and discrete.
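The permitted combinations just described can be written down as a small lookup, which may be convenient when checking how the variables in a data set have been labeled. This is a sketch only; the names VALID_LEVELS and is_valid_classification are assumptions introduced for illustration.

```python
# Levels of measurement allowed for each gappiness class, following the
# rule that a nominal variable must be discrete.
VALID_LEVELS = {
    "discrete": {"nominal", "ordinal", "interval"},
    "continuous": {"ordinal", "interval"},
}


def is_valid_classification(gappiness, level):
    """Return True if the (gappiness, level) pair is a permitted combination."""
    return level in VALID_LEVELS[gappiness]


# The examples from the text.
print(is_valid_classification("discrete", "nominal"))     # "sex"
print(is_valid_classification("continuous", "interval"))  # "age", ungrouped
print(is_valid_classification("continuous", "nominal"))   # not permitted
```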
Classification of Variables

Method: Multiple linear regression analysis
    Dependent variable: Continuous
    Independent variables: Classically all continuous, but in practice any type(s) can be used
    General purpose: To describe the extent, direction, and strength of the relationship between several independent variables and a continuous dependent variable

Method: Logistic regression analysis
    Dependent variable: Dichotomous
    Independent variables: A mixture of various types can be used
    General purpose: To determine how one or more independent variables are related to the probability of the occurrence of one of two possible outcomes

Method: Poisson regression analysis
    Dependent variable: Discrete
    Independent variables: A mixture of various types can be used
    General purpose: To determine how one or more independent variables are related to the rate of occurrence of some outcome

*Generally, a control variable is a variable that must be considered before any relationships of interest can be quantified; this is because a control variable may be related to the variables of primary interest and must be taken into account in studying the relationships among the primary variables. For example, in describing the relationship between blood pressure and physical activity, we would probably consider “age” and “sex” as control variables because they are related to blood pressure and physical activity and, unless taken into account, could confound any conclusions regarding the primary relationship of interest.
© Cengage Learning
It considers the types of variable sets usually associated with each method and gives a gen-
eral description of the purposes of each method. In addition to using the table, however,
one must carefully check the statistical assumptions being made. These assumptions will be
described fully later in the text. Table 2.2 shows how these guidelines can be applied to the
examples given in Chapter 1.
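As a very rough illustration of how such a guide might be consulted, the sketch below maps the type of the dependent variable to a candidate method. It covers only the three methods that appear in the table fragment of this excerpt, the names METHOD_BY_DEPENDENT_TYPE and suggest_method are hypothetical, and, as the text cautions, a lookup of this kind is no substitute for checking the statistical assumptions of the chosen method.

```python
# Candidate method by type of dependent variable. Only the three methods
# visible in this excerpt are listed; the full table contains others.
METHOD_BY_DEPENDENT_TYPE = {
    "continuous": "multiple linear regression analysis",
    "dichotomous": "logistic regression analysis",
    "discrete": "Poisson regression analysis",
}


def suggest_method(dependent_type):
    """Return a candidate method, or None if the type is not covered here."""
    return METHOD_BY_DEPENDENT_TYPE.get(dependent_type.lower())


print(suggest_method("Continuous"))
print(suggest_method("dichotomous"))
print(suggest_method("nominal"))  # not covered by this partial lookup
```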
Several methods for dealing with multivariable problems are not included in Table 2.1
or in this text—among them, nonparametric methods of analysis of variance, multivariate
multiple regression, and multivariate analysis of variance (which are extensions of the cor-
responding methods given here that allow for several dependent variables), as well as methods
of cluster analysis. In this book, we will cover only the multivariable techniques used most
often by health and social researchers.
DANCE ON STILTS AT THE GIRLS’ UNYAGO, NIUCHI
I see increasing reason to believe that the view formed some time
back as to the origin of the Makonde bush is the correct one. I have
no doubt that it is not a natural product, but the result of human
occupation. Those parts of the high country where man—as a very
slight amount of practice enables the eye to perceive at once—has not
yet penetrated with axe and hoe, are still occupied by a splendid
timber forest quite able to sustain a comparison with our mixed
forests in Germany. But wherever man has once built his hut or tilled
his field, this horrible bush springs up. Every phase of this process
may be seen in the course of a couple of hours’ walk along the main
road. From the bush to right or left, one hears the sound of the axe—
not from one spot only, but from several directions at once. A few
steps further on, we can see what is taking place. The brush has been
cut down and piled up in heaps to the height of a yard or more,
between which the trunks of the large trees stand up like the last
pillars of a magnificent ruined building. These, too, present a
melancholy spectacle: the destructive Makonde have ringed them—
cut a broad strip of bark all round to ensure their dying off—and also
piled up pyramids of brush round them. Father and son, mother and
son-in-law, are chopping away perseveringly in the background—too
busy, almost, to look round at the white stranger, who usually excites
so much interest. If you pass by the same place a week later, the piles
of brushwood have disappeared and a thick layer of ashes has taken
the place of the green forest. The large trees stretch their
smouldering trunks and branches in dumb accusation to heaven—if
they have not already fallen and been more or less reduced to ashes,
perhaps only showing as a white stripe on the dark ground.
This work of destruction is carried out by the Makonde alike on the
virgin forest and on the bush which has sprung up on sites already
cultivated and deserted. In the second case they are saved the trouble
of burning the large trees, these being entirely absent in the
secondary bush.
After burning this piece of forest ground and loosening it with the
hoe, the native sows his corn and plants his vegetables. All over the
country, he goes in for bed-culture, which requires, and, in fact,
receives, the most careful attention. Weeds are nowhere tolerated in
the south of German East Africa. The crops may fail on the plains,
where droughts are frequent, but never on the plateau with its
abundant rains and heavy dews. Its fortunate inhabitants even have
the satisfaction of seeing the proud Wayao and Wamakua working
for them as labourers, driven by hunger to serve where they were
accustomed to rule.
But the light, sandy soil is soon exhausted, and would yield no
harvest the second year if cultivated twice running. This fact has
been familiar to the native for ages; consequently he provides in
time, and, while his crop is growing, prepares the next plot with axe
and firebrand. Next year he plants this with his various crops and
lets the first piece lie fallow. For a short time it remains waste and
desolate; then nature steps in to repair the destruction wrought by
man; a thousand new growths spring out of the exhausted soil, and
even the old stumps put forth fresh shoots. Next year the new growth
is up to one’s knees, and in a few years more it is that terrible,
impenetrable bush, which maintains its position till the black
occupier of the land has made the round of all the available sites and
come back to his starting point.
The Makonde are, body and soul, so to speak, one with this bush.
According to my Yao informants, indeed, their name means nothing
else but “bush people.” Their own tradition says that they have been
settled up here for a very long time, but to my surprise they laid great
stress on an original immigration. Their old homes were in the
south-east, near Mikindani and the mouth of the Rovuma, whence
their peaceful forefathers were driven by the continual raids of the
Sakalavas from Madagascar and the warlike Shirazis[47] of the coast,
to take refuge on the almost inaccessible plateau. I have studied
African ethnology for twenty years, but the fact that changes of
population in this apparently quiet and peaceable corner of the earth
could have been occasioned by outside enterprises taking place on
the high seas, was completely new to me. It is, no doubt, however,
correct.
The charming tribal legend of the Makonde—besides informing us
of other interesting matters—explains why they have to live in the
thickest of the bush and a long way from the edge of the plateau,
instead of making their permanent homes beside the purling brooks
and springs of the low country.
“The place where the tribe originated is Mahuta, on the southern
side of the plateau towards the Rovuma, where of old time there was
nothing but thick bush. Out of this bush came a man who never
washed himself or shaved his head, and who ate and drank but little.
He went out and made a human figure from the wood of a tree
growing in the open country, which he took home to his abode in the
bush and there set it upright. In the night this image came to life and
was a woman. The man and woman went down together to the
Rovuma to wash themselves. Here the woman gave birth to a still-
born child. They left that place and passed over the high land into the
valley of the Mbemkuru, where the woman had another child, which
was also born dead. Then they returned to the high bush country of
Mahuta, where the third child was born, which lived and grew up. In
course of time, the couple had many more children, and called
themselves Wamatanda. These were the ancestral stock of the
Makonde, also called Wamakonde,[48] i.e., aborigines. Their
forefather, the man from the bush, gave his children the command to
bury their dead upright, in memory of the mother of their race who
was cut out of wood and awoke to life when standing upright. He also
warned them against settling in the valleys and near large streams,
for sickness and death dwelt there. They were to make it a rule to
have their huts at least an hour’s walk from the nearest watering-
place; then their children would thrive and escape illness.”
The explanation of the name Makonde given by my informants is
somewhat different from that contained in the above legend, which I
extract from a little book (small, but packed with information), by
Pater Adams, entitled Lindi und sein Hinterland. Otherwise, my
results agree exactly with the statements of the legend. Washing?
Hapana—there is no such thing. Why should they do so? As it is, the
supply of water scarcely suffices for cooking and drinking; other
people do not wash, so why should the Makonde distinguish himself
by such needless eccentricity? As for shaving the head, the short,
woolly crop scarcely needs it,[49] so the second ancestral precept is
likewise easy enough to follow. Beyond this, however, there is
nothing ridiculous in the ancestor’s advice. I have obtained from
various local artists a fairly large number of figures carved in wood,
ranging from fifteen to twenty-three inches in height, and
representing women belonging to the great group of the Mavia,
Makonde, and Matambwe tribes. The carving is remarkably well
done and renders the female type with great accuracy, especially the
keloid ornamentation, to be described later on. As to the object and
meaning of their works the sculptors either could or (more probably)
would tell me nothing, and I was forced to content myself with the
scanty information vouchsafed by one man, who said that the figures
were merely intended to represent the nembo—the artificial
deformations of pelele, ear-discs, and keloids. The legend recorded
by Pater Adams places these figures in a new light. They must surely
be more than mere dolls; and we may even venture to assume that
they are—though the majority of present-day Makonde are probably
unaware of the fact—representations of the tribal ancestress.
The references in the legend to the descent from Mahuta to the
Rovuma, and to a journey across the highlands into the Mbemkuru
valley, undoubtedly indicate the previous history of the tribe, the
travels of the ancestral pair typifying the migrations of their
descendants. The descent to the neighbouring Rovuma valley, with
its extraordinary fertility and great abundance of game, is intelligible
at a glance—but the crossing of the Lukuledi depression, the ascent
to the Rondo Plateau and the descent to the Mbemkuru, also lie
within the bounds of probability, for all these districts have exactly
the same character as the extreme south. Now, however, comes a
point of especial interest for our bacteriological age. The primitive
Makonde did not enjoy their lives in the marshy river-valleys.
Disease raged among them, and many died. It was only after they
had returned to their original home near Mahuta, that the health
conditions of these people improved. We are very apt to think of the
African as a stupid person whose ignorance of nature is only equalled
by his fear of it, and who looks on all mishaps as caused by evil
spirits and malignant natural powers. It is much more correct to
assume in this case that the people very early learnt to distinguish
districts infested with malaria from those where it is absent.
USUAL METHOD OF CLOSING HUT-DOOR
This knowledge is crystallized in the ancestral warning against
settling in the valleys and near the great waters, the dwelling-places
of disease and death. At the same time, for security against the
hostile Mavia south of the Rovuma, it was enacted that every
settlement must be not less than a certain distance from the southern
edge of the plateau. Such in fact is their mode of life at the present
day. It is not such a bad one, and certainly they are both safer and
more comfortable than the Makua, the recent intruders from the south,
who have made good their footing on the western edge of the plateau,
extending over a fairly wide belt of country. Neither Makua nor
Makonde show in their dwellings
anything of the size and comeliness of the Yao houses in the plain,
especially at Masasi, Chingulungulu and Zuza’s. Jumbe Chauro, a
Makonde hamlet not far from Newala, on the road to Mahuta, is the
most important settlement of the tribe I have yet seen, and has fairly
spacious huts. But how slovenly is their construction compared with
the palatial residences of the elephant-hunters living in the plain.
The roofs are still more untidy than in the general run of huts during
the dry season, the walls show here and there the scanty beginnings
or the lamentable remains of the mud plastering, and the interior is a
veritable dog-kennel; dirt, dust and disorder everywhere. A few huts
only show any attempt at division into rooms, and this consists
merely of very roughly-made bamboo partitions. In one point alone
have I noticed any indication of progress—in the method of fastening
the door. Houses all over the south are secured in a simple but
ingenious manner. The door consists of a set of stout pieces of wood
or bamboo, tied with bark-string to two cross-pieces, and moving in
two grooves round one of the door-posts, so as to open inwards. If
the owner wishes to leave home, he takes two logs as thick as a man’s
upper arm and about a yard long. One of these is placed obliquely
against the middle of the door from the inside, so as to form an angle
of from 60° to 75° with the ground. He then places the second piece
horizontally across the first, pressing it downward with all his might.
It is kept in place by two strong posts planted in the ground a few
inches inside the door. This fastening is absolutely safe, but of course
cannot be applied to both doors at once, otherwise how could the
owner leave or enter his house? I have not yet succeeded in finding
out how the back door is fastened.