Handbook of Regression Analysis
With Applications in R
WILEY SERIES IN PROBABILITY AND STATISTICS
Established by WALTER A. SHEWHART and SAMUEL S. WILKS
Editors
David J. Balding, Noel A.C. Cressie, Garrett M. Fitzmaurice, Harvey
Goldstein, Geert Molenberghs, David W. Scott, Adrian F.M. Smith, and
Ruey S. Tsay
Editors Emeriti
Vic Barnett, Ralph A. Bradley, J. Stuart Hunter, J.B. Kadane, David G.
Kendall, and Jozef L. Teugels
A complete list of the titles in this series appears at the end of this volume.
Handbook of Regression
Analysis With Applications
in R
Second Edition
Samprit Chatterjee
New York University, New York, USA
Jeffrey S. Simonoff
New York University, New York, USA
This second edition first published 2020
© 2020 John Wiley & Sons, Inc.
Edition History
Wiley-Blackwell (1e, 2013)
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in
any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by
law. Advice on how to obtain permission to reuse material from this title is available at
http://www.wiley.com/go/permissions.
The right of Samprit Chatterjee and Jeffrey S. Simonoff to be identified as the authors of this work has been
asserted in accordance with law.
Registered Office
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
Editorial Office
111 River Street, Hoboken, NJ 07030, USA
For details of our global editorial offices, customer services, and more information about Wiley products visit us
at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that
appears in standard print versions of this book may not be available in other formats.
Dedicated to everyone who labors in the field
of statistics, whether they are students,
teachers, researchers, or data analysts.
Contents
Part I
The Multiple Linear Regression Model
2 Model Building
2.1 Introduction
2.2 Concepts and Background Material
2.2.1 Using Hypothesis Tests to Compare Models
2.2.2 Collinearity
2.3 Methodology
2.3.1 Model Selection
2.3.2 Example — Estimating Home Prices (continued)
2.4 Indicator Variables and Modeling Interactions
2.4.1 Example — Electronic Voting and the 2004 Presidential Election
2.5 Summary
Part II
Addressing Violations of Assumptions
Part III
Categorical Predictors
Part IV
Non-Gaussian Regression Models
Part V
Other Regression Models
Part VI
Nonparametric and Semiparametric Models
Bibliography
Index
Preface to the
Second Edition
The years since the first edition of this book appeared have been fast-moving
in the world of data analysis and statistics. Algorithmically-based methods
operating under the banner of machine learning, artificial intelligence, or
data science have come to the forefront of public perceptions about how to
analyze data, and more than a few pundits have predicted the demise of classic
statistical modeling.
To paraphrase Mark Twain, we believe that reports of the (impending)
death of statistical modeling in general, and regression modeling in particular,
are exaggerated. The great advantage that statistical models have over “black
box” algorithms is that in addition to effective prediction, their transparency
also provides guidance about the actual underlying process (which is crucial
for decision making), and affords the possibilities of making inferences and
distinguishing real effects from random variation based on those models.
There have been laudable attempts to encourage making machine learning
algorithms interpretable in the ways regression models are (Rudin, 2019), but
we believe that models based on statistical considerations and principles will
have a place in the analyst’s toolkit for a long time to come.
Of course, part of that usefulness comes from the ability to generalize
regression models to more complex situations, and that is the thrust of the
changes in this new edition. One thing that hasn’t changed is the philosophy
behind the book, and our recommendations on how it can be best used, and
we encourage the reader to refer to the preface to the first edition for guidance
on those points. There have been small changes to the original chapters, and
broad descriptions of those chapters can also be found in the preface to the
first edition. The five new chapters (Chapters 11, 13, 14, 15, and 16, with
the former Chapter 11 on nonlinear regression moving to Chapter 12) expand
greatly on the power and applicability of regression models beyond what
was discussed in the first edition. For this reason many more references are
provided in these chapters than in the earlier ones, since some of the material
in those chapters is less established and less well-known, with much of it still
the subject of active research. In keeping with that, we do not spend much
(or any) time on issues for which there still isn’t necessarily a consensus in the
statistical community, but point to books and monographs that can help the
analyst get some perspective on that kind of material.
Chapter 11 discusses the modeling of time-to-event data, often referred
to as survival data. The response variable measures the length of time until an
event occurs, and a common complicator is that sometimes it is only known
that a response value is greater than some number; that is, it is right-censored.
This can naturally occur, for example, in a clinical trial in which subjects
enter the study at varying times, and the event of interest has not occurred at
the end of the trial. Analysis focuses on the survival function (the probability
of surviving past a given time) and the hazard function (the instantaneous
probability of the event occurring at a given time given survival to that
time). Parametric models based on appropriate distributions like the Weibull
or log-logistic can be fit that take censoring into account. Semiparametric
models like the Cox proportional hazards model (the most commonly-used
model) and the Buckley-James estimator are also available, which weaken
distributional assumptions. Modeling can be adapted to situations where
event times are truncated, and also when there are covariates that change over
the life of the subject.
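A minimal sketch of how right-censoring enters estimation of the survival function. It assumes the Kaplan-Meier product-limit estimator, a standard nonparametric estimator that the paragraph above does not name, and the function name `kaplan_meier` and the toy data are hypothetical. A censored subject contributes to the number at risk up to its censoring time but never counts as an event.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function S(t).

    times  : observed time for each subject (event time or censoring time)
    events : 1 if the event was observed, 0 if the time is right-censored
    Returns a list of (t, S(t)) pairs, one per distinct event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)      # subjects leaving the risk set at each time
    at_risk = len(times)          # everyone is at risk at the start
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Conditional probability of surviving past t, given survival to t
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        # Events and censored subjects both leave the risk set after time t
        at_risk -= leaving[t]
    return curve


# Hypothetical toy data: seven subjects, three of them right-censored
times = [2, 3, 3, 5, 8, 8, 9]
events = [1, 1, 0, 1, 0, 1, 0]
km_curve = kaplan_meier(times, events)
for t, s in km_curve:
    print(f"S({t}) = {s:.4f}")
```

The estimate only drops at observed event times; the subject censored at time 9 changes nothing after the last event, which is why the curve does not reach zero here.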
Chapter 13 extends applications to data with multiple observations for
each subject consistent with some structure from the underlying process. Such
data can take the form of nested or clustered data (such as students all in
one classroom) or longitudinal data (where a variable is measured at multiple
times for each subject). In this situation ignoring that structure results in an
induced correlation that reflects unmodeled differences between classrooms
and subjects, respectively. Mixed effects models generalize analysis of variance
(ANOVA) models and time series models to this more complicated situation.
Models with linear effects based on Gaussian distributions can be generalized
to nonlinear models, and also can be generalized to non-Gaussian distributions
through the use of generalized linear mixed effects models.
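The induced correlation can be made concrete with the simplest mixed effects model, a random intercept for each group. This is a sketch under assumed notation (observation j in group i, with group effect u_i independent of the error), not notation taken from the book:

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim N(0, \sigma_u^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2),

\operatorname{Corr}(y_{ij}, y_{ik} \mid x_{ij}, x_{ik})
  = \frac{\sigma_u^2}{\sigma_u^2 + \sigma^2}, \qquad j \neq k .
```

If the group effect u_i is left out of the model, it is absorbed into the error term, so any two observations from the same group share it and are correlated whenever \sigma_u^2 > 0.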
Modern data applications can involve very large (even massive) numbers of
predictors, which can cause major problems for standard regression methods.
Best subsets regression (discussed in Chapter 2) does not scale well to very
large numbers of predictors, and Chapter 14 discusses approaches that can
accomplish that. Forward stepwise regression, in which potential predictors
are stepped in one at a time, is an alternative to best subsets that scales
to massive data sets. A systematic approach to reducing the dimensionality
of a chosen regression model is through the use of regularization, in which
the usual estimation criterion is augmented with a penalty that encourages
sparsity; the most commonly-used version of this is the lasso estimator, and it
and its generalizations are discussed further.
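A minimal sketch of how a sparsity-encouraging penalty works, assuming the lasso criterion in the form (1/(2n)) * sum of squared residuals + lam * sum of |beta_j|, with centered data and no intercept. The fitting method shown, cyclic coordinate descent with soft-thresholding, is one common choice and is an assumption here; the function names and the toy data are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning exactly 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200):
    """Lasso by cyclic coordinate descent.

    Minimizes (1/(2n)) * sum_i (y_i - sum_j X[i][j] * beta_j)^2
              + lam * sum_j |beta_j|
    X is a list of rows; the data are assumed centered (no intercept).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X beta, with beta = 0 initially
    for _ in range(n_sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # an all-zero column carries no information
            # Covariance of column j with the partial residual (beta_j removed)
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / z
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - v * delta for v, r in zip(col, resid)]
                beta[j] = new
    return beta


# Hypothetical toy data: two orthogonal predictors, one strong and one weak
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.1, 1.9, -1.9, -2.1]  # equals 2.0 * x1 + 0.1 * x2
beta_hat = lasso_cd(X, y, lam=0.5)
print(beta_hat)
```

With lam=0.5 the weak predictor's coefficient is set exactly to zero while the strong one is shrunk toward zero; with lam=0.0 the same routine returns the least squares fit. That combination of shrinkage and exact zeros is what makes the penalty a variable selection tool.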
Chapters 15 and 16 discuss methods that move away from specified
relationships between the response and the predictor to nonparametric and
semiparametric methods, in which the data are used to choose the form of
the underlying relationship. In Chapter 15 linear or (explicitly specified)
nonlinear relationships are replaced with the notion of relationships taking the
form of smooth curves and surfaces. Estimation at a particular location is based
on local information; that is, the values of the response in a local neighborhood
of that location. This can be done through local versions of weighted least
squares (local polynomial estimation) or local regularization (smoothing
splines). Such methods can also be used to help identify interactions between
numerical predictors in linear regression modeling. Single predictor smoothing
SAMPRIT CHATTERJEE
Brooksville, Maine
JEFFREY S. SIMONOFF
New York, New York
October, 2019