Data Structures and Algorithm Analysis in C, Second Edition (China Reprint Edition)
Author: Mark Allen Weiss
ISBN: 9787111312802, 7111312805
Year: 2010
Data Structures and Algorithm Analysis in C (Second Edition)
www.pearsonhighered.com
ISBN 978-7-111-31280-2
Customer service: (010) 88378991, 88361066
Email: hzjsj@hzbook.com
Web: www.china-pub.com
Price: 45.00 yuan
English reprint edition copyright © 2010 by Pearson Education Asia Limited
and China Machine Press.
Original English language title: Data Structures and Algorithm Analysis in C,
Second Edition (ISBN 978-0-201-49840-0) by Mark Allen Weiss, Copyright © 1997.
All rights reserved.
Published by arrangement with the original publisher, Pearson Education, Inc.,
publishing as Addison-Wesley.
For sale and distribution in the People’s Republic of China exclusively (except
Taiwan, Hong Kong SAR and Macau SAR).
Registration number: 01-2010-4175
Format: 150mm x 214mm, 16.5 printed sheets
ISBN 978-7-111-31280-2
Price: 45.00 yuan
Website: www.hzbook.com
Email: hzjsj@hzbook.com
Phone: (010) 88379604
Postal code: 100037
PREFACE
Purpose/Goals
This book describes data structures, methods of organizing large amounts of data,
and algorithm analysis, the estimation of the running time of algorithms. As computers
become faster and faster, the need for programs that can handle large amounts
of input becomes more acute. Paradoxically, this requires more careful attention to
efficiency, since inefficiencies in programs become most obvious when input sizes are
large. By analyzing an algorithm before it is actually coded, students can decide if a
particular solution will be feasible. For example, in this text students look at specific
problems and see how careful implementations can reduce the time constraint for
large amounts of data from 16 years to less than a second. Therefore, no algorithm
or data structure is
presented without an explanation of its running time. In some
cases, minute details that affect the running time of the implementation are explored.
Once a solution method is determined, a program must still be written. As
computers have become more
powerful, the problems they must solve have become
larger and more complex, requiring development of more intricate programs. The
goal of this text is to teach students good programming and algorithm analysis skills
simultaneously so that they can develop such programs with the maximum amount
of efficiency.
This book is suitable for either an advanced data structures (CS7) course or a first-year graduate course in algorithm analysis. Students should have some knowledge of intermediate programming, including such topics as pointers and recursion, and some background in discrete math.

Approach
I believe it is important for students to learn how to program for themselves, not how to copy programs from a book. On the other hand, it is virtually impossible to discuss realistic programming issues without including sample code. For this reason, the book usually provides about one-half to three-quarters of an implementation, and the student is encouraged to supply the rest. Chapter 12, which is new to this edition, discusses additional data structures with an emphasis on implementation details.
The algorithms in this book are presented in ANSI C, which, despite some
flaws, is arguably the most popular systems programming language. The use of C
instead of Pascal allows the use of dynamically allocated arrays (see, for instance,
rehashing in Chapter 5). It also produces simplified code in several places, usually
because the and (&&) operation is short-circuited.
Most criticisms of C center on the fact that it is easy to write code that is barely
readable. Some of the more standard tricks, such as the simultaneous assignment
and testing against 0 via
if (x=y)
are generally not used in the text, since the loss of clarity is compensated by only a
few keystrokes and no increased speed. I believe that this book demonstrates that
unreadable code can be avoided by exercising reasonable care.
Overview
Chapter 1 contains review material on discrete math and recursion. I believe the only
way to be comfortable with recursion is to see good uses over and over. Therefore,
recursion is prevalent in this text, with examples in every chapter except Chapter 5.
Chapter 2 deals with algorithm analysis. This chapter explains asymptotic analysis
and its major weaknesses. Many examples are provided, including an in-depth
explanation of logarithmic running time. Simple recursive programs are analyzed
by intuitively converting them into iterative programs. More complicated divide-
and-conquer programs are introduced, but some of the analysis (solving recurrence
relations) is implicitly delayed until Chapter 7, where it is performed in detail.
Chapter 3 covers lists, stacks, and queues. The emphasis here is on coding
these data structures using ADTs, fast implementation of these data structures, and
an exposition of some of their uses. There are almost no programs (just routines),
but the exercises contain plenty of ideas for programming assignments.
Chapter 4 covers trees, with an emphasis on search trees, including external
search trees (B-trees). The UNIX file system and expression trees are used as examples.
AVL trees and splay trees are introduced but not analyzed. Seventy-five percent of the
code is written, leaving similar cases to be completed by the student. More careful
treatment of search tree implementation details is found in Chapter 12. Additional
coverage of trees, such as file compression and game trees, is deferred until Chapter
10. Data structures for an external medium are considered as the final topic in
several chapters.
Chapter 5 is a relatively short chapter concerning hash tables. Some analysis is
performed, and extendible hashing is covered at the end of the chapter.
Chapter 6 is about priority queues. Binary heaps are covered, and there is
additional material on some of the theoretically interesting implementations of
priority queues. The Fibonacci heap is discussed in Chapter 11, and the pairing heap
is discussed in Chapter 12.
Chapter 7 covers sorting. It is very specific with respect to coding details and
analysis. All the important general-purpose sorting algorithms are covered and
compared. Four algorithms are analyzed in detail: insertion sort, Shellsort, heapsort,
and quicksort. The analysis of the average-case running time of heapsort is new to
this edition. External sorting is covered at the end of the chapter.
Chapter 8 discusses the disjoint set algorithm with proof of the running time.
This is a short and specific chapter that can be skipped if Kruskal’s algorithm is not
discussed.
Chapter 9 covers graph algorithms. Algorithms on graphs are interesting, not
only because they frequently occur in practice but also because their running time is
so heavily dependent on the proper use of data structures. Virtually all of the standard
algorithms are presented along with appropriate data structures, pseudocode, and
analysis of running time. To place these problems in a proper context, a short
discussion on complexity theory (including NP-completeness and undecidability) is
provided.
Chapter 10 covers algorithm design by examining common problem-solving
techniques. This chapter is heavily fortified with examples. Pseudocode is used in
these later chapters so that the student’s appreciation of an example algorithm is not
obscured by implementation details.
Chapter 11 deals with amortized analysis. Three data structures from Chapters
4 and 6 and the Fibonacci heap, introduced in this chapter, are analyzed.
Chapter 12 is new to this edition. It covers search tree algorithms, the k-d tree,
and the pairing heap. This chapter departs from the rest of the text by providing
complete and careful implementations for the search trees and pairing heap. The
material is structured so that the instructor can integrate sections into discussions
from other chapters. For example, the top-down red-black tree in Chapter 12 can
be discussed under AVL trees (in Chapter 4).
Chapters 1-9 provide enough material for most one-semester data structures courses.
Exercises
Exercises, provided at the end of each chapter, match the order in which material
is presented. The last exercises may address the chapter as a whole rather than a
specific section. Difficult exercises are marked with an asterisk, and more challenging
exercises have two asterisks.
A solutions manual containing solutions to almost all the exercises is available
to instructors from the Addison-Wesley Publishing Company.
References
References are placed at the end of each chapter. Generally the references either
are historical, representing the original source of the material, or they represent
extensions and improvements to the results given in the text. Some references
represent solutions to exercises.
Code Availability
The example program code in this book is available via anonymous ftp at aw.com. It is also accessible through the World Wide Web; the URL is aw.com.
Acknowledgments
Many, many people have helped me in the preparation of books in this series. Some
are listed in other versions of the book; thanks to all.
For this edition, I would like to thank my editors at Addison-Wesley, Carter
Shanklin and Susan Hartman. Teri Hyde did another wonderful job with the
production, and Matthew Harris and his staff at Publication Services did their usual
fine work putting the final pieces together.
M. A. W.
Miami, Florida
July, 1996
CONTENTS
1 Introduction 1
1.1. What's the Book About? 1
Summary 12
Exercises 12
References 13
2 Algorithm Analysis 15
2.1. Mathematical Background 15
2.2. Model 18
2.3. What to Analyze 18
2.4. Running Time Calculations 20
2.4.1. A Simple Example 21
2.4.2. General Rules 21
2.4.3. Solutions for the Maximum Subsequence Sum Problem 24
2.4.4. Logarithms in the Running Time 28
2.4.5. Checking Your Analysis 33
2.4.6. A Grain of Salt 33
Summary 34
Exercises 35
References 39
4 Trees 89
4.1. Preliminaries 89
4.1.1. Implementation of Trees 90
4.1.2. Tree Traversals with an Application 91
4.2. Binary Trees 95
4.2.1. Implementation 96
4.2.2. Expression Trees 97
5 Hashing 149
5.1. General Idea 149
5.2. Hash Function 150
5.3. Separate Chaining 152
5.4. Open Addressing 157
5.4.1. Linear Probing 157
5.4.2. Quadratic Probing 160
5.4.3. Double Hashing 164
5.5. Rehashing 165
5.6. Extendible Hashing 168
Summary 171
Exercises 172
References 175
Summary 212
Exercises 212
References 216
7 Sorting 219
7.1. Preliminaries 219
7.2. Insertion Sort 220
7.2.1. The Algorithm 220
7.2.2. Analysis of Insertion Sort 221
Summary 337
Exercises 337
References 343
Index 501
CHAPTER 1
Introduction
In this chapter, we discuss the aims and goals of this text and briefly review programming concepts and discrete mathematics. We will
• See that how a program performs for reasonably large input is just as important as its performance on moderate amounts of input.
• Summarize the basic mathematical background needed for the rest of the book.
1.1. What's the Book About?

Suppose you have a group of N numbers and would like to determine the kth largest. This is known as the selection problem. Most students who have had a programming course or two would have no difficulty writing a program to solve this problem. There are quite a few "obvious" solutions.
One way to solve this problem would be to read the N numbers into an array,
sort the array in decreasing order by some simple algorithm such as bubblesort, and
then return the element in position k.
A somewhat better algorithm might be to read the first k elements into an array
and sort them (in decreasing order). Next, each remaining element is read one by
one. As a new element arrives, it is
ignored if it is smaller than the kth element
in the array. Otherwise, it is placed in its correct spot in the array, bumping one
element out of the array. When the algorithm ends, the element in the kth position
is returned as the answer.
Both algorithms are simple to code, and you are encouraged to do so. The natural questions, then, are which is better and, more important, is either algorithm good enough? A simulation using a random file of 1 million elements and k = 500,000 will show that neither algorithm finishes in a reasonable amount of time. Although each would eventually terminate with the correct answer, they cannot be considered good algorithms, because they are entirely impractical for input sizes that a third algorithm can handle in a reasonable amount of time.
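Both descriptions translate directly into C. A minimal sketch of the first algorithm follows; the function name, the use of a simple exchange sort, and the omission of error handling are my choices for illustration, not the book's code.

#include <stdlib.h>

/* Return the kth largest (k = 1 gives the largest) of the N values in A.
   A copy is sorted in decreasing order by a simple quadratic exchange
   sort, and the element in position k - 1 is returned. Fine for small N;
   as the text notes, hopeless for N = 1,000,000. */
int
KthLargest( const int A[ ], int N, int k )
{
    int i, j, Tmp, Answer;
    int *Copy = malloc( N * sizeof( int ) );

    for( i = 0; i < N; i++ )
        Copy[ i ] = A[ i ];
    for( i = 0; i < N - 1; i++ )
        for( j = i + 1; j < N; j++ )
            if( Copy[ j ] > Copy[ i ] )
            {
                Tmp = Copy[ i ];
                Copy[ i ] = Copy[ j ];
                Copy[ j ] = Tmp;
            }
    Answer = Copy[ k - 1 ];
    free( Copy );
    return Answer;
}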
A second problem is to solve a popular word puzzle. The input consists of a
two-dimensional array of letters and a list of words. The object is to find the words
in the puzzle. These words may be horizontal, vertical, or diagonal in any direction.
As an example, the puzzle shown in Figure 1.1 contains the words this, two, fat,
and that. The word this begins at row 1, column 1, or (1,1), and extends to (1,4); two goes from (1,1) to (3,1); fat goes from (4,1) to (2,3); and that goes from (4,4) to (1,1).
Again, there are at least two straightforward algorithms that solve the problem.
For each word in the word list, we check each ordered
triple (row, column,
orientation) for the presence of the word. This amounts to lots of nested for loops
but is basically straightforward.
Alternatively, for each ordered quadruple (row, column, orientation, number
of characters) that doesn’t run off an end of the puzzle, we can test whether the
word indicated is in the word list. Again, this amounts to lots of nested for loops. It
is possible to save some time if the maximum number of characters in any word is
known.
It is relatively easy to code up either method of solution and solve many of the
real-life puzzles commonly published in magazines. These typically have 16 rows, 16
columns, and 40 or so words. Suppose, however, we consider the variation where
only the puzzle board is given and the word list is essentially an English dictionary.
Both of the solutions proposed require considerable time to solve this problem and
therefore are not acceptable. However, it is possible, even with a large word list, to
solve the problem in a matter of seconds.
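A sketch of the test at the heart of the first method; the function name and the flattened board representation are my assumptions for illustration. Each of the eight orientations is a (DRow, DCol) pair drawn from {-1, 0, 1} x {-1, 0, 1}, excluding (0, 0).

/* Return 1 if Word occurs in the Rows x Cols board starting at
   (Row, Col) and stepping by (DRow, DCol); return 0 otherwise. */
int
WordMatches( const char *Board, int Rows, int Cols,
             int Row, int Col, int DRow, int DCol, const char *Word )
{
    while( *Word != '\0' )
    {
        if( Row < 0 || Row >= Rows || Col < 0 || Col >= Cols )
            return 0;     /* ran off an end of the puzzle */
        if( Board[ Row * Cols + Col ] != *Word )
            return 0;     /* letter mismatch */
        Row += DRow;
        Col += DCol;
        Word++;
    }
    return 1;             /* every letter matched */
}

The first algorithm calls such a test once for each (word, row, column, orientation) combination.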
An important concept is that, in many problems, writing a working program is
not good enough. If the program is to be run on a large data set, then the running
time becomes an issue. Throughout this book we will see how to estimate the
running time of a program for large inputs and, more important, how to compare
the running times of two programs without actually coding them. We will see
techniques for drastically improving the speed of a program and for determining
program bottlenecks. These techniques will enable us to find the section of the code
on which to concentrate our optimization efforts.
        1  2  3  4
    1   t  h  i  s
    2   w  a  t  s
    3   o  a  h  g
    4   f  g  d  t

Figure 1.1 A sample word puzzle
1.2. Mathematics Review

This section lists some of the basic formulas you need to memorize or be able to derive and reviews basic proof techniques.
1.2.1. Exponents
$X^A X^B = X^{A+B}$
$\frac{X^A}{X^B} = X^{A-B}$
$(X^A)^B = X^{AB}$
$X^N + X^N = 2X^N \neq X^{2N}$
$2^N + 2^N = 2^{N+1}$
1.2.2. Logarithms
In computer science, all logarithms are to the base 2 unless specified otherwise.
DEFINITION: $X^A = B$ if and only if $\log_X B = A$
THEOREM 1.1.
$\log_A B = \frac{\log_C B}{\log_C A}$;  $C > 0$
PROOF:
Let $X = \log_C B$, $Y = \log_C A$, and $Z = \log_A B$. Then, by the definition of logarithms, $C^X = B$, $C^Y = A$, and $A^Z = B$. Combining these three equalities yields $(C^Y)^Z = C^X = B$. Therefore, $X = YZ$, which implies $Z = X/Y$, proving the theorem.
THEOREM 1.2.
$\log AB = \log A + \log B$
PROOF:
Let $X = \log A$, $Y = \log B$, and $Z = \log AB$. Then, assuming the default base of 2, $2^X = A$, $2^Y = B$, and $2^Z = AB$. Combining the first two equalities yields $2^X 2^Y = 2^{X+Y} = AB = 2^Z$. Therefore, $X + Y = Z$, which proves the theorem.
Some other useful formulas, which can all be derived in a similar manner,
follow.
$\log A/B = \log A - \log B$
$\log(A^B) = B \log A$
$\log X < X$ for all $X > 0$
$\log 1 = 0$, $\log 2 = 1$, $\log 1{,}024 = 10$, $\log 1{,}048{,}576 = 20$
1.2.3. Series
The easiest formulas to remember are
$\sum_{i=0}^{N} 2^i = 2^{N+1} - 1$
and the companion,
$\sum_{i=0}^{N} A^i = \frac{A^{N+1} - 1}{A - 1}$
and as $N$ tends to $\infty$ (with $0 < A < 1$), the sum approaches $1/(1 - A)$. These are the "geometric series" formulas.
We can derive the last formula for $\sum_{i=0}^{\infty} A^i$ ($0 < A < 1$) in the following manner. Let $S$ be the sum. Then
$S = 1 + A + A^2 + A^3 + A^4 + A^5 + \cdots$
Then
$AS = A + A^2 + A^3 + A^4 + A^5 + \cdots$
If we subtract these two equations (which is permissible only for a convergent series), virtually all the terms on the right side cancel, leaving
$S - AS = 1$
which implies that $S = \frac{1}{1 - A}$.
We can use this same technique to compute $\sum_{i=1}^{\infty} i/2^i$, a sum that occurs frequently. We write
$S = \frac{1}{2} + \frac{2}{4} + \frac{3}{8} + \frac{4}{16} + \frac{5}{32} + \cdots$
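The scan breaks off here; a worked completion using the same subtract-and-cancel technique (my reconstruction, not the book's exact steps): multiplying by 2 shifts each term,

$2S = 1 + \frac{2}{2} + \frac{3}{4} + \frac{4}{8} + \frac{5}{16} + \cdots$

and subtracting the original series cancels term by term,

$2S - S = S = 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots = 2$

by the geometric series formula. Thus $S = 2$.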
One way to remember this is to add the first and last terms (total 3k + 1), the second and next-to-last terms (total 3k + 1), and so on. Since there are k/2 of these pairs, the total is $k(3k + 1)/2$.
The next two formulas pop up now and then but are fairly uncommon.
$\sum_{i=1}^{N} i^2 = \frac{N(N+1)(2N+1)}{6} \approx \frac{N^3}{3}$
$\sum_{i=1}^{N} i^k \approx \frac{N^{k+1}}{|k+1|}, \quad k \neq -1$
When $k = -1$, the latter formula is not valid. We then need the following formula, which is used far more in computer science than in other mathematical disciplines. The numbers $H_N$ are known as the harmonic numbers, and the sum is known as a harmonic sum. The error in the following approximation tends to $\gamma \approx 0.57721566$, which is known as Euler's constant.
$H_N = \sum_{i=1}^{N} \frac{1}{i} \approx \log_e N$
These two formulas are just general algebraic manipulations.
$\sum_{i=1}^{N} f(N) = N f(N)$
$\sum_{i=n_0}^{N} f(i) = \sum_{i=1}^{N} f(i) - \sum_{i=1}^{n_0 - 1} f(i)$
1.2.4. Modular Arithmetic

1.2.5. The P Word
The two most common ways of proving statements in data structure analysis
are proof by induction and proof by contradiction (and occasionally proof by
intimidation, used by professors only). The best way of proving that a theorem is
false is by exhibiting a counterexample.
Proof by Induction
A proof by induction has two standard parts. The first step is proving a base case, that is, establishing that a theorem is true for some small (usually degenerate) value(s); this step is almost always trivial. Next, an inductive hypothesis is assumed. Generally this means that the theorem is assumed to be true for all cases up to some limit k. Using this assumption, the theorem is then shown to be true for the next value, which is typically k + 1.
As an example, we prove that the Fibonacci numbers, $F_1 = 1$, $F_2 = 2$, $F_3 = 3$, $F_4 = 5$, ..., $F_i = F_{i-1} + F_{i-2}$, satisfy $F_i < (5/3)^i$ for $i \ge 1$. (Some definitions have $F_0 = 0$, which shifts the series.) To do this, we first verify that the theorem is true for the trivial cases. It is easy to verify that $F_1 = 1 < 5/3$ and $F_2 = 2 < 25/9$; this proves the basis. We assume that the theorem is true for $i = 1, 2, \ldots, k$; this is the inductive hypothesis. To prove the theorem, we need to show that $F_{k+1} < (5/3)^{k+1}$. We have
$F_{k+1} = F_k + F_{k-1}$
by the definition, and we can use the inductive hypothesis on the right-hand side, obtaining
$F_{k+1} < (5/3)^k + (5/3)^{k-1}$
$= (3/5)(5/3)^{k+1} + (3/5)^2(5/3)^{k+1}$
$= (3/5)(5/3)^{k+1} + (9/25)(5/3)^{k+1}$
which simplifies to
$F_{k+1} < (3/5 + 9/25)(5/3)^{k+1} = (24/25)(5/3)^{k+1} < (5/3)^{k+1}$
proving the theorem.
THEOREM 1.3.
If $N \ge 1$, then $\sum_{i=1}^{N} i^2 = \frac{N(N+1)(2N+1)}{6}$
PROOF:
The proof is by induction. For the basis, it is readily seen that the theorem is true when $N = 1$. For the inductive hypothesis, assume that the theorem is true for $1 \le k \le N$. We will establish that, under this assumption, the theorem is true for $N + 1$. We have
$\sum_{i=1}^{N+1} i^2 = \sum_{i=1}^{N} i^2 + (N + 1)^2$
Applying the inductive hypothesis, we obtain
$\sum_{i=1}^{N+1} i^2 = \frac{N(N+1)(2N+1)}{6} + (N+1)^2$
$= (N+1)\left[\frac{N(2N+1)}{6} + (N+1)\right]$
$= (N+1)\,\frac{2N^2 + 7N + 6}{6}$
$= \frac{(N+1)(N+2)(2N+3)}{6}$
Thus,
$\sum_{i=1}^{N+1} i^2 = \frac{(N+1)[(N+1)+1][2(N+1)+1]}{6}$
proving the theorem.
Proof by Counterexample
The statement $F_k \le k^2$ is false. The easiest way to prove this is to compute $F_{11} = 144 > 11^2$.
Proof by Contradiction
Proof by contradiction proceeds by assuming that the theorem is false and showing that this assumption implies that some known property is false, and hence the original assumption was erroneous. A classic example is the proof that there is an infinite number of primes. To prove this, we assume that the theorem is false, so that there is some largest prime $P_k$. Let $P_1, P_2, \ldots, P_k$ be all the primes in order and consider
$N = P_1 P_2 P_3 \cdots P_k + 1$
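The scan truncates the proof at this point; the standard completion (my paraphrase) runs as follows: $N$ is larger than $P_k$, so by assumption $N$ is not prime. However, none of $P_1, P_2, \ldots, P_k$ divides $N$, since dividing $N$ by any $P_i$ leaves a remainder of 1. This contradicts the fact that every integer greater than 1 has a prime divisor, so the assumption that there is a largest prime is erroneous.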
int
F( int X )
{
/* 1*/      if( X == 0 )
/* 2*/          return 0;
            else
/* 3*/          return 2 * F( X - 1 ) + X * X;
}
Most mathematical functions that we are familiar with are described by a simple
formula. For instance, we can convert temperatures from Fahrenheit to Celsius by
applying the formula
$C = 5(F - 32)/9$
Given this formula, it is trivial to write a C function; with declarations and braces
removed, the one-line formula translates to one line of C.
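For instance, a one-line translation might look like the sketch below (the function name is my choice for illustration):

double
Celsius( double Fahrenheit )
{
    return 5.0 * ( Fahrenheit - 32.0 ) / 9.0;   /* C = 5(F - 32)/9 */
}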
Mathematical functions are sometimes defined in a less standard form. As an example, we can define a function F, valid on nonnegative integers, that satisfies F(0) = 0 and F(X) = 2F(X - 1) + X^2. From this definition we see that F(1) = 1, F(2) = 6, F(3) = 21, and F(4) = 58. A function that is defined in terms of itself is called recursive.
There are several important and possibly confusing points about recursion. A common question is: Isn't this just circular logic? The answer is that although we are defining a function in terms of itself, we are not defining a particular instance of the function in terms of itself. In other words, evaluating F(5) by computing F(5) would be circular. Evaluating F(5) by computing F(4) is not circular—unless, of course, F(4) is evaluated by eventually computing F(5). The two most important issues are
Using recursion for numerical calculations is usually a bad idea. We have done so to illustrate the basic
points.
probably the how and why questions. In Chapter 3, the how and why issues are formally resolved. The recursive calls continue until the base case F(0) is reached, which allows the completion of the calculation for F(1), which is now seen to be 1. Then F(2),
F(3), and finally F(4) can be determined. All the bookkeeping needed to keep track
of pending function calls (those started but waiting for a recursive call to complete),
along with their variables, is done by the computer automatically. An important
point, however, is that recursive calls will keep on being made until a base case is
reached. For instance, an attempt to evaluate F(-1) will result in calls to F(-2), F(-3), and so on. Since this will never get to a base case, the program won't be able
to compute the answer (which is undefined anyway). Occasionally, a much more
subtle error is made, which is exhibited in Figure 1.3. The error in the program in
Figure 1.3 is that Bad(1) is defined, by line 3, to be Bad(1). Obviously, this doesn’t
give any clue as to what Bad(1) actually is. The computer will thus repeatedly
make calls to Bad(1) in an attempt to resolve its value. Eventually, its bookkeeping
system will run out of space, and the program will crash. Generally, we would say
that this function doesn’t work for one special case but is correct otherwise. This
isn’t true here, since Bad(2) calls Bad(1). Thus, Bad(2) cannot be evaluated either.
Furthermore, Bad(3), Bad(4), and Bad(5) all make calls to Bad(2). Since Bad(2)
is unevaluable, none of these values are either. In fact, this
program doesn’t work
for any value of N, except 0. With recursive programs, there is no such thing as a
"special case."
These considerations lead to the first two fundamental rules of recursion:
1. Base cases. You must always have some base cases, which can be solved
without recursion.
2. Making progress. For the cases that are to be solved recursively, the recursive call must always be to a case that makes progress toward a base case.
int
Bad( unsigned int N )
{
/* 1*/      if( N == 0 )
/* 2*/          return 0;
            else
/* 3*/          return Bad( N / 3 + 1 ) + N - 1;
}

Figure 1.3 A nonterminating recursive program
Words in a dictionary are defined in terms of other words. When we look up a word, we might not always
understand the definition, so we might have to look up words in the definition.
Likewise, we might not understand some of those, so we might have to continue
this search for a while. Because the dictionary is finite, eventually either (1) we will
come to a point where we understand all of the words in some definition (and thus
understand that definition and retrace our path through the other definitions) or
(2) we will find that the definitions are circular and we are stuck, or that some word
we need to understand for a definition is not in the dictionary.
Suppose we have a positive integer, N, that we wish to print out, and our routine will have the heading PrintOut(N). Assume that the only I/O routines available will
take a single-digit number and output it to the terminal. We will call this routine
PrintDigit; for example, PrintDigit(4) will output a 4 to the terminal.
Recursion provides a very clean solution to this problem. To print out 76234,
we need to first print out 7623 and then print out 4. The second step is easily
accomplished with the statement PrintDigit(N % 10), but the first doesn't seem any
simpler than the original problem. Indeed it is virtually the same problem, so we can solve it recursively with the statement PrintOut(N / 10). We still need to make sure that the program doesn't loop indefinitely. Since we haven't defined a base case yet, it is clear that we still have something to do. Our base case will be PrintDigit(N) if 0 ≤ N < 10, so that each recursive call is made on a strictly smaller number. (⌊X⌋ is the largest integer that is less than or equal to X.)
void
PrintOut( unsigned int N )  /* Print nonnegative N */
{
    if( N >= 10 )
        PrintOut( N / 10 );
    PrintDigit( N % 10 );
}
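The book leaves PrintDigit abstract; a minimal driver under that assumption (the putchar-based PrintDigit and the main function are mine) shows the two routines working together:

#include <stdio.h>

void
PrintDigit( unsigned int D )   /* assumed helper: writes one digit 0-9 */
{
    putchar( '0' + D );
}

void
PrintOut( unsigned int N )     /* Print nonnegative N */
{
    if( N >= 10 )
        PrintOut( N / 10 );
    PrintDigit( N % 10 );
}

int
main( void )
{
    PrintOut( 76234 );         /* prints 76234, the text's example */
    putchar( '\n' );
    return 0;
}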
THEOREM 1.4.
The recursive number-printing algorithm is correct for N ≥ 0.
PROOF:
First, if N has one digit, then the program is trivially correct, since it merely makes a call to PrintDigit. Assume then that PrintOut works for all numbers of k or fewer digits. A number of k + 1 digits is expressed by its first k digits followed by its least significant digit. The number formed by the first k digits is exactly ⌊N/10⌋, which, by the inductive hypothesis, is printed correctly, and the last digit is then printed by PrintDigit(N % 10). Thus, any (k + 1)-digit number is printed correctly, and by induction the routine is correct.
This proof mirrors the way the routine was designed: we assumed that the recursive call correctly solves the smaller problem and used that assumption to solve the current problem. The mathematical justification for this is proof by induction. This gives the third rule of recursion:
3. Design rule. Assume that all the recursive calls work.
This rule is important because it means that when designing recursive programs,
you generally don’t need to know the details of the bookkeeping arrangements, and
you don’t have to try to trace through the myriad of recursive calls. Frequently, it is
extremely difficult to track down the actual sequence of recursive calls. Of course,
in many cases this is an indication of a good use of recursion, since the computer is
being allowed to work out the complicated details.
The main problem with recursion is the hidden bookkeeping costs. Although
these costs are almost always justifiable, because recursive programs not only simplify
the algorithm design but also tend to give cleaner code, recursion should never be
used as a substitute for a simple for loop. We’ll discuss the overhead involved in
recursion in more detail in Section 3.3.
When writing recursive routines, it is crucial to keep in mind the four basic
rules of recursion:
1. Base cases. You must always have some base cases, which can be solved
without recursion.
2. Making progress. For the cases that are to be solved recursively, the recursive
call must always be to a case that makes progress toward a base case.
3. Design rule. Assume that all the recursive calls work.
4. Compound interest rule. Never duplicate work by solving the same instance
of a problem in separate recursive calls.
The fourth rule, which will be justified (along with its nickname) in later sections,
is the reason that it is generally a bad idea to use recursion to evaluate simple
mathematical functions, such as the Fibonacci numbers. As long as you keep these
rules in mind, recursive programming should be straightforward.
Summary
This chapter sets the stage for the rest of the book. The time taken by an algorithm
confronted with large amounts of input will be an important criterion for deciding if
it is a good algorithm. (Of course, correctness is most important.) Speed is relative.
What is fast for one problem on one machine might be slow for another problem or
a different machine. We will begin to address these issues in the next chapter and
will use the mathematics discussed here to establish a formal model.
Exercises
1.4 C allows statements of the form
#include filename
which reads filename and inserts its contents in place of the include statement.
Include statements may be nested; in other words, the file filename may itself
contain an include statement, but, obviously, a file can’t include itself in any
chain. Write a program that reads in a file and outputs the file as modified by
the include statements.
1.5 Prove the following formulas:
a. $\log X < X$ for all $X > 0$
b. $\log(A^B) = B \log A$
1.6 Evaluate the following sums:
a. $\sum_{i=0}^{\infty} \frac{1}{4^i}$
b. $\sum_{i=0}^{\infty} \frac{i}{4^i}$
**d. $\sum_{i=0}^{\infty} \frac{i^N}{4^i}$
1.7 Estimate
a. $\sum_{i=\lfloor N/2 \rfloor}^{N} \frac{1}{i}$
b. $\sum_{i=1}^{N} i^3 = \left(\sum_{i=1}^{N} i\right)^2$
References
There are many good textbooks covering the mathematics reviewed in this chapter.
A small subset is [1], [2], [3], [9], [10], and [11]. Reference [9] is specifically geared
toward the analysis of algorithms. It is the first volume of a three-volume series that
will be cited throughout this text. More advanced material is covered in [5].
Throughout this book we will assume a knowledge of C [8]. Occasionally, we add a feature where necessary for clarity. We also assume familiarity with pointers and recursion (the recursion summary in this chapter is meant to be a quick
review). We will attempt to provide hints on their use where appropriate throughout
the textbook. Readers not familiar with these should consult [12] or any good
intermediate programming textbook.
General programming style is discussed in several books. Some of the classics
are [4], [6], and [7].
CHAPTER 2
Algorithm Analysis
An algorithm is a clearly specified set of simple instructions to be followed to solve
a problem. Once an algorithm is given for a problem and decided (somehow) to be
correct, an important step is to determine how much in the way of resources, such
as time or space, the algorithm will require. An algorithm that solves a problem but
requires a year is hardly of any use. Likewise, an algorithm that requires a gigabyte
of main memory is not (currently) useful on most machines.
In this chapter, we shall discuss
• How to estimate the time required for a program.
• How to reduce the running time of a program from days or years to fractions of a second.
• The results of careless use of recursion.
• Very efficient algorithms to raise a number to a power and to compute the greatest common divisor of two numbers.

2.1. Mathematical Background

Throughout the book, we will use the following four definitions:
DEFINITION: $T(N) = O(f(N))$ if there are positive constants $c$ and $n_0$ such that $T(N) \le c f(N)$ when $N \ge n_0$.
DEFINITION: $T(N) = \Omega(g(N))$ if there are positive constants $c$ and $n_0$ such that $T(N) \ge c g(N)$ when $N \ge n_0$.
DEFINITION: $T(N) = \Theta(h(N))$ if and only if $T(N) = O(h(N))$ and $T(N) = \Omega(h(N))$.
DEFINITION: $T(N) = o(p(N))$ if $T(N) = O(p(N))$ and $T(N) \neq \Theta(p(N))$.
The idea of these definitions is to establish a relative order among functions. Given
two functions, there are usually points where one function is smaller than the other
function, so it does not make sense to claim, for instance, $f(N) < g(N)$. Thus,
we compare their relative rates of growth. When we apply this to the analysis of
algorithms, we shall see why this is the important measure.
Although $1{,}000N$ is larger than $N^2$ for small values of $N$, $N^2$ grows at a faster rate, and thus $N^2$ will eventually be the larger function. The turning point is $N = 1{,}000$ in this case. The first definition says that eventually there is some point $n_0$ past which $c \cdot f(N)$ is always at least as large as $T(N)$, so that if constant factors are ignored, $f(N)$ is at least as big as $T(N)$. In our case, we have $T(N) = 1{,}000N$, $f(N) = N^2$, $n_0 = 1{,}000$, and $c = 1$. We could also use $n_0 = 10$ and $c = 100$. Thus, we can say that $1{,}000N = O(N^2)$ (order $N$-squared). This notation is known as Big-Oh notation; frequently, instead of saying "order ...," one says "Big-Oh ...."
If we use the traditional inequality operators to compare growth rates, then the first definition says that the growth rate of $T(N)$ is less than or equal to ($\le$) that of $f(N)$. The second definition, $T(N) = \Omega(g(N))$ (pronounced "omega"), says that the growth rate of $T(N)$ is greater than or equal to ($\ge$) that of $g(N)$. The third definition, $T(N) = \Theta(h(N))$ (pronounced "theta"), says that the growth rate of $T(N)$ equals ($=$) the growth rate of $h(N)$. The last definition, $T(N) = o(p(N))$ (pronounced "little-oh"), says that the growth rate of $T(N)$ is less than ($<$) the growth rate of $p(N)$. This is different from Big-Oh, because Big-Oh allows the possibility that the growth rates are the same.
To prove that some function $T(N) = O(f(N))$, we usually do not apply these definitions formally but instead use a repertoire of known results. In general, this means that a proof (or determination that the assumption is incorrect) is a very simple calculation and should not involve calculus, except in extraordinary circumstances (not likely to occur in an algorithm analysis).
When we say that $T(N) = O(f(N))$, we are guaranteeing that the function $T(N)$ grows at a rate no faster than $f(N)$; thus $f(N)$ is an upper bound on $T(N)$. Since this implies that $f(N) = \Omega(T(N))$, we say that $T(N)$ is a lower bound on $f(N)$.
As an example, $N^3$ grows faster than $N^2$, so we can say that $N^2 = O(N^3)$ or $N^3 = \Omega(N^2)$. $f(N) = N^2$ and $g(N) = 2N^2$ grow at the same rate, so both $f(N) = O(g(N))$ and $f(N) = \Omega(g(N))$ are true. When two functions grow at the same rate, then the decision of whether or not to signify this with $\Theta()$ can depend on the particular context. Intuitively, if $g(N) = 2N^2$, then $g(N) = O(N^4)$, $g(N) = O(N^3)$, and $g(N) = O(N^2)$ are all technically correct, but obviously the last option is the best answer. Writing $g(N) = \Theta(N^2)$ says not only that $g(N) = O(N^2)$, but also that the result is as good (tight) as possible.
RULE 1:
If $T_1(N) = O(f(N))$ and $T_2(N) = O(g(N))$, then
(a) $T_1(N) + T_2(N) = \max(O(f(N)), O(g(N)))$,
(b) $T_1(N) * T_2(N) = O(f(N) * g(N))$.
Function     Name
c            Constant
log N        Logarithmic
log^2 N      Log-squared
N            Linear
N log N
N^2          Quadratic
N^3          Cubic
2^N          Exponential

Figure 2.1 Typical growth rates
RULE 2:
If $T(N)$ is a polynomial of degree $k$, then $T(N) = \Theta(N^k)$.
RULE 3:
$\log^k N = O(N)$ for any constant $k$. This tells us that logarithms grow very slowly.
This information is sufficient to arrange most of the common functions by
growth rate (see Figure 2.1).
Several points are in order. First, it is very bad style to include constants or low-order terms inside a Big-Oh. Do not say $T(N) = O(2N^2)$ or $T(N) = O(N^2 + N)$. In both cases, the correct form is $T(N) = O(N^2)$.
Second, we can always determine the relative growth rates of two functions $f(N)$ and $g(N)$ by computing $\lim_{N\to\infty} f(N)/g(N)$, using L'Hôpital's rule if necessary. The limit can have four possible values:
• The limit is 0: This means that $f(N) = o(g(N))$.
• The limit is $c \neq 0$: This means that $f(N) = \Theta(g(N))$.
• The limit is $\infty$: This means that $g(N) = o(f(N))$.
• The limit oscillates: There is no relation (this will not happen in our context).
Using this method almost always amounts to overkill. Usually the relation between $f(N)$ and $g(N)$ can be derived by simple algebra. For instance, if $f(N) = N \log N$ and $g(N) = N^{1.5}$, then to decide which of $f(N)$ and $g(N)$ grows faster, one really needs to determine which of $\log N$ and $N^{0.5}$ grows faster. This is like determining
(L'Hôpital's rule states that if $\lim_{N\to\infty} f(N) = \infty$ and $\lim_{N\to\infty} g(N) = \infty$, then $\lim_{N\to\infty} f(N)/g(N) = \lim_{N\to\infty} f'(N)/g'(N)$, where $f'(N)$ and $g'(N)$ are the derivatives of $f(N)$ and $g(N)$, respectively.)
which of $\log^2 N$ or $N$ grows faster. This is a simple problem, because it is already
known that N grows faster than any power of a log. Thus, g(N) grows faster than
$f(N)$.
One stylistic note: It is bad to say $f(N) \le O(g(N))$, because the inequality is implied by the definition. It is wrong to write $f(N) \ge O(g(N))$, which does not make sense.
2.2. Model
In order to analyze algorithms in a formal framework, we need a model of computation. Our model is basically a normal computer in which instructions are executed sequentially, with a repertoire of simple instructions (addition, multiplication, comparison, assignment); unlike the case with real computers, it takes exactly one time unit to do anything (simple). To be
reasonable, we will assume that, like a modern computer, our model has fixed-size
(say, 32-bit) integers and that there are no fancy operations, such as matrix inversion
or sorting, that clearly cannot be done in one time unit. We also assume infinite
memory.
This model clearly has some weaknesses. Obviously, in real life, not all operations
take exactly the same time. In particular, in our model one disk read counts
the same as an addition, even though the addition is typically several orders of
magnitude faster. Also, by assuming infinite memory, we never worry about page
faulting, which can be a real problem, especially for efficient algorithms.
Generally, the quantity required is the worst-case time, unless otherwise specified.
One reason for this is that it provides a bound for all input, including
particularly bad input, which an average-case analysis does not provide.
The other
reason is that average-case bounds are usually much more difficult to compute. In
some instances, the definition of "average" can affect the result. (For instance, what is average input for a sorting algorithm?)
As an example, in the next section we shall look at the maximum subsequence sum problem: given (possibly negative) integers $A_1, A_2, \ldots, A_N$, find the maximum value of $\sum_{k=i}^{j} A_k$. (The maximum subsequence sum is taken to be 0 if all the integers are negative.) There are many algorithms that solve it, and the performance of these algorithms varies drastically. We will discuss four algorithms to solve this problem. The running time on some computer (the exact computer is unimportant) for these algorithms is given in Figure 2.2.
There are several important things worth noting in this table. For a small amount of input, the algorithms all run in a blink of the eye, so if only a small amount of input is expected, there is little point in expending great effort on a clever algorithm.
Figure 2.2 Running times of several algorithms for maximum subsequence sum (in seconds)

Input Size      Algorithm 1   Algorithm 2   Algorithm 3   Algorithm 4
N = 10          0.00103       0.00045       0.00066       0.00034
N = 100         0.47015       0.01112      0.00486       0.00063
N = 1,000       448.77        1.1233        0.05843       0.00333
N = 10,000      NA            111.13        0.68631       0.03042
N = 100,000     NA            NA            8.0113        0.29832
Figure 2.3 Plot (N vs. seconds) of various maximum subsequence sum algorithms (Alg. 1 $O(N^3)$, Alg. 2 $O(N^2)$, Alg. 3 $O(N \log N)$, Alg. 4 $O(N)$; N from 0 to 100)

Figure 2.4 Plot (N vs. seconds) of various maximum subsequence sum algorithms (the same four algorithms; N from 0 to 10,000)
Generally, there are several algorithmic ideas, and we would like to eliminate the bad ones early, so an analysis is usually required. Furthermore, the ability to do
an analysis usually provides insight into designing efficient algorithms. The analysis
also generally pinpoints the bottlenecks, which are worth coding carefully.
To simplify the analysis, we will adopt the convention that there are no particular
units of time. Thus, we throw away leading constants. We will also throw away
low-order terms, so what we are essentially doing is computing a Big-Oh running
time. Since Big-Oh is an upper bound, we must be careful never to underestimate the
running time of the program. In effect, the answer provided is a guarantee that the
program will terminate within a certain time period. The program may stop earlier
than this, but never later.
Here is a simple program fragment to calculate $\sum_{i=1}^{N} i^3$:

int
Sum( int N )
{
    int i, PartialSum;

/* 1*/      PartialSum = 0;
/* 2*/      for( i = 1; i <= N; i++ )
/* 3*/          PartialSum += i * i * i;
/* 4*/      return PartialSum;
}
The analysis of this program is simple. The declarations count for no time. Lines 1 and 4 count for one unit each. Line 3 counts for four units per time executed (two multiplications, one addition, and one assignment) and is executed N times, for a total of 4N units. Line 2 has the hidden costs of initializing i, testing $i \le N$, and incrementing i. The total cost of all these is 1 to initialize, $N + 1$ for all the tests, and $N$ for all the increments, which is $2N + 2$. We ignore the costs of calling the function and returning, for a total of $6N + 4$. Thus, we say that this function is $O(N)$.
If we had to perform all this work every time we needed to analyze a program,
the task would quickly become infeasible. Fortunately, since we are giving the
answer in terms of Big-Oh, there are lots of shortcuts that can be taken without
affecting the final answer. For instance, line 3 is obviously an $O(1)$ statement (per execution), so it is silly to count precisely whether it is two, three, or four units; it does not matter. This leads to several general rules.

RULE 1—FOR LOOPS:
The running time of a for loop is at most the running time of the statements inside the for loop (including tests) times the number of iterations.
RULE 2—NESTED LOOPS:
Analyze these inside out. The total running time of a statement inside a group of nested loops is the running time of the statement multiplied by the product of the sizes of all the for loops.
As an example, the following program fragment is $O(N^2)$:

for( i = 0; i < N; i++ )
    for( j = 0; j < N; j++ )
        k++;
RULE 3—CONSECUTIVE STATEMENTS:
These just add (which means that the maximum is the one that counts).
As an example, the following program fragment, which has $O(N)$ work followed by $O(N^2)$ work, is also $O(N^2)$:

for( i = 0; i < N; i++ )
    A[ i ] = 0;
for( i = 0; i < N; i++ )
    for( j = 0; j < N; j++ )
        A[ i ] += A[ j ] + i + j;
RULE 4—IF/ELSE:
For the fragment
if( Condition )
    S1
else
    S2
the running time of an if/else statement is never more than the running time of the test plus the larger of the running times of S1 and S2.
long int
Factorial( int N )
{
    if( N <= 1 )
        return 1;
    else
        return N * Factorial( N - 1 );
}
This example is really a poor use of recursion. When recursion is properly used,
it is difficult to convert the recursion into a
simple loop structure. In this case, the
analysis will involve a recurrence relation that needs to be solved. To see what might
happen, consider the following program, which turns out to be a horrible use of
recursion:
long int
Fib( int N )
{
/* 1*/      if( N <= 1 )
/* 2*/          return 1;
            else
/* 3*/          return Fib( N - 1 ) + Fib( N - 2 );
}
At first glance, this seems like a very clever use of recursion. However, if the
program is coded up and run for values of N around 30, it becomes apparent that
this program is terribly inefficient. The analysis is fairly simple. Let T(N) be the
running time for the function Fib(N). If $N = 0$ or $N = 1$, then the running time is some constant value, which is the time to do the test at line 1 and return. We can say that $T(0) = T(1) = 1$. For $N \ge 2$, the time to execute the function is the constant work at line 1 plus the work at line 3. Line 3 consists of an addition and two function calls; the first function call is Fib(N - 1) and hence, by the definition of $T$, requires $T(N - 1)$ units of time. A similar argument shows that the second function call requires $T(N - 2)$ units of time. The total time required is then $T(N - 1) + T(N - 2) + 2$, where the 2 accounts for the work at line 1 plus the addition at line 3. Thus, for $N \ge 2$, we have the following formula for the running time of Fib(N):
$T(N) = T(N - 1) + T(N - 2) + 2$
Since $Fib(N) = Fib(N - 1) + Fib(N - 2)$, it is easy to show by induction that $T(N) \ge Fib(N)$. In Section 1.2.5, we showed that $Fib(N) < (5/3)^N$. A similar calculation shows that (for $N > 4$) $Fib(N) \ge (3/2)^N$, and so the running time of this program grows exponentially. This is about as bad as possible. By keeping a simple array and using a for loop, the running time can be reduced substantially.
This program is slow because there is a huge amount of redundant work being
performed, violating the fourth major rule of recursion (the compound interest rule),
which was presented in Section 1.3. Notice that the first call on line 3, Fib(N - 1),
actually computes Fib(N - 2) at some point. This information is thrown away
and recomputed by the second call on line 3. The amount of information thrown
away compounds recursively and results in the huge running time. This is perhaps
the finest example of the maxim "Don't compute anything more than once" and
should not scare you away from using recursion. Throughout this book, we shall
see outstanding uses of recursion.
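To make the array-and-loop fix mentioned above concrete, here is a minimal iterative sketch (my illustration, not the book's code; two scalars stand in for the array, since only the last two values are ever needed):

long int
FibIter( int N )
{
    long int Last = 1, NextToLast = 1, Answer = 1;
    int i;

    for( i = 2; i <= N; i++ )
    {
        Answer = Last + NextToLast;   /* Fib(i) = Fib(i-1) + Fib(i-2) */
        NextToLast = Last;
        Last = Answer;
    }
    return Answer;                    /* each value computed exactly once */
}

Each call runs in O(N) time, in contrast to the exponential recursive version.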