CS 61B Reader
Paul N. Hilfinger
University of California, Berkeley
Acknowledgments. Thanks to the following individuals for finding many of the
errors in earlier editions: Dan Bonachea, Michael Clancy, Dennis Hall, Joseph Hui,
Yina Jin, Zhi Lin, Amy Mok, Barath Raghavan, Yingssu Tsai, Emily Watt, Howard
Wu, and Zihan Zhou.
Copyright © 2000, 2001, 2002, 2004, 2005, 2006, 2007, 2008, 2009, 2011, 2012,
2013, 2014 by Paul N. Hilfinger. All rights reserved.
Contents

1 Algorithmic Complexity 7
  1.1 Asymptotic complexity analysis and order notation 9
  1.2 Examples 11
    1.2.1 Demonstrating “Big-Ohness” 13
  1.3 Applications to Algorithm Analysis 13
    1.3.1 Linear search 14
    1.3.2 Quadratic example 15
    1.3.3 Explosive example 15
    1.3.4 Divide and conquer 16
    1.3.5 Divide and fight to a standstill 17
  1.4 Amortization 18
  1.5 Complexity of Problems 20
  1.6 Some Properties of Logarithms 21
  1.7 A Note on Notation 22

3 Meeting a Specification 49
  3.1 Doing it from Scratch 52
  3.2 The AbstractCollection Class 52
  3.3 Implementing the List Interface 53
    3.3.1 The AbstractList Class 53

5 Trees 91
  5.1 Expression trees 93
  5.2 Basic tree primitives 94
  5.3 Representing trees 96
    5.3.1 Root-down pointer-based binary trees 96
    5.3.2 Root-down pointer-based ordered trees 96
    5.3.3 Leaf-up representation 97
    5.3.4 Array representations of complete trees 98
    5.3.5 Alternative representations of empty trees 99
  5.4 Tree traversals 100
    5.4.1 Generalized visitation 101
    5.4.2 Visiting empty trees 103
    5.4.3 Iterators on trees 104

7 Hashing 133
  7.1 Chaining 133
  7.2 Open-address hashing 134
  7.3 The hash function 138
  7.4 Performance 140

12 Graphs 219
  12.1 A Programmer’s Specification 220
  12.2 Representing graphs 221
    12.2.1 Adjacency Lists 221
    12.2.2 Edge sets 226
    12.2.3 Adjacency matrices 227
  12.3 Graph Algorithms 228
    12.3.1 Marking 228
    12.3.2 A general traversal schema 229
    12.3.3 Generic depth-first and breadth-first traversal 230
    12.3.4 Topological sorting 230
    12.3.5 Minimum spanning trees 231
    12.3.6 Single-source shortest paths 234
    12.3.7 A* search 236
    12.3.8 Kruskal’s algorithm for MST 239
Chapter 1
Algorithmic Complexity
The obvious way to answer the question “How fast does such-and-such a program
run?” is to use something like the UNIX time command to find out directly. There
are various possible objections to this easy answer. The time required by a program
is a function of the input, so presumably we have to time several instances of the
command and extrapolate the result. Some programs, however, behave fine for most
inputs, but sometimes take a very long time; how do we report (indeed, how can we
be sure to notice) such anomalies? What do we do about all the inputs for which we
have no measurements? How do we validly apply results gathered on one machine
to another machine?
The trouble with measuring raw time is that the information is precise, but
limited: the time for this input on this configuration of this machine. On a different
machine whose instructions take different absolute or relative times, the numbers
don’t necessarily apply. Indeed, suppose we compare two different programs for
doing the same thing on the same inputs and the same machine. Program A may
turn out faster than program B. This does not imply, however, that program A will
be faster than B when they are run on some other input, or on the same input, but
some other machine.
In mathematese, we might say that a raw time is the value of a function
Cr (I, P, M ) for some particular input I, some program P , and some “platform”
M (platform here is a catchall term for a combination of machine, operating sys-
tem, compiler, and runtime library support). I’ve invented the function Cr here to
mean “the raw cost of. . . .” We can make the figure a little more informative by
summarizing over all inputs of a particular size:

    Cw(N, P, M) = max_{|I|=N} Cr(I, P, M),

where |I| denotes the “size” of input I. How one defines the size depends on the
problem: if I is an array to be sorted, for example, |I| might denote I.length. We
say that Cw measures worst-case time of a program. Of course, since the number
of inputs of a given size could be very large (the number of arrays of 5 ints, for
example, is 2^160 > 10^48), we can’t directly measure Cw, but we can perhaps estimate
it with the help of some analysis of P . By knowing worst-case times, we can make
(aloud, this is “f (n) is in big-Oh of g(n)”) to mean that the function f is eventually
bounded by some multiple of |g(n)|; that is, |f (n)| ≤ K|g(n)| for all n > M,
[Figure 1.1 appears here: two graphs, (a) and (b), plotting |f (n)|, |h(n)|, |h′ (n)|, and 2|g(n)| against n.]
Figure 1.1: Illustration of big-Oh notation. In graph (a), we see that |f (n)| ≤ 2|g(n)|
for n > M , so that f (n) ∈ O(g(n)) (with K = 2). Likewise, h(n) ∈ O(g(n)),
illustrating that g can be a very over-cautious bound. The function f is also bounded
below by both g (with, for example, K = 0.5 and M any value larger than 0) and by
h. That is, f (n) ∈ Ω(g(n)) and f (n) ∈ Ω(h(n)). Because f is bounded above and
below by multiples of g, we say f (n) ∈ Θ(g(n)). On the other hand, h(n) ∉ Ω(g(n)).
In fact, assuming that g continues to grow as shown and h to shrink, h(n) ∈ o(g(n)).
Graph (b) shows that o(·) is not simply the set complement of Ω(·); h′ (n) ∉ Ω(g ′ (n)),
but h′ (n) ∉ o(g ′ (n)), either.
for some constants K > 0 and M . That is, O(g(n)) is the set of functions that
“grow no more quickly than” |g(n)| does as n gets sufficiently large. Somewhat
confusingly, f (n) here does not mean “the result of applying f to n,” as it usually
does. Rather, it is to be interpreted as the body of a function whose parameter is n.
Thus, we often write things like O(n²) to mean “the set of all functions that grow
no more quickly than the square of their argument.¹” Figure 1.1a gives an intuitive
idea of what it means to be in O(g(n)).
Saying that f (n) ∈ O(g(n)) gives us only an upper bound on the behavior of f .
For example, the function h in Figure 1.1a—and for that matter, the function that

¹If we wanted to be formally correct, we’d use lambda notation to represent functions (such as
Scheme uses) and write instead O(λn. n²), but I’m sure you can see how such a degree of rigor
would become tedious very soon.
It’s easy to see that if h(n) ∈ o(g(n)), then h(n) ∉ Ω(g(n)); no constant K can
work in the definition of Ω(·). It is not the case, however, that all functions that
are outside of Ω(g(n)) must be in o(g(n)), as illustrated in Figure 1.1b.
1.2 Examples
You may have seen the big-Oh notation already in calculus courses. For example,
Taylor’s theorem tells us² that (under appropriate conditions)

    f(x) = (x^n/n!)·f^[n](y) + Σ_{0≤k<n} (x^k/k!)·f^[k](0)
           [error term]        [approximation]
for some y between 0 and x, where f^[k] represents the kth derivative of f . Therefore,
if g(x) represents the maximum absolute value of f^[n] between 0 and x, then we
could also write the error term as

    f(x) − Σ_{0≤k<n} (x^k/k!)·f^[k](0)  ∈  O((x^n/n!)·g(x)) = O(x^n·g(x))
²Yes, I know it’s a Maclaurin series here, but it’s still Taylor’s theorem.
Table 1.1: Some examples of order relations. In the above, names other than n
represent constants, with ε > 0, 0 ≤ δ ≤ 1, p > 1, and k, k′ > 1.
for fixed n. This is, of course, a much weaker statement than the original (it allows
the error to be much bigger than it really is).
You’ll often see statements like this written with a little algebraic manipulation:

    f(x) ∈ Σ_{0≤k<n} (x^k/k!)·f^[k](0) + O(x^n·g(x)).
To make sense of this sort of statement, we define addition (and so on) between
functions (a, b, etc.) and sets of functions (A, B, etc.):
a + b = λx.a(x) + b(x)
A + B = {a + b | a ∈ A, b ∈ B}
A + b = {a + b | a ∈ A}
a + B = {a + b | b ∈ B}
Similar definitions apply for multiplication, subtraction, and division. So if a is √x
and b is lg x, then a + b is a function whose value is √x + lg x for every (positive)
x. O(a(x)) + O(b(x)) (or just O(a) + O(b)) is then the set of functions you can get
by adding a member of O(√x) to a member of O(lg x). For example, O(a) contains
5√x + 3 and O(b) contains 0.01 lg x − 16, so O(a) + O(b) contains 5√x + 0.01 lg x − 13,
among many others.
• Coming up with one or more functions that bound the quantity we’ve decided
to measure, usually in the worst case.
If we define Csort (N ) as the worst-case number of times the comparison x < A[j-1]
is executed for N = A.length, we see that for each value of i from 1 to A.length-1,
the program executes the comparison in the inner loop (on j) at most i times.
Therefore,
    Csort(N) = 1 + 2 + · · · + (N − 1)
             = N(N − 1)/2
             ∈ Θ(N²)
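The sort being counted here does not survive in this extraction. The following sketch (a reconstruction; the method name and details are assumptions) shows the kind of insertion sort whose inner-loop comparison x < A[j-1] the analysis refers to:

```java
class InsertionSort {
    /** Sort A into non-decreasing order.  The comparison
     *  x < A[j - 1] in the inner loop is the operation counted
     *  by Csort(N) in the analysis above. */
    static void sort(int[] A) {
        for (int i = 1; i < A.length; i += 1) {
            int x = A[i];
            int j;
            // For each i, this loop tests x < A[j - 1] at most i times.
            for (j = i; j > 0 && x < A[j - 1]; j -= 1)
                A[j] = A[j - 1];
            A[j] = x;
        }
    }
}
```

In the worst case (a reverse-sorted array), the inner loop runs exactly i times for each i, giving the sum 1 + 2 + · · · + (N − 1) above.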
and suppose we want to compute Cboom (M )—the number of times Q is called for
a given M in the worst case. If M = 0, this is 0. If M > 0, then Q gets executed
once in computing the argument of the first recursive call, and then it gets executed
however many times the two inner calls of boom with arguments of M − 1 execute
    = 2^M − 1

and so Cboom(M) ∈ Θ(2^M).
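The definition of boom itself is missing from this extraction. A hypothetical reconstruction consistent with the analysis (Q executes once to compute an argument, then boom recurses twice with M − 1) is:

```java
class Boom {
    static int qCalls = 0;  // number of times Q has executed

    /** Stand-in for the constant-time function Q being counted. */
    static int Q(int x) {
        qCalls += 1;
        return x + 1;
    }

    /** Hypothetical reconstruction: one call to Q, then two
     *  recursive calls with argument M - 1. */
    static int boom(int M, int X) {
        if (M == 0)
            return X;
        return boom(M - 1, boom(M - 1, Q(X)));
    }
}
```

With this structure the count of calls to Q satisfies Cboom(0) = 0 and Cboom(M) = 1 + 2·Cboom(M − 1), so boom(3, 0) executes Q exactly 2³ − 1 = 7 times.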
of L and U followed by the cost of executing isInB either with ⌊(N − 1)/2⌋ or with
⌈(N − 1)/2⌉ as the new value of N.⁵ Either quantity is no more than ⌈(N − 1)/2⌉.
If N ≤ 1, there are two comparisons against N in the worst case.
Therefore, the following recurrence describes the cost, CisInB(i), of executing
this function when U − L + 1 = i:

    CisInB(1) = 2
    CisInB(i) = 1 + CisInB(⌈(i − 1)/2⌉),   i > 1.
This is a bit hard to deal with, so let’s again make the reasonable assumption that
the value of the cost function, whatever it is, must increase as N increases. Then
we can compute a cost function, C′isInB, that is slightly larger than CisInB, but
easier to compute:

    C′isInB(1) = 2
    C′isInB(i) = 1 + C′isInB(i/2),   i > 1 a power of 2.

This is a slight over-estimate of CisInB, but that still allows us to compute upper
bounds. Furthermore, C′isInB is defined only on powers of two, but since isInB’s
cost increases as N increases, we can still bound CisInB(N) conservatively by
computing C′isInB of the next higher power of 2. Again with the massage:

    C′isInB(i) = 1 + C′isInB(i/2),      i > 1 a power of 2
               = 1 + 1 + C′isInB(i/4),  i > 2 a power of 2
               ...
               = 1 + 1 + · · · + 1 + 2,  with lg N 1’s in the sum.
The quantity lg N is the logarithm of N base 2, or roughly “the number of times one
can divide N by 2 before reaching 1.” In summary, we can say CisInB(N) ∈ O(lg N).
Similarly, one can in fact derive that CisInB(N) ∈ Θ(lg N).
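The body of isInB does not survive in this extraction; a sketch consistent with the recurrence (a reconstruction, with details assumed) is:

```java
class BSearch {
    /** True iff X is among A[L .. U], assuming A is in
     *  ascending order. */
    static boolean isInB(int X, int[] A, int L, int U) {
        if (L > U)
            return false;           // empty range: X absent
        int m = (L + U) / 2;        // middle position
        if (A[m] == X)
            return true;
        else if (A[m] < X)
            return isInB(X, A, m + 1, U);   // search upper part
        else
            return isInB(X, A, L, m - 1);   // search lower part
    }
}
```

Each call performs a constant number of comparisons and then recurses on a subrange of ⌊(N − 1)/2⌋ or ⌈(N − 1)/2⌉ elements, matching CisInB(i) = 1 + CisInB(⌈(i − 1)/2⌉).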
⁵The notation ⌊x⌋ means the result of rounding x down (toward −∞) to an integer, and ⌈x⌉
means the result of rounding x up to an integer.
We can approximate the arguments of both of the internal calls by N/2 as before,
ending up with the following approximation, Cmung (N ), to the cost of calling mung
with argument N = U − L + 1 (we are counting the number of times the test in the
first line executes).
    Cmung(1) = 1
    Cmung(i) = 1 + 2Cmung(i/2),   i > 1 a power of 2.

So,

    Cmung(i) = 1 + 2(1 + 2Cmung(i/4)) = · · · = 1 + 2 + 4 + · · · + 2^(lg i).

Using the fact that Σ_{0≤k≤n} r^k = (r^(n+1) − 1)/(r − 1), and taking r = 2, this gives
Cmung(N) = 2N − 1 when N is a power of 2; bounding an arbitrary N by the next
higher power of 2 gives at most 4N − 1. Either way, Cmung(N) ∈ Θ(N).
1.4 Amortization
So far, we have considered the time spent by individual operations, or individual
calls on a certain function of interest. Sometimes, however, it is fruitful to consider
the cost of whole sequence of calls, especially when each call affects the cost of later
calls.
Consider, for example, a simple binary counter. Incrementing this counter causes
it to go through a sequence like this:
0 0 0 0 0
0 0 0 0 1
0 0 0 1 0
0 0 0 1 1
0 0 1 0 0
···
0 1 1 1 1
1 0 0 0 0
···
Increment: Flip the bits of the counter from right to left, up to and including the
first 0-bit encountered (if any).
Clearly, if we are asked to give a worst-case bound on the cost of the increment
operation for an N -bit counter (in number of flips), we’d have to say that it is
Θ(N ): all the bits can be flipped. Using just that bound, we’d then have to say
that the cost of performing M increment operations is Θ(M · N ).
But the costs of consecutive increment operations are related. For example, if
one increment flips more than one bit, the next increment will always flip exactly
one (why?). In fact, if you consider the pattern of bit changes, you’ll see that the
units (rightmost) bit flips on every increment, the 2’s bit on every second increment,
the 4’s bit on every fourth increment, and in general, the 2^k’s bit on every (2^k)th
increment. Therefore, over any sequence of M consecutive increments, starting at
0, there will be
    M + ⌊M/2⌋ + ⌊M/4⌋ + · · · + ⌊M/2^n⌋,   where n = ⌊lg M⌋

(the successive terms count the unit’s flips, the 2’s flips, the 4’s flips, . . . , and the 2^n’s flips)

    = (2^n + 2^(n−1) + 2^(n−2) + · · · + 1) + (M − 2^n)
    = (2^(n+1) − 1) + (M − 2^n)
    = 2^n − 1 + M
    < 2M flips
In other words, this is the same result we would get if we performed M incre-
ments each of which had a worst-case cost of 2 flips, rather than N . We call 2 flips
the amortized cost of an increment. To amortize in the context of algorithms is to
treat the cost of each individual operation in a sequence as if it were spread out
among all the operations in the sequence.⁶ Any particular increment might take up
to N flips, but we treat that as N/M flips credited to each increment operation in
the sequence (and likewise count each increment that takes only one flip as 1/M flip
for each increment operation). The result is that we get a more realistic idea of how
much time the entire program will take; simply multiplying the ordinary worst-case
time by M gives us a very loose and pessimistic estimate. Nor is amortized cost the
same as average cost; it is a stronger measure. If a certain operation has a given
average cost, that leaves open the possibility that there is some unlikely sequence
of inputs that will make it look bad. A bound on amortized worst-case cost, on the
other hand, is guaranteed to hold regardless of input.
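The bound can be checked concretely with a small counter that tallies flips (an illustration, not from the text):

```java
/** An N-bit binary counter that counts the bit flips it performs. */
class FlipCounter {
    private final boolean[] bits;  // bits[length-1] is the unit's bit
    int flips;                     // total bit flips so far

    FlipCounter(int n) {
        bits = new boolean[n];
    }

    /** Flip bits from right to left, up to and including the
     *  first 0-bit encountered (if any). */
    void increment() {
        for (int k = bits.length - 1; k >= 0; k -= 1) {
            bits[k] = !bits[k];
            flips += 1;
            if (bits[k])           // we just flipped a 0 to 1: done
                return;
        }
    }
}
```

Performing M = 1000 increments of a 16-bit counter starting at 0 performs 1000 + 500 + 250 + · · · = 1994 flips in all, under the 2M = 2000 bound the amortized analysis predicts.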
Another way to reach the same result uses what is called the potential method.⁷
The idea here is that we associate with our data structure (our bit sequence in this
case) a non-negative potential that represents work we wish to spread out over
several operations. If ci represents the actual cost of the ith operation on our data
structure,

⁶The word amortize comes from an Old French word meaning “to death.” The original meaning,
from which the computer-science usage comes (introduced by Sleator and Tarjan), is “to gradually
write off the initial cost of something.”

⁷Also due to D. Sleator.
    a_i = c_i + Φ_(i+1) − Φ_i,    (1.1)

where Φ_i denotes the saved-up potential before the ith operation. That is, we give
ourselves the choice of increasing Φ a little on any given operation and charging this
increase against ai , causing ai > ci when Φ increases. Alternatively, we can also
decrease ai below ci by having an operation reduce Φ, in effect using up previously
saved increases. Assuming we start with Φ0 = 0, the total cost of n operations is
    Σ_{0≤i<n} c_i = Σ_{0≤i<n} (a_i + Φ_i − Φ_(i+1))      (1.2)
                  = (Σ_{0≤i<n} a_i) + Φ_0 − Φ_n
                  = (Σ_{0≤i<n} a_i) − Φ_n
                  ≤ Σ_{0≤i<n} a_i,
algorithm. But this tells us nothing about whether the best-possible algorithm is
any faster than this—it puts no lower bound on the time required for the best al-
gorithm. For example, the worst-case time for isIn is Θ(N ). However, isInB is
much faster. Indeed, one can show that if the only knowledge the algorithm can
have is the result of comparisons between X and elements of the array, then isInB
has the best possible bound (it is optimal), so that the entire problem of finding an
element in an ordered array has worst-case time Θ(lg N ).
Putting an upper bound on the time required to perform some problem simply
involves finding an algorithm for the problem. By contrast, putting a good lower
bound on the required time is much harder. We essentially have to prove that no
algorithm can have a better execution time than our bound, regardless of how much
smarter the algorithm designer is than we are. Trivial lower bounds, of course, are
easy: every problem’s worst-case time is Ω(1), and the worst-case time of any prob-
lem whose answer depends on all the data is Ω(N ), assuming that one’s idealized
machine is at all realistic. Better lower bounds than those, however, require quite
a bit of work. All the better to keep our theoretical computer scientists employed.
    lg xy = lg x + lg y
    lg x/y = lg x − lg y
    lg x^p = p lg x

It is strictly increasing and strictly concave, meaning that its values lie above any line
segment joining points (x, lg x) and (z, lg z). To put it algebraically, if 0 < x < y < z,
then

    lg y > ((z − y)/(z − x)) lg x + ((y − x)/(z − x)) lg z.

Therefore, if x, y > 0 and x + y = k, the value of lg x + lg y is maximized when x = y = k/2.
Exercises
1.1. Demonstrate the following, or give counter-examples where indicated. Show-
ing that a certain O(·) formula is true means producing suitable K and M for
the definition at the beginning of §1.1. Hint: sometimes it is useful to take the
logarithms of two functions you are comparing.
c. O(f (n) + g(n)) = O(f (n)) + O(g(n)). This is a bit of a trick question, really,
to make you look at the definitions carefully. Under what conditions is the
equation true?
d. There is a function f (x) > 0 such that f (x) ∉ O(x) and f (x) ∉ Ω(x).
e. There is a function f (x) such that f (0) = 0, f (1) = 100, f (2) = 10000, f (3) =
10^6, but f (n) ∈ O(n).
f. n³ lg n ∈ O(n^3.0001).
e. If f₁(x), f₂(x), . . . are a bunch of functions that are all in Ω(1), then

    F(N) = Σ_{1≤i≤N} |f_i(x)| ∈ Ω(N).
Chapter 2

Data Types in the Abstract
Most of the “classical” data structures covered in courses like this represent some
sort of collection of data. That is, they contain some set or multiset¹ of values,
possibly with some ordering on them. Some of these collections of data are asso-
ciatively indexed; they are search structures that act like functions mapping certain
indexing values (keys) into other data (such as names into street addresses).
We can characterize the situation in the abstract by describing sets of opera-
tions that are supported by different data structures—that is by describing possible
abstract data types. From the point of view of a program that needs to represent
some kind of collection of data, this set of operations is all that one needs to know.
For each different abstract data type, there are typically several possible imple-
mentations. Which you choose depends on how much data your program has to
process, how fast it has to process the data, and what constraints it has on such
things as memory space. It is a dirty little secret of the trade that for quite a few
programs, it hardly matters what implementation you choose. Nevertheless, the
well-equipped programmer should be familiar with the available tools.
I expect that many of you will find this chapter frustrating, because it will talk
mostly about interfaces to data types without talking very much at all about the
implementations behind them. Get used to it. After all, the standard library behind
any widely used programming language is presented to you, the programmer, as a
set of interfaces—directions for what parameters to pass to each function and some
commentary, generally in English, about what it does. As a working programmer,
you will in turn spend much of your time producing modules that present the same
features to your clients.
2.1 Iterators
If we are to develop some general notion of a collection of data, there is at least one
generic question we’ll have to answer: how are we going to get items out of such a
collection? You are familiar with one kind of collection already—an array. Getting
¹A multiset or bag is like a set except that it may contain multiple copies of a particular data
value. That is, each member of a multiset has a multiplicity: a number of times that it appears.
items out of an array is easy; for example, to print the contents of an array, you
might write
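The code example referred to here does not survive in this extraction; presumably it is an indexed loop along these lines (a reconstruction, not the original):

```java
class ArrayPrinter {
    /** Print the contents of A, separated by blanks. */
    static void print(String[] A) {
        // Fetch element number i directly; arrays make this easy.
        for (int i = 0; i < A.length; i += 1)
            System.out.print(A[i] + " ");
    }
}
```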
Arrays have a natural notion of an nth element, so such loops are easy. But what
about other collections? Which is the “first penny” in a jar of pennies? Even if
we do arbitrarily choose to give every item in a collection a number, we will find
that the operation “fetch the nth item” may be expensive (consider lists of things
such as in Scheme).
The problem with attempting to impose a numbering on every collection of items
as way to extract them is that it forces the implementor of the collection to provide
a more specific tool than our problem may require. It’s a classic engineering trade-
off: satisfying one constraint (that one be able to fetch the nth item) may have
other costs (fetching all items one by one may become expensive).
So the problem is to provide the items in a collection without relying on indices,
or possibly without relying on order at all. Java provides two conventions, realized as
interfaces. The interface java.util.Iterator provides a way to access all the items
in a collection in some order. The interface java.util.ListIterator provides a
way to access items in a collection in some specific order, but without assigning an
index to each item.²
that allocates and returns an Iterator (Figure 3.3 includes an example). Often the
actual type of this iterator will be hidden (even private); all the user of the class
needs to know is that the object returned by iterator provides the operations
hasNext and next (and sometimes remove). For example, a general way to print
all elements of a collection of Strings (analogous to the previous array printer)
might be
package java.util;
/** An object that delivers each item in some collection of items
* each of which is a T. */
public interface Iterator <T> {
/** True iff there are more items to deliver. */
boolean hasNext ();
/** Advance THIS to the next item and return it. */
T next ();
/** Remove the last item delivered by next() from the collection
* being iterated over. Optional operation: may throw
* UnsupportedOperationException if removal is not possible. */
void remove ();
}
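The printing loop this passage refers to is missing from this extraction; judging from the desugared enhanced for loop shown later in this section, it presumably resembles this sketch:

```java
import java.util.Collection;
import java.util.Iterator;

class CollectionPrinter {
    /** Print all elements of C, separated by blanks.  Works for
     *  any Collection, whatever its actual implementation. */
    static void print(Collection<String> C) {
        for (Iterator<String> i = C.iterator(); i.hasNext(); )
            System.out.print(i.next() + " ");
    }
}
```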
The programmer who writes this loop needn’t know what gyrations the object i
has to go through to produce the requested elements; even a major change in how
C represents its collection requires no modification to the loop.
This particular kind of for loop is so common and useful that in Java 2, version
1.5, it has its own “syntactic sugar,” known as an enhanced for loop. You can write
for (String i : C)
System.out.print (i + " ");
to get the same effect as the previous for loop. Java will insert the missing pieces,
turning this into
for (Iterator<String> ρ = C.iterator (); ρ.hasNext(); ) {
String i = ρ.next ();
System.out.print (i + " ");
}
where ρ is some new variable introduced by the compiler and unused elsewhere
in the program, and whose type is taken from that of C.iterator(). This en-
hanced for loop will work for any object C whose type implements the interface
java.lang.Iterable, defined simply
public interface Iterable<T> {
Iterator<T> iterator ();
}
Thanks to the enhanced for loop, simply by defining an iterator method on a type
you define, you provide a very convenient way to sequence through any subparts
that objects of that type might contain.
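For instance, here is an illustrative class (not from the text) whose objects work in an enhanced for loop solely because the class implements Iterable:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** The sequence of integers lo, lo+1, ..., hi-1. */
class IntRange implements Iterable<Integer> {
    private final int lo, hi;

    IntRange(int lo, int hi) {
        this.lo = lo;
        this.hi = hi;
    }

    /** An Iterator over the integers in THIS range, in order. */
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int next = lo;

            public boolean hasNext() {
                return next < hi;
            }

            public Integer next() {
                if (!hasNext())
                    throw new NoSuchElementException();
                return next++;
            }
        };
    }
}
```

With this definition, `for (int k : new IntRange(0, 4))` visits 0, 1, 2, and 3.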
Well, needless to say, having introduced this convenient shorthand for Iterators,
Java’s designers were suddenly in the position that iterating through the elements
of an array was much clumsier than iterating through those of a library class. So
they extended the enhanced for statement to encompass arrays. So, for example,
these two methods are equivalent:
/** The sum of the elements of A. */
int sum (int[] A) {
    int S;
    S = 0;
    for (int x : A)
        S += x;
    return S;
}

=⇒

/** The sum of the elements of A. */
int sum (int[] A) {
    int S;
    S = 0;
    for (int κ = 0; κ < A.length; κ += 1) {
        int x = A[κ];
        S += x;
    }
    return S;
}
where κ is a new variable introduced by the compiler.
package java.util;
/** Abstraction of a position in an ordered collection. At any
* given time, THIS represents a position (called its cursor )
* that is just after some number of items of type T (0 or more) of
* a particular collection, called the underlying collection. */
public interface ListIterator<T> extends Iterator<T> {
/* Exceptions: Methods that return items from the collection throw
* NoSuchElementException if there is no appropriate item. Optional
* methods throw UnsupportedOperationException if the method is not
* supported. */
/* Required methods: */
/** True unless THIS is past the last item of the collection */
boolean hasNext ();
/** True unless THIS is before the first item of the collection */
boolean hasPrevious ();
/** nextIndex () - 1. */
int previousIndex ();
/* Optional methods: */
/** Remove the item returned by the most recent call to .next ()
* or .previous (). There must not have been a more recent
* call to .add(). */
void remove ();
/** Replace the item returned by the most recent call to .next ()
* or .previous () with X in the underlying collection.
* There must not have been a more recent call to .add() or .remove(). */
void set (T x);
}
[Diagram: the interface Map, its subinterface SortedMap, and the abstract class AbstractMap.]
Figure 2.3: The Java library’s Map-related types (from java.util). Ellipses rep-
resent interfaces; dashed boxes are abstract classes, and solid boxes are concrete
(non-abstract) classes. Solid arrows indicate extends relationships, and dashed
arrows indicate implements relationships. The abstract classes are for use by
implementors wishing to add new collection classes; they provide default implemen-
tations of some methods. Clients apply new to the concrete classes to get instances,
and (at least ideally), use the interfaces as formal parameter types so as to make
their methods as widely applicable as possible.
[Diagram: the interface Collection with subinterfaces List, Set, and SortedSet; the abstract classes AbstractCollection, AbstractList, and AbstractSet; and concrete classes including LinkedList and Stack.]
Figure 2.4: The Java library’s Collection-related types (from java.util). See Fig-
ure 2.3 for the notation.
We have no idea what kinds of objects C0 and C1 are (they might be completely
different implementations of Collection), in what order their iterators deliver ele-
ments, or whether they allow repetitions. This method relies solely on the properties
described in the interface and its comments, and therefore always works (assuming,
as always, that the programmers who write classes that implement Collection
do their jobs). We don’t have to rewrite it for each new kind of Collection we
implement.
package java.util;
/** A collection of values, each an Object reference. */
public interface Collection<T> extends Iterable<T> {
/* Constructors. Classes that implement Collection should
* have at least two constructors:
* CLASS (): Constructs an empty CLASS
* CLASS (C): Where C is any Collection, constructs a CLASS that
* contains the same elements as C. */
/* Required methods: */
package java.util;
public interface Set<T> extends Collection<T> { }
The methods, that is, are all the same. The differences are all in the comments.
The one-copy-of-each-element rule is reflected in more specific comments on several
methods. The result is shown in Figure 2.7. In this definition, we also include the
methods equals and hashCode. These methods are automatically part of any inter-
face, because they are defined in the Java class java.lang.Object, but I included
them here because their semantic specification (the comment) is more stringent than
for the general Object. The idea, of course, is for equals to denote set equality.
We’ll return to hashCode in Chapter 7.
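The library’s concrete sets obey these stricter specifications; a small check (illustrative, using java.util.HashSet) confirms both the set-equality semantics of equals and the sum rule for hashCode:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SetSemantics {
    public static void main(String[] args) {
        // Set equality ignores insertion order.
        Set<String> s1 = new HashSet<>(Arrays.asList("a", "b"));
        Set<String> s2 = new HashSet<>(Arrays.asList("b", "a"));
        System.out.println(s1.equals(s2));         // true

        // A set's hashCode is the sum of its elements' hashCodes.
        int sum = "a".hashCode() + "b".hashCode();
        System.out.println(s1.hashCode() == sum);  // true
    }
}
```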
Views
package java.util;
/** A Collection that contains at most one null item and in which no
* two distinct non-null items are .equal. The effects of modifying
* an item contained in a Set so as to change the value of .equal
* on it are undefined. */
public interface Set<T> extends Collection<T> {
/* Constructors. Classes that implement Set should
* have at least two constructors:
* CLASS (): Constructs an empty CLASS
* CLASS (C): Where C is any Collection, constructs a CLASS that
* contains the same elements as C, with duplicates removed. */
/** The sum of the values of x.hashCode () for all x in THIS, with
* the hashCode of null taken to be 0. */
int hashCode ();
Figure 2.7: The interface java.util.Set. Only methods with comments that are
more specific than those of Collection are shown.
package java.util;
/** An ordered sequence of items, indexed by numbers 0 .. N-1,
* where N is the size() of the List. */
public interface List<T> extends Collection<T> {
/* Required methods: */
/** The Kth element of THIS, where 0 <= K < size(). Throws
* IndexOutOfBoundsException if K is out of range. */
T get (int k);
/* Optional methods: */
/** Cause item K of THIS to be X, and items K+1, K+2, ... to contain
* the previous values of get(K), get(K+1), .... Throws
* IndexOutOfBoundsException unless 0<=K<=size(). */
void add (int k, T x);
/** Same effect as add (size (), x); always returns true. */
boolean add (T x);
/** Remove item K, moving items K+1, ... down one index position,
* and returning the removed item. Throws
* IndexOutOfBoundsException if there is no item K. */
T remove (int k);
...
}
As a result, there are a lot of possible operations on List that don’t have to be
defined, because they fall out as a natural consequence of operations on sublists.
There is no need for a version of remove that deletes items i through j of a list, or
for a version of indexOf that starts searching at item k.
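Concretely (a sketch of my own using java.util.ArrayList, not from the text), each of these falls out of subList:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubListDemo {
    public static void main(String[] args) {
        List<Integer> L = new ArrayList<>(Arrays.asList(10, 11, 12, 13, 14, 15));
        // "Remove items i through j": clear the sublist view; the
        // change is reflected in the full list.
        L.subList(1, 4).clear();                  // removes items 1..3
        System.out.println(L);                    // [10, 14, 15]
        // "indexOf starting at item k": search a tail view and offset
        // the result by k.
        int k = 1;
        int i = L.subList(k, L.size()).indexOf(15);
        System.out.println(i < 0 ? -1 : k + i);   // 2
    }
}
```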
Iterators (including ListIterators) provide another example of a view of Col-
lections. Again, you can access or (sometimes) modify the current contents of a
Collection through an iterator that its methods supply. For that matter, any Col-
lection is itself a view—the “identity view” if you want.
Whenever there are two possible views of the same entity, there is a possibility
that using one of them to modify the entity will interfere with the other view. It’s
not just that changes in one view are supposed to be seen in other views (as in the
example of clearing a sublist, above), but straightforward and fast implementations
of some views may malfunction when the entity being viewed is changed by other
means. What is supposed to happen when you call remove on an iterator, but the
item that is supposed to be removed (according to the specification of Iterator)
has already been removed directly (by calling remove on the full Collection)? Or
suppose you have a sublist containing items 2 through 4 of some full list. If the full
list is cleared, and then 3 items are added to it, what is in the sublist view?
Because of these quandaries, the full specification of many view-producing
methods (in the List interface, these are iterator, listIterator, and subList)
has a provision that the view becomes invalid if the underlying List is
structurally modified (that is, if items are added or removed) through some
means other than that
view. Thus, the result of L.iterator() becomes invalid if you perform L.add(...),
or if you perform remove on some other Iterator or sublist produced from L. By
contrast, we will also encounter views, such as those produced by the values method
on Map (see Figure 2.12), that are supposed to remain valid even when the under-
lying object is structurally modified; it is an obligation on the implementors of new
kinds of Map that they see that this is so.
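In the library, this invalidation is typically enforced by "fail-fast" iterators, which throw ConcurrentModificationException when they detect a structural modification made behind their backs (an illustration of my own with ArrayList, not from the text):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

public class FailFastDemo {
    public static void main(String[] args) {
        List<String> L = new ArrayList<>(Arrays.asList("a", "b", "c"));
        Iterator<String> it = L.iterator();
        it.next();
        L.add("d");          // structural modification NOT through the view
        try {
            it.next();       // the iterator view is now invalid
        } catch (ConcurrentModificationException e) {
            System.out.println("iterator invalidated");
        }
    }
}
```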
package java.lang;
/** Describes types that have a natural ordering. */
public interface Comparable<T> {
/** Returns
* * a negative value iff THIS < Y under the natural ordering;
* * a positive value iff THIS > Y;
* * 0 iff THIS and Y are "equivalent".
* Throws ClassCastException if THIS and Y are incomparable. */
int compareTo (T y);
}
Figure 2.9: The interface java.lang.Comparable, which marks classes that define
a natural ordering.
package java.util;
/** An ordering relation on certain pairs of objects. */
public interface Comparator<T> {
/** Returns
* * a negative value iff X < Y according to THIS ordering;
* * a positive value iff X > Y;
* * 0 iff X and Y are "equivalent" under the order;
* Throws ClassCastException if X and Y are incomparable.
*/
int compare (T x, T y);
}
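The two interfaces show up together in sorting. String, for instance, implements Comparable (its natural ordering puts all uppercase letters before lowercase ones), while the library Comparator String.CASE_INSENSITIVE_ORDER imposes a different order on the same values (a small illustration of my own, not from the text):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class OrderingDemo {
    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("carol", "Bob", "alice"));
        // Natural ordering, via compareTo: uppercase precedes lowercase.
        Collections.sort(names);
        System.out.println(names);   // [Bob, alice, carol]
        // An explicit Comparator imposes a case-blind ordering instead.
        Collections.sort(names, String.CASE_INSENSITIVE_ORDER);
        System.out.println(names);   // [alice, Bob, carol]
    }
}
```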
package java.util;
public interface SortedSet<T> extends Set<T> {
/* Constructors. Classes that implement SortedSet should define
* at least the constructors
* CLASS (): An empty set ordered by natural order (compareTo).
* CLASS (CMP): An empty set ordered by the Comparator CMP.
* CLASS (C): A set containing the items in Collection C, in
* natural order.
* CLASS (S): A set containing a copy of SortedSet S, with the
* same order.
*/
/** A view of all items in THIS that are strictly less than X. */
SortedSet<T> headSet (T x);
/** A view of all items, y, in THIS such that X0 <= y < X1. */
SortedSet<T> subSet (T X0, T X1);
}
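The library's java.util.TreeSet implements SortedSet, so headSet and subSet can be tried directly; note that the results are views, so later changes to the full set show through (an illustration of my own, not from the text):

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

public class SortedSetDemo {
    public static void main(String[] args) {
        SortedSet<Integer> S = new TreeSet<>(Arrays.asList(5, 1, 9, 3, 7));
        System.out.println(S);               // [1, 3, 5, 7, 9] (natural order)
        System.out.println(S.headSet(7));    // [1, 3, 5]: strictly less than 7
        System.out.println(S.subSet(3, 9));  // [3, 5, 7]: 3 <= y < 9
        // headSet returns a view: an item later added to S appears in
        // any existing view whose range covers it.
        SortedSet<Integer> head = S.headSet(7);
        S.add(2);
        System.out.println(head);            // [1, 2, 3, 5]
    }
}
```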
2.4 An Example
Consider the problem of reading in a sequence of pairs of names, (ni , mi ). We wish
to create a list of all the first members, ni , in alphabetical order, and, for each of
them, a list of all names mi that are paired with them, with each mi appearing
once, and listed in the order of first appearance. Thus, the input
John Mary George Jeff Tom Bert George Paul John Peter
Tom Jim George Paul Ann Cyril John Mary George Eric
should produce the output
Ann: Cyril
George: Jeff Paul Eric
John: Mary Peter
Tom: Bert Jim
We can use some kind of SortedMap to handle the ni and for each, a List to handle
the mi . A possible method (taking a Reader as a source of input and a PrintWriter
as a destination for output) is shown in Figure 2.16.
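The method described above might be sketched as follows, using TreeMap for the SortedMap and ArrayList for the per-name lists (my choices for this sketch, not necessarily those of the text's Figure 2.16):

```java
import java.io.PrintWriter;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.SortedMap;
import java.util.TreeMap;

public class PairsExample {

    /** Read pairs of names from INP; print to OUT one line per distinct
     *  first name, in alphabetical order, listing the names paired with
     *  it in order of first appearance, without duplicates. */
    static void printPairs(Reader inp, PrintWriter out) {
        Scanner in = new Scanner(inp);
        SortedMap<String, List<String>> pairs = new TreeMap<>();
        while (in.hasNext()) {
            String n = in.next(), m = in.next();
            List<String> mates = pairs.get(n);
            if (mates == null) {
                mates = new ArrayList<>();
                pairs.put(n, mates);   // TreeMap keeps keys in natural order
            }
            if (!mates.contains(m)) {  // one copy of each m, in arrival order
                mates.add(m);
            }
        }
        for (Map.Entry<String, List<String>> e : pairs.entrySet()) {
            out.print(e.getKey() + ":");
            for (String m : e.getValue()) {
                out.print(" " + m);
            }
            out.println();
        }
        out.flush();
    }

    public static void main(String[] args) {
        String input = "John Mary George Jeff Tom Bert George Paul John Peter "
                     + "Tom Jim George Paul Ann Cyril John Mary George Eric";
        printPairs(new StringReader(input), new PrintWriter(System.out));
    }
}
```

Run on the sample input above, this prints the four lines shown, from "Ann: Cyril" through "Tom: Bert Jim".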
42 CHAPTER 2. DATA TYPES IN THE ABSTRACT
package java.util;
public interface Map<Key, Val> {
/* Constructors: Classes that implement Map should
* have at least two constructors:
* CLASS (): Constructs an empty CLASS
* CLASS (M): Where M is any Map, constructs a CLASS that
* denotes the same abstract mapping as M. */
/* Required methods: */
/* Optional methods: */
/** True iff E is a Map.Entry and both represent the same (key,value)
* pair (i.e., keys are both null, or are .equal, and likewise for
* values). */
boolean equals(Object e);
/** An integer hash value that depends only on the hashCode values
* of getKey() and getValue() according to the formula:
* (getKey() == null ? 0 : getKey().hashCode ())
* ^ (getValue() == null ? 0 : getValue.hashCode ()) */
int hashCode();
}
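The hashCode formula can be checked against java.util.AbstractMap.SimpleEntry, a library class that obeys the Map.Entry contract (a small illustration of my own, not from the text):

```java
import java.util.AbstractMap;
import java.util.Map;

public class EntryHashDemo {
    public static void main(String[] args) {
        Map.Entry<String, Integer> e =
            new AbstractMap.SimpleEntry<>("size", 4);
        // The Map.Entry contract: hash of key XORed with hash of value.
        int bySpec = "size".hashCode() ^ Integer.valueOf(4).hashCode();
        System.out.println(e.hashCode() == bySpec);   // true
        // A null key (or value) contributes 0 to the formula.
        Map.Entry<String, Integer> n =
            new AbstractMap.SimpleEntry<>(null, 4);
        System.out.println(n.hashCode() == (0 ^ Integer.valueOf(4).hashCode())); // true
    }
}
```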
package java.util;
public interface SortedMap<Key,Val> extends Map<Key,Val> {
/* Constructors: Classes that implement SortedMap should
* have at least four constructors:
* CLASS (): An empty map whose keys are ordered by natural order.
* CLASS (CMP): An empty map whose keys are ordered by the Comparator CMP.
* CLASS (M): A map that is a copy of Map M, with keys ordered
* in natural order.
* CLASS (S): A map containing a copy of SortedMap S, with
* keys obeying the same ordering.
*/
}
[Figure 2.16, the class Example containing this method, is not reproduced here.]
Unimplemented optional methods conventionally just throw
UnsupportedOperationException as the body of the operation. This provides an
elegant enough way not to implement something, but it raises an important
design issue. Throwing an exception is a
dynamic action. In general, the compiler will have no comment about the fact that
you have written a program that must inevitably throw such an exception; you will
discover only upon testing the program that the implementation you have chosen
for some data structure is not sufficient.
An alternative design would split the interfaces into smaller pieces: a
read-only interface (a ConstantList, say) containing the operations that every
implementation supports, extended by a full List interface that adds the
optional, modifying operations.
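Sketched concretely (my own names and sketch, not the text's listing), the split might look like this:

```java
import java.util.NoSuchElementException;

/** Iterator over a read-only view: note the absence of remove(). */
interface ConstantIterator<T> {
    boolean hasNext();
    T next();
}

/** The operations that every List implementation could support. */
interface ConstantList<T> {
    int size();
    T get(int k);
    ConstantIterator<T> constantIterator();
}

/** The full, mutable interface extends the read-only one, adding the
 *  modifying operations (and, as discussed below, it would also need
 *  separately named methods returning mutable views). */
interface FullList<T> extends ConstantList<T> {
    void add(int k, T x);
    T remove(int k);
}

/** A trivial array-backed implementation of just the read-only side. */
public class SplitDesignDemo<T> implements ConstantList<T> {
    private final T[] data;

    SplitDesignDemo(T[] data) { this.data = data; }
    public int size() { return data.length; }
    public T get(int k) { return data[k]; }

    public ConstantIterator<T> constantIterator() {
        return new ConstantIterator<T>() {
            private int i = 0;
            public boolean hasNext() { return i < data.length; }
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                return data[i++];
            }
        };
    }

    public static void main(String[] args) {
        ConstantList<String> L = new SplitDesignDemo<>(new String[] {"a", "b"});
        // Calling an unsupported operation is now a compile-time error:
        // L.add(0, "c") does not type-check, rather than throwing at run time.
        for (ConstantIterator<String> it = L.constantIterator(); it.hasNext(); ) {
            System.out.println(it.next());
        }
    }
}
```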
2.5. MANAGING PARTIAL IMPLEMENTATIONS: DESIGN OPTIONS 47
With such a design the compiler could catch attempts to call unsupported methods,
so that you wouldn’t need testing to discover a gap in your implementation.
However, such a redesign would have its own costs. It’s not quite as simple as
the listing above makes it appear. Consider, for example, the subList method in
ConstantList. Presumably, this would most sensibly return a ConstantList, since
if you are not allowed to alter a list, you cannot be allowed to alter one of its views.
That means, however, that the type List would need two subList methods (with
differing names), the one inherited from ConstantList, and a new one that produces
a List as its result, which would allow modification. Similar considerations apply
to the results of the iterator method; there would have to be two—one to return a
ConstantIterator, and the other to return Iterator. Furthermore, this proposed
redesign would not deal with an implementation of List that allowed one to add
items, or clear all items, but not remove individual items. For that, you would either
still need the UnsupportedOperationException or an even more complicated nest
of classes.
Evidently, the Java designers decided to accept the cost of leaving some problems
to be discovered by testing in order to simplify the design of their library. By
contrast, the designers of the corresponding standard libraries in C++ opted to
distinguish operations that work on any collections from those that work only on
“mutable” collections. However, they did not design their library out of interfaces;
it is awkward at best to introduce new kinds of collection or map in the C++ library.
Chapter 3
Meeting a Specification
import java.util.*;
import java.lang.reflect.Array;
public class ArrayCollection<T> implements Collection<T> {
private T[] data;