125 Problems in Text Algorithms, 1st Edition
Author(s): Maxime Crochemore, Thierry Lecroq, Wojciech Rytter
ISBN(s): 9781108835831, 110883583X
Edition: 1
File Details: PDF, 9.53 MB
Year: 2021
Language: English
125 Problems in Text Algorithms
String matching is one of the oldest algorithmic techniques, yet still one of the most
pervasive in computer science. The past 20 years have seen technological leaps in
applications as diverse as information retrieval and compression. This copiously
illustrated collection of puzzles and exercises in key areas of text algorithms and
combinatorics on words offers graduate students and researchers a pleasant and direct
way to learn and practice with advanced concepts.
The problems are drawn from a large range of scientific publications, both classic
and new. Building up from the basics, the book goes on to showcase problems in
combinatorics on words (including Fibonacci or Thue–Morse words), pattern
matching (including Knuth–Morris–Pratt and Boyer–Moore–like algorithms), efficient
text data structures (including suffix trees and suffix arrays), regularities in words
(including periods and runs) and text compression (including Huffman, Lempel–Ziv
and Burrows–Wheeler–based methods).
Maxime Crochemore, Gustave Eiffel University
Thierry Lecroq, University of Rouen Normandy
Wojciech Rytter, University of Warsaw
University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre,
New Delhi – 110025, India
79 Anson Road, #06–04/06, Singapore 079906
www.cambridge.org
Information on this title: www.cambridge.org/9781108835831
DOI: 10.1017/9781108869317
© Maxime Crochemore, Thierry Lecroq, Wojciech Rytter 2021
Illustrations designed by Hélène Crochemore
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2021
Printed in the United Kingdom by TJ Books Ltd, Padstow Cornwall
A catalogue record for this publication is available from the British Library.
Library of Congress Cataloging-in-Publication Data
Names: Crochemore, Maxime, 1947– author. | Lecroq, Thierry, author. |
Rytter, Wojciech, author.
Title: One twenty five problems in text algorithms / Maxime Crochemore,
Thierry Lecroq, Wojciech Rytter.
Other titles: 125 problems in text algorithms
Description: New York : Cambridge University Press, 2021. |
The numerals 125 are superimposed over “One twenty five” on the title page. |
Includes bibliographical references and index.
Identifiers: LCCN 2021002037 (print) | LCCN 2021002038 (ebook) |
ISBN 9781108835831 (hardback) | ISBN 9781108798853 (paperback) |
ISBN 9781108869317 (epub)
Subjects: LCSH: Text processing (Computer science)–Problems, exercises, etc. |
Computer algorithms–Problems, exercises, etc.
Classification: LCC QA76.9.T48 C758 2021 (print) |
LCC QA76.9.T48 (ebook) | DDC 005.13–dc23
LC record available at https://lccn.loc.gov/2021002037
LC ebook record available at https://lccn.loc.gov/2021002038
ISBN 978-1-108-83583-1 Hardback
ISBN 978-1-108-79885-3 Paperback
Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party internet websites referred to in this publication
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
Contents
Preface page ix
1 The Very Basics of Stringology 1
2 Combinatorial Puzzles 17
1 Stringologic Proof of Fermat’s Little Theorem 18
2 Simple Case of Codicity Testing 19
3 Magic Squares and the Thue–Morse Word 20
4 Oldenburger–Kolakoski Sequence 22
5 Square-Free Game 24
6 Fibonacci Words and Fibonacci Numeration System 26
7 Wythoff’s Game and Fibonacci Word 28
8 Distinct Periodic Words 30
9 A Relative of the Thue–Morse Word 33
10 Thue–Morse Words and Sums of Powers 34
11 Conjugates and Rotations of Words 35
12 Conjugate Palindromes 37
13 Many Words with Many Palindromes 39
14 Short Superword of Permutations 41
15 Short Supersequence of Permutations 43
16 Skolem Words 45
17 Langford Words 48
18 From Lyndon Words to de Bruijn Words 50
3 Pattern Matching 53
19 Border Table 54
20 Shortest Covers 56
21 Short Borders 58
22 Prefix Table 60
23 Border Table to the Maximal Suffix 62
24 Periodicity Test 65
25 Strict Borders 67
26 Delay of Sequential String Matching 70
27 Sparse Matching Automaton 72
28 Comparison-Effective String Matching 74
29 Strict Border Table of the Fibonacci Word 76
30 Words with Singleton Variables 78
31 Order-Preserving Patterns 81
32 Parameterised Matching 83
33 Good-Suffix Table 85
34 Worst Case of the Boyer–Moore Algorithm 88
35 Turbo-BM Algorithm 90
36 String Matching with Don’t Cares 92
37 Cyclic Equivalence 93
38 Simple Maximal Suffix Computation 96
39 Self-Maximal Words 98
40 Maximal Suffix and Its Period 100
41 Critical Position of a Word 103
42 Periods of Lyndon Word Prefixes 105
43 Searching Zimin Words 107
44 Searching Irregular 2D Patterns 110
7 Miscellaneous 275
108 Binary Pascal Words 276
109 Self-Reproducing Words 278
110 Weights of Factors 280
111 Letter-Occurrence Differences 282
112 Factoring with Border-Free Prefixes 283
113 Primitivity Test for Unary Extensions 286
114 Partially Commutative Alphabets 288
115 Greatest Fixed-Density Necklace 290
116 Period-Equivalent Binary Words 292
117 Online Generation of de Bruijn Words 295
118 Recursive Generation of de Bruijn Words 298
119 Word Equations with Given Lengths of Variables 300
120 Diverse Factors over a Three-Letter Alphabet 302
121 Longest Increasing Subsequence 304
122 Unavoidable Sets via Lyndon Words 306
123 Synchronising Words 309
124 Safe-Opening Words 311
125 Superwords of Shortened Permutations 314
Bibliography 318
Index 332
In this chapter we introduce basic notation and definitions of words and sketch
several constructions used in text algorithms.
Texts are central in ‘word processing’ systems, which provide facilities
for the manipulation of texts. Such systems usually process objects that are
quite large. Text algorithms occur in many areas of science and information
processing. Many text editors and programming languages have facilities for
processing texts. In molecular biology, for example, text algorithms arise in
the analysis of biological molecular sequences.
Words
The zero letter sequence is called the empty word and is denoted by ε. The
set of all finite words on an alphabet A is denoted by A∗ , and A+ = A∗ \ {ε}.
The length of a word x, that is, the length of its sequence of letters, is denoted by |x|. We
denote by x[i], for i = 0,1, . . . ,|x| − 1, the letter at position or index i
on a non-empty word x. Then x = x[0]x[1] · · · x[|x| − 1] is also denoted by
x[0 . . |x| − 1]. The set of letters that occur in the word x is denoted by alph (x).
For the example x = abaaab we have |x| = 6 and alph (x) = {a,b}.
The product or concatenation of two words x and y is the word composed
of the letters of x followed by the letters of y. It is denoted by xy or by x · y
to emphasise the decomposition of the resulting word. The neutral element for
the product is ε and we denote respectively by zy^−1 and x^−1z the words x and
y when z = xy.
A conjugate , rotation or cyclic shift of a word x is any word y that
factorises into vu, where uv = x. This makes sense because the product of
words is obviously non-commutative. For example, the set of conjugates of
abba, its conjugacy class because conjugacy is an equivalence relation, is
{aabb,abba,baab,bbaa} and that of abab is {abab,baba}.
A word x is a factor (sometimes called substring ) of a word y if y = uxv
for two words u and v. When u = ε, x is a prefix of y, and when v = ε, x
is a suffix of y. Sets Fact (x), Pref (x) and Suff (x) denote the sets of factors,
prefixes and suffixes of x respectively.
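These definitions translate directly into executable form. Below is a minimal Python sketch (ours, not the book's; the function names are illustrative), with words as plain strings:

```python
def conjugates(x):
    """Conjugacy class of x: all rotations vu of a factorisation x = uv."""
    return {x[k:] + x[:k] for k in range(len(x))} or {""}

def prefixes(x):
    """Pref(x), including the empty word and x itself."""
    return {x[:i] for i in range(len(x) + 1)}

def suffixes(x):
    """Suff(x), including the empty word and x itself."""
    return {x[i:] for i in range(len(x) + 1)}

def factors(x):
    """Fact(x): every prefix of every suffix of x."""
    return {f for s in suffixes(x) for f in prefixes(s)}
```

On the examples above, conjugates("abba") returns the conjugacy class {aabb, abba, baab, bbaa} and conjugates("abab") returns {abab, baba}.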
i                   0 1 2 3 4 5 6 7 8
y[i]                b a b a a b a b a
[Table: the occurrences of aba in y = babaababa have starting positions 1, 4, 6 and ending positions 3, 6, 8.]
For words x and y, |y|x denotes the number of occurrences of x in y. Then, for
instance, |y| = Σ{|y|a : a ∈ alph (y)}.
The word x is a subsequence or subword of y if the latter decomposes
into w0 x[0]w1 x[1] . . . x[|x| − 1]w|x| for words w0 , w1 , . . . , w|x| .
A factor or a subsequence x of a word y is said to be proper if x ≠ y.
Periodicity
[Figures: the periods 3, 6, 7 and 8 of the word aabaabaa, shown by aligning the word with shifted copies of itself; the setting of the Periodicity Lemma, with equal letters at positions i, i + p − q and i + p for two periods p and q; and further examples of periodic words.]
The extreme situation is displayed in the picture and shows (when generalised)
that the condition required on periods in the statement of the Periodicity lemma
cannot be weakened.
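Periods are easy to experiment with. The following Python sketch (ours, not the book's) lists all periods of a word naively and checks the Periodicity Lemma on any given word:

```python
from math import gcd

def periods(x):
    """All periods of x: those p, 1 <= p <= |x|, with x[i] == x[i + p]
    whenever both positions exist.  Quadratic, but fine for experiments."""
    n = len(x)
    return [p for p in range(1, n + 1)
            if all(x[i] == x[i + p] for i in range(n - p))]

def check_periodicity_lemma(x):
    """Whenever two periods p and q of x satisfy p + q - gcd(p, q) <= |x|,
    gcd(p, q) must also be a period of x."""
    ps = periods(x)
    return all(gcd(p, q) in ps
               for p in ps for q in ps if p + q - gcd(p, q) <= len(x))
```

For the word aabaabaa of the picture, periods returns [3, 6, 7, 8].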
Regularities
[Figure: the two occurrences of the primitive word abbaba in its square, and the four occurrences of the non-primitive word ababab in its square.]
The picture illustrates the result of the lemma. The word abbaba is
primitive and there are only two occurrences of it in its square, while ababab
is not primitive and has four occurrences in its square.
The notion of run or maximal periodicity encompasses several types of
regularities occurring in words. A run in the word x is a maximal occurrence
of a periodic factor. To say it more formally, it is an interval [i . . j ] of positions
on x for which exp(x[i . . j ]) ≥ 2 and both x[i − 1 . . j ] and x[i . . j + 1] have
periods larger than that of x[i . . j ] when they exist. In this situation, since the
occurrence is identified by i and j , we also say abusively that x[i . . j ] is a run.
Another type of regularity consists in the appearance of reverse factors or
of palindromes in words. The reverse or mirror image of the word x is the
word x R = x[|x| − 1]x[|x| − 2] · · · x[0]. Associated with this operation is the
notion of palindrome : a word x for which x R = x.
For example, noon and testset are English palindromes. The first is
an even palindrome of the form uuR while the second is an odd palindrome
of the form uauR with a letter a. The letter a can be replaced by a short
word, leading to the notion of gapped palindromes as useful when related to
folding operations like those occurring in sequences of biological molecules.
As another example, integers whose decimal expansion is an even palindrome
are multiples of 11, such as 1661 = 11 × 151 or 175571 = 11 × 15961.
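The divisibility fact can be checked mechanically. A small Python sketch (ours), relying on the classical alternating-sum rule for 11 (in an even palindrome, digits at mirror positions have opposite signs in the alternating sum, which is therefore 0):

```python
def is_palindrome(x):
    """x is a palindrome when it equals its reverse x^R."""
    return x == x[::-1]

def even_palindromic_multiple_of_11(n):
    """True when n is an even-length decimal palindrome; such numbers
    are multiples of 11 by the alternating-sum divisibility rule."""
    s = str(n)
    return len(s) % 2 == 0 and is_palindrome(s) and n % 11 == 0
```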
Ordering
Remarkable Words
Besides Lyndon words, three sets of words have remarkable properties and are
often used in examples. They are Thue–Morse words, Fibonacci words and de
Bruijn words. The first two are prefixes of (one-way) infinite words. Formally
an infinite word on the alphabet A is a mapping from natural numbers to A.
Their set is denoted by A∞ .
The notion of (monoid) morphism is central to defining some infinite sets
of words or an associated infinite word. A morphism from A∗ to itself (or
another free monoid) is a mapping h : A∗ → A∗ satisfying h(uv) = h(u)h(v)
for all words u and v. Consequently, a morphism is entirely defined by the
images h(a) of letters a ∈ A.
The Thue–Morse word is produced by iterating the Thue–Morse mor-
phism μ from {a,b}∗ to itself, defined by
μ(a) = ab,
μ(b) = ba.
Iterating the morphism from letter a gives the list of Thue–Morse words μ^k(a),
k ≥ 0, that starts with
τ0 = μ^0(a) = a
τ1 = μ^1(a) = ab
τ2 = μ^2(a) = abba
τ3 = μ^3(a) = abbabaab
τ4 = μ^4(a) = abbabaabbaababba
τ5 = μ^5(a) = abbabaabbaababbabaababbaabbabaab
The Fibonacci words are produced similarly by the Fibonacci morphism φ from
{a,b}∗ to itself, defined by
φ(a) = ab,
φ(b) = a.
Iterating the morphism from letter a gives the list of Fibonacci words φ^k(a),
k ≥ 0, that starts with
fib0 = φ^0(a) = a
fib1 = φ^1(a) = ab
fib2 = φ^2(a) = aba
fib3 = φ^3(a) = abaab
fib4 = φ^4(a) = abaababa
fib5 = φ^5(a) = abaababaabaab
fib6 = φ^6(a) = abaababaabaababaababa
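Both families arise by iterating their morphisms, which takes only a few lines of Python (a sketch of ours, not code from the book):

```python
def iterate_morphism(h, letter, k):
    """Compute h^k(letter) for a morphism h given by its images on letters."""
    w = letter
    for _ in range(k):
        w = "".join(h[c] for c in w)
    return w

mu = {"a": "ab", "b": "ba"}    # Thue-Morse morphism
phi = {"a": "ab", "b": "a"}    # Fibonacci morphism
```

For instance iterate_morphism(mu, "a", 3) returns abbabaab, the word τ3 listed above.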
Automata
[Figure: a deterministic automaton on the alphabet {a,b,c} with states 0, 1, 2, 3, 4.]
Trie
[Figure: a trie with states 0 to 6 on the alphabet {a,b}.]
Suffix Structures
Suffix structures that store the suffixes of a word are important data structures
used to produce efficient indexes. Tries can be used as such but their size can be
quadratic. One solution to cope with that is to compact the trie, resulting in the
Suffix tree of the word. It consists in eliminating non-terminal nodes with only
one outgoing edge and in labelling arcs by factors of the word accordingly.
Eliminated nodes are sometimes called implicit nodes of the Suffix tree and
remaining nodes are called explicit nodes.
Below are the trie T (Suff (aabab)) of suffixes of aabab (on the left)
and its Suffix tree ST (aabab) (on the right). To get a complete linear-size
structure, each factor of the word that labels an arc needs to be represented by
a pair of integers such as (position, length).
[Figure: the trie T(Suff(aabab)) of the suffixes of aabab (left) and its Suffix tree ST(aabab) (right), with nodes numbered 0 to 11 and tree arcs labelled by factors of the word.]
A second solution to reduce the size of the Suffix trie is to minimise it,
which means considering the minimal deterministic automaton accepting the
suffixes of the word, its Suffix automaton . Below (left) is S(aabab), the
Suffix automaton of aabab.
[Figure: the Suffix automaton S(aabab) (left), with states 0 to 6, and the Factor automaton F(aabab) (right), in which state 6 is merged with state 3.]
It is known that S(x) possesses fewer than 2|x| states and fewer than 3|x|
arcs, for a total size O(|x|), that is, linear in |x|. The Factor automaton F(x) of
the word, minimal deterministic automaton accepting its factors, can even be
smaller because all its states are terminal. In the above picture, the right part is
the Factor automaton of aabab in which state 6 of S(aabab) is merged with
state 3.
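The size bounds on S(x) can be checked with the classic online construction of the Suffix automaton (the standard textbook algorithm, sketched by us; it is not code from this book):

```python
def suffix_automaton(x):
    """Online construction of the Suffix automaton of x.
    Each state is a dict with its maximal length, suffix link and arcs."""
    sa = [{"len": 0, "link": -1, "next": {}}]
    last = 0
    for c in x:
        cur = len(sa)
        sa.append({"len": sa[last]["len"] + 1, "link": -1, "next": {}})
        p = last
        while p != -1 and c not in sa[p]["next"]:
            sa[p]["next"][c] = cur
            p = sa[p]["link"]
        if p == -1:
            sa[cur]["link"] = 0
        else:
            q = sa[p]["next"][c]
            if sa[p]["len"] + 1 == sa[q]["len"]:
                sa[cur]["link"] = q
            else:
                clone = len(sa)   # split state q
                sa.append({"len": sa[p]["len"] + 1, "link": sa[q]["link"],
                           "next": dict(sa[q]["next"])})
                while p != -1 and sa[p]["next"].get(c) == q:
                    sa[p]["next"][c] = clone
                    p = sa[p]["link"]
                sa[q]["link"] = clone
                sa[cur]["link"] = clone
        last = cur
    return sa

def accepts_factor(sa, w):
    """A word is a factor of x iff it labels a path from the initial state."""
    q = 0
    for c in w:
        if c not in sa[q]["next"]:
            return False
        q = sa[q]["next"][c]
    return True
```

For x = aabab the construction yields exactly 7 states, matching the picture, well within the 2|x| and 3|x| bounds.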
Suffix Array
The Suffix array of a word is also used to produce indexes but proceeds
differently than with trees or automata. It consists primarily in sorting the non-
empty suffixes of the word to allow binary search for its factors. To get actually
efficient searches another feature is considered: the longest common prefixes
of successive suffixes in the sorted list.
The information is stored in two arrays, SA and LCP. The array SA is
the inverse of the array Rank, which gives the rank of each suffix, attached
to its starting position.
Below are the tables associated with the example word aababa. Its sorted
list of suffixes is a, aababa, aba, ababa, ba and baba, whose starting
positions appear in the array SA.
i        0 1 2 3 4 5
x[i]     a a b a b a
Rank[i]  1 3 5 2 4 0

r        0 1 2 3 4 5 6 7 8 9 10 11 12
SA[r]    5 0 3 1 4 2
LCP[r]   0 1 1 3 0 2 0 0 1 0  0  0  0
The entry LCP[r], for 0 < r < |x|, is the length of
lcp(x[SA[r − 1] . . |x| − 1], x[SA[r] . . |x| − 1]),
where lcp denotes the longest common prefix between two words. This gives
LCP[0 . . 6] for the example. The next values in LCP[7 . . 12] correspond to the
same information for suffixes starting at positions d and f when the pair (d,f )
appears in the binary search. Formally, for such a pair, the value is stored at
position |x| + 1 + (d + f )/2. For example, in the above LCP array the value
1 corresponding to the pair (0,2), maximal length of prefixes between x[5 . . 5]
and x[3 . . 5], is stored at position 8.
The table Rank is used mainly in applications of the Suffix array other than
searching.
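For short words the three tables can be computed naively by sorting the suffixes directly. A Python sketch of ours (the book's constructions are far more efficient); it fills only the adjacent-rank part LCP[0 . . |x| − 1], leaving out the extra entries used for the binary-search pairs (d,f):

```python
def suffix_array(x):
    """Naive Suffix array: SA by sorting suffixes, Rank as its inverse,
    and LCP[r] as the lcp length of the suffixes of ranks r-1 and r."""
    n = len(x)
    SA = sorted(range(n), key=lambda i: x[i:])
    Rank = [0] * n
    for r, i in enumerate(SA):
        Rank[i] = r
    LCP = [0] * n
    for r in range(1, n):
        u, v = x[SA[r - 1]:], x[SA[r]:]
        while LCP[r] < min(len(u), len(v)) and u[LCP[r]] == v[LCP[r]]:
            LCP[r] += 1
    return SA, Rank, LCP
```

On aababa it reproduces the tables above.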
Compression
The most powerful compression methods for general texts are based either on
the Ziv–Lempel factorisation of words or on easier techniques on top of the
Burrows–Wheeler transform of words. We give a glimpse of both.
When processing a word online, the goal of the Ziv–Lempel compression
scheme is to capture information that has been met before. The associated
factorisation of a word x is u0 u1 · · · uk , where ui is the longest prefix of
ui · · · uk that appears before this occurrence in x. When it is empty, the first
letter of ui · · · uk , which does not occur in u0 · · · ui−1 , is chosen. The factor
ui is sometimes called abusively the longest previous factor at position
|u0 · · · ui−1 | on x.
For example, the factorisation of the word abaabababaaababb is a · b ·
a · aba · baba · aabab · b.
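The factorisation can be computed naively by scanning for earlier occurrences; a quadratic Python sketch of ours (not a production implementation), in which a previous occurrence may overlap the current factor:

```python
def lz_factorize(x):
    """Ziv-Lempel factorisation: ui is the longest prefix of the remainder
    with an occurrence starting earlier in x (overlaps allowed); when no
    such prefix exists, ui is the single fresh letter."""
    factors, i = [], 0
    while i < len(x):
        l = 0
        # str.find returns the leftmost occurrence; it is < i exactly
        # when the candidate factor already occurs starting before i.
        while i + l < len(x) and x.find(x[i:i + l + 1]) < i:
            l += 1
        factors.append(x[i:i + l] if l > 0 else x[i])
        i += max(l, 1)
    return factors
```

It reproduces the factorisation of the example word.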
There are several variations to define the factors of the decomposition; here
are a few of them. The factor ui may include the letter immediately following
the occurrence of the longest previous factor at position |u0 · · · ui−1 |, which
amounts to extending a factor occurring before. Previous occurrences of factors
may be chosen among the factors u0 , . . . , ui−1 or among all the factors of
u0 · · · ui−1 (to avoid an overlap between occurrences) or among all factors
occurring before. This results in a large variety of text compression software
based on the method.
When designing word algorithms the factorisation is also used to reduce
some online processing by storing what has already been done on previous
occurrences of factors.
The Burrows–Wheeler transform of a word x is a reversible mapping that
transforms x ∈ Ak into BW(x) ∈ Ak . The effect is mostly to group together
letters having the same context in x. The encoding proceeds as follows. Let us
consider the sorted list of rotations (conjugates) of x. Then BW(x) is the word
composed of the last letters of sorted rotations, referred to as the last column
of the corresponding table.
For the example word banana, rotations are listed below on the left and
their sorted list on the right. Then BW(banana) = nnbaaa.
   rotations       sorted rotations
0  banana          5  abanan
1  ananab          3  anaban
2  nanaba          1  ananab
3  anaban          0  banana
4  nabana          4  nabana
5  abanan          2  nanaba
Two conjugate words have the same image by the mapping. Choosing the
Lyndon word as a representative of the class of a primitive word, the mapping
becomes bijective. To recover the original word x other than a Lyndon word,
it is sufficient to keep the position on BW(x) of the first letter of x.
The main property of the transformation is that occurrences of a given letter
are in the same relative order in BW(x) and in the sorted list of all letters. This
is used to decode BW(x).
To do it on nnbaaa from the above example, we first sort the letters getting
the word aaabnn. Knowing that the first letter of the initial word appears at
position 2 on nnbaaa, we can start the decoding: the first letter is b followed
by letter a at the same position 2 on aaabnn. This is the third occurrence
of a in aaabnn corresponding to its third occurrence in nnbaaa, which is
followed by n, and so on.
The decoding process is similar to following the cycle in the graph below
from the correct letter. Starting from a different letter produces a conjugate of
the initial word.
BW(banana) n n b a a a
sorted letters a a a b n n
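Both directions of the transform fit in a few lines; a naive Python sketch of ours. Instead of the position of the first letter it keeps the row index of x in the sorted list of rotations, an equivalent piece of information, and inverts by the classical iterated-sorting method rather than the letter-matching walk described above:

```python
def bwt(x):
    """BW(x) is the last column of the sorted rotations of x; the row
    index of x itself is returned as the key for inversion."""
    rot = sorted(x[k:] + x[:k] for k in range(len(x)))
    return "".join(r[-1] for r in rot), rot.index(x)

def inverse_bwt(last, row):
    """Rebuild the sorted rotations column by column, then read off
    the row holding the original word.  Quadratic but simple."""
    n = len(last)
    table = [""] * n
    for _ in range(n):
        table = sorted(last[i] + table[i] for i in range(n))
    return table[row]
```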
The style of the algorithmic language used here is relatively close to real
programming languages but at a higher abstraction level. We adopt the
following conventions:
• Indentation marks the structure of blocks in compound instructions.
• Lines of code are numbered in order to be referred to in the text.
• A dedicated symbol introduces a comment.
• The access to a specific attribute of an object is signified by the name of
the attribute followed by the identifier associated with the object between
brackets.
• A variable that represents a given object (table, queue, tree, word, automa-
ton) is a pointer to this object.
• The arguments given to procedures or to functions are managed by the ‘call
by value’ rule.
• Variables of procedures and functions are local to them unless otherwise
mentioned.
• The evaluation of boolean expressions is performed from left to right in a
lazy way.
• Instructions of the form (m1,m2, . . .) ← (exp1,exp2, . . .) abbreviate the
sequence of assignments m1 ← exp1 , m2 ← exp2 , . . . .
Algorithm Trie below is an example of how algorithms are written. It
produces the trie of a dictionary X, a finite set of words. It successively
considers each word of X during the for loop of lines 2–10 and inserts them
into the structure letter by letter during execution of the for loop of lines 4–9.
When the latter loop is over, the last considered state t, ending the path from
the initial state and labelled by the current word, is set as terminal at line 10.
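The pseudocode of Algorithm Trie itself is not reproduced in this excerpt; the following Python sketch of ours follows the description above, with states represented as small dictionaries:

```python
def trie(X):
    """Build the trie of a dictionary X (a finite set of words), inserting
    each word letter by letter, as Algorithm Trie does."""
    initial = {"next": {}, "terminal": False}
    for word in X:
        t = initial
        for a in word:
            if a not in t["next"]:                      # create a new state
                t["next"][a] = {"next": {}, "terminal": False}
            t = t["next"][a]
        t["terminal"] = True   # the last state on the path is terminal
    return initial

def accepts(root, word):
    """True when word is in the dictionary stored in the trie."""
    t = root
    for a in word:
        if a not in t["next"]:
            return False
        t = t["next"][a]
    return t["terminal"]
```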
Notes
Basic elements on words introduced in this section follow their presentation
in [74]. They can be found in other textbooks on text algorithms, like those
by Crochemore and Rytter [96], Gusfield [134], Crochemore and Rytter [98]
and Smyth [228]. The notions are also introduced in some textbooks dealing
with the wider topics of combinatorics on words, such as those by Lothaire
[175–177], or in the tutorial by Berstel and Karhumäki [34].
2 Combinatorial Puzzles
In 1640 the great French number theorist Pierre de Fermat proved the following
property:
If p is a prime number and k is any natural number,
then p divides k^p − k.
Solution
To prove the property we consider conjugacy classes of words of the same
length. For example, the conjugacy class containing aaaba is the set
C(aaaba) = {aaaab,aaaba,aabaa,abaaa,baaaa}. The next fact is
a consequence of the Primitivity Lemma.
Notes
When a word w = u^q of length n on a k-letter alphabet has a primitive root u
of length d, we have n = qd and the conjugacy class of w contains d elements.
Running d over the divisors of n we get the equality k^n = Σ{d · ψk(d) :
d divisor of n}, where ψk(m) denotes the number of classes of primitive words
of length m. It proves the theorem when n is prime. Further details are in the
book by Lothaire [175, chapter 1].
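The counting identity can be verified mechanically once ψk is computed, for example by Möbius inversion; a Python sketch of ours (the helper names are illustrative):

```python
def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def mobius(n):
    """Moebius function, by trial division."""
    res = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # n has a square factor
            res = -res
        p += 1
    if n > 1:
        res = -res
    return res

def psi(k, d):
    """Number of conjugacy classes of primitive words of length d over a
    k-letter alphabet, by Moebius inversion of the identity in the text."""
    return sum(mobius(e) * k ** (d // e) for e in divisors(d)) // d
```

For a prime p the identity reduces to k^p = k + p·ψk(p), which is exactly Fermat's statement.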
2 Simple Case of Codicity Testing
Solution
A proof idea is given on page 5 as a consequence of the Periodicity Lemma.
Below is a self-contained inductive proof.
If {x,y} is a code, the conclusion follows by definition. Conversely, let us
assume {x,y} is not a code and prove the equality xy = yx. The equality
holds if one of the words is empty, so we are left to consider the two words are
not empty.
The proof is by induction on |xy|. The induction base is the
simple case x = y, for which the equality obviously holds.
Assume now that x ≠ y. Then one of the words is a proper prefix of the other;
assume w.l.o.g. that x is a proper prefix of y: y = xz for a non-empty
word z. Then {x,z} is not a code because the two distinct concatenations of x’s
and y’s producing the same word translate into two distinct concatenations of
x’s and z’s producing the word.
The inductive hypothesis applies because |xz| < |xy| and yields xz = zx.
Consequently xy = xxz = xzx = yx, which shows that the equality holds for
x and y, and completes the proof.
Notes
The same type of proof shows that {x,y} is not a code if x^k = y^ℓ for two
positive integers k and ℓ.
We do not know if there is a special codicity test for three words in terms of a
fixed set of inequalities. For a finite number of words, an efficient polynomial-
time algorithm using a graph-theoretical approach is given in Problem 52.
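The result gives a one-line codicity test for two words; a Python sketch of ours:

```python
def is_two_word_code(x, y):
    """{x, y} is a code exactly when xy != yx, that is, when x and y do
    not commute (equivalently, are not powers of a common word)."""
    return x + y != y + x
```

For instance {ab, a} is a code, while {ab, abab} and {aa, aaa} are not.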
3 Magic Squares and the Thue–Morse Word
The goal of the problem is to build magic squares with the help of the infinite
Thue–Morse word t on the binary alphabet {0,1} (instead of {a,b}). The word
t is μ∞ (0) obtained by iterating the morphism μ defined by μ(0) = 01 and
μ(1) = 10:
t = 01101001100101101001 · · · .
The n × n array Sn, where n = 2^m for a positive natural number m, is defined,
for 0 ≤ i,j < n, by
Sn[i,j] = t[k]·(k + 1) + (1 − t[k])·(n^2 − k),
where k = i·n + j. The generated array S4 is
16  2  3 13
 5 11 10  8
 9  7  6 12
 4 14 15  1
The array is a magic square because it contains all the integers from 1 to 16
and the sum of elements on each row is 34, as well as the sums on each column
and on each diagonal.
Question. Show that the n × n array Sn is a magic square for any n that is a
power of 2.
Solution
To understand the structure of the array Sn let Tn be the Thue–Morse
2-dimensional word of shape n × n, where n = 2^m, defined, for 0 ≤ i,j < n,
by Tn[i,j] = t[i·n + j]. The picture displays T4 and T8, where ∗ substitutes
for 0 and a blank substitutes for 1.
[Figure: the arrays T4 and T8, with ∗ for 0 and a blank for 1.]
Correctness for rows. According to property (i) each block in a row is of type
0110 or type 1001. Consider a block 0110 whose first element is the kth
element in the array. Then
S[k, k + 1, k + 2, k + 3] = [n^2 − k, k + 2, k + 3, n^2 − k − 3],
which sums to 2n^2 + 2. For a block whose type is 1001 we get
[k + 1, n^2 − k − 1, n^2 − k − 2, k + 4], whose sum is the same value. Since we
have n/4 such blocks in a row, the sum of all their contributions is
(n/4) · (2n^2 + 2) = (n/2) · (n^2 + 1),
as required.
The correctness for columns can be shown similarly.
Correctness for diagonals. Let us consider only the diagonal from (0,0) to
(n − 1,n − 1) since the other diagonal can be treated similarly. Entries on the
diagonal are 1, 1 + (n + 1), 1 + 2(n + 1), . . . , 1 + (n − 1)(n + 1), listed bottom-up.
Their sum is
n + (n + 1) · Σ_{i=0}^{n−1} i = n + (n + 1) · (n/2) · (n − 1) = (n/2) · (n^2 + 1),
as required.
This achieves the proof that Sn is a magic square.
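The construction and its magic property can be verified mechanically; a Python sketch of ours, using the standard binary-digit-sum characterisation of the Thue–Morse word (t[k] is the parity of the number of 1s in the binary expansion of k):

```python
def thue_morse_bits(N):
    """First N letters of t on {0,1}: t[k] = parity of ones in binary k."""
    return [bin(k).count("1") % 2 for k in range(N)]

def magic_square(n):
    """The array Sn of the problem, for n a power of 2:
    Sn[i,j] = t[k](k+1) + (1 - t[k])(n^2 - k) with k = i*n + j."""
    t = thue_morse_bits(n * n)
    return [[t[i * n + j] * (i * n + j + 1)
             + (1 - t[i * n + j]) * (n * n - (i * n + j))
             for j in range(n)]
            for i in range(n)]
```

It reproduces S4 above and checks the magic sums for S8 as well.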
Notes
More on magic squares and their long history may be found on Wikipedia:
https://en.wikipedia.org/wiki/Magic_square.
4 Oldenburger–Kolakoski Sequence
Question. Show that we can generate online the first n symbols of the
sequence K in O(n) time and O(log n) space.
Solution
As h is defined, h(x) = y if and only if y starts with 2 and blocks(y) = x, where blocks(y) denotes the sequence of lengths of the blocks (runs of equal letters) of y.
How to generate h^{k+1}(2) from h^k(2). Let x = h^k(2). Then y = h^{k+1}(2) =
h(x) results from replacing the letter x[i] of x either by x[i] occurrences of letter
2 if i is even or by x[i] occurrences of letter 1 if i is odd. The word K is the
limit of Kk = h^k(2) when k goes to infinity. The first iterations of h give
h(2)   = 22
h^2(2) = 22 11
h^3(2) = 22 11 2 1
h^4(2) = 22 11 2 1 22 1
We leave for the reader the following technical fact.
Observation. n = O(log |Kn|) and Σ_{k=0}^{n} |Kk| = O(|Kn|).
Let T be the parsing tree associated with Kn . Its leaves correspond to
positions on Kn . For a position i, 0 ≤ i < |Kn |, RightBranch(i) denotes
the path from the ith leaf upwards to the first node on the leftmost branch of
the tree (see picture).
[Figure: the parsing tree, whose level k spells h^k(2) for k = 1, . . . , 6, the last level being 2211212212211211221211.]
The figure illustrates the parsing tree of K6 = h6 (2). Each level represents
hk (2) for k = 0,1, . . . ,6. The RightBranch of position 10 (circled leaf)
consists of the thick edges and their endpoints. It starts from the leaf and goes
up to finish at the first node on the leftmost branch.
To every node on the RightBranch is attached one bit of information: the parity
of the number of nodes to its left on its level.
If for each node we know its label and whether it is a left child, then
from RightBranch(i) the symbol at position (i + 1) as well as the whole
RightBranch(i + 1) are computed in logarithmic space and amortised constant
time due to the observation (since lengths of paths are logarithmic and the size
of the whole tree is linear). The process works as follows on a suffix of the
RightBranch. It goes up the tree to find the first left child, then goes down to
the right from its parent and continues until it reaches the next leaf. Basically
it goes up to the lowest common ancestor of leaves i and i + 1 and in a certain
sense each iteration can be seen as an in-order traversal of the parsing tree
using small memory.
The RightBranch may grow upwards, as happens when changing
RightBranch(13) to RightBranch(14) in the example. This is a top-level
description of the algorithm and technical details are omitted.
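A naive generator based directly on the defining property blocks(K) = K is easy to write; the Python sketch below (ours) uses O(n) space and so does not achieve the O(log n) space bound of the algorithm described above:

```python
def kolakoski(n):
    """First n symbols of K (the variant starting with 2), generated by
    reading K itself as its own sequence of run lengths."""
    K = [2, 2]             # run number 1: K[0] = 2 copies of the symbol 2
    j = 1                  # K[j] is the length of run number j + 1
    while len(K) < n:
        symbol = 2 if j % 2 == 0 else 1   # runs alternate 2, 1, 2, 1, ...
        K.extend([symbol] * K[j])
        j += 1
    return K[:n]

def blocks(w):
    """Sequence of run lengths of w (the operation blocks of the problem)."""
    runs, i = [], 0
    while i < len(w):
        j = i
        while j < len(w) and w[j] == w[i]:
            j += 1
        runs.append(j - i)
        i = j
    return runs
```

The prefix of length 22 equals h^6(2) shown in the parsing-tree figure.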
Notes
The Oldenburger–Kolakoski sequence, often referred to as just the Kolakoski
sequence, was designed by Oldenburger [197] and later popularised by
Kolakoski [166]. The sequence is an example of a smooth word, see [46]. Our
sketch of the algorithm is a version of the algorithm by Nilsson [195]; see also
https://en.wikipedia.org/wiki/Kolakoski_sequence.
5 Square-Free Game
Ann behaves like a finite deterministic automaton whose output has six states.
A possible sequence of moves starting with 1 2, potentially winning for
Ann, is
1 2 1 2 2 0 1 0 0 2 1 2 2 0.
Question. (A) Show that Ann always wins against Ben in the odd square-
free game of any even length n.
(B) Describe a winning strategy for Ann in the square-free game over an
alphabet of size 9.
[Hint: To prove (A) show w contains no odd-square. For point (B) mix a
simple even-square strategy with the former strategy.]
Solution
Point (A). We show by contradiction that Ann's strategy is winning: assume
that the word w (the history of the game) contains an odd-square uu
(|u| > 1).
where the letters bi and bj correspond to Ben’s moves and the others to Ann’s
moves.
Since uu is a square we get b0 = a0 , a1 = b1 , . . . , bk = ak . Due to Ann’s
strategy we have a1 ≠ b0 , a2 ≠ b1 , etc.; that is, each two adjacent letters in uu
are distinct. In particular, this implies that Ben never repeats the last move of
Ann in uu.
Consequently all moves of Ann are the same; that is, all letters ai , aj are the
same. Hence ak = ak but at the same time ak = bk since uu is a square. This
implies bk = ak and that Ben repeats the last move of Ann, a contradiction.
This completes the proof for this case.
where as before the letters bi ,bj correspond to Ben’s moves and the others to
Ann’s moves.
Similarly to the previous case we can prove that Ben always makes a move
different from the last move of Ann, except that it can happen that ak = b0 .
If so, a1 = bk , since a1 = 3 − ak − bk , and later a1 = a2 = · · · = ak .
Consequently ak = bk but at the same time ak ≠ bk , since uu is a square, a
contradiction.
If ak ≠ b0 , all moves of Ben are different from those of Ann, who
consequently always does the same move in uu. This leads to a contradiction
in the same way as in case 1.
This completes the proof of this case and shows that Ann’s strategy is
winning.
Point (B). If the game concerns non-trivial even squares on the alphabet
{0,1,2} a winning strategy for Ann is extremely simple: in her kth move
she adds the kth letter of any (initially fixed) square-free word over the same
alphabet.
Combining in a simple way strategies (using them simultaneously) for non-
trivial odd and even square-free games, Ann gets a winning strategy avoiding
general non-trivial squares on a 9-letter alphabet. The alphabet now consists of
pairs (e,e′) of letters in {0,1,2}. The history of the game is a word of the form
Notes
The solution of the game presented in the problem is described in [132], where
the number of letters was additionally decreased to 7 using more complicated
arguments. However, a flaw was discovered by Kosinski et al.; see [169], where
the number of letters is reduced just to 8.
6 Fibonacci Words and Fibonacci Numeration System
Let pos (k,c), k > 0, denote the position of the kth occurrence of letter c in
the infinite Fibonacci word f.
Question. Show how to compute the position of the kth occurrence of letter
a in the Fibonacci word f in time O(log k). The same applies for the letter b.
Solution
To understand the structure of Fibonacci representations, let us consider the
rectangle Rn whose rows are representations of the first | fibn | = Fn+2 natural
numbers. Representations are possibly right padded with 0’s to get n digits.
The rectangles are given by the recurrence shown in the picture below.
0
0 0 0 .
Rn+1
0 0 1 0 0 .
0
R1 = R2 = 1 0 R3 = 0 1 0 Rn+2 =
1
0 1 0 0 1 0 1
1 0 1 Rn . .
Answer to the second question. The limit of tables Rn is the infinite table R∞
of Fibonacci representations of all consecutive natural numbers in increasing
order. In each row, letters to the right of the rightmost occurrence of 1 are
non-significant digits equal to zero.
Zeros in the first column of R∞ correspond to a’s in the Fibonacci word.
Rows starting with 0’s are of the form
0 · x0, 0 · x1, 0 · x2, . . . ,
where
x0, x1, x2, . . .
is the sequence of representations of consecutive natural numbers.
Hence the kth zero corresponds to xk−1 and occurs at position 0 · xk−1 ,
which gives r(pos(k,a)) = 0 · r(k − 1).
Similarly we get r(pos(k,b)) = 10 · r(k − 1), since all rows containing 1
in the first column of R∞ start in fact with 10.
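The two formulas can be tested against the word f itself; a Python sketch of ours, with representations written least significant digit first over the weights F2 = 1, F3 = 2, F4 = 3, F5 = 5, . . . as in the rectangles above:

```python
def fib_word(n):
    """Prefix of length n of the infinite Fibonacci word f."""
    w = "a"
    while len(w) < n:
        w = "".join({"a": "ab", "b": "a"}[c] for c in w)
    return w[:n]

def fib_repr(n):
    """Fibonacci (Zeckendorf) representation of n, least significant
    digit first, computed greedily from the largest weight."""
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    digits = [0] * (len(fibs) - 1)
    for i in range(len(fibs) - 2, -1, -1):
        if fibs[i] <= n:
            digits[i] = 1
            n -= fibs[i]
    return digits

def fib_value(digits):
    """Number represented by a digit list, weights F2, F3, F4, ..."""
    fibs = [1, 2]
    while len(fibs) < len(digits):
        fibs.append(fibs[-1] + fibs[-2])
    return sum(d * f for d, f in zip(digits, fibs))

def pos(k, c):
    """Position of the kth occurrence of c in f, from
    r(pos(k,a)) = 0 . r(k-1) and r(pos(k,b)) = 10 . r(k-1)."""
    prefix = [0] if c == "a" else [1, 0]
    return fib_value(prefix + fib_repr(k - 1))
```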
Notes
The problem material is by Rytter [216].
7 Wythoff's Game and Fibonacci Word
Question. Is there any close relation between Wythoff's game and the
infinite Fibonacci word?
Solution
Losing configurations in Wythoff’s game are closely related to the Fibonacci
word. Let WytLost denote the set of losing configurations. It contains pairs of
the form (m,n), 0 < m < n:
Denoting by (mk ,nk ) the kth lexicographically smallest pair of the set we get
Fact 2.
(i) M ∩ N = ∅ and M ∪ N = {1,2,3, . . .}.
(ii) nk = mk + k for every k > 0.
Fact 2 is used to derive Fact 1. It is enough to prove that both properties (i)
and (ii) hold for the sets M = {pos(k,a) + 1 : k > 0} and N = {pos(k,b) +
1 : k > 0}.
Property (i) obviously holds and property (ii) follows from the hint pre-
sented and proved in Problem 6:
r(pos(k,a)) = 0 · r(k − 1) and r(pos(k,b)) = 10 · r(k − 1),
where r(i) stands for the Fibonacci representation of the natural number i. To
show that (pos(k,b) + 1) − (pos(k,a) + 1) = k it is sufficient to prove that for
any Fibonacci representation x of a positive integer we have (10x)F − (0x)F =
(x)F + 1, where (y)F denotes the number i for which r(i) = y. But this follows
directly from the definition of the Fibonacci representation and completes the
proof.