Author(s): Maxime Crochemore, Thierry Lecroq, Wojciech Rytter
ISBN(s): 9781108835831, 110883583X
Edition: 1
File Details: PDF, 9.53 MB
Year: 2021
Language: english
125 Problems in Text Algorithms

String matching is one of the oldest algorithmic techniques, yet still one of the most
pervasive in computer science. The past 20 years have seen technological leaps in
applications as diverse as information retrieval and compression. This copiously
illustrated collection of puzzles and exercises in key areas of text algorithms and
combinatorics on words offers graduate students and researchers a pleasant and direct
way to learn and practice with advanced concepts.
The problems are drawn from a large range of scientific publications, both classic
and new. Building up from the basics, the book goes on to showcase problems in
combinatorics on words (including Fibonacci or Thue–Morse words), pattern
matching (including Knuth–Morris–Pratt and Boyer–Moore–like algorithms), efficient
text data structures (including suffix trees and suffix arrays), regularities in words
(including periods and runs) and text compression (including Huffman, Lempel–Ziv
and Burrows–Wheeler–based methods).

Maxime Crochemore is Emeritus Professor at Université Gustave Eiffel and
of King’s College London. He holds an honorary doctorate from the University of
Helsinki. He is the author of more than 200 articles on algorithms on strings and their
applications, and co-author of several books on the subject.
Thierry Lecroq is a professor in the Department of Computer Science at the
University of Rouen Normandy (France). He is currently head of the research team
Information Processing in Biology and Health of the Laboratory of Computer Science,
Information Processing and System. He has been one of the coordinators of the
working group in stringology of the French National Centre for Scientific Research for
more than 10 years.

Wojciech Rytter is a professor at the Faculty of Mathematics, Informatics
and Mechanics, University of Warsaw. He is the author of a large number of
publications on automata, formal languages, parallel algorithms and algorithms on
texts. He is a co-author of several books on these subjects, including Efficient Parallel
Algorithms, Text Algorithms and Analysis of Algorithms and Data Structures. He is a
member of Academia Europaea.
125 Problems in Text Algorithms
With Solutions

MAXIME CROCHEMORE
Gustave Eiffel University

THIERRY LECROQ
University of Rouen Normandy

WOJCIECH RYTTER
University of Warsaw
University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre,
New Delhi – 110025, India
79 Anson Road, #06–04/06, Singapore 079906

Cambridge University Press is part of the University of Cambridge.
It furthers the University’s mission by disseminating knowledge in the pursuit of
education, learning, and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781108835831
DOI: 10.1017/9781108869317
© Maxime Crochemore, Thierry Lecroq, Wojciech Rytter 2021
Illustrations designed by Hélène Crochemore
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2021
Printed in the United Kingdom by TJ Books Ltd, Padstow Cornwall
A catalogue record for this publication is available from the British Library.
Library of Congress Cataloging-in-Publication Data
Names: Crochemore, Maxime, 1947– author. | Lecroq, Thierry, author. |
Rytter, Wojciech, author.
Title: One twenty five problems in text algorithms / Maxime Crochemore,
Thierry Lecroq, Wojciech Rytter.
Other titles: 125 problems in text algorithms
Description: New York : Cambridge University Press, 2021. |
The numerals 125 are superimposed over “One twenty five” on the title page. |
Includes bibliographical references and index.
Identifiers: LCCN 2021002037 (print) | LCCN 2021002038 (ebook) |
ISBN 9781108835831 (hardback) | ISBN 9781108798853 (paperback) |
ISBN 9781108869317 (epub)
Subjects: LCSH: Text processing (Computer science)–Problems, exercises, etc. |
Computer algorithms–Problems, exercises, etc.
Classification: LCC QA76.9.T48 C758 2021 (print) |
LCC QA76.9.T48 (ebook) | DDC 005.13–dc23
LC record available at https://lccn.loc.gov/2021002037
LC ebook record available at https://lccn.loc.gov/2021002038
ISBN 978-1-108-83583-1 Hardback
ISBN 978-1-108-79885-3 Paperback
Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party internet websites referred to in this publication
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
Contents

Preface

1 The Very Basics of Stringology

2 Combinatorial Puzzles
    1 Stringologic Proof of Fermat’s Little Theorem
    2 Simple Case of Codicity Testing
    3 Magic Squares and the Thue–Morse Word
    4 Oldenburger–Kolakoski Sequence
    5 Square-Free Game
    6 Fibonacci Words and Fibonacci Numeration System
    7 Wythoff’s Game and Fibonacci Word
    8 Distinct Periodic Words
    9 A Relative of the Thue–Morse Word
    10 Thue–Morse Words and Sums of Powers
    11 Conjugates and Rotations of Words
    12 Conjugate Palindromes
    13 Many Words with Many Palindromes
    14 Short Superword of Permutations
    15 Short Supersequence of Permutations
    16 Skolem Words
    17 Langford Words
    18 From Lyndon Words to de Bruijn Words

3 Pattern Matching
    19 Border Table
    20 Shortest Covers
    21 Short Borders
    22 Prefix Table
    23 Border Table to the Maximal Suffix
    24 Periodicity Test
    25 Strict Borders
    26 Delay of Sequential String Matching
    27 Sparse Matching Automaton
    28 Comparison-Effective String Matching
    29 Strict Border Table of the Fibonacci Word
    30 Words with Singleton Variables
    31 Order-Preserving Patterns
    32 Parameterised Matching
    33 Good-Suffix Table
    34 Worst Case of the Boyer–Moore Algorithm
    35 Turbo-BM Algorithm
    36 String Matching with Don’t Cares
    37 Cyclic Equivalence
    38 Simple Maximal Suffix Computation
    39 Self-Maximal Words
    40 Maximal Suffix and Its Period
    41 Critical Position of a Word
    42 Periods of Lyndon Word Prefixes
    43 Searching Zimin Words
    44 Searching Irregular 2D Patterns

4 Efficient Data Structures
    45 List Algorithm for Shortest Cover
    46 Computing Longest Common Prefixes
    47 Suffix Array to Suffix Tree
    48 Linear Suffix Trie
    49 Ternary Search Trie
    50 Longest Common Factor of Two Words
    51 Subsequence Automaton
    52 Codicity Test
    53 LPF Table
    54 Sorting Suffixes of Thue–Morse Words
    55 Bare Suffix Tree
    56 Comparing Suffixes of a Fibonacci Word
    57 Avoidability of Binary Words
    58 Avoiding a Set of Words
    59 Minimal Unique Factors
    60 Minimal Absent Words
    61 Greedy Superstring
    62 Shortest Common Superstring of Short Words
    63 Counting Factors by Length
    64 Counting Factors Covering a Position
    65 Longest Common-Parity Factors
    66 Word Square-Freeness with DBF
    67 Generic Words of Factor Equations
    68 Searching an Infinite Word
    69 Perfect Words
    70 Dense Binary Words
    71 Factor Oracle

5 Regularities in Words
    72 Three Square Prefixes
    73 Tight Bounds on Occurrences of Powers
    74 Computing Runs on General Alphabets
    75 Testing Overlaps in a Binary Word
    76 Overlap-Free Game
    77 Anchored Squares
    78 Almost Square-Free Words
    79 Binary Words with Few Squares
    80 Building Long Square-Free Words
    81 Testing Morphism Square-Freeness
    82 Number of Square Factors in Labelled Trees
    83 Counting Squares in Combs in Linear Time
    84 Cubic Runs
    85 Short Square and Local Period
    86 The Number of Runs
    87 Computing Runs on Sorted Alphabet
    88 Periodicity and Factor Complexity
    89 Periodicity of Morphic Words
    90 Simple Anti-powers
    91 Palindromic Concatenation of Palindromes
    92 Palindrome Trees
    93 Unavoidable Patterns

6 Text Compression
    94 BW Transform of Thue–Morse Words
    95 BW Transform of Balanced Words
    96 In-place BW Transform
    97 Lempel–Ziv Factorisation
    98 Lempel–Ziv–Welch Decoding
    99 Cost of a Huffman Code
    100 Length-Limited Huffman Coding
    101 Online Huffman Coding
    102 Run-Length Encoding
    103 A Compact Factor Automaton
    104 Compressed Matching in a Fibonacci Word
    105 Prediction by Partial Matching
    106 Compressing Suffix Arrays
    107 Compression Ratio of Greedy Superstrings

7 Miscellaneous
    108 Binary Pascal Words
    109 Self-Reproducing Words
    110 Weights of Factors
    111 Letter-Occurrence Differences
    112 Factoring with Border-Free Prefixes
    113 Primitivity Test for Unary Extensions
    114 Partially Commutative Alphabets
    115 Greatest Fixed-Density Necklace
    116 Period-Equivalent Binary Words
    117 Online Generation of de Bruijn Words
    118 Recursive Generation of de Bruijn Words
    119 Word Equations with Given Lengths of Variables
    120 Diverse Factors over a Three-Letter Alphabet
    121 Longest Increasing Subsequence
    122 Unavoidable Sets via Lyndon Words
    123 Synchronising Words
    124 Safe-Opening Words
    125 Superwords of Shortened Permutations

Bibliography
Index
Preface

This book is about algorithms on texts, also called algorithmic stringology.
Text (word, string, sequence) is one of the main unstructured data types and
the subject is of vital importance in computer science.
The subject is versatile because it is a basic requirement in many sciences,
especially in computer science and engineering. The treatment of unstructured
data is a very lively area and demands efficient methods owing both to their
presence in highly repetitive instructions of operating systems and to the vast
amount of data that needs to be analysed on digital networks and equipment.
The latter is clear for information technology companies that manage massive
data in their data centres but also holds for most scientific areas beyond
computer science.
The book presents a collection of the most interesting representative
problems in stringology. They are introduced in a short and pleasant way
and open doors to more advanced topics. They were extracted from hundreds
of serious scientific publications, some of which are more than a hundred
years old and some are very fresh and up to date. Most of the problems are
related to applications while others are more abstract. The core part of most of
them is an ingenious short algorithmic solution except for a few introductory
combinatorial problems.
This is not just yet another monograph on the subject but a series of
problems (puzzles and exercises). It is a complement to books dedicated to the
subject in which topics are introduced in a more academic and comprehensive
way. Nevertheless, most concepts in the field are included in the book, which
fills a gap and is much needed, especially by students and
teachers, as the first problem-solving textbook of the domain.

The book is organised into seven chapters:

‘The Very Basics of Stringology’ is a preliminary chapter introducing the
terminology, basic concepts and tools for the next chapters, which reflect six
main streams in the area.
‘Combinatorial Puzzles’ is about combinatorics on words, an important topic
because many algorithms are based on combinatorial properties of their
input.
‘Pattern Matching’ deals with the most classical subject, text searching and
string matching.
‘Efficient Data Structures’ is about data structures for text indexing. They are
used as fundamental tools in a large number of algorithms, such as special
arrays and trees associated with texts.
‘Regularities in Words’ concerns regularities that occur in texts, in particular
repetitions and symmetries, that have a strong influence on the efficiency of
algorithms.
‘Text Compression’ is devoted to several methods of the practically important
area of conservative text compression.
‘Miscellaneous’ contains various problems that do not fit in earlier chapters
but certainly deserve presentation.
Problems listed in the book have been accumulated and developed over
several years of teaching string algorithms in our respective institutions
in France, Poland, the UK and the USA. They have been taught mostly to master’s
students and are given with solutions as well as with references for further
reading. The content also profits from the experience the authors gained in writing
previous textbooks.
Anyone teaching graduate courses on data structures and algorithms can
select whatever they like from our book for their students. However, the overall
book is not elementary and is intended as a reference for researchers, PhD and
master’s students, as well as for academics teaching courses on algorithms
even if they are not directly related to text algorithms. It should be viewed
as a companion to standard textbooks on the domain. The self-contained
presentation of problems provides rapid access to their understanding and
to their solutions without requiring a deep background on the subject.
The book is useful for specialised courses on text algorithms, as well as
for more general courses on algorithms and data structures. It introduces all
required concepts and notions to solve problems but some prerequisites in
bachelor- or sophomore-level academic courses on algorithms, data structures
and discrete mathematics certainly help in grasping the material more easily.
1 The Very Basics of Stringology

In this chapter we introduce basic notation and definitions of words and sketch
several constructions used in text algorithms.
Texts are central in ‘word processing’ systems, which provide facilities
for the manipulation of texts. Such systems usually process objects that are
quite large. Text algorithms occur in many areas of science and information
processing. Many text editors and programming languages have facilities for
processing texts. In molecular biology, for example, text algorithms arise in
the analysis of biological molecular sequences.

Words

An alphabet is a non-empty set whose elements are called letters or symbols.
We typically use alphabets A = {a,b,c, . . .}, B = {0,1} and natural numbers.
A word (mot, in French) or string on an alphabet A is a sequence of elements
of A.

The zero letter sequence is called the empty word and is denoted by ε. The
set of all finite words on an alphabet A is denoted by A∗ , and A+ = A∗ \ {ε}.
The length of a word x, length of the sequence, is denoted by |x|. We
denote by x[i], for i = 0,1, . . . ,|x| − 1, the letter at position or index i
on a non-empty word x. Then x = x[0]x[1] · · · x[|x| − 1] is also denoted by
x[0 . . |x| − 1]. The set of letters that occur in the word x is denoted by alph (x).
For the example x = abaaab we have |x| = 6 and alph (x) = {a,b}.
The product or concatenation of two words x and y is the word composed
of the letters of x followed by the letters of y. It is denoted by xy or by x · y
to emphasise the decomposition of the resulting word. The neutral element for
the product is ε and we denote respectively by $zy^{-1}$ and $x^{-1}z$ the words x and
y when z = xy.
A conjugate, rotation or cyclic shift of a word x is any word y that
factorises into vu, where uv = x. This makes sense because the product of
words is obviously non-commutative. For example, the set of conjugates of
abba, its conjugacy class because conjugacy is an equivalence relation, is
{aabb,abba,baab,bbaa} and that of abab is {abab,baba}.
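
The conjugacy classes quoted above can be recomputed with a few lines of Python. This is only an illustrative sketch: the function name `conjugates` is ours, not the book's, and it simply enumerates every way of cutting x into uv and gluing the parts back as vu.

```python
def conjugates(x):
    """Return the conjugacy class of x: all words vu such that uv = x."""
    # range(max(len(x), 1)) makes the empty word its own single conjugate.
    return sorted({x[i:] + x[:i] for i in range(max(len(x), 1))})


print(conjugates("abba"))  # ['aabb', 'abba', 'baab', 'bbaa']
print(conjugates("abab"))  # ['abab', 'baba']
```

Using a set removes the repeated rotations of a non-primitive word such as abab.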
A word x is a factor (sometimes called substring ) of a word y if y = uxv
for two words u and v. When u = ε, x is a prefix of y, and when v = ε, x
is a suffix of y. Sets Fact (x), Pref (x) and Suff (x) denote the sets of factors,
prefixes and suffixes of x respectively.
When x is a non-empty factor of y = y[0 . . n − 1] it is of the form y[i . . i +
|x| − 1] for some i. An occurrence of x in y is an interval [i . . i + |x| − 1]
for which x = y[i . . i + |x| − 1]. We say that i is the starting position (or left
position) on y of this occurrence, and that i + |x| − 1 is its ending position (or
right position). An occurrence of x in y can also be defined as a triple (u,x,v)
such that y = uxv. Then the starting position of the occurrence is |u|. For
example, the starting and ending positions of x = aba on y = babaababa
are

i                   0  1  2  3  4  5  6  7  8
y[i]                b  a  b  a  a  b  a  b  a
starting positions     1        4     6
ending positions             3        6     8

For words x and y, $|y|_x$ denotes the number of occurrences of x in y. Then, for
instance, $|y| = \sum \{|y|_a : a \in \mathit{alph}(y)\}$.
The word x is a subsequence or subword of y if the latter decomposes
into $w_0\, x[0]\, w_1\, x[1] \cdots x[|x|-1]\, w_{|x|}$ for words $w_0, w_1, \dots, w_{|x|}$.
A factor or a subsequence x of a word y is said to be proper if x ≠ y.
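
As a minimal sketch of the definitions of occurrence and subsequence, the following Python functions work directly from the definitions (quadratic time, no clever matching). The names `occurrences` and `is_subsequence` are our own, and x is assumed non-empty in `occurrences`.

```python
def occurrences(x, y):
    """Starting positions of all (possibly overlapping) occurrences of x in y."""
    return [i for i in range(len(y) - len(x) + 1) if y[i:i + len(x)] == x]


def is_subsequence(x, y):
    """True if y = w0 x[0] w1 x[1] ... x[|x|-1] w|x| for some words w0, ..., w|x|."""
    remaining = iter(y)
    # Each letter of x must be found in what is left of y, scanning left to right.
    return all(letter in remaining for letter in x)


x, y = "aba", "babaababa"
starts = occurrences(x, y)
ends = [i + len(x) - 1 for i in starts]
print(starts, ends)  # [1, 4, 6] [3, 6, 8]
```

The number of occurrences of x in y is then `len(occurrences(x, y))`, and summing it over the letters of y gives back the length of y.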

Periodicity

Let x be a non-empty word. An integer p, 0 < p ≤ |x|, is called a period of
x if x[i] = x[i + p] for i = 0,1, . . . ,|x| − p − 1. Note that the length of a
word is a period of this word, so every non-empty word has at least one period.
The period of x, denoted by per (x), is its smallest period. For example, 3, 6,
7 and 8 are periods of the word aabaabaa, and per (aabaabaa) = 3. Note
that if p is a period of x, its multiples not larger than |x| are also periods of x.
Here is a series of properties equivalent to the definition of a period p of x.
First, x can be factorised uniquely as $(uv)^k u$, where u and v are words, v is
non-empty, k is a positive integer and p = |uv|. Second, x is a prefix of ux for
a word u of length p. Third, x is a factor of $u^k$, where u is a word of length
p and k a positive integer. Fourth, x can be factorised as uw = wv for three
words u, v and w, verifying p = |u| = |v|.
The last point leads to the notion of border. A border of x is a proper
factor of x that is both a prefix and a suffix of x. The border of x, denoted by
Border (x), is its longest border. Thus, ε, a, aa, and aabaa are the borders of
aabaabaa and Border (aabaabaa) = aabaa.
[Figure: the word aabaabaa aligned with copies of itself shifted by its periods 3, 6, 7 and 8.]

Borders and periods of x are in one-to-one correspondence because of the
fourth point above: a period p of x is associated with the border x[p . . |x| − 1].
Note that, when defined, the border of a border of x is also a border of x.
Then $\mathit{Border}(x)$, $\mathit{Border}^2(x)$, . . . , $\mathit{Border}^k(x) = \varepsilon$ is the list of all borders of
x. The (non-empty) word x is said to be border free if its only border is the
empty word or equivalently if its only period is |x|.
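
The correspondence between periods and borders can be checked on the running example with a brute-force sketch. Both functions below are written straight from the definitions and take quadratic time; the names `periods` and `borders` are ours.

```python
def periods(x):
    """All periods p, 0 < p <= |x|, of a non-empty word x (increasing order)."""
    n = len(x)
    return [p for p in range(1, n + 1)
            if all(x[i] == x[i + p] for i in range(n - p))]


def borders(x):
    """All borders of x, longest first; the empty word is always the last one."""
    n = len(x)
    return [x[:k] for k in range(n - 1, -1, -1) if x[:k] == x[n - k:]]


x = "aabaabaa"
print(periods(x))  # [3, 6, 7, 8]
print(borders(x))  # ['aabaa', 'aa', 'a', '']
# A period p is associated with the border x[p .. |x| - 1].
print([x[p:] for p in periods(x)] == borders(x))  # True
```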

Lemma 1 (Periodicity lemma) If p and q are periods of a word x and satisfy
p + q − gcd(p,q) ≤ |x| then gcd(p,q) is also a period of x.
The proof of the lemma may be found in textbooks (see Notes). The Weak
Periodicity lemma refers to a variant of the lemma in which the condition is
strengthened to p + q ≤ |x|. Its proof comes readily as follows.

[Figure: positions i, i + p − q and i + p on x carry the same letter a, where p and q are periods of x.]

The conclusion obviously holds when p = q. Else, w.l.o.g. assume p > q
and let us show first that p − q is a period of x. Indeed, let i be a position on
x for which i + p < |x|. Then x[i] = x[i + p] = x[i + p − q] because p
and q are periods. And if i + p ≥ |x|, the condition implies i − q ≥ 0. Then
x[i] = x[i − q] = x[i + p − q] as before. Thus p − q is a period of x. Iterating
the reasoning or using a recurrence as for Euclid’s algorithm, we conclude that
gcd(p,q) is a period of x.
To illustrate the Periodicity lemma, let us consider a word x that admits 5
and 8 as periods. Then, if we assume moreover that x is composed of at least
two distinct letters, gcd(5,8) = 1 is not a period of x. Thus, the condition of
the lemma cannot hold, that is, |x| < 5 + 8 − gcd(5,8) = 12.

[Figure: the word abaababaaba, of length 11, with periods 5 and 8.]

The extreme situation is displayed in the picture and shows (when generalised)
that the condition required on periods in the statement of the Periodicity lemma
cannot be weakened.
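
The Periodicity lemma and its extreme case can be sanity-checked by exhaustive search on short binary words. The sketch below is an experiment, not a proof; the helper names `is_period` and `lemma_holds` and the length bound 10 are our own choices.

```python
from itertools import product
from math import gcd


def is_period(x, p):
    """True if p is a period of x, for 0 < p <= |x|."""
    return all(x[i] == x[i + p] for i in range(len(x) - p))


def lemma_holds(x):
    """Check the Periodicity lemma on every pair of periods of x."""
    n = len(x)
    ps = [p for p in range(1, n + 1) if is_period(x, p)]
    return all(is_period(x, gcd(p, q))
               for p in ps for q in ps
               if p + q - gcd(p, q) <= n)


# Every binary word of length 1 to 10 satisfies the lemma.
print(all(lemma_holds("".join(w))
          for n in range(1, 11) for w in product("ab", repeat=n)))

# Extreme case: length 11 = 5 + 8 - gcd(5, 8) - 1, periods 5 and 8, but not 1.
x = "abaababaaba"
print(len(x), is_period(x, 5), is_period(x, 8), is_period(x, 1))
```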

Regularities

The powers of a word x are defined by $x^0 = \varepsilon$ and $x^i = x^{i-1}x$ for a positive
integer i. The kth power of x is $x^k$. It is a square if k is a positive even integer
and a cube if k is a positive multiple of 3.
The next lemma states a first consequence of the Periodicity lemma.

Lemma 2 For words x and y, xy = yx if and only if x and y are (integer)
powers of the same word. The same conclusion holds when there exist two
positive integers k and ℓ for which $x^k = y^\ell$.
The proofs of the two parts of the lemma are essentially the same (in
fact the conclusion derives from a more general statement on codes). For
example, if xy = yx, both x and y are borders of the word, then both |x|
and |y| are periods of it and gcd(|x|,|y|) as well by the Periodicity lemma.
Since gcd(|x|,|y|) divides also |xy|, the conclusion follows. The converse
implication is straightforward.
The non-empty word x is said to be primitive if it is not the power of any
other word. That is to say, x is primitive if $x = u^k$, for a word u and a positive
integer k, implies k = 1 and then u = x. For example, abaab is primitive,
while ε and bababa = $(ba)^3$ are not.
It follows from Lemma 2 that a non-empty word has exactly one primitive
word it is a power of. When $x = u^k$ and u is primitive, u is called
the primitive root of x and k is its exponent, denoted by exp(x). More
generally, the exponent of x is the quantity exp(x) = |x|/per (x), which is not
necessarily an integer, and the word is said to be periodic if its exponent is at
least 2.
Note that the number of conjugates of a word, the size of its conjugacy class,
is the length of its (primitive) root.
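
A small sketch of primitive root and exponent, assuming a non-empty input word. The function names are ours. The root is read off the smallest period: by the Periodicity lemma, x is a proper power exactly when per(x) divides |x| and per(x) < |x|.

```python
from fractions import Fraction


def smallest_period(x):
    """per(x) for a non-empty word x, by direct search."""
    n = len(x)
    return next(p for p in range(1, n + 1)
                if all(x[i] == x[i + p] for i in range(n - p)))


def primitive_root(x):
    """Return (u, k) with u primitive and x = u^k."""
    n, p = len(x), smallest_period(x)
    if n % p == 0:
        return x[:p], n // p
    return x, 1


def exponent(x):
    """exp(x) = |x| / per(x), kept as an exact fraction."""
    return Fraction(len(x), smallest_period(x))


print(primitive_root("bababa"))  # ('ba', 3)
print(primitive_root("abaab"))   # ('abaab', 1)
print(exponent("aabaabaa"))      # 8/3
```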
Another consequence of the Periodicity lemma follows.

Lemma 3 (Primitivity lemma, Synchronisation lemma) A non-empty word
x is primitive if and only if it is a factor of its square only as a prefix and as a
suffix, or equivalently if and only if $\mathit{per}(x^2) = |x|$.

[Figure: occurrences of abbaba in its square abbabaabbaba, and of ababab in its square abababababab.]

The picture illustrates the result of the lemma. The word abbaba is
primitive and there are only two occurrences of it in its square, while ababab
is not primitive and has four occurrences in its square.
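
Lemma 3 translates into a one-line primitivity test. The sketch below relies on Python's `str.find`, whose second argument is the position at which the search starts; the function names are ours.

```python
def is_primitive(x):
    """x is primitive iff its first occurrence in xx after position 0 is at |x|."""
    return len(x) > 0 and (x + x).find(x, 1) == len(x)


def count_in_square(x):
    """Number of occurrences of x in its square xx."""
    xx = x + x
    return sum(xx.startswith(x, i) for i in range(len(x) + 1))


print(is_primitive("abbaba"), count_in_square("abbaba"))  # True 2
print(is_primitive("ababab"), count_in_square("ababab"))  # False 4
```

How fast this runs depends on the substring search behind `str.find`, which is an implementation detail of the Python runtime rather than something stated in the text.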
The notion of run or maximal periodicity encompasses several types of
regularities occurring in words. A run in the word x is a maximal occurrence
of a periodic factor. To say it more formally, it is an interval [i . . j ] of positions
on x for which exp(x[i . . j ]) ≥ 2 and both x[i − 1 . . j ] and x[i . . j + 1] have
periods larger than that of x[i . . j ] when they exist. In this situation, since the
occurrence is identified by i and j , we also say abusively that x[i . . j ] is a run.
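
To make the definition of a run concrete, here is a brute-force sketch that lists all runs of a short word as triples (i, j, p), where p is the period of x[i . . j]. It tries every candidate period and is far from the efficient algorithms treated in later chapters; the function names are our own.

```python
def smallest_period(x):
    """per(x) for a non-empty word x, by direct search."""
    n = len(x)
    return next(p for p in range(1, n + 1)
                if all(x[i] == x[i + p] for i in range(n - p)))


def runs(x):
    """All runs of x as sorted triples (i, j, p) with p = per(x[i..j])."""
    n, found = len(x), set()
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if x[i] != x[i + p]:
                i += 1
                continue
            # Extend to the right while period p is preserved.
            j = i
            while j + p < n and x[j] == x[j + p]:
                j += 1
            # x[i .. j+p-1] is a maximal stretch admitting period p.
            length = j + p - i
            if length >= 2 * p and smallest_period(x[i:j + p]) == p:
                found.add((i, j + p - 1, p))
            i = j + 1
    return sorted(found)


print(runs("aabaabaa"))  # [(0, 1, 1), (0, 7, 3), (3, 4, 1), (6, 7, 1)]
```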
Another type of regularity consists in the appearance of reverse factors or
of palindromes in words. The reverse or mirror image of the word x is the
word $x^R = x[|x|-1]\,x[|x|-2] \cdots x[0]$. Associated with this operation is the
notion of palindrome: a word x for which $x^R = x$.
For example, noon and testset are English palindromes. The first is
an even palindrome of the form $uu^R$ while the second is an odd palindrome
of the form $uau^R$ with a letter a. The letter a can be replaced by a short
word, leading to the notion of gapped palindromes, which are useful in relation to
folding operations like those occurring in sequences of biological molecules.
As another example, integers whose decimal expansion is an even palindrome
are multiples of 11, such as 1661 = 11 × 151 or 175571 = 11 × 15961.
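
The claim about even palindromes and multiples of 11 is easy to test numerically. The sketch below checks it for all integers with two, four or six decimal digits; the bound and the function names are our own choices.

```python
def reverse(x):
    """Mirror image x^R of the word x."""
    return x[::-1]


def is_palindrome(x):
    return x == reverse(x)


def is_even_palindrome(x):
    return len(x) % 2 == 0 and is_palindrome(x)


print(is_even_palindrome("noon"), is_palindrome("testset"))  # True True

# Every integer below 10^6 whose decimal expansion is an even palindrome
# is a multiple of 11.
print(all(n % 11 == 0
          for n in range(10, 10**6) if is_even_palindrome(str(n))))  # True
```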

Ordering

Some algorithms benefit from the existence of an ordering on the alphabet,
denoted by ≤. The ordering induces the lexicographic ordering or alphabetic
ordering on words as follows. Like the alphabet ordering, it is denoted by ≤.
For x,y ∈ A∗, x ≤ y if and only if either x is a prefix of y or x and y can be
decomposed as x = uav and y = ubw for words u, v and w, letters a and b,
with a < b. Thus, ababb < abba < abbaab when considering a < b and
more generally the natural ordering on the alphabet A.
We say that x is strongly less than y, denoted by x << y, when x ≤ y
but x is not a prefix of y. Note that x << y implies xu << yv for any words
u and v.
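
Python compares strings in exactly this lexicographic way when the alphabet ordering is the ordering of character codes, which holds for a < b. Under that assumption the relation << can be sketched as follows; the function name `strongly_less` is ours.

```python
def strongly_less(x, y):
    """x << y: x <= y in the lexicographic ordering and x is not a prefix of y."""
    # If x == y then x is a prefix of y, so a strict comparison is enough.
    return x < y and not y.startswith(x)


print("ababb" < "abba" < "abbaab")       # True
print(strongly_less("ababb", "abba"))    # True: they differ at position 2
print(strongly_less("abba", "abbaab"))   # False: abba is a prefix of abbaab
# x << y implies xu << yv for any words u and v.
print(strongly_less("ababb" + "bbb", "abba" + "a"))  # True
```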
Concepts of Lyndon words and of necklaces are built from the lexicographic
ordering.
A Lyndon word x is a primitive word that is the smallest among its
conjugates. Equivalently, though not entirely obviously, x is smaller than all its
proper non-empty suffixes, and as such is also called a self-minimal word.
As a consequence, x is border-free. It is known that any non-empty word
w factorises uniquely into $x_0 x_1 \cdots x_k$, where the $x_i$ are Lyndon words and
$x_0 \ge x_1 \ge \cdots \ge x_k$. For example, the word aababaabaaba factorises
as aabab · aab · aab · a, where aabab, aab and a are Lyndon words.
A necklace or minimal word is a word that is the smallest in its conjugacy
class. It is an (integer) power of a Lyndon word. A Lyndon word is a necklace
but, for example, the word aabaab = $(aab)^2$ is a necklace without being a
Lyndon word.
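
The factorisation into non-increasing Lyndon words can be computed in linear time. The sketch below uses the classical algorithm usually attributed to Duval, which this chapter does not describe; it is brought in here only as an illustration, next to direct tests for Lyndon words and necklaces. All function names are ours.

```python
def is_lyndon(x):
    """Non-empty x is smaller than all its proper non-empty suffixes."""
    return len(x) > 0 and all(x < x[i:] for i in range(1, len(x)))


def is_necklace(x):
    """x is the smallest word of its conjugacy class."""
    return all(x <= x[i:] + x[:i] for i in range(len(x)))


def lyndon_factorisation(w):
    """Factor w into Lyndon words x0 >= x1 >= ... >= xk (Duval's algorithm)."""
    factors, i, n = [], 0, len(w)
    while i < n:
        j, k = i + 1, i
        # Grow the longest prefix of w[i:] that is a prefix of a Lyndon power.
        while j < n and w[k] <= w[j]:
            k = i if w[k] < w[j] else k + 1
            j += 1
        # Output the repeated Lyndon word of length j - k.
        while i <= k:
            factors.append(w[i:i + j - k])
            i += j - k
    return factors


print(lyndon_factorisation("aababaabaaba"))  # ['aabab', 'aab', 'aab', 'a']
print(is_necklace("aabaab"), is_lyndon("aabaab"))  # True False
```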

Remarkable Words

Besides Lyndon words, three sets of words have remarkable properties and are
often used in examples. They are Thue–Morse words, Fibonacci words and de
Bruijn words. The first two are prefixes of (one-way) infinite words. Formally
an infinite word on the alphabet A is a mapping from natural numbers to A.
Their set is denoted by A∞ .
The notion of (monoid) morphism is central to defining some infinite sets
of words or an associate infinite word. A morphism from A∗ to itself (or
another free monoid) is a mapping h : A∗ → A∗ satisfying h(uv) = h(u)h(v)
for all words u and v. Consequently, a morphism is entirely defined by the
images h(a) of letters a ∈ A.
The Thue–Morse word is produced by iterating the Thue–Morse morphism
μ from {a,b}∗ to itself, defined by

μ(a) = ab,
μ(b) = ba.

Iterating the morphism from letter a gives the list of Thue–Morse words $\mu^k(a)$,
k ≥ 0, that starts with

$\tau_0 = \mu^0(a)$ = a
$\tau_1 = \mu^1(a)$ = ab
$\tau_2 = \mu^2(a)$ = abba
$\tau_3 = \mu^3(a)$ = abbabaab
$\tau_4 = \mu^4(a)$ = abbabaabbaababba
$\tau_5 = \mu^5(a)$ = abbabaabbaababbabaababbaabbabaab

and eventually produces its infinite associate:

$t = \lim_{k\to\infty} \mu^k(a)$ = abbabaabbaababbabaababbaabbabaab · · · .
An equivalent definition of Thue–Morse words is provided by the following
recurrence:

$\tau_0 = a$,
$\tau_{k+1} = \tau_k\,\overline{\tau_k}$, for k ≥ 0,

where the bar morphism is defined by $\overline{a} = b$ and $\overline{b} = a$. Note that the length of
the kth Thue–Morse word is $|\tau_k| = 2^k$.
A direct definition of t is as follows: the letter t[n] is b if the number
of occurrences of digit 1 in the binary representation of n is odd, and is a
otherwise.
The infinite Thue–Morse word is known to contain no overlap (factor of the
form auaua for a letter a and a word u), that is, no factor of exponent larger
than 2. It is said to be overlap-free .
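
The three definitions of Thue–Morse words (morphism, recurrence with the bar morphism, parity of binary digits) can be compared with a short sketch. It also checks overlap-freeness by brute force on a finite prefix, which illustrates the property without proving it. Function names are ours.

```python
def tm_by_morphism(k):
    """tau_k obtained by applying the morphism a -> ab, b -> ba k times to a."""
    w = "a"
    for _ in range(k):
        w = "".join("ab" if c == "a" else "ba" for c in w)
    return w


def tm_by_recurrence(k):
    """tau_k from tau_0 = a and tau_{k+1} = tau_k followed by its complement."""
    bar = str.maketrans("ab", "ba")
    w = "a"
    for _ in range(k):
        w += w.translate(bar)
    return w


def tm_letter(n):
    """t[n] is b when n has an odd number of 1s in binary, else a."""
    return "b" if bin(n).count("1") % 2 else "a"


def has_overlap(w):
    """True if w contains a factor auaua, i.e. period p on length 2p + 1."""
    n = len(w)
    return any(w[i:i + p + 1] == w[i + p:i + 2 * p + 1]
               for p in range(1, n) for i in range(n - 2 * p))


print(tm_by_morphism(3))  # abbabaab
print(tm_by_morphism(6) == tm_by_recurrence(6)
      == "".join(tm_letter(n) for n in range(64)))  # True
print(has_overlap(tm_by_morphism(6)), has_overlap("ababa"))  # False True
```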
The Fibonacci word is similarly produced by iterating a morphism, the
Fibonacci morphism φ, from {a,b}∗ to itself, defined by

φ(a) = ab,
φ(b) = a.

Iterating the morphism from letter a gives the list of Fibonacci words $\phi^k(a)$,
k ≥ 0, that starts with

$\mathit{fib}_0 = \phi^0(a)$ = a
$\mathit{fib}_1 = \phi^1(a)$ = ab
$\mathit{fib}_2 = \phi^2(a)$ = aba
$\mathit{fib}_3 = \phi^3(a)$ = abaab
$\mathit{fib}_4 = \phi^4(a)$ = abaababa
$\mathit{fib}_5 = \phi^5(a)$ = abaababaabaab
$\mathit{fib}_6 = \phi^6(a)$ = abaababaabaababaababa

and eventually its infinite associate:

$f = \lim_{k\to\infty} \phi^k(a)$ = abaababaabaababaababaabaababaabaab · · · .

An equivalent definition of Fibonacci words comes from the recurrence
relation:

$\mathit{fib}_0 = a$,
$\mathit{fib}_1 = ab$,
$\mathit{fib}_{k+1} = \mathit{fib}_k\,\mathit{fib}_{k-1}$, for k ≥ 1.

The sequence of lengths of these words is the sequence of Fibonacci
numbers, that is, $|\mathit{fib}_k| = F_{k+2}$. Recall that Fibonacci numbers are defined
by the recurrence

$F_0 = 0$,
$F_1 = 1$,
$F_{k+1} = F_k + F_{k-1}$, for k ≥ 1.
Among many properties they satisfy are
• $\gcd(F_n, F_{n-1}) = 1$, for n ≥ 2,
• $F_n$ is the nearest integer to $\Phi^n/\sqrt{5}$, where $\Phi = \frac{1}{2}(1 + \sqrt{5}) = 1.61803\cdots$
is the golden ratio.
The interest in Fibonacci words comes from the combinatorial properties
they satisfy and the large number of repeats they contain. However, the
infinite Fibonacci word contains no factor of exponent larger than $\Phi^2 + 1$ =
3.61803 · · · .
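
The two definitions of Fibonacci words and the stated facts about their lengths can be checked with a short sketch. The function names are ours, and the golden-ratio check uses floating-point arithmetic, so it is only run for small indices where rounding is safe.

```python
from math import gcd, sqrt


def fib_by_morphism(k):
    """fib_k obtained by applying the morphism a -> ab, b -> a k times to a."""
    w = "a"
    for _ in range(k):
        w = "".join("ab" if c == "a" else "a" for c in w)
    return w


def fib_by_recurrence(k):
    """fib_k from fib_0 = a, fib_1 = ab and fib_{k+1} = fib_k fib_{k-1}."""
    prev, cur = "a", "ab"
    if k == 0:
        return prev
    for _ in range(k - 1):
        prev, cur = cur, cur + prev
    return cur


def fibonacci_number(n):
    """F_n with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


PHI = (1 + sqrt(5)) / 2  # golden ratio

print(fib_by_morphism(5))  # abaababaabaab
print(all(fib_by_morphism(k) == fib_by_recurrence(k) for k in range(15)))
print(all(len(fib_by_recurrence(k)) == fibonacci_number(k + 2) for k in range(15)))
print(all(round(PHI ** n / sqrt(5)) == fibonacci_number(n) for n in range(40)))
print(all(gcd(fibonacci_number(n), fibonacci_number(n - 1)) == 1 for n in range(2, 40)))
```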
De Bruijn words are defined here on the alphabet A = {a,b} and are
parameterised by a positive integer k. A word x ∈ A^+ is a de Bruijn word of
order k if each word of A^k occurs exactly once in x. As a first example, ab
and ba are the only two de Bruijn words of order 1. As a second example, the
word aaababbbaa is a de Bruijn word of order 3, since its eight factors of
length 3 are the eight words of A^3, that is, aaa, aab, aba, abb, baa, bab,
bba and bbb.
The existence of a de Bruijn word of order k ≥ 2 can be verified with the
help of the de Bruijn automaton defined by
• States are the words of A^{k−1}.
• Arcs are of the form (av, b, vb) with a, b ∈ A and v ∈ A^{k−2}.
The picture displays the automaton for de Bruijn words of order 3. Note that
exactly two arcs exit each of the states, one labelled by a, the other by b; and
that exactly two arcs enter each of the states, both labelled by the same letter.
The graph associated with the automaton thus satisfies the Euler condition:
every vertex has an even degree. It follows that there exists an Eulerian circuit
in the graph. Its label is a circular de Bruijn word. Appending to it its prefix
of length k − 1 gives an ordinary de Bruijn word.
[Figure: the de Bruijn automaton for order 3, with states aa, ab, ba and bb.]
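The existence argument can be turned into a small construction. The Python sketch below is one possible illustration, not the book's algorithm: it walks an Eulerian circuit of the de Bruijn automaton with Hierholzer's method (a technique chosen here as an assumption) and then appends the prefix of length k − 1. The names de_bruijn_word and is_de_bruijn are hypothetical.

```python
from itertools import product


def de_bruijn_word(k: int, alphabet: str = "ab") -> str:
    """Ordinary de Bruijn word of order k >= 2 read off an Eulerian circuit."""
    # For each state (a word of length k - 1), the letters of its unused outgoing arcs.
    unused = {"".join(s): list(alphabet) for s in product(alphabet, repeat=k - 1)}
    start = alphabet[0] * (k - 1)
    stack, circuit = [start], []
    # Hierholzer's method: follow unused arcs, record a state when it is exhausted.
    while stack:
        state = stack[-1]
        if unused[state]:
            letter = unused[state].pop()
            stack.append((state + letter)[1:])  # arc (av, b, vb)
        else:
            circuit.append(stack.pop())
    circuit.reverse()  # states along the circuit, first state repeated at the end
    # The arc entering a state is labelled by the last letter of that state.
    circular = "".join(state[-1] for state in circuit[1:])
    return circular + circular[: k - 1]


def is_de_bruijn(word: str, k: int, alphabet: str = "ab") -> bool:
    """True if each word of length k over the alphabet occurs exactly once in word."""
    factors = sorted(word[i:i + k] for i in range(len(word) - k + 1))
    return factors == sorted("".join(w) for w in product(alphabet, repeat=k))


if __name__ == "__main__":
    word = de_bruijn_word(3)
    print(word, is_de_bruijn(word, 3))
```

Different orders of arc traversal give different de Bruijn words, which is consistent with their number being large.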

It can also be verified that the number of de Bruijn words of order k is
exponential in k.
De Bruijn words can be defined on larger alphabets and are often used as
examples of limit cases because they contain all the factors of a given length.

Automata

A finite automaton M on the finite alphabet A is composed of a finite set Q
of states, of an initial state q_0, of a set T ⊆ Q of terminal states and of a set
F ⊆ Q × A × Q of labelled edges or arcs corresponding to state transitions.
We denote the automaton M by the quadruplet (Q, q_0, T, F) or sometimes by
just (Q, F) when, for example, q_0 is implicit and T = Q. We say of an arc
(p, a, q) that it leaves state p and enters state q; state p is the source of the
arc, letter a its label and state q its target. A graphic representation of an
automaton is displayed below.
The number of arcs exiting a given state is called the outgoing degree of
the state. The incoming degree of a state is defined in a dual way. By analogy
with graphs, the state q is a successor by the letter a of the state p when
(p, a, q) ∈ F; in the same case, we say that the pair (a, q) is a labelled
successor of state p.

[Figure: graphic representation of an automaton with states 0, 1, 2, 3 and 4 and arcs labelled by the letters a, b and c.]

A path of length n in the automaton M = (Q, q_0, T, F) is a sequence
of n consecutive arcs (p_0, a_0, p'_0), (p_1, a_1, p'_1), …, (p_{n−1}, a_{n−1}, p'_{n−1}) that
satisfies p'_k = p_{k+1} for k = 0, 1, …, n − 2. The label of the path is the
word a_0 a_1 … a_{n−1}, its origin the state p_0 and its end the state p'_{n−1}. A path in
the automaton M is successful if its origin is the initial state q_0 and if its end
is in T. A word is recognised or accepted by the automaton if it is the label
of a successful path. The language composed of the words recognised by the
automaton M is denoted by Lang(M).
An automaton M = (Q, q_0, T, F) is deterministic if for every pair (p, a) ∈
Q × A there exists at most one state q ∈ Q for which (p, a, q) ∈ F. In such
a case, it is natural to consider the transition function δ : Q × A → Q of the
automaton defined for every arc (p, a, q) ∈ F by δ(p, a) = q and undefined
elsewhere. The function δ extends naturally to words.
It is known that any language accepted by an automaton is also accepted
by a deterministic automaton and that there is a unique (up to state naming)
minimal deterministic automaton accepting it.

Trie

A trie T on the alphabet A, a kind of digital tree, is an automaton whose
paths from the initial state, the root, do not converge. A trie is used mostly to
represent finite sets of words. If no word of the set is a prefix of another word
of the set, words are associated with the leaves of the trie.
Below is the trie T({aa, aba, abaaa, abab}). States correspond to prefixes
of words in the set. For example, state 3 corresponds to the prefix of
length 2 of both abaaa and abab. Terminal states (doubly circled) 2, 4, 6
and 7 correspond to the words in the set.

[Figure: the trie T({aa, aba, abaaa, abab}), with states 0 to 7.]

Suffix Structures

Suffix structures that store the suffixes of a word are important data structures
used to produce efficient indexes. Tries can be used as such but their size can be
quadratic. One solution to cope with that is to compact the trie, resulting in the
Suffix tree of the word. It consists in eliminating non-terminal nodes with only
one outgoing edge and in labelling arcs by factors of the word accordingly.
Eliminated nodes are sometimes called implicit nodes of the Suffix tree and
remaining nodes are called explicit nodes.
Below are the trie T(Suff(aabab)) of suffixes of aabab (on the left)
and its Suffix tree ST(aabab) (on the right). To get a complete linear-size
structure, each factor of the word that labels an arc needs to be represented by
a pair of integers such as (position, length).

[Figure: the trie T(Suff(aabab)) on the left and the Suffix tree ST(aabab) on the right.]

A second solution to reduce the size of the Suffix trie is to minimise it,
which means considering the minimal deterministic automaton accepting the
suffixes of the word, its Suffix automaton. Below (left) is S(aabab), the
Suffix automaton of aabab.

[Figure: the Suffix automaton S(aabab) on the left, with states 0 to 6, and the Factor automaton F(aabab) on the right, with states 0 to 5.]

It is known that S(x) possesses fewer than 2|x| states and fewer than 3|x|
arcs, for a total size O(|x|), that is, linear in |x|. The Factor automaton F(x) of
the word, minimal deterministic automaton accepting its factors, can even be
smaller because all its states are terminal. In the above picture, the right part is
the Factor automaton of aabab in which state 6 of S(aabab) is merged with
state 3.

Suffix Array

The Suffix array of a word is also used to produce indexes but proceeds
differently than with trees or automata. It consists primarily in sorting the non-
empty suffixes of the word to allow binary search for its factors. To get actually
efficient searches another feature is considered: the longest common prefixes
of successive suffixes in the sorted list.
The information is stored in two arrays, SA and LCP. The array SA is
the inverse of the array Rank that gives the rank of each suffix attached at
its starting position.
Below are the tables associated with the example word aababa. Its sorted
list of suffixes is a, aababa, aba, ababa, ba and baba whose starting
positions are 5, 0, 3, 1, 4 and 2. This latter list is stored in SA indexed by
suffix ranks.

i        0  1  2  3  4  5
x[i]     a  a  b  a  b  a
Rank[i]  1  3  5  2  4  0

r        0  1  2  3  4  5  6  7  8  9  10  11  12
SA[r]    5  0  3  1  4  2
LCP[r]   0  1  1  3  0  2  0  0  1  0  0   0   0

The table LCP essentially contains longest common prefixes stored as
maximal lengths of common prefixes between successive suffixes:

LCP[r] = |lcp(x[SA[r − 1] . . |x| − 1], x[SA[r] . . |x| − 1])|,

where lcp denotes the longest common prefix between two words. This gives
LCP[0 . . 6] for the example. The next values in LCP[7 . . 12] correspond to the
same information for suffixes starting at positions d and f when the pair (d, f)
appears in the binary search. Formally, for such a pair, the value is stored at
position |x| + 1 + ⌊(d + f)/2⌋. For example, in the above LCP array the value
1 corresponding to the pair (0, 2), maximal length of prefixes between x[5 . . 5]
and x[3 . . 5], is stored at position 8.
The table Rank is used in applications of the Suffix array that are mainly
other than searching.
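The tables of the example can be recomputed with a naive program. The following Python sketch is an illustration only: it sorts the suffixes directly, which is far from the efficient constructions used in practice, and it fills only LCP[0 . . |x|], not the extra values used by the binary search. All function names are hypothetical.

```python
def suffix_array(x: str) -> list[int]:
    """Starting positions of the non-empty suffixes of x in sorted order (naive sort)."""
    return sorted(range(len(x)), key=lambda i: x[i:])


def rank_array(sa: list[int]) -> list[int]:
    """Inverse of SA: rank[i] is the rank of the suffix starting at position i."""
    rank = [0] * len(sa)
    for r, i in enumerate(sa):
        rank[i] = r
    return rank


def lcp_length(u: str, v: str) -> int:
    """Length of the longest common prefix of u and v."""
    length = 0
    while length < min(len(u), len(v)) and u[length] == v[length]:
        length += 1
    return length


def lcp_array(x: str, sa: list[int]) -> list[int]:
    """LCP[0 .. |x|] between successive sorted suffixes; both end values are 0."""
    lcp = [0] * (len(x) + 1)
    for r in range(1, len(x)):
        lcp[r] = lcp_length(x[sa[r - 1]:], x[sa[r]:])
    return lcp


def occurs(x: str, sa: list[int], pattern: str) -> bool:
    """Plain binary search in SA for a factor (without the LCP speed-up)."""
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if x[sa[mid]:sa[mid] + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(sa) and x.startswith(pattern, sa[lo])


if __name__ == "__main__":
    x = "aababa"
    sa = suffix_array(x)
    print(sa, rank_array(sa), lcp_array(x, sa))
```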

Compression

The most powerful compression methods for general texts are based either on
the Ziv–Lempel factorisation of words or on easier techniques on top of the
Burrows–Wheeler transform of words. We give a glimpse of both.
When processing a word online, the goal of the Ziv–Lempel compression
scheme is to capture information that has been met before. The associated
factorisation of a word x is u_0 u_1 ··· u_k, where u_i is the longest prefix of
u_i ··· u_k that appears before this occurrence in x. When it is empty, the first
letter of u_i ··· u_k, which does not occur in u_0 ··· u_{i−1}, is chosen. The factor
u_i is sometimes called abusively the longest previous factor at position
|u_0 ··· u_{i−1}| on x.
For example, the factorisation of the word abaabababaaababb is a · b ·
a · aba · baba · aabab · b.
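The factorisation just defined can be computed naively by extending each factor letter by letter. The Python sketch below is an illustration with a hypothetical function name; it assumes, as in the example, that the earlier occurrence may overlap the current position, and it runs in polynomial rather than linear time.

```python
def ziv_lempel_factorisation(x: str) -> list[str]:
    """Factors u_0, u_1, ... where each u_i is the longest previous factor (or one new letter)."""
    factors = []
    i = 0
    while i < len(x):
        length = 0
        # Extend while x[i : i + length + 1] also occurs starting at some position < i.
        # The search window x[0 : i + length] forces the occurrence to start before i.
        while (i + length < len(x)
               and x.find(x[i:i + length + 1], 0, i + length) != -1):
            length += 1
        length = max(length, 1)  # a letter not seen before forms a factor by itself
        factors.append(x[i:i + length])
        i += length
    return factors


if __name__ == "__main__":
    print(" · ".join(ziv_lempel_factorisation("abaabababaaababb")))
```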
There are several variations to define the factors of the decomposition; here
are a few of them. The factor u_i may include the letter immediately following
the occurrence of the longest previous factor at position |u_0 ··· u_{i−1}|, which
amounts to extending a factor occurring before. Previous occurrences of factors
may be chosen among the factors u_0, …, u_{i−1} or among all the factors of
u_0 ··· u_{i−1} (to avoid an overlap between occurrences) or among all factors
occurring before. This results in a large variety of text compression software
based on the method.
When designing word algorithms the factorisation is also used to reduce
some online processing by storing what has already been done on previous
occurrences of factors.
The Burrows–Wheeler transform of a word x is a reversible mapping that
transforms x ∈ A^k into BW(x) ∈ A^k. The effect is mostly to group together
letters having the same context in x. The encoding proceeds as follows. Let us
consider the sorted list of rotations (conjugates) of x. Then BW(x) is the word
composed of the last letters of sorted rotations, referred to as the last column
of the corresponding table.
For the example word banana, rotations are listed below on the left and
their sorted list on the right. Then BW(banana) = nnbaaa.

rotations          sorted rotations
0  b a n a n a     5  a b a n a n
1  a n a n a b     3  a n a b a n
2  n a n a b a     1  a n a n a b
3  a n a b a n     0  b a n a n a
4  n a b a n a     4  n a b a n a
5  a b a n a n     2  n a n a b a

Two conjugate words have the same image by the mapping. Choosing the
Lyndon word as a representative of the class of a primitive word, the mapping
becomes bijective. To recover the original word x other than a Lyndon word,
it is sufficient to keep the position on BW(x) of the first letter of x.
The main property of the transformation is that occurrences of a given letter
are in the same relative order in BW(x) and in the sorted list of all letters. This
is used to decode BW(x).
To do it on nnbaaa from the above example, we first sort the letters getting
the word aaabnn. Knowing that the first letter of the initial word appears at
position 2 on nnbaaa, we can start the decoding: the first letter is b followed
by letter a at the same position 2 on aaabnn. This is the third occurrence
of a in aaabnn corresponding to its third occurrence in nnbaaa, which is
followed by n, and so on.
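The encoding and the decoding walk described above can be sketched as follows. This Python code is an illustration only: it sorts rotations explicitly, which is quadratic in space, and the names bw_transform and bw_inverse are hypothetical. The decoder relies on the stated property that equal letters keep the same relative order in BW(x) and in the sorted list of letters.

```python
def bw_transform(x: str) -> tuple[str, int]:
    """Return BW(x) and the position on BW(x) of the first letter of x."""
    n = len(x)
    # Starting positions of the rotations of x, in sorted order of the rotations.
    order = sorted(range(n), key=lambda i: x[i:] + x[:i])
    bw = "".join(x[i - 1] for i in order)  # last letter of each sorted rotation
    # x[0] is the last letter of the rotation that starts at position 1.
    return bw, order.index(1 % n)


def bw_inverse(bw: str, first: int) -> str:
    """Recover x from BW(x) and the position of its first letter."""
    n = len(bw)
    sorted_letters = sorted(bw)
    # to_bw[p]: position on bw of the occurrence matching sorted_letters[p]
    # (the sort is stable, so equal letters keep their relative order).
    to_bw = sorted(range(n), key=lambda i: bw[i])
    word = [bw[first]]
    p = first
    for _ in range(n - 1):
        word.append(sorted_letters[p])  # letter following the current one in x
        p = to_bw[p]
    return "".join(word)


if __name__ == "__main__":
    bw, first = bw_transform("banana")
    print(bw, first, bw_inverse(bw, first))
```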
The decoding process is similar to following the cycle in the graph below
from the correct letter. Starting from a different letter produces a conjugate of
the initial word.
BW(banana)      n n b a a a
sorted letters  a a a b n n

Writing Conventions of Algorithms

The style of the algorithmic language used here is relatively close to real
programming languages but at a higher abstraction level. We adopt the
following conventions:
• Indentation means the structure of blocks inherent to compound instructions.
• Lines of code are numbered in order to be referred to in the text.
• The symbol ▷ introduces a comment.
• The access to a specific attribute of an object is signified by the name of
the attribute followed by the identifier associated with the object between
brackets.
• A variable that represents a given object (table, queue, tree, word, automa-
ton) is a pointer to this object.
• The arguments given to procedures or to functions are managed by the ‘call
by value’ rule.
• Variables of procedures and functions are local to them unless otherwise
mentioned.
• The evaluation of boolean expressions is performed from left to right in a
lazy way.
• Instructions of the form (m_1, m_2, …) ← (exp_1, exp_2, …) abbreviate the
sequence of assignments m_1 ← exp_1, m_2 ← exp_2, … .
Algorithm Trie below is an example of how algorithms are written. It
produces the trie of a dictionary X, a finite set of words. It successively
considers each word of X during the for loop of lines 2–10 and inserts them
into the structure letter by letter during execution of the for loop of lines 4–9.
When the latter loop is over, the last considered state t, ending the path from
the initial state and labelled by the current word, is set as terminal at line 10.
Trie(X finite set of words)
 1  M ← New-automaton()
 2  for each string x ∈ X do
 3      t ← initial(M)
 4      for each letter a of x, sequentially do
 5          p ← Target(t,a)
 6          if p = nil then
 7              p ← New-state()
 8              Succ[t] ← Succ[t] ∪ {(a,p)}
 9          t ← p
10      terminal[t] ← true
11  return M
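For readers who want to run Algorithm Trie, here is one possible transcription in Python. It is a sketch under assumptions: states are numbered in order of creation, the labelled successors Succ[t] are stored in a dictionary, and the class and method names are hypothetical.

```python
class Trie:
    """Trie stored as an automaton: succ[t] maps a letter to the target state."""

    def __init__(self):
        self.succ = [{}]          # state 0 is the initial state (the root)
        self.terminal = [False]

    def insert(self, word: str) -> None:
        t = 0
        for a in word:
            p = self.succ[t].get(a)          # Target(t, a), None playing the role of nil
            if p is None:
                p = len(self.succ)           # New-state()
                self.succ.append({})
                self.terminal.append(False)
                self.succ[t][a] = p
            t = p
        self.terminal[t] = True

    def contains(self, word: str) -> bool:
        t = 0
        for a in word:
            t = self.succ[t].get(a)
            if t is None:
                return False
        return self.terminal[t]


def build_trie(words) -> Trie:
    """Insert the words one after the other, as in the outer for loop."""
    trie = Trie()
    for word in words:
        trie.insert(word)
    return trie


if __name__ == "__main__":
    trie = build_trie(["aa", "aba", "abaaa", "abab"])
    print([s for s, is_terminal in enumerate(trie.terminal) if is_terminal])
```

With the words inserted in the order aa, aba, abaaa, abab, the creation order of states happens to match the numbering of the example trie given earlier.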

Notes
Basic elements on words introduced in this section follow their presentation
in [74]. They can be found in other textbooks on text algorithms, like those
by Crochemore and Rytter [96], Gusfield [134], Crochemore and Rytter [98]
and Smyth [228]. The notions are also introduced in some textbooks dealing
with the wider topics of combinatorics on words, such as those by Lothaire
[175–177], or in the tutorial by Berstel and Karhumäki [34].
2 Combinatorial Puzzles

1 Stringologic Proof of Fermat’s Little Theorem

In 1640 the great French number theorist Pierre de Fermat proved the following
property:
If p is a prime number and k is any natural number
then p divides k^p − k.

The statement is known as Fermat’s little theorem. For example:

7 divides 2^7 − 2 and 101 divides 10^101 − 10.

Question. Prove Fermat’s little theorem using only stringologic arguments.


[Hint: Count conjugacy classes of words of length p.]

Solution
To prove the property we consider conjugacy classes of words of the same
length. For example, the conjugacy class containing aaaba is the set
C(aaaba) = {aaaab,aaaba,aabaa,abaaa,baaaa}. The next fact is
a consequence of the Primitivity Lemma.

Observation. The conjugacy class of a primitive word w contains exactly |w|
distinct words.
Let us consider the set of words of length p, a prime number, over the
alphabet {1, 2, …, k} and let S_k(p) be its subset of primitive words. Among
the k^p words only k of them are not primitive, namely words of the form a^p
for a letter a. Thus we arrive at the following observation.

Observation. The number |S_k(p)| of primitive words of length p, a prime
number, on a k-letter alphabet is k^p − k.
Since words in S_k(p) are primitive, the conjugacy class of each of them is
of size p. Conjugacy classes partition S_k(p) into sets of size p, which implies
that p divides k^p − k and that there are (k^p − k)/p classes. This proves the
theorem.
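The counting argument can be checked by enumeration on small cases. The Python sketch below is only a sanity check, with hypothetical function names; it lists all words of length p over {1, …, k}, keeps the primitive ones and groups them into conjugacy classes.

```python
from itertools import product


def is_primitive(w: tuple) -> bool:
    """True if w is not a proper power of a shorter word."""
    n = len(w)
    return all(w != w[:d] * (n // d) for d in range(1, n) if n % d == 0)


def primitive_conjugacy_classes(k: int, p: int) -> set:
    """Conjugacy classes of the primitive words of length p on the alphabet {1, ..., k}."""
    classes = set()
    for w in product(range(1, k + 1), repeat=p):
        if is_primitive(w):
            classes.add(frozenset(w[i:] + w[:i] for i in range(p)))
    return classes


if __name__ == "__main__":
    k, p = 2, 7
    classes = primitive_conjugacy_classes(k, p)
    # Each class has p words, so p * (number of classes) = k^p - k.
    print(len(classes), (k ** p - k) // p)
```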

Notes
When a word w = u^q of length n on a k-letter alphabet has a primitive root u
of length d, we have n = qd and the conjugacy class of w contains d elements.
Running d over the divisors of n we get the equality k^n = Σ {d ψ_k(d) :
d divisor of n}, where ψ_k(m) denotes the number of classes of primitive words
of length m. It proves the theorem when n is prime. Further details are in the
book by Lothaire [175, chapter 1].
2 Simple Case of Codicity Testing

A set {w_1, w_2, …, w_n} of words drawn from an alphabet A is a (uniquely
decipherable) code if for every two sequences (noted as words) i_1 i_2 ··· i_k and
j_1 j_2 ··· j_ℓ of indices from {1, 2, …, n} we have

i_1 i_2 ··· i_k ≠ j_1 j_2 ··· j_ℓ ⇒ w_{i_1} w_{i_2} ··· w_{i_k} ≠ w_{j_1} w_{j_2} ··· w_{j_ℓ}.

In other words, if we define the morphism h from {1, 2, …, n}∗ to A∗ by
h(i) = w_i, for i ∈ {1, 2, …, n}, the condition means that the morphism is
injective.
For an arbitrary integer n there is no known linear-time algorithm for testing
the codicity property. However, the situation is extremely simple for n = 2: it
is enough to check that the two codewords do not commute, that is, that
w_1 w_2 ≠ w_2 w_1.

Question. Show that {x, y} is a code if and only if xy ≠ yx.

Solution
A proof idea is given on page 5 as a consequence of the Periodicity Lemma.
Below is a self-contained inductive proof.
If {x, y} is a code, the conclusion xy ≠ yx follows by definition. Conversely, let us
assume {x, y} is not a code and prove the equality xy = yx. The equality
holds if one of the words is empty, so we are left to consider the case where
the two words are not empty.
The proof is by induction on the length |xy|. The induction base is the
simple case x = y, for which the equality obviously holds.
Assume that x ≠ y. Then one of the words is a proper prefix of the other
and assume w.l.o.g. that x is a proper prefix of y: y = xz for a non-empty
word z. Then {x, z} is not a code because the two distinct concatenations of x’s
and y’s producing the same word translate into two distinct concatenations of
x’s and z’s producing the word.
The inductive hypothesis applies because |xz| < |xy| and yields xz = zx.
Consequently xy = xxz = xzx = yx, which shows that the equality holds for
x and y, and completes the proof.
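The commutation test is a one-line program, and it can be compared with a brute-force search for two different index sequences that produce the same word. The Python sketch below is an illustration with hypothetical names; the brute-force function only explores sequences of bounded length, so a positive answer from it alone is not a proof of codicity.

```python
from itertools import product


def is_code_pair(x: str, y: str) -> bool:
    """Codicity test for two words: {x, y} is a code exactly when xy differs from yx."""
    return x + y != y + x


def is_code_brute_force(words: list[str], max_factors: int) -> bool:
    """False if two distinct index sequences of length <= max_factors give the same word."""
    seen = {}
    for count in range(1, max_factors + 1):
        for indices in product(range(len(words)), repeat=count):
            text = "".join(words[i] for i in indices)
            # Remember the first sequence producing text, compare later ones with it.
            if seen.setdefault(text, indices) != indices:
                return False
    return True


if __name__ == "__main__":
    print(is_code_pair("ab", "abab"))   # both words are powers of ab
    print(is_code_pair("ab", "aba"))
```

When xy = yx, the sequences (1, 2) and (2, 1) already collide, which is why a small bound is enough to detect every non-code pair.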

Notes
The same type of proof shows that {x, y} is not a code if x^k = y^ℓ for two
positive integers k and ℓ.
We do not know if there is a special codicity test for three words in terms of a
fixed set of inequalities. For a finite number of words, an efficient polynomial-
time algorithm using a graph-theoretical approach is given in Problem 52.
3 Magic Squares and the Thue–Morse Word

The goal of the problem is to build magic squares with the help of the infinite
Thue–Morse word t on the binary alphabet {0,1} (instead of {a,b}). The word
t is μ^∞(0) obtained by iterating the morphism μ defined by μ(0) = 01 and
μ(1) = 10:
t = 01101001100101101001 ···.
The n × n array S_n, where n = 2^m for an integer m ≥ 2, is defined,
for 0 ≤ i, j < n, by
S_n[i, j] = t[k](k + 1) + (1 − t[k])(n^2 − k),
where k = i·n + j. The generated array S_4 is

16 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1

The array is a magic square because it contains all the integers from 1 to 16
and the sum of elements on each row is 34, as well as the sums on each column
and on each diagonal.
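The definition of S_n is easy to experiment with. The following Python sketch, with hypothetical function names, builds the array from the parity of binary digits (the direct definition of the Thue–Morse word) and checks the magic-square conditions for a few sizes; it is a numerical check, not a proof.

```python
def thue_morse_bit(k: int) -> int:
    """t[k] on the alphabet {0, 1}: parity of the number of 1s in the binary form of k."""
    return bin(k).count("1") % 2


def thue_morse_square(n: int) -> list[list[int]]:
    """Array S_n with S_n[i][j] = k + 1 if t[k] = 1 and n^2 - k otherwise, k = i*n + j."""
    return [[(k + 1) if thue_morse_bit(k) else (n * n - k)
             for k in range(i * n, (i + 1) * n)]
            for i in range(n)]


def is_magic(square: list[list[int]]) -> bool:
    """Entries are 1..n^2 and all rows, columns and both diagonals have the same sum."""
    n = len(square)
    target = n * (n * n + 1) // 2
    lines = [list(row) for row in square]
    lines += [[square[i][j] for i in range(n)] for j in range(n)]
    lines.append([square[i][i] for i in range(n)])
    lines.append([square[i][n - 1 - i] for i in range(n)])
    entries = sorted(value for row in square for value in row)
    return entries == list(range(1, n * n + 1)) and all(sum(l) == target for l in lines)


if __name__ == "__main__":
    for row in thue_morse_square(4):
        print(row)
    print(is_magic(thue_morse_square(8)))
```

The sizes tried here are multiples of 4, in line with a row argument that groups entries four at a time.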

Question. Show that the n × n array S_n is a magic square for every n ≥ 4
that is a power of 2.

Solution
To understand the structure of the array S_n let T_n be the Thue–Morse
2-dimensional word of shape n × n, where n = 2^m, defined, for 0 ≤ i, j < n,
by T_n[i, j] = t[i·n + j]. The picture displays T_4 and T_8, where ∗ substitutes
for 0 and space substitutes for 1.

[Figure: the arrays T_4 and T_8, drawn with ∗ for 0 and a blank for 1.]

Notice that the table T_n satisfies two simple properties:
(i) Every row and every column is made up of blocks 0110 and 1001.
(ii) Each of the two main diagonals is homogeneous, consisting only of 0’s or
only of 1’s (respectively stars and spaces on the picture).
It is clear from the definition that the n × n matrix S_n is filled with all the
integers from 1 to n^2. To prove it is a magic square we have to show that the
sum of all entries in any row, in any column or in any of the two diagonals is
the same, that is, (n/2)(n^2 + 1).

Correctness for rows. According to property (i) each block in a row is of type
0110 or type 1001. Consider a block 0110 whose first element is the kth
element in the array. Then
S[k, k + 1, k + 2, k + 3] = [n^2 − k, k + 2, k + 3, n^2 − k − 3],
which sums to 2n^2 + 2. For a block whose type is different from 0110 we get
[k + 1, n^2 − k − 1, n^2 − k − 2, k + 4], whose sum is the same value. Since we
have n/4 such blocks in a row, the sum of all their contributions is

(n/4) · (2n^2 + 2) = (n/2)(n^2 + 1),

as required.
The correctness for columns can be shown similarly.

Correctness for diagonals. Let us consider only the diagonal from (0,0) to
(n − 1, n − 1) since the other diagonal can be treated similarly. Entries on the
diagonal are 1, 1 + (n + 1), 1 + 2(n + 1), …, 1 + (n − 1)(n + 1), listed bottom-up.
Their sum is

n + (n + 1) Σ_{i=0}^{n−1} i = n + (n + 1)(n/2)(n − 1) = (n/2)(n^2 + 1),

as required.
This completes the proof that S_n is a magic square.

Notes
More on magic squares and their long history may be found on Wikipedia:
https://en.wikipedia.org/wiki/Magic_square.
4 Oldenburger–Kolakoski Sequence

The Oldenburger–Kolakoski sequence is an autodescriptive and self-generating
infinite sequence of symbols from {1,2}. More technically, it is its own
run-length encoding. The sequence, denoted here by K, is one of the strangest
sequences. Despite the simplicity of its generation it appears to have a random
behaviour.
By a block of letters in a word we mean a run of letters, that is, a maximal
factor consisting of occurrences of the same letter. The operation blocks(S)
replaces each block of a word S by its length. For example,
blocks(2 1 1 1 2 2 1 2 2 2) = 1 3 2 1 3.
The sequence K is the unique infinite sequence over the alphabet {1,2} that
starts with 2 and satisfies blocks(K) = K.
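To make the definition concrete, the sketch below computes blocks and generates a prefix of K. It is an illustration only: the names blocks and kolakoski_prefix are hypothetical, and this simple generator stores the whole prefix, so it uses linear space and is not the logarithmic-space method asked for in the question.

```python
from itertools import groupby


def blocks(word: list[int]) -> list[int]:
    """Replace each run of equal letters by its length."""
    return [len(list(run)) for _, run in groupby(word)]


def kolakoski_prefix(n: int) -> list[int]:
    """First n symbols of K (starting with 2), reading run lengths from K itself."""
    seq = [2, 2]  # the first block, of length K[0] = 2
    i = 1         # index of the next block, whose length is seq[i]
    while len(seq) < n:
        letter = 2 if i % 2 == 0 else 1  # blocks alternate between letters 2 and 1
        seq.extend([letter] * seq[i])
        i += 1
    return seq[:n]


if __name__ == "__main__":
    prefix = kolakoski_prefix(22)
    print(prefix)
    print(blocks(prefix))
```

Since each block has length at least 1, the index i never runs past the part of the sequence already generated.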

Remark. Usually the sequence is defined to start with 1, but it is more
convenient here that it starts with 2. In fact, these are the same sequences after
removing the first occurrence of 1.

Question. Show that we can generate online the first n symbols of the
sequence K in O(n) time and O(log n) space.

[Hint: Produce K by iterating h = blocks^{−1} from 2.]


The very small space used for the generation of K is the most interesting
element of the question.

Solution
As h is defined, h(x) = y if and only if y starts with 2 and blocks(y) = x.

How to generate h^{k+1}(2) from h^k(2). Let x = h^k(2). Then y = h^{k+1}(2) =
h(x) results by replacing the letter x[i] of x either by x[i] occurrences of letter
2 if i is even or by x[i] occurrences of letter 1 if i is odd. The word K is the
limit of K_k = h^k(2) when k goes to infinity. The first iterations of h give

h(2) = 22
h^2(2) = 22 11
h^3(2) = 22 11 2 1
h^4(2) = 22 11 2 1 22 1
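The rule for computing h(x) from x can be written directly. The short Python sketch below, with the hypothetical names h and iterates, reproduces the first iterations; words are represented as strings of the digits 1 and 2.

```python
def h(x: str) -> str:
    """Replace x[i] by x[i] copies of letter 2 if i is even, of letter 1 if i is odd."""
    return "".join(("2" if i % 2 == 0 else "1") * int(digit)
                   for i, digit in enumerate(x))


def iterates(count: int) -> list[str]:
    """The words h^0(2), h^1(2), ..., h^count(2)."""
    words = ["2"]
    for _ in range(count):
        words.append(h(words[-1]))
    return words


if __name__ == "__main__":
    for k, word in enumerate(iterates(6)):
        print(k, word, len(word))
```

Each iterate is a prefix of the next one, which is what makes the limit K well defined.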
We leave for the reader the following technical fact.

Observation. n = O(log |K_n|) and Σ_{k=0}^{n} |K_k| = O(|K_n|).
Let T be the parsing tree associated with K_n. Its leaves correspond to
positions on K_n. For a position i, 0 ≤ i < |K_n|, RightBranch(i) denotes
the path from the ith leaf upwards to the first node on the leftmost branch of
the tree (see picture).

[Figure: the parsing tree of K_6; its levels, from h(2) down to h^6(2), are listed below.]

2 2
2 2 1 1
2 2 1 1 2 1
2 2 1 1 2 1 2 2 1
2 2 1 1 2 1 2 2 1 2 2 1 1 2
2 2 1 1 2 1 2 2 1 2 2 1 1 2 1 1 2 2 1 2 1 1

The figure illustrates the parsing tree of K_6 = h^6(2). Each level represents
h^k(2) for k = 0, 1, …, 6. The RightBranch of position 10 (circled leaf)
consists of the thick edges and their endpoints. It starts from the leaf and goes
up to finish at the first node on the leftmost branch.
To every node on the RightBranch is attached one bit of information: the parity
of the number of nodes to the left on its level.
If for each node we know its label and whether it is a left child, then
from RightBranch(i) the symbol at position (i + 1) as well as the whole
RightBranch(i + 1) are computed in logarithmic space and amortised constant
time due to the observation (since lengths of paths are logarithmic and the size
of the whole tree is linear). The process works as follows on a suffix of the
RightBranch. It goes up the tree to find the first left child, then goes down to
the right from its parent and continues until it reaches the next leaf. Basically
it goes up to the lowest common ancestor of leaves i and i + 1 and in a certain
sense each iteration can be seen as an in-order traversal of the parsing tree
using small memory.
The RightBranch may grow upwards, as happens when changing
RightBranch(13) to RightBranch(14) in the example. This is a top-level
description of the algorithm and technical details are omitted.

Notes
The Oldenburger–Kolakoski sequence, often referred to as just the Kolakoski
sequence, was designed by Oldenburger [197] and later popularised by
Kolakoski [166]. The sequence is an example of a smooth word, see [46]. Our
sketch of the algorithm is a version of the algorithm by Nilsson [195]; see also
https://en.wikipedia.org/wiki/Kolakoski_sequence.
5 Square-Free Game

A non-trivial square is a word over an alphabet A of the form uu, where
|u| > 1, and it is an odd-square if in addition |u| is an odd number.
The square-free game of length n over A is played between two players,
Ann and Ben. The players extend an initially empty word w by alternately
appending letters to the word. The game ends when the length of the emerging
word is n or a non-trivial square has been created earlier. We assume that Ben
makes the first move and that n is even. Ann wins if there are no non-trivial
squares in the final word. Otherwise, Ben is the winner.

Odd square-free game. In this limited game Ann wins if no odd-square
occurs. On the alphabet A = {0,1,2} we describe Ann’s winning strategy
as follows. Ann never makes the same move as Ben’s last move, and if Ben
repeats Ann’s last move then she does not repeat his previous move.
To do so, Ann remembers the pair (b, a), where a is the letter appended
during her previous move and b is that from Ben’s previous move. In other
terms, the word w is of even length and after the first move is of the form w =
vba. Then Ben adds c and Ann responds by adding d to get w = vbacd, where

d = a            if c ≠ a,
d = 3 − b − a    otherwise.

Ann behaves like a finite deterministic automaton with output that has six states.
A possible sequence of moves starting with 1 2, potentially winning for
Ann, is

1 2 1 2 2 0 1 0 0 2 1 2 2 0.
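Ann's strategy is simple enough to simulate against every sequence of moves by Ben for a short game. The Python sketch below is an illustration with hypothetical names. One point is an assumption: the text does not fix Ann's very first move, so the sketch picks (c + 1) mod 3, which differs from Ben's first letter c and agrees with the sample game above.

```python
from itertools import product


def ann_reply(b: int, a: int, c: int) -> int:
    """Ann's answer to Ben's move c, given Ben's previous move b and her previous move a."""
    return a if c != a else 3 - b - a


def play(ben_moves) -> list[int]:
    """History of the game when Ben plays ben_moves and Ann follows her strategy."""
    word = []
    for c in ben_moves:
        if not word:
            d = (c + 1) % 3  # assumption: any letter different from c would do
        else:
            d = ann_reply(word[-2], word[-1], c)
        word += [c, d]
    return word


def has_odd_square(word: list[int]) -> bool:
    """True if word contains a factor uu with |u| odd and |u| > 1."""
    n = len(word)
    for half in range(3, n // 2 + 1, 2):
        for start in range(n - 2 * half + 1):
            if word[start:start + half] == word[start + half:start + 2 * half]:
                return True
    return False


if __name__ == "__main__":
    print(play([1, 1, 2, 1, 0, 1, 2]))
    # Exhaustive check for games of length 14 (7 moves by Ben).
    print(any(has_odd_square(play(moves)) for moves in product(range(3), repeat=7)))
```

The exhaustive loop covers only one game length, so it supports the claim of point (A) without replacing its proof.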

Question. (A) Show that Ann always wins against Ben in the odd square-
free game of any even length n.
(B) Describe a winning strategy for Ann in the square-free game over an
alphabet of size 9.

[Hint: To prove (A) show w contains no odd-square. For point (B) mix a
simple even-square strategy with the former strategy.]

Solution
Point (A). We show by contradiction that Ann’s strategy is winning:
assume the word w (history of the game) contains an odd-square
uu (|u| > 1).

Case 1. The first letter of uu is from a move by Ben.
The square is of the form

uu = b_0 a_1 b_1 a_2 b_2 ··· a_k b_k a'_0 b'_1 a'_1 b'_2 a'_2 ··· b'_k a'_k,

where the letters b_i and b'_j correspond to Ben’s moves and the others to Ann’s
moves.
Since uu is a square we get b_0 = a'_0, a_1 = b'_1, …, b_k = a'_k. Due to Ann’s
strategy we have a_1 ≠ b_0, a_2 ≠ b_1, etc.; that is, each two adjacent letters in uu
are distinct. In particular, this implies that Ben never repeats the last move of
Ann in uu.
Consequently all moves of Ann are the same; that is, all letters a_i, a'_j are the
same. Hence a_k = a'_k but at the same time a'_k = b_k since uu is a square. This
implies b_k = a_k and that Ben repeats the last move of Ann, a contradiction.
This completes the proof for this case.

Case 2. The first letter of uu is from a move by Ann.
The square is of the form

uu = a_0 b_1 a_1 b_2 a_2 ··· b_k a_k b'_0 a'_1 b'_1 a'_2 b'_2 ··· a'_k b'_k,

where as before the letters b_i, b'_j correspond to Ben’s moves and the others to
Ann’s moves.
Similarly to the previous case we can prove that Ben always makes a move
different from the last move of Ann, except that it can happen that a_k = b'_0.
If so, a'_1 ≠ b_k, since a'_1 = 3 − a_k − b_k, and later a'_1 = a'_2 = ··· = a'_k.
Consequently a'_k ≠ b_k but at the same time a'_k = b_k, since uu is a square, a
contradiction.
If a_k ≠ b'_0 all moves of Ben are different from those of Ann, who
consequently always does the same move in uu. This leads to a contradiction
in the same way as in case 1.
This completes the proof of this case and shows that Ann’s strategy is
winning.

Point (B). If the game concerns non-trivial even squares on the alphabet
{0,1,2} a winning strategy for Ann is extremely simple: in her kth move
she adds the kth letter of any (initially fixed) square-free word over the same
alphabet.
Combining in a simple way strategies (using them simultaneously) for non-trivial
odd and even square-free games, Ann gets a winning strategy avoiding
general non-trivial squares on a 9-letter alphabet. The alphabet now consists of
pairs (e, e') of letters in {0,1,2}. The history of the game is a word of the form
w = (e_1, e'_1)(e_2, e'_2) ··· (e_k, e'_k) for which e_1 e_2 ··· e_k contains no odd-square
and e'_1 e'_2 ··· e'_k contains no non-trivial even square.

Notes
The solution of the game presented in the problem is described in [132], where
the number of letters was additionally decreased to 7 using more complicated
arguments. However, a flaw was discovered by Kosinski et al.; see [169], where
the number of letters is reduced just to 8.

6 Fibonacci Words and Fibonacci Numeration System

Let r(m) denote the Fibonacci representation of a non-negative integer m.
It is a word x of length ℓ on the alphabet {0,1} ending with 1 except for
m = 0, containing no two consecutive occurrences of 1 and that satisfies
m = Σ_{i=0}^{ℓ−1} x[i] · F_{i+2}, where F_{i+2} is the (i + 2)th Fibonacci number (recall
that F_0 = 0, F_1 = 1, F_2 = 1, F_3 = 2, etc.).
For example: r(0) = 0, r(1) = 1, r(2) = 01, r(3) = 001, r(4) = 101,
r(5) = 0001, r(6) = 1001, r(7) = 0101.
Note that the usual positional Fibonacci representation of an integer m is
r(m)^R, the reverse of r(m). Also note that Fibonacci coding used to encode an
integer m in a data stream is r(m)1, terminating with 11 to allow its decoding.
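The representation r(m) can be computed greedily, taking the largest Fibonacci number that fits at each step. The Python sketch below is an illustration with hypothetical function names; it writes the least significant digit first, as r(m) does, and also provides the inverse conversion.

```python
def fib_representation(m: int) -> str:
    """Word r(m): digit x[i] has weight F_{i+2}, least significant digit first."""
    if m == 0:
        return "0"
    weights = [1, 2]  # F_2, F_3
    while weights[-1] <= m:
        weights.append(weights[-1] + weights[-2])
    digits = []
    for weight in reversed(weights):  # greedy choice, largest weight first
        if weight <= m:
            digits.append("1")
            m -= weight
        else:
            digits.append("0")
    # Drop the non-significant leading zeros, then reverse to get r(m).
    return "".join(digits).lstrip("0")[::-1]


def fib_value(x: str) -> int:
    """Integer represented by x, the sum of x[i] * F_{i+2}."""
    value, weight, next_weight = 0, 1, 2
    for digit in x:
        if digit == "1":
            value += weight
        weight, next_weight = next_weight, weight + next_weight
    return value


if __name__ == "__main__":
    print([fib_representation(m) for m in range(8)])
    # First digits of r(0), r(1), ... read as letters: 0 -> a, 1 -> b.
    print("".join("ab"[int(fib_representation(m)[0])] for m in range(13)))
```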

Question. Show that the sequence of first digits of Fibonacci representations
of natural numbers in increasing order is the infinite Fibonacci word when
letters are identified with digits: a to 0, b to 1.

Let pos (k,c), k > 0, denote the position of the kth occurrence of letter c in
the infinite Fibonacci word f.
Question. Show how to compute the position of the kth occurrence of letter
a in the Fibonacci word f in time O(log k). The same applies for the letter b.

[Hint: Show the following formulas: r(pos(k,a)) = 0 · r(k − 1) and
r(pos(k,b)) = 10 · r(k − 1).]

Solution
To understand the structure of Fibonacci representations, let us consider the
rectangle R_n whose rows are representations of the first |fib_n| = F_{n+2} natural
numbers. Representations are possibly right padded with 0’s to get n digits.
The rectangles are given by the recurrence shown in the picture below.

R_1 = 0     R_2 = 0 0     R_3 = 0 0 0
      1           1 0           1 0 0
                  0 1           0 1 0
                                0 0 1
                                1 0 1

R_{n+2} consists of the rows of R_{n+1}, each followed by 0, and then of the
rows of R_n, each followed by 01.

Answer to the first question. Rows of rectangles R_1 and R_2 are representations
of the first |fib_1| and |fib_2| integers in increasing order respectively. Let us
show by recurrence it holds for R_{n+2}, n > 0. Indeed, the first |fib_{n+1}| rows
of R_{n+2} are representations padded with 0 of the first |fib_{n+1}| integers by the
recurrence hypothesis. The next |fib_n| rows are representations of the form
x · 01 (they cannot end with 11). Since x is a row of R_n and using again the
recurrence hypothesis, the next rows represent the next |fib_n| integers, which
shows that R_{n+2} satisfies the property and ends the recurrence.
It is clear from the recurrence that the sequence of first digits (the first
column at the limit) corresponds to the infinite Fibonacci word. This answers
the first question.

Answer to the second question. The limit of tables R_n is the infinite table R_∞
of Fibonacci representations of all consecutive natural numbers in increasing
order. In each row, letters to the right of the rightmost occurrence of 1 are
non-significant digits equal to zero.
Zeros in the first column of R_∞ correspond to a’s in the Fibonacci word.
Rows starting with 0’s are of the form
0 · x_0, 0 · x_1, 0 · x_2, …,
where
x_0, x_1, x_2, …
is the sequence of representations of consecutive natural numbers.
Hence the kth zero corresponds to x_{k−1} and occurs at position 0 · x_{k−1},
which gives r(pos(k,a)) = 0 · r(k − 1).
Similarly we get r(pos(k,b)) = 10 · r(k − 1), since all rows containing 1
in the first column of R_∞ start in fact with 10.
Therefore, computing the kth occurrence of a letter in the Fibonacci word
amounts to computing the Fibonacci representation of an integer and doing the
inverse operation, both taking O(log k) time as expected.
 i   letter   digits of r(i), padded with 0's
 0     a      0 0 0 0 0 .
 1     b      1 0 0 0 0 .
 2     a      0 1 0 0 0 .
 3     a      0 0 1 0 0 .
 4     b      1 0 1 0 0 .
 5     a      0 0 0 1 0 .
 6     b      1 0 0 1 0 .
 7     a      0 1 0 1 0 .
 8     a      0 0 0 0 1 .
 9     b      1 0 0 0 1 .
10     a      0 1 0 0 1 .
11     a      0 0 1 0 1 .
12     b      1 0 1 0 1 .
 .     .      . . . . . .

Position of the 5th occurrence of a: (0 · 101)F = 7.
Position of the 4th occurrence of b: (10 · 001)F = 9.
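
The computation of pos(k,c) described above can be sketched in Python as follows. This is an illustration rather than the book's own code: the names fib_repr, fib_value and pos are assumptions, digits are stored least significant first, and r(0) is taken to be the empty list.

```python
def fib_repr(i):
    """Fibonacci representation r(i), least significant digit first.
    r(0) is the empty list (all digits non-significant)."""
    weights = [1, 2]
    while weights[-1] <= i:
        weights.append(weights[-1] + weights[-2])
    digits = [0] * len(weights)
    for j in range(len(weights) - 1, -1, -1):  # greedy, largest weight first
        if weights[j] <= i:
            digits[j] = 1
            i -= weights[j]
    while digits and digits[-1] == 0:  # drop non-significant zeros
        digits.pop()
    return digits


def fib_value(digits):
    """Inverse operation: the integer whose representation is `digits`."""
    a, b = 1, 2
    total = 0
    for digit in digits:
        total += digit * a
        a, b = b, a + b
    return total


def pos(k, c):
    """Position of the kth occurrence (k > 0) of letter c in the Fibonacci word,
    using r(pos(k,a)) = 0 . r(k-1) and r(pos(k,b)) = 10 . r(k-1)."""
    prefix = [0] if c == "a" else [1, 0]
    return fib_value(prefix + fib_repr(k - 1))


print(pos(5, "a"))  # 7, as in the table
print(pos(4, "b"))  # 9, as in the table
```

Both helper functions loop over O(log k) Fibonacci weights, which matches the running time stated above.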

Notes
The problem material is by Rytter [216].

7 Wythoff’s Game and Fibonacci Word

Wythoff’s game, a variant of the game of Nim, is a two-player game of strategy.
It is played with two piles of tokens, at least one being initially non-empty.
Players take turns removing either a positive number of tokens from one pile or
the same number of tokens from both piles. When there are no tokens left, the
game ends and the last player is the winner.
A configuration of the game is described by a pair of natural numbers (m,n),
m ≤ n, where m and n are the numbers of tokens in the two piles. Note that
(0,n) as well as (n,n), n > 0, are winning configurations. The smallest losing
configuration is (1,2), and all configurations of the form (m + 1,m + 2) for
m > 0, as well as (1,m) and (2,m) for m > 2, are winning configurations.

It is known that losing configurations follow a regular pattern determined
by the golden ratio. Thus we pose the following question.

Question. Is there any close relation between Wythoff’s game and the
infinite Fibonacci word?

Solution
Losing configurations in Wythoff’s game are closely related to the Fibonacci
word. Let WytLost denote the set of losing configurations. It contains pairs of
the form (m,n), 0 < m < n:

WytLost = {(1,2),(3,5),(4,7),(6,10),(8,13), . . .}.

Denoting by (mk ,nk ) the kth lexicographically smallest pair of the set we get

WytLost = {(m1,n1 ),(m2,n2 ),(m3,n3 ), . . .},

with m1 < m2 < m3 < · · · and n1 < n2 < n3 < · · ·
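
The first elements of WytLost can be recovered directly from the rules of the game. The sketch below is an illustrative brute-force search, not an algorithm from the original text; the name losing_configurations and the parameter limit are assumptions. A configuration is marked winning when some move leads to a losing configuration, and (0,0) is losing for the player to move.

```python
def losing_configurations(limit):
    """Losing configurations (m, n) with 0 < m < n <= limit, by exhaustive search."""
    # win[m][n] is True when the player to move from (m, n) can force a win.
    win = [[False] * (limit + 1) for _ in range(limit + 1)]
    for m in range(limit + 1):
        for n in range(limit + 1):
            # Remove t tokens from the first pile, the second pile, or both.
            moves = [(m - t, n) for t in range(1, m + 1)]
            moves += [(m, n - t) for t in range(1, n + 1)]
            moves += [(m - t, n - t) for t in range(1, min(m, n) + 1)]
            win[m][n] = any(not win[a][b] for a, b in moves)
    return [(m, n)
            for m in range(1, limit + 1)
            for n in range(m + 1, limit + 1)
            if not win[m][n]]


print(losing_configurations(13))
# [(1, 2), (3, 5), (4, 7), (6, 10), (8, 13)]
```

The search takes time cubic in limit, so it is only meant for checking small cases.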


Let pos(k,c), k > 0, denote the position of the kth occurrence of the letter c
in the infinite Fibonacci word f. The following fact relates f to Wythoff’s
game.

Fact 1. mk = pos(k,a) + 1 and nk = pos(k,b) + 1.


Let M = {m1,m2,m3, . . .} and N = {n1,n2,n3, . . .}. The following fact is
well known and not proved here.

Fact 2.
(i) M ∩ N = ∅ and M ∪ N = {1,2,3, . . .}.
(ii) nk = mk + k for every k > 0.
Fact 2 is used to derive Fact 1. Since properties (i) and (ii) determine the
two increasing sequences uniquely, it is enough to prove that both hold for the
sets M′ = {pos(k,a) + 1 : k > 0} and N′ = {pos(k,b) + 1 : k > 0}.
Property (i) obviously holds, because every position of f carries exactly one
of the letters a and b. Property (ii) follows from the hint presented and
proved in Problem 6:

r(pos(k,a)) = 0 · r(k − 1) and r(pos(k,b)) = 10 · r(k − 1),

where r(i) stands for the Fibonacci representation of the natural number i. To
show that (pos(k,b) + 1) − (pos(k,a) + 1) = k it is sufficient to prove that
for any Fibonacci representation x of a natural number we have
(10x)F − (0x)F = (x)F + 1, where (y)F denotes the number i for which r(i) = y.
But this follows directly from the definition of the Fibonacci representation
and completes the proof.
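
Fact 1 and property (ii) of Fact 2 can be checked on a prefix of the Fibonacci word. The following Python sketch is an illustration under stated assumptions: the names fibonacci_word and wythoff_pairs_from_word are hypothetical, and positions in f are counted from 0, as in Problem 6.

```python
def fibonacci_word(length):
    """Prefix of the infinite Fibonacci word, from fib_0 = a and fib_1 = ab."""
    prev, cur = "a", "ab"
    while len(cur) < length:
        prev, cur = cur, cur + prev
    return cur[:length]


def wythoff_pairs_from_word(count):
    """First `count` pairs (pos(k,a) + 1, pos(k,b) + 1), read off the word."""
    length = 8
    while fibonacci_word(length).count("b") < count:
        length *= 2  # extend the prefix until it contains enough b's
    word = fibonacci_word(length)
    m = [i + 1 for i, ch in enumerate(word) if ch == "a"]
    n = [i + 1 for i, ch in enumerate(word) if ch == "b"]
    return list(zip(m[:count], n[:count]))


pairs = wythoff_pairs_from_word(8)
print(pairs[:5])
# [(1, 2), (3, 5), (4, 7), (6, 10), (8, 13)]
print(all(n - m == k for k, (m, n) in enumerate(pairs, start=1)))
# True
```

The printed pairs agree with the listed elements of WytLost, and the second check is property (ii), nk = mk + k, on the sampled prefix.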
In some the merest skime.

I’m no’ like Burns, and weel I ken,


Tho’ ony wench can ser’,
It’s no’ through mony but through yin
That ony man wuns fer....

I weddit thee frae fause love, lass,


To free thee and to free mysel’;
But man and wumman tied for life
True can be and truth can tell.

Pit ony couple in a knot


They canna lowse and needna try,
And mair o’ love at last they’ll ken
—If ocht!—than joy’ll alane descry.

For them as for the beasts, my wife,


A’s fer frae dune when pleesure’s owre,
And coontless difficulties gar
Ilk hert discover a’ its power.

I dinna say that bairns alane


Are true love’s task—a sairer task
Is aiblins to create oorsels
As we can be—it’s that I ask.

Create oorsels, syne bairns, syne race.


Sae on the cod I see’t in you
Wi’ Maidenkirk to John o’ Groats
The bosom that you draw me to.

And nae Scot wi’ a wumman lies,


But I am he and ken as ’twere
A stage I’ve passed as he maun pass’t,
Gin he grows up, his way wi’ her!...

A’thing wi’ which a man


Can intromit’s a wumman,
And can, and s’ud, become
As intimate and human.

And Jean’s nae mair my wife


Than whisky is at times,
Or munelicht or a thistle
Or kittle thochts or rhymes.

He’s no’ a man ava’,


And lacks a proper pride,
Gin less than a’ the warld
Can ser’ him for a bride!...

Use, then, my lust for whisky and for thee,


Your function but to be and let me be
And see and let me see.

If in a lesser licht I grope my way,


Or use’t for ends that need your different ray
Whelm’t in superior day.

Then aye increase and ne’er withdraw your licht.


—Gin it shows either o’s in hideous plicht,
What gain to turn’t to nicht?

Whisky mak’s Heaven or Hell and whiles mells baith,


Disease is but the privy torch o’ Daith,
—But sex reveals life, faith!

I need them a’ and maun be aye at strife.


Daith and ayont are nocht but pairts o’ life.
—Then be life’s licht, my wife!...

Love often wuns free


In lust to be strangled,
Or love, o’ lust free,
In law’s sairly tangled.
And it’s ill to tell whether
Law or lust is to blame
When love’s chokit up
—It comes a’ to the same.

In this sorry growth


Whatna beauty is tint
That freed o’t micht find
A waur fate than is in’t?...

Yank oot your orra boughs, my hert!

God gied man speech and speech created thocht,


He gied man speech but to the Scots gied nocht
Barrin’ this clytach that they’ve never brocht
To onything but sic a Blottie O
As some bairn’s copybook micht show,

A spook o’ soond that frae the unkent grave


In which oor nation lies loups up to wave
Sic leprous chuns as tatties have
That cellar-boond send spindles gropin’
Towards ony hole that’s open,

Like waesome fingers in the dark that think


They still may widen the ane and only chink
That e’er has gi’en mankind a blink
O’ Hope—tho’ ev’n in that puir licht
They s’ud ha’e seen their hopeless plicht.

This puir relation o’ my topplin’ mood,


This country cousin, streak o’ churl-bluid,
This hopeless airgh ’twixt a’ we can and should,
This Past that like Astarte’s sting I feel,
This arrow in Achilles’ heel.

Yank oot your orra boughs, my hert!


Mebbe we’re in a vicious circle cast,
Mebbe there’s limits we can ne’er get past,
Mebbe we’re sentrices that at the last
Are flung aside, and no’ the pillars and props
O’ Heaven foraye as in oor hopes.

Oor growth at least nae steady progress shows,


Genius in mankind like an antrin rose
Abune a jungly waste o’ effort grows,
But to Man’s purpose it mak’s little odds,
And seems irrelevant to God’s....

Eneuch? Then here you are. Here’s the haill story.


Life’s connached shapes too’er up in croons o’ glory,
Perpetuatin’, natheless, in their gory
Colour the endless sacrifice and pain
That to their makin’s gane.

The roses like the saints in Heaven treid


Triumphant owre the agonies o’ their breed,
And wag fu’ mony a celestial heid
Abune the thorter-ills o’ leaf and prick
In which they ken the feck maun stick.

Yank oot your orra boughs, my hert!

A mongrel growth, jumble o’ disproportions,


Whirlin’ in its incredible contortions,
Or wad-be client that an auld whore shuns,
Wardin’ her wizened orange o’ a bosom
Frae importunities sae gruesome,

Or new diversion o’ the hormones


Mair fond o’ procreation than the Mormons,
And fetchin’ like a devastatin’ storm on’s
A’ the uncouth dilemmas o’ oor natur’
Objectified in vegetable maitter.
Yank oot your orra boughs, my hert!

And heed nae mair the foolish cries that beg


You slice nae mair to aff or pu’ to leg,
You skitin’ duffer that gar’s a’body fleg,
—What tho’ you ding the haill warld oot o’ joint
Wi’ a skier to cover-point!

Yank oot your orra boughs, my hert!

There was a danger—and it’s weel I see’t—


Had brocht ye like Mallarmé to defeat:—
“Mon doute, amas de nuit ancienne s’achève
En maint rameau subtil, qui, demeuré les vrais
Bois même, prouve, hélas! que bien seul je m’offrais
Pour triomphe le faute idéale des roses.”[8]

Yank oot your orra boughs, my hert!...

I love to muse upon the skill that gangs


To mak’ the simplest thing that Earth displays,
The eident life that ilka atom thrangs,
And uses it in the appointit ways,
And a’ the endless brain that nocht escapes
That myriad moves them to inimitable shapes.

Nor to their customed form nor ony ither


New to Creation, by man’s cleverest mind,
A’ needfu’ particles first brocht thegither,
Could they wi’ timeless labour be combined.
There’s nocht that Science yet’s begood to see
In hauf its deemless detail or its destiny.

Oor een gi’e answers based on pairt-seen facts


That beg a’ questions, to ebb minds’ content,
But hoo a’e feature or the neist attracts,
Wi’ millions mair unseen, wha kens what’s meant
By human brains and to what ends may tell
—For naething’s seen or kent that’s near a thing itsel’!

Let whasae vaunts his knowledge then and syne


Sets up a God and kens His purpose tae
Tell me what’s gart a’e strain o’ maitter twine
In sic an extraordinary way,
And what God’s purpose wi’ the Thistle is
—I’ll aiblins ken what he and his God’s worth by this.

I’ve watched it lang and hard until I ha’e


A certain symp’thy wi’ its orra ways
And pride in its success, as weel I may,
In growin’ exactly as its instinct says,
Save in sae fer as thwarts o’ weather or grun’
Or man or ither foes ha’e’ts aims perchance fordone.

But I can form nae notion o’ the spirit


That gars it tak’ the difficult shape it does,
Nor judge the merit yet or the demerit
O’ this detail or that sae fer as it goes
T’ advance the cause that gied it sic a guise
As maun ha’e pleased its Maker wi’ a gey surprise.

The craft that hit upon the reishlin’ stalk,


Wi’ts gausty leafs and a’ its datchie jags,
And spired it syne in seely flooers to brak
Like sudden lauchter owre its fousome rags
Jouks me, sardonic lover, in the routh
O’ contrairies that jostle in this dumfoondrin’ growth.

What strength ’t’ud need to pit its roses oot,


Or double them in number or in size,
He canna tell wha canna plumb the root,
And learn what’s gar’t its present state arise,
And what the limits are that ha’e been put
To change in thistles, and why—and what a change ’ud boot....

I saw a rose come loupin’ oot[9]


Frae a camsteerie plant.
O wha’d ha’e thocht yon puir stock had
Sic an inhabitant?

For centuries it ran to waste,


Wi’ pin-heid flooers at times.
O’ts hidden hert o’ beauty they
Were but the merest skimes.

Yet while it ran to wud and thorns,


The feckless growth was seekin’
Some airt to cheenge its life until
A’ in a rose was beekin’.

“Is there nae way in which my life


Can mair to flooerin’ come,
And bring its waste on shank and jags
Doon to a minimum?

“It’s hard to struggle as I maun


For scrunts o’ blooms like mine,
While blossom covers ither plants
As by a knack divine.

“What hinders me unless I lack


Some needfu’ discipline?
—I wis I’ll bring my orra life
To beauty or I’m din!”

Sae ran the thocht that hid ahint


The thistle’s ugsome guise,
“I’ll brak’ the habit o’ my life
A worthier to devise.”

“My nobler instincts sall nae mair


This contrair shape be gi’en.
I sall nae mair consent to live
A life no’ fit to be seen.”
Sae ran the thocht that hid ahint
The thistle’s ugsome guise,
Till a’ at aince a rose loupt out
—I watched it wi’ surprise.

A rose loupt oot and grew, until


It was ten times the size
O’ ony rose the thistle afore
Had heistit to the skies.

And still it grew till a’ the buss


Was hidden in its flame.
I never saw sae braw a floo’er
As yon thrawn stock became.

And still it grew until it seemed


The haill braid earth had turned
A reid reid rose that in the lift
Like a ball o’ fire burned.

The waefu’ clay was fire aince mair,


As Earth had been resumed
Into God’s mind, frae which sae lang
To grugous state ’twas doomed.

Syne the rose shrivelled suddenly


As a balloon is burst;
The thistle was a ghaistly stick,
As gin it had been curst.

Was it the ancient vicious sway


Imposed itsel’ again,
Or nerve owre weak for new emprise
That made the effort vain,

A coward strain in that lorn growth


That wrocht the sorry trick?
—The thistle like a rocket soared
And cam’ doon like the stick.

Like grieshuckle the roses glint,


The leafs like farles hing,
As roond a hopeless sacrifice
Earth draws its barren ring.

The dream o’ beauty’s dernin’ yet


Ahint the ugsome shape.
—Vain dream that in a pinheid here
And there can e’er escape!

The vices that defeat the dream


Are in the plant itsel’,
And till they’re purged its virtues maun
In pain and misery dwell.

Let Deils rejoice to see the waste,


The fond hope brocht to nocht.
The thistle in their een is as
A favourite lust they’re wrocht.

The orderin’ o’ the thistle means


Nae richtin’ o’t to them.
Its loss they ca’ a law, its thorns
A fule’s fit diadem.

And still the idiot nails itsel’


To its ain crucifix,
While here a rose and there a rose
Jaups oot abune the pricks.

Like connoisseurs the Deils gang roond


And praise its attitude,
Till on the Cross the silly Christ
To fidge fu’ fain’s begood!

Like connoisseurs the Deils gang roond


Wi’ ready platitude.
It’s no’ sae dear as vinegar,
And every bit as good!

The bitter taste is on my tongue,


I chowl my chafts, and pray
“Let God forsake me noo and no’
Staund connoisseur-like tae!”...

The language that but sparely flooers


And maistly gangs to weed;
The thocht o’ Christ and Calvary
Aye liddenin’ in my heid;
And a’ the dour provincial thocht
That merks the Scottish breed
—These are the thistle’s characters,
To argie there’s nae need.
Hoo weel my verse embodies
The thistle you can read!
—But will a Scotsman never
Frae this vile growth be freed?...

O ilka man alive is like


A quart that’s squeezed into a pint
(A maist unScottish-like affair!)
Or like the little maid that showed
Me into a still sma’er room.

What use to let a sunrise fade


To ha’e anither like’t the morn,
Or let a generation pass
That ane nae better may succeed,
Or wi’ a’ Time’s machinery
Keep naething new aneth the sun,
Or change things oot o’ kennin’ that
They may be a’ the mair the same?

The thistle in the wund dissolves


In lichtnin’s as shook foil gi’es way
In sudden splendours, or the flesh
At Daith lets slip the infinite soul;
And syne it’s like a sunrise tint
In grey o’ day, or love and life,
That in a cloody blash o’ sperm
Undae the warld to big’t again,
Or like a pickled foetus that
Nae man feels ocht in common wi’
—But micht as easily ha’ been!
Or like a corpse a soul set free
Scunners to think it tenanted
—And little recks that but for it
It never micht ha’ been at a’,
Like love frae lust and God frae man!

The wasted seam that dries like stairch


And pooders aff, that micht ha’ been
A warld o’ men and syne o’ Gods;
The grey that haunts the vievest green;
The wrang side o’ the noblest scene
We ne’er can whummle to oor een,
As ’twere the hinderpairts o’ God
His face aye turned the opposite road,
Or’s neth the flooers the drumlie clods
Frae which they come at sicna odds,
As a’ Earth’s magic frae a spirt,
In shame and secrecy, o’ dirt!

Then shak’ nae mair in silly life,


Nor stand impossible as Daith,
Incredible as a’thing is
Inside or oot owre closely scanned.
As mithers aften think the warld
O’ bairns that ha’e nae end or object,
Or lovers think their sweethearts made
Yince-yirn—wha haena waled the lave,
Maikless—when they are naebody,
Or men o’ ilka sort and kind
Are prood o’ thochts they ca’ their ain,
That nameless millions had afore
And nameless millions yet’ll ha’e,
And that were never worth the ha’en,
Or Cruivie’s “latest” story or
Gilsanquhar’s vows to sign the pledge,
Or’s if I thocht maist whisky was,
Or failed to coont the cheenge I got,
Sae wad I be gin I rejoiced,
Or didna ken my place, in thee.

O stranglin’ rictus, sterile spasm,


Thou stricture in the groins o’ licht,
Thou ootrie gangrel frae the wilds
O’ chaos fenced frae Eden yet
By the unsplinterable wa’
O’ munebeams like a bleeze o’ swords!
Nae chance lunge cuts the Gordian knot,
Nor sall the belly find relief
In wha’s entangled moniplies
Creation like a stoppage jams,
Or in whose loins the mapamound
Runkles in strawns o’ bubos whaur
The generations gravel.
The soond o’ water winnin’ free,
The sicht o’ licht that braks the rouk,
The thocht o’ every thwart owrecome
Are in my ears and een and brain,
In whom the bluid is spilt in stour,
In whom a’ licht in darkness fails,
In whom the mystery o’ life
Is to a wretched weed bewrayed.

But let my soul increase in me,


God dwarfed to enter my puir thocht
Expand to his true size again,
And protoplasm’s look befit
The nature o’ its destiny,
And seed and sequence be nae mair
Incongruous to ane anither,
And liquor packed impossibly
Mak’ pint-pot an eternal well,
And art be relevant to life,
And poets mair than dominies yet,
And ends nae langer tint in means,
Nor forests hidden by their trees,
Nor men be sacrificed alive
In foonds o’ fates designed for them,
Nor mansions o’ the soul stand toom
Their owners in their cellars trapped,
Nor a’ a people’s genius be
A rumple-fyke in Heaven’s doup,
While Calvinism uses her
To breed a minister or twa!

A black leaf owre a white leaf twirls,


A grey leaf flauchters in atween,
Sae ply my thochts aboot the stem
O’ loppert slime frae which they spring.
The thistle like a snawstorm drives,
Or like a flicht o’ swallows lifts,
Or like a swarm o’ midges hings,
A plague o’ moths, a starry sky,
But’s naething but a thistle yet,
And still the puzzle stands unsolved.
Beauty and ugliness alike,
And life and daith and God and man,
Are aspects o’t but nane can tell
The secret that I’d fain find oot
O’ this bricht hive, this sorry weed,
The tree that fills the universe,
Or like a reistit herrin’ crines.
Gin I was sober I micht think
It was like something drunk men see!

The necromancy in my bluid


Through a’ the gamut cheenges me
O’ dwarf and giant, foul and fair,
But winna let me be mysel’
—My mither’s womb that reins me still
Until I tae can prick the witch
And “Wumman” cry wi’ Christ at last,
“Then what hast thou to do wi’ me?”

The tug-o’-war is in me still,


The dog-hank o’ the flesh and soul,
Faither in Heaven, what gar’d ye tak’
A village slut to mither me,
Your mongrel o’ the fire and clay?
The trollop and the Deity share
My writhen form as tho’ I were
A picture o’ the time they had
When Licht rejoiced to file itsel’
And Earth upshuddered like a star.

A drucken hizzie gane to bed


Wi’ three-in-ane and ane-in-three.

O fain I’d drink until I saw


Scotland a ferlie o’ delicht,
And fain bide drunk nor ha’e’t recede
Into a shrivelled thistle syne,
As when a sperklin’ tide rins oot,
And leaves a wreath o’ rubbish there!

Wull a’ the seas gang dry at last


(As dry as I am gettin’ noo),
Or wull they aye come back again,
Seilfu’ as my neist drink to me,
Or as the sunlicht to the mune,
Or as the bonny sangs o’ men,
Wha’re but puir craturs in themsels,
And save when genius mak’s them drunk,
As donnert as their audiences,
—As dreams that mak’ a tramp a king,
A madman sane to his ain mind,
Or what a Scotsman thinks himsel’,
Tho’ naethin’ but a thistle kyths.

The mair I drink the thirstier yet,


And whiles when I’m alowe wi’ booze,
I’m like God’s sel’ and clad in fire,
And ha’e a Pentecost like this.
O wad that I could aye be fou’,
And no’ come back as aye I maun
To naething but a fule that nane
’Ud credit wi’ sic thochts as thae,
A fule that kens they’re empty dreams!

Yet but fer drink and drink’s effects,


The yeast o’ God that barms in us,
We micht as weel no’ be alive.
It maitters not what drink is ta’en,
The barley bree, ambition, love,
Or Guid or Evil workin’ in’s,
Sae lang’s we feel like souls set free
Frae mortal coils and speak in tongues
We dinna ken and never wull,
And find a merit in oorsels,
In Cruivies and Gilsanquhars tae,
And see the thistle as ocht but that!

For wha o’s ha’e the thistle’s poo’er


To see we’re worthless and believe ’t?

A’thing that ony man can be’s


A mockery o’ his soul at last.
The mair it shows’t the better, and
I’d suner be a tramp than king,
Lest in the pride o’ place and poo’er
I e’er forgot my waesomeness.
Sae to debauchery and dirt,
And to disease and daith I turn,
Sin’ otherwise my seemin’ worth
’Ud block my view o’ what is what,
And blin’ me to the irony
O’ bein’ a grocer ’neth the sun,
A lawyer gin Justice ope’d her een,
A pedant like an ant promoted,
A parson buttonholin’ God,
Or ony cratur o’ the Earth
Sma’-bookt to John Smith, High Street, Perth,
Or sic like vulgar gaffe o’ life
Sub speciem aeternitatis—
Nae void can fleg me hauf as much
As bein’ mysel’, whate’er I am,
Or, waur, bein’ onybody else.

The nervous thistle’s shiverin’ like


A horse’s skin aneth a cleg,
Or Northern Lichts or lustres o’
A soul that Daith has fastened on,
Or mornin’ efter the nicht afore.

Shudderin’ thistle, gi’e owre, gi’e owre....

Grey sand is churnin’ in my lugs


The munelicht flets, and gantin’ there
The grave o’ a’ mankind’s laid bare
—On Hell itsel’ the drawback rugs!

Nae man can ken his hert until


The tide o’ life uncovers it,
And horror-struck he sees a pit
Returnin’ life can never fill!...
Thou art the facts in ilka airt
That breenge into infinity,
Criss-crossed wi’ coontless ither facts
Nae man can follow, and o’ which
He is himsel’ a helpless pairt,
Held in their tangle as he were
A stick-nest in Ygdrasil!

The less man sees the mair he is


Content wi’t, but the mair he sees
The mair he kens hoo little o’
A’ that there is he’ll ever see,
And hoo it mak’s confusion aye
The waur confoondit till at last
His brain inside his heid is like
Ariadne wi’ an empty pirn,
Or like a birlin’ reel frae which
A whale has rived the line awa’.

What better’s a forhooied nest


Than skasloch scattered owre the grun’?

O hard it is for man to ken


He’s no’ creation’s goal nor yet
A benefitter by’t at last—
A means to ends he’ll never ken,
And as to michtier elements
The slauchtered brutes he eats to him
Or forms o’ life owre sma’ to see
Wi’ which his heedless body swarms,
And a’ man’s thocht nae mair to them
Than ony moosewob to a man,
His Heaven to them the blinterin’ o’
A snail-trail on their closet wa’!

For what’s an atom o’ a twig


That tak’s a billion to an inch
To a’ the routh o’ shoots that mak’
The bygrowth o’ the Earth aboot
The michty trunk o’ Space that spreids
Ramel o’ licht that ha’e nae end,
—The trunk wi’ centuries for rings,
Comets for fruit, November shooers
For leafs that in its Autumns fa’
—And Man at maist o’ sic a twig
Ane o’ the coontless atoms is!

My sinnens and my veins are but


As muckle o’ a single shoot
Wha’s fibre I can ne’er unwaft
O’ my wife’s flesh and mither’s flesh
And a’ the flesh o’ humankind,
And revelled thrums o’ beasts and plants
As gangs to mak’ twixt birth and daith
A’e sliver for a microscope;
And a’ the life o’ Earth to be
Can never lift frae underneath
The shank o’ which oor destiny’s pairt
As heich’s to stand forenenst the trunk
Stupendous as a windlestrae!

I’m under nae delusions, fegs!


The whuppin’ sooker at wha’s tip
Oor little point o’ view appears,
A midget coom o’ continents
Wi’ blebs o’ oceans set, sends up
The braith o’ daith as weel as life,
And we maun braird anither tip
Oot owre us ere we wither tae,
And join the sentrice skeleton
As coral insects big their reefs.

What is the tree? As fer as Man’s


Concerned it disna maitter
Gin but a giant thistle ’tis
That spreids eternal mischief there,
As I’m inclined to think.
Ruthless it sends its solid growth
Through mair than he can e’er conceive,
And braks his warlds abreid and rives
His Heavens to tatters on its horns.

The nature or the purpose o’t


He needna fash to spier, for he
Is destined to be sune owre grown
And hidden wi’ the parent wud
The spreidin’ boughs in darkness hap,
And a’ its future life’ll be
Ootwith’m as he’s ootwith his banes.

Juist as man’s skeleton has left


Its ancient ape-like shape ahint,
Sae states o’ mind in turn gi’e way
To different states, and quickly seem
Impossible to later men,
And Man’s mind in its final shape,
Or lang’ll seem a monkey’s spook,
And, strewth, to me the vera thocht
O’ Thocht already’s fell like that!
Yet still the cracklin’ thorns persist
In fitba’ match and peepy show,
To antic hay a dog-fecht’s mair
Than Jacob v. the Angel,
And through a cylinder o’ wombs,
A star reflected in a dub,
I see as ’twere my ain wild harns
The ripple o’ Eve’s moniplies.

And faith! yestreen in Cruivie’s een


Life rocked at midnicht in a tree,
And in Gilsanquhar’s glower I saw
The taps o’ waves ’neth which the warld
Ga’ed rowin’ like a jeelyfish,
And whiles I canna look at Jean
For fear I’d see the sunlicht turn
Worm-like into the glaur again!

A black leaf owre a white leaf twirls,


My liver’s shadow on my soul,
And clots o’ bluid loup oot frae stems
That back into the jungle rin,
Or in the waters underneath
Kelter like seaweed, while I hear
Abune the thunder o’ the flood,
The voice that aince commanded licht
Sing ‘Scots Wha Ha’e’ and hyne awa’
Like Cruivie up a different glen,
And leave me like a mixture o’
A wee Scotch nicht and Judgment Day,
The bile, the Bible, and the Scotsman,
Poetry and pigs—Infernal Thistle,
Damnition haggis I’ve spewed up,
And syne return to like twa dogs!
Blin’ Proteus wi’ leafs or hands
Or flippers ditherin’ in the lift
—Thou Samson in a warld that has
Nae pillars but your cheengin’ shapes
That dung doon, rise in ither airts
Like windblawn reek frae smoo’drin’ ess!
—Hoo lang maun I gi’e aff your forms
O’ plants and beasts and men and Gods
And like a doited Atlas bear
This steeple o’ fish, this eemis warld,
Or, maniac heid wi’ snakes for hair,
A Maenad, ape Aphrodite,
And scunner the Eternal sea?

Man needna fash and even noo


The cells that mak’ a’e sliver wi’m,