Author(s): Laxmi Parida
ISBN(s): 9781584885498, 1584885491
Edition: 1
Year: 2007
Language: english
Chapman & Hall/CRC Mathematical and Computational Biology Series

Pattern Discovery
in Bioinformatics
Theory & Algorithms

© 2008 by Taylor & Francis Group, LLC

CHAPMAN & HALL/CRC
Mathematical and Computational Biology Series

Aims and scope:


This series aims to capture new developments and summarize what is known over the whole
spectrum of mathematical and computational biology and medicine. It seeks to encourage the
integration of mathematical, statistical and computational methods into biology by publishing
a broad range of textbooks, reference works and handbooks. The titles included in the series are
meant to appeal to students, researchers and professionals in the mathematical, statistical and
computational sciences, fundamental biology and bioengineering, as well as interdisciplinary
researchers involved in the field. The inclusion of concrete examples and applications, and
programming techniques and examples, is highly encouraged.

Series Editors
Alison M. Etheridge
Department of Statistics
University of Oxford

Louis J. Gross
Department of Ecology and Evolutionary Biology
University of Tennessee

Suzanne Lenhart
Department of Mathematics
University of Tennessee

Philip K. Maini
Mathematical Institute
University of Oxford

Shoba Ranganathan
Research Institute of Biotechnology
Macquarie University

Hershel M. Safer
Weizmann Institute of Science
Bioinformatics & Bio Computing

Eberhard O. Voit
The Wallace H. Coulter Department of Biomedical Engineering
Georgia Tech and Emory University

Proposals for the series should be submitted to one of the series editors above or directly to:
CRC Press, Taylor & Francis Group
24-25 Blades Court
Deodar Road
London SW15 2NU
UK

Published Titles

Cancer Modelling and Simulation
Luigi Preziosi
Computational Biology: A Statistical Mechanics Perspective
Ralf Blossey
Computational Neuroscience: A Comprehensive Approach
Jianfeng Feng
Data Analysis Tools for DNA Microarrays
Sorin Draghici
Differential Equations and Mathematical Biology
D.S. Jones and B.D. Sleeman
Exactly Solvable Models of Biological Invasion
Sergei V. Petrovskii and Bai-Lian Li
Introduction to Bioinformatics
Anna Tramontano
An Introduction to Systems Biology: Design Principles of Biological Circuits
Uri Alon
Knowledge Discovery in Proteomics
Igor Jurisica and Dennis Wigle
Modeling and Simulation of Capsules and Biological Cells
C. Pozrikidis
Niche Modeling: Predictions from Statistical Distributions
David Stockwell
Normal Mode Analysis: Theory and Applications to Biological and Chemical Systems
Qiang Cui and Ivet Bahar
Pattern Discovery in Bioinformatics: Theory & Algorithms
Laxmi Parida
Stochastic Modelling for Systems Biology
Darren J. Wilkinson
The Ten Most Wanted Solutions in Protein Bioinformatics
Anna Tramontano

Chapman & Hall/CRC Mathematical and Computational Biology Series

Pattern Discovery
in Bioinformatics
Theory & Algorithms

Laxmi Parida

Boca Raton London New York

Chapman & Hall/CRC is an imprint of the
Taylor & Francis Group, an informa business

Chapman & Hall/CRC
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487‑2742
© 2008 by Taylor & Francis Group, LLC
Chapman & Hall/CRC is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works


Printed in the United States of America on acid‑free paper
10 9 8 7 6 5 4 3 2 1

International Standard Book Number‑13: 978‑1‑58488‑549‑8 (Hardcover)

This book contains information obtained from authentic and highly regarded sources. Reprinted
material is quoted with permission, and sources are indicated. A wide variety of references are
listed. Reasonable efforts have been made to publish reliable data and information, but the author
and the publisher cannot assume responsibility for the validity of all materials or for the
consequences of their use.

No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any
electronic, mechanical, or other means, now known or hereafter invented, including photocopying,
microfilming, and recording, or in any information storage or retrieval system, without written
permission from the publishers.

For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC),
222 Rosewood Drive, Danvers, MA 01923, 978‑750‑8400. CCC is a not‑for‑profit organization that
provides licenses and registration for a variety of users. For organizations that have been granted a
photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging‑in‑Publication Data

Parida, Laxmi.
Pattern discovery in bioinformatics / Laxmi Parida.
p. ; cm. ‑‑ (Chapman & Hall/CRC mathematical and computational biology
series)
Includes bibliographical references and index.
ISBN‑13: 978‑1‑58488‑549‑8 (alk. paper)
ISBN‑10: 1‑58488‑549‑1 (alk. paper)
1. Bioinformatics. 2. Pattern recognition systems. I. Title. II. Series: Chapman
and Hall/CRC mathematical & computational biology series.
[DNLM: 1. Computational Biology‑‑methods. 2. Pattern Recognition,
Automated. QU 26.5 P231p 2008]

QH324.2.P373 2008
572.80285‑‑dc22 2007014582

Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com

Dedicated to Ma and Bapa

Contents

1 Introduction
1.1 Ubiquity of Patterns
1.2 Motivations from Biology
1.3 The Need for Rigor
1.4 Who is a Reader of this Book?
1.4.1 About this book

I The Fundamentals

2 Basic Algorithmics
2.1 Introduction
2.2 Graphs
2.3 Tree Problem 1: Minimum Spanning Tree
2.3.1 Prim’s algorithm
2.4 Tree Problem 2: Steiner Tree
2.5 Tree Problem 3: Minimum Mutation Labeling
2.5.1 Fitch’s algorithm
2.6 Storing & Retrieving Elements
2.7 Asymptotic Functions
2.8 Recurrence Equations
2.8.1 Counting binary trees
2.8.2 Enumerating unrooted trees (Prüfer sequence)
2.9 NP-Complete Class of Problems
2.10 Exercises

3 Basic Statistics
3.1 Introduction
3.2 Basic Probability
3.2.1 Probability space foundations
3.2.2 Multiple events (Bayes’ theorem)
3.2.3 Inclusion-exclusion principle
3.2.4 Discrete probability space
3.2.5 Algebra of random variables
3.2.6 Expectations
3.2.7 Discrete probability distribution (binomial, Poisson)
3.2.8 Continuous probability distribution (normal)
3.2.9 Continuous probability space (Ω is ℝ)
3.3 The Bare Truth about Inferential Statistics
3.3.1 Probability distribution invariants
3.3.2 Samples & summary statistics
3.3.3 The central limit theorem
3.3.4 Statistical significance (p-value)
3.4 Summary
3.5 Exercises

4 What Are Patterns?
4.1 Introduction
4.2 Common Thread
4.3 Pattern Duality
4.3.1 Operators on p
4.4 Irredundant Patterns
4.4.1 Special case: maximality
4.4.2 Transitivity of redundancy
4.4.3 Uniqueness property
4.4.4 Case studies
4.5 Constrained Patterns
4.6 When is a Pattern Specification Nontrivial?
4.7 Classes of Patterns
4.8 Exercises

II Patterns on Linear Strings

5 Modeling the Stream of Life
5.1 Introduction
5.2 Modeling a Biopolymer
5.2.1 Repeats in DNA
5.2.2 Directionality of biopolymers
5.2.3 Modeling a random permutation
5.2.4 Modeling a random string
5.3 Bernoulli Scheme
5.4 Markov Chain
5.4.1 Stationary distribution
5.4.2 Computing probabilities
5.5 Hidden Markov Model (HMM)
5.5.1 The decoding problem (Viterbi algorithm)
5.6 Comparison of the Schemes
5.7 Conclusion
5.8 Exercises

6 String Pattern Specifications
6.1 Introduction
6.2 Notation
6.3 Solid Patterns
6.3.1 Maximality
6.3.2 Counting the maximal patterns
6.4 Rigid Patterns
6.4.1 Maximal rigid patterns
6.4.2 Enumerating maximal rigid patterns
6.4.3 Density-constrained patterns
6.4.4 Quorum-constrained patterns
6.4.5 Large-|Σ| input
6.4.6 Irredundant patterns
6.5 Extensible Patterns
6.5.1 Maximal extensible patterns
6.6 Generalizations
6.6.1 Homologous sets
6.6.2 Sequence on reals
6.7 Exercises

7 Algorithms & Pattern Statistics
7.1 Introduction
7.2 Discovery Algorithm
7.3 Pattern Statistics
7.4 Rigid Patterns
7.5 Extensible Patterns
7.5.1 Nondegenerate extensible patterns
7.5.2 Degenerate extensible patterns
7.5.3 Correction factor for the dot character
7.6 Measure of Surprise
7.6.1 z-score
7.6.2 χ-square ratio
7.6.3 Interplay of combinatorics & statistics
7.7 Applications
7.8 Exercises

8 Motif Learning
8.1 Introduction: Local Multiple Alignment
8.2 Probabilistic Model: Motif Profile
8.3 The Learning Problem
8.4 Importance Measure
8.4.1 Statistical significance
8.4.2 Information content
8.5 Algorithms to Learn a Motif Profile
8.6 An Expectation Maximization Framework
8.6.1 The initial estimate ρ₀
8.6.2 Estimating z given ρ
8.6.3 Estimating ρ given z
8.7 A Gibbs Sampling Strategy
8.7.1 Estimating ρ given an alignment
8.7.2 Estimating background probabilities given Z
8.7.3 Estimating Z given ρ
8.8 Interpreting the Motif Profile in Terms of p
8.9 Exercises

9 The Subtle Motif
9.1 Introduction: Consensus Motif
9.2 Combinatorial Model: Subtle Motif
9.3 Distance between Motifs
9.4 Statistics of Subtle Motifs
9.5 Performance Score
9.6 Enumeration Schemes
9.6.1 Neighbor enumeration (exact)
9.6.2 Submotif enumeration (inexact)
9.7 A Combinatorial Algorithm
9.8 A Probabilistic Algorithm
9.9 A Modular Solution
9.10 Conclusion
9.11 Exercises

III Patterns on Meta-Data

10 Permutation Patterns
10.1 Introduction
10.1.1 Notation
10.2 How Many Permutation Patterns?
10.3 Maximality
10.3.1 P = 1: Linear notation & PQ trees
10.3.2 P > 1: Linear notation?
10.4 Parikh Mapping-based Algorithm
10.4.1 Tagging technique
10.4.2 Time complexity analysis
10.5 Intervals
10.5.1 The naive algorithm
10.5.2 The Uno-Yagiura RC algorithm
10.6 Intervals to PQ Trees
10.6.1 Irreducible intervals
10.6.2 Encoding intervals as a PQ tree
10.7 Applications
10.7.1 Case study I: Human and rat
10.7.2 Case study II: E. coli K-12 and B. subtilis
10.8 Conclusion
10.9 Exercises

11 Permutation Pattern Probabilities
11.1 Introduction
11.2 Unstructured Permutations
11.2.1 Multinomial coefficients
11.2.2 Patterns with multiplicities
11.3 Structured Permutations
11.3.1 P-arrangement
11.3.2 An incremental method
11.3.3 An upper bound on P-arrangements∗∗
11.3.4 A lower bound on P-arrangements
11.3.5 Estimating the number of frontiers
11.3.6 Combinatorics to probabilities
11.4 Exercises

12 Topological Motifs
12.1 Introduction
12.1.1 Graph notation
12.2 What Are Topological Motifs?
12.2.1 Combinatorics in topologies
12.2.2 Input with self-isomorphisms
12.3 The Topological Motif
12.3.1 Maximality
12.4 Compact Topological Motifs
12.4.1 Occurrence-isomorphisms
12.4.2 Vertex indistinguishability
12.4.3 Compact list
12.4.4 Compact vertex, edge & motif
12.4.5 Maximal compact lists
12.4.6 Conjugates of compact lists
12.4.7 Characteristics of compact lists
12.4.8 Maximal operations on compact lists
12.4.9 Maximal subsets of location lists
12.4.10 Binary relations on compact lists
12.4.11 Compact motifs from compact lists
12.5 The Discovery Method
12.5.1 The algorithm
12.6 Related Classical Problems
12.7 Applications
12.8 Conclusion
12.9 Exercises

13 Set-Theoretic Algorithmic Tools
13.1 Introduction
13.2 Some Basic Properties of Finite Sets
13.3 Partial Order Graph G(S, E) of Sets
13.3.1 Reduced partial order graph
13.3.2 Straddle graph
13.4 Boolean Closure of Sets
13.4.1 Intersection closure
13.4.2 Union closure
13.5 Consecutive (Linear) Arrangement of Set Members
13.5.1 PQ trees
13.5.2 Straddling sets
13.6 Maximal Set Intersection Problem (maxSIP)
13.6.1 Ordered enumeration trie
13.6.2 Depth first traversal of the trie
13.7 Minimal Set Intersection Problem (minSIP)
13.7.1 Algorithm
13.7.2 Minimal from maximal sets
13.8 Multi-Sets
13.8.1 Ordered enumeration trie of multi-sets
13.8.2 Enumeration algorithm
13.9 Adapting the Enumeration Scheme
13.10 Exercises

14 Expression & Partial Order Motifs
14.1 Introduction
14.1.1 Motivation
14.2 Extracting (Monotone CNF) Boolean Expressions
14.2.1 Extracting biclusters
14.2.2 Extracting patterns in microarrays
14.3 Extracting Partial Orders
14.3.1 Partial orders
14.3.2 Partial order construction problem
14.3.3 Excess in partial orders
14.4 Statistics of Partial Orders
14.4.1 Computing Cex(B)
14.5 Redescriptions
14.6 Application: Partial Order of Expressions
14.7 Summary
14.8 Exercises

References

Acknowledgments

I owe the completion of this book to the patience and understanding of Tuhina
at home and of friends and colleagues outside of home. I am particularly
grateful for Tuhina’s subtle, quiet cheerleading, without which this effort might
have seemed like a thankless chore.
Behind every woman is an army of men. My sincere thanks to Alberto Apostolico,
Saugata Basu, Jaume Bertranpetit, Andrea Califano, Matteo Comin,
David Gilbert, Danny Hermelin, Enam Karim, Gadi Landau, Naren Ramakrishnan,
Ajay Royyuru, David Sankoff, Frank Suits, Maciej Trybilo, Steve
Oshry, Samaresh Parida, Rohit Parikh, Mike Waterman and Oren Weimann
for their, sometimes unwitting, complicity in this endeavor.

Chapter 1
Introduction

Le hasard favorise l’esprit préparé.¹


- attributed to Louis Pasteur

1.1 Ubiquity of Patterns


Many major scientific discoveries appear to have been made by accident; however,
a closer look often reveals that the scientist was intrigued by a specific pattern
in the observations, and diligent pursuit of that pattern then led to an important
discovery. A classic example is that of Edward Jenner, a doctor from
Gloucestershire, England. His primary observation was that milkmaids were immune
to smallpox even though other members of their families were infected with the
disease. The milkmaids were routinely exposed to cowpox, and Jenner’s subsequent
successful experiment, inducing immunity to smallpox in a little boy by first
infecting him with cowpox, led to the world’s first smallpox vaccination. A sharp
observation in 1796 ultimately led to the eradication of smallpox from this planet
in the late 1970s.
A more recent story (1997) that has caught the attention of scientists and
media alike is that of a group of Nairobi women who appeared to be immune to
HIV, the virus that causes AIDS. While researchers pondered the possibility that
these women had acquired immunity from the environment (as the milkmaids had from
cows in the case of smallpox), a chance conversation between the attending
physician, Dr. Joshua Kimani, and the immune patients revealed that about half of
them were close relatives. This sent the doctors scrambling to look for genetic
similarities, which led to the discovery of ‘killer’ T-cells in the immune systems
of these women. This finding has set researchers on the path of exploring a
vaccine for AIDS.
Stories abound in scientific history suggesting that such chance observations are
key starting points in the process of major discoveries.

1 Chance favors the prepared mind.


1.2 Motivations from Biology


The biology community is inundated with data, such as the genome sequences of
organisms, microarray data, and interaction data such as gene-protein and
protein-protein interactions. This volume is rapidly increasing, and the process
of understanding the data is lagging behind the process of acquiring it. The sheer
enormity of the data calls for a systematic approach to understanding it using
quantitative computational methods. An inevitable first step towards making sense
of the data is to study its regularities, on the hypothesis that they reveal vital
information about the underlying biology that produced the data.
In this compilation we explore various modes of regularity in the data: string
patterns, patterned clusters, permutation patterns, topological patterns, partial
order patterns, Boolean expression patterns and so on. Each class captures a
different form of regularity in the data, enabling us to provide possible answers
to a wide range of questions.

1.3 The Need for Rigor


Unmistakably, the nature of the subject of biology has changed in recent
decades: the transition has been from ‘what’ to ‘how’.
Just as a computer scientist, a mathematician or a physicist needs a fair
understanding of biology to pose meaningful questions or provide useful answers,
so does a biologist need an understanding of computational methods to employ
them correctly and provide correct answers to difficult questions.
While search engines make it very easy to access exotic as well as
simple-minded systems, it is unclear that this is always a step forward. The
burden is on the user to understand how the methods work, what problems they
solve and how correct the offered answers are. This book aims to clear away some
of the mist that may accompany such encounters.
One of the features of the treatment in this book is that in each case we
attack the problem in a model-less manner. Why is this useful? As application
domains change, the underlying models change. Often our existing understanding
of a domain is so inadequate that coming up with an appropriate model is
difficult and sometimes even misleading, so much so that there is little
consensus among researchers about the correct model. This has prompted many to
resort to a model-less approach. When a suitable model is available, it can
often be used to refine the existing system or to pre- or post-process the data
appropriately. The model-less approach is not to be misconstrued as neglect of
the domain specifications, but as a tacit acknowledgment that each domain
deserves much closer attention and more elaborate treatment than the scope of
this book allows. This approach also compels us to take a hard look at the
problem domain, often giving rise to elegant, clean definitions with a sound
mathematical foundation, as well as to efficient algorithms.

1.4 Who is a Reader of this Book?


This book is intended for biologists as well as for computer scientists and
mathematicians. In fact, it is aimed at anyone who wants to understand the
workings and implications of pattern discovery. More precisely, to appreciate
the contents of this book, it is sufficient to be familiar with the prerequisites
of a regular bioinformatics course.
Some readers are turned off by terms such as ‘theorem’ or ‘lemma’;
however, the book does make use of them. Let me spend a few words
justifying these terms and, at the same time, encouraging the community to
embrace this vocabulary. Loosely speaking, a theorem is a statement accepted
or proposed as a demonstrable truth. Once proven, the validity of the
statement is unquestioned. It provides a means for organizing one’s thoughts
in a reusable manner. This mechanism that the mathematical sciences have to
offer is so compelling that it would be a mistake not to assimilate it in this
collection of logical thoughts.
Clearly, it is easier to have a theorem in mathematics than in the physical
sciences. Most of the theorems in this book are simply to be viewed as concise
factual statements, and the proof is merely a justification for the claims.
Lemmas, though traditionally used as supporting statements for theorems, are
used here for simpler claims. All the proofs in this book require no more than
the logical thinking of a college freshman, albeit a motivated one.
The proofs of the theorems and lemmas are given for the curious and
suspicious reader; no continuity of thought is lost by skipping them.
If I could replace a theorem with an example, I did. If I could replace an
exposition with an exercise, I did. An illustrative example is worth a thousand
words and an instructive exercise is worth a thousand paragraphs. I have made
heavy use of these two tools throughout the book in an attempt to convey the
underlying ideas.
The body of the chapter and the exercise problems accompanying it have
a spousal relationship: one is incomplete without the other. These problems
are not meant to test the reader but to provide supplemental points for thought.
Each exercise is designed carefully and simply requires ‘connecting the dots’
on the part of the reader, while the body of the chapter explains the ‘dots’.
The challenging exercises (and sections) are marked with ∗∗.

1.4.1 About this book


Consider the scenario of getting a sculptor and a painter together to
produce a work of art. Usually, the sculptor will not paint and the painter
will not sculpt; however, the results (if any) of their synergy could be
incredibly spectacular!
Interdisciplinary areas such as bioinformatics must deal with such issues,
where each expert uses a different language. Establishing a common
vocabulary across disciplines is ideal but has not been very realistic. Even
subdisciplines such as ‘systems’, ‘computational logic’, ‘artificial
intelligence’ and so on, within the umbrella discipline of computer science,
are known to have developed their very own dialects. Many seasoned
researchers may also have witnessed in their lifetimes the rediscovery of the
same theorems in different contexts or disciplines.
Sometimes, the problem is compounded by the fact that the sculptor dab-
bles in paint and the painter in clay. I believe I am cognizant of the agony
and the ecstasy of cross-disciplinary areas. Yet I write these chapters.

Roadmap of the book. The book is organized into three parts. Part I
provides the prerequisites for the remainder of the book. Chapters 2 and
3 are designed as journeys, taken with a hypothetical but skeptical biologist,
through the corridors of algorithms and statistics. We follow a storyline,
and the ideas from these areas are presented on a need-to-know basis for a
leery reader.
Chapter 4 discusses the connotation of patterns used in this book. The
nuance of repetitiveness that is associated with patterns in this book is not
universal and we reorient the reader to this view through this chapter.
Part II of the book focuses on patterns on linear (string) data. Chap-
ter 5 discusses possible statistical models for such data. Patterns on strings
are conceptually the simplest and the ramifications of these are discussed in
Chapters 6 and 7.
String patterns are simple, yet complex! This is explored in the follow-
ing two chapters. Chapter 8 discusses different (probabilistic) motif learning
methods where the pattern or motif is a consequence of local multiple align-
ment. Chapter 9 focuses on methods, primarily combinatorial, where the
pattern or motif is viewed as the (inexact) consensus of multiple occurrences.
Part III of the book deals with patterns that have more sophisticated spec-
ifications. The complexity is in the characterizations but not necessarily in
the implications. A string pattern on DNA may have as strong a repercussion
as any other.
Permutation patterns are mathematically elegant, algorithmically
interesting, statistically challenging and biologically meaningful as well!
Such a well-rounded topic is always a delight and is discussed in Chapters 10
and 11.
Topological or network motifs are common structures in graph data. This
area of motifs is relatively new, and I have met more researchers who are
skeptical about its import than about that of any other. In Chapter 12, I give
a slightly unusual treatment of this problem, in the sense that it is not based
on graph traversals, as is usually the case.
Chapter 13 is an attempt at identifying some common tools that could be
utilized in most areas of pattern discovery. These include mainly structures
and operations on finite sets.
The book concludes with a discussion of even more exotic pattern character-
izations in the form of boolean expressions and partial orders in Chapter 14.

Part I

The Fundamentals

28

London. Published by E. Donovan, & Mess.rs Simpkin & Marshall, Jan. 1, 1823.
CONCHOLOGY.

PLATE XXVIII.

MALLEUS MACULATUS
SPOTTED HAMMER SHELL, OR HOUND’S TONGUE.

* Bivalve.

GENERIC CHARACTER.
Shell subequivalve, rough, deformed, generally lengthened and
lobed or hammer-shaped: beaks small and divergent. Hinge without
teeth, a lengthened conic hollow situated under the beaks and
traversing obliquely the facet of the ligament. A lateral slope or
groove at the side of the ligament for the passage of the byssus or
beard with which the animal is furnished.

SPECIFIC CHARACTER.
Shell curved, with a single somewhat straight abbreviated lobe at
the base: reddish yellow, clouded, spotted and dotted with fuscous.
Malleus Maculatus: testa arcuata, lobo basis unico sub-recta
abbreviato flavo-rufescente fusco nebulosa maculata
punctisque.
The singular object now before us, a shell no less remarkable for
the peculiarity of its form than rarity of occurrence, is one of the
most choice productions of the seas surrounding the Friendly Isles.
The discovery of this shell, like that of many others, resulted from
the assiduities of that eminent Naturalist and promoter of scientific
knowledge, the late Sir Joseph Banks, and of Dr. Solander, who
accompanied him in that memorable voyage of Captain Cook to the
Southern Hemisphere, in which the Friendly Isles were discovered.
The fine example of this shell, in particular, from which the drawing
in our plate is taken, it may be also added, was one of those which
were brought to this country by Captain Cook upon the return of the
expedition, and which being shortly after presented to Sir Ashton
Lever, remained in the Museum of that distinguished amateur from
that period to the time of its dissolution in the year 1806.
When we consider the very remote situation of those islands, so
distant from the usual track of all navigators, we cannot be
surprised, admitting the species to be local in those seas, to find it
has remained a very rare shell from the period of its discovery to the
present time. In the course of many years only a few specimens
have occurred to our observation, and while it has remained scarce
with us, it appears to have been still more uncommon in the
continental cabinets: very few of which, if we are informed correctly,
were lately in possession of it.
The first difficulty that arises in the mind of the naturalist upon the
inspection of this shell results from the ambiguity of its generical
peculiarities: we pause to consider where it should be placed.
Linnæus, to whom, as it will be observed, the present shell was
totally unknown, arranged the Hammer Shell, its nearest
approximation, among the Ostreæ. The Hammer Shell, or as it is
more usually denominated the Hammer Oyster Shell, had been
discovered before the time of Linnæus; it had appeared in the works
of Rumpfius, Seba, Gualtieri and Argenville, and the shell had been
examined and described by him in the Museum of Ulrica, Queen of
Sweden, under the name of Ostrea Malleus. That the hinge accords
in some degree with that of the Ostreæ generally must be admitted,
at the same time that it possesses other characters less easily
reconciled to that genus, unless we embrace the Linnæan genus in
all its latitude, and to this the conchologist of the present day cannot
accede, at least without some little difficulties.
The conformation of this shell is very striking, and yet we perceive
that its essential characteristics are less definitive than could be
wished; there are several approximations in the general figure to be
found among shells which nevertheless possess characters
generically distinct. For many years this shell was known in this
country under the name of “Margaritifera maculata,” and the trivial
English appellation of the “Spotted Hound’s Tongue:” it appeared
under those names in the Conchological Museum of M. de Calonne,
while it remained in England, and in the catalogue of that museum,
which is still extant, it will be found under those names. The epithet
of Hound’s Tongue is not inaptly applied to this shell, in allusion to
the elongated form. The term Margaritifera does not refer to the
form, but to the pearly gloss that appears upon the surface of the
dark blue space lying within the shell, immediately below the hinge,
and extending from thence about one fourth part of its whole length.
This is the region in which the animal is attached by its ligament to
the valves of the shell; besides which, a gloss of pearly hue is
observed to pervade the whole of the inner surface, only that it is
most conspicuous in the darker disk of the shell. As a secondary
character this pearliness is very remarkable in the shell before us, at
the same time that as a generical denomination the term
Margaritifera assigned to it from this circumstance alone is liable to
objection; because, the same pearliness prevails in many shells
which have no relation whatever with the present, either in the form
or structure of the hinge, and it is to these we must resort for its
true essential character.
Lamarck constitutes many genera of the shells included in the
Ostrea genus of Linnæus. His Malléacées comprehend five genera,
Crenatula, Perna, Malleus, Avicula, and Meleagrina, all which are
allied more or less remotely to the shell before us. To that particular
family which is known among collectors by the designation of
Hammer Oysters, he gives the name of Malleus, in the French
Marteau, both alike implying the hammer like form of the species
Malleus, which Lamarck assumes as the type of this genus. But even
there after all the renovation that has been attempted, the result is
not satisfactory, because this figure is by no means constant, even in
the few species included by its author in that genus; it contains but
six species, and these are entirely at variance with each other. Thus
for example, in Malleus Vulgaris, the common Hammer Shell, we
have a species with three lobes, a lateral one of considerable size
being advanced on each side the beaks: and another shell of the
same species with only short lateral lobes instead of large ones.
Admitting the hammer form to be still preserved in these, in the next
species, Malleus Normalis, instead of two lobes, the hammer head, if
it may be so expressed, has but a single lobe: in Malleus Anatinus
there is only one lobe, and that very small; and in Malleus
Vulsellatus, although characterised as “lobo oblique porrecto,” the
appearance of the shell implies rather the total absence of any lobe,
for the lobe, if so it may be termed, is so indefinite, that it cannot be
referred without violence to the genus Marteau, while we consider
its hammer like form as a leading character of the genus. With
exception to this inconstant character which may be qualified with
the expression “deformed and generally hammer shaped,” we have
no objection to the Malleus genus, because the byssus of the animal
by means of which it can affix itself to other bodies, and the peculiar
sinus or sulcation of the hinge through which the byssus passes from
the animal to those extraneous bodies, are sufficient to remove it
from the Ostrea genus, in which case if we still adhere to the
Linnæan method we can place it only among his Mytili or Pinnæ,
and it has certainly less affinity with either of those than with
Ostrea. Perhaps the name of Perna under which this shell has been
mentioned a few years ago might have been as well preserved, but
that name Lamarck assigns to an extensive genus of which Ostrea
Isognomum is the type, and it is therefore better to retain the name
Malleus than to alter it to another which could not fail at this time to
create confusion. The same consequence would as unquestionably
result were we to sub-divide the Malléacées into different genera
according to the configuration of the shell or number of its lateral
lobes.
The definition of Malleus in the Règne Animal of Cuvier appears to
intimate the same objection; it does not consider the hammer like
form of the shell as any criterion, it is only stated that the Marteaux
are inequivalve and irregular, that they have a simple hollow for the
ligament as in the oysters, but that they are distinguished by a slope
at the side of the ligament for the passage of the byssus.
It is assuredly true that the presence of a byssus in this tribe of
shells displaces them from any immediate analogy with the Ostrea,
where, as Cuvier remarks, “Linnæus left them.” But if, however, we
attentively examine the hinge of the common oyster, the two valves,
and the oyster as it lies within the valves, we shall perceive with this
exception a pretty near approximation. The great objection is, that
the animal of the tribe of shells now before us protrudes a byssus
from its body through a lateral opening on one side or slope of the
ligament of the hinge; if we closely inspect the valves of the oyster,
we also find a slight depression or hollow upon each side of the
cartilage of the hinge; these are small, and usually somewhat
lamellar. The oyster, moreover, as it lies in the shell, seems capable
of expanding or spreading that part of the body which lies under the
hinge laterally upon and into these depressions, a circumstance very
easily observed in the half famished oyster, because these lateral
expansions of the animal are then more visibly elongated along the
passage of these lateral grooves of the hinge, and give the pointed
end of the animal a somewhat cornuted appearance. Under the
same circumstance these processes adhere as they lie in the hollow
of these grooves, and thus suggest the idea of the animal having
exerted itself by such extension to obtain refreshment through these
lateral hollows. Those hollows are also so far pervious as to admit
the ingress of moisture while the shells are closed, in the same
manner as it is possible the Malleus genus may receive moisture
under the same circumstance through the sinus, whence the byssus
is protruded. These peculiarities considered, may perhaps afford
some further justification of Linnæus in placing the hammer shells
with the Ostreæ. It has been indeed advanced that Linnæus was not
aware of these hammer shells being furnished with a byssus, or that
he would have referred them to the Mytili, but this observation
cannot be correct, because in the figure given of these shells by
Seba, to which Linnæus refers, the byssus, which is very
conspicuous, is represented pendent or hanging to a considerable
length out of the shell.
From an attentive examination of the different Conchological
authors, it does not appear to us that the shell before us has
hitherto been figured, and we have reason also to believe that it has
never been described. These circumstances are the more probable
since, as we have before observed, the shell is at this time very little
known among the Continental Cabinets. The nearest approach, so
far as we can judge from the description, unassisted by any figure, is
the Marteau Normal (Malleus Normalis) of Lamarck, a species
defined by him as testa biloba; lobo basis unico anticali ad normam.
Our shell is certainly bilobate, for it has only one lateral lobe at the
beak, and that moreover advances from the beak, pretty nearly,
though not exactly, in a right line; but its general description does
not sufficiently accord with our shell to authorise the conclusion
that they are the same. Lamarck informs us that there are two
varieties of his Malleus Normalis, one of which is a native of the
ocean of the Great Indies, the other of the seas of New Holland. The
first, or Indian kind, he describes as being on the inside as well as
outside of a black colour, with a longish lobe at the base of the shell.[22]
The New Holland kind is described of a whitish colour, with the
lobe at the base abbreviated.[23]
The two last-mentioned shells which Lamarck concludes to be
varieties of the same species, may perhaps prove hereafter to be
species distinct from each other, as Lamarck has himself shewn to be
the case with respect to the common black and the white hammer
shells. The black supposed variety of Malleus Normalis we
apprehend to be distinct from the shell before us, but it is possible
that the New Holland shell which he describes as being whitish, with
the lobe at the base abbreviated, may be a worn or much
depauperated specimen of our present shell; it certainly does not
accord with our shell in any tolerable state of preservation.
Lamarck says nothing of any ruddiness or testaceous hues in his
New Holland variety of Normalis, and admitting these colours to
indicate that the shell had been found with its animal in a living
state, we can scarcely conceive the dark fuscous spotting which is so
conspicuous in the species could by any ordinary accident be so
entirely obliterated as appears to be the case in Lamarck’s specimen,
if his New Holland variety of Malleus Normalis be really of this
species; and it may be further added that if our present shell was
actually intended by his Malleus Normalis, the defects of his shell have
necessarily influenced his specific character and rendered it
imperfect.
We have not adverted to Malleus Anatinus of Chemnitz, because
the figure of that shell is ambiguous. There is a remote resemblance
in the lateral appendages of the beaks, but in other particulars the
resemblance is less obvious, the body is sometimes curved as in the
shell before us and sometimes straight, but the edges of the valves
are parallel, and the shell itself pellucid: the figure in Chemnitz is
less than half the size of our shell. This inhabits the seas of Timor
and the Nicobar Islands.
It should be observed in conclusion that there is a specimen of our
species among the Hammer Shells in the British Museum, the habitat
of which is indicated by the word “Amboina:” it is much smaller than
our shell. Besides this we have lately seen another example from
New Holland, of a growth still larger than the shell we have
delineated.
We have entered thus minutely into the analogies of this shell
from an apprehension we might otherwise in this instance submit as
a new species an object that had been previously described. The
result of our enquiry will tend to shew that if the species has not
remained entirely unnoticed, it has never been described with much
precision.
29

London. Published by E. Donovan & Mess.rs Simpkin & Marshall, Jan.y 1, 1823.
ENTOMOLOGY.

PLATE XXIX.

PAPILIO TROS
TROS’S BUTTERFLY.

Lepidoptera.

GENERIC CHARACTER.
Antennæ thicker towards the tip, and generally terminating in a
knob: wings erect when at rest. Fly by day.
* Equites Trojani.

SPECIFIC CHARACTER
AND
SYNONYMS.
Wings indented, tailed, above and beneath black; on the anterior
wings an abbreviated white band: posterior ones with sanguineous
spots.
Papilio Tros: alis dentato caudatis concoloribus nigris: anticis fascia
abbreviata alba, posticis sanguinea maculari. Fabr. Ent. Syst.
T. 3. p. 1. 10. 30.
Jon. fig. pict. 1. tab. 23.

The tribe of Butterflies to which the Papilio now before us
appertains, includes many of the larger and more interesting species
of the Papiliones known. This tribe, as its designation implies, has
been dedicated by Entomologists to the memory of the more
distinguished worthies of the Trojan race, and above others to
preserve the memory of those heroes whose exploits in the defence
of that rich and potent station of the ancient world, the town of Troy,
have been commemorated in the Iliad by the immortal Homer. Our
present species refers indeed to a Trojan of an earlier period; it is
named after Tros, the founder of the Trojan name. Tros was the fifth
king of the Trojan dynasty, from its first establishment in the person
of Scamander, and the last but three; the destruction of Troy being
accomplished under the reign of Priam. The country before the time
of Tros was called Dardania, from Dardanus, who is usually stiled the
first of the Trojan kings, though in Phrygia he was preceded by
Scamander and Teucer. Tros lived about fourteen hundred years
before the Christian Era, and reigned king of Troy for the space of
sixty years. It is in honour of this Trojan Monarch that Fabricius has
given the present insect the name of Papilio Tros.
There are several Papiliones which bear a nearer or more distant
resemblance to this Papilio, a circumstance that will impose some
caution upon the Entomologist before he can venture to pronounce
upon the species with decision: its characters are nevertheless
sufficiently conspicuous, and when examined with due attention,
enable us to determine the species from its nearest approximations,
in a clear and satisfactory manner. The wings are dark above as well
as beneath, the deeper colouring prevailing, however, on the upper
surface; the anterior wings are marked with a
broad abbreviated whitish band, and the lower wings with a large
sanguineous or blood red spot of considerable magnitude. This
sanguineous spot from lying in the disk of the wing is traversed and
divided by the black nerves of the wing in such a manner as to
appear in the form of six distinct oblong spots, placed laterally to
each other: these spots appear also on the lower surface, in the
same form as above, but the colour is rather paler.
As there is no figure extant of this large and fine Papilio in the
work of any author, the delineation which we have the pleasure on
this occasion to submit before our readers will doubtlessly be viewed
with peculiar satisfaction. It need be only added that the species has
been definitively determined upon the authority of Mr. Jones’s
collection of original drawings, to which Fabricius so constantly
refers, and that for this reason its specific appellation may be
implicitly relied upon by the scientific Entomologist.
This interesting Papilio is a native of Brazil.
30

London. Published by E. Donovan, & Mess.rs Simpkin & Marshall. Jan.y 1, 1823.
ORNITHOLOGY.

PLATE XXX.

PSITTACUS MELANOPTERUS
BLACK WINGED PARRAKEET.

Order Picæ.

GENERIC CHARACTER.
Bill falcated; upper mandible moveable and in general covered
with a cere: nostrils rounded, placed in the base of the bill: tongue
fleshy, obtuse, entire: feet formed for climbing.

SPECIFIC CHARACTER
AND
SYNONYMS.
Pale green, back and wings black: secondary wing feathers yellow,
at the tip blue: tail purple with a black band.
Psittacus Melanopterus: pallide viridis, dorso alisque nigris, remigibus
secundariis luteis apice cæruleis, rectricibus purpureis fascia
nigra.—Lath. Ind. Orn. T. 1. p. 132. n. 152.
Psittacus Melanopterus: pallide viridis, dorso, tectricibus alarum,
caudæ fascia remigibusque primariis nigris, secundariis
flavescentibus cæruleo punctatis.—Gmel. Linn. Syst. Nat. T.
1. p. 350. n. 132.
Perruche aux ailes variées.—Buff. Hist. Nat. des Ois. 6. p. 172.
Petite perruche de Batavia.—Buff. Pl. enlum. n. 791. f. 1.
Petite perruche de l’isle de Luçon.—Sonner. it. p. 78. t. 41.
Black Winged Parrakeet.—Brown Illus. t. 3.

There are few beings of the feathered race more peculiarly
distinguished for the splendid gaiety and rich variety of colours with
which their plumage is adorned than the parrot race; for however
they may differ in size from the magnitude of a kite or hawk, to that
of the comparatively diminutive thrush or sparrow, they are almost
uniformly beautiful in this particular, and exhibit a diversity that is
scarcely found in any other tribe. The species we have selected for
our present representation is one of the smaller kinds of the family
distinguished by the name of Parrakeets. Its total length is about six
inches, its form robust or bulky in proportion.
The bill and legs of this bird are usually described as being dusky;
in our specimen the bill is rather pale, tinged with brown and
greenish, and the legs inclining to flesh colour. The general colour of
the head and neck is green, and the same colour prevails on the
breast, belly, and thighs. Upon the crown of the head the green
assumes a blueish tint, and on the neck appears enlivened with
yellowish, the disk of a number of the feathers being of a yellow
colour, with the edges brown, so as to present a kind of scolloped
appearance. The back and wing coverts are deep black, with a
somewhat velvet aspect; the greater quill feathers black. But one of
the characters by which it is distinguished chiefly is the remarkable
band of yellow, and its contiguous parallel band of blue by which the
wings are traversed. This conspicuous band is formed by the
secondary quill feathers, which being of a fine yellow, with the ends
a lively blue, appear like two distinct bands, and from their gaiety of
colouring are admirably relieved by the deep sable hues of the wings
and back. In the bird before us the black colour of the back extends
nearly to the tail, the ends of the tail coverts only being green. The
most singular contrast in the appearance of its plumage arises from
the very different colour of the tail: this is of a pale carnation,
glossed or changeable to a delicate violet. The tail, with the
exception of the two middle feathers, is traversed near the tip with a
single broad band of black; the two middle feathers are of the same
pale carnation colour as the rest, but rather more inclined to blueish.
The black winged Parrot is described as a native of Batavia and
Luzonia. Our specimen we are assured is from the Brazils. We have
also very lately had an opportunity of consulting an extensive series
of drawings, representing the principal Natural productions of
Surinam, made by an Englishman resident upon the spot, for his
own amusement, and among those drawings have met with one of
the black winged Parrakeet. Upon this authority we have no
hesitation in pronouncing it to be a native of Surinam; and indeed it
seems to be so well known in that part of the world that it is
distinguished among the inhabitants by a peculiar name, it is called
by them Ajàlàlero.
31

London. Published by E. Donovan & Mess.rs Simpkin & Marshall, Feb. 1, 1823.
ENTOMOLOGY.

PLATE XXXI.

PAPILIO HIPPODAMIA
HIPPODAMIA’S BUTTERFLY.

Lepidoptera.

GENERIC CHARACTER.
Antennæ thicker towards the tip, and generally terminating in a
knob: wings erect when at rest. Fly by day.
**** P. Heliconii.

SPECIFIC CHARACTER
AND
SYNONYMS.
Wings oblong and entire; anterior pair black, with three hyaline
bands: lower ones hyaline.
Papilio Hippodamia: alis oblongis integerrimis: anticis nigris: fasciis
tribus hyalinis, posticis hyalinis. Fabr. Ent. Syst. T. 3. p. 1.
165. 509.
Jon. pict. n. 149.

The Papiliones of the Heliconii tribe are named by Linnæus after
the nymphs of the fabulous and mythological history of the ancient
classics; an example that has been followed by Fabricius, and
subsequently by other writers. Thus the present interesting insect is
dedicated to commemorate among the votaries of science, the name
of Hippodamia, a nymph feigned by the poets to be the daughter of
Oenomaüs, and who, according to the legends of classic lore, besides
being much celebrated for her beauty, was distinguished for her
swiftness in the race; and at length bestowed her fair hand in
marriage upon Pelops, because in speed he excelled her.
This insect, which is of a moderate size, is of a light and elegant
structure. The wings are black, but the transparent spots occupy so
much space that the sable colouring does not appear predominant;
it is less prevalent in the posterior than the anterior wings, and yet
less upon the under surface than the upper. The form and
disposition of these transparent spots with which the dark colour of
this fly is variegated, are altogether characteristic, and deserve
particular attention, because there are other insects of the same
tribe which pretty nearly resemble it. From the middle of the anterior
wing extends a transparent spot of a very elongated heart shaped
form, having the point directed to the thorax, and a bar of black
crossing it at the broader end, so as to give it the appearance of two
distinct spots; and beyond this is another hyaline spot about the
same size as the larger one of the two transparent spaces of which
the first-mentioned spot consists. The posterior part of the wing is
further marked with two bands of the same transparent texture as
the others, each consisting of three distinct spots. The lower wings
present a larger transparent space than the upper wings, the whole
disk being hyaline with only the posterior limb or border opake, and
of a black colour. The thorax and body are black.
The hyaline spots as seen on the under side are of the same size
and form as they appear above, but the opake spaces instead of
being uniformly black as on the upper surface, are agreeably
diversified with rufous and geminous dots of white: these double
white dots are situated on the black border at the tips of the wings,
three on that of the anterior pair, and three on that of the posterior
ones.
From the very close analogy that prevails between this and several
other species of the same tribe, it would, no doubt, have been a
matter of considerable difficulty at this time to determine the
Fabrician species Papilio Hippodamia with precision, if we had not
possessed the means of reference to the Fabrician manuscripts, and
the drawings in which it is delineated; for it has remained to this
period unfigured by any author. It will be observed that Fabricius
does not refer for this species to the Collectanea of Mr. Jones, as in
many other instances. The cause of this omission will admit of a very
easy explanation; Fabricius had seen the insect in the first instance
in the cabinet of M. Mauduit, at Paris, to which he has referred. But
subsequently when in England he found a drawing of the insect in
the collection of Mr. Jones, and inscribed the name and character of
the species upon the drawing, as it afterwards appeared in his
Entomologia Systematica; and it is upon this authority that we are
enabled to speak with certainty upon a species which, but for this
circumstance, would be now involved in ambiguity. The figures in
our plate are copied from the drawings of Mr. Jones, inscribed with
the hand-writing of Fabricius.
At the time Fabricius described this species its habitat was
unknown: we have lately met with it in a collection of Brasilian
insects, and entertain no doubt of its having been brought with the
rest from that part of the globe.

London. Published by E. Donovan & Messrs. Simpkin & Marshall, Feby. 1, 1823.
CONCHOLOGY.

PLATE XXXII.

CYPRÆA AURORA
AURORA, MORNING-DAWN,
OR,
ORANGE COWRY.

* Univalve.

GENERIC CHARACTER.
Shell univalve, involute, subovate, smooth, obtuse at each end:
aperture effuse at each end, linear, extending the whole length of
the shell and denticulated each side.

SPECIFIC CHARACTER AND SYNONYMS.
Shell ovate ventricose, and somewhat globose, orange without
spots: margin white: throat orange or sometimes rosy.
Cypræa Aurora: ovato-ventricosa, subglobosa, aurantiâ immaculatâ:
margine alba, fauce aurantia vel incarnata.
Cypræa Aurantium: testa subturbinata aurantia margine alba
immaculata fauce rutila. Gmel. Linn. Syst. Nat. T. 1. p. 6.
3403. 121.
Cypræa Aurora: testa ovato-ventricosâ, turgidâ subglobosâ, aurantiâ,
immaculatâ; lateribus albis; fauce aurantiâ. Lamarck T. 7.
382. 14.

Every Conchologist is aware of the existence of this superb shell:
its magnitude is considerable, and its colour too conspicuously
distinct from that of all other species of its genus to be passed over
without immediate observation.
The Cypræa generally are a tribe of shells peculiarly striking: the
most common species possess an elegance of fervid colouring and
politure that never fail to recommend them to attention. But a few
years only have passed away, since the mantle decorations of the
fire place in the apartments of fashion, besides images and jars of
china porcellain, consisted of shells, among which the various kinds
of Cowries were not esteemed the least ornamental. And they are
sometimes still seen in such situations; while the grotesque statuary,
the josses, and the dragons, of China and Japan, in conformity with
a better taste, have wholly disappeared.
The shells of the Cypræa genus which are most familiar to the
generality of observers are the spotted Cowries, and some others of
usual occurrence. There are others which from their rarity are less
extensively known, and among the number we may truly rank the
species which we have now before us, the Orange Cowry, or as it is
sometimes called, the “Morning Dawn.” The beauty of this shell, as
well as scarcity, has established its celebrity; the species is well
known, but few collections, excepting those of the more costly kind,
possess the shell. Its magnitude is considerable, for its size is
nothing inferior to that of the Spotted Cowry, which ranks in this
respect the chief species of its family, while the distinction of its
colour from that of all other shells of the Cypræa tribe at once
attracts particular attention.
The colour of the back in this species is of a very fine orange,
simple, and unadorned with any marks or spots whatever. The tint of
orange varies in different shells from pale to darker, but whatever
may be its deviations in this respect, the tint of colour is constantly
deepest upon the back, and the transition as constantly becoming
gradually paler or more diluted as the colour descends upon the
sides towards the margin. This margin is rounded, projecting, and of
a pure white, except at the throat, as it is termed, where a tint of
red or reddish prevails to a small extent. The under surface of the
shell is white, except at the sides where the orange colour of the
back descends, spreads, and fades away into the white. The
aperture of the shell is a longitudinal opening down the middle as
usual in the other kinds of Cowry; the surrounding region of the
shell is a pure white, but the edges of the opening, both which are
beset with numerous linear teeth, are of a fine orange.
For the discovery of this extremely beautiful shell, like many other
acquisitions of importance in the cabinet of the Conchologist, we
stand indebted to the assiduities of that eminent Naturalist Sir
Joseph Banks, and those who accompanied him in the celebrated
voyage of Captain Cook round the world. They observed it among
the ornaments with which the natives of Otaheite had decorated
their dresses, which were composed of feathers, and the barks of
trees. To these garments they were attached by means of a string
passing through a hole perforated for the purpose on one side of the
shell. The natives were not so easily induced to part with these
shells as the other decorations of their clothing, appreciating them at
a much higher value. Our navigators were at first led to imagine
these shells to be inhabitants of the seas surrounding Otaheite, in
which particular they were at length undeceived by the natives who
informed them to the contrary: they said the shells were found near
an island at a great distance from Otaheite, and from the direction of
the spot toward which they pointed, it was conjectured they meant
the Fegee or Fidgi Islands, which are inhabited by the most ferocious
cannibals throughout those seas.[24] Our navigators were therefore
able only to procure such specimens as were attached to the dresses
of the natives, and these being almost constantly perforated for the
better convenience of fastening them on safely, at once explains the
reason of the Orange Cowry being so rarely met with undisfigured
by such perforation.
The mention of this circumstance, which at this distant period can
be little known, is moreover of some importance, because as the
shells were really brought from Otaheite, it has been generally
supposed to be a native of that island, and has even sometimes
been called the Otaheitan Cowry. Gmelin, who records this shell
under the name of Cypræa Aurantium, speaks of it as a native of the
Friendly Isles, “habitat ad insulas amicas,” resting his authority, we
apprehend, upon the Conchology of Martyn, and which though
published shortly after the return of Captain Cook, could not be so
well informed upon the subject as the venerable friend who assured
us it is neither a native of Otaheite, nor the Friendly Islands.
Lamarck has subsequently observed that the species inhabits the
seas of the Friendly Islands as well as those of Otaheite, and also of
New Zealand. Upon what authority the localities have been increased
to this extent is not stated. We have understood from very good
authority that researches have been made repeatedly of late years
by our navigators to discover the shells in those seas, and without
effect; and this fact appears to be confirmed from the increasing
value and importance attached to the species. We are indeed not
entirely certain that any of these shells have ever been procured,
except as before observed from among the natives of Otaheite, and
the value of the shell has progressively advanced in consequence
from four, or five, to ten pounds. A specimen in the collection of Mrs.
Angus sold about three years ago in London for twenty guineas;
thirty guineas have been in vain offered for another specimen within
the last two or three years, and a collector at this period in London
is in possession of another which it is understood cost him very
lately fifty guineas. These circumstances, if we mistake not, conspire
to prove, that the Orange Cowry is a far more local species than
might be inferred from the observation of Lamarck.
Besides the name of Otaheitan Cowry, this shell has been also
called the “Orange Cowry,” and the “Morning Dawn,” in reference to
the Latin “Cypræa Aurantium,” and “Aurora,” by both of which it had
been at different times distinguished. That of Aurantium alludes only
to the prevailing orange colour of the shell, and has been given to it
by Gmelin after Martyn. There is something more poetically elegant,
and perhaps no less appropriate in the trivial name Aurora, which
Lamarck adopts: we may in truth compare its beauteous fulvous
hues fading into white with inexpressive softness, to the warm
glowing tints and fainter blushes of an opening morning sky in
summer. We have also adopted this name as well as Lamarck, for its
peculiar elegance, in preference to that of Aurantium.
The origin of the epithet “Aurora,” bestowed upon this shell has
probably long since been forgotten; it arose from one of those
fugitive events not likely to be recorded excepting only in the
recollection of collectors; and those in whose immediate knowledge
the circumstance occurred have long since passed this transitory
scene and are perhaps ere this themselves forgotten. The relation
though in some respects trivial, may afford amusement to the
amateur: it serves to shew the origin of its name “Aurora” at the
same time that it presents a striking illustration of that ardent zeal
with which the science of Conchology was cultivated in this country
nearly half a century ago; its authenticity may be relied upon. The
circumstance as related to us by an old collector some years ago
was briefly this; a specimen of the shell had very shortly after the
return of the discovery ships been presented by one of the officers
to a lady, which coming to the knowledge of a most zealous collector
of that period, he solicited the indulgence of seeing it; and waited
upon the lady for the purpose, upon an intimation that the favour
would be readily granted. “Madam,” said the enraptured visitor, gazing
in admiration upon the Cowry, which he now beheld for the first
time, “has this shell a price? Will twenty guineas purchase this lovely
gem?” “It will not,” answered the lady. “Allow me then,” said its
enthusiastic admirer, “to clasp it for a moment in my hands;” and
bending on one knee, at the same time pressing the shell to his lips,
he pronounced with an emphasis of poetic fervour, “Thus do I salute the
‘Morning Dawn’ of the new discovered world! Let poets reverence
Venus, the beauty of the Grecian seas: my idol is ‘Aurora,’ this
sea-born nymph of surpassing beauty, that rose upon the waves of the
Southern deep!”
Tu quoque cum Dea sis, Divâ formosior illâ
Concha per æquoreum quam vasa ducit iter.[25]

Sec. 6. Basium.

Abating somewhat of the romantic warmth with which the ideas of
the venerable collector alluded to were expressed, it must be
admitted that in point of beautiful simplicity this shell has never been
surpassed by any subsequent discoveries in the southern
hemisphere; and it is no less singular than certain that the price of
twenty guineas, which that collector named upon an imaginary
valuation, has become the average standard value of a fine shell of
this kind for some years past. At present they are more highly
prized, because it is now pretty clearly ascertained that they are no
longer to be procured among the natives of Otaheite; and for this
reason it is much more likely they will reach a still higher price than
that the value of them should diminish. The shell we have
represented is to be considered as a very fine specimen in respect to
size as well as colour.