0% found this document useful (0 votes)

2 views

Deep Learning in Natural Language Processing 1st Edition Li Deng download

Ebook access

Uploaded by

geseklaierbr

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

2 views

Deep Learning in Natural Language Processing 1st Edition Li Deng download

Ebook access

Uploaded by

geseklaierbr

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 55

Deep Learning in Natural Language Processing 1st

Edition Li Deng pdf download

https://textbookfull.com/product/deep-learning-in-natural-
language-processing-1st-edition-li-deng/

Download more ebook from https://textbookfull.com

We believe these products will be a great fit for you. Click
the link to download now, or visit textbookfull.com
to discover even more!

Deep learning in natural language processing Deng

https://textbookfull.com/product/deep-learning-in-natural-
language-processing-deng/

Deep Learning for Natural Language Processing Develop

Deep Learning Models for Natural Language in Python
Jason Brownlee

https://textbookfull.com/product/deep-learning-for-natural-
language-processing-develop-deep-learning-models-for-natural-
language-in-python-jason-brownlee/

Python Natural Language Processing Advanced machine

learning and deep learning techniques for natural
language processing 1st Edition Jalaj Thanaki

https://textbookfull.com/product/python-natural-language-
processing-advanced-machine-learning-and-deep-learning-
techniques-for-natural-language-processing-1st-edition-jalaj-
thanaki/

Applied Natural Language Processing with Python:

Implementing Machine Learning and Deep Learning
Algorithms for Natural Language Processing 1st Edition
Taweh Beysolow Ii
https://textbookfull.com/product/applied-natural-language-
processing-with-python-implementing-machine-learning-and-deep-
learning-algorithms-for-natural-language-processing-1st-edition-
Deep Learning for Natural Language Processing (MEAP
V07) Stephan Raaijmakers

https://textbookfull.com/product/deep-learning-for-natural-
language-processing-meap-v07-stephan-raaijmakers/

Natural language processing with TensorFlow Teach

language to machines using Python s deep learning
library 1st Edition Thushan Ganegedara

https://textbookfull.com/product/natural-language-processing-
with-tensorflow-teach-language-to-machines-using-python-s-deep-
learning-library-1st-edition-thushan-ganegedara/

Natural Language Processing 1st Edition Jacob

Eisenstein

https://textbookfull.com/product/natural-language-processing-1st-
edition-jacob-eisenstein/

Machine Learning with PySpark: With Natural Language

Processing and Recommender Systems 1st Edition Pramod
Singh

https://textbookfull.com/product/machine-learning-with-pyspark-
with-natural-language-processing-and-recommender-systems-1st-
edition-pramod-singh/

Natural Language Processing in Artificial Intelligence

1st Edition Brojo Kishore Mishra

https://textbookfull.com/product/natural-language-processing-in-
artificial-intelligence-1st-edition-brojo-kishore-mishra/
Li Deng · Yang Liu Editors

Deep Learning
in Natural
Language
Processing
Deep Learning in Natural Language Processing
Li Deng Yang Liu
•

Editors

Deep Learning in Natural

Language Processing

123
Editors
Li Deng Yang Liu
AI Research at Citadel Tsinghua University
Chicago, IL Beijing
USA China

and

AI Research at Citadel
Seattle, WA
USA

ISBN 978-981-10-5208-8 ISBN 978-981-10-5209-5 (eBook)

https://doi.org/10.1007/978-981-10-5209-5
Library of Congress Control Number: 2018934459

© Springer Nature Singapore Pte Ltd. 2018

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made. The publisher remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
part of Springer Nature
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Foreword

“Written by a group of the most active researchers in the ﬁeld, led by Dr. Deng, an
internationally respected expert in both NLP and deep learning, this book provides
a comprehensive introduction to and up-to-date review of the state of art in applying
deep learning to solve fundamental problems in NLP. Further, the book is highly
timely, as demands for high-quality and up-to-date textbooks and research refer-
ences have risen dramatically in response to the tremendous strides in deep learning
applications to NLP. The book offers a unique reference guide for practitioners in
various sectors, especially the Internet and AI start-ups, where NLP technologies
are becoming an essential enabler and a core differentiator.”
Hongjiang Zhang (Founder, Sourcecode Capital; former CEO of KingSoft)
“This book provides a comprehensive introduction to the latest advances in deep
learning applied to NLP. Written by experienced and aspiring deep learning and
NLP researchers, it covers a broad range of major NLP applications, including
spoken language understanding, dialog systems, lexical analysis, parsing, knowl-
edge graph, machine translation, question answering, sentiment analysis, and social
computing.
The book is clearly structured and moves from major research trends, to the
latest deep learning approaches, to their limitations and promising future work.
Given its self-contained content, sophisticated algorithms, and detailed use cases,
the book offers a valuable guide for all readers who are working on or learning
about deep learning and NLP.”
Haifeng Wang (Vice President and Head of Research, Baidu; former President
of ACL)
“In 2011, at the dawn of deep learning in industry, I estimated that in most speech
recognition applications, computers still made 5 to 10 times more errors than human
subjects, and highlighted the importance of knowledge engineering in future
directions. Within only a handful of years since, deep learning has nearly closed the
gap in the accuracy of conversational speech recognition between human and
computers. Edited and written by Dr. Li Deng—a pioneer in the recent speech

v
vi Foreword

recognition revolution using deep learning—and his colleagues, this book elegantly
describes this part of the fascinating history of speech recognition as an important
subﬁeld of natural language processing (NLP). Further, the book expands this
historical perspective from speech recognition to more general areas of NLP,
offering a truly valuable guide for the future development of NLP.
Importantly, the book puts forward a thesis that the current deep learning trend is
a revolution from the previous data-driven (shallow) machine learning era, although
ostensibly deep learning appears to be merely exploiting more data, more com-
puting power, and more complex models. Indeed, as the book correctly points out,
the current state of the art of deep learning technology developed for NLP appli-
cations, despite being highly successful in solving individual NLP tasks, has not
taken full advantage of rich world knowledge or human cognitive capabilities.
Therefore, I fully embrace the view expressed by the book’s editors and authors that
more advanced deep learning that seamlessly integrates knowledge engineering will
pave the way for the next revolution in NLP.
I highly recommend speech and NLP researchers, engineers, and students to read
this outstanding and timely book, not only to learn about the state of the art in NLP
and deep learning, but also to gain vital insights into what the future of the NLP
ﬁeld will hold.”
Sadaoki Furui (President, Toyota Technological Institute at Chicago)
Preface

Natural language processing (NLP), which aims to enable computers to process

human languages intelligently, is an important interdisciplinary field crossing
artificial intelligence, computing science, cognitive science, information processing,
and linguistics. Concerned with interactions between computers and human lan-
guages, NLP applications such as speech recognition, dialog systems, information
retrieval, question answering, and machine translation have started to reshape the
way people identify, obtain, and make use of information.
The development of NLP can be described in terms of three major waves:
rationalism, empiricism, and deep learning. In the first wave, rationalist approaches
advocated the design of handcrafted rules to incorporate knowledge into NLP
systems based on the assumption that knowledge of language in the human mind is
fixed in advance by generic inheritance. In the second wave, empirical approaches
assume that rich sensory input and the observable language data in surface form are
required and sufficient to enable the mind to learn the detailed structure of natural
language. As a result, probabilistic models were developed to discover the regu-
larities of languages from large corpora. In the third wave, deep learning exploits
hierarchical models of nonlinear processing, inspired by biological neural systems
to learn intrinsic representations from language data, in ways that aim to simulate
human cognitive abilities.
The intersection of deep learning and natural language processing has resulted in
striking successes in practical tasks. Speech recognition is the first industrial NLP
application that deep learning has strongly impacted. With the availability of
large-scale training data, deep neural networks achieved dramatically lower
recognition errors than the traditional empirical approaches. Another prominent
successful application of deep learning in NLP is machine translation. End-to-end
neural machine translation that models the mapping between human languages
using neural networks has proven to improve translation quality substantially.
Therefore, neural machine translation has quickly become the new de facto tech-
nology in major commercial online translation services offered by large technology
companies: Google, Microsoft, Facebook, Baidu, and more. Many other areas of
NLP, including language understanding and dialog, lexical analysis and parsing,

vii
viii Preface

knowledge graph, information retrieval, question answering from text, social

computing, language generation, and text sentiment analysis, have also seen much
significant progress using deep learning, riding on the third wave of
NLP. Nowadays, deep learning is a dominating method applied to practically all
NLP tasks.
The main goal of this book is to provide a comprehensive survey on the recent
advances in deep learning applied to NLP. The book presents state of the art of
NLP-centric deep learning research, and focuses on the role of deep learning played
in major NLP applications including spoken language understanding, dialog sys-
tems, lexical analysis, parsing, knowledge graph, machine translation, question
answering, sentiment analysis, social computing, and natural language generation
(from images). This book is suitable for readers with a technical background in
computation, including graduate students, post-doctoral researchers, educators, and
industrial researchers and anyone interested in getting up to speed with the latest
techniques of deep learning associated with NLP.
The book is organized into eleven chapters as follows:
• Chapter 1: A Joint Introduction to Natural Language Processing and to Deep
Learning (Li Deng and Yang Liu)
• Chapter 2: Deep Learning in Conversational Language Understanding (Gokhan
Tur, Asli Celikyilmaz, Xiaodong He, Dilek Hakkani-Tür, and Li Deng)
• Chapter 3: Deep Learning in Spoken and Text-Based Dialog Systems
(Asli Celikyilmaz, Li Deng, and Dilek Hakkani-Tür)
• Chapter 4: Deep Learning in Lexical Analysis and Parsing (Wanxiang Che and
Yue Zhang)
• Chapter 5: Deep Learning in Knowledge Graph (Zhiyuan Liu and Xianpei Han)
• Chapter 6: Deep Learning in Machine Translation (Yang Liu and Jiajun Zhang)
• Chapter 7: Deep Learning in Question Answering (Kang Liu and Yansong Feng)
• Chapter 8: Deep Learning in Sentiment Analysis (Duyu Tang and Meishan
Zhang)
• Chapter 9: Deep Learning in Social Computing (Xin Zhao and Chenliang Li)
• Chapter 10: Deep Learning in Natural Language Generation from Images
(Xiaodong He and Li Deng)
• Chapter 11: Epilogue (Li Deng and Yang Liu)
Chapter 1 first reviews the basics of NLP as well as the main scope of NLP
covered in the following chapters of the book, and then goes in some depth into the
historical development of NLP summarized as three waves and future directions.
Subsequently, in Chaps. 2–10, an in-depth survey on the recent advances in deep
learning applied to NLP is organized into nine separate chapters, each covering a
largely independent application area of NLP. The main body of each chapter is
written by leading researchers and experts actively working in the respective field.
The origin of this book was the set of comprehensive tutorials given at the 15th
China National Conference on Computational Linguistics (CCL 2016) held in
October 2016 in Yantai, Shandong, China, where both of us, editors of this book,
Preface ix

were active participants and were taking leading roles. We thank our Springer’s
senior editor, Dr. Celine Lanlan Chang, who kindly invited us to create this book
and who has been providing much of timely assistance needed to complete this
book. We are grateful also to Springer’s Assistant Editor, Jane Li, for offering
invaluable help through various stages of manuscript preparation.
We thank all authors of Chaps. 2–10 who devoted their valuable time carefully
preparing the content of their chapters: Gokhan Tur, Asli Celikyilmaz, Dilek
Hakkani-Tur, Wanxiang Che, Yue Zhang, Xianpei Han, Zhiyuan Liu, Jiajun Zhang,
Kang Liu, Yansong Feng, Duyu Tang, Meishan Zhang, Xin Zhao, Chenliang Li,
and Xiaodong He. The authors of Chaps. 4–9 are CCL 2016 tutorial speakers. They
spent a considerable amount of time in updating their tutorial material with the
latest advances in the ﬁeld since October 2016.
Further, we thank numerous reviewers and readers, Sadaoki Furui, Andrew Ng,
Fred Juang, Ken Church, Haifeng Wang, and Hongjiang Zhang, who not only gave
us much needed encouragements but also offered many constructive comments
which substantially improved earlier drafts of the book.
Finally, we give our appreciations to our organizations, Microsoft Research and
Citadel (for Li Deng) and Tsinghua University (for Yang Liu), who provided
excellent environments, supports, and encouragements that have been instrumental
for us to complete this book. Yang Liu is also supported by National Natural
Science Foundation of China (No.61522204, No.61432013, and No.61331013).

Seattle, USA Li Deng

Beijing, China Yang Liu
October 2017
Contents

1 A Joint Introduction to Natural Language Processing and to

Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Li Deng and Yang Liu
2 Deep Learning in Conversational Language Understanding . . . . . . 23
Gokhan Tur, Asli Celikyilmaz, Xiaodong He, Dilek Hakkani-Tür
and Li Deng
3 Deep Learning in Spoken and Text-Based Dialog Systems . . . . . . 49
Asli Celikyilmaz, Li Deng and Dilek Hakkani-Tür
4 Deep Learning in Lexical Analysis and Parsing . . . . . . . . . . . . . . . 79
Wanxiang Che and Yue Zhang
5 Deep Learning in Knowledge Graph . . . . . . . . . . . . . . . . . . . . . . 117
Zhiyuan Liu and Xianpei Han
6 Deep Learning in Machine Translation . . . . . . . . . . . . . . . . . . . . 147
Yang Liu and Jiajun Zhang
7 Deep Learning in Question Answering . . . . . . . . . . . . . . . . . . . . . 185
Kang Liu and Yansong Feng
8 Deep Learning in Sentiment Analysis . . . . . . . . . . . . . . . . . . . . . . 219
Duyu Tang and Meishan Zhang
9 Deep Learning in Social Computing . . . . . . . . . . . . . . . . . . . . . . . . 255
Xin Zhao and Chenliang Li
10 Deep Learning in Natural Language Generation from Images . . . . 289
Xiaodong He and Li Deng
11 Epilogue: Frontiers of NLP in the Deep Learning Era . . . . . . . . . . 309
Li Deng and Yang Liu
Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327

xi
Contributors

Asli Celikyilmaz Microsoft Research, Redmond, WA, USA

Wanxiang Che Harbin Institute of Technology, Harbin, China
Li Deng Citadel, Seattle & Chicago, USA
Yansong Feng Peking University, Beijing, China
Dilek Hakkani-Tür Google, Mountain View, CA, USA
Xianpei Han Institute of Software, Chinese Academy of Sciences, Beijing, China
Xiaodong He Microsoft Research, Redmond, WA, USA
Chenliang Li Wuhan University, Wuhan, China
Kang Liu Institute of Automation, Chinese Academy of Sciences, Beijing, China
Yang Liu Tsinghua University, Beijing, China
Zhiyuan Liu Tsinghua University, Beijing, China
Duyu Tang Microsoft Research Asia, Beijing, China
Gokhan Tur Google, Mountain View, CA, USA
Jiajun Zhang Institute of Automation, Chinese Academy of Sciences, Beijing,
China
Meishan Zhang Heilongjiang University, Harbin, China
Yue Zhang Singapore University of Technology and Design, Singapore
Xin Zhao Renmin University of China, Beijing, China

xiii
Acronyms

AI Artiﬁcial intelligence
AP Averaged perceptron
ASR Automatic speech recognition
ATN Augmented transition network
BiLSTM Bidirectional long short-term memory
BiRNN Bidirectional recurrent neural network
BLEU Bilingual evaluation understudy
BOW Bag-of-words
CBOW Continuous bag-of-words
CCA Canonical correlation analysis
CCG Combinatory categorial grammar
CDL Collaborative deep learning
CFG Context free grammar
CYK Cocke–Younger–Kasami
CLU Conversational language understanding
CNN Convolutional neural network
CNNSM Convolutional neural network based semantic model
cQA Community question answering
CRF Conditional random ﬁeld
CTR Collaborative topic regression
CVT Compound value typed
DA Denoising autoencoder
DBN Deep belief network
DCN Deep convex net
DNN Deep neural network
DSSM Deep structured semantic model
DST Dialog state tracking
EL Entity linking
EM Expectation maximization
FSM Finite state machine

xv
xvi Acronyms

ROUGE Recall-oriented understudy for gisting evaluation

RUBER Referenced metric and unreferenced metric blended evaluation routine
SDS Spoken dialog system
SLU Spoken language understanding
SMT Statistical machine translation
SP Semantic parsing
SRL Semantic role labeling
SRNN Segmental recurrent neural network
STAGG Staged query graph generation
SVM Support vector machine
UAS Unlabeled attachment score
UGC User-generated content
VIME Variational information maximizing exploration
VPA Virtual personal assistant
Chapter 1
A Joint Introduction to Natural
Language Processing and to Deep
Learning

Li Deng and Yang Liu

Abstract In this chapter, we set up the fundamental framework for the book. We
first provide an introduction to the basics of natural language processing (NLP) as an
integral part of artificial intelligence. We then survey the historical development of
NLP, spanning over five decades, in terms of three waves. The first two waves arose
as rationalism and empiricism, paving ways to the current deep learning wave. The
key pillars underlying the deep learning revolution for NLP consist of (1) distributed
representations of linguistic entities via embedding, (2) semantic generalization due
to the embedding, (3) long-span deep sequence modeling of natural language, (4)
hierarchical networks effective for representing linguistic levels from low to high,
and (5) end-to-end deep learning methods to jointly solve many NLP tasks. After
the survey, several key limitations of current deep learning technology for NLP are
analyzed. This analysis leads to five research directions for future advances in NLP.

1.1 Natural Language Processing: The Basics

Natural language processing (NLP) investigates the use of computers to process or to

understand human (i.e., natural) languages for the purpose of performing useful tasks.
NLP is an interdisciplinary field that combines computational linguistics, computing
science, cognitive science, and artificial intelligence. From a scientific perspective,
NLP aims to model the cognitive mechanisms underlying the understanding and pro-
duction of human languages. From an engineering perspective, NLP is concerned
with how to develop novel practical applications to facilitate the interactions between
computers and human languages. Typical applications in NLP include speech recog-
nition, spoken language understanding, dialogue systems, lexical analysis, parsing,
machine translation, knowledge graph, information retrieval, question answering,

L. Deng (B)
Citadel, Seattle & Chicago, USA
e-mail: l.deng@ieee.org
Y. Liu
Tsinghua University, Beijing, China
e-mail: liuyang2011@tsinghua.edu.cn

© Springer Nature Singapore Pte Ltd. 2018 1

L. Deng and Y. Liu (eds.), Deep Learning in Natural
Language Processing, https://doi.org/10.1007/978-981-10-5209-5_1
2 L. Deng and Y. Liu

sentiment analysis, social computing, natural language generation, and natural lan-
guage summarization. These NLP application areas form the core content of this
book.
Natural language is a system constructed specifically to convey meaning or seman-
tics, and is by its fundamental nature a symbolic or discrete system. The surface or
observable “physical” signal of natural language is called text, always in a sym-
bolic form. The text “signal” has its counterpart—the speech signal; the latter can
be regarded as the continuous correspondence of symbolic text, both entailing the
same latent linguistic hierarchy of natural language. From NLP and signal processing
perspectives, speech can be treated as “noisy” versions of text, imposing additional
difficulties in its need of “de-noising” when performing the task of understanding the
common underlying semantics. Chapters 2 and 3 as well as current Chap. 1 of this
book cover the speech aspect of NLP in detail, while the remaining chapters start
directly from text in discussing a wide variety of text-oriented tasks that exemplify
the pervasive NLP applications enabled by machine learning techniques, notably
deep learning.
The symbolic nature of natural language is in stark contrast to the continuous
nature of language’s neural substrate in the human brain. We will defer this discussion
to Sect. 1.6 of this chapter when discussing future challenges of deep learning in NLP.
A related contrast is how the symbols of natural language are encoded in several
continuous-valued modalities, such as gesture (as in sign language), handwriting
(as an image), and, of course, speech. On the one hand, the word as a symbol is
used as a “signifier” to refer to a concept or a thing in real world as a “signified”
object, necessarily a categorical entity. On the other hand, the continuous modalities
that encode symbols of words constitute the external signals sensed by the human
perceptual system and transmitted to the brain, which in turn operates in a continuous
fashion. While of great theoretical interest, the subject of contrasting the symbolic
nature of language versus its continuous rendering and encoding goes beyond the
scope of this book.
In the next few sections, we outline and discuss, from a historical perspective, the
development of general methodology used to study NLP as a rich interdisciplinary
field. Much like several closely related sub- and super-fields such as conversational
systems, speech recognition, and artificial intelligence, the development of NLP can
be described in terms of three major waves (Deng 2017; Pereira 2017), each of which
is elaborated in a separate section next.

1.2 The First Wave: Rationalism

NLP research in its first wave lasted for a long time, dating back to 1950s. In 1950,
Alan Turing proposed the Turing test to evaluate a computer’s ability to exhibit intelli-
gent behavior indistinguishable from that of a human (Turing 1950). This test is based
on natural language conversations between a human and a computer designed to gen-
erate human-like responses. In 1954, the Georgetown-IBM experiment demonstrated
1 A Joint Introduction to Natural Language Processing and to Deep Learning 3

the first machine translation system capable of translating more than 60 Russian sen-
tences into English.
The approaches, based on the belief that knowledge of language in the human
mind is fixed in advance by generic inheritance, dominated most of NLP research
between about 1960 and late 1980s. These approaches have been called rationalist
ones (Church 2007). The dominance of rationalist approaches in NLP was mainly
due to the widespread acceptance of arguments of Noam Chomsky for an innate
language structure and his criticism of N-grams (Chomsky 1957). Postulating that
key parts of language are hardwired in the brain at birth as a part of the human
genetic inheritance, rationalist approaches endeavored to design hand-crafted rules
to incorporate knowledge and reasoning mechanisms into intelligent NLP systems.
Up until 1980s, most notably successful NLP systems, such as ELIZA for simulating
a Rogerian psychotherapist and MARGIE for structuring real-world information into
concept ontologies, were based on complex sets of handwritten rules.
This period coincided approximately with the early development of artificial
intelligence, characterized by expert knowledge engineering, where domain experts
devised computer programs according to the knowledge about the (very narrow)
application domains they have (Nilsson 1982; Winston 1993). The experts designed
these programs using symbolic logical rules based on careful representations and
engineering of such knowledge. These knowledge-based artificial intelligence sys-
tems tend to be effective in solving narrow-domain problems by examining the
“head” or most important parameters and reaching a solution about the appropriate
action to take in each specific situation. These “head” parameters are identified in
advance by human experts, leaving the “tail” parameters and cases untouched. Since
they lack learning capability, they have difficulty in generalizing the solutions to new
situations and domains. The typical approach during this period is exemplified by
the expert system, a computer system that emulates the decision-making ability of a
human expert. Such systems are designed to solve complex problems by reasoning
about knowledge (Nilsson 1982). The first expert system was created in 1970s and
then proliferated in 1980s. The main “algorithm” used was the inference rules in the
form of “if-then-else” (Jackson 1998). The main strength of these first-generation
artificial intelligence systems is its transparency and interpretability in their (limited)
capability in performing logical reasoning. Like NLP systems such as ELIZA and
MARGIE, the general expert systems in the early days used hand-crafted expert
knowledge which was often effective in narrowly defined problems, although the
reasoning could not handle uncertainty that is ubiquitous in practical applications.
In specific NLP application areas of dialogue systems and spoken language under-
standing, to be described in more detail in Chaps. 2 and 3 of this book, such ratio-
nalistic approaches were represented by the pervasive use of symbolic rules and
templates (Seneff et al. 1991). The designs were centered on grammatical and onto-
logical constructs, which, while interpretable and easy to debug and update, had
experienced severe difficulties in practical deployment. When such systems worked,
they often worked beautifully; but unfortunately this happened just not very often
and the domains were necessarily limited.
4 L. Deng and Y. Liu

Likewise, speech recognition research and system design, another long-standing

NLP and artificial intelligence challenge, during this rationalist era were based
heavily on the paradigm of expert knowledge engineering, as elegantly analyzed
in (Church and Mercer 1993). During 1970s and early 1980s, the expert system
approach to speech recognition was quite popular (Reddy 1976; Zue 1985). How-
ever, the lack of abilities to learn from data and to handle uncertainty in reasoning was
acutely recognized by researchers, leading to the second wave of speech recognition,
NLP, and artificial intelligence described next.

1.3 The Second Wave: Empiricism

The second wave of NLP was characterized by the exploitation of data corpora and
of (shallow) machine learning, statistical or otherwise, to make use of such data
(Manning and Schtze 1999). As much of the structure of and theory about natural
language were discounted or discarded in favor of data-driven methods, the main
approaches developed during this era have been called empirical or pragmatic ones
(Church and Mercer 1993; Church 2014). With the increasing availability of machine-
readable data and steady increase of computational power, empirical approaches have
dominated NLP since around 1990. One of the major NLP conferences was even
named “Empirical Methods in Natural Language Processing (EMNLP)” to reflect
most directly the strongly positive sentiment of NLP researchers during that era
toward empirical approaches.
In contrast to rationalist approaches, empirical approaches assume that the human
mind only begins with general operations for association, pattern recognition, and
generalization. Rich sensory input is required to enable the mind to learn the detailed
structure of natural language. Prevalent in linguistics between 1920 and 1960, empiri-
cism has been undergoing a resurgence since 1990. Early empirical approaches to
NLP focused on developing generative models such as the hidden Markov model
(HMM) (Baum and Petrie 1966), the IBM translation models (Brown et al. 1993),
and the head-driven parsing models (Collins 1997) to discover the regularities of
languages from large corpora. Since late 1990s, discriminative models have become
the de facto approach in a variety of NLP tasks. Representative discriminative mod-
els and methods in NLP include the maximum entropy model (Ratnaparkhi 1997),
supporting vector machines (Vapnik 1998), conditional random fields (Lafferty et al.
2001), maximum mutual information and minimum classification error (He et al.
2008), and perceptron (Collins 2002).
Again, this era of empiricism in NLP was paralleled with corresponding approaches
in artificial intelligence as well as in speech recognition and computer vision. It came
about after clear evidence that learning and perception capabilities are crucial for
complex artificial intelligence systems but missing in the expert systems popular in
the previous wave. For example, when DARPA opened its first Grand Challenge for
autonomous driving, most vehicles then relied on the knowledge-based artificial intel-
ligence paradigm. Much like speech recognition and NLP, the autonomous driving and
1 A Joint Introduction to Natural Language Processing and to Deep Learning 5

computer vision researchers immediately realized the limitation of the knowledge-

based paradigm due to the necessity for machine learning with uncertainty handling
and generalization capabilities.
The empiricism in NLP and speech recognition in this second wave was based
on data-intensive machine learning, which we now call “shallow” due to the general
lack of abstractions constructed by many-layer or “deep” representations of data
which would come in the third wave to be described in the next section. In machine
learning, researchers do not need to concern with constructing precise and exact rules
as required for the knowledge-based NLP and speech systems during the first wave.
Rather, they focus on statistical models (Bishop 2006; Murphy 2012) or simple neural
networks (Bishop 1995) as an underlying engine. They then automatically learn or
“tune” the parameters of the engine using ample training data to make them handle
uncertainty, and to attempt to generalize from one condition to another and from one
domain to another. The key algorithms and methods for machine learning include EM
(expectation-maximization), Bayesian networks, support vector machines, decision
trees, and, for neural networks, backpropagation algorithm.
Generally speaking, the machine learning based NLP, speech, and other artificial
intelligence systems perform much better than the earlier, knowledge-based counter-
parts. Successful examples include almost all artificial intelligence tasks in machine
perception—speech recognition (Jelinek 1998), face recognition (Viola and Jones
2004), visual object recognition (Fei-Fei and Perona 2005), handwriting recognition
(Plamondon and Srihari 2000), and machine translation (Och 2003).
More specifically, in a core NLP application area of machine translation, as to be
described in detail in Chap. 6 of this book as well as in (Church and Mercer 1993), the
field has switched rather abruptly around 1990 from rationalistic methods outlined in
Sect. 1.2 to empirical, largely statistical methods. The availability of sentence-level
alignments in the bilingual training data made it possible to acquire surface-level
translation knowledge not by rules but from data directly, at the expense of discarding
or discounting structured information in natural languages. The most representative
work during this wave is that empowered by various versions of IBM translation
models (Brown et al. 1993). Subsequent developments during this empiricist era of
machine translation further significantly improved the quality of translation systems
(Och and Ney 2002; Och 2003; Chiang 2007; He and Deng 2012), but not at the
level of massive deployment in real world (which would come after the next, deep
learning wave).
In the dialogue and spoken language understanding areas of NLP, this empiri-
cist era was also marked prominently by data-driven machine learning approaches.
These approaches were well suited to meet the requirement for quantitative evalua-
tion and concrete deliverables. They focused on broader but shallow, surface-level
coverage of text and domains instead of detailed analyses of highly restricted text
and domains. The training data were used not to design rules for language under-
standing and response action from the dialogue systems but to learn parameters of
(shallow) statistical or neural models automatically from data. Such learning helped
reduce the cost of hand-crafted complex dialogue manager’s design, and helped
improve robustness against speech recognition errors in the overall spoken language
6 L. Deng and Y. Liu

understanding and dialogue systems; for a review, see He and Deng (2013). More
specifically, for the dialogue policy component of dialogue systems, powerful rein-
forcement learning based on Markov decision processes had been introduced during
this era; for a review, see Young et al. (2013). And for spoken language understand-
ing, the dominant methods moved from rule- or template-based ones during the first
wave to generative models like hidden Markov models (HMMs) (Wang et al. 2011)
to discriminative models like conditional random fields (Tur and Deng 2011).
Similarly, in speech recognition, over close to 30 years from early 1980 s to around
2010, the field was dominated by the (shallow) machine learning paradigm using the
statistical generative model based on the HMM integrated with Gaussian mixture
models, along with various versions of its generalization (Baker et al. 2009a, b;
Deng and O’Shaughnessy 2003; Rabiner and Juang 1993). Among many versions of
the generalized HMMs were statistical and neural-network-based hidden dynamic
models (Deng 1998; Bridle et al. 1998; Deng and Yu 2007). The former adopted EM
and switching extended Kalman filter algorithms for learning model parameters (Ma
and Deng 2004; Lee et al. 2004), and the latter used backpropagation (Picone et al.
1999). Both of them made extensive use of multiple latent layers of representations for
the generative process of speech waveforms following the long-standing framework
of analysis-by-synthesis in human speech perception. More significantly, inverting
this “deep” generative process to its counterpart of an end-to-end discriminative
process gave rise to the first industrial success of deep learning (Deng et al. 2010,
2013; Hinton et al. 2012), which formed a driving force of the third wave of speech
recognition and NLP that will be elaborated next.

1.4 The Third Wave: Deep Learning

While the NLP systems, including speech recognition, language understanding, and
machine translation, developed during the second wave performed a lot better and
with higher robustness than those during the first wave, they were far from human-
level performance and left much to desire. With a few exceptions, the (shallow)
machine learning models for NLP often did not have the capacity sufficiently large to
absorb the large amounts of training data. Further, the learning algorithms, methods,
and infrastructures were not powerful enough. All this changed several years ago,
giving rise to the third wave of NLP, propelled by the new paradigm of deep-structured
machine learning or deep learning (Bengio 2009; Deng and Yu 2014; LeCun et al.
2015; Goodfellow et al. 2016).
In traditional machine learning, features are designed by humans and feature
engineering is a bottleneck, requiring significant human expertise. Concurrently,
the associated shallow models lack the representation power and hence the ability
to form levels of decomposable abstractions that would automatically disentangle
complex factors in shaping the observed language data. Deep learning breaks away
the above difficulties by the use of deep, layered model structure, often in the form of
neural networks, and the associated end-to-end learning algorithms. The advances in
1 A Joint Introduction to Natural Language Processing and to Deep Learning 7

deep learning are one major driving force behind the current NLP and more general
artificial intelligence inflection point and are responsible for the resurgence of neural
networks with a wide range of practical, including business, applications (Parloff
2016).
More specifically, despite the success of (shallow) discriminative models in a
number of important NLP tasks developed during the second wave, they suffered from
the difficulty of covering all regularities in languages by designing features manually
with domain expertise. Besides the incompleteness problem, such shallow models
also face the sparsity problem as features usually only occur once in the training
data, especially for highly sparse high-order features. Therefore, feature design has
become one of the major obstacles in statistical NLP before deep learning comes
to rescue. Deep learning brings hope for addressing the human feature engineering
problem, with a view called “NLP from scratch” (Collobert et al. 2011), which was
in early days of deep learning considered highly unconventional. Such deep learning
approaches exploit the powerful neural networks that contain multiple hidden layers
to solve general machine learning tasks dispensing with feature engineering. Unlike
shallow neural networks and related machine learning models, deep neural networks
are capable of learning representations from data using a cascade of multiple layers of
nonlinear processing units for feature extraction. As higher level features are derived
from lower level features, these levels form a hierarchy of concepts.
Deep learning originated from artificial neural networks, which can be viewed as
cascading models of cell types inspired by biological neural systems. With the advent
of backpropagation algorithm (Rumelhart et al. 1986), training deep neural networks
from scratch attracted intensive attention in 1990s. In these early days, without large
amounts of training data and without proper design and learning methods, during
neural network training the learning signals vanish exponentially with the number
of layers (or more rigorously the depth of credit assignment) when propagated from
layer to layer, making it difficult to tune connection weights of deep neural networks,
especially the recurrent versions. Hinton et al. (2006) initially overcame this problem
by using unsupervised pretraining to first learn generally useful feature detectors.
Then, the network is further trained by supervised learning to classify labeled data.
As a result, it is possible to learn the distribution of a high-level representation using
low-level representations. This seminal work marks the revival of neural networks. A
variety of network architectures have since been proposed and developed, including
deep belief networks (Hinton et al. 2006), stacked auto-encoders (Vincent et al. 2010),
deep Boltzmann machines (Hinton and Salakhutdinov 2012), deep convolutional
neural works (Krizhevsky et al. 2012), deep stacking networks (Deng et al. 2012),
and deep Q-networks (Mnih et al. 2015). Capable of discovering intricate structures
in high-dimensional data, deep learning has since 2010 been successfully applied to
real-world tasks in artificial intelligence including notably speech recognition (Yu
et al. 2010; Hinton et al. 2012), image classification (Krizhevsky et al. 2012; He et al.
2016), and NLP (all chapters in this book). Detailed analyses and reviews of deep
learning have been provided in a set of tutorial survey articles (Deng 2014; LeCun
et al. 2015; Juang 2016).
8 L. Deng and Y. Liu

As speech recognition is one of core tasks in NLP, we briefly discuss it here due to
its importance as the first industrial NLP application in real world impacted strongly
by deep learning. Industrial applications of deep learning to large-scale speech recog-
nition started to take off around 2010. The endeavor was initiated with a collaboration
between academia and industry, with the original work presented at the 2009 NIPS
Workshop on Deep Learning for Speech Recognition and Related Applications. The
workshop was motivated by the limitations of deep generative models of speech, and
the possibility that the big-compute, big-data era warrants a serious exploration of
deep neural networks. It was believed then that pretraining DNNs using generative
models of deep belief nets based on the contrastive divergence learning algorithm
would overcome the main difficulties of neural nets encountered in the 1990s (Dahl
et al. 2011; Mohamed et al. 2009). However, early into this research at Microsoft, it
was discovered that without contrastive divergence pretraining, but with the use of
large amounts of training data together with the deep neural networks designed with
corresponding large, context-dependent output layers and with careful engineering,
dramatically lower recognition errors could be obtained than then-state-of-the-art
(shallow) machine learning systems (Yu et al. 2010, 2011; Dahl et al. 2012). This
finding was quickly verified by several other major speech recognition research
groups in North America (Hinton et al. 2012; Deng et al. 2013) and subsequently
overseas. Further, the nature of recognition errors produced by the two types of sys-
tems was found to be characteristically different, offering technical insights into how
to integrate deep learning into the existing highly efficient, run-time speech decod-
ing system deployed by major players in speech recognition industry (Yu and Deng
2015; Abdel-Hamid et al. 2014; Xiong et al. 2016; Saon et al. 2017). Nowadays,
backpropagation algorithm applied to deep neural nets of various forms is uniformly
used in all current state-of-the-art speech recognition systems (Yu and Deng 2015;
Amodei et al. 2016; Saon et al. 2017), and all major commercial speech recogni-
tion systems—Microsoft Cortana, Xbox, Skype Translator, Amazon Alexa, Google
Assistant, Apple Siri, Baidu and iFlyTek voice search, and more—are all based on
deep learning methods.
The striking success of speech recognition in 2010–2011 heralded the arrival of
the third wave of NLP and artificial intelligence. Quickly following the success of
deep learning in speech recognition, computer vision (Krizhevsky et al. 2012) and
machine translation (Bahdanau et al. 2015) were taken over by the similar deep
learning paradigm. In particular, while the powerful technique of neural embedding
of words was developed in as early as 2011 (Bengio et al. 2001), it is not until more
than 10 year later it was shown to be practically useful at a large and practically useful
scale (Mikolov et al. 2013) due to the availability of big data and faster computation.
In addition, a large number of other real-world NLP applications, such as image
captioning (Karpathy and Fei-Fei 2015; Fang et al. 2015; Gan et al. 2017), visual
question answering (Fei-Fei and Perona 2016), speech understanding (Mesnil et al.
2013), web search (Huang et al. 2013b), and recommendation systems, have been
made successful due to deep learning, in addition to many non-NLP tasks including
drug discovery and toxicology, customer relationship management, recommendation
systems, gesture recognition, medical informatics, advertisement, medical image
1 A Joint Introduction to Natural Language Processing and to Deep Learning 9

analysis, robotics, self-driving vehicles, board and eSports games (e.g., Atari, Go,
Poker, and the latest, DOTA2), and so on. For more details, see https://en.wikipedia.
org/wiki/deep_learning.
In more specific text-based NLP application areas, machine translation is perhaps
impacted the most by deep learning. Advancing from the shallow statistical machine
translation developed during the second wave of NLP, the current best machine
translation systems in real-world applications are based on deep neural networks. For
example, Google announced the first stage of its move to neural machine translation
in September 2016 and Microsoft made a similar announcement 2 months later.
Facebook has been working on the conversion to neural machine translation for
about a year, and by August 2017 it is at full deployment. Details of the deep learning
techniques in these state-of-the-art large-scale machine translation systems will be
reviewed in Chap. 6.
In the area of spoken language understanding and dialogue systems, deep learning
is also making a huge impact. The current popular techniques maintain and expand
the statistical methods developed during second-wave era in several ways. Like the
empirical, (shallow) machine learning methods, deep learning is also based on data-
intensive methods to reduce the cost of hand-crafted complex understanding and
dialogue management, to be robust against speech recognition errors under noise
environments and against language understanding errors, and to exploit the power
of Markov decision processes and reinforcement learning for designing dialogue
policy, e.g., (Gasic et al. 2017; Dhingra et al. 2017). Compared with the earlier
methods, deep neural network models and representations are much more powerful
and they make end-to-end learning possible. However, deep learning has not yet
solved the problems of interpretability and domain scalability associated with earlier
empirical techniques. Details of the deep learning techniques popular for current
spoken language understanding and dialogue systems as well as their challenges
will be reviewed in Chaps. 2 and 3.
Two important recent technological breakthroughs brought about in applying deep
learning to NLP problems are sequence-to-sequence learning (Sutskevar et al. 2014)
and attention modeling (Bahdanau et al. 2015). The sequence-to-sequence learning
introduces a powerful idea of using recurrent nets to carry out both encoding and
decoding in an end-to-end manner. While attention modeling was initially developed
to overcome the difficulty of encoding a long sequence, subsequent developments
significantly extended its power to provide highly flexible alignment of two arbitrary
sequences that can be learned together with neural network parameters. The key
concepts of sequence-to-sequence learning and of attention mechanism boosted the
performance of neural machine translation based on distributed word embedding over
the best system based on statistical learning and local representations of words and
phrases. Soon after this success, these concepts have also been applied successfully
to a number of other NLP-related tasks such as image captioning (Karpathy and
Fei-Fei 2015; Devlin et al. 2015), speech recognition (Chorowski et al. 2015), meta-
learning for program execution, one-shot learning, syntactic parsing, lip reading, text
understanding, summarization, and question answering and more.
10 L. Deng and Y. Liu

Setting aside their huge empirical successes, models of neural-network-based

deep learning are often simpler and easier to design than the traditional machine
learning models developed in the earlier wave. In many applications, deep learning
is performed simultaneously for all parts of the model, from feature extraction all
the way to prediction, in an end-to-end manner. Another factor contributing to the
simplicity of neural network models is that the same model building blocks (i.e., the
different types of layers) are generally used in many different applications. Using
the same building blocks for a large variety of tasks makes the adaptation of models
used for one task or data to another task or data relatively easy. In addition, software
toolkits have been developed to allow faster and more efficient implementation of
these models. For these reasons, deep neural networks are nowadays a prominent
method of choice for a large variety of machine learning and artificial intelligence
tasks over large datasets including, prominently, NLP tasks.
Although deep learning has proven effective in reshaping the processing of speech,
images, and videos in a revolutionary way, the effectiveness is less clear-cut in inter-
secting deep learning with text-based NLP despite its empirical successes in a number
of practical NLP tasks. In speech, image, and video processing, deep learning effec-
tively addresses the semantic gap problem by learning high-level concepts from raw
perceptual data in a direct manner. However, in NLP, stronger theories and structured
models on morphology, syntax, and semantics have been advanced to distill the under-
lying mechanisms of understanding and generation of natural languages, which have
not been as easily compatible with neural networks. Compared with speech, image,
and video signals, it seems less straightforward to see that the neural representations
learned from textual data can provide equally direct insights onto natural language.
Therefore, applying neural networks, especially those having sophisticated hierar-
chical architectures, to NLP has received increasing attention and has become the
most active area in both NLP and deep learning communities with highly visible
progresses made in recent years (Deng 2016; Manning and Socher 2017). Surveying
the advances and analyzing the future directions in deep learning for NLP form the
main motivation for us to write this chapter and to create this book, with the desire
for the NLP researchers to accelerate the research further in the current fast pace of
the progress.

1.5 Transitions from Now to the Future

Before analyzing the future dictions of NLP with more advanced deep learning, here
we first summarize the significance of the transition from the past waves of NLP to
the present one. We then discuss some clear limitations and challenges of the present
deep learning technology for NLP, to pave a way to examining further development
that would overcome these limitations for the next wave of innovations.
1 A Joint Introduction to Natural Language Processing and to Deep Learning 11

1.5.1 From Empiricism to Deep Learning: A Revolution

On the surface, the deep learning rising wave discussed in Sect. 1.4 in this chapter
appears to be a simple push of the second, empiricist wave of NLP (Sect. 1.3) into
an extreme end with bigger data, larger models, and greater computing power. After
all, the fundamental approaches developed during both waves are data-driven and
are based on machine learning and computation, and have dispensed with human-
centric “rationalistic” rules that are often brittle and costly to acquire in practical
NLP applications. However, if we analyze these approaches holistically and at a
deeper level, we can identify aspects of conceptual revolution moving from empiricist
machine learning to deep learning, and can subsequently analyze the future directions
of the field (Sect. 1.6). This revolution, in our opinion, is no less significant than the
revolution from the earlier rationalist wave to empiricist one as analyzed at the
beginning (Church and Mercer 1993) and at the end of the empiricist era (Charniak
2011).
Empiricist machine learning and linguistic data analysis during the second NLP
wave started in early 1990 s by crypto-analysts and computer scientists working
on natural language sources that are highly limited in vocabulary and application
domains. As we discussed in Sect. 1.3, surface-level text observations, i.e., words
and their sequences, are counted using discrete probabilistic models without relying
on deep structure in natural language. The basic representations were “one-hot” or
localist, where no semantic similarity between words was exploited. With restric-
tions in domains and associated text content, such structure-free representations and
empirical models are often sufficient to cover much of what needs to be covered.
That is, the shallow, count-based statistical models can naturally do well in limited
and specific NLP tasks. But when the domain and content restrictions are lifted for
more realistic NLP applications in real-world, count-based models would necessarily
become ineffective, no manner how many tricks of smoothing have been invented
in an attempt to mitigate the problem of combinatorial counting sparseness. This
is where deep learning for NLP truly shines—distributed representations of words
via embedding, semantic generalization due to the embedding, longer span deep
sequence modeling, and end-to-end learning methods have all contributed to beat-
ing empiricist, count-based methods in a wide range of NLP tasks as discussed in
Sect. 1.4.

1.5.2 Limitations of Current Deep Learning Technology

Despite the spectacular successes of deep learning in NLP tasks, most notably in
speech recognition/understanding, language modeling, and in machine translation,
there remain huge challenges. The current deep learning methods based on neu-
ral networks as a black box generally lack interpretability, even further away from
explainability, in contrast to the “rationalist” paradigm established during the first
12 L. Deng and Y. Liu

NLP wave where the rules devised by experts were naturally explainable. In practice,
however, it is highly desirable to explain the predictions from a seemingly “black-
box” model, not only for improving the model but for providing the users of the
prediction system with interpretations of the suggested actions to take (Koh and
Liang 2017).
In a number of applications, deep learning methods have proved to give recog-
nition accuracy close to or exceeding humans, but they require considerably more
training data, power consumption, and computing resources than humans. Also,
the accuracy results are statistically impressive but often unreliable on the individ-
ual basis. Further, most of the current deep learning models have no reasoning and
explaining capabilities, making them vulnerable to disastrous failures or attacks with-
out the ability to foresee and thus to prevent them. Moreover, the current NLP models
have not taken into account the need for developing and executing goals and plans
for decision-making via ultimate NLP systems. A more specific limitation of current
NLP methods based on deep learning is their poor abilities for understanding and
reasoning inter-sentential relationships, although huge progresses have been made
for interwords and phrases within sentences.
As discussed earlier, the success of deep learning in NLP has largely come from a
simple strategy thus far—given an NLP task, apply standard sequence models based
on (bidirectional) LSTMs, add attention mechanisms if information required in the
task needs to flow from another source, and then train the full models in an end-to-
end manner. However, while sequence modeling is naturally appropriate for speech,
human understanding of natural language (in text form) requires more complex
structure than sequence. That is, current sequence-based deep learning systems for
NLP can be further advanced by exploiting modularity, structured memories, and
recursive, tree-like representations for sentences and larger text (Manning 2016).
To overcome the challenges outlined above and to achieve the ultimate success
of NLP as a core artificial intelligence field, both fundamental and applied research
are needed. The next new wave of NLP and artificial intelligence will not come until
researchers create new paradigmatic, algorithmic, and computation (including hard-
ware) breakthroughs. Here, we outline several high-level directions toward potential
breakthroughs.

1.6 Future Directions of NLP

1.6.1 Neural-Symbolic Integration

A potential breakthrough is in developing advanced deep learning models and meth-

ods that are more effective than current methods in building, accessing, and exploit-
ing memories and knowledge, including, in particular, common-sense knowledge.
It is not clear how to best integrate the current deep learning methods, centered
on distributed representations (of everything), with explicit, easily interpretable, and
1 A Joint Introduction to Natural Language Processing and to Deep Learning 13

localist-represented knowledge about natural language and the world and with related
reasoning mechanisms.
One path to this goal is to seamlessly combine neural networks and symbolic
language systems. These NLP and artificial intelligence systems will aim to discover
by themselves the underlying causes or logical rules that shape their prediction and
decision-making processes interpretable to human users in symbolic natural language
forms. Recently, very preliminary work in this direction made use of an integrated
neural-symbolic representation called tensor-product neural memory cells, capable
of decoding back to symbolic forms. This structured neural representation is provably
lossless in the coded information after extensive learning within the neural-tensor
domain (Palangi et al. 2017; Smolensky et al. 2016; Lee et al. 2016). Extensions
of such tensor-product representations, when applied to NLP tasks such as machine
reading and question answering, are aimed to learn to process and understand mas-
sive natural language documents. After learning, the systems will be able not only to
answer questions sensibly but also to truly understand what it reads to the extent that
it can convey such understanding to human users in providing clues as to what steps
have been taken to reach the answer. These steps may be in the form of logical reason-
ing expressed in natural language which is thus naturally understood by the human
users of this type of machine reading and comprehension systems. In our view, natu-
ral language understanding is not just to accurately predict an answer from a question
with relevant passages or data graphs as its contextual knowledge in a supervised
way after seeing many examples of matched questions–passages–answers. Rather,
the desired NLP system equipped with real understanding should resemble human
cognitive capabilities. As an example of such capabilities (Nguyen et al. 2017)—
after an understanding system is trained well, say, in a question answering task
(using supervised learning or otherwise), it should master all essential aspects of the
observed text material provided to solve the question answering tasks. What such
mastering entails is that the learned system can subsequently perform well on other
NLP tasks, e.g., translation, summarization, recommendation, etc., without seeing
additional paired data such as raw text data with its summary, or parallel English and
Chinese texts, etc.
One way to examine the nature of such powerful neural-symbolic systems is
to regard them as ones incorporating the strength of the “rationalist” approaches
marked by expert reasoning and structure richness popular during the first wave of
NLP discussed in Sect. 1.2. Interestingly, prior to the rising of deep learning (third)
wave of NLP, (Church 2007) argued that the pendulum from rationalist to empiri-
cist approaches has swung too far at almost the peak of the second NLP wave, and
predicted that the new rationalist wave would arrive. However, rather than swinging
back to a renewed rationalist era of NLP, deep learning era arrived in full force in just
a short period from the time of writing by Church (2007). Instead of adding the ratio-
nalist flavor, deep learning has been pushing empiricism of NLP to its pinnacle with
big data and big compute, and with conceptually revolutionary ways of representing
a sweeping range of linguistic entities by massive parallelism and distributedness,
thus drastically enhancing the generalization capability of new-generation NLP mod-
els. Only after the sweeping successes of current deep learning methods for NLP
14 L. Deng and Y. Liu

(Sect. 1.4) and subsequent analyses of a series of their limitations, do researchers

look into the next wave of NLP—not swinging back to rationalism while abandon-
ing empiricism but developing more advanced deep learning paradigms that would
organically integrate the missing essence of rationalism into the structured neural
methods that are aimed to approach human cognitive functions for language.

1.6.2 Structure, Memory, and Knowledge

As discussed earlier in this chapter as well as in the current NLP literature (Man-
ning and Socher 2017), NLP researchers at present still have very primitive deep
learning methods for exploiting structure and for building and accessing memories
or knowledge. While LSTM (with attention) has been pervasively applied to NLP
tasks to beat many NLP benchmarks, LSTM is far from a good memory model
for human cognition. In particular, LSTM lacks adequate structure for simulating
episodic memory, and one key component of human cognitive ability is to retrieve
and re-experience aspects of a past novel event or thought. This ability gives rise
to one-shot learning skills and can be crucial in reading comprehension of natural
language text or speech understanding, as well as reasoning over events described by
natural language. Many recent studies have been devoted to better memory model-
ing, including external memory architectures with supervised learning (Vinyals et al.
2016; Kaiser et al. 2017) and augmented memory architectures with reinforcement
learning (Graves et al. 2016; Oh et al. 2016). However, they have not shown general
effectiveness, but have suffered from a number of of limitations including notably
scalability (arising from the use of attention which has to access every stored element
in the memory). Much work remains in the direction of better modeling of memory
and exploitation of knowledge for text understanding and reasoning.

1.6.3 Unsupervised and Generative Deep Learning

Another potential breakthrough in deep learning for NLP is in new algorithms for
unsupervised deep learning, which makes use of ideally no direct teaching signals
paired with inputs (token by token) to guide the learning. Word embedding discussed
in Sect. 1.4 can be viewed as a weak form of unsupervised learning, making use of
adjacent words as “cost-free” surrogate teaching signals, but for real-world NLP pre-
diction tasks, such as translation, understanding, summarization, etc., such embed-
ding obtained in an “unsupervised manner” has to be fed into another supervised
architecture which requires costly teaching signals. In truly unsupervised learning
which requires no expensive teaching signals, new types of objective functions and
new optimization algorithms are needed, e.g., the objective function for unsupervised
learning should not require explicit target label data aligned with the input data as
in cross entropy that is most popular for supervised learning. Development of unsu-
1 A Joint Introduction to Natural Language Processing and to Deep Learning 15

pervised deep learning algorithms has been significantly behind that of supervised
and reinforcement deep learning where backpropagation and Q-learning algorithms
have been reasonably mature.
The most recent preliminary development in unsupervised learning takes the
approach of exploiting sequential output structure and advanced optimization meth-
ods to alleviate the need for using labels in training prediction systems (Russell and
Stefano 2017; Liu et al. 2017). Future advances in unsupervised learning are promis-
ing by exploiting new sources of learning signals including the structure of input data
and the mapping relationships from input to output and vice versa. Exploiting the rela-
tionship from output to input is closely connected to building conditional generative
models. To this end, the recent popular topic in deep learning—generative adversar-
ial networks (Goodfellow et al. 2014)—is a highly promising direction where the
long-standing concept of analysis-by-synthesis in pattern recognition and machine
learning is likely to return to spotlight in the near future in solving NLP tasks in new
ways.
Generative adversarial networks have been formulated as neural nets, with dense
connectivity among nodes and with no probabilistic setting. On the other hand,
probabilistic and Bayesian reasoning, which often takes computational advantage
of sparse connections among “nodes” as random variables, has been one of the
principal theoretical pillars to machine learning and has been responsible for many
NLP methods developed during the empiricist wave of NLP discussed in Sect. 1.3.
What is the right interface between deep learning and probabilistic modeling? Can
probabilistic thinking help understand deep learning techniques better and motivate
new deep learning methods for NLP tasks? How about the other way around? These
issues are widely open for future research.

1.6.4 Multimodal and Multitask Deep Learning

Multimodal and multitask deep learning are related learning paradigms, both con-
cerning the exploitation of latent representations in the deep networks pooled from
different modalities (e.g., audio, speech, video, images, text, source codes, etc.) or
from multiple cross-domain tasks (e.g., point and structured prediction, ranking,
recommendation, time-series forecasting, clustering, etc.). Before the deep learning
wave, multimodal and multitask learning had been very difficult to be made effective,
due to the lack of intermediate representations that share across modalities or tasks.
See a most striking example of this contrast for multitask learning—multilingual
speech recognition during the empiricist wave (Lin et al. 2008) and during the deep
learning wave (Huang et al. 2013a).
Multimodal information can be exploited as low-cost supervision. For instance,
standard speech recognition, image recognition, and text classification methods make
use of supervision labels within each of the speech, image, and text modalities sepa-
rately. This, however, is far from how children learn to recognize speech, image, and
to classify text. For example, children often get the distant “supervision” signal for
Other documents randomly have
different content
by the other boats containing four contented, but thoroughly
mystified, captains.
As soon as they were fairly under way, our three chums stretched
out on the launch's cushioned seats for a nap. They were completely
worn out by the eventful day and night.
At sunrise Charley was awakened by Bill.
"We've been running without a hitch all night," the big fisherman
informed him. "We must be getting near to your creek by now. We
passed Tampa over an hour ago."
Charley stood up and surveyed the shore-line. "I took a landmark
before I left," he said. "There's a great, dead, pine tree standing up
amongst a clump of palmettoes just to the south of the creek. I
believe I can see it ahead there a couple of miles."
At the end of ten minutes, he could make out the big, dead pine
plainly.
He awakened his chums and the three sat tense and impatient
waiting to see if all their hopes and trouble had been in vain.
When within a few hundred yards of the creek, Charley could
stand the suspense no longer.
"Stop the engine," he requested, in a fever of impatience.
Bill threw off the battery switch. The four wondering captains
trailing behind followed his example and the throbbing of the
engines ceased.
The lad stood up and listened intently. His quick ear could just
distinguish a faint, peculiar noise, like the soft smacking of
thousands of lips.
He sank back into his seat with a sigh of relief.
"It's all right," he exclaimed, delightedly. "I can hear them. Run in
close to shore and anchor."
CHAPTER XXXII.
ABOUT MANY THINGS.

As soon as the anchors were dropped all scrambled into the skiffs,
eager to be ashore.
They landed close to the sand spit that barred the creek's
entrance, and a few steps brought them to where they could look in
on the little inland lake. All stood silent for a moment, gazing at the
thousands on thousands of little, open, gasping mouths.
"I expected to see some fish from what you told us but I didn't
expect anything like this," said Bill, drawing a deep breath. "Boy,
there's a pot of money waiting for us in that little pond."
The other fishermen's faces were expressive of amazement and
envy.
"You might have let us in on this," one of them grumbled.
"Would any of you have done it for us if you had found them?"
Charley demanded.
"I wouldn't," the man admitted. "But, all the same, ten dollars a
day looks mighty small with all this money in sight."
"We need every dollar we can make off of this thing," the lad
said, "but we want to be as generous as we can afford to be. We are
going to do better by you than we bargained to do. If you all do
your best to help us put these fish into Clearwater, we will give you
ten per cent on what we make in addition to the ten dollars a day
we promised you."
"That's more than fair," declared Captain Brown. "We will do our
best. All hands had better get to work at once. Those fish are about
all in. I doubt if they will live thirty-six hours longer."
Charley had planned everything on the way down the coast and
he had already arranged each man's part so that the work might be
done with system and despatch. The Roberts and himself were to do
the work with the nets. The fishermen were to do the loading, with
the captain to help them. All of them were to work on one launch at
a time and as soon as it was loaded it was to start for Clearwater
while the next one received its cargo.
To Chris was assigned the job of cooking for all hands, so that no
time would be lost in the preparation of meals.
Charley and the Roberts had taken on themselves the hardest
part of the work, but the four went at their nasty, disagreeable task
with vigor and cheerfulness.
Taking an end of the joined nets, they waded across one end of
the shallow lagoon stringing it out behind them. As soon as they had
gotten the end to the opposite shore, two got to each end and
pulled lustily.
They had been careful to cut off only a small portion of the
lagoon, but even so, they found that the fish between the net and
shore were almost more than they could handle. They had to pull
with all their might to drag in the ladened net, and as they pulled,
they feared each minute that the fine twine would give way under
the tremendous pressure.
But at last they got the net ashore, its meshes full of struggling,
silvery mullet.
Then began the tiring work of getting the fish out of the fine,
tangling twine. As fast as they were taken out they tossed them into
a large box, and as soon as the box was filled, a fisherman carried it
to the waiting skiffs and dumped the load, returning for another.
In two hours the first launch was loaded, and started back for
Clearwater.
Walter, his ankle done up in splints and bandages, and using a
cane for a crutch, limped into the fish house, the day following his
accident, and sought a seat on a pile of old nets in a corner where
he was not likely to be seen by Mr. Daniels. He had not sought the
kindly fish boss yet to tell him of the loss of the launch. He was
deferring the unpleasant task in hopes that his chums would be
successful when the telling would be easier. Besides, he was not
feeling equal to the task of explaining. His foot pained him intensely.
He was also depressed by the doctor's statement that he had
suffered a compound fracture of the ankle and must not try to use
his foot for many days to come. He had but little money in his
pocket and had not dared spend any of it for board and lodging.
Instead he had slept miserably in a skiff pulled up on shore and had
breakfasted off of cheese and crackers. Taking it all in all he did not
feel equal to the unpleasant task of breaking bad news. He had been
drawn to the fish house, however, knowing that there he would be
likely to hear the first news of his absent chums. He was hoping Mr.
Daniels would, not spy him in his secluded corner.
But Mr. Daniels was having troubles of his own. A dull season is
hard on the fishermen but harder still on the fish boss. On the desk
before him was a heap of letters and telegrams from customers
demanding fish. If he could not supply them at once, they would of
course buy elsewhere. Building up a trade is slow work, and if you
cannot supply its wants, it is soon lost. He was worrying through the
mass of mail when the telephone bell rang. He lifted the receiver off
the hook.
"Hello! who's this?" he demanded, curtly.
"It's Captain Brown, Cap," answered a tired voice. "I'm at the
dock. Send down for some fish, will you?"
"How many have you got, twenty pounds?" demanded Mr.
Daniels, sarcastically.
"Call it twenty pounds if you like," drawled the tired voice. "I
calculate, though, that they will come nearer tipping the scales at
ten thousand pounds."
"Good boy," exclaimed the fish boss in delight. "They will help me
out a lot. Where did you catch them."
"I didn't catch them," said the weary tones. "Credit them to the
account of those new guys, 'West, Hazard and so forth.' Good-by,
I've got to go back for another load."
Walter in his secluded corner caught enough of the conversation
to tell him that his chums had succeeded. He forgot his pain and
discouragement. Things took on a rosy tinge. He suddenly
remembered the dime's worth of cheese and crackers, for breakfast,
had only put an edge on his appetite. He stole out of the fish house
and hobbled down the street to a little restaurant where he was
soon seated behind a big, juicy steak and mashed potatoes.
As soon as his hunger was appeased, he hobbled back to the fish
house.
There he remained all the balance of the day and far into the
night for the fish house was the scene of great excitement. One
after the other the launches arrived with their finny cargoes. When
the last one was unloaded the first to arrive was back again with
another load. The house's regular force was unable to handle the
deluge. Men, boys, and even women were hired at fancy prices to
assist. Packing in barrels became impossible. As many as could be
were packed that way but the most were hustled, unpacked, into a
car and heavily iced down.
"For goodness' sake, how many more are coming?" Mr. Daniels
demanded of a midnight arrival.
"Not many," answered the launch captain. "They were making
their last haul when I left. Some of the fishermen followed the first
launch back and are trying to butt into the snap."
"The rascally scoundrels," exclaimed Mr. Daniels, indignantly.
The man grinned wearily. "You needn't worry," he said. "When I
left, Bill Roberts was standing off the gang with a rifle, while the
other fellows got out the fish."
"They must be about tired out by this time," commented the fish
boss.
"Tired!" exclaimed the launch captain. "I am pretty well worn out
myself and we launch men have the easiest part of the job. Those
fellows who are handling the nets are earning every dollar they will
make. Their fingers are worn through both skins handling that fine,
wet twine. Their hands are just bleeding raw, and you know how salt
water and fish slimes smart the smallest cut. They have bent over
the nets so long that they can't straighten up without bringing the
tears to their eyes. I'd like to have the money they will make, but
hanged if I would work that hard for it."
The launch captain had not overstated the case. The little party
on the beach below were very near the limit of human endurance
when the last fish was taken out of the nets. The launch captain had
to assist them to the skiffs and into the launches. Once aboard the
motor boats, they stretched out on the seats and slept the sleep of
utter exhaustion.
Another day had dawned when the fish captain awoke them at
Clearwater.
Walter, radiant of countenance, was waiting on the dock to
welcome them.
It took Charley several minutes to regain his sleep-scattered wits.
"How much did they weigh?" he asked eagerly, as he wrung his
chum's hand in congratulation.
"Just an even hundred and fifty thousand pounds," Walter said.
"Good! at two cents a pound, that's three thousand dollars."
"Better than that," beamed his chum. "Owing to the scarcity of
fish, the market has gone up a cent a pound."
"Four thousand five hundred dollars," cried Charley, in delight.
"Over two thousand dollars to be divided up amongst us four. It's
almost too good to be true."
"And that's not all," added Walter, eagerly. "We are not going to
lose much on the launch, after all. Mr. Daniels says she was insured
for nearly her full value."
"All's well that ends well," Charley commented. "We have not
come out of our fishing venture so badly after all."
"I am afraid we haven't reached the end just yet," said Walter, his
countenance sobering. "I've got something pretty serious to tell you
as soon as we are all alone."
"If it's nothing real pressing, save it a while," said Charley, hastily.
"I want to get some money from Mr. Daniels and pay off the launch
captains. Then, I want a good long sleep with nothing to worry me.
The Roberts have insisted on our staying with them a couple of days
until we get straightened out. We will go over to their camp as soon
as I get the fishermen paid off."
It took but a short time to get the money and pay off the sleepy
launch captains. They were all well-pleased with their share of the
venture. Besides the ten dollars a day, they received four hundred
and fifty dollars to be divided among them.
This business attended to, our little party joined the Roberts in
their launch and the run to camp was quickly made. As soon as it
was reached, the workers turned in for a good, long sleep, and
Walter was left alone with his secret.
CHAPTER XXXIII.
THE SMUGGLERS AGAIN.

While his chums were making up the sleep they had lost, Walter
took the Roberts launch and ran over to Palm Island. Brief as had
been their stay on the little isle, he had grown quite fond of it and
his anger rose as he viewed the work of the wreckers. The vandals
had done their work well. Not a stick remained standing of the
former cozy, little cabin. The wharf, too, was gone, even its posts
had been hacked short off at the surface of the water.
Leaving the scene of the ruin, Walter hobbled slowly over the
little island looking all about with thoughtful interest. At last, he
made his way back to the launch and returned to the Roberts camp.
His companions were awake and stirring about. Chris was busily
engaged in cooking dinner, while the rest were applying salve and
bandages to their sore hands.
Charley greeted his chum with an affectionate smile. "How's the
foot?" he inquired.
"Coming on all right," said Walter, cheerfully. "How about you,
feeling better?"
"Feeling fine and dandy," declared the other, "and I am as hungry
as a wolf. I remember you had some bad news to tell me. Let's hear
it. I feel able to face all kinds of trouble now."
"I don't know as it is exactly bad news for us," said his chum. "In
a way it doesn't concern us at all, unless we want to make it our
business."
"You are getting my curiosity aroused," Charley laughed. "Let's
hear this news of yours."
"The night you all left me in Clearwater, I did not go to a boarding
house to stop. It had cost quite a bit to have my ankle fixed up and I
did not have much money left and I was afraid to spend what little I
had, for I knew, if you fellows were not successful in your trip, there
was going to be mighty hard times ahead. I went out on the dock
and looked around but I didn't quite fancy sleeping there so I went
back uptown and hung around until the stores closed. I was getting
pretty sleepy by this time, so I went down again to the bay and
looked around until I found what I wanted, a skiff pulled up high and
dry on the sand. There were some old nets in the bottom and I
crawled in, stretched out on one of the nets, and pulled the other
one over me, getting my head under a seat to keep out the dew. I
went to sleep as cozy as a bug in a rug. I don't know how long I had
slept when I woke up to the sound of voices. Four men were sitting
on the edge of my skiff talking together. It was too dark to see their
faces but I knew one of the voices. It was Hunter's and you can bet
your life I laid mighty still and listened.
"They were talking about us at first and it made my blood boil to
hear them chuckling over the harm they had done us, but there was
nothing I could do but lay quiet and stand it. They talked about the
cache and wondered where we had hidden the liquor. At last they
came to what, I guess, was the real object of their meeting where
no one could hear them. Having disposed of us, as they thought,
they have arranged to bring in another large lot of aguardiente."
"When?" Charley demanded, eagerly.
"To-night. They expect the schooner at the island at about
midnight. They talked it over and arranged all the details of the job
before they separated."
"To-night at midnight," Charley mused. "We had better go right
over and tell the sheriff."
"That was the first thing I thought of," Walter said. "I was up at
his house by sunrise the next morning but it was no use. His wife
told me he was very ill and could not be seen."
"Queer, he is never around when that smuggling is going on,"
observed Charley, suspiciously. "I wonder if it can be that he is
standing in with the smugglers for a share of the profits."
"Not Sheriff Daley," spoke up Bill Roberts, warmly. "He is as
square a man as ever lived. Queer, though," he added, slowly, "I saw
him just the day before and he looked the picture of health, but
then, it may be appendicitis or some such sudden illness that's
struck him."
"It's too bad," said Captain Westfield. "It leaves those rascals free
to carry out their devilment. Of course, it's none of our business, but
it seems wrong to have such things going on."
"No, of course it is none of our business," Charley agreed,
hesitatingly. "How many of them are there in it, Walt? Did you
hear?"
"Only the four that met," his chum replied. "They were discussing
getting a couple more men to help, but Hunter objected as it would
mean more division of the profits. He said the schooner's crew could
help land the stuff."
"Did he say how many were on the schooner?" Bill Roberts
inquired.
"Four men and a boy," replied Walter.
"Well, as you have all said, I reckon it is none of our business,"
Bill observed.
They sat in thoughtful silence for a few minutes.
"It would be hard on Hunter's wife, if he was caught," Charley
said, finally.
"It would be the best thing that could happen for her," Bill
declared. "She is a good woman. She works like a slave to support
them both. Hunter blows in all the money he makes and lives on her
earnings. He beats her like a dog, too."
"The brute!" Walter exclaimed, hotly.
"Dar's five hundred dollars to be gib to de one what catches de
booze sellers, ain't dey?" Chris inquired. "'Pears like hit would be a
powerful good thing for some one to cotch him an' send all dat
money to dat poor woman."
Captain Westfield looked from one to the other with a sheepish
grin. "Thar isn't any use of our saying it's none of our business," he
said. "Down deep in his heart each one of us knows it is his
business. It's always a man's business to stop wrong-doing."
"Right you are," agreed Bill Roberts, with gruff heartiness. "I
know we are all thinking about the the same things. It isn't so much
that this man and his gang are breaking the law that counts, it's the
misery and suffering which he causes that calls for action. There
have been ten men killed in the fish camps here the past year, and
what caused the killing? Rum, rum brought in and sold by Hunter.
And that isn't all the misery he's caused. Think of the beaten wives
and neglected children. It's time there was a stop put to it."
"Yes," Captain Westfield agreed. "We are as much our brother's
keeper as in the days of Cain."
"I guess we are all pretty well agreed," smiled the practical
Charley. "The question is, how are we going to take them. There are
nine of them and only seven of us. Of course one of them is only a
boy, but then, Walt is pretty well crippled up."
"I'll be right there when the fun begins," his chum said,
determinedly. "What if they are two more in number. We will be well
armed, and surely a surprise counts for something. I went over the
island while you were all sleeping and planned it all out. There is
only one piece of the beach where a boat can land safely. There is a
group of palmettoes close to it. Now what I planned is this. We had
better start out in the launch early and run straight out of the pass
as though we were going out to the reef. Once we get behind the
island, and out of sight of Clearwater, we'll skirt the shore and run
around to the north end. There's a little cove there where the launch
will be hidden from both the gulf and the bay. When dark comes we
can hide in the clump of palmettoes and wait. When they get to
work in earnest, we can slip out and take them by surprise. Then
five of us can keep them quiet with the rifles, while the other two tie
them up. Once we have got them secure, we can load them into the
launch, carry them straight to Tampa and turn them over to the
sheriff there. How does that strike all of you?"
"It sounds simple enough," Charley said, doubtfully, "too simple,
in fact."
"What fault can you find with it?" Walter demanded.
"None," his chum answered, "only I have a hunch that Hunter is
too clever and cunning a rascal to be caught so easily."
"Have you any better plan to suggest?" Walter asked.
But Charley had not, nor did any of the others, so, after some
discussion, Walter's plan was adopted.
As soon as dinner was over, some lunch was packed into a
basket, and storing it and the loading rifles in the launch, they
steered boldly out of the inlet. As soon as the island was between
them and Clearwater, however, they shifted helm, and hugging its
shore, ran down to its northern end.
Here they found the little cove Walter had mentioned. Running
the launch into it, they anchored and waded ashore. They placed
their launch and rifles in the clump of palmettoes, and then there
was nothing to do until the coming of night, except to pass the time
away as best they could. By keeping on the gulf side of the island,
there was no danger of their being seen from Clearwater, and this
they were careful to do. A swim in the clear, warm water and the
picking up of curious shells on the beach served to while away the
balance of the afternoon. As soon as dark came, they retired within
the clump of palms. With the going down of the sun came the rising
of the moon. It was nearly full and its rays lit up the little island
almost as brightly as day. Our little party welcomed its tropical
radiance for it would allow them to see without being seen.
The hours slipped slowly away. At first some attempt was made at
story-telling and conversation, but soon all lapsed into a thoughtful
silence. Each realized that they were about to engage in a desperate
undertaking. In fact, it was almost a foolhardy act they
contemplated. The smugglers had all the advantage in point of
force. They were eight, able-bodied men beside the boy, and it was
more than likely that all of them would be armed. Of their own party,
the three Roberts boys were really the only active men. Charley,
though unusually strong for his age, was only a boy, while the
captain, vigorous though he still was, was getting well along in
years. Walter was practically helpless with his broken ankle, while
Chris was too small to be of much help where strength was required.
But for the advantage that would lie in taking the smugglers by
surprise, they were more likely to be the captured than the captors.
These reflections and the long, expectant waiting were beginning
to tell on their nerves, when they heard the welcome put-put of a
distant launch.
"They are coming, at last," said Charley, with a sigh of relief. "I
can recognize that exhaust. The Hunters launch is the only one that
sounds just like that."
"The schooner must be somewhere near but I don't see her
lights," Walter observed.
"Why, thar she is," exclaimed the captain, "sneaking inshore like a
thief in the night."
CHAPTER XXXIV.
THE SURPRISE.

So silently that they had been unaware of her approach, the

strange craft had stolen in like a phantom ship to within two
hundred yards of where they lay concealed. She now lay directly in
the moon's path and its rays so bright set out every rope and sail in
dark relief. Not a light shone aboard. Her captain had evidently been
made wary by his former alarm and was taking all possible chances
against drawing the attention of others.
As silent as a ghost ship the graceful craft crept in to within a
cable's length of the beach. Then, with a faint creak of traveling
blocks she rounded gracefully up into the wind and a muffled splash
told that her anchor had been dropped.
She made a beautiful sight laying, swan-like, full in the glowing
pathway of the moon, her great white sails quivering in the gentle
breeze.
"The bird is ready to flit away at the first alarm," whispered the
captain. "See, he has got his anchor hove short and has taken in
none of his sails but the jib. He could get under way again in half a
minute. He's wary all right."
"We had better not talk any more," cautioned Charley in a
whisper. "Sound carries a long ways over the water and the launch is
nearly here."
With nerves at highest tension the little party waited.
The loud throbbing of the launch's engine suddenly ceased. There
came a splash from a dropped anchor, and more splashing as its
crew waded ashore. Then came a murmuring of voices and the
sound of footsteps, and the watchers drew further back into their
hiding place as four figures came into view. They passed so close to
the bunch of palms that their features were plain to the hiders. One
was Hunter, himself, the other three they recognized as members of
his gang.
The four hurried down to the water's edge.
"Ahoy," Hunter hailed the schooner. "It's all right. Come ashore."
"Are you sure no one else is around?" cautiously inquired a voice
from the schooner.
The response had been in perfect English but something in the
tones and the faint foreign accent made the chums stare at each
other as though they had heard a voice from the grave.
"No, there's no one here but ourselves," Hunter replied,
impatiently. "Do you think I would be here if everything wasn't all
right? Come, get a move on you, and hustle that stuff ashore.
There's a lot to do, and it ain't many hours till daylight."
Those on the schooner fell to work with feverish haste. A small
dingy carried on deck was launched over the side. Two figures
leaped into it and received the cases, two others brought up from
the hold.
As soon as the dingy was loaded, the two on deck scrambled
aboard and one sculled her into shore.
The moment she grounded, the captain leaped ashore. "Here is
part of our goods," he said smoothly. "We can bring it all in in three
more trips."
"Good," Hunter growled. "Come, unload it. What are you waiting
for?"
"Only for our money, kind sir," said the schooner's captain, in
smooth, suave tones which stirred in the chums old, cruel memories.
"I think it would be best for each boat-load to be paid for as it is
brought in."
"Don't be a fool, man," said Hunter, roughly. "We can settle up
when the job is done. We have got no time to waste, now."
"Pay before unloading," insisted the captain of the schooner,
politely. "Gentlemen in our business cannot be too careful. Of course
I know you are the soul of honesty, but you are forgetful, my good
friend. You have never remembered to pay me for that last lot I
brought you."
"How many cases?" Hunter demanded, with an oath, as he pulled
out a greasy roll of bills.
"Twenty cases, one hundred dollars," said the stranger.
Hunter counted out the bills, and the schooner captain recounted
them carefully and thrust them into his pocket.
"You are still forgetting that little bank account of a hundred
dollars," he remarked, pleasantly. "Surely, now is a splendid time to
settle it."
Hunter's face grew livid with anger, but he controlled his temper
with an effort. He was quick to realize that he could only lose by a
display of anger. The man already had a hundred dollars of his
money, and still remained in possession of the liquor.
The chums in their concealment chuckled inwardly at his plight.
At last the rascally fisherman had met his equal in cunning.
Grudgingly, he counted out another hundred dollars which the
smuggler pocketed with a mocking bow of thanks.
"It's a pleasure to do business with a spot-cash gentleman like
you," he declared. "Now, you may have your liquor, and there's three
more boat-loads, just as good, at a hundred dollars a load."
"You'll have to help us carry it up to the cache," Hunter growled.
"There's too much of it for us four to get out of the way before
daylight."
"Always glad to oblige such a pleasant gentleman," said the
smuggler, swinging a case up on his shoulder. "Many hands make
light work." His companions silently followed his example, each
shouldering a case and the fishermen similarly loaded fell in behind
them.
Hunter and one of his gang brought up the rear. As they came
alongside the clump of palmettoes, Hunter nudged the man ahead.
"Drop behind a bit," he said, softly.
The man slowed his walk.
"That fellow's got too much of our money to get away with it," he
declared in tones too low to reach those ahead.
The man nodded. "We've got to take it from him," he agreed.
"We'd better wait until all the stuff is landed," planned Hunter.
"We'll jump him just as he gets ready to leave and make him shell
out. He can't make any trouble about it. He dasn't make any kick to
the authorities. Tell the rest of the boys when you get a chance."
The whispered conference had taken less than a minute but the
alert smuggler glanced suspiciously back at the two plotters and
they quickened their steps.
"Our work is half done for us if they are going to fight amongst
themselves," exulted Charley, as the procession passed out of
hearing. "We had better wait till the trouble starts and then come
down on them."
"Did you notice that smuggler captain's voice?" asked Captain
Westfield, eagerly.
Walter's eyes were gleaming. "It's Manuel George, the Greek
interpreter," he exclaimed, softly. "The rascal that caused us so much
misery and stole our schooner from us."
"And that's our dear old 'Beauty' lying out there," declared
Charley, a thrill in his voice. "We have got to take her, if we risk our
lives doing it. But here they come back again."
The smugglers were losing no time but working with all possible
rapidity. The first dingy load was quickly transferred to its hiding
place and a second load brought ashore, the smuggler captain
insisting on his pay before a case was unloaded, a third load quickly
followed the second, and just as the morning star began to show in
the east, the fourth and last load was brought ashore.
To the hidden watchers it seemed a century of waiting. With the
coming of the last load, the tension became almost unbearable. A
few minutes now would decide whether or not they were to recover
their dearly loved ship which they had long since given up as lost, to
them, forever.
The fisherman and smuggler captain seemed to be in excellent
spirits as the work progressed. They laughed and joked with each
other, but it seemed to Charley, keenly observant, that their gaiety
was forced. He imagined a sinister note under their high spirits and
the watchful, alert smuggler captain, for all his affected friendliness,
seemed to be watching every movement of the fishermen. All were
working at top speed now to complete the unloading before day, and
the pile of cases in the dingy rapidly diminished.
As the carriers passed back and forth to and from the new cache
they were making, there would be a few minutes each trip when
they were far enough away from the concealed ones for the little
party to hold low, whispered conversation.
"We want to act all together," Charley said, during one of these
intervals. "When I say, 'Now', we will cover them with our rifles and
step out upon them. I am going to wait till the last minute to give
Welcome to our website – the ideal destination for book lovers and
knowledge seekers. With a mission to inspire endlessly, we offer a
vast collection of books, ranging from classic literary works to
specialized publications, self-development books, and children's
literature. Each book is a new journey of discovery, expanding
knowledge and enriching the soul of the reade

Our website is not just a platform for buying books, but a bridge
connecting readers to the timeless values of culture and wisdom. With
an elegant, user-friendly interface and an intelligent search system,
we are committed to providing a quick and convenient shopping
experience. Additionally, our special promotions and home delivery
services ensure that you save time and fully enjoy the joy of reading.