Li Deng · Yang Liu (Editors)

Deep Learning in Natural Language Processing

Editors:
Li Deng, AI Research at Citadel, Chicago, IL, USA and Seattle, WA, USA
Yang Liu, Tsinghua University, Beijing, China

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd., part of Springer Nature. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore.
Foreword
“Written by a group of the most active researchers in the field, led by Dr. Deng, an internationally respected expert in both NLP and deep learning, this book provides a comprehensive introduction to and up-to-date review of the state of the art in applying deep learning to solve fundamental problems in NLP. Further, the book is highly timely, as demands for high-quality and up-to-date textbooks and research references have risen dramatically in response to the tremendous strides in deep learning applications to NLP. The book offers a unique reference guide for practitioners in various sectors, especially the Internet and AI start-ups, where NLP technologies are becoming an essential enabler and a core differentiator.”
Hongjiang Zhang (Founder, Sourcecode Capital; former CEO of KingSoft)
“This book provides a comprehensive introduction to the latest advances in deep learning applied to NLP. Written by experienced and aspiring deep learning and NLP researchers, it covers a broad range of major NLP applications, including spoken language understanding, dialog systems, lexical analysis, parsing, knowledge graph, machine translation, question answering, sentiment analysis, and social computing.
The book is clearly structured and moves from major research trends, to the latest deep learning approaches, to their limitations and promising future work. Given its self-contained content, sophisticated algorithms, and detailed use cases, the book offers a valuable guide for all readers who are working on or learning about deep learning and NLP.”
Haifeng Wang (Vice President and Head of Research, Baidu; former President of ACL)
“In 2011, at the dawn of deep learning in industry, I estimated that in most speech recognition applications, computers still made 5 to 10 times more errors than human subjects, and highlighted the importance of knowledge engineering in future directions. Within only a handful of years since, deep learning has nearly closed the gap in the accuracy of conversational speech recognition between humans and computers. Edited and written by Dr. Li Deng—a pioneer in the recent speech recognition revolution using deep learning—and his colleagues, this book elegantly describes this part of the fascinating history of speech recognition as an important subfield of natural language processing (NLP). Further, the book expands this historical perspective from speech recognition to more general areas of NLP, offering a truly valuable guide for the future development of NLP.
Importantly, the book puts forward a thesis that the current deep learning trend is a revolution from the previous data-driven (shallow) machine learning era, although ostensibly deep learning appears to be merely exploiting more data, more computing power, and more complex models. Indeed, as the book correctly points out, the current state of the art of deep learning technology developed for NLP applications, despite being highly successful in solving individual NLP tasks, has not taken full advantage of rich world knowledge or human cognitive capabilities. Therefore, I fully embrace the view expressed by the book’s editors and authors that more advanced deep learning that seamlessly integrates knowledge engineering will pave the way for the next revolution in NLP.
I highly recommend speech and NLP researchers, engineers, and students to read this outstanding and timely book, not only to learn about the state of the art in NLP and deep learning, but also to gain vital insights into what the future of the NLP field will hold.”
Sadaoki Furui (President, Toyota Technological Institute at Chicago)
Preface
were active participants and were taking leading roles. We thank our senior editor at Springer, Dr. Celine Lanlan Chang, who kindly invited us to create this book and who has provided much timely assistance needed to complete it. We are also grateful to Springer’s Assistant Editor, Jane Li, for offering invaluable help through various stages of manuscript preparation.
We thank all authors of Chaps. 2–10, who devoted their valuable time to carefully preparing the content of their chapters: Gokhan Tur, Asli Celikyilmaz, Dilek Hakkani-Tur, Wanxiang Che, Yue Zhang, Xianpei Han, Zhiyuan Liu, Jiajun Zhang, Kang Liu, Yansong Feng, Duyu Tang, Meishan Zhang, Xin Zhao, Chenliang Li, and Xiaodong He. The authors of Chaps. 4–9 are CCL 2016 tutorial speakers. They spent a considerable amount of time updating their tutorial material with the latest advances in the field since October 2016.
Further, we thank the numerous reviewers and readers, Sadaoki Furui, Andrew Ng, Fred Juang, Ken Church, Haifeng Wang, and Hongjiang Zhang, who not only gave us much-needed encouragement but also offered many constructive comments that substantially improved earlier drafts of the book.
Finally, we express our appreciation to our organizations, Microsoft Research and Citadel (for Li Deng) and Tsinghua University (for Yang Liu), which provided the excellent environments, support, and encouragement that have been instrumental in completing this book. Yang Liu is also supported by the National Natural Science Foundation of China (Nos. 61522204, 61432013, and 61331013).
Contributors
Acronyms
AI Artificial intelligence
AP Averaged perceptron
ASR Automatic speech recognition
ATN Augmented transition network
BiLSTM Bidirectional long short-term memory
BiRNN Bidirectional recurrent neural network
BLEU Bilingual evaluation understudy
BOW Bag-of-words
CBOW Continuous bag-of-words
CCA Canonical correlation analysis
CCG Combinatory categorial grammar
CDL Collaborative deep learning
CFG Context-free grammar
CYK Cocke–Younger–Kasami
CLU Conversational language understanding
CNN Convolutional neural network
CNNSM Convolutional neural network based semantic model
cQA Community question answering
CRF Conditional random field
CTR Collaborative topic regression
CVT Compound value typed
DA Denoising autoencoder
DBN Deep belief network
DCN Deep convex net
DNN Deep neural network
DSSM Deep structured semantic model
DST Dialog state tracking
EL Entity linking
EM Expectation maximization
FSM Finite state machine
1 A Joint Introduction to Natural Language Processing and to Deep Learning

Abstract In this chapter, we set up the fundamental framework for the book. We first provide an introduction to the basics of natural language processing (NLP) as an integral part of artificial intelligence. We then survey the historical development of NLP, spanning over five decades, in terms of three waves. The first two waves arose as rationalism and empiricism, paving the way for the current deep learning wave. The key pillars underlying the deep learning revolution for NLP consist of (1) distributed representations of linguistic entities via embedding, (2) semantic generalization due to the embedding, (3) long-span deep sequence modeling of natural language, (4) hierarchical networks effective for representing linguistic levels from low to high, and (5) end-to-end deep learning methods to jointly solve many NLP tasks. After the survey, several key limitations of current deep learning technology for NLP are analyzed. This analysis leads to five research directions for future advances in NLP.
L. Deng, Citadel, Seattle & Chicago, USA. E-mail: l.deng@ieee.org
Y. Liu, Tsinghua University, Beijing, China. E-mail: liuyang2011@tsinghua.edu.cn
sentiment analysis, social computing, natural language generation, and natural language summarization. These NLP application areas form the core content of this book.
Natural language is a system constructed specifically to convey meaning or semantics, and it is by its fundamental nature a symbolic or discrete system. The surface or observable “physical” signal of natural language is called text, always in a symbolic form. The text “signal” has its counterpart—the speech signal; the latter can be regarded as the continuous correspondence of symbolic text, with both entailing the same latent linguistic hierarchy of natural language. From the perspectives of NLP and signal processing, speech can be treated as a “noisy” version of text, imposing the additional difficulty of “de-noising” when performing the task of understanding the common underlying semantics. Chapters 2 and 3, as well as the current Chap. 1 of this book, cover the speech aspect of NLP in detail, while the remaining chapters start directly from text in discussing a wide variety of text-oriented tasks that exemplify the pervasive NLP applications enabled by machine learning techniques, notably deep learning.
The symbolic nature of natural language stands in stark contrast to the continuous nature of language’s neural substrate in the human brain. We defer this discussion to Sect. 1.6 of this chapter, where future challenges of deep learning in NLP are discussed. A related contrast is how the symbols of natural language are encoded in several continuous-valued modalities, such as gesture (as in sign language), handwriting (as an image), and, of course, speech. On the one hand, the word as a symbol is used as a “signifier” to refer to a concept or a thing in the real world as a “signified” object, necessarily a categorical entity. On the other hand, the continuous modalities that encode symbols of words constitute the external signals sensed by the human perceptual system and transmitted to the brain, which in turn operates in a continuous fashion. While of great theoretical interest, the subject of contrasting the symbolic nature of language with its continuous rendering and encoding goes beyond the scope of this book.
In the next few sections, we outline and discuss, from a historical perspective, the
development of general methodology used to study NLP as a rich interdisciplinary
field. Much like several closely related sub- and super-fields such as conversational
systems, speech recognition, and artificial intelligence, the development of NLP can
be described in terms of three major waves (Deng 2017; Pereira 2017), each of which
is elaborated in a separate section next.
NLP research in its first wave lasted a long time, dating back to the 1950s. In 1950, Alan Turing proposed the Turing test to evaluate a computer’s ability to exhibit intelligent behavior indistinguishable from that of a human (Turing 1950). This test is based on natural language conversations between a human and a computer designed to generate human-like responses. In 1954, the Georgetown-IBM experiment demonstrated the first machine translation system, capable of translating more than 60 Russian sentences into English.
The approaches, based on the belief that knowledge of language in the human mind is fixed in advance by genetic inheritance, dominated most NLP research between about 1960 and the late 1980s. These approaches have been called rationalist ones (Church 2007). The dominance of rationalist approaches in NLP was mainly due to the widespread acceptance of Noam Chomsky’s arguments for an innate language structure and his criticism of N-grams (Chomsky 1957). Postulating that key parts of language are hardwired in the brain at birth as a part of the human genetic inheritance, rationalist approaches endeavored to design hand-crafted rules to incorporate knowledge and reasoning mechanisms into intelligent NLP systems. Up until the 1980s, the most notable successful NLP systems, such as ELIZA for simulating a Rogerian psychotherapist and MARGIE for structuring real-world information into concept ontologies, were based on complex sets of handwritten rules.
This period coincided approximately with the early development of artificial intelligence, characterized by expert knowledge engineering, in which domain experts devised computer programs according to their knowledge of the (very narrow) application domains (Nilsson 1982; Winston 1993). The experts designed these programs using symbolic logical rules based on careful representations and engineering of such knowledge. These knowledge-based artificial intelligence systems tended to be effective in solving narrow-domain problems by examining the “head,” or most important, parameters and reaching a solution about the appropriate action to take in each specific situation. These “head” parameters are identified in advance by human experts, leaving the “tail” parameters and cases untouched. Since such systems lack learning capability, they have difficulty generalizing their solutions to new situations and domains. The typical approach during this period is exemplified by the expert system, a computer system that emulates the decision-making ability of a human expert. Such systems are designed to solve complex problems by reasoning about knowledge (Nilsson 1982). The first expert system was created in the 1970s, and expert systems then proliferated in the 1980s. The main “algorithm” used was the inference rule in the form of “if-then-else” (Jackson 1998). The main strength of these first-generation artificial intelligence systems was their transparency and interpretability in their (limited) capability of performing logical reasoning. Like NLP systems such as ELIZA and MARGIE, the general expert systems of the early days used hand-crafted expert knowledge, which was often effective in narrowly defined problems, although the reasoning could not handle the uncertainty that is ubiquitous in practical applications.
In the specific NLP application areas of dialogue systems and spoken language understanding, to be described in more detail in Chaps. 2 and 3 of this book, such rationalistic approaches were represented by the pervasive use of symbolic rules and templates (Seneff et al. 1991). The designs were centered on grammatical and ontological constructs, which, while interpretable and easy to debug and update, experienced severe difficulties in practical deployment. When such systems worked, they often worked beautifully; but unfortunately this did not happen very often, and the domains were necessarily limited.
The second wave of NLP was characterized by the exploitation of data corpora and of (shallow) machine learning, statistical or otherwise, to make use of such data (Manning and Schütze 1999). As much of the structure of and theory about natural language were discounted or discarded in favor of data-driven methods, the main approaches developed during this era have been called empirical or pragmatic ones (Church and Mercer 1993; Church 2014). With the increasing availability of machine-readable data and the steady increase of computational power, empirical approaches have dominated NLP since around 1990. One of the major NLP conferences was even named “Empirical Methods in Natural Language Processing” (EMNLP) to reflect most directly the strongly positive sentiment of NLP researchers during that era toward empirical approaches.
In contrast to rationalist approaches, empirical approaches assume that the human mind begins only with general operations for association, pattern recognition, and generalization. Rich sensory input is then required for the mind to learn the detailed structure of natural language. Prevalent in linguistics between 1920 and 1960, empiricism has been undergoing a resurgence since 1990. Early empirical approaches to NLP focused on developing generative models such as the hidden Markov model (HMM) (Baum and Petrie 1966), the IBM translation models (Brown et al. 1993), and the head-driven parsing models (Collins 1997) to discover the regularities of languages from large corpora. Since the late 1990s, discriminative models have become the de facto approach in a variety of NLP tasks. Representative discriminative models and methods in NLP include the maximum entropy model (Ratnaparkhi 1997), support vector machines (Vapnik 1998), conditional random fields (Lafferty et al. 2001), maximum mutual information and minimum classification error (He et al. 2008), and the perceptron (Collins 2002).
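To make the flavor of these second-wave discriminative methods concrete, the following is a minimal sketch, in Python, of an averaged perceptron (in the style of Collins 2002) for word tagging with hand-crafted features. It is illustrative only: the feature templates, toy data, and greedy decoding are our own simplifying assumptions, not a method described in this book.

```python
from collections import defaultdict

def features(words, i, prev_tag):
    # Hand-crafted feature templates (illustrative assumptions): exactly
    # the kind of manual feature engineering deep learning later replaced.
    w = words[i]
    return [f"w={w.lower()}", f"suf3={w[-3:]}", f"prev={prev_tag}",
            f"cap={w[0].isupper()}"]

def greedy_tag(words, weights, tags):
    out, prev = [], "<s>"
    for i in range(len(words)):
        feats = features(words, i, prev)
        best = max(tags, key=lambda t: sum(weights[(f, t)] for f in feats))
        out.append(best)
        prev = best
    return out

def train(data, tags, epochs=10):
    weights = defaultdict(float)   # current weights
    totals = defaultdict(float)    # time-weighted updates, for averaging
    step = 1
    for _ in range(epochs):
        for words, gold in data:
            pred = greedy_tag(words, weights, tags)
            prev_g = prev_p = "<s>"
            for i, (g, p) in enumerate(zip(gold, pred)):
                if g != p:  # perceptron: update only on errors
                    for f in features(words, i, prev_g):
                        weights[(f, g)] += 1.0
                        totals[(f, g)] += step
                    for f in features(words, i, prev_p):
                        weights[(f, p)] -= 1.0
                        totals[(f, p)] -= step
                prev_g, prev_p = g, p
            step += 1
    # Averaging: discount late, noisy updates (the averaged-perceptron trick).
    return {k: w - totals[k] / step for k, w in weights.items()}

# Toy usage with two hand-labeled sentences (hypothetical data).
data = [("the cat sat".split(), ["DET", "NOUN", "VERB"]),
        ("a dog ran".split(), ["DET", "NOUN", "VERB"])]
model = train(data, tags=["DET", "NOUN", "VERB"])
print(greedy_tag("the dog sat".split(), defaultdict(float, model),
                 ["DET", "NOUN", "VERB"]))
```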
Again, this era of empiricism in NLP was paralleled by corresponding approaches in artificial intelligence as well as in speech recognition and computer vision. It came about after clear evidence that learning and perception capabilities are crucial for complex artificial intelligence systems but were missing in the expert systems popular in the previous wave. For example, when DARPA opened its first Grand Challenge for autonomous driving, most vehicles then relied on the knowledge-based artificial intelligence paradigm. Much like speech recognition and NLP, the autonomous driving and
understanding and dialogue systems; for a review, see He and Deng (2013). More specifically, for the dialogue policy component of dialogue systems, powerful reinforcement learning based on Markov decision processes was introduced during this era; for a review, see Young et al. (2013). And for spoken language understanding, the dominant methods moved from the rule- or template-based ones of the first wave to generative models such as hidden Markov models (HMMs) (Wang et al. 2011), and then to discriminative models such as conditional random fields (Tur and Deng 2011).
Similarly, in speech recognition, over close to 30 years from the early 1980s to around 2010, the field was dominated by the (shallow) machine learning paradigm using the statistical generative model based on the HMM integrated with Gaussian mixture models, along with various versions of its generalization (Baker et al. 2009a, b; Deng and O’Shaughnessy 2003; Rabiner and Juang 1993). Among the many versions of the generalized HMMs were statistical and neural-network-based hidden dynamic models (Deng 1998; Bridle et al. 1998; Deng and Yu 2007). The former adopted EM and switching extended Kalman filter algorithms for learning the model parameters (Ma and Deng 2004; Lee et al. 2004), and the latter used backpropagation (Picone et al. 1999). Both made extensive use of multiple latent layers of representations for the generative process of speech waveforms, following the long-standing framework of analysis-by-synthesis in human speech perception. More significantly, inverting this “deep” generative process into its counterpart, an end-to-end discriminative process, gave rise to the first industrial success of deep learning (Deng et al. 2010, 2013; Hinton et al. 2012), which formed a driving force of the third wave of speech recognition and NLP, elaborated next.
While the NLP systems developed during the second wave, including those for speech recognition, language understanding, and machine translation, performed much better and with higher robustness than those of the first wave, they were far from human-level performance and left much to be desired. With a few exceptions, the (shallow) machine learning models for NLP often did not have sufficiently large capacity to absorb large amounts of training data. Further, the learning algorithms, methods, and infrastructures were not powerful enough. All this changed several years ago, giving rise to the third wave of NLP, propelled by the new paradigm of deep-structured machine learning, or deep learning (Bengio 2009; Deng and Yu 2014; LeCun et al. 2015; Goodfellow et al. 2016).
In traditional machine learning, features are designed by humans, and feature engineering is a bottleneck, requiring significant human expertise. Concurrently, the associated shallow models lack the representation power, and hence the ability, to form levels of decomposable abstractions that would automatically disentangle the complex factors shaping the observed language data. Deep learning breaks away from the above difficulties through the use of deep, layered model structures, often in the form of neural networks, and the associated end-to-end learning algorithms. The advances in
deep learning are one major driving force behind the current inflection point in NLP and more general artificial intelligence, and they are responsible for the resurgence of neural networks with a wide range of practical applications, including business applications (Parloff 2016).
More specifically, despite the success of (shallow) discriminative models in a number of important NLP tasks developed during the second wave, they suffered from the difficulty of covering all regularities in languages through features designed manually with domain expertise. Besides this incompleteness problem, such shallow models also face the sparsity problem, as many features, especially sparse high-order ones, occur only once in the training data. Therefore, feature design became one of the major obstacles in statistical NLP before deep learning came to the rescue. Deep learning brings hope for addressing the human feature engineering problem, with a view called “NLP from scratch” (Collobert et al. 2011), which in the early days of deep learning was considered highly unconventional. Such deep learning approaches exploit powerful neural networks that contain multiple hidden layers to solve general machine learning tasks while dispensing with feature engineering. Unlike shallow neural networks and related machine learning models, deep neural networks are capable of learning representations from data using a cascade of multiple layers of nonlinear processing units for feature extraction. As higher level features are derived from lower level features, these levels form a hierarchy of concepts.
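As a concrete, deliberately tiny illustration of such a cascade of nonlinear processing units, the following sketch builds a two-hidden-layer forward pass in NumPy. All dimensions and the toy classification task are our own illustrative assumptions; the point is only that each layer re-represents the output of the layer below, so higher layers can encode more abstract features of the input.

```python
# A minimal sketch (illustrative, not from the book) of a cascade of
# nonlinear layers: each layer re-represents the previous layer's output.
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w, b):
    return np.maximum(0.0, x @ w + b)   # ReLU nonlinearity

# Toy dimensions: a 50-dim input (e.g., an embedded word window)
# mapped through two hidden layers to a 3-way classification.
w1, b1 = rng.normal(0, 0.1, (50, 64)), np.zeros(64)
w2, b2 = rng.normal(0, 0.1, (64, 32)), np.zeros(32)
w3, b3 = rng.normal(0, 0.1, (32, 3)), np.zeros(3)

x = rng.normal(size=(1, 50))            # one input example
h1 = layer(x, w1, b1)                   # low-level features
h2 = layer(h1, w2, b2)                  # higher-level features built on h1
logits = h2 @ w3 + b3                   # task-specific output layer
probs = np.exp(logits) / np.exp(logits).sum()
print(probs)
```

A real NLP system would, of course, train these weights end to end with backpropagation rather than leave them random; the cascade structure is what this sketch is meant to show.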
Deep learning originated from artificial neural networks, which can be viewed as cascading models of cell types inspired by biological neural systems. With the advent of the backpropagation algorithm (Rumelhart et al. 1986), training deep neural networks from scratch attracted intensive attention in the 1990s. In those early days, without large amounts of training data and without proper design and learning methods, the learning signals during neural network training vanish exponentially with the number of layers (or, more rigorously, the depth of credit assignment) when propagated from layer to layer, making it difficult to tune the connection weights of deep neural networks, especially the recurrent versions. Hinton et al. (2006) initially overcame this problem by using unsupervised pretraining to first learn generally useful feature detectors. Then, the network is further trained by supervised learning to classify labeled data. As a result, it is possible to learn the distribution of a high-level representation using low-level representations. This seminal work marked the revival of neural networks. A variety of network architectures have since been proposed and developed, including deep belief networks (Hinton et al. 2006), stacked autoencoders (Vincent et al. 2010), deep Boltzmann machines (Hinton and Salakhutdinov 2012), deep convolutional neural networks (Krizhevsky et al. 2012), deep stacking networks (Deng et al. 2012), and deep Q-networks (Mnih et al. 2015). Capable of discovering intricate structures in high-dimensional data, deep learning has since 2010 been successfully applied to real-world tasks in artificial intelligence, notably speech recognition (Yu et al. 2010; Hinton et al. 2012), image classification (Krizhevsky et al. 2012; He et al. 2016), and NLP (all chapters of this book). Detailed analyses and reviews of deep learning have been provided in a set of tutorial survey articles (Deng 2014; LeCun et al. 2015; Juang 2016).
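The vanishing-gradient problem mentioned above is easy to reproduce numerically. The sketch below (our own illustration, with arbitrary depth and width) pushes a gradient vector backward through the transposed Jacobians of a stack of randomly initialized sigmoid layers; because the sigmoid derivative never exceeds 0.25, the signal shrinks roughly geometrically with depth.

```python
# Numerical illustration (an assumption-laden sketch, not from the book)
# of vanishing gradients in deep sigmoid networks.
import numpy as np

rng = np.random.default_rng(1)
depth, width = 30, 50

x = rng.normal(size=(width,))
grad = np.ones(width)                      # gradient arriving at the top layer
for _ in range(depth):
    w = rng.normal(0, 1.0 / np.sqrt(width), (width, width))
    a = w @ x
    s = 1.0 / (1.0 + np.exp(-a))           # sigmoid activation
    x = s
    # Backprop through one layer: the sigmoid derivative s*(1-s) <= 0.25
    # everywhere, so each layer shrinks the gradient signal.
    grad = w.T @ (grad * s * (1.0 - s))

print("gradient norm after", depth, "layers:", np.linalg.norm(grad))
# Typically a vanishingly small number: this is why 1990s-era training of
# very deep (especially recurrent) networks stalled until pretraining and
# better designs arrived, as described above.
```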
As speech recognition is one of the core tasks in NLP, we briefly discuss it here due to its importance as the first industrial NLP application in the real world to be strongly impacted by deep learning. Industrial applications of deep learning to large-scale speech recognition started to take off around 2010. The endeavor was initiated by a collaboration between academia and industry, with the original work presented at the 2009 NIPS Workshop on Deep Learning for Speech Recognition and Related Applications. The workshop was motivated by the limitations of deep generative models of speech, and by the possibility that the big-compute, big-data era warranted a serious exploration of deep neural networks. It was believed then that pretraining DNNs using generative models of deep belief nets based on the contrastive divergence learning algorithm would overcome the main difficulties of neural nets encountered in the 1990s (Dahl et al. 2011; Mohamed et al. 2009). However, early into this research at Microsoft, it was discovered that without contrastive divergence pretraining, but with the use of large amounts of training data together with deep neural networks designed with correspondingly large, context-dependent output layers and with careful engineering, dramatically lower recognition errors could be obtained than with the then-state-of-the-art (shallow) machine learning systems (Yu et al. 2010, 2011; Dahl et al. 2012). This finding was quickly verified by several other major speech recognition research groups in North America (Hinton et al. 2012; Deng et al. 2013) and subsequently overseas. Further, the nature of the recognition errors produced by the two types of systems was found to be characteristically different, offering technical insights into how to integrate deep learning into the existing, highly efficient run-time speech decoding systems deployed by the major players in the speech recognition industry (Yu and Deng 2015; Abdel-Hamid et al. 2014; Xiong et al. 2016; Saon et al. 2017). Nowadays, the backpropagation algorithm applied to deep neural nets of various forms is uniformly used in all state-of-the-art speech recognition systems (Yu and Deng 2015; Amodei et al. 2016; Saon et al. 2017), and all major commercial speech recognition systems—Microsoft Cortana, Xbox, Skype Translator, Amazon Alexa, Google Assistant, Apple Siri, Baidu and iFlyTek voice search, and more—are based on deep learning methods.
The striking success of speech recognition in 2010–2011 heralded the arrival of the third wave of NLP and artificial intelligence. Quickly following this success of deep learning in speech recognition, computer vision (Krizhevsky et al. 2012) and machine translation (Bahdanau et al. 2015) were taken over by a similar deep learning paradigm. In particular, while the powerful technique of neural embedding of words was developed as early as 2001 (Bengio et al. 2001), it was not until more than ten years later that it was shown to be useful at a large and practical scale (Mikolov et al. 2013), due to the availability of big data and faster computation.
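To make the idea of neural word embedding concrete, here is a compact sketch of skip-gram training with negative sampling, in the spirit of Mikolov et al. (2013). The corpus, dimensionality, and hyperparameters are toy-scale assumptions chosen purely for illustration; production word2vec adds frequent-word subsampling, a unigram-power noise distribution, and vastly larger data.

```python
# Skip-gram with negative sampling: a toy-scale sketch (illustrative
# assumptions only, not the book's code).
import numpy as np

corpus = ("the cat sat on the mat the dog sat on the rug "
          "the cat chased the dog").split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 16

rng = np.random.default_rng(0)
W_in = rng.normal(0, 0.1, (V, D))    # embeddings of center words
W_out = rng.normal(0, 0.1, (V, D))   # embeddings of context words

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr, window, k = 0.05, 2, 5
for _ in range(300):                 # passes over the toy corpus
    for pos, word in enumerate(corpus):
        t = idx[word]
        for cpos in range(max(0, pos - window),
                          min(len(corpus), pos + window + 1)):
            if cpos == pos:
                continue
            # one observed (positive) pair plus k sampled negatives
            pairs = [(idx[corpus[cpos]], 1.0)]
            pairs += [(int(rng.integers(V)), 0.0) for _ in range(k)]
            for o, label in pairs:
                g = sigmoid(W_in[t] @ W_out[o]) - label  # logistic-loss grad
                grad_in = g * W_out[o].copy()
                W_out[o] -= lr * g * W_in[t]
                W_in[t] -= lr * grad_in

def neighbors(w, n=3):
    """Nearest words by cosine similarity in the learned embedding space."""
    v = W_in[idx[w]]
    sims = (W_in @ v) / (np.linalg.norm(W_in, axis=1) * np.linalg.norm(v) + 1e-9)
    order = np.argsort(-sims)
    return [(vocab[i], round(float(sims[i]), 3)) for i in order if vocab[i] != w][:n]

print(neighbors("cat"))  # distributionally similar words cluster together
```

The key property, emphasized throughout this chapter, is that similarity becomes graded: words that appear in similar contexts end up near one another in the vector space, which one-hot representations cannot express.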
In addition, a large number of other real-world NLP applications, such as image captioning (Karpathy and Fei-Fei 2015; Fang et al. 2015; Gan et al. 2017), visual question answering (Fei-Fei and Perona 2016), speech understanding (Mesnil et al. 2013), web search (Huang et al. 2013b), and recommendation systems, have been made successful by deep learning, in addition to many non-NLP tasks, including drug discovery and toxicology, customer relationship management, recommendation systems, gesture recognition, medical informatics, advertisement, medical image analysis, robotics, self-driving vehicles, and board and eSports games (e.g., Atari, Go, Poker, and, most recently, DOTA2). For more details, see https://en.wikipedia.org/wiki/deep_learning.
Among the more specific text-based NLP application areas, machine translation is perhaps the one most impacted by deep learning. Advancing from the shallow statistical machine translation developed during the second wave of NLP, the current best machine translation systems in real-world applications are based on deep neural networks. For example, Google announced the first stage of its move to neural machine translation in September 2016, and Microsoft made a similar announcement two months later. Facebook had been working on the conversion to neural machine translation for about a year, and by August 2017 it was at full deployment. Details of the deep learning techniques in these state-of-the-art large-scale machine translation systems will be reviewed in Chap. 6.
In the area of spoken language understanding and dialogue systems, deep learning is also making a huge impact. The currently popular techniques maintain and expand the statistical methods developed during the second-wave era in several ways. Like the empirical, (shallow) machine learning methods, deep learning also relies on data-intensive methods to reduce the cost of hand-crafting complex understanding and dialogue management systems, to be robust against speech recognition errors in noisy environments and against language understanding errors, and to exploit the power of Markov decision processes and reinforcement learning for designing dialogue policies (e.g., Gasic et al. 2017; Dhingra et al. 2017). Compared with the earlier methods, deep neural network models and representations are much more powerful, and they make end-to-end learning possible. However, deep learning has not yet solved the problems of interpretability and domain scalability associated with the earlier empirical techniques. Details of the deep learning techniques popular in current spoken language understanding and dialogue systems, as well as their challenges, will be reviewed in Chaps. 2 and 3.
Two important recent technological breakthroughs in applying deep learning to NLP problems are sequence-to-sequence learning (Sutskever et al. 2014) and attention modeling (Bahdanau et al. 2015). Sequence-to-sequence learning introduced the powerful idea of using recurrent nets to carry out both encoding and decoding in an end-to-end manner. While attention modeling was initially developed to overcome the difficulty of encoding a long sequence, subsequent developments significantly extended its power to provide highly flexible alignment of two arbitrary sequences that can be learned together with the neural network parameters. The key concepts of sequence-to-sequence learning and of the attention mechanism boosted the performance of neural machine translation, based on distributed word embeddings, over the best systems based on statistical learning and local representations of words and phrases. Soon after this success, these concepts were also applied successfully to a number of other NLP-related tasks, such as image captioning (Karpathy and Fei-Fei 2015; Devlin et al. 2015), speech recognition (Chorowski et al. 2015), meta-learning for program execution, one-shot learning, syntactic parsing, lip reading, text understanding, summarization, question answering, and more.
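The core computation behind the attention mechanism just described is small enough to show in full. The sketch below is our own minimal illustration, using simple dot-product scores rather than the learned additive scoring of Bahdanau et al. (2015): a decoder state softly selects a weighted summary of the encoder states, and the weights constitute the learned "alignment" between the two sequences.

```python
# Minimal dot-product attention sketch (illustrative assumptions only).
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attend(decoder_state, encoder_states):
    """decoder_state: (d,); encoder_states: (T, d) for a length-T source."""
    scores = encoder_states @ decoder_state   # (T,) alignment scores
    weights = softmax(scores)                 # soft alignment over the source
    context = weights @ encoder_states        # (d,) weighted summary vector
    return context, weights

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 8))    # six encoder states, dimension 8
s = rng.normal(size=(8,))      # current decoder state
context, weights = attend(s, H)
print(np.round(weights, 3))    # which source positions the decoder attends to
```

Because the whole computation is differentiable, the alignment is learned jointly with the rest of the network rather than fixed in advance, which is the flexibility the paragraph above highlights.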
Before analyzing the future directions of NLP with more advanced deep learning, we first summarize the significance of the transition from the past waves of NLP to the present one. We then discuss some clear limitations and challenges of present deep learning technology for NLP, to pave the way for examining further developments that may overcome these limitations and drive the next wave of innovations.
On the surface, the rising wave of deep learning discussed in Sect. 1.4 of this chapter appears to be a simple push of the second, empiricist wave of NLP (Sect. 1.3) to an extreme, with bigger data, larger models, and greater computing power. After all, the fundamental approaches developed during both waves are data-driven, are based on machine learning and computation, and have dispensed with human-centric “rationalistic” rules that are often brittle and costly to acquire in practical NLP applications. However, if we analyze these approaches holistically and at a deeper level, we can identify aspects of a conceptual revolution in the move from empiricist machine learning to deep learning, and can subsequently analyze the future directions of the field (Sect. 1.6). This revolution, in our opinion, is no less significant than the revolution from the earlier rationalist wave to the empiricist one, as analyzed at the beginning (Church and Mercer 1993) and at the end of the empiricist era (Charniak 2011).
Empiricist machine learning and linguistic data analysis during the second NLP wave started in the early 1990s with cryptanalysts and computer scientists working on natural language sources that were highly limited in vocabulary and application domains. As discussed in Sect. 1.3, surface-level text observations, i.e., words and their sequences, are counted using discrete probabilistic models without relying on deep structure in natural language. The basic representations were “one-hot,” or localist, in which no semantic similarity between words was exploited. With restrictions in domains and associated text content, such structure-free representations and empirical models are often sufficient to cover much of what needs to be covered. That is, shallow, count-based statistical models can naturally do well in limited and specific NLP tasks. But when the domain and content restrictions are lifted for more realistic NLP applications in the real world, count-based models necessarily become ineffective, no matter how many smoothing tricks are invented in an attempt to mitigate the problem of combinatorial counting sparseness. This is where deep learning for NLP truly shines: distributed representations of words via embedding, semantic generalization due to the embedding, longer-span deep sequence modeling, and end-to-end learning methods have all contributed to beating empiricist, count-based methods in a wide range of NLP tasks, as discussed in Sect. 1.4.
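The sparseness problem of count-based models is easy to see in miniature. The following sketch (an illustration with hypothetical toy data, not an example from the book) builds an add-one (Laplace) smoothed bigram model: every unseen word pair falls back to the same uniform floor probability, with no way to express that some unseen pairs are far more plausible than others.

```python
# Add-one smoothed bigram model on a hypothetical toy corpus; a real
# second-wave system would use millions of words and more elaborate
# smoothing (Katz, Kneser-Ney, ...).
from collections import Counter

corpus = "the cat sat on the mat and the dog sat on the rug".split()
vocab = set(corpus)
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_laplace(w2, w1):
    """Add-one smoothed estimate of P(w2 | w1)."""
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + len(vocab))

print(p_laplace("sat", "cat"))  # seen bigram: boosted by its count
print(p_laplace("mat", "dog"))  # unseen bigram: generic smoothing floor
print(p_laplace("and", "dog"))  # also unseen: identical floor; the model
                                # cannot prefer one unseen pair over another
```

An embedding-based model, by contrast, scores unseen pairs through the geometry of the learned vector space, so contexts observed for "cat" generalize to "dog"; this is precisely the semantic generalization credited to embeddings above.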
Despite the spectacular successes of deep learning in NLP tasks, most notably in speech recognition and understanding, language modeling, and machine translation, there remain huge challenges. The current deep learning methods, based on neural networks used as black boxes, generally lack interpretability, and are even further away from explainability, in contrast to the “rationalist” paradigm established during the first wave.