Li Deng · Yang Liu (Editors)

Deep Learning in Natural Language Processing
Editors

Li Deng
AI Research at Citadel
Chicago, IL, USA
and
AI Research at Citadel
Seattle, WA, USA

Yang Liu
Tsinghua University
Beijing, China

ISBN 978-981-10-5208-8
ISBN 978-981-10-5209-5 (eBook)
https://doi.org/10.1007/978-981-10-5209-5
Library of Congress Control Number: 2018934459

© Springer Nature Singapore Pte Ltd. 2018


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made. The publisher remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
part of Springer Nature
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Foreword

“Written by a group of the most active researchers in the field, led by Dr. Deng, an
internationally respected expert in both NLP and deep learning, this book provides
a comprehensive introduction to and up-to-date review of the state of the art in applying
deep learning to solve fundamental problems in NLP. Further, the book is highly
timely, as demands for high-quality and up-to-date textbooks and research refer-
ences have risen dramatically in response to the tremendous strides in deep learning
applications to NLP. The book offers a unique reference guide for practitioners in
various sectors, especially the Internet and AI start-ups, where NLP technologies
are becoming an essential enabler and a core differentiator.”
Hongjiang Zhang (Founder, Sourcecode Capital; former CEO of KingSoft)
“This book provides a comprehensive introduction to the latest advances in deep
learning applied to NLP. Written by experienced and aspiring deep learning and
NLP researchers, it covers a broad range of major NLP applications, including
spoken language understanding, dialog systems, lexical analysis, parsing, knowl-
edge graph, machine translation, question answering, sentiment analysis, and social
computing.
The book is clearly structured and moves from major research trends, to the
latest deep learning approaches, to their limitations and promising future work.
Given its self-contained content, sophisticated algorithms, and detailed use cases,
the book offers a valuable guide for all readers who are working on or learning
about deep learning and NLP.”
Haifeng Wang (Vice President and Head of Research, Baidu; former President
of ACL)
“In 2011, at the dawn of deep learning in industry, I estimated that in most speech
recognition applications, computers still made 5 to 10 times more errors than human
subjects, and highlighted the importance of knowledge engineering in future
directions. Within only a handful of years since, deep learning has nearly closed the
gap in the accuracy of conversational speech recognition between human and
computers. Edited and written by Dr. Li Deng—a pioneer in the recent speech
recognition revolution using deep learning—and his colleagues, this book elegantly
describes this part of the fascinating history of speech recognition as an important
subfield of natural language processing (NLP). Further, the book expands this
historical perspective from speech recognition to more general areas of NLP,
offering a truly valuable guide for the future development of NLP.
Importantly, the book puts forward a thesis that the current deep learning trend is
a revolution from the previous data-driven (shallow) machine learning era, although
ostensibly deep learning appears to be merely exploiting more data, more com-
puting power, and more complex models. Indeed, as the book correctly points out,
the current state of the art of deep learning technology developed for NLP appli-
cations, despite being highly successful in solving individual NLP tasks, has not
taken full advantage of rich world knowledge or human cognitive capabilities.
Therefore, I fully embrace the view expressed by the book’s editors and authors that
more advanced deep learning that seamlessly integrates knowledge engineering will
pave the way for the next revolution in NLP.
I highly recommend speech and NLP researchers, engineers, and students to read
this outstanding and timely book, not only to learn about the state of the art in NLP
and deep learning, but also to gain vital insights into what the future of the NLP
field will hold.”
Sadaoki Furui (President, Toyota Technological Institute at Chicago)
Preface

Natural language processing (NLP), which aims to enable computers to process
human languages intelligently, is an important interdisciplinary field crossing
artificial intelligence, computing science, cognitive science, information processing,
and linguistics. Concerned with interactions between computers and human lan-
guages, NLP applications such as speech recognition, dialog systems, information
retrieval, question answering, and machine translation have started to reshape the
way people identify, obtain, and make use of information.
The development of NLP can be described in terms of three major waves:
rationalism, empiricism, and deep learning. In the first wave, rationalist approaches
advocated the design of handcrafted rules to incorporate knowledge into NLP
systems based on the assumption that knowledge of language in the human mind is
fixed in advance by genetic inheritance. In the second wave, empirical approaches
assume that rich sensory input and the observable language data in surface form are
required and sufficient to enable the mind to learn the detailed structure of natural
language. As a result, probabilistic models were developed to discover the regu-
larities of languages from large corpora. In the third wave, deep learning exploits
hierarchical models of nonlinear processing, inspired by biological neural systems
to learn intrinsic representations from language data, in ways that aim to simulate
human cognitive abilities.
The intersection of deep learning and natural language processing has resulted in
striking successes in practical tasks. Speech recognition is the first industrial NLP
application that deep learning has strongly impacted. With the availability of
large-scale training data, deep neural networks achieved dramatically lower
recognition errors than the traditional empirical approaches. Another prominent
successful application of deep learning in NLP is machine translation. End-to-end
neural machine translation that models the mapping between human languages
using neural networks has proven to improve translation quality substantially.
Therefore, neural machine translation has quickly become the new de facto tech-
nology in major commercial online translation services offered by large technology
companies: Google, Microsoft, Facebook, Baidu, and more. Many other areas of
NLP, including language understanding and dialog, lexical analysis and parsing,
knowledge graph, information retrieval, question answering from text, social
computing, language generation, and text sentiment analysis, have also seen
significant progress using deep learning, riding on the third wave of
NLP. Nowadays, deep learning is the dominant method applied to practically all
NLP tasks.
The main goal of this book is to provide a comprehensive survey on the recent
advances in deep learning applied to NLP. The book presents the state of the art of
NLP-centric deep learning research and focuses on the role deep learning plays
in major NLP applications including spoken language understanding, dialog sys-
tems, lexical analysis, parsing, knowledge graph, machine translation, question
answering, sentiment analysis, social computing, and natural language generation
(from images). This book is suitable for readers with a technical background in
computation, including graduate students, post-doctoral researchers, educators, and
industrial researchers, and anyone interested in getting up to speed with the latest
techniques of deep learning associated with NLP.
The book is organized into eleven chapters as follows:
• Chapter 1: A Joint Introduction to Natural Language Processing and to Deep
Learning (Li Deng and Yang Liu)
• Chapter 2: Deep Learning in Conversational Language Understanding (Gokhan
Tur, Asli Celikyilmaz, Xiaodong He, Dilek Hakkani-Tür, and Li Deng)
• Chapter 3: Deep Learning in Spoken and Text-Based Dialog Systems
(Asli Celikyilmaz, Li Deng, and Dilek Hakkani-Tür)
• Chapter 4: Deep Learning in Lexical Analysis and Parsing (Wanxiang Che and
Yue Zhang)
• Chapter 5: Deep Learning in Knowledge Graph (Zhiyuan Liu and Xianpei Han)
• Chapter 6: Deep Learning in Machine Translation (Yang Liu and Jiajun Zhang)
• Chapter 7: Deep Learning in Question Answering (Kang Liu and Yansong Feng)
• Chapter 8: Deep Learning in Sentiment Analysis (Duyu Tang and Meishan
Zhang)
• Chapter 9: Deep Learning in Social Computing (Xin Zhao and Chenliang Li)
• Chapter 10: Deep Learning in Natural Language Generation from Images
(Xiaodong He and Li Deng)
• Chapter 11: Epilogue (Li Deng and Yang Liu)
Chapter 1 first reviews the basics of NLP as well as the main scope of NLP
covered in the following chapters of the book, and then goes in some depth into the
historical development of NLP summarized as three waves and future directions.
Subsequently, in Chaps. 2–10, an in-depth survey on the recent advances in deep
learning applied to NLP is organized into nine separate chapters, each covering a
largely independent application area of NLP. The main body of each chapter is
written by leading researchers and experts actively working in the respective field.
The origin of this book was the set of comprehensive tutorials given at the 15th
China National Conference on Computational Linguistics (CCL 2016) held in
October 2016 in Yantai, Shandong, China, where both of us, editors of this book,
were active participants and were taking leading roles. We thank our Springer’s
senior editor, Dr. Celine Lanlan Chang, who kindly invited us to create this book
and who has provided much timely assistance needed to complete this
book. We are also grateful to Springer’s Assistant Editor, Jane Li, for offering
invaluable help through various stages of manuscript preparation.
We thank all authors of Chaps. 2–10 who devoted their valuable time carefully
preparing the content of their chapters: Gokhan Tur, Asli Celikyilmaz, Dilek
Hakkani-Tur, Wanxiang Che, Yue Zhang, Xianpei Han, Zhiyuan Liu, Jiajun Zhang,
Kang Liu, Yansong Feng, Duyu Tang, Meishan Zhang, Xin Zhao, Chenliang Li,
and Xiaodong He. The authors of Chaps. 4–9 are CCL 2016 tutorial speakers. They
spent a considerable amount of time in updating their tutorial material with the
latest advances in the field since October 2016.
Further, we thank numerous reviewers and readers, Sadaoki Furui, Andrew Ng,
Fred Juang, Ken Church, Haifeng Wang, and Hongjiang Zhang, who not only gave
us much-needed encouragement but also offered many constructive comments
which substantially improved earlier drafts of the book.
Finally, we give our appreciation to our organizations, Microsoft Research and
Citadel (for Li Deng) and Tsinghua University (for Yang Liu), which provided
excellent environments, support, and encouragement that have been instrumental
in completing this book. Yang Liu is also supported by the National Natural
Science Foundation of China (Nos. 61522204, 61432013, and 61331013).

Seattle, USA    Li Deng
Beijing, China    Yang Liu
October 2017
Contents

1  A Joint Introduction to Natural Language Processing and to Deep Learning
   Li Deng and Yang Liu
2  Deep Learning in Conversational Language Understanding
   Gokhan Tur, Asli Celikyilmaz, Xiaodong He, Dilek Hakkani-Tür and Li Deng
3  Deep Learning in Spoken and Text-Based Dialog Systems
   Asli Celikyilmaz, Li Deng and Dilek Hakkani-Tür
4  Deep Learning in Lexical Analysis and Parsing
   Wanxiang Che and Yue Zhang
5  Deep Learning in Knowledge Graph
   Zhiyuan Liu and Xianpei Han
6  Deep Learning in Machine Translation
   Yang Liu and Jiajun Zhang
7  Deep Learning in Question Answering
   Kang Liu and Yansong Feng
8  Deep Learning in Sentiment Analysis
   Duyu Tang and Meishan Zhang
9  Deep Learning in Social Computing
   Xin Zhao and Chenliang Li
10 Deep Learning in Natural Language Generation from Images
   Xiaodong He and Li Deng
11 Epilogue: Frontiers of NLP in the Deep Learning Era
   Li Deng and Yang Liu
Glossary
Contributors

Asli Celikyilmaz Microsoft Research, Redmond, WA, USA


Wanxiang Che Harbin Institute of Technology, Harbin, China
Li Deng Citadel, Seattle & Chicago, USA
Yansong Feng Peking University, Beijing, China
Dilek Hakkani-Tür Google, Mountain View, CA, USA
Xianpei Han Institute of Software, Chinese Academy of Sciences, Beijing, China
Xiaodong He Microsoft Research, Redmond, WA, USA
Chenliang Li Wuhan University, Wuhan, China
Kang Liu Institute of Automation, Chinese Academy of Sciences, Beijing, China
Yang Liu Tsinghua University, Beijing, China
Zhiyuan Liu Tsinghua University, Beijing, China
Duyu Tang Microsoft Research Asia, Beijing, China
Gokhan Tur Google, Mountain View, CA, USA
Jiajun Zhang Institute of Automation, Chinese Academy of Sciences, Beijing,
China
Meishan Zhang Heilongjiang University, Harbin, China
Yue Zhang Singapore University of Technology and Design, Singapore
Xin Zhao Renmin University of China, Beijing, China

Acronyms

AI Artificial intelligence
AP Averaged perceptron
ASR Automatic speech recognition
ATN Augmented transition network
BiLSTM Bidirectional long short-term memory
BiRNN Bidirectional recurrent neural network
BLEU Bilingual evaluation understudy
BOW Bag-of-words
CBOW Continuous bag-of-words
CCA Canonical correlation analysis
CCG Combinatory categorial grammar
CDL Collaborative deep learning
CFG Context-free grammar
CYK Cocke–Younger–Kasami
CLU Conversational language understanding
CNN Convolutional neural network
CNNSM Convolutional neural network based semantic model
cQA Community question answering
CRF Conditional random field
CTR Collaborative topic regression
CVT Compound value typed
DA Denoising autoencoder
DBN Deep belief network
DCN Deep convex net
DNN Deep neural network
DSSM Deep structured semantic model
DST Dialog state tracking
EL Entity linking
EM Expectation maximization
FSM Finite state machine


GAN Generative adversarial network


GRU Gated recurrent unit
HMM Hidden Markov model
IE Information extraction
IRQA Information retrieval-based question answering
IVR Interactive voice response
KBQA Knowledge-based question answering
KG Knowledge graph
L-BFGS Limited-memory Broyden–Fletcher–Goldfarb–Shanno
LSI Latent semantic indexing
LSTM Long short-term memory
MC Machine comprehension
MCCNN Multicolumn convolutional neural network
MDP Markov decision process
MERT Minimum error rate training
METEOR Metric for evaluation of translation with explicit ordering
MIRA Margin infused relaxed algorithm
ML Machine learning
MLE Maximum likelihood estimation
MLP Multilayer perceptron
MMI Maximum mutual information
M-NMF Modularized nonnegative matrix factorization
MRT Minimum risk training
MST Maximum spanning tree
MT Machine translation
MV-RNN Matrix-vector recursive neural network
NER Named entity recognition
NFM Neural factorization machine
NLG Natural language generation
NMT Neural machine translation
NRE Neural relation extraction
OOV Out-of-vocabulary
PA Passive aggressive
PCA Principal component analysis
PMI Point-wise mutual information
POS Part of speech
PV Paragraph vector
QA Question answering
RAE Recursive autoencoder
RBM Restricted Boltzmann machine
RDF Resource description framework
RE Relation extraction
RecNN Recursive neural network
RL Reinforcement learning
RNN Recurrent neural network
ROUGE Recall-oriented understudy for gisting evaluation


RUBER Referenced metric and unreferenced metric blended evaluation routine
SDS Spoken dialog system
SLU Spoken language understanding
SMT Statistical machine translation
SP Semantic parsing
SRL Semantic role labeling
SRNN Segmental recurrent neural network
STAGG Staged query graph generation
SVM Support vector machine
UAS Unlabeled attachment score
UGC User-generated content
VIME Variational information maximizing exploration
VPA Virtual personal assistant
Chapter 1
A Joint Introduction to Natural
Language Processing and to Deep
Learning

Li Deng and Yang Liu

Abstract In this chapter, we set up the fundamental framework for the book. We
first provide an introduction to the basics of natural language processing (NLP) as an
integral part of artificial intelligence. We then survey the historical development of
NLP, spanning over five decades, in terms of three waves. The first two waves arose
as rationalism and empiricism, paving the way for the current deep learning wave. The
key pillars underlying the deep learning revolution for NLP consist of (1) distributed
representations of linguistic entities via embedding, (2) semantic generalization due
to the embedding, (3) long-span deep sequence modeling of natural language, (4)
hierarchical networks effective for representing linguistic levels from low to high,
and (5) end-to-end deep learning methods to jointly solve many NLP tasks. After
the survey, several key limitations of current deep learning technology for NLP are
analyzed. This analysis leads to five research directions for future advances in NLP.

1.1 Natural Language Processing: The Basics

Natural language processing (NLP) investigates the use of computers to process or to
understand human (i.e., natural) languages for the purpose of performing useful tasks.
NLP is an interdisciplinary field that combines computational linguistics, computing
science, cognitive science, and artificial intelligence. From a scientific perspective,
NLP aims to model the cognitive mechanisms underlying the understanding and pro-
duction of human languages. From an engineering perspective, NLP is concerned
with how to develop novel practical applications to facilitate the interactions between
computers and human languages. Typical applications in NLP include speech recog-
nition, spoken language understanding, dialogue systems, lexical analysis, parsing,
machine translation, knowledge graph, information retrieval, question answering,

sentiment analysis, social computing, natural language generation, and natural lan-
guage summarization. These NLP application areas form the core content of this
book.
Natural language is a system constructed specifically to convey meaning or seman-
tics, and is by its fundamental nature a symbolic or discrete system. The surface or
observable “physical” signal of natural language is called text, always in a sym-
bolic form. The text “signal” has its counterpart—the speech signal; the latter can
be regarded as the continuous correspondence of symbolic text, both entailing the
same latent linguistic hierarchy of natural language. From NLP and signal processing
perspectives, speech can be treated as “noisy” versions of text, imposing additional
difficulties in its need of “de-noising” when performing the task of understanding the
common underlying semantics. Chapters 2 and 3 as well as current Chap. 1 of this
book cover the speech aspect of NLP in detail, while the remaining chapters start
directly from text in discussing a wide variety of text-oriented tasks that exemplify
the pervasive NLP applications enabled by machine learning techniques, notably
deep learning.
The symbolic nature of natural language is in stark contrast to the continuous
nature of language’s neural substrate in the human brain. We will defer this discussion
to Sect. 1.6 of this chapter when discussing future challenges of deep learning in NLP.
A related contrast is how the symbols of natural language are encoded in several
continuous-valued modalities, such as gesture (as in sign language), handwriting
(as an image), and, of course, speech. On the one hand, the word as a symbol is
used as a “signifier” to refer to a concept or a thing in real world as a “signified”
object, necessarily a categorical entity. On the other hand, the continuous modalities
that encode symbols of words constitute the external signals sensed by the human
perceptual system and transmitted to the brain, which in turn operates in a continuous
fashion. While of great theoretical interest, the subject of contrasting the symbolic
nature of language versus its continuous rendering and encoding goes beyond the
scope of this book.
In the next few sections, we outline and discuss, from a historical perspective, the
development of general methodology used to study NLP as a rich interdisciplinary
field. Much like several closely related sub- and super-fields such as conversational
systems, speech recognition, and artificial intelligence, the development of NLP can
be described in terms of three major waves (Deng 2017; Pereira 2017), each of which
is elaborated in a separate section next.

1.2 The First Wave: Rationalism

NLP research in its first wave lasted for a long time, dating back to 1950s. In 1950,
Alan Turing proposed the Turing test to evaluate a computer’s ability to exhibit intelli-
gent behavior indistinguishable from that of a human (Turing 1950). This test is based
on natural language conversations between a human and a computer designed to gen-
erate human-like responses. In 1954, the Georgetown-IBM experiment demonstrated
the first machine translation system capable of translating more than 60 Russian sen-
tences into English.
The approaches, based on the belief that knowledge of language in the human
mind is fixed in advance by genetic inheritance, dominated most of NLP research
between about 1960 and late 1980s. These approaches have been called rationalist
ones (Church 2007). The dominance of rationalist approaches in NLP was mainly
due to the widespread acceptance of arguments of Noam Chomsky for an innate
language structure and his criticism of N-grams (Chomsky 1957). Postulating that
key parts of language are hardwired in the brain at birth as a part of the human
genetic inheritance, rationalist approaches endeavored to design hand-crafted rules
to incorporate knowledge and reasoning mechanisms into intelligent NLP systems.
Up until the 1980s, the most notably successful NLP systems, such as ELIZA for simulating
a Rogerian psychotherapist and MARGIE for structuring real-world information into
concept ontologies, were based on complex sets of handwritten rules.
This period coincided approximately with the early development of artificial
intelligence, characterized by expert knowledge engineering, where domain experts
devised computer programs according to the knowledge about the (very narrow)
application domains they have (Nilsson 1982; Winston 1993). The experts designed
these programs using symbolic logical rules based on careful representations and
engineering of such knowledge. These knowledge-based artificial intelligence sys-
tems tend to be effective in solving narrow-domain problems by examining the
“head” or most important parameters and reaching a solution about the appropriate
action to take in each specific situation. These “head” parameters are identified in
advance by human experts, leaving the “tail” parameters and cases untouched. Since
they lack learning capability, they have difficulty in generalizing the solutions to new
situations and domains. The typical approach during this period is exemplified by
the expert system, a computer system that emulates the decision-making ability of a
human expert. Such systems are designed to solve complex problems by reasoning
about knowledge (Nilsson 1982). The first expert system was created in the 1970s, and
expert systems then proliferated in the 1980s. The main “algorithm” used was inference rules in the
form of “if-then-else” (Jackson 1998). The main strength of these first-generation
artificial intelligence systems was their transparency and interpretability in their (limited)
capability of performing logical reasoning. Like NLP systems such as ELIZA and
MARGIE, the general expert systems in the early days used hand-crafted expert
knowledge which was often effective in narrowly defined problems, although the
reasoning could not handle uncertainty that is ubiquitous in practical applications.
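To make the flavor of such rule-based processing concrete, the following small sketch (an illustrative invention, not the actual rules of ELIZA, MARGIE, or any deployed expert system) shows how hand-crafted if-then pattern rules might map a user utterance to a canned response; inputs that fall outside the anticipated patterns expose exactly the brittleness discussed above.

import re

# Hand-crafted "if-then" rules in the spirit of early rule-based NLP systems.
# Patterns and responses are made up for illustration only.
RULES = [
    (re.compile(r"\bI am (?P<state>.+)", re.IGNORECASE),
     "Why do you say you are {state}?"),
    (re.compile(r"\bI need (?P<thing>.+)", re.IGNORECASE),
     "What would it mean to you to get {thing}?"),
    (re.compile(r"\b(hello|hi)\b", re.IGNORECASE),
     "Hello. What would you like to talk about?"),
]

def respond(utterance):
    # Apply the first rule whose pattern matches; otherwise fall back.
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(**match.groupdict())
    # No rule fires: the system cannot generalize beyond its hand-written rules.
    return "Please tell me more."

print(respond("I am worried about my exam"))   # a rule fires
print(respond("The weather is lovely today"))  # falls through to the default

Each rule here encodes a small piece of expert knowledge; covering a realistically broad range of inputs would require an ever-growing, hand-maintained rule set, which is the scalability problem that motivated the empiricist wave described next.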
In specific NLP application areas of dialogue systems and spoken language under-
standing, to be described in more detail in Chaps. 2 and 3 of this book, such ratio-
nalistic approaches were represented by the pervasive use of symbolic rules and
templates (Seneff et al. 1991). The designs were centered on grammatical and onto-
logical constructs, which, while interpretable and easy to debug and update, had
experienced severe difficulties in practical deployment. When such systems worked,
they often worked beautifully; unfortunately, this did not happen very often,
and the domains covered were necessarily limited.
Likewise, speech recognition research and system design, another long-standing
NLP and artificial intelligence challenge, during this rationalist era were based
heavily on the paradigm of expert knowledge engineering, as elegantly analyzed
in (Church and Mercer 1993). During 1970s and early 1980s, the expert system
approach to speech recognition was quite popular (Reddy 1976; Zue 1985). How-
ever, the lack of abilities to learn from data and to handle uncertainty in reasoning was
acutely recognized by researchers, leading to the second wave of speech recognition,
NLP, and artificial intelligence described next.

1.3 The Second Wave: Empiricism

The second wave of NLP was characterized by the exploitation of data corpora and
of (shallow) machine learning, statistical or otherwise, to make use of such data
(Manning and Schtze 1999). As much of the structure of and theory about natural
language were discounted or discarded in favor of data-driven methods, the main
approaches developed during this era have been called empirical or pragmatic ones
(Church and Mercer 1993; Church 2014). With the increasing availability of machine-
readable data and steady increase of computational power, empirical approaches have
dominated NLP since around 1990. One of the major NLP conferences was even
named “Empirical Methods in Natural Language Processing (EMNLP)” to reflect
most directly the strongly positive sentiment of NLP researchers during that era
toward empirical approaches.
In contrast to rationalist approaches, empirical approaches assume that the human
mind only begins with general operations for association, pattern recognition, and
generalization. Rich sensory input is required to enable the mind to learn the detailed
structure of natural language. Prevalent in linguistics between 1920 and 1960, empiri-
cism has been undergoing a resurgence since 1990. Early empirical approaches to
NLP focused on developing generative models such as the hidden Markov model
(HMM) (Baum and Petrie 1966), the IBM translation models (Brown et al. 1993),
and the head-driven parsing models (Collins 1997) to discover the regularities of
languages from large corpora. Since late 1990s, discriminative models have become
the de facto approach in a variety of NLP tasks. Representative discriminative mod-
els and methods in NLP include the maximum entropy model (Ratnaparkhi 1997),
supporting vector machines (Vapnik 1998), conditional random fields (Lafferty et al.
2001), maximum mutual information and minimum classification error (He et al.
2008), and perceptron (Collins 2002).
Again, this era of empiricism in NLP was paralleled with corresponding approaches
in artificial intelligence as well as in speech recognition and computer vision. It came
about after clear evidence that learning and perception capabilities are crucial for
complex artificial intelligence systems but missing in the expert systems popular in
the previous wave. For example, when DARPA opened its first Grand Challenge for
autonomous driving, most vehicles then relied on the knowledge-based artificial intel-
ligence paradigm. Much like speech recognition and NLP, the autonomous driving and
computer vision researchers immediately realized the limitation of the knowledge-
based paradigm due to the necessity for machine learning with uncertainty handling
and generalization capabilities.
The empiricism in NLP and speech recognition in this second wave was based
on data-intensive machine learning, which we now call “shallow” due to the general
lack of abstractions constructed by many-layer or “deep” representations of data
which would come in the third wave to be described in the next section. In machine
learning, researchers do not need to concern themselves with constructing precise and exact rules
as required for the knowledge-based NLP and speech systems during the first wave.
Rather, they focus on statistical models (Bishop 2006; Murphy 2012) or simple neural
networks (Bishop 1995) as an underlying engine. They then automatically learn or
“tune” the parameters of the engine using ample training data to make them handle
uncertainty, and to attempt to generalize from one condition to another and from one
domain to another. The key algorithms and methods for machine learning include EM
(expectation-maximization), Bayesian networks, support vector machines, decision
trees, and, for neural networks, backpropagation algorithm.
Generally speaking, the machine learning based NLP, speech, and other artificial
intelligence systems perform much better than the earlier, knowledge-based counter-
parts. Successful examples include almost all artificial intelligence tasks in machine
perception—speech recognition (Jelinek 1998), face recognition (Viola and Jones
2004), visual object recognition (Fei-Fei and Perona 2005), handwriting recognition
(Plamondon and Srihari 2000), and machine translation (Och 2003).
More specifically, in a core NLP application area of machine translation, as to be
described in detail in Chap. 6 of this book as well as in (Church and Mercer 1993), the
field has switched rather abruptly around 1990 from rationalistic methods outlined in
Sect. 1.2 to empirical, largely statistical methods. The availability of sentence-level
alignments in the bilingual training data made it possible to acquire surface-level
translation knowledge not by rules but from data directly, at the expense of discarding
or discounting structured information in natural languages. The most representative
work during this wave is that empowered by various versions of IBM translation
models (Brown et al. 1993). Subsequent developments during this empiricist era of
machine translation further significantly improved the quality of translation systems
(Och and Ney 2002; Och 2003; Chiang 2007; He and Deng 2012), but not at the
level of massive deployment in real world (which would come after the next, deep
learning wave).
In the dialogue and spoken language understanding areas of NLP, this empiri-
cist era was also marked prominently by data-driven machine learning approaches.
These approaches were well suited to meet the requirement for quantitative evalua-
tion and concrete deliverables. They focused on broader but shallow, surface-level
coverage of text and domains instead of detailed analyses of highly restricted text
and domains. The training data were used not to design rules for language under-
standing and response action from the dialogue systems but to learn parameters of
(shallow) statistical or neural models automatically from data. Such learning helped
reduce the cost of hand-crafted complex dialogue manager’s design, and helped
improve robustness against speech recognition errors in the overall spoken language
understanding and dialogue systems; for a review, see He and Deng (2013). More
specifically, for the dialogue policy component of dialogue systems, powerful rein-
forcement learning based on Markov decision processes had been introduced during
this era; for a review, see Young et al. (2013). And for spoken language understand-
ing, the dominant methods moved from rule- or template-based ones during the first
wave to generative models like hidden Markov models (HMMs) (Wang et al. 2011)
to discriminative models like conditional random fields (Tur and Deng 2011).
Similarly, in speech recognition, over close to 30 years from the early 1980s to around
2010, the field was dominated by the (shallow) machine learning paradigm using the
statistical generative model based on the HMM integrated with Gaussian mixture
models, along with various versions of its generalization (Baker et al. 2009a, b;
Deng and O’Shaughnessy 2003; Rabiner and Juang 1993). Among many versions of
the generalized HMMs were statistical and neural-network-based hidden dynamic
models (Deng 1998; Bridle et al. 1998; Deng and Yu 2007). The former adopted EM
and switching extended Kalman filter algorithms for learning model parameters (Ma
and Deng 2004; Lee et al. 2004), and the latter used backpropagation (Picone et al.
1999). Both of them made extensive use of multiple latent layers of representations for
the generative process of speech waveforms following the long-standing framework
of analysis-by-synthesis in human speech perception. More significantly, inverting
this “deep” generative process to its counterpart of an end-to-end discriminative
process gave rise to the first industrial success of deep learning (Deng et al. 2010,
2013; Hinton et al. 2012), which formed a driving force of the third wave of speech
recognition and NLP that will be elaborated next.

1.4 The Third Wave: Deep Learning

While the NLP systems, including speech recognition, language understanding, and
machine translation, developed during the second wave performed a lot better and
with higher robustness than those during the first wave, they were far from human-
level performance and left much to desire. With a few exceptions, the (shallow)
machine learning models for NLP often did not have the capacity sufficiently large to
absorb the large amounts of training data. Further, the learning algorithms, methods,
and infrastructures were not powerful enough. All this changed several years ago,
giving rise to the third wave of NLP, propelled by the new paradigm of deep-structured
machine learning or deep learning (Bengio 2009; Deng and Yu 2014; LeCun et al.
2015; Goodfellow et al. 2016).
In traditional machine learning, features are designed by humans and feature
engineering is a bottleneck, requiring significant human expertise. Concurrently,
the associated shallow models lack the representation power and hence the ability
to form levels of decomposable abstractions that would automatically disentangle
complex factors in shaping the observed language data. Deep learning breaks away
the above difficulties by the use of deep, layered model structure, often in the form of
neural networks, and the associated end-to-end learning algorithms. The advances in
deep learning are one major driving force behind the current NLP and more general
artificial intelligence inflection point and are responsible for the resurgence of neural
networks with a wide range of practical, including business, applications (Parloff
2016).
More specifically, despite the success of (shallow) discriminative models in a
number of important NLP tasks developed during the second wave, they suffered from
the difficulty of covering all regularities in languages by designing features manually
with domain expertise. Besides the incompleteness problem, such shallow models
also face the sparsity problem as features usually only occur once in the training
data, especially for highly sparse high-order features. Therefore, feature design has
become one of the major obstacles in statistical NLP before deep learning comes
to rescue. Deep learning brings hope for addressing the human feature engineering
problem, with a view called “NLP from scratch” (Collobert et al. 2011), which was
in early days of deep learning considered highly unconventional. Such deep learning
approaches exploit the powerful neural networks that contain multiple hidden layers
to solve general machine learning tasks dispensing with feature engineering. Unlike
shallow neural networks and related machine learning models, deep neural networks
are capable of learning representations from data using a cascade of multiple layers of
nonlinear processing units for feature extraction. As higher level features are derived
from lower level features, these levels form a hierarchy of concepts.
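As a minimal numerical illustration of such a cascade (a sketch only: the dimensions are arbitrary and the weights are random rather than learned from data), each layer below re-represents the output of the layer beneath it, so that the levels form progressively more abstract features.

import numpy as np

rng = np.random.default_rng(0)

def layer(x, in_dim, out_dim):
    # One nonlinear processing unit: an affine map followed by a ReLU.
    W = rng.standard_normal((in_dim, out_dim)) * 0.1
    b = np.zeros(out_dim)
    return np.maximum(0.0, x @ W + b)

# Raw input features for one example (e.g., a simple vector encoding of a sentence).
x0 = rng.standard_normal(300)

x1 = layer(x0, 300, 128)   # low-level features
x2 = layer(x1, 128, 64)    # mid-level features derived from the low-level ones
x3 = layer(x2, 64, 32)     # high-level features used for the final prediction

print(x1.shape, x2.shape, x3.shape)   # (128,) (64,) (32,)

In a real system, the weights of every layer are trained jointly by backpropagation, which is what allows the higher levels to become useful abstractions rather than arbitrary transformations.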
Deep learning originated from artificial neural networks, which can be viewed as
cascading models of cell types inspired by biological neural systems. With the advent
of backpropagation algorithm (Rumelhart et al. 1986), training deep neural networks
from scratch attracted intensive attention in 1990s. In these early days, without large
amounts of training data and without proper design and learning methods, during
neural network training the learning signals vanish exponentially with the number
of layers (or more rigorously the depth of credit assignment) when propagated from
layer to layer, making it difficult to tune connection weights of deep neural networks,
especially the recurrent versions. Hinton et al. (2006) initially overcame this problem
by using unsupervised pretraining to first learn generally useful feature detectors.
Then, the network is further trained by supervised learning to classify labeled data.
As a result, it is possible to learn the distribution of a high-level representation using
low-level representations. This seminal work marks the revival of neural networks. A
variety of network architectures have since been proposed and developed, including
deep belief networks (Hinton et al. 2006), stacked auto-encoders (Vincent et al. 2010),
deep Boltzmann machines (Hinton and Salakhutdinov 2012), deep convolutional
neural networks (Krizhevsky et al. 2012), deep stacking networks (Deng et al. 2012),
and deep Q-networks (Mnih et al. 2015). Capable of discovering intricate structures
in high-dimensional data, deep learning has since 2010 been successfully applied to
real-world tasks in artificial intelligence including notably speech recognition (Yu
et al. 2010; Hinton et al. 2012), image classification (Krizhevsky et al. 2012; He et al.
2016), and NLP (all chapters in this book). Detailed analyses and reviews of deep
learning have been provided in a set of tutorial survey articles (Deng 2014; LeCun
et al. 2015; Juang 2016).
As speech recognition is one of the core tasks in NLP, we briefly discuss it here due to
its importance as the first industrial NLP application in real world impacted strongly
by deep learning. Industrial applications of deep learning to large-scale speech recog-
nition started to take off around 2010. The endeavor was initiated with a collaboration
between academia and industry, with the original work presented at the 2009 NIPS
Workshop on Deep Learning for Speech Recognition and Related Applications. The
workshop was motivated by the limitations of deep generative models of speech, and
the possibility that the big-compute, big-data era warrants a serious exploration of
deep neural networks. It was believed then that pretraining DNNs using generative
models of deep belief nets based on the contrastive divergence learning algorithm
would overcome the main difficulties of neural nets encountered in the 1990s (Dahl
et al. 2011; Mohamed et al. 2009). However, early into this research at Microsoft, it
was discovered that without contrastive divergence pretraining, but with the use of
large amounts of training data together with the deep neural networks designed with
corresponding large, context-dependent output layers and with careful engineering,
dramatically lower recognition errors could be obtained than then-state-of-the-art
(shallow) machine learning systems (Yu et al. 2010, 2011; Dahl et al. 2012). This
finding was quickly verified by several other major speech recognition research
groups in North America (Hinton et al. 2012; Deng et al. 2013) and subsequently
overseas. Further, the nature of recognition errors produced by the two types of sys-
tems was found to be characteristically different, offering technical insights into how
to integrate deep learning into the existing highly efficient, run-time speech decod-
ing system deployed by major players in speech recognition industry (Yu and Deng
2015; Abdel-Hamid et al. 2014; Xiong et al. 2016; Saon et al. 2017). Nowadays,
backpropagation algorithm applied to deep neural nets of various forms is uniformly
used in all current state-of-the-art speech recognition systems (Yu and Deng 2015;
Amodei et al. 2016; Saon et al. 2017), and all major commercial speech recogni-
tion systems—Microsoft Cortana, Xbox, Skype Translator, Amazon Alexa, Google
Assistant, Apple Siri, Baidu and iFlyTek voice search, and more—are all based on
deep learning methods.
The striking success of speech recognition in 2010–2011 heralded the arrival of
the third wave of NLP and artificial intelligence. Quickly following the success of
deep learning in speech recognition, computer vision (Krizhevsky et al. 2012) and
machine translation (Bahdanau et al. 2015) were taken over by the similar deep
learning paradigm. In particular, while the powerful technique of neural embedding
of words was developed as early as 2001 (Bengio et al. 2001), it was not until more
than 10 years later that it was shown to be useful at a practically large
scale (Mikolov et al. 2013), due to the availability of big data and faster computation.
In addition, a large number of other real-world NLP applications, such as image
captioning (Karpathy and Fei-Fei 2015; Fang et al. 2015; Gan et al. 2017), visual
question answering (Fei-Fei and Perona 2016), speech understanding (Mesnil et al.
2013), web search (Huang et al. 2013b), and recommendation systems, have been
made successful due to deep learning, in addition to many non-NLP tasks including
drug discovery and toxicology, customer relationship management, recommendation
systems, gesture recognition, medical informatics, advertisement, medical image
analysis, robotics, self-driving vehicles, board and eSports games (e.g., Atari, Go,
Poker, and the latest, DOTA2), and so on. For more details, see https://en.wikipedia.org/wiki/deep_learning.
In more specific text-based NLP application areas, machine translation is perhaps
impacted the most by deep learning. Advancing from the shallow statistical machine
translation developed during the second wave of NLP, the current best machine
translation systems in real-world applications are based on deep neural networks. For
example, Google announced the first stage of its move to neural machine translation
in September 2016 and Microsoft made a similar announcement 2 months later.
Facebook had been working on the conversion to neural machine translation for
about a year, and by August 2017 it was at full deployment. Details of the deep learning
techniques in these state-of-the-art large-scale machine translation systems will be
reviewed in Chap. 6.
In the area of spoken language understanding and dialogue systems, deep learning
is also making a huge impact. The current popular techniques maintain and expand
the statistical methods developed during second-wave era in several ways. Like the
empirical, (shallow) machine learning methods, deep learning is also based on data-
intensive methods to reduce the cost of hand-crafted complex understanding and
dialogue management, to be robust against speech recognition errors under noise
environments and against language understanding errors, and to exploit the power
of Markov decision processes and reinforcement learning for designing dialogue
policy, e.g., (Gasic et al. 2017; Dhingra et al. 2017). Compared with the earlier
methods, deep neural network models and representations are much more powerful
and they make end-to-end learning possible. However, deep learning has not yet
solved the problems of interpretability and domain scalability associated with earlier
empirical techniques. Details of the deep learning techniques popular for current
spoken language understanding and dialogue systems as well as their challenges
will be reviewed in Chaps. 2 and 3.
Two important recent technological breakthroughs brought about in applying deep
learning to NLP problems are sequence-to-sequence learning (Sutskever et al. 2014)
and attention modeling (Bahdanau et al. 2015). The sequence-to-sequence learning
introduces a powerful idea of using recurrent nets to carry out both encoding and
decoding in an end-to-end manner. While attention modeling was initially developed
to overcome the difficulty of encoding a long sequence, subsequent developments
significantly extended its power to provide highly flexible alignment of two arbitrary
sequences that can be learned together with neural network parameters. The key
concepts of sequence-to-sequence learning and of attention mechanism boosted the
performance of neural machine translation based on distributed word embedding over
the best system based on statistical learning and local representations of words and
phrases. Soon after this success, these concepts have also been applied successfully
to a number of other NLP-related tasks such as image captioning (Karpathy and
Fei-Fei 2015; Devlin et al. 2015), speech recognition (Chorowski et al. 2015), meta-
learning for program execution, one-shot learning, syntactic parsing, lip reading, text
understanding, summarization, and question answering and more.
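The core computation behind such attention mechanisms can be written in a few lines. The sketch below is a simplified, dot-product variant with random vectors for illustration (the model of Bahdanau et al. 2015 scores alignments with a small feedforward network instead): the decoder state is compared with every encoder state, the scores are turned into a soft alignment by a softmax, and a context vector is formed as the weighted sum of the encoder states.

import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Encoder outputs for a source sentence of 6 tokens, hidden dimension 8.
encoder_states = rng.standard_normal((6, 8))
# Decoder hidden state while generating one target word.
decoder_state = rng.standard_normal(8)

scores = encoder_states @ decoder_state   # alignment score per source position, shape (6,)
weights = softmax(scores)                 # soft alignment over the source sentence
context = weights @ encoder_states        # attention-weighted summary, shape (8,)

print(np.round(weights, 3))   # in a trained model these would peak on relevant words
print(context.shape)          # (8,)

In a trained sequence-to-sequence model, the context vector feeds into the decoder to help predict the next target word, and the parameters that produce the encoder and decoder states are learned end to end together with the rest of the network.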
Setting aside their huge empirical successes, models of neural-network-based
deep learning are often simpler and easier to design than the traditional machine
learning models developed in the earlier wave. In many applications, deep learning
is performed simultaneously for all parts of the model, from feature extraction all
the way to prediction, in an end-to-end manner. Another factor contributing to the
simplicity of neural network models is that the same model building blocks (i.e., the
different types of layers) are generally used in many different applications. Using
the same building blocks for a large variety of tasks makes the adaptation of models
used for one task or data to another task or data relatively easy. In addition, software
toolkits have been developed to allow faster and more efficient implementation of
these models. For these reasons, deep neural networks are nowadays a prominent
method of choice for a large variety of machine learning and artificial intelligence
tasks over large datasets including, prominently, NLP tasks.
Although deep learning has proven effective in reshaping the processing of speech,
images, and videos in a revolutionary way, the effectiveness is less clear-cut in inter-
secting deep learning with text-based NLP despite its empirical successes in a number
of practical NLP tasks. In speech, image, and video processing, deep learning effec-
tively addresses the semantic gap problem by learning high-level concepts from raw
perceptual data in a direct manner. However, in NLP, stronger theories and structured
models on morphology, syntax, and semantics have been advanced to distill the under-
lying mechanisms of understanding and generation of natural languages, which have
not been as easily compatible with neural networks. Compared with speech, image,
and video signals, it seems less straightforward to see that the neural representations
learned from textual data can provide equally direct insights onto natural language.
Therefore, applying neural networks, especially those having sophisticated hierar-
chical architectures, to NLP has received increasing attention and has become the
most active area in both NLP and deep learning communities with highly visible
progress made in recent years (Deng 2016; Manning and Socher 2017). Surveying
the advances and analyzing the future directions in deep learning for NLP form the
main motivation for us to write this chapter and to create this book, with the desire
to help NLP researchers accelerate the research further at the current fast pace of
progress.

1.5 Transitions from Now to the Future

Before analyzing the future directions of NLP with more advanced deep learning, here
we first summarize the significance of the transition from the past waves of NLP to
the present one. We then discuss some clear limitations and challenges of the present
deep learning technology for NLP, to pave a way to examining further development
that would overcome these limitations for the next wave of innovations.

1.5.1 From Empiricism to Deep Learning: A Revolution

On the surface, the deep learning rising wave discussed in Sect. 1.4 in this chapter
appears to be a simple push of the second, empiricist wave of NLP (Sect. 1.3) into
an extreme end with bigger data, larger models, and greater computing power. After
all, the fundamental approaches developed during both waves are data-driven and
are based on machine learning and computation, and have dispensed with human-
centric “rationalistic” rules that are often brittle and costly to acquire in practical
NLP applications. However, if we analyze these approaches holistically and at a
deeper level, we can identify aspects of conceptual revolution moving from empiricist
machine learning to deep learning, and can subsequently analyze the future directions
of the field (Sect. 1.6). This revolution, in our opinion, is no less significant than the
revolution from the earlier rationalist wave to empiricist one as analyzed at the
beginning (Church and Mercer 1993) and at the end of the empiricist era (Charniak
2011).
Empiricist machine learning and linguistic data analysis during the second NLP
wave started in the early 1990s by cryptanalysts and computer scientists working
on natural language sources that are highly limited in vocabulary and application
domains. As we discussed in Sect. 1.3, surface-level text observations, i.e., words
and their sequences, are counted using discrete probabilistic models without relying
on deep structure in natural language. The basic representations were “one-hot” or
localist, where no semantic similarity between words was exploited. With restric-
tions in domains and associated text content, such structure-free representations and
empirical models are often sufficient to cover much of what needs to be covered.
That is, the shallow, count-based statistical models can naturally do well in limited
and specific NLP tasks. But when the domain and content restrictions are lifted for
more realistic NLP applications in the real world, count-based models would necessarily
become ineffective, no matter how many tricks of smoothing have been invented
in an attempt to mitigate the problem of combinatorial counting sparseness. This
is where deep learning for NLP truly shines—distributed representations of words
via embedding, semantic generalization due to the embedding, longer span deep
sequence modeling, and end-to-end learning methods have all contributed to beat-
ing empiricist, count-based methods in a wide range of NLP tasks as discussed in
Sect. 1.4.
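To make this contrast concrete, the short illustration below compares one-hot and distributed representations of three words; the embedding values are invented solely for illustration and are not taken from any particular model.

import numpy as np

vocab = ["cat", "dog", "car"]

# One-hot (localist) representations: every pair of distinct words is orthogonal,
# so "cat" is no more similar to "dog" than it is to "car".
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

# Distributed (embedding) representations: the values below are made up for
# illustration; in practice they would be learned from data, e.g., by word2vec
# or as part of a larger network.
embedding = {
    "cat": np.array([0.8, 0.1, 0.2]),
    "dog": np.array([0.7, 0.2, 0.1]),
    "car": np.array([0.1, 0.9, 0.0]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(one_hot["cat"], one_hot["dog"]))      # 0.0: no similarity captured
print(cosine(embedding["cat"], embedding["dog"]))  # high: semantically close
print(cosine(embedding["cat"], embedding["car"]))  # low: semantically distant

The semantic generalization credited to embeddings above follows directly from this property: words that behave similarly in data end up with nearby vectors, so evidence about one word transfers to its neighbors.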

1.5.2 Limitations of Current Deep Learning Technology

Despite the spectacular successes of deep learning in NLP tasks, most notably in
speech recognition/understanding, language modeling, and machine translation,
there remain huge challenges. The current deep learning methods, which treat neural
networks as a black box, generally lack interpretability, let alone
explainability, in contrast to the "rationalist" paradigm established during the first
NLP wave where the rules devised by experts were naturally explainable. In practice,
however, it is highly desirable to explain the predictions from a seemingly “black-
box” model, not only for improving the model but for providing the users of the
prediction system with interpretations of the suggested actions to take (Koh and
Liang 2017).
In a number of applications, deep learning methods have proved to give recognition
accuracy close to or exceeding that of humans, but they require considerably more
training data, power consumption, and computing resources than humans do. Also,
the accuracy results are statistically impressive but often unreliable on an individual
basis. Further, most current deep learning models have no reasoning and
explaining capabilities, making them vulnerable to disastrous failures or attacks without
the ability to foresee and thus to prevent them. Moreover, current NLP models
have not taken into account the need for developing and executing goals and plans
for decision-making via ultimate NLP systems. A more specific limitation of current
NLP methods based on deep learning is their poor ability to understand and reason
about inter-sentential relationships, although huge progress has been made in modeling
relationships among words and phrases within sentences.
As discussed earlier, the success of deep learning in NLP has largely come from a
simple strategy thus far—given an NLP task, apply standard sequence models based
on (bidirectional) LSTMs, add attention mechanisms if information required in the
task needs to flow from another source, and then train the full models in an end-to-
end manner. However, while sequence modeling is naturally appropriate for speech,
human understanding of natural language (in text form) requires more complex
structure than sequence. That is, current sequence-based deep learning systems for
NLP can be further advanced by exploiting modularity, structured memories, and
recursive, tree-like representations for sentences and larger text (Manning 2016).
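For readers who prefer a concrete picture of the simple strategy described above, the following PyTorch sketch wires a bidirectional LSTM encoder to a basic attention mechanism and a classifier head; all layer sizes and the classification setup are illustrative assumptions rather than a system from the cited literature.

import torch
import torch.nn as nn

class BiLSTMWithAttention(nn.Module):
    """Illustrative 'standard recipe': embed, run a BiLSTM, attend, predict."""
    def __init__(self, vocab_size=10000, emb_dim=128, hidden=256, n_classes=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True,
                               bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)   # scores each time step
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, tokens):                          # tokens: (batch, seq_len)
        h, _ = self.encoder(self.embed(tokens))         # (batch, seq, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)    # attention over time steps
        context = (weights * h).sum(dim=1)              # weighted average of states
        return self.out(context)                        # (batch, n_classes)

# Training then reduces to a standard end-to-end supervised loop, e.g.:
# logits = model(batch_tokens); loss = nn.functional.cross_entropy(logits, labels)

The structured alternatives advocated above (modularity, structured memories, recursive or tree-like composition) would replace or augment the flat sequence encoder in this sketch rather than the training loop around it.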
To overcome the challenges outlined above and to achieve the ultimate success
of NLP as a core artificial intelligence field, both fundamental and applied research
are needed. The next wave of NLP and artificial intelligence will not come until
researchers create new paradigmatic, algorithmic, and computational (including hardware)
breakthroughs. Here, we outline several high-level directions toward such potential
breakthroughs.

1.6 Future Directions of NLP

1.6.1 Neural-Symbolic Integration

A potential breakthrough is in developing advanced deep learning models and methods
that are more effective than current methods in building, accessing, and exploiting
memories and knowledge, including, in particular, common-sense knowledge.
It is not clear how to best integrate the current deep learning methods, centered
on distributed representations (of everything), with explicit, easily interpretable, and
localist-represented knowledge about natural language and the world and with related
reasoning mechanisms.
One path to this goal is to seamlessly combine neural networks and symbolic
language systems. These NLP and artificial intelligence systems will aim to discover
by themselves the underlying causes or logical rules that shape their prediction and
decision-making processes, and to make those processes interpretable to human users
in symbolic natural language form. Recently, very preliminary work in this direction made use of an integrated
neural-symbolic representation called tensor-product neural memory cells, capable
of decoding back to symbolic forms. This structured neural representation is provably
lossless in the coded information after extensive learning within the neural-tensor
domain (Palangi et al. 2017; Smolensky et al. 2016; Lee et al. 2016). Extensions
of such tensor-product representations, when applied to NLP tasks such as machine
reading and question answering, aim to learn to process and understand massive
natural language documents. After learning, the systems will be able not only to
answer questions sensibly but also to truly understand what they read, to the extent that
they can convey such understanding to human users by providing clues as to what steps
have been taken to reach the answer. These steps may be in the form of logical reasoning
expressed in natural language, which is thus naturally understood by the human
users of this type of machine reading and comprehension system. In our view, natural
language understanding is not just accurately predicting an answer to a question,
with relevant passages or data graphs as contextual knowledge, in a supervised
way after seeing many examples of matched questions–passages–answers. Rather,
the desired NLP system equipped with real understanding should resemble human
cognitive capabilities. As an example of such capabilities (Nguyen et al. 2017)—
after an understanding system is trained well, say, in a question answering task
(using supervised learning or otherwise), it should master all essential aspects of the
observed text material provided to solve the question answering tasks. What such
mastering entails is that the learned system can subsequently perform well on other
NLP tasks, e.g., translation, summarization, recommendation, etc., without seeing
additional paired data such as raw text data with its summary, or parallel English and
Chinese texts, etc.
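The binding and unbinding operations at the heart of tensor-product representations can be illustrated with a toy numerical example; the filler and role vectors below are invented for illustration, but they exhibit the lossless decoding property mentioned above when the role vectors are orthonormal.

import numpy as np

# Fillers: distributed vectors for the symbols to be stored (toy values).
fillers = {"John": np.array([1.0, 0.0, 0.0]),
           "Mary": np.array([0.0, 1.0, 0.0])}

# Roles: orthonormal vectors for the structural positions (subject, object).
roles = {"subj": np.array([1.0, 0.0]),
         "obj":  np.array([0.0, 1.0])}

# Bind each filler to its role with an outer product and superimpose the
# bindings into one tensor, encoding the structure "John <verb> Mary".
T = (np.outer(fillers["John"], roles["subj"])
     + np.outer(fillers["Mary"], roles["obj"]))

# Unbinding: because the roles are orthonormal, multiplying the tensor by a
# role vector recovers the corresponding filler exactly, i.e., the symbolic
# content can be decoded back without loss.
print(T @ roles["subj"])   # [1. 0. 0.] -> "John"
print(T @ roles["obj"])    # [0. 1. 0.] -> "Mary"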
One way to examine the nature of such powerful neural-symbolic systems is
to regard them as ones incorporating the strength of the “rationalist” approaches
marked by expert reasoning and structure richness popular during the first wave of
NLP discussed in Sect. 1.2. Interestingly, prior to the rise of the deep learning (third)
wave of NLP, Church (2007) argued that the pendulum from rationalist to empiricist
approaches had swung too far, at almost the peak of the second NLP wave, and
predicted that a new rationalist wave would arrive. However, rather than swinging
back to a renewed rationalist era of NLP, the deep learning era arrived in full force within
just a short period after Church (2007) was written. Instead of adding the rationalist
flavor, deep learning has been pushing the empiricism of NLP to its pinnacle with
big data and big compute, and with conceptually revolutionary ways of representing
a sweeping range of linguistic entities by massive parallelism and distributedness,
thus drastically enhancing the generalization capability of new-generation NLP mod-
els. Only after the sweeping successes of current deep learning methods for NLP
(Sect. 1.4) and subsequent analyses of a series of their limitations, do researchers
look into the next wave of NLP—not swinging back to rationalism while abandon-
ing empiricism but developing more advanced deep learning paradigms that would
organically integrate the missing essence of rationalism into the structured neural
methods that are aimed to approach human cognitive functions for language.

1.6.2 Structure, Memory, and Knowledge

As discussed earlier in this chapter as well as in the current NLP literature (Man-
ning and Socher 2017), NLP researchers at present still have very primitive deep
learning methods for exploiting structure and for building and accessing memories
or knowledge. While LSTM (with attention) has been pervasively applied to NLP
tasks to beat many NLP benchmarks, LSTM is far from a good memory model
for human cognition. In particular, LSTM lacks adequate structure for simulating
episodic memory, and one key component of human cognitive ability is to retrieve
and re-experience aspects of a past novel event or thought. This ability gives rise
to one-shot learning skills and can be crucial in reading comprehension of natural
language text or speech understanding, as well as reasoning over events described by
natural language. Many recent studies have been devoted to better memory model-
ing, including external memory architectures with supervised learning (Vinyals et al.
2016; Kaiser et al. 2017) and augmented memory architectures with reinforcement
learning (Graves et al. 2016; Oh et al. 2016). However, they have not shown general
effectiveness and have suffered from a number of limitations, notably
scalability (arising from the use of attention, which has to access every stored element
in the memory). Much work remains in the direction of better modeling of memory
and exploitation of knowledge for text understanding and reasoning.

1.6.3 Unsupervised and Generative Deep Learning

Another potential breakthrough in deep learning for NLP is in new algorithms for
unsupervised deep learning, which ideally makes use of no direct teaching signals
paired with the inputs (token by token) to guide learning. Word embedding discussed
in Sect. 1.4 can be viewed as a weak form of unsupervised learning, making use of
adjacent words as “cost-free” surrogate teaching signals, but for real-world NLP pre-
diction tasks, such as translation, understanding, summarization, etc., such embed-
ding obtained in an “unsupervised manner” has to be fed into another supervised
architecture which requires costly teaching signals. In truly unsupervised learning
which requires no expensive teaching signals, new types of objective functions and
new optimization algorithms are needed, e.g., the objective function for unsupervised
learning should not require explicit target label data aligned with the input data as
in cross entropy that is most popular for supervised learning. Development of unsu-
pervised deep learning algorithms has been significantly behind that of supervised
and reinforcement deep learning, where the backpropagation and Q-learning algorithms,
respectively, are reasonably mature.
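As a toy illustration of this point (all numbers are invented), the snippet below contrasts a supervised cross-entropy objective, which needs a human-provided label aligned with each input, with a "cost-free" surrogate objective in which the target is simply an adjacent word drawn from the raw text, as in word-embedding training.

import numpy as np

def cross_entropy(probs, target_index):
    """Negative log-probability assigned to the target class."""
    return -np.log(probs[target_index])

# Supervised objective: needs a human-provided label aligned with the input,
# e.g., the sentiment class of a sentence (costly teaching signal).
model_probs = np.array([0.1, 0.7, 0.2])   # toy model output over 3 classes
gold_label = 1                            # annotated by a human
print(cross_entropy(model_probs, gold_label))

# "Cost-free" surrogate objective: the target is just the adjacent word in the
# raw text, so no annotation is required (the idea behind word-embedding
# training); truly unsupervised objectives would dispense with aligned targets
# altogether.
corpus = ["the", "cat", "sat"]
next_word_probs = np.array([0.05, 0.8, 0.15])  # toy P(word | "cat") over corpus
target = corpus.index("sat")                    # derived from the data itself
print(cross_entropy(next_word_probs, target))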
The most recent preliminary development in unsupervised learning takes the
approach of exploiting sequential output structure and advanced optimization meth-
ods to alleviate the need for using labels in training prediction systems (Russell and
Stefano 2017; Liu et al. 2017). Promising future advances in unsupervised learning lie
in exploiting new sources of learning signals, including the structure of the input data
and the mapping relationships from input to output and vice versa. Exploiting the rela-
tionship from output to input is closely connected to building conditional generative
models. To this end, the recent popular topic in deep learning—generative adversar-
ial networks (Goodfellow et al. 2014)—is a highly promising direction where the
long-standing concept of analysis-by-synthesis in pattern recognition and machine
learning is likely to return to spotlight in the near future in solving NLP tasks in new
ways.
Generative adversarial networks have been formulated as neural nets, with dense
connectivity among nodes and with no probabilistic setting. On the other hand,
probabilistic and Bayesian reasoning, which often takes computational advantage
of sparse connections among “nodes” as random variables, has been one of the
principal theoretical pillars of machine learning and has been responsible for many
NLP methods developed during the empiricist wave of NLP discussed in Sect. 1.3.
What is the right interface between deep learning and probabilistic modeling? Can
probabilistic thinking help understand deep learning techniques better and motivate
new deep learning methods for NLP tasks? How about the other way around? These
issues are widely open for future research.

1.6.4 Multimodal and Multitask Deep Learning

Multimodal and multitask deep learning are related learning paradigms, both con-
cerning the exploitation of latent representations in the deep networks pooled from
different modalities (e.g., audio, speech, video, images, text, source codes, etc.) or
from multiple cross-domain tasks (e.g., point and structured prediction, ranking,
recommendation, time-series forecasting, clustering, etc.). Before the deep learning
wave, multimodal and multitask learning had been very difficult to make effective,
due to the lack of intermediate representations shared across modalities or tasks.
A most striking example of this contrast in multitask learning is multilingual
speech recognition during the empiricist wave (Lin et al. 2008) and during the deep
learning wave (Huang et al. 2013a).
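As a rough sketch of why shared intermediate representations matter for multitask learning (the model, tasks, and layer sizes below are illustrative assumptions, not the systems cited above), consider a single encoder feeding two task-specific output heads.

import torch
import torch.nn as nn

class MultitaskModel(nn.Module):
    """Illustrative multitask setup: one shared encoder, one head per task."""
    def __init__(self, vocab_size=10000, emb_dim=128, hidden=256):
        super().__init__()
        # Shared intermediate representation, pooled across tasks.
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True)
        # Task-specific heads, e.g., topic classification and sentiment.
        self.topic_head = nn.Linear(hidden, 20)
        self.sentiment_head = nn.Linear(hidden, 2)

    def forward(self, tokens, task):
        _, (h, _) = self.encoder(self.embed(tokens))   # h: (1, batch, hidden)
        shared = h[-1]                                  # shared representation
        if task == "topic":
            return self.topic_head(shared)
        return self.sentiment_head(shared)

# Training alternates batches from the two tasks; gradients from both tasks flow
# into the shared encoder, which is where the cross-task transfer comes from.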
Multimodal information can be exploited as low-cost supervision. For instance,
standard speech recognition, image recognition, and text classification methods make
use of supervision labels within each of the speech, image, and text modalities sepa-
rately. This, however, is far from how children learn to recognize speech and images
and to classify text. For example, children often get the distant "supervision" signal for