Deep Learning and Linguistic Representation
Chapman & Hall/CRC Machine Learning & Pattern Recognition
Shalom Lappin
First edition published 2021
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
The right of Shalom Lappin to be identified as author of this work has been asserted by him in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.
Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use.
The authors and publishers have attempted to trace the copyright holders of all material reproduced
in this publication and apologize to copyright holders if permission to publish in this form has not
been obtained. If any copyright material has not been acknowledged please write and let us know so
we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact mpkbookspermissions@tandf.co.uk
Trademark notice: Product or corporate names may be trademarks or registered trademarks and are
used only for identification and explanation without intent to infringe.
Over the past 15 years deep learning has produced a revolution in artificial intelligence. In natural language processing it has created robust, large-coverage systems that achieve impressive results across a wide range of applications, many of which had been resistant to more traditional machine learning methods and to symbolic approaches. Deep neural networks (DNNs) have become dominant throughout many domains of AI in general, and in NLP in particular, by virtue of their success as engineering techniques.
Recently, a growing number of computational linguists and cognitive
scientists have been asking what deep learning might teach us about the
nature of human linguistic knowledge. Unlike the early connectionists of
the 1980s, these researchers have generally avoided making claims about
analogies between deep neural networks and the operations of the brain.
Instead, they have considered the implications of these models for the
cognitive foundations of natural language, in nuanced and indirect ways.
In particular, they are interested in the types of syntactic structure that
DNNs identify, and the semantic relations that they can recognise. They
are concerned with the manner in which DNNs represent this information, and the training procedures through which they obtain it.
This line of research suggests that it is worth exploring points of
similarity and divergence between the ways in which DNNs and humans
encode linguistic information. The extent to which DNNs approach (and,
in some cases, surpass) human performance on linguistically interesting
NLP tasks, through efficient learning, gives some indication of the capacity of largely domain general computational learning devices for language learning. An obvious question is whether humans could, in principle, acquire this knowledge through similar sorts of learning processes.
This book draws together work on deep learning applied to natural
language processing that I have done, together with colleagues, over the
past eight years. It focusses on the question of what current methods
of machine learning can contribute to our understanding of the way in
which humans acquire and represent knowledge of the syntactic and
Shashi Kumar, the LaTeX support person, have given me much needed assistance throughout the production process.
My family, particularly my children and grandchildren, are the source
of joy and wider sense of purpose needed to complete this, and many
other projects. While they share the consensus that I am a hopeless nerd,
they assure me that the scientific issues that I discuss here are worthwhile. They frequently ask thoughtful questions that help to advance my thinking on these issues. They remain permanently surprised that someone so obviously out of it could work in such a cool field. In addition to
having them, this is indeed one of the many blessings I enjoy. Above all,
my wife Elena is a constant source of love and encouragement. Without
her none of this would be possible.
The book was written in the shadow of the Covid-19 pandemic. This
terrible event has brought at least three phenomena clearly into view.
First, it has underlined the imperative of taking the results of scientific
research seriously, and applying them to public policy decisions. Leaders
who dismiss well motivated medical advice, and respond to the crisis
through denial and propaganda, are inflicting needless suffering on their
people. By contrast, governments that allow themselves to be guided by
well supported scientific work have been able to mitigate the damage
that the crisis is causing.
Second, the crisis has provided a case study in the damage that
large scale campaigns of disinformation, and defamation, can cause to
the health and the well-being of large numbers of people. Unfortunately,
digital technology, some of it involving NLP applications, has provided
the primary devices through which these campaigns are conducted. Computer scientists working on these technologies have a responsibility to
address the misuse of their work for socially destructive purposes. In
many cases, this same technology can be applied to filter disinformation
and hate propaganda. It is also necessary to insist that the agencies for
which we do this work be held accountable for the way in which they
use it.
Third, the pandemic has laid bare the devastating effects of extreme
economic and social inequality, with the poor and ethnically excluded
bearing the brunt of its effects. Nowhere has this inequality been more
apparent than in the digital technology industry. The enterprises of this
industry sustain much of the innovative work being done in deep learning. They also instantiate the sharp disparities of wealth, class, and opportunity that the pandemic has forced into glaring relief. The engineering and scientific advances that machine learning is generating hold
out the promise of major social and environmental benefit. In order for
this promise to be realised, it is necessary to address the acute deficit in
democratic accountability, and in equitable economic arrangements that
the digital technology industry has helped to create.
Scientists working in this domain can no longer afford to treat these
problems as irrelevant to their research. The survival and stability of
the societies that sustain this research depend on finding reasonable
solutions to them.
NLP has blossomed into a wonderfully vigorous field of research.
Deep learning is still in its infancy, and it is likely that the architectures of its systems will change radically in the near future. By using it
to achieve perspective on human cognition, we stand to gain important
insight into linguistic knowledge. In pursuing this work it is essential
that we pay close attention to the social consequences of our scientific
research.
Shalom Lappin
London
October, 2020
CHAPTER 1
Introduction: Deep Learning in Natural Language Processing
which confirms that this effect is a genuine property of the data, rather
than regression to the mean.
Lau et al. (2020) expand the set of neural language models to
include unidirectional and bidirectional transformers. They find that
bidirectional, but not unidirectional, transformers approach a plausible
estimated upper bound on individual human prediction of sentence acceptability, across context types. This result raises interesting questions
concerning the role of directionality in human sentence processing.
In Chapter 5 I discuss whether DNNs, particularly those described
in previous chapters, offer cognitively plausible models of linguistic representation and language acquisition. I suggest that if linguistic theories
provide accurate explanations of linguistic knowledge, then NLP systems
that incorporate their insights should perform better than those that
do not, and I explore whether these theories, specifically those of formal syntax, have, in fact, made significant contributions to solving NLP
tasks. Answering this question involves looking at more recent DNNs
enriched with syntactic structure. I also compare DNNs with grammars,
as models of linguistic knowledge. I respond to criticisms that Sprouse, Yankama, Indurkhya, Fong, and Berwick (2018) raise against the work of Lau, Clark, and Lappin (2017) on neural language models for the sentence acceptability task, which supports the view that syntactic knowledge is probabilistic rather than binary in nature. Finally, I consider three well-known
cases from the history of linguistics and cognitive science in which theorists reject an entire class of models as unsuitable for encoding human linguistic knowledge, on the basis of the limitations of a particular member of the class. The success of more sophisticated models in the class has subsequently shown these inferences to be unsound. They represent influential cases of overreach, in which convincing criticism of a fairly simple computational model is used to dismiss all models of a given type, without considering straightforward improvements that avoid the limitations of the simpler system.
I conclude Chapter 5 with a discussion of the application of deep
learning to distributional semantics. I first briefly consider the type-theoretic model that Coecke, Sadrzadeh, and Clark (2010) and Grefenstette, Sadrzadeh, Clark, Coecke, and Pulman (2011) develop to construct compositional interpretations for phrases and sentences from distributional vectors, on the basis of the syntactic structure specified by a pregroup grammar. This view poses a number of conceptual and empirical problems. I then suggest an alternative approach on which semantic interpretation in a deep learning context is an instance of sequence to sequence
NLP tasks, we stand to gain useful insights into possible ways of encoding and representing natural language as part of the language learning process. We will also deepen our understanding of the relative contributions of domain general induction procedures on one hand, and language specific learning biases on the other, to the success and efficiency of this process.
(ii) one or more hidden layers, in which units (neurons) compute the
weights for components of the data, and
$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$$
This function applies the exponential function to each input value and normalises it by dividing it by the sum of the exponentials of all the inputs, to ensure that the output values sum to 1. The softmax function is widely used in the output layer of a DNN to generate a probability distribution for a classifier, or for a probability model.
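As a concrete illustration of this definition (a minimal sketch, not drawn from the text), the following NumPy function computes the softmax of a vector of scores; subtracting the maximum score is a standard trick for numerical stability and does not change the result, since softmax is invariant under constant shifts.

```python
import numpy as np

def softmax(z):
    """Map a vector of real-valued scores to a probability distribution."""
    # Shift by the maximum for numerical stability; the result is unchanged.
    exp_z = np.exp(z - np.max(z))
    return exp_z / exp_z.sum()

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))  # approximately [0.659, 0.242, 0.099]; the values sum to 1
```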
Words are represented in a DNN by vectors of real numbers. Each
element of the vector expresses a distributional feature of the word.
These features are the dimensions of the vectors, and they encode its
co-occurrence patterns with other words in a training corpus. Word embeddings are generally compressed into low dimensional vectors (200–300 dimensions) that express similarity and proximity relations among the words in the vocabulary of a DNN model. These models frequently use large pre-trained word embeddings, like word2vec (Mikolov, Kombrink, Deoras, Burget, & Černocký, 2011) and GloVe (Pennington, Socher, & Manning, 2014), compiled from millions of words of text.
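To make the idea of proximity relations concrete, here is a small illustrative sketch: the vectors below are invented toy embeddings (real ones, such as word2vec or GloVe vectors, have 200–300 dimensions), and cosine similarity is the usual measure of how close two words lie in the embedding space.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors: 1.0 means identical direction."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy 4-dimensional vectors, invented purely for illustration.
embeddings = {
    "king":  np.array([0.8, 0.65, 0.1, 0.05]),
    "queen": np.array([0.75, 0.7, 0.15, 0.1]),
    "apple": np.array([0.05, 0.1, 0.9, 0.8]),
}

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high: distributionally similar words
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # low: dissimilar words
```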
In supervised learning a DNN is trained on data annotated with
the features that it is learning to predict. For example, if the DNN is
learning to identify the objects that appear in graphic images, then its
training data may consist of large numbers of labelled images of the
objects that it is intended to recognise in photographs. In unsupervised
learning the training data are not labelled.5 A generative neural language
model may be trained on large quantities of raw text. It will generate the
most likely word in a sequence, given the previous words, on the basis
of the probability distribution over words, and sequences of words, that
it estimates from the unlabelled training corpus.
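The following sketch (purely illustrative, not the author's model) shows this estimation step in miniature: it derives a next-word distribution from raw, unlabelled text using simple bigram counts, standing in for the distribution that a generative neural language model learns.

```python
from collections import Counter, defaultdict

# Raw, unlabelled text: the only supervision signal is the word sequence itself.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each word follows each other word.
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def predict_next(word):
    """Return the most likely next word and its estimated probability."""
    counts = bigram_counts[word]
    total = sum(counts.values())
    best, freq = counts.most_common(1)[0]
    return best, freq / total

print(predict_next("the"))  # ('cat', 0.25): 'the' is followed once each by cat, mat, dog, rug
```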
[Figure: a recurrent network cell A unrolled through time, with inputs x_0, x_1, x_2, ..., x_t fed to the cell at each step and hidden states h_0, h_1, h_2, ..., h_t produced as outputs]
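The unrolled diagram corresponds to the recurrence sketched below in NumPy, under assumed toy dimensions: the same cell, with shared weights, is applied at every time step, combining the current input x_t with the previous hidden state.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One step of a simple (Elman) RNN cell: combine current input with previous state."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 3, 4, 5

# The same weights are reused at every time step (parameter sharing).
W_xh = rng.normal(size=(input_dim, hidden_dim))
W_hh = rng.normal(size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(seq_len, input_dim)):  # x_0, x_1, ..., x_t
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)          # h_0, h_1, ..., h_t
print(h.shape)  # (4,): the final hidden state summarises the whole sequence
```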