Advances in Intelligent Systems and Computing 1232
M. Arif Wani
Taghi M. Khoshgoftaar
Vasile Palade Editors
Deep Learning Applications, Volume 2
Advances in Intelligent Systems and Computing
Volume 1232
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland
Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing,
Universidad Central de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, School of Computer Science and Electronic Engineering,
University of Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University,
Gyor, Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas
at El Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao
Tung University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology,
University of Technology Sydney, Sydney, NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute
of Technology, Tijuana, Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro,
Rio de Janeiro, Brazil
Ngoc Thanh Nguyen, Faculty of Computer Science and Management,
Wrocław University of Technology, Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering,
The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications on theory, applications, and design methods of Intelligent Systems and Intelligent Computing. Virtually all disciplines such as engineering, natural sciences, computer and information science, ICT, economics, business, e-commerce, environment, healthcare, life science are covered. The list of topics spans all the areas of modern intelligent systems and computing such as: computational intelligence, soft computing including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms, social intelligence, ambient intelligence, computational neuroscience, artificial life, virtual worlds and society, cognitive science and systems, perception and vision, DNA and immune based systems, self-organizing and adaptive systems, e-Learning and teaching, human-centered and human-centric computing, recommender systems, intelligent control, robotics and mechatronics including human-machine teaming, knowledge-based paradigms, learning paradigms, machine ethics, intelligent data analysis, knowledge management, intelligent agents, intelligent decision making and support, intelligent network security, trust management, interactive entertainment, Web intelligence and multimedia.
The publications within “Advances in Intelligent Systems and Computing” are
primarily proceedings of important conferences, symposia and congresses. They
cover significant recent developments in the field, both of a foundational and
applicable character. An important characteristic feature of the series is the short
publication time and world-wide distribution. This permits a rapid and broad
dissemination of research results.
** Indexing: The books of this series are submitted to ISI Proceedings,
EI-Compendex, DBLP, SCOPUS, Google Scholar and Springerlink **
Editors

M. Arif Wani
Department of Computer Science
University of Kashmir
Srinagar, India

Taghi M. Khoshgoftaar
Computer and Electrical Engineering
Florida Atlantic University
Boca Raton, FL, USA

Vasile Palade
Faculty of Engineering and Computing
Coventry University
Coventry, UK
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2021
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with regard
to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface
Machine learning algorithms have influenced many aspects of our day-to-day living and transformed major industries around the world. Fueled by an exponential growth of data, improvements in computer hardware, scalable cloud resources, and accessible open-source frameworks, machine learning technology is being used by companies big and small alike for innumerable applications. At home, machine learning models are suggesting TV shows, movies, and music for entertainment, providing personalized e-commerce suggestions, shaping our digital social networks, and improving the efficiency of our appliances. At work, these data-driven methods are filtering our emails, forecasting trends in productivity and sales, targeting customers with advertisements, improving the quality of video conferences, and guiding critical decisions. At the frontier of machine learning innovation are deep learning systems, a class of multi-layered networks capable of automatically learning meaningful hierarchical representations from a variety of structured and unstructured data. Breakthroughs in deep learning allow us to generate new representations, extract knowledge, and draw inferences from raw images, video streams, text and speech, time series, and other complex data types. These powerful deep learning methods are being applied to new and exciting real-world problems in medical diagnostics, factory automation, public safety, environmental sciences, autonomous transportation, military applications, and much more.
The family of deep learning architectures continues to grow as new methods and techniques are developed to address a wide variety of problems. A deep learning network is composed of multiple layers that form universal approximators capable of learning any function. For example, the convolutional layers in Convolutional Neural Networks use shared weights and spatial invariance to efficiently learn hierarchical representations from images, natural language, and temporal data. Recurrent Neural Networks use backpropagation through time to learn from variable-length sequential data. Long Short-Term Memory networks are a type of recurrent network capable of learning order dependence in sequence prediction problems. Deep Belief Networks, Autoencoders, and other unsupervised models generate meaningful latent features for downstream tasks and model the underlying concepts of distributions by reconstructing their inputs. Generative Adversarial Networks
conferences and has given many invited talks at various venues. Also, he has served
as North American Editor of the Software Quality Journal, was on the editorial
boards of the journals Multimedia Tools and Applications, Knowledge and
Information Systems, and Empirical Software Engineering, and is on the editorial
boards of the journals Software Quality, Software Engineering and Knowledge
Engineering, and Social Network Analysis and Mining.
Deep Learning-Based Recommender Systems

M. Alfarhood and J. Cheng
Abstract The term “information overload” has gained popularity over the last few years. It describes the difficulty people face in finding what they want within a huge volume of available information. Recommender systems have been recognized as an effective solution to such issues, in that suggestions are made based on users’ preferences. This chapter introduces an application of deep learning techniques in the domain of recommender systems. Generally, collaborative filtering approaches, and Matrix Factorization (MF) techniques in particular, are widely known for their convincing performance in recommender systems. We introduce a Collaborative Attentive Autoencoder (CATA) that improves matrix factorization performance by leveraging an item’s contextual data. Specifically, CATA learns the proper features from scientific articles through an attention mechanism that can capture the most pertinent parts of the information in order to make better recommendations. The learned features are then incorporated into the learning process of MF. Comprehensive experiments on three real-world datasets have shown that our method performs better than other state-of-the-art methods according to various evaluation metrics. The source code of our model is available at: https://github.com/jianlin-cheng/CATA.
This chapter is an extended version of our published paper at the IEEE ICMLA conference 2019 [1]. This chapter incorporates new experimental contributions compared to the original conference paper.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
M. A. Wani et al. (eds.), Deep Learning Applications, Volume 2, Advances in Intelligent Systems and Computing 1232, https://doi.org/10.1007/978-981-15-6759-9_1
1 Introduction
The era of e-commerce has vastly changed people’s lifestyles during the first part
of the twenty-first century. People today tend to do many of their daily routines
online, such as shopping, reading the news, and watching movies. Nevertheless,
consumers often face difficulties while exploring related items such as new fashion
trends because they are not aware of their existence due to the overwhelming amount
of information available online. This phenomenon is widely known as “information
overload”. Therefore, Recommender Systems (RSs) are a critical solution for helping
users make decisions when there are lots of choices. RSs have been integrated into
and have become an essential part of every website due to their impact on increasing
customer interactions, attracting new customers, and growing businesses’ revenue.
Scientific article recommendation is a very common application for RSs. It keeps
researchers updated on recent related work in their field. One traditional way to
find relevant articles is to go through the references section in other articles. Yet,
this approach is biased toward heavily cited articles, such that new relevant articles
with higher impact have less chance to be found. Another method is to search for
articles using keywords. Although this technique is popular among researchers, they
must filter out a tremendous number of articles from the search results to retrieve
the most suitable articles. Moreover, all users get the same search results with the
same keywords, and these results are not personalized based on the users’ personal
interests. Thus, recommendation systems can address this issue and help scientists
and researchers find valuable articles while being aware of recent related work.
Over the last few decades, a lot of effort has been made by both academia and industry on proposing new ideas and solutions for RSs, which ultimately help service providers adopt such models in their system architecture. Research in RSs has evolved remarkably following the Netflix prize competition (www.netflixprize.com) in 2006, where the company offered one million dollars to any team that could improve its recommendation accuracy by 10%. Since that time, collaborative filtering models, and matrix factorization techniques in particular, have become the most common models due to their effective performance. Generally, recommendation models are classified into three categories: Collaborative Filtering (CF) models, Content-Based Filtering (CBF) models, and hybrid models. CF models [2–4] focus on users’ histories, such that users with similar past behaviors tend to have similar future tastes. On the other hand, CBF models work by learning an item’s features from its informational description, such that two items are possibly similar to each other if they share more characteristics. For example, two songs are similar to each other if they share the same artist, genre, tempo, energy, etc. However, similarities between items in CF models are different, such that two items are likely similar to each other once they are rated by multiple users in the same manner, even though those items may have different characteristics.
against multiple recent works. The experimental results show that our model can extract more constructive information from an article’s contextual data than other models. More importantly, CATA performs very well even where data sparsity is extremely high.
The remainder of this chapter is organized in the following manner. First, we review the matrix factorization method in Sect. 2. We introduce our model, CATA, in Sect. 3. The experimental results of our model against the state-of-the-art models are discussed thoroughly in Sect. 4. We then conclude our work in Sect. 5.
2 Background
Matrix Factorization (MF) [2] is the most popular CF method, mainly due to its simplicity and efficiency. The idea behind MF is to decompose the user-item matrix, $R \in \mathbb{R}^{n \times m}$, into two lower dimensional matrices, $U \in \mathbb{R}^{n \times d}$ and $V \in \mathbb{R}^{m \times d}$, such that the inner product of $U$ and $V$ approximates the original matrix $R$, where $d$ is the dimension of the latent factors, with $d \ll \min(n, m)$, and $n$ and $m$ correspond to the number of users and items in the system. Figure 1 illustrates the MF process.

$$R \approx U \cdot V^T \qquad (1)$$

The latent factors are learned by minimizing the regularized squared error over the observed ratings:

$$\mathcal{L} = \frac{1}{2}\sum_{i,j} I_{ij}\,\bigl(r_{ij} - u_i v_j^T\bigr)^2 + \frac{\lambda_u}{2}\|U\|^2 + \frac{\lambda_v}{2}\|V\|^2 \qquad (2)$$

where $I_{ij}$ is an indicator function that equals 1 if user $i$ has rated item $j$, and 0 otherwise. Also, $\|U\|$ and $\|V\|$ are the Euclidean norms, and $\lambda_u$, $\lambda_v$ are two regularization terms preventing the values of $U$ and $V$ from becoming too large. This avoids model overfitting.
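As a concrete illustration of Eqs. (1) and (2), the following NumPy sketch evaluates the regularized MF objective on a toy rating matrix. The function and variable names (mf_objective, R, I, U, V, lam_u, lam_v) are ours for illustration and are not taken from the published CATA code.

```python
import numpy as np

def mf_objective(R, I, U, V, lam_u, lam_v):
    """Regularized MF loss of Eq. (2).

    R : (n, m) rating matrix, I : (n, m) indicator (1 where r_ij is observed),
    U : (n, d) user factors, V : (m, d) item factors.
    """
    err = I * (R - U @ V.T)              # only observed entries contribute
    return (0.5 * np.sum(err ** 2)
            + 0.5 * lam_u * np.sum(U ** 2)
            + 0.5 * lam_v * np.sum(V ** 2))

# Toy example: 4 users, 5 items, latent dimension d = 3
rng = np.random.default_rng(0)
R = rng.integers(0, 6, size=(4, 5)).astype(float)
I = (R > 0).astype(float)                # treat 0 as "not rated"
U, V = rng.normal(size=(4, 3)), rng.normal(size=(5, 3))
print(mf_objective(R, I, U, V, lam_u=0.1, lam_v=0.1))
```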
Explicit data, such as ratings ($r_{ij}$), are not regularly available. Therefore, Weighted Regularized Matrix Factorization (WRMF) [9] introduces two modifications to the previous objective function to make it work for implicit feedback. The optimization process in this case runs through all user-item pairs, with a different confidence level assigned to each pair, as in the following:

$$\mathcal{L} = \sum_{i,j \in R} \frac{c_{ij}}{2}\,\bigl(p_{ij} - u_i v_j^T\bigr)^2 + \frac{\lambda_u}{2}\sum_i \|u_i\|^2 + \frac{\lambda_v}{2}\sum_j \|v_j\|^2 \qquad (3)$$

where $p_{ij}$ is the user preference score with a value of 1 when user $i$ and item $j$ have an interaction, and 0 otherwise. $c_{ij}$ is a confidence variable whose value shows how confident we are that the user likes the item. In general, $c_{ij} = a$ when $p_{ij} = 1$, and $c_{ij} = b$ when $p_{ij} = 0$, such that $a > b > 0$.
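To make the roles of $p_{ij}$ and $c_{ij}$ concrete, a minimal sketch for building the preference and confidence matrices from a binary implicit feedback matrix might look as follows; the constants a and b and the helper name are illustrative choices, with a > b > 0 as stated above.

```python
import numpy as np

def preference_confidence(R_implicit, a=1.0, b=0.01):
    """Build P (preferences) and C (confidences) as used in Eq. (3).

    R_implicit : (n, m) binary matrix, 1 where a user-item interaction exists.
    """
    P = (R_implicit > 0).astype(float)   # p_ij = 1 iff an interaction exists
    C = np.where(P == 1, a, b)           # c_ij = a for positives, b otherwise
    return P, C
```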
Stochastic Gradient Descent (SGD) [10] and Alternating Least Squares (ALS) [11] are two optimization methods that can be used to minimize the objective function of MF in Eq. 2. The first method, SGD, loops over each single training sample and computes the prediction error as $e_{ij} = r_{ij} - u_i v_j^T$. The gradient of the objective function with respect to $u_i$ and $v_j$ can be computed as follows:

$$\frac{\partial \mathcal{L}}{\partial u_i} = -\sum_j I_{ij}\,\bigl(r_{ij} - u_i v_j^T\bigr)\,v_j + \lambda_u u_i, \qquad
\frac{\partial \mathcal{L}}{\partial v_j} = -\sum_i I_{ij}\,\bigl(r_{ij} - u_i v_j^T\bigr)\,u_i + \lambda_v v_j \qquad (4)$$
After calculating the gradient, SGD updates the user and item latent factors in the opposite direction of the gradient using the following equations:

$$u_i \leftarrow u_i + \alpha\Bigl(\sum_j I_{ij}\,e_{ij}\,v_j - \lambda_u u_i\Bigr), \qquad
v_j \leftarrow v_j + \alpha\Bigl(\sum_i I_{ij}\,e_{ij}\,u_i - \lambda_v v_j\Bigr) \qquad (5)$$
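A minimal SGD training loop implementing the error and updates of Eqs. (4) and (5) could be written as below; the learning rate, number of epochs, and initialization scale are arbitrary illustrative values, not prescribed by the chapter.

```python
import numpy as np

def mf_sgd(R, I, d=10, lr=0.01, lam_u=0.1, lam_v=0.1, epochs=50, seed=0):
    """Learn U, V with the SGD updates of Eq. (5) over the observed entries of R."""
    n, m = R.shape
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.normal(size=(n, d))
    V = 0.1 * rng.normal(size=(m, d))
    users, items = np.nonzero(I)                 # observed (i, j) pairs
    for _ in range(epochs):
        for i, j in zip(users, items):
            e_ij = R[i, j] - U[i] @ V[j]         # prediction error e_ij
            u_old = U[i].copy()                  # use old u_i in the v_j update
            U[i] += lr * (e_ij * V[j] - lam_u * U[i])
            V[j] += lr * (e_ij * u_old - lam_v * V[j])
    return U, V
```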
The second method, ALS, alternately fixes one set of latent factors and solves for the other in closed form. For the weighted objective of Eq. 3, the gradient with respect to $u_i$ is

$$\frac{\partial \mathcal{L}}{\partial u_i} = -\sum_j c_{ij}\,\bigl(p_{ij} - u_i v_j^T\bigr)\,v_j + \lambda_u u_i$$

Writing this in matrix form and setting it to zero, where $C_i$ is the diagonal matrix holding the confidences $c_{ij}$ of user $i$ and $P_i$ is the corresponding preference vector,

$$0 = -V C_i \bigl(P_i - V^T u_i\bigr) + \lambda_u u_i$$
$$V C_i P_i = \bigl(V C_i V^T + \lambda_u I\bigr)\,u_i$$
$$u_i = \bigl(V C_i V^T + \lambda_u I\bigr)^{-1} V C_i P_i \qquad (6)$$

and, by symmetry, the item update is

$$v_j = \bigl(U C_j U^T + \lambda_v I\bigr)^{-1} U C_j P_j \qquad (7)$$
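The closed-form updates of Eqs. (6) and (7) translate directly into code. The following unoptimized NumPy sketch stores U as a d x n matrix and V as a d x m matrix so the updates read exactly as in the equations; it is an illustration of weighted ALS, not the authors' implementation.

```python
import numpy as np

def als_wrmf(P, C, d=10, lam_u=0.1, lam_v=0.1, iters=15, seed=0):
    """Alternating Least Squares for weighted MF (Eqs. 6 and 7).

    P : (n, m) binary preference matrix, C : (n, m) confidence matrix.
    Returns U with shape (d, n) and V with shape (d, m).
    """
    n, m = P.shape
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.normal(size=(d, n))
    V = 0.1 * rng.normal(size=(d, m))
    I_d = np.eye(d)
    for _ in range(iters):
        for i in range(n):                   # u_i = (V C_i V^T + lam_u I)^-1 V C_i P_i
            Ci = np.diag(C[i])
            A = V @ Ci @ V.T + lam_u * I_d
            U[:, i] = np.linalg.solve(A, V @ Ci @ P[i])
        for j in range(m):                   # v_j = (U C_j U^T + lam_v I)^-1 U C_j P_j
            Cj = np.diag(C[:, j])
            A = U @ Cj @ U.T + lam_v * I_d
            V[:, j] = np.linalg.solve(A, U @ Cj @ P[:, j])
    return U, V
```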
3 Proposed Model
In this section, we illustrate our proposed model in depth. The intuition behind our model is to learn the latent factors of items in PMF with the use of the available side textual content. We use an attentive unsupervised model to capture richer information from the available data. The architecture of our model is displayed in Fig. 2. We first define the problem with implicit feedback before we go through the details of our model.
[Fig. 2: The architecture of our model. An attentive autoencoder (encoder, softmax attention producing Z_j, decoder reconstructing X̂_j) over the item content X_j is coupled with the matrix factorization latent factors U_i and V_j, with regularization λ_u and λ_v, which generate the ratings R_ij for i = 1:n users and j = 1:m items.]
If we treat all missing data as unobserved data without including negative feedback in the model training, the resulting trained model is probably useless, since it is only trained on positive data. As a result, sampling negative feedback from the unobserved data is one practical solution to this problem, which has been proposed by [12]. In addition, Weighted Regularized Matrix Factorization (WRMF) [9] is another proposed solution that introduces a confidence variable that works as a weight to measure how likely a user is to like an item.
In general, the recommendation problem with implicit data is usually formulated as follows:

$$R_{ij} = \begin{cases} 1, & \text{if there is a user-item interaction} \\ 0, & \text{otherwise} \end{cases} \qquad (8)$$
where the ones in the implicit feedback represent all the positive feedback. However, it is important to note that a value of 0 does not always imply negative feedback. It may be that users are simply not aware of the existence of those items. In addition, the user-item interaction matrix ($R$) is usually highly imbalanced, such that the number of observed interactions is much smaller than the number of unobserved interactions. In other words, matrix $R$ is very sparse, meaning that users interact explicitly or implicitly with only a very small number of items compared to the total number of items in the matrix. Sparsity is a frequent problem in RSs, and it is a real challenge for any proposed model to provide effective personalized recommendations under this situation. The following sections explain our methodology, where we aim to eliminate the influence of the aforementioned problems.
An autoencoder [13] is an unsupervised learning neural network that is useful for compressing high-dimensional input data into a lower dimensional representation while preserving the abstract nature of the data. The autoencoder network is generally composed of two main components, the encoder and the decoder. The encoder takes the input, encodes it through multiple hidden layers, and then generates a compressed representative vector, $Z_j$. The encoding function can be formulated as $Z_j = f(X_j)$. Subsequently, the decoder can then be used to reconstruct and estimate the original input, $\hat{X}_j$, from the representative vector $Z_j$. The decoder function can be formulated as $\hat{X}_j = f(Z_j)$. The encoder and the decoder usually each consist of the same number of hidden layers and neurons. The output of each hidden layer is computed as follows:

$$h^{(\ell)} = \sigma\bigl(W^{(\ell)} h^{(\ell-1)} + b^{(\ell)}\bigr) \qquad (9)$$

where $(\ell)$ is the layer number, $W$ is the weights matrix, $b$ is the bias vector, and $\sigma$ is a non-linear activation function. We use the Rectified Linear Unit (ReLU) as the activation function.
Our model takes as input the article’s textual data, $X_j = \{x_1, x_2, \ldots, x_s\}$, where each $x_i$ is a value in $[0, 1]$ and $s$ represents the vocabulary size of the articles’ titles and abstracts. In other words, the input of our autoencoder network is a normalized bag-of-words histogram of the filtered vocabulary of the articles’ titles and abstracts.
Batch Normalization (BN) [14] has been proven to be a proper solution to the internal covariate shift problem, where the distribution of each layer’s inputs in a deep neural network changes over the course of training, making the model difficult to train. In addition, BN can act as a regularization procedure, like Dropout [15], in deep neural networks. Accordingly, we apply a batch normalization layer after each hidden layer in our autoencoder to obtain a stable distribution from each layer’s output.
Furthermore, we use the idea of the attention mechanism between the encoder and the decoder, such that only the relevant parts of the encoder output are selected for the input reconstruction. Attention in deep learning can be described simply as a vector of weights expressing the importance of the input elements. Thus, the intuition behind attention is that not all parts of the input are equally significant, i.e., only a few parts are significant for the model. We first calculate the scores as the probability distribution of the encoder’s output using the $\mathrm{softmax}(\cdot)$ function:

$$f(z_c) = \frac{e^{z_c}}{\sum_d e^{z_d}} \qquad (10)$$

The probability distribution and the encoder output are then multiplied element-wise to obtain $Z_j$.
We use the attentive autoencoder to pretrain the items’ contextual information and then integrate the compressed representation, $Z_j$, into the computation of the items’ latent factors, $V_j$, in the matrix factorization method. The dimension spaces of $Z_j$ and $V_j$ are set to be equal to each other. Finally, we adopt the binary cross-entropy (Eq. 11) as the loss function we minimize in our attentive autoencoder model:

$$\mathcal{L} = -\sum_k \bigl[\, y_k \log(p_k) + (1 - y_k)\log(1 - p_k) \,\bigr] \qquad (11)$$
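One possible realization of the attentive autoencoder described above is sketched below in PyTorch: a two-layer encoder with batch normalization and ReLU, a softmax attention weighting applied element-wise to the encoder output (Eq. 10), a mirrored decoder with sigmoid outputs, and pretraining with the binary cross-entropy of Eq. (11). The framework choice, layer sizes, and hyperparameters are our assumptions; the published CATA code may differ.

```python
import torch
import torch.nn as nn

class AttentiveAutoencoder(nn.Module):
    def __init__(self, vocab_size, hidden=400, latent=50):
        super().__init__()
        # Encoder: vocab -> hidden -> latent, each layer followed by BN + ReLU
        self.encoder = nn.Sequential(
            nn.Linear(vocab_size, hidden), nn.BatchNorm1d(hidden), nn.ReLU(),
            nn.Linear(hidden, latent), nn.BatchNorm1d(latent), nn.ReLU(),
        )
        # Decoder mirrors the encoder and reconstructs the bag-of-words input
        self.decoder = nn.Sequential(
            nn.Linear(latent, hidden), nn.BatchNorm1d(hidden), nn.ReLU(),
            nn.Linear(hidden, vocab_size), nn.Sigmoid(),   # outputs in [0, 1]
        )

    def attend(self, z):
        # Eq. (10): softmax over the encoder output, then element-wise product
        weights = torch.softmax(z, dim=1)
        return z * weights

    def forward(self, x):
        z = self.encoder(x)        # compressed representation
        z = self.attend(z)         # attention-weighted representation Z_j
        x_hat = self.decoder(z)    # reconstruction of X_j
        return x_hat, z

def pretrain(model, loader, epochs=20, lr=1e-3):
    """Pretrain with the binary cross-entropy of Eq. (11); loader yields
    (batch, vocab_size) tensors of normalized bag-of-words histograms."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    bce = nn.BCELoss()
    for _ in range(epochs):
        for x in loader:
            x_hat, _ = model(x)
            loss = bce(x_hat, x)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```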
Probabilistic Matrix Factorization places zero-mean Gaussian priors on the latent factors and a Gaussian likelihood on the observed preferences:

$$u_i \sim \mathcal{N}\bigl(0, \lambda_u^{-1} I\bigr), \qquad v_j \sim \mathcal{N}\bigl(0, \lambda_v^{-1} I\bigr), \qquad p_{ij} \sim \mathcal{N}\bigl(u_i v_j^T, \sigma^2\bigr) \qquad (12)$$
We integrate the items’ contents, trained through the attentive autoencoder, into PMF. Therefore, the objective function in Eq. 3 is changed slightly to become

$$\mathcal{L} = \sum_{i,j \in R} \frac{c_{ij}}{2}\,\bigl(p_{ij} - u_i v_j^T\bigr)^2 + \frac{\lambda_u}{2}\sum_i \|u_i\|^2 + \frac{\lambda_v}{2}\sum_j \|v_j - \theta(X_j)\|^2 \qquad (13)$$

where $\theta(X_j) = \mathrm{Encoder}(X_j) = Z_j$.
Thus, taking the partial derivative of the previous objective function with respect to both $u_i$ and $v_j$, and setting it to zero, results in the following updates that minimize the objective function:

$$u_i = \bigl(V C_i V^T + \lambda_u I\bigr)^{-1} V C_i P_i, \qquad
v_j = \bigl(U C_j U^T + \lambda_v I\bigr)^{-1}\bigl(U C_j P_j + \lambda_v\,\theta(X_j)\bigr) \qquad (14)$$
We optimize the values of u i and v j using the Alternating Least Squares (ALS)
optimization method.
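Relative to plain weighted ALS, only the item update changes: the encoder output θ(X_j) enters as a prior mean for v_j. A sketch of the modified update of Eq. (14) is given below, where theta is assumed to be the d x m matrix of pretrained encoder outputs Z_j; the names are illustrative, not from the published code.

```python
import numpy as np

def cata_item_update(U, P, C, theta, lam_v):
    """Item update of Eq. (14): v_j = (U C_j U^T + lam_v I)^-1 (U C_j P_j + lam_v theta_j).

    U : (d, n) user factors, P and C : (n, m), theta : (d, m) encoder outputs Z_j.
    """
    d = U.shape[0]
    m = P.shape[1]
    V = np.zeros((d, m))
    for j in range(m):
        Cj = np.diag(C[:, j])
        A = U @ Cj @ U.T + lam_v * np.eye(d)
        b = U @ Cj @ P[:, j] + lam_v * theta[:, j]
        V[:, j] = np.linalg.solve(A, b)
    return V
```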
3.4 Prediction
After our model has been trained and the latent factors of users and articles, $U$ and $V$, are identified, we calculate our model’s prediction scores for user $i$ over all articles as the dot product of the vector $u_i$ with all vectors in $V$, i.e., $scores_i = u_i V^T$. Then, we sort all articles based on our model’s prediction scores in descending order and recommend the top-K articles for that user. We go through all users in $U$ in our evaluation and report the average performance over all users. The overall process of our approach is illustrated in Algorithm 1.
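The prediction step is a matrix-vector product followed by a sort. A small sketch with our own variable names, which also excludes articles already in the user's training library, is given below.

```python
import numpy as np

def recommend_top_k(u_i, V, train_items, k=10):
    """Score all articles for one user and return the indices of the top-k unseen ones.

    u_i : (d,) user latent vector, V : (d, m) item factors,
    train_items : iterable of article indices already in the user's library.
    """
    scores = u_i @ V                      # scores_i = u_i V^T over all articles
    scores[list(train_items)] = -np.inf   # never recommend what is already known
    return np.argsort(-scores)[:k]        # k highest-scoring articles
```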
4 Experiments
4.1 Datasets
Three scientific article datasets are used to evaluate our model against the state-of-the-art methods. All datasets are collected from the CiteULike website (www.citeulike.org). The first dataset is called Citeulike-a, which was collected by [5]. It has 5,551 users, 16,980 articles, and 204,986 user-article pairs. The sparseness of this dataset is extremely high: only around 0.22% of the user-article matrix has interactions. Each user has at least
ten articles in his or her library. On average, each user has 37 articles in his or her
library and each article has been added to 12 users’ libraries. The second dataset is
called Citeulike-t, which is collected by [6]. It has 7,947 users, 25,975 articles, and
134,860 user-article pairs. This dataset is actually sparser than the first one with only
0.07% available user-article interactions. Each user has at least three articles in his
or her library. On average, each user has 17 articles in his or her library and each
article has been added to five users’ libraries. Lastly, Citeulike-2004–2007 is the third
dataset, and it is collected by [16]. It is three times bigger than the previous ones with
regard to the user-article matrix. It has 3,039 users, 210,137 articles, and 284,960
user-article pairs. This dataset is the sparsest in this experiment, with a sparsity equal
to 99.95%. Each user has at least ten articles in his or her library. On average, each
user has 94 articles in his or her library and each article has been added only to one
user library. Brief statistics of the datasets are shown in Table 1.
The title and abstract of each article are given in each dataset. The average number of words per article in both title and abstract, after our text preprocessing, is 67 words in Citeulike-a, 19 words in Citeulike-t, and 55 words in Citeulike-2004–2007. We follow the same preprocessing techniques as the state-of-the-art models in [5, 7, 8]. A five-stage procedure to preprocess the textual content is displayed in Fig. 3. Each article’s title and abstract are combined and then preprocessed such that stop words are removed. After that, the top-N distinct words based on the TF-IDF measurement are picked out. 8,000 distinct words are selected for the Citeulike-a dataset, 20,000 distinct words for the Citeulike-t dataset, and 19,871 distinct words for the Citeulike-2004–2007 dataset to form the bag-of-words histograms, which are then normalized into values between 0 and 1 based on the vocabularies’ occurrences.
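The preprocessing pipeline described above can be approximated with scikit-learn as in the sketch below: titles and abstracts are combined, English stop words are removed, the top-N words by TF-IDF are kept as the vocabulary, and the bag-of-words counts are normalized into [0, 1]. The helper name and the per-article max normalization are our assumptions about details the chapter does not spell out.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer

def build_bow(titles, abstracts, top_n=8000):
    """Normalized bag-of-words histograms over a TF-IDF-selected vocabulary."""
    docs = [t + " " + a for t, a in zip(titles, abstracts)]   # combine title + abstract
    # Rank words by their highest TF-IDF score and keep the top_n as vocabulary
    tfidf = TfidfVectorizer(stop_words="english")
    X = tfidf.fit_transform(docs)
    terms = np.asarray(tfidf.get_feature_names_out())
    scores = X.max(axis=0).toarray().ravel()
    vocab = terms[np.argsort(-scores)[:top_n]]
    # Bag-of-words counts over that vocabulary, normalized into [0, 1] per article
    counts = CountVectorizer(vocabulary=vocab).fit_transform(docs).toarray().astype(float)
    max_per_doc = counts.max(axis=1, keepdims=True)
    return counts / np.maximum(max_per_doc, 1.0)
```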
Figure 4 shows the ratio of articles that have been added to five or fewer users’
libraries. For example, 15, 77, and 99% of the articles in Citeulike-a, Citeulike-t, and
Citeulike-2004–2007, respectively, are added to five or fewer users’ libraries. Also,
only 1% of the articles in Citeulike-a have been added to only one user’s library, while the rest of the articles have been added to more than this number. On the contrary, 13 and 77% of the articles in Citeulike-t and Citeulike-2004–2007, respectively, have been added to only one user’s library. This illustrates how the sparsity with regard to articles increases as we go from one dataset to another.
We follow the state-of-the-art techniques [6–8] to generate our training and testing
sets. For each dataset, we create two versions of the dataset for sparse and dense
settings. In total, six dataset cases are used in our evaluation. To form the sparse
(P = 1) and the dense (P = 10) datasets, P items are randomly selected from each
user library to generate the training set while the remaining items from each user
library are used to generate the testing set. As a result, when P = 1, only 2.7, 5.9,
and 1.1% of the data entries are used to generate the training set in Citeulike-a,
Citeulike-t, and Citeulike-2004–2007, respectively. Similarly, 27.1, 39.6, and 10.7%
of the data entries are used to generate the training set when P = 10 as Fig. 5 shows.
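A simple way to realize this per-user splitting protocol (P = 1 for the sparse setting, P = 10 for the dense setting) is sketched below; this is an illustrative helper, not the authors' script.

```python
import numpy as np

def split_per_user(user_items, P=1, seed=0):
    """For each user, move P randomly chosen library items into the training set.

    user_items : dict mapping user id -> list of article ids in that user's library.
    Returns (train, test) dicts with the same structure.
    """
    rng = np.random.default_rng(seed)
    train, test = {}, {}
    for user, items in user_items.items():
        items = np.array(items)
        rng.shuffle(items)
        train[user] = items[:P].tolist()   # P items for training (sparse: 1, dense: 10)
        test[user] = items[P:].tolist()    # the remainder is held out for testing
    return train, test
```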
Fig. 5 The percentage of the data entries that form the training and testing sets in all CiteULike datasets
We use recall and Discounted Cumulative Gain (DCG) as our evaluation metrics to test how our model performs. Recall is usually used to evaluate recommender systems with implicit feedback. However, precision is not favorable with implicit feedback because a zero value in the user-article interaction matrix has two meanings: either the user is not interested in the article, or the user is not aware of the existence of the article. Using the precision metric therefore assumes that for each zero value the user is not interested in the article, which is not necessarily the case. Recall per user can be measured using the following formula:

$$\mathrm{recall@K} = \frac{\text{number of relevant articles in the top-}K\text{ recommendations}}{\text{total number of relevant articles for the user}}$$

and the ranking quality is measured by DCG, averaged over all users:

$$\mathrm{DCG@K} = \frac{1}{|U|}\sum_{u \in U}\,\sum_{i=1}^{K} \frac{rel(i)}{\log_2(i + 1)}$$

where $|U|$ is the total number of users, $i$ is the rank of the top-K articles recommended by the model, and $rel(i)$ is an indicator function that outputs 1 if the article at rank $i$ is a relevant article, and 0 otherwise.
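Under the standard definitions the text relies on, recall@K and DCG@K for a single user can be computed as follows; the helper names are ours, and the log2(i + 1) discount matches the DCG form given above.

```python
import numpy as np

def recall_at_k(ranked_items, relevant, k):
    """Fraction of the user's relevant articles that appear in the top-k list."""
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / max(len(relevant), 1)

def dcg_at_k(ranked_items, relevant, k):
    """Discounted Cumulative Gain: rel(i) / log2(i + 1) summed over the top-k ranks."""
    return sum(1.0 / np.log2(i + 1)
               for i, item in enumerate(ranked_items[:k], start=1)
               if item in relevant)

# Averaging over all users, as done in the chapter's evaluation:
# mean_recall = np.mean([recall_at_k(rank[u], test[u], 10) for u in users])
```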
4.3 Baselines
For each dataset, we repeat the data splitting four times with different random splits into training and testing sets, as described in the evaluation methodology section. We use one split as a validation experiment to find the optimal values of the parameters λu and λv for our model as well as for the state-of-the-art models. We search a grid of the values {0.01, 0.1, 1, 10, 100}, and the best values on the validation experiment are reported in Table 2. The other three splits are used to report the average performance of our model against the baselines. In the rest of this section, we address the research questions defined at the beginning of this section.
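The grid search over λu and λv can be expressed as a simple double loop. In the sketch below, evaluate is a caller-supplied (hypothetical) function that fits the model with the given regularization values and returns the validation recall.

```python
from itertools import product

def grid_search(evaluate, grid=(0.01, 0.1, 1, 10, 100)):
    """Pick (lambda_u, lambda_v) maximizing a user-supplied validation metric.

    evaluate : callable(lam_u, lam_v) -> validation recall (hypothetical helper
    that trains the model on the training split and scores the validation split).
    """
    best = None
    for lam_u, lam_v in product(grid, grid):
        score = evaluate(lam_u, lam_v)
        if best is None or score > best[0]:
            best = (score, lam_u, lam_v)
    return best   # (best score, best lambda_u, best lambda_v)
```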
4.4.1 RQ1
To evaluate how our model performs, we conduct quantitative and qualitative comparisons to answer this question. Figures 6, 7, 8, and 9 show the performance of the top-K recommendations under the sparse and dense settings in terms of recall and DCG. First, the sparse cases are very challenging for any proposed model since there is less data for training. In the sparse setting, where there is only one article in each user’s library in the training set, our model, CATA, outperforms the baselines on all datasets in terms of recall and DCG, as Figs. 6 and 7 show. More importantly, CATA outperforms the baselines by a wide margin on the Citeulike-2004–2007 dataset, which is actually sparser and contains a huge number of articles. This validates the robustness of our model against data sparsity.