Advanced Information and Knowledge Processing

Zhengming Ding · Handong Zhao · Yun Fu

Learning Representation for Multi-View Data Analysis: Models and Applications
Advanced Information and Knowledge Processing

Series editors: Lakhmi C. Jain (Bournemouth University, Poole, UK, and University of South Australia, Adelaide, Australia); Xindong Wu (University of Vermont)
Information systems and intelligent knowledge processing are playing an increasing
role in business, science and technology. Recently, advanced information systems
have evolved to facilitate the co-evolution of human and information networks
within communities. These advanced information systems use various paradigms
including artificial intelligence, knowledge management, and neural science as well
as conventional information processing paradigms. The aim of this series is to
publish books on new designs and applications of advanced information and
knowledge processing paradigms in areas including but not limited to aviation,
business, security, education, engineering, health, management, and science. Books
in the series should have a strong focus on information processing—preferably
combined with, or extended by, new results from adjacent sciences. Proposals for
research monographs, reference books, coherently integrated multi-author edited
books, and handbooks will be considered for the series and each proposal will be
reviewed by the Series Editors, with additional reviews from the editorial board and
independent reviewers where appropriate. Titles published within the Advanced
Information and Knowledge Processing series are included in Thomson Reuters’
Book Citation Index.
Zhengming Ding, Indiana University-Purdue University Indianapolis, Indianapolis, IN, USA
Handong Zhao, Adobe Research, San Jose, CA, USA
Yun Fu, Northeastern University, Boston, MA, USA
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Chapter 8 addresses the multi-source transfer learning problem when all the sources are incomplete. Chapter 9 proposes three deep domain adaptation models to address the challenge where the target data have limited or no labels. Following this, Chap. 10 provides a deep domain generalization model that deals with a target domain unavailable at the training stage, when only multiple related sources are at hand.
In particular, this book can be used by audiences with backgrounds in computer science, information systems, data science, statistics, and mathematics. It may also attract readers from broader fields of science and engineering, since the topic has potential applications in many disciplines.
We would like to thank our collaborators Ming Shao, Hongfu Liu, and Shuyang
Wang. We would also like to thank editor Helen Desmond from Springer for the
help and support.
Contents

1 Introduction
1.1 What Are Multi-view Data and Problem?
1.2 A Unified Perspective
1.3 Organization of the Book

1 Introduction

1.1 What Are Multi-view Data and Problem?
Multi-view data generated from various view-points or multiple sensors are com-
monly seen in real-world applications. For example, the popular commercial depth
sensor Kinect uses both visible light and near infrared sensors for depth estimation;
autopilot uses both visual and radar sensors to produce real-time 3D information
on the road; face analysis algorithms prefer face images from different views for
high-fidelity reconstruction and recognition. However, such data pose an enormous challenge: data across various views exhibit a large divergence that prevents a fair comparison. Generally, different views tend to be treated as different domains drawn from different distributions. Thus, there is an urgent need to mitigate the view divergence when facing specific problems, by either fusing the knowledge across multiple views or adapting knowledge from some views to others.
to others. Since there are different terms regarding “multi-view” data analysis and
its aliasing, we first give a formal definition and narrow down our research focus to
differentiate it from other related works but in different lines.
First, multi-view learning aims to merge the knowledge from different views, either to uncover common knowledge or to employ the complementary knowledge in specific views to assist learning tasks.

Fig. 1.1 Different scenarios of multi-view data analytics. a Different types of features from a single image; b different sources to represent information; c–e images from different viewpoints

For example, in vision, multiple features extracted from the same object by various visual descriptors, e.g., LBP, SIFT, and HOG, are highly discriminative in recognition tasks. Another example is multi-modal data
captured, represented, and stored in varied formats, e.g., near-infrared versus visible face images, or images versus text. For multi-view learning, the goal is to fuse the knowledge
from multiple views to facilitate common learning tasks, e.g., clustering and classi-
fication. The key challenge is exploring data correspondence across multiple views.
The mappings among different views are able to couple view-specific knowledge
while additional labels would help formulate supervised regularizers. The general
setting of multi-view clustering is to group n data samples in v different views (e.g.,
v types of features, sensors, or modalities) by fusing the knowledge across different
views to seek a consistent clustering result. The general setting of multi-view clas-
sification is that it needs to build a model with given v views of training data. In
the test stage, we would have two different scenarios. First, one view will be used
to recognize other views with the learned model. In this case, the label information
across training and test data is different; Second, specifically for multi-features based
learning, is that v-view training data is used to seek a model by fusing the cross-view
knowledge, which is also used as gallery data to recognize v-view probe data.
Second, domain adaptation attempts to transfer knowledge from labeled source domains to ease the learning burden in target domains with sparse or no labeled samples. For example, in surveillance, faces are captured by a long-wave infrared sensor at night, but the recognition model is trained on regular face images collected under visible light. Conventional domain adaptation methods seek domain-invariant representations for the data or modify classifiers to combat the marginal or conditional distribution mismatch between source and target domains.
The goal of domain adaptation is to transfer knowledge from well-labeled sources
to unlabeled targets, which accounts for the more general settings that some source
views are labeled while target views are unlabeled. The general setting of domain
adaptation is that we build a model on both labeled source data and unlabeled target
data. Then we use the model to predict the unlabeled target data, either the same
data in the training stage or different data. These correspond to transductive and inductive domain adaptation, respectively.
There are different strategies to deal with multi-view data, e.g., translation, fusion, alignment, co-learning, and representation learning. This book will focus on representation learning and fusion. The following chapters discuss multi-view data analytic algorithms, organized around our proposed unified model (Sect. 1.2), from three aspects. Furthermore, we will discuss the challenging situation where test data are sampled from unknown categories, e.g., zero-shot learning, and more challenging tasks with incomplete data, e.g., missing modality transfer learning, incomplete multi-source adaptation, and domain generalization.
1.2 A Unified Perspective

Multi-view data analytic models can be summarized in the following unified formulation:

$$\min_{f_1(\cdot),\ldots,f_v(\cdot)} \; \sum_{i=1,\, i<j}^{v} \mathcal{A}\big(f_i(X_i), f_j(X_j)\big) \;+\; \lambda \sum_{k=1}^{v} \mathcal{R}\big(f_k(X_k)\big),$$
where $f_i(\cdot)$ is a feature learning function for view $i$: a linear mapping, a non-linear mapping, or a deep network.
The first common term $\mathcal{A}(\cdot)$ is a pairwise symmetric alignment function across multiple views, used either to fuse the knowledge among multiple views or to transfer knowledge across different views. Due to their different problem settings, multi-view learning and domain adaptation explore various strategies to define the loss functions.
While multi-view learning employs data correspondence (i.e., sample-wise relationships, with or without labels) to seek a common representation, domain adaptation employs domain- or class-wise relationships during model learning to obtain discriminative domain-invariant features.
The second common term $\mathcal{R}(\cdot)$ is the feature learning regularizer, incorporating the label information, the intrinsic structure of the data, or both during mapping learning. For example, logistic regression, softmax regression, and graph regularizers are usually incorporated to carry the label and manifold information. In deep learning, this term is most often softmax regression. Some multi-view learning algorithms merge the feature learning regularizer into the alignment term. Generally, the formulation of the second term is very similar between multi-view learning and domain adaptation within our research concentration.
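To make the formulation concrete, below is a minimal numpy sketch of one linear instantiation of this unified objective, where $f_i(X_i) = W_i X_i$, the alignment $\mathcal{A}$ is a Frobenius matching loss, and $\mathcal{R}$ is weight decay. These specific choices, and the names `Ws` and `lam`, are illustrative assumptions rather than the book's prescribed ones.

```python
import numpy as np

def unified_objective(Ws, Xs, lam=0.1):
    """Evaluate sum_{i<j} A(f_i(X_i), f_j(X_j)) + lam * sum_k R(f_k(X_k))
    for linear mappings f_i(X_i) = W_i @ X_i.

    Ws : list of (p, d_i) projection matrices, one per view.
    Xs : list of (d_i, n) view matrices with sample-wise correspondence.
    A is taken as a Frobenius matching loss and R as weight decay,
    two simple choices among many possible instantiations.
    """
    Zs = [W @ X for W, X in zip(Ws, Xs)]          # f_i(X_i) per view
    align = sum(np.linalg.norm(Zs[i] - Zs[j], 'fro') ** 2
                for i in range(len(Zs)) for j in range(i + 1, len(Zs)))
    reg = sum(np.linalg.norm(W, 'fro') ** 2 for W in Ws)
    return align + lam * reg
```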
Along with the unified model, we will cover both shallow structure learning and deep learning approaches for multi-view data analysis, e.g., subspace learning, matrix factorization, low-rank modeling, deep auto-encoders, deep neural networks, and deep convolutional neural networks. For example, multi-view clustering models will be explored, including multi-view matrix factorization, multi-view subspace learning, and multi-view deep structure learning in the unsupervised setting.
The rest of this book is organized as follows. The first two parts are for multi-view
data analysis with sample-wise correspondence; and the third part is for multi-view
data analysis with class-wise correspondence.
Part I focuses on developing unsupervised multi-view clustering (MVC) models.
It consists of the following three chapters. Chapter 2 explores complementary infor-
mation across views to benefit the clustering problem and presents a deep matrix
factorization framework for MVC, where semi-nonnegative matrix factorization is
adopted to learn the hierarchical semantics of multi-view data in a layer-wise fashion.
To maximize the mutual information from each view, we enforce the non-negative
representation of each view in the final layer to be the same. Furthermore, to respect
the intrinsic geometric structure in each view data, graph regularizers are introduced
to couple the output representation of deep structures.
Chapter 3 considers an underlying problem hidden behind the emerging multi-view techniques: what if the data from one or more views are missing? We propose an unsupervised method that handles incomplete multi-view data well by transforming the original, incomplete data into a new and complete representation in a latent
space. Different from the existing efforts that simply project data from each view
into a common subspace, a novel graph Laplacian term with a good probabilistic
interpretation is proposed to couple the incomplete multi-view samples. In such a
way, a compact global structure over the entire heterogeneous data is well preserved,
leading to a strong grouping discriminability.
Chapter 4 presents a multi-view outlier detection algorithm based on clustering
techniques to identify two different types of data outliers with abnormal behaviors.
We first give the definition of both types of outliers in multi-view setting. Then we
propose a multi-view outlier detection method with a novel consensus regularizer
on the latent representations. Specifically, we explicitly characterize each kind of
outliers by the intrinsic cluster assignment labels and sample-specific errors. We experimentally show that this practice generalizes well when the number of views is greater than two. Last but not least, we make a thorough discussion on the connection and difference between the proposed consensus regularization and the state-of-the-art pairwise regularization.
Abstract Multi-view Clustering (MVC) has garnered increasing attention recently, since many real-world data are composed of different representations or views. The key
is to explore complementary information to benefit the clustering problem. In this
chapter, we consider the conventional complete-view scenario. Specifically, in the
first section, we present a deep matrix factorization framework for MVC, where
semi-nonnegative matrix factorization is adopted to learn the hierarchical semantics
of multi-view data in a layer-wise fashion. In the second section, we make an exten-
sion and consider the different sampled feature sets as multi-view data. We propose
a novel graph-based method, Ensemble Subspace Segmentation under Block-wise
constraints (ESSB), which is jointly formulated in the ensemble learning framework.
2.1.1 Overview
¹ This chapter is reprinted with permission from AAAI. "Multi-view Clustering via Deep Matrix Factorization". 31st AAAI Conference on Artificial Intelligence, pp. 2921–2927, 2017.
Recently, there has been growing interest in developing effective MVC methods (Cai et al. 2013a; Gao et al. 2015; Xu et al. 2016; Zhao et al. 2016). Along this line, Kumar et al. developed co-regularized multi-view spectral clustering to cluster different views simultaneously with a
co-regularization constraint (Kumar et al. 2011). Gao et al. proposed to perform
clustering on the subspace representation of each view simultaneously, guided by a common cluster structure to ensure consistency across different views (Gao et al. 2015).
A good survey can be found in Xu et al. (2013).
Recently, many research efforts on MVC have achieved promising performance based on Non-negative Matrix Factorization (NMF) and its variants, because the non-negativity constraints allow for better interpretability (Guan et al. 2012; Trigeorgis et al. 2014). The general idea is to seek a common latent factor through non-negative matrix factorization among multi-view data (Liu et al. 2013; Zhang et al. 2014, 2015). Semi Non-negative Matrix Factorization (Semi-NMF), one of the most popular variants of NMF, extends NMF by relaxing the factorized basis matrix to be real-valued. This relaxation gives Semi-NMF wider applicability in the real world than NMF. Apart from exploring Semi-NMF in the MVC application for the first time, our method has another distinction from the existing NMF-based MVC methods: we adopt a deep structure to conduct Semi-NMF hierarchically, as shown in Fig. 2.1. As illustrated, through the deep Semi-NMF structure, we push data samples from the same class closer layer by layer, a practice that borrows its flavor from deep learning (Bengio 2009). Note that the proposed method is different from the existing deep auto-encoder based MVC approaches (Andrew et al. 2013; Wang et al. 2015), though all of them adopt deep structures. One major difference is that Andrew et al. (2013) and Wang et al. (2015) are based on Canonical Correlation Analysis (CCA), which is limited to the 2-view case, while our method has no such limitation.
Fig. 2.1 Framework of our proposed method. Same shape denotes the same class. For demonstra-
tion purposes, we only show the two-view case, where two deep matrix factorization structures
are proposed to capture rich information behind each view in a layer-wise fashion. With the deep
structure, samples from the same class but different views gather close to each other to generate
more discriminative representation
To sum up, in this section we propose a deep MVC algorithm through graph reg-
ularized semi-nonnegative matrix factorization. The key is to build a deep structure
through semi-nonnegative matrix factorization to seek a common feature represen-
tation with more consistent knowledge to facilitate clustering. To the best of our
knowledge, this is the first attempt applying semi-nonnegative matrix factorization
to MVC in a deep structure. We summarize our major contributions as follows:
• A deep Semi-NMF structure is built to capture the hidden information by leveraging the strong interpretability of Semi-NMF and the effective feature learning of deep structures. Through this deep matrix factorization structure, we disentangle unimportant factors layer by layer and generate an effective consensus representation in the final layer for MVC.
• To respect the intrinsic geometric relationship among data samples, we introduce
graph regularizers to guide the shared representation learning in each view. This
practice makes the consensus representation in the final layer preserve most shared
structures across multiple graphs. It can be considered as a fusion scheme to boost
the final MVC performance.
Semi-NMF factorizes the data as

$$X \approx Z H, \quad H \geq 0, \tag{2.1}$$

where $X \in \mathbb{R}^{d \times n}$ denotes the input data with $n$ samples, each of $d$-dimensional features. In the discussion on the equivalence of Semi-NMF and K-means clustering (Ding et al. 2010), $Z \in \mathbb{R}^{d \times K}$ can be considered as the cluster centroid matrix,² and $H \in \mathbb{R}^{K \times n}$, $H \geq 0$, is the "soft" cluster assignment matrix in the latent space.³ Similar to traditional NMF, the compact representation $H$ uncovers the hidden semantics by simulating the part-based representation in the human brain, i.e., a psychological and physiological interpretation.

² For a neat presentation, we do not follow the notation style in Ding et al. (2010), and remove the mix-sign notation "±" on $X$ and $Z$, which does not affect the rigorousness.
³ In some of the literature (Ding et al. 2010; Zhao et al. 2015), Semi-NMF is also called the soft version of K-means clustering.
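To make the factorization in Eq. (2.1) concrete, the following is a minimal numpy sketch using the multiplicative updates of Ding et al. (2010); the random initialization and the fixed iteration count are simplifying assumptions.

```python
import numpy as np

def semi_nmf(X, K, n_iter=200, eps=1e-9):
    """Semi-NMF: X (d x n) ~ Z (d x K, mixed signs) @ H (K x n, H >= 0)."""
    d, n = X.shape
    H = np.abs(np.random.randn(K, n))            # nonnegative initialization
    pos = lambda M: (np.abs(M) + M) / 2          # [M]^pos
    neg = lambda M: (np.abs(M) - M) / 2          # [M]^neg
    for _ in range(n_iter):
        # Z update: least squares with H fixed
        Z = X @ H.T @ np.linalg.pinv(H @ H.T)
        # Multiplicative H update (Ding et al. 2010), keeps H nonnegative
        ZtX, ZtZ = Z.T @ X, Z.T @ Z
        H *= np.sqrt((pos(ZtX) + neg(ZtZ) @ H) /
                     (neg(ZtX) + pos(ZtZ) @ H + eps))
    return Z, H
```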
In reality, however, natural data may contain different modalities (or factors), e.g., expression, illumination, and pose in face datasets (Samaria and Harter 1994; Georghiades et al. 2001). A single NMF is not strong enough to eliminate the effect of those undesirable factors and extract the intrinsic class information. To solve this,
Trigeorgis et al. (2014) showed that a deep model based on Semi-NMF achieves promising results in data representation. The multi-layer decomposition process can be expressed as

$$\begin{aligned} X &\approx Z_1 H_1^{+} \\ X &\approx Z_1 Z_2 H_2^{+} \\ &\;\;\vdots \\ X &\approx Z_1 Z_2 \cdots Z_m H_m^{+} \end{aligned} \tag{2.2}$$
where $Z_i$ denotes the $i$th layer basis matrix and $H_i^{+}$ is the $i$th layer representation matrix. Trigeorgis et al. (2014) proved that each layer of the hidden representation is able to identify different attributes. Inspired by this work, we propose an MVC method based on the deep matrix factorization technique.
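Before formalizing the multi-view model, here is a brief sketch of how the layer-wise factorization in Eq. (2.2) can be pre-trained greedily, reusing the `semi_nmf` routine sketched above; the layer sizes passed in are illustrative assumptions.

```python
def pretrain_deep_semi_nmf(X, layer_sizes):
    """Greedy layer-wise pre-training of Eq. (2.2):
    X ~ Z1 H1, then H1 ~ Z2 H2, ..., H_{m-1} ~ Zm Hm."""
    Zs, H = [], X
    for p in layer_sizes:                        # e.g. [100, 50] (assumed sizes)
        Z, H = semi_nmf(H, p)                    # factorize the previous layer's H
        Zs.append(Z)
    return Zs, H                                 # H is the deepest representation
```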
In the MVC setting, let us denote $\mathcal{X} = \{X^{(1)}, \ldots, X^{(v)}, \ldots, X^{(V)}\}$ as the data sample set, where $V$ represents the number of views and $X^{(v)} \in \mathbb{R}^{d_v \times n}$, with $d_v$ the dimensionality of the $v$th-view data and $n$ the number of data samples. Then we formulate our model as:
$$\begin{aligned} \min_{Z_i^{(v)},\, H_i^{(v)},\, H_m,\, \alpha^{(v)}} \;& \sum_{v=1}^{V} (\alpha^{(v)})^{\gamma} \Big( \big\| X^{(v)} - Z_1^{(v)} Z_2^{(v)} \cdots Z_m^{(v)} H_m \big\|_F^2 + \beta\, \mathrm{tr}\big(H_m L^{(v)} H_m^{\top}\big) \Big) \\ \text{s.t.}\;& H_i^{(v)} \geq 0,\; H_m \geq 0,\; \sum_{v=1}^{V} \alpha^{(v)} = 1,\; \alpha^{(v)} \geq 0, \end{aligned} \tag{2.3}$$
where $X^{(v)}$ is the given data for the $v$th view; $Z_i^{(v)}$, $i \in \{1, 2, \ldots, m\}$, is the $i$th layer mapping for view $v$; $m$ is the number of layers; $H_m$ is the consensus latent representation for all views; $\alpha^{(v)}$ is the weighting coefficient for the $v$th view; and $\gamma$ is the parameter controlling the weight distribution. $L^{(v)}$ is the graph Laplacian of the graph for view $v$, where each graph is constructed in a $k$-nearest-neighbor ($k$-NN) fashion. The weight matrix of the graph for view $v$ is $A^{(v)}$, and $L^{(v)} = D^{(v)} - A^{(v)}$, where $D_{ii}^{(v)} = \sum_j A_{ij}^{(v)}$ (He and Niyogi 2003; Ding and Fu 2016).
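A minimal sketch of this per-view graph construction, using scikit-learn's k-NN graph; the neighborhood size k=5 and the symmetrization step are assumptions.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def knn_laplacian(X_v, k=5):
    """Graph Laplacian L = D - A for one view (samples are columns of X_v)."""
    A = kneighbors_graph(X_v.T, n_neighbors=k, mode='connectivity').toarray()
    A = np.maximum(A, A.T)                       # make the k-NN graph undirected
    D = np.diag(A.sum(axis=1))                   # degree matrix
    return D - A
```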
Remark 1 Due to the homology of multi-view data, the final-layer representations $H_m^{(v)}$ of the individual views should be close to each other. Here, we use the consensus $H_m$ as a constraint to enforce that multi-view data share the same representation after multi-layer factorization.
2.1.2.2 Optimization
To expedite the approximation of the variables in the proposed model, each layer is pre-trained to obtain an initial approximation of the variables $Z_i^{(v)}$ and $H_i^{(v)}$ for the $i$th layer in the $v$th view. The effectiveness of pre-training has been shown before (Hinton and Salakhutdinov 2006) on deep autoencoder networks. Similar to Trigeorgis et al. (2014), we decompose the input data matrix $X^{(v)} \approx Z_1^{(v)} H_1^{(v)}$ to perform the pre-training, where $Z_1^{(v)} \in \mathbb{R}^{d_v \times p_1}$ and $H_1^{(v)} \in \mathbb{R}^{p_1 \times n}$. Then the $v$th-view feature matrix $H_1^{(v)}$ is decomposed as $H_1^{(v)} \approx Z_2^{(v)} H_2^{(v)}$, where $Z_2^{(v)} \in \mathbb{R}^{p_1 \times p_2}$ and $H_2^{(v)} \in \mathbb{R}^{p_2 \times n}$. Here $p_1$ and $p_2$ are the dimensionalities of layer 1 and layer 2, respectively.⁴ We continue in this way until all layers are pre-trained. Following this, the weights of each layer are fine-tuned by alternating minimization of the proposed objective function, Eq. (2.3). First, we denote the cost function as

$$\mathcal{C} = \sum_{v=1}^{V} (\alpha^{(v)})^{\gamma} \Big( \big\| X^{(v)} - Z_1^{(v)} Z_2^{(v)} \cdots Z_m^{(v)} H_m \big\|_F^2 + \beta\, \mathrm{tr}\big(H_m L^{(v)} H_m^{\top}\big) \Big).$$
Update rule for the weight matrix $Z_i^{(v)}$. We minimize the objective with respect to $Z_i^{(v)}$ by fixing the remaining variables in the $v$th view for the $i$th layer. Setting $\partial \mathcal{C} / \partial Z_i^{(v)} = 0$ gives the closed-form solution

$$Z_i^{(v)} = \Psi^{\dagger} X^{(v)} \tilde{H}_i^{(v)\dagger},$$

where $\Psi = Z_1^{(v)} \cdots Z_{i-1}^{(v)}$, $\tilde{H}_i^{(v)}$ denotes the reconstruction of the $i$th layer's representation, and $\dagger$ is the Moore–Penrose pseudo-inverse. In the following, $[M]^{pos}$ denotes the matrix whose negative elements are replaced by 0, and $[M]^{neg}$ the matrix whose positive elements are replaced by 0 and whose remaining elements are taken in absolute value; that is, $[M]^{pos} = (|M| + M)/2$ and $[M]^{neg} = (|M| - M)/2$.
⁴ For ease of presentation, we denote the dimensionalities (layer sizes) from layer 1 to layer m as $[p_1 \ldots p_m]$ in the experiments.

Update rule for the consensus matrix $H_m$ (i.e., $H_i^{(v)}$ with $i = m$). Since $H_m$ involves the graph term, its updating rule and convergence property have not been investigated before. We give the updating rule first, followed by the proof of its convergence
property.
$$H_m = H_m \odot \sqrt{ \frac{ [\Phi^{\top} X^{(v)}]^{pos} + [\Phi^{\top} \Phi H_m]^{neg} + G_u(H_m, A) }{ [\Phi^{\top} X^{(v)}]^{neg} + [\Phi^{\top} \Phi H_m]^{pos} + G_d(H_m, A) } }, \tag{2.7}$$

where $\Phi = Z_1^{(v)} Z_2^{(v)} \cdots Z_m^{(v)}$, $G_u(H_m, A) = \beta\big([H_m A^{(v)}]^{pos} + [H_m D^{(v)}]^{neg}\big)$, and $G_d(H_m, A) = \beta\big([H_m A^{(v)}]^{neg} + [H_m D^{(v)}]^{pos}\big)$.
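As a transcription of Eq. (2.7), here is a minimal numpy sketch of one multiplicative step for a single view. Aggregating the contributions of all views with the weights $\alpha^{(v)}$, as in Eq. (2.3), is omitted for brevity, and the helper names are ours.

```python
import numpy as np

def update_Hm(Hm, X_v, Phi, A_v, D_v, beta, eps=1e-9):
    """One multiplicative step of Eq. (2.7) for view v,
    with Phi = Z1 @ Z2 @ ... @ Zm and [M]^pos/[M]^neg as defined above."""
    pos = lambda M: (np.abs(M) + M) / 2
    neg = lambda M: (np.abs(M) - M) / 2
    PtX, PtP = Phi.T @ X_v, Phi.T @ Phi
    Gu = beta * (pos(Hm @ A_v) + neg(Hm @ D_v))  # graph term, numerator
    Gd = beta * (neg(Hm @ A_v) + pos(Hm @ D_v))  # graph term, denominator
    return Hm * np.sqrt((pos(PtX) + neg(PtP) @ Hm + Gu) /
                        (neg(PtX) + pos(PtP) @ Hm + Gd + eps))
```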
Theorem 2.1 The limiting solution of the update rule in Eq. (2.7) satisfies the KKT condition.
Proof We introduce the Lagrangian function

$$\mathcal{L}(H_m) = \sum_{v=1}^{V} (\alpha^{(v)})^{\gamma} \Big( \big\| X^{(v)} - Z_1^{(v)} Z_2^{(v)} \cdots Z_m^{(v)} H_m \big\|_F^2 + \beta\, \mathrm{tr}\big(H_m L^{(v)} H_m^{\top}\big) \Big) - \mathrm{tr}\big(\eta H_m^{\top}\big), \tag{2.8}$$

where $\eta$ is the Lagrange multiplier enforcing the nonnegativity constraint $H_m \geq 0$.
This is a fixed-point equation that the solution must satisfy at convergence. The limiting solution of Eq. (2.7) satisfies this fixed-point equation: at convergence, $H_m^{(\infty)} = H_m^{(t+1)} = H_m^{(t)} = H_m$, and substituting this into the update rule yields Eq. (2.11). Equation (2.11) is identical to Eq. (2.9). Both equations require that at least one of the two factors be equal to zero. The first factors in both equations are identical. For the second factor, $(H_m)_{kl}$ or $(H_m^2)_{kl}$: if $(H_m)_{kl} = 0$ then $(H_m^2)_{kl} = 0$, and vice versa. Therefore, if Eq. (2.9) holds, Eq. (2.11) also holds, and vice versa.
Update rule for the weight $\alpha^{(v)}$. Similar to Cai et al. (2013b), for ease of presentation, let us denote $R^{(v)} = \| X^{(v)} - Z_1^{(v)} Z_2^{(v)} \cdots Z_m^{(v)} H_m \|_F^2 + \beta\, \mathrm{tr}(H_m L^{(v)} H_m^{\top})$. The objective in Eq. (2.3) with respect to $\alpha^{(v)}$ is then written as
$$\min_{\alpha^{(v)}} \sum_{v=1}^{V} (\alpha^{(v)})^{\gamma} R^{(v)}, \quad \text{s.t. } \sum_{v=1}^{V} \alpha^{(v)} = 1,\; \alpha^{(v)} \geq 0. \tag{2.12}$$

The corresponding Lagrangian is

$$\min_{\alpha^{(v)}} \sum_{v=1}^{V} (\alpha^{(v)})^{\gamma} R^{(v)} - \lambda \Big( \sum_{v=1}^{V} \alpha^{(v)} - 1 \Big), \tag{2.13}$$
where $\lambda$ is the Lagrange multiplier. Taking the derivative of Eq. (2.13) with respect to $\alpha^{(v)}$ and setting it to zero, we obtain

$$\alpha^{(v)} = \left( \frac{\lambda}{\gamma R^{(v)}} \right)^{\frac{1}{\gamma - 1}}. \tag{2.14}$$
Substituting the $\alpha^{(v)}$ of Eq. (2.14) into $\sum_{v=1}^{V} \alpha^{(v)} = 1$, we obtain

$$\alpha^{(v)} = \frac{ \big( \gamma R^{(v)} \big)^{\frac{1}{1-\gamma}} }{ \sum_{v=1}^{V} \big( \gamma R^{(v)} \big)^{\frac{1}{1-\gamma}} }. \tag{2.15}$$
It is interesting to see that with only one parameter $\gamma$ we can control the weights of the different views. When $\gamma$ approaches $\infty$, we get equal weights. When $\gamma$ is close to 1, the weight of the view with the smallest $R^{(v)}$ value is assigned 1, and the others are assigned 0.
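The closed form of Eq. (2.15) is straightforward to implement; a minimal numpy sketch follows (the function name and vectorized form are ours). Note that as gamma grows large the exponent approaches 0 and the weights equalize, matching the discussion above.

```python
import numpy as np

def update_alpha(R, gamma):
    """Closed-form view weights of Eq. (2.15).
    R : array of per-view residuals R^(v); gamma > 1 controls smoothness."""
    w = (gamma * np.asarray(R)) ** (1.0 / (1.0 - gamma))
    return w / w.sum()                           # normalize so weights sum to 1
```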
This completes the update rules. We repeat the updates iteratively until convergence. The entire algorithm is outlined in Algorithm 2.1. After obtaining the optimized $H_m$, standard spectral clustering (Ng et al. 2001) is performed on the graph built on $H_m$ via the $k$-NN algorithm.
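A sketch of this final step, reusing the `knn_laplacian`-style graph construction above together with scikit-learn's spectral clustering; the choice k=5 is an assumption.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import SpectralClustering

def cluster_from_Hm(Hm, n_clusters, k=5):
    """Final step: spectral clustering on a k-NN graph over Hm's columns."""
    A = kneighbors_graph(Hm.T, n_neighbors=k, mode='connectivity').toarray()
    A = np.maximum(A, A.T)                       # symmetric affinity matrix
    sc = SpectralClustering(n_clusters=n_clusters, affinity='precomputed')
    return sc.fit_predict(A)
```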
Our deep matrix factorization model is composed of two stages, i.e., pre-training and fine-tuning, so we analyze them separately. To simplify the analysis, we assume the dimensionalities of all layers (i.e., the layer sizes) are the same, denoted $p$, and that the original feature dimensions of all views are the same, denoted $d$. $V$ is the number of views and $m$ is the number of layers.
In the pre-training stage, the Semi-NMF process and the graph construction are the time-consuming parts. The complexity is of order $O\big(Vmt_p(dnp + np^2 + pd^2 + pn^2 + dn^2)\big)$, where $t_p$ is the number of iterations to achieve convergence in the Semi-NMF optimization process. Normally $p < d$, thus the computational cost of the pre-training stage is $T_{pre} = O\big(Vmt_p(dnp + pd^2 + dn^2)\big)$. Similarly, in the fine-tuning stage, the time complexity is of order $T_{fine} = O\big(Vmt_f(dnp + pd^2 + pn^2)\big)$, where $t_f$ is the number of iterations in this fine-tuning stage. To sum up, the overall computational cost is $T_{total} = T_{pre} + T_{fine}$.
For these datasets, we follow the preprocessing strategy of Cao et al. (2015). First, all images are resized to 48 × 48, and then three kinds of features are extracted: intensity, LBP (Ahonen et al. 2006), and Gabor (Feichtinger and Strohmer 1998). Specifically, LBP is a 59-dimensional histogram computed over 9 × 10 pixel patches generated from the cropped images. The scale parameter λ of the Gabor wavelets is fixed at 4, with four orientations θ = {0°, 45°, 90°, 135°}, on cropped images of size 25 × 30 pixels.
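For readers who wish to approximate this preprocessing, the following scikit-image sketch computes uniform-LBP histograms and Gabor responses. The patch grid, the LBP radius, and the mapping of λ = 4 to frequency = 1/4 are our assumptions, not the exact settings of Cao et al. (2015).

```python
import numpy as np
from skimage.feature import local_binary_pattern
from skimage.filters import gabor

def lbp_view(img48):
    """59-bin uniform LBP histograms; the coarse 4-strip grid is an assumption
    (the text uses 9 x 10 pixel patches)."""
    codes = local_binary_pattern(img48, P=8, R=1, method='nri_uniform')
    hists = []
    for patch in np.array_split(codes, 4, axis=0):
        h, _ = np.histogram(patch, bins=59, range=(0, 59))
        hists.append(h)
    return np.concatenate(hists)

def gabor_view(img, frequency=0.25):
    """Gabor magnitude responses at the four orientations used in the text."""
    feats = [gabor(img, frequency=frequency, theta=t)[0].ravel()
             for t in np.deg2rad([0, 45, 90, 135])]
    return np.concatenate(feats)
```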
For the comparison baselines, we have the following. (1) BestSV performs standard spectral clustering (Ng et al. 2001) on the features of each view; we report the best performance. (2) ConcatFea concatenates all the features, and then performs standard spectral clustering. (3) ConcatPCA concatenates all the features, then projects the original features into a low-dimensional subspace via PCA; spectral clustering is applied on the projected feature representation. (4) Co-Reg (SPC) (Kumar et al. 2011) co-regularizes the clustering hypotheses to enforce that the memberships from different views agree with each other. (5) Co-Training (SPC) (Kumar and Daume III 2011) borrows the co-training strategy to alternately modify the graph structure of each view using the other views' information. (6) Min-D(isagreement) (de Sa 2005) builds a bipartite graph derived from the "minimizing-disagreement" idea. (7) MultiNMF (Liu et al. 2013) applies NMF to project each view's data to a common latent subspace; this method can be roughly considered a one-layer version of our proposed method. (8) NaMSC (Cao et al. 2015) first applies the method of Hu et al. (2014) to each view's data, then combines the learned representations and feeds them to spectral clustering. (9) DiMSC (Cao et al. 2015) investigates the complementary information of multi-view representations by introducing a diversity term; this work is also one of the most recent approaches to MVC. We do not compare with the deep auto-encoder based methods (Andrew et al. 2013; Wang et al. 2015), because these CCA-based methods cannot fully utilize more than two views, which would lead to an unfair comparison.
To make a comprehensive evaluation, we use six different evaluation metrics: normalized mutual information (NMI), accuracy (ACC), adjusted Rand index (AR), F-score, Precision, and Recall. For details about the metrics, readers may refer to Kumar and Daume III (2011) and Cao et al. (2015). For all the metrics, a higher value denotes better performance. Different measurements favor different properties, so a comprehensive view can be acquired from the diverse results. Each experiment is repeated 10 times, and we report the mean values along with standard deviations.
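NMI and AR are available directly in scikit-learn, while clustering accuracy (ACC) requires the best one-to-one cluster-to-class matching, commonly computed with the Hungarian algorithm. A minimal sketch (our helper, not the authors' code):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

def clustering_accuracy(y_true, y_pred):
    """ACC: best cluster-to-class mapping via the Hungarian method."""
    labels = np.unique(np.concatenate([y_true, y_pred]))
    count = np.zeros((labels.size, labels.size))
    for t, p in zip(y_true, y_pred):             # co-occurrence counts
        count[np.searchsorted(labels, p), np.searchsorted(labels, t)] += 1
    row, col = linear_sum_assignment(-count)     # maximize matched counts
    return count[row, col].sum() / len(y_true)

# NMI and AR come directly from scikit-learn:
# normalized_mutual_info_score(y_true, y_pred)
# adjusted_rand_score(y_true, y_pred)
```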
2.1.3.1 Result
Tables 2.1 and 2.2 tabulate the results on datasets Yale and Extended YaleB. Our
method outperforms all the other competitors. For the dataset Yale, we raise the
performance bar by around 7.57% in NMI, 5.08% in ACC, 8.22% in AR, 6.56% in
F-score, 10.13% in Precision and 4.61% in Recall. On average, we improve the state-
of-the-art DiMSC by more than 7%. The possible reason why our method improves
a lot is that both image data in Yale and Extended YaleB contain multiple factors, i.e.,
Table 2.1 Results of 6 different metrics (mean ± standard deviation) on dataset Yale

Method      NMI           ACC           AR            F-score       Precision     Recall
BestSV      0.654±0.009   0.616±0.030   0.440±0.011   0.475±0.011   0.457±0.011   0.495±0.010
ConcatFea   0.641±0.006   0.544±0.038   0.392±0.009   0.431±0.008   0.415±0.007   0.448±0.008
ConcatPCA   0.665±0.037   0.578±0.038   0.396±0.011   0.434±0.011   0.419±0.012   0.450±0.009
Co-Reg      0.648±0.002   0.564±0.000   0.436±0.002   0.466±0.000   0.455±0.004   0.491±0.003
Co-Train    0.672±0.006   0.630±0.001   0.452±0.010   0.487±0.009   0.470±0.010   0.505±0.007
Min-D       0.645±0.005   0.615±0.043   0.433±0.006   0.470±0.006   0.446±0.005   0.496±0.006
MultiNMF    0.690±0.001   0.673±0.001   0.495±0.001   0.527±0.000   0.512±0.000   0.543±0.000
NaMSC       0.671±0.011   0.636±0.000   0.475±0.004   0.508±0.007   0.492±0.003   0.524±0.004
DiMSC       0.727±0.010   0.709±0.003   0.535±0.001   0.564±0.002   0.543±0.001   0.586±0.003
Ours        0.782±0.010   0.745±0.011   0.579±0.002   0.601±0.002   0.598±0.001   0.613±0.002
pose, expression, illumination, etc. The existing MVC methods involve only one layer of representation, e.g., the one-layer factor decomposition in MultiNMF or the self-representation practice (i.e., the coefficient matrix Z in NaMSC and DiMSC; Cao et al. 2015). In contrast, our proposed approach extracts meaningful representations layer by layer. Through this deep representation, we eliminate the influence of undesirable factors and keep the core information (i.e., class/identity information) in the final layer.
Table 2.3 lists the performance on the video data Notting-Hill. This dataset is more challenging than the previous two image datasets, since the illumination conditions vary dramatically and the source of lighting is arbitrary. Moreover, there is no fixed expression pattern in the Notting-Hill movie, in contrast to the datasets Yale and Extended YaleB. We observe from the tables that our method reports superior results on five metrics. The only exception is NMI, where our performance is slightly worse than DiMSC, by only 0.25%. We therefore safely conclude that our proposed method generally achieves better clustering performance on this challenging video dataset.
2.1.3.2 Analysis
In this subsection, the robustness and stability of the proposed model are evaluated. The convergence property is first studied in terms of the objective value and NMI.
Table 2.2 Results of 6 different metrics (mean ± standard deviation) on dataset Extended YaleB

Method      NMI           ACC           AR            F-score       Precision     Recall
BestSV      0.360±0.016   0.366±0.059   0.225±0.018   0.303±0.011   0.296±0.010   0.310±0.012
ConcatFea   0.147±0.005   0.224±0.012   0.064±0.003   0.159±0.002   0.155±0.002   0.162±0.002
ConcatPCA   0.152±0.003   0.232±0.005   0.069±0.002   0.161±0.002   0.158±0.001   0.164±0.002
Co-Reg      0.151±0.001   0.224±0.000   0.066±0.001   0.160±0.000   0.157±0.001   0.162±0.000
Co-Train    0.302±0.007   0.186±0.001   0.043±0.001   0.140±0.001   0.137±0.001   0.143±0.002
Min-D       0.186±0.003   0.242±0.018   0.088±0.001   0.181±0.001   0.174±0.001   0.189±0.002
MultiNMF    0.377±0.006   0.428±0.002   0.231±0.001   0.329±0.001   0.298±0.001   0.372±0.002
NaMSC       0.594±0.004   0.581±0.013   0.380±0.002   0.446±0.004   0.411±0.002   0.486±0.001
DiMSC       0.635±0.002   0.615±0.003   0.453±0.000   0.504±0.006   0.481±0.002   0.534±0.001
Ours        0.649±0.002   0.763±0.001   0.512±0.002   0.564±0.001   0.525±0.001   0.610±0.001