SERIES EDITOR
Vipin Kumar
University of Minnesota
Department of Computer Science and Engineering
Minneapolis, Minnesota, U.S.A.
PUBLISHED TITLES
UNDERSTANDING COMPLEX DATASETS: DATA MINING WITH MATRIX DECOMPOSITIONS
David Skillicorn
Relational Data Clustering: Models, Algorithms, and Applications

Bo Long
Zhongfei Zhang
Philip S. Yu
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.
To my family
Philip S. Yu
List of Tables
Preface

1 Introduction
1.1 Defining the Area
1.2 The Content and the Organization of This Book
1.3 The Audience of This Book
1.4 Further Readings

I Models

2 Co-Clustering
2.1 Introduction
2.2 Related Work
2.3 Model Formulation and Analysis
2.3.1 Block Value Decomposition
2.3.2 NBVD Method

II Algorithms

8 Co-Clustering
8.1 Nonnegative Block Value Decomposition Algorithm
8.2 Proof of the Correctness of the NBVD Algorithm

IV Summary

References
Index
14.1 Data set details. Each data set is randomly and evenly sampled from specific newsgroups.
14.2 Both NBVD and NMF accurately recover the original clusters in the CLASSIC3 data set.
14.3 A normalized block value matrix on the CLASSIC3 data set.
14.4 NBVD extracts the block structure more accurately than NMF on the Multi5 data set.
14.5 NBVD shows clear improvements in micro-averaged precision on different newsgroup data sets over other algorithms.
2.1 The original data matrix (b) with a 2 × 2 block structure, which is demonstrated by the permuted data matrix (a). The row-coefficient matrix R, the block value matrix B, and the column-coefficient matrix C give a reconstructed matrix (c) that approximates the original data matrix (b).
2.2 Illustration of the difference between BVD and SVD.
3.1 A bipartite graph (a) and its relation summary network (b).
3.2 A tripartite graph (a) and its RSN (b).
3.3 The cluster structures of V2 and V3 affect the similarity between v11 and v12 through the hidden nodes.
4.1 A graph with mixed cluster structures (a) and its cluster prototype graph (b).
4.2 A graph with virtual nodes.
4.3 A graph with strongly connected clusters (a) and its cluster prototype graph (b); the graph affinity matrices for (a) and (b) are shown in (c) and (d), respectively.
14.1 The coefficient of variance for the columns of the mean block value matrix with a varying number of word clusters, using NBVD on different NG20 data sets.
Preface

The world we live in today is full of relational data: the Internet, social networks, telecommunications, customer shopping patterns, and micro-array data in bioinformatics research, to name just a few examples. This abundance has given rise to an active research area called relational data mining within the data mining field. Given that in many real-world applications we do not have the luxury of any training data, or obtaining training data would be extremely expensive, relational data clustering has recently attracted substantial attention from the related research communities and has thus emerged as a new and hot research topic in the area of relational data mining. This book is the very first monograph on the topic of relational data clustering written in a self-contained format. It addresses both the fundamentals and the applications of relational data clustering, including the theoretical models, the algorithms, and exemplar applications of these models and algorithms to real-world problems.
The authors of this book have been actively working on the topic of relational data clustering for years, and this book is the culmination of their long research on this topic. The book may be used as a collection of research notes for researchers interested in this topic, a reference book for practitioners or engineers, or a textbook for an advanced graduate seminar on relational data clustering. It may also be used for an introductory course for graduate students or advanced undergraduate seniors. The references collected in this book may serve as further reading lists for the readers.
Due to the extensive attention this topic has received and its rapid development in the literature in recent years, this book is by no means meant to be an exhaustive collection of all information on relational data clustering; rather, we intend to collect our own most recent research on this topic. For those who are already in the area of relational data mining, or who already know what this area is about, this book serves as a formal and systematic collection of part of the most recent advances of the research on this topic. For those who are beginners to the area of relational data mining, this book serves as a formal and systematic introduction to relational data clustering.
It would not have been possible for us to complete this book without the great support of a large group of people and organizations. In particular, we would like to thank the publisher, Taylor & Francis/CRC Press, for giving us the opportunity to complete this book for the readers as one of the books in the Chapman & Hall/CRC Data Mining and Knowledge Discovery series, with Prof. Vipin Kumar at the University of Minnesota serving as the series editor. We would like to thank this book's editor at Taylor & Francis Group, Randi Cohen, for her enthusiastic and patient support, effort, and advice; the project coordinator at Taylor & Francis Group, Amber Donley, and the anonymous proofreader for their meticulous effort in correcting typos and other errors in the draft of the book; and Shashi Kumar of Glyph International for his prompt technical support in formatting the book. We would like to thank Prof. Jiawei Han at the University of Illinois at Urbana-Champaign and Prof. Jieping Ye at Arizona State University, as well as another anonymous reviewer, for their painstaking effort to review the book and their valuable comments that substantially improved its quality. While this book is derived from the original contributions of its authors, part of the material is also jointly contributed by their colleagues Xiaoyun Wu at Google Research Labs and Tianbing Xu at SUNY Binghamton. This book project was supported in part by the National Science Foundation under grant IIS-0812114, managed by the program manager, Dr. Maria Zemankova. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Finally, we would like to thank our families for the love and support that were essential for us to complete this book.
Models
• Text analysis. Learning the document clusters and word clusters from bi-type relational data, i.e., document-word data.
• Recommendation systems. Movie recommendation based on user clusters (communities) and movie clusters learned from relational data involving users, movies, and actors/actresses.
• Online advertising. Based on relational data in which advertisers, bid terms, and words are interrelated with each other, the clusters of advertisers and bid terms can be learned for bid-term suggestion.
• Bioinformatics. Automatically identifying gene groups (clusters) from the relational data of genes, conditions, and annotation words.
FIGURE 1.1: Relationships among the different areas of relational data clustering: homogeneous relational clustering, heterogeneous relational clustering, co-clustering, multiple-view clustering, and general relational data clustering.
We also introduce evolutionary clustering, which has great potential to incorporate time effects into relational data clustering. Evolutionary clustering is a relatively new research topic in data mining. It refers to the scenario where a collection of data evolves over time; at each time, the collection of the data has a number of clusters; when the collection evolves from one time to another, new data items may join the collection and existing data items may disappear; similarly, new clusters may appear and, at the same time, existing clusters may disappear. Consequently, both the data items and the clusters of the collection may change over time, which poses a great challenge to the problem of evolutionary clustering in comparison with traditional clustering. In this book, we introduce evolutionary clustering models and algorithms based on Dirichlet processes.
Related work, together with pointers for further reading, may be found in the literature of the two parent areas. In the data mining area, related work may be found in the premier conferences, such as the ACM International Conference on Knowledge Discovery and Data Mining (ACM KDD), the IEEE International Conference on Data Mining (IEEE ICDM), and the SIAM International Conference on Data Mining (SDM). In particular, related work may be found in workshops dedicated to the area of relational learning, such as the Statistical Relational Learning workshop. For journals, the premier journals in the data mining area may contain related work in relational data clustering, including IEEE Transactions on Knowledge and Data Engineering (IEEE TKDE), ACM Transactions on Data Mining (ACM TDM), and Knowledge and Information Systems (KAIS).

In the machine learning area, related work may be found in the premier conferences, such as the International Conference on Machine Learning (ICML), Neural Information Processing Systems (NIPS), the European Conference on Machine Learning (ECML), the European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), the International Joint Conference on Artificial Intelligence (IJCAI), and the Conference on Learning Theory (COLT). For journals, the premier journals in the machine learning area may contain related work in relational data clustering, including the Journal of Machine Learning Research (JMLR) and the Machine Learning Journal (MLJ).
A bi-type heterogeneous relational data set consists of two types of data objects with heterogeneous relations between them. Bi-type heterogeneous relational data are a very important special case of heterogeneous relational data, since they arise frequently in various important applications. In bi-type heterogeneous relational data clustering, we are interested in clustering two types of data objects simultaneously; this is also known as co-clustering in the literature. In this chapter, we present a new co-clustering framework, Block Value Decomposition (BVD), for bi-type heterogeneous relational data, which factorizes the relational data matrix into three components: the row-coefficient matrix R, the block value matrix B, and the column-coefficient matrix C.
2.1 Introduction
In many applications, such as document clustering, collaborative filtering, and micro-array analysis, bi-type heterogeneous relational data can be formulated as a two-dimensional matrix representing a set of dyadic data. Dyadic data refer to a domain with two finite sets of objects in which observations are made for dyads, i.e., pairs with one element from each set. For the dyadic data in these applications, co-clustering both dimensions of the data matrix simultaneously is often more desirable than traditional one-way clustering. This is because co-clustering exploits the duality between rows and columns to effectively deal with the high-dimensional and sparse data that are typical in many applications. Moreover, co-clustering provides the additional benefit of producing both row clusters and column clusters at the same time. For example, we may be interested in simultaneously clustering genes and experimental conditions in bioinformatics applications [29, 31], simultaneously clustering documents and words in text mining [44], and simultaneously clustering users and movies in collaborative filtering.
In this chapter, we propose a new co-clustering framework called Block Value Decomposition (BVD). The key idea is that the latent block structure in a two-dimensional dyadic data matrix can be explored by its triple decomposition. The dyadic data matrix is factorized into three components: the row-coefficient matrix R, the block value matrix B, and the column-coefficient matrix C. The coefficients denote the degrees of the rows and columns associated with their clusters, and the block value matrix is an explicit and compact representation of the hidden block structure of the data matrix.
Under this framework, we develop a specific novel co-clustering algorithm for a special yet very popular case, nonnegative dyadic data, which iteratively computes the three decomposition matrices based on multiplicative updating rules derived from an objective criterion. By intertwining the row clusterings and the column clusterings at each iteration, the algorithm performs an implicitly adaptive dimensionality reduction, which works well for the high-dimensional and sparse data typical in many data mining applications. We have proven the correctness of the algorithm by showing that it is guaranteed to converge, and we have conducted extensive experimental evaluations to demonstrate the effectiveness and potential of the framework and the algorithm. Compared with the existing co-clustering methods in the literature, the BVD framework, as well as the specific algorithm, offers an extra capability: it gives an explicit and compact representation of the hidden block structures in the original data, which aids the interpretability of the data. For example, the block value matrix may be used to interpret the explicit relationship or association between the document clusters and word clusters in a document-word co-clustering.
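To make this concrete, here is a minimal numpy sketch of such an iterative scheme. The multiplicative update rules below are a standard NMF-style derivation for driving down ||Z − RBC||² and are our illustration, not the book's formal algorithm (which, together with its correctness proof, appears in Chapter 8):

import numpy as np

def nbvd(Z, k, l, n_iter=200, eps=1e-9, seed=0):
    # Sketch: factorize a nonnegative n x m matrix Z as R @ B @ C, with
    # R (n x k), B (k x l), and C (l x m) kept nonnegative throughout.
    # NMF-style multiplicative updates intended to reduce ||Z - R B C||_F^2.
    rng = np.random.default_rng(seed)
    n, m = Z.shape
    R = rng.random((n, k))
    B = rng.random((k, l))
    C = rng.random((l, m))
    for _ in range(n_iter):
        R *= (Z @ C.T @ B.T) / (R @ B @ C @ C.T @ B.T + eps)
        B *= (R.T @ Z @ C.T) / (R.T @ R @ B @ C @ C.T + eps)
        C *= (B.T @ R.T @ Z) / (B.T @ R.T @ R @ B @ C + eps)
    return R, B, C

After convergence, the row cluster of row i can be read off as np.argmax(R[i]) and the column cluster of column j as np.argmax(C[:, j]).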
Consider two finite sets of objects X = {x1, . . . , xn} and Y = {y1, . . . , ym}, in which the observations are made for dyads (x, y). Usually a dyad is a scalar value w(x, y), e.g., the frequency of co-occurrence, or the strength of preference/association/expression level. For scalar dyads, the data can always be organized as an n-by-m two-dimensional matrix Z by mapping the row indices into X and the column indices into Y. Then, each w(x, y) corresponds to one element of Z.
We are interested in simultaneously clustering X into k disjoint clusters and Y into l disjoint clusters. Let the k clusters of X be written as {x̂1, . . . , x̂k}, and the l clusters of Y be written as {ŷ1, . . . , ŷl}. In other words, we are interested in finding mappings CX and CY,

CX : {x1, . . . , xn} → {x̂1, . . . , x̂k}
CY : {y1, . . . , ym} → {ŷ1, . . . , ŷl}

This is equivalent to finding the block structures of the matrix Z, i.e., finding the k × l submatrices of Z such that the elements within each submatrix are similar to each other and elements from different submatrices are dissimilar to each other. This equivalence can be illustrated by the procedure below.
Suppose that we are given the cluster labels of rows and columns. Let us permute the rows and the columns of Z such that the rows within the same cluster are arranged together and the columns within the same cluster are arranged together. Consequently, we have discovered the hidden block structure from the permuted data matrix. On the other hand, if we are given the data matrix with block structure, it is trivial to derive the clustering of rows and columns. The original data matrix and the permuted data matrix in Figure 2.1 give an illustrative example; a small numerical demonstration follows.
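As a toy illustration (our own example, not from the book), permuting rows and columns by their cluster labels exposes a hidden 2 × 2 block structure:

import numpy as np

# A 4 x 4 dyadic matrix whose rows and columns interleave two clusters.
Z = np.array([[9, 1, 9, 1],
              [1, 9, 1, 9],
              [9, 1, 9, 1],
              [1, 9, 1, 9]])
row_labels = np.array([0, 1, 0, 1])
col_labels = np.array([0, 1, 0, 1])

# Sort rows and columns so that members of the same cluster are adjacent.
permuted = Z[np.argsort(row_labels)][:, np.argsort(col_labels)]
print(permuted)  # two homogeneous 2 x 2 blocks of 9s and two of 1s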
Since the elements within each block are similar to each other, we expect one center to represent each block. Therefore, a small k × l matrix can be considered as the compact representation of an original data matrix with a k × l block structure. In traditional one-way clustering, given the cluster centers and the weights that denote the degrees of the observations associated with their clusters, one can approximate the original data by linear combinations of the cluster centers. Similarly, we should be able to “reconstruct” the original data matrix by linear combinations of the block centers. Based on this observation, we formulate the problem of co-clustering dyadic data as an optimization problem of matrix decomposition, i.e., block value decomposition (BVD).
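Written out (our formalization, consistent with the definitions above and with the nonnegative case developed below), the optimization problem is

min_{R, B, C} ||Z − RBC||²

where R is an n × k nonnegative matrix, B is k × l, C is l × m, and || · || denotes the Frobenius norm; the three factors are sought jointly so as to minimize the reconstruction error of Z.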
FIGURE 2.1: The original data matrix (b) with a 2 × 2 block structure, which is demonstrated by the permuted data matrix (a). The row-coefficient matrix R, the block value matrix B, and the column-coefficient matrix C give a reconstructed matrix (c) that approximates the original data matrix (b).
FIGURE 2.2: Illustration of the difference between BVD and SVD.
We call the elements of B the block values, B the block value matrix, R the row-coefficient matrix, and C the column-coefficient matrix. As discussed before, B may be considered as a compact representation of Z; R denotes the degrees of the rows associated with their clusters; and C denotes the degrees of the columns associated with their clusters. We seek to approximate the original data matrix by the reconstructed matrix, i.e., RBC, as illustrated in Figure 2.1.
Under the BVD framework, the combinations of the components also have
an intuitive interpretation. RB is the matrix containing the basis for the
column space of Z and BC contains the basis for the row space of Z. For
example, for a word-by-document matrix Z, each column of RB captures a
base topic of a particular document cluster and each row of BC captures a
base topic of a word cluster.
Compared with SVD-based approaches, there are three main differences between BVD and SVD. First, in BVD it is natural to consider each row or column of a data matrix as an additive combination of the block values, since BVD does not allow negative values in R and C. In contrast, since SVD allows negative values in each component, there is no intuitive interpretation for the negative combinations. Second, unlike the singular vectors in SVD, the basis vectors contained in RB and BC are not necessarily orthogonal. Although the singular vectors in SVD have a statistical interpretation as the directions of variance, they typically do not have clear physical interpretations. In contrast, the directions of the basis vectors in BVD correspond much more directly to the clusters (Figure 2.2). Third, SVD is a full rank decomposition, whereas BVD is a reduced rank approximation. Since the clustering task seeks a reduced or compact representation of the original data, BVD achieves this objective directly, i.e., the final clusters can be easily derived without additional clustering operations. In summary, compared with SVD or eigenvector-based decomposition, the decomposition from BVD has an intuitive interpretation, which is necessary for many data mining applications.
BVD provides a general framework for co-clustering. Depending on the data types in different applications, various formulations and algorithms may be developed under the BVD framework. An interesting observation is that the data matrices in many important applications are typically nonnegative, such as co-occurrence tables, performance/rating matrices, and proximity matrices. Some other data may be transformed into nonnegative form, such as gene expression data. Therefore, in the rest of this chapter, we concentrate on developing a specific novel method under the BVD framework, the nonnegative block value decomposition (NBVD).
Normalizing the basis vectors in RB to have unit L2 norm is desirable, since RB consists of the basis vectors of the document space. Assuming that RB is normalized to RBV, the cluster labels for the documents are given by V^{-1}C instead of C.
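As a small numpy sketch (our own illustration with hypothetical shapes, not code from the book), the rescaling keeps the reconstruction RBC unchanged:

import numpy as np

# Hypothetical factors from an NBVD run: R (n x k), B (k x l), C (l x m).
rng = np.random.default_rng(1)
R, B, C = rng.random((8, 3)), rng.random((3, 2)), rng.random((2, 5))

RB = R @ B
# V is the diagonal matrix that scales each column (basis vector) of RB
# to unit L2 norm; V^{-1} C compensates, so (RB V)(V^{-1} C) = RB C.
V = np.diag(1.0 / np.linalg.norm(RB, axis=0))
C_labels = np.linalg.inv(V) @ C

assert np.allclose(RB @ C, (RB @ V) @ C_labels)

The document labels are then read from C_labels rather than from C.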
where b̄i is the mean of the ith row of B̄ and b̄j is the mean of the jth column of B̄. These statistics provide certain useful information, e.g., for finding the optimal number of clusters.
Finally, we compare NBVD with Nonnegative Matrix Factorization (NMF) [36]. Given a nonnegative data matrix V, NMF seeks an approximate factorization V ≈ WH with nonnegative components W and H. Essentially, NMF concentrates on one-sided, individual clustering and does not take advantage of the duality between the row clustering and the column clustering. In fact, NMF may be considered a special case of NBVD in the sense that WH = WIH, where I is an identity matrix. Under this formulation, NMF is a special case of NBVD that does co-clustering with the additional restrictions that the number of row clusters equals that of the column clusters and that each row cluster is associated with exactly one column cluster. Clearly, NBVD is more flexible than NMF in exploiting the hidden block structure of the original data matrix.
In more general cases, heterogeneous relational data consist of more than two types of data objects. These multiple types of interrelated data objects form a k-partite heterogeneous relation graph. The research on mining the hidden structures of a k-partite heterogeneous relation graph is still limited and preliminary. In this chapter, we propose a general model, the relation summary network, to find the hidden structures (the local cluster structures and the global community structures) of a k-partite heterogeneous relation graph. The model provides a principled framework for unsupervised learning on k-partite heterogeneous relation graphs of various structures.
3.1 Introduction
Clustering approaches have traditionally focused on homogeneous data objects. However, many examples of real-world data involve objects of multiple types that are related to each other, which naturally form k-partite heterogeneous relation graphs with heterogeneous types of nodes. For example, documents, words, and categories in taxonomy mining, as well as Web pages, search queries, and Web users in a Web search system, all form tripartite graphs; papers, key words, authors, and publication venues in a scientific publication archive form a quadripartite graph. In such scenarios, using traditional approaches to cluster each type of objects (nodes) individually may not work well for the following reasons.
First, to apply traditional clustering approaches to each type of data objects individually, the relation information needs to be transformed into feature vectors for each type of objects. In general, this transformation results in high-dimensional and sparse feature vectors, since after the transformation the number of features for a single type of objects equals the number of all the objects possibly related to this type of objects. For example, if we transform the links between Web pages and Web users, as well as search queries, into features for the Web pages, this leads to a huge number of features with sparse values for each Web page. Second, traditional clustering approaches are unable to tackle the interactions among the cluster structures of different types of objects, since they cluster data of a single type based on static features. Note that the interactions could pass along the relations, i.e., there exists influence propagation in a k-partite heterogeneous relation graph. Third, in some data mining applications, users are interested not only in the local cluster structures for each type of objects, but also in the global community structures involving multiple types of objects. For example, in document clustering, in addition to document clusters and word clusters, the relationship between the document clusters and the word clusters is also useful information. It is difficult to discover such global structures by clustering each type of objects individually.
An intuitive attempt to mine the hidden structures of k-partite heterogeneous relation graphs is to apply existing graph partitioning approaches to them. This idea may work in some special and simple situations, but in general it is infeasible. First, graph partitioning theory focuses on finding the best cuts of a graph under a certain criterion, and it is very difficult to cut different types of relations (links) simultaneously to identify different hidden structures for different types of nodes. Second, by partitioning an entire k-partite heterogeneous relation graph into m subgraphs, one implicitly assumes that all the different types of nodes have the same number of clusters m, which in general is not true. Third, by simply partitioning the entire graph into disjoint subgraphs, the resulting hidden structures are rough; for example, the clusters of different types of nodes are restricted to one-to-one associations.
Therefore, mining hidden structures from k-partite heterogeneous relation graphs presents a great challenge to traditional clustering approaches. In this chapter, first, we propose a general model, the relation summary network, to find the hidden structures (the local cluster structures and the global community structures) of a k-partite heterogeneous relation graph. The basic idea is to construct a new k-partite heterogeneous relation graph with hidden nodes, which “summarize” the link information in the original k-partite heterogeneous relation graph and make the hidden structures explicit, to approximate the original graph. The model provides a principled framework for unsupervised learning on k-partite heterogeneous relation graphs of various structures. Second, under this model, based on the matrix representation of a k-partite heterogeneous relation graph, we reformulate the graph approximation as an optimization problem of matrix approximation and derive an iterative algorithm to find the hidden structures under a broad range of distortion measures. By iteratively updating the cluster structures for each type of nodes, the algorithm takes advantage of the interactions among the cluster structures of different types of nodes and performs an implicit adaptive feature reduction for each type of nodes. Experiments on both synthetic and real data sets demonstrate the promise and effectiveness of the proposed model and algorithm. Third, we also establish the connections between existing clustering approaches and the proposed model.
FIGURE 3.1: A bipartite graph (a) and its relation summary network (b).
One line of prior work proposes a framework for clustering heterogeneous Web objects, under which a layered structure with link information is used to iteratively project and propagate the cluster results between layers. Similarly, [125] presents an approach named ReCom to improve the cluster quality of interrelated data objects through an iterative reinforcement clustering process. However, there is no sound objective function or theoretical proof of the effectiveness of these algorithms. [87] formulates heterogeneous relational data clustering as a collective factorization on related matrices and derives a spectral algorithm to cluster multi-type interrelated data objects simultaneously. The algorithm iteratively embeds each type of data objects into low-dimensional spaces and benefits from the interactions among the hidden structures of different types of data objects. Recently, a general method based on matrix factorization was independently developed by [115, 116], but it had not yet appeared at the time of this writing.

To summarize, unsupervised learning on k-partite heterogeneous relation graphs has been approached from different perspectives due to its high impact in various important applications, yet systematic research is still limited. This chapter attempts to derive a theoretically sound general model and algorithm for unsupervised learning on k-partite heterogeneous relation graphs of various structures.
In Figure 3.1b, we redraw the original graph by adding two sets of new nodes (called hidden nodes), S1 = {s11, s12, s13} and S2 = {s21, s22}. Based on the new graph, the cluster structures for each type of nodes are straightforward: V1 has three clusters, {v11, v12}, {v13, v14}, and {v15, v16}, and V2 has two clusters, {v21, v22} and {v23, v24}. If we look at the subgraph consisting of only the hidden nodes in Figure 3.1b, we see that it provides a clear skeleton for the global structure of the whole graph, from which it is clear how the clusters of different types of nodes are related to each other; for example, cluster s11 is associated with cluster s21, and cluster s12 is associated with both clusters s21 and s22. In other words, by introducing the hidden nodes into the original k-partite heterogeneous relation graph, both the local cluster structures and the global community structures become explicit. Note that if we apply a graph partitioning approach to the bipartite graph in Figure 3.1a to find its hidden structures, no matter how we cut the edges, it is impossible to identify all the cluster structures correctly.
Based on the above observations, we propose a model, the relation summary network (RSN), to mine the hidden structures of a k-partite heterogeneous relation graph. The key idea of RSN is to add a small number of hidden nodes to the original k-partite heterogeneous relation graph to make the hidden structures of the graph explicit. However, given a k-partite heterogeneous relation graph, we are not interested in an arbitrary relation summary network. To ensure that a relation summary network discovers the desirable hidden structures of the original graph, we must make the RSN as “close” as possible to the original graph. In other words, we aim at an optimal relation summary network, from which we can reconstruct the original graph as precisely as possible. Formally, we define an RSN as follows.
In Definition 3.1, the first condition implies that in an RSN, the instance nodes (the nodes in Vi) are related to each other only through the hidden nodes. Hence, a small number of hidden nodes actually summarize the complex relations (edges) in the original graph to make the hidden structures explicit. Since our focus in this study is finding disjoint clusters for each type of nodes, the first condition restricts each instance node to be adjacent to exactly one hidden node, with unit weight; however, it is easy to modify this restriction to extend the model to other cases of unsupervised learning on k-partite heterogeneous relation graphs. The second condition implies that if two types of instance nodes Vi and Vj are (or are not) related to each other in the original graph, then the corresponding two types of hidden nodes Si and Sj in the RSN are (or are not) related to each other. For example, Figure 3.2 shows a tripartite graph and its RSN. In the original graph of Figure 3.2a, V1 ∼ V2 and V1 ∼ V3, and hence S1 ∼ S2 and S1 ∼ S3 in its RSN. The third condition states that the RSN is an optimal approximation to the original graph under a certain distortion measure.
Next, we need to define the distance between a k-partite heterogeneous relation graph G and its RSN Gs. Without loss of generality, if Vi ∼ Vj in G, we assume that the edges between Vi and Vj are complete (if there is no edge between vih and vjl, we can assume an edge with a weight of zero or another special value); similarly for Si ∼ Sj in Gs. Let e(vih, vjl) denote the weight of the edge (vih, vjl) in G, and similarly let es(sip, sjq) be the weight of the edge (sip, sjq) in Gs. In the RSN, a pair of instance nodes vih and vjl are connected through a unique path (vih, sip, sjq, vjl), in which es(vih, sip) = 1 and es(sjq, vjl) = 1 according to Definition 3.1. The edge between two hidden nodes (sip, sjq) can be considered as the “summary relation” between two sets of instance nodes, i.e., the instance nodes connecting with sip and the instance nodes connecting with sjq. Hence, how well Gs approximates G depends on how well es(sip, sjq) approximates e(vih, vjl) for the vih and vjl that satisfy es(vih, sip) = 1 and es(sjq, vjl) = 1, respectively. Therefore, we define the distance between a k-partite heterogeneous relation graph G and its RSN Gs as follows:
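(The displayed formula is missing from this copy of the text; the following is our reconstruction, offered only as a sketch consistent with the description above.)

D(G, Gs) = Σ_{(i,j): Vi ∼ Vj} Σ_{vih ∈ Vi} Σ_{vjl ∈ Vj} d(e(vih, vjl), es(sip, sjq)),

where sip and sjq are the hidden nodes adjacent to vih and vjl, respectively, and d(·, ·) is the chosen distortion measure, e.g., the squared Euclidean distance d(x, y) = (x − y)².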
FIGURE 3.3: The cluster structures of V2 and V3 affect the similarity between
v11 and v12 through the hidden nodes.
Conversely, the cluster structures of V2 and V3 may be affected in return. In fact, this is the idea behind the iterative algorithm that constructs an RSN for a k-partite heterogeneous relation graph, which we discuss in the next section.
Homogeneous relational data consist of only one type of data objects. In the literature, a special case of homogeneous relational data clustering has been studied as the graph partitioning problem; however, research on the general case is still limited. In this chapter, we propose a general model based on graph approximation to learn relation-pattern-based cluster structures from a graph. The model generalizes the traditional graph partitioning approaches and is applicable to learning various cluster structures.
4.1 Introduction
Learning clusters from homogeneous relational graphs is an important problem in applications such as Web mining, social network analysis, bioinformatics, VLSI design, and task scheduling. In many applications, users are interested in strongly intra-connected clusters, in which the nodes are closely connected within a cluster and loosely connected across clusters. Learning this type of clusters corresponds to finding strongly connected subgraphs in a graph, which has been studied for decades as the graph partitioning problem [28, 77, 113].
In addition to the strongly intra-connected clusters, other types of clusters have also attracted intensive attention in many important applications. For example, in Web mining, we are also interested in clusters of Web pages that sparsely link to each other but all densely link to the same Web pages [80], such as a cluster of music “fan” Web pages that share the same taste in music and are densely linked to the same set of music Web pages but sparsely linked to each other. Learning this type of clusters corresponds to finding dense bipartite subgraphs in a graph, which has been listed as one of the five algorithmic challenges in Web search engines [64].
The strongly intra-connected clusters and the weakly intra-connected clusters are two basic cluster structures, and various types of clusters can be generated based on them. For example, a Web cluster could take on different structures during its development: in its early stage, it has the form of a bipartite graph, since in this stage the members of the cluster share the same interests (link to the same Web pages) but do not yet know (link to) each other; in a later stage, as members of the cluster start linking to each other, the cluster becomes a hybrid of the aforementioned two basic cluster structures; in the final stage, it develops into a larger, strongly intra-connected cluster.

FIGURE 4.1: A graph with mixed cluster structures (a) and its cluster prototype graph (b).
These various types of clusters can be unified into a general concept, the relation-pattern-based cluster. A relation-pattern-based cluster is a group of nodes that have similar relation patterns, i.e., the nodes within a cluster relate to other nodes in similar ways. Consider an illustrative example. Figure 4.1a shows a graph with mixed types of clusters. There are four clusters in Figure 4.1a: C1 = {v1, v2, v3, v4}, C2 = {v5, v6, v7, v8}, C3 = {v9, v10, v11, v12}, and C4 = {v13, v14, v15, v16}. Within the strongly intra-connected cluster C1, the nodes have similar relation patterns, i.e., they all strongly link to the nodes in C1 (their own cluster) and C3, and weakly link to the nodes in C2 and C4. Within the weakly intra-connected cluster C3, the nodes also have similar relation patterns, i.e., they all weakly link to the nodes in C3 (their own cluster) and C2, and strongly link to the nodes in C1 and C4. The nodes in clusters C2 and C4 behave similarly. Note that graph partitioning approaches cannot correctly identify the cluster structure of the graph in Figure 4.1a, since they seek only strongly intra-connected clusters by cutting a graph into disjoint subgraphs to minimize edge cuts.
In addition to unsupervised cluster learning applications, the concept of the relation-pattern-based cluster also provides a simple approach to semi-supervised learning on graphs. In many applications, graphs are very sparse, and there may exist a large number of isolated or nearly isolated nodes that do not have cluster patterns. However, according to extra supervised information (domain knowledge), these nodes may belong to certain clusters. To incorporate the supervised information, a common approach is to manually label these nodes. However, for a large graph, manual labeling is labor-intensive and expensive. Furthermore, to make use of these labels, different semi-supervised learning algorithms, rather than supervised learning algorithms, are needed.

One line of related work expands a core to a full-fledged cluster with the HITS algorithm [79]; [104] proposes a different approach that extracts the emerging clusters by finding all bipartite graphs instead of finding cores.
In this chapter, we focus on how to divide the nodes of a homogeneous
relational graph into disjoint clusters based on relation patterns.
arg min_{A∗} ||A − A∗||²,   (4.1)
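(The text is cut off here. As our own reading sketch, not a verbatim statement of the model: consistent with the cluster prototype graphs of Figure 4.1, the approximation A∗ would naturally be restricted to the form A∗ = S B Sᵀ, where S ∈ {0, 1}^{n×k} is a cluster indicator matrix with exactly one nonzero entry per row and B ∈ R^{k×k} is the affinity matrix of the cluster prototype graph. Minimizing ||A − S B Sᵀ||² then yields relation-pattern-based clusters, with traditional graph partitioning roughly corresponding to the special case where B is close to diagonal, i.e., only intra-cluster affinities are large.)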