Relational
Data Clustering
Models, Algorithms,
and Applications



Chapman & Hall/CRC
Data Mining and Knowledge Discovery Series

SERIES EDITOR
Vipin Kumar
University of Minnesota
Department of Computer Science and Engineering
Minneapolis, Minnesota, U.S.A.

AIMS AND SCOPE


This series aims to capture new developments and applications in data mining and knowledge
discovery, while summarizing the computational tools and techniques useful in data analysis. This
series encourages the integration of mathematical, statistical, and computational methods and
techniques through the publication of a broad range of textbooks, reference works, and hand-
books. The inclusion of concrete examples and applications is highly encouraged. The scope of the
series includes, but is not limited to, titles in the areas of data mining and knowledge discovery
methods and applications, modeling, algorithms, theory and foundations, data and knowledge
visualization, data mining systems and tools, and privacy and security issues.

PUBLISHED TITLES

UNDERSTANDING COMPLEX DATASETS: DATA MINING WITH MATRIX DECOMPOSITIONS
David Skillicorn

COMPUTATIONAL METHODS OF FEATURE SELECTION
Huan Liu and Hiroshi Motoda

CONSTRAINED CLUSTERING: ADVANCES IN ALGORITHMS, THEORY, AND APPLICATIONS
Sugato Basu, Ian Davidson, and Kiri L. Wagstaff

KNOWLEDGE DISCOVERY FOR COUNTERTERRORISM AND LAW ENFORCEMENT
David Skillicorn

MULTIMEDIA DATA MINING: A SYSTEMATIC INTRODUCTION TO CONCEPTS AND THEORY
Zhongfei Zhang and Ruofei Zhang

NEXT GENERATION OF DATA MINING
Hillol Kargupta, Jiawei Han, Philip S. Yu, Rajeev Motwani, and Vipin Kumar

DATA MINING FOR DESIGN AND MARKETING
Yukio Ohsawa and Katsutoshi Yada

THE TOP TEN ALGORITHMS IN DATA MINING
Xindong Wu and Vipin Kumar

GEOGRAPHIC DATA MINING AND KNOWLEDGE DISCOVERY, SECOND EDITION
Harvey J. Miller and Jiawei Han

TEXT MINING: CLASSIFICATION, CLUSTERING, AND APPLICATIONS
Ashok N. Srivastava and Mehran Sahami

BIOLOGICAL DATA MINING
Jake Y. Chen and Stefano Lonardi

INFORMATION DISCOVERY ON ELECTRONIC HEALTH RECORDS
Vagelis Hristidis

TEMPORAL DATA MINING
Theophano Mitsa

RELATIONAL DATA CLUSTERING: MODELS, ALGORITHMS, AND APPLICATIONS
Bo Long, Zhongfei Zhang, and Philip S. Yu



Chapman & Hall/CRC
Data Mining and Knowledge Discovery Series

Relational
Data Clustering
Models, Algorithms,
and Applications

Bo Long
Zhongfei Zhang
Philip S. Yu



Chapman & Hall/CRC
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2010 by Taylor and Francis Group, LLC


Chapman & Hall/CRC is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed in the United States of America on acid-free paper


10 9 8 7 6 5 4 3 2 1

International Standard Book Number: 978-1-4200-7261-7 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts
have been made to publish reliable data and information, but the author and publisher cannot assume
responsibility for the validity of all materials or the consequences of their use. The authors and publishers
have attempted to trace the copyright holders of all material reproduced in this publication and apologize to
copyright holders if permission to publish in this form has not been obtained. If any copyright material has
not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmit-
ted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented,
including photocopying, microfilming, and recording, or in any information storage or retrieval system,
without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.
com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood
Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and
registration for a variety of users. For organizations that have been granted a photocopy license by the CCC,
a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used
only for identification and explanation without intent to infringe.

Library of Congress Cataloging‑in‑Publication Data

Relational data clustering : models, algorithms, and applications / Bo Long, Zhongfei Zhang, Philip S. Yu.
p. cm. -- (Chapman & Hall/CRC data mining and knowledge discovery series)
Includes bibliographical references and index.
ISBN 978-1-4200-7261-7 (hardcover : alk. paper)
1. Data mining. 2. Cluster analysis. 3. Relational databases. I. Long, Bo. II. Zhang,
Zhongfei. III. Yu, Philip S. IV. Title. V. Series.

QA76.9.D343R46 2010
005.75’6--dc22 2010009487

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com
and the CRC Press Web site at http://www.crcpress.com



To my parents, Jingyu Li and Yinghua Long; my sister, Li; my wife, Jing; and my daughter, Helen
Bo Long

To my parents, Yukun Zhang and Ming Song; my sister, Xuefei; and my sons, Henry and Andrew
Zhongfei (Mark) Zhang

To my family
Philip S. Yu


Contents

List of Tables

List of Figures

Preface

1 Introduction
1.1 Defining the Area
1.2 The Content and the Organization of This Book
1.3 The Audience of This Book
1.4 Further Readings

I Models

2 Co-Clustering
2.1 Introduction
2.2 Related Work
2.3 Model Formulation and Analysis
2.3.1 Block Value Decomposition
2.3.2 NBVD Method

3 Heterogeneous Relational Data Clustering
3.1 Introduction
3.2 Related Work
3.3 Relation Summary Network Model

4 Homogeneous Relational Data Clustering
4.1 Introduction
4.2 Related Work
4.3 Community Learning by Graph Approximation

5 General Relational Data Clustering
5.1 Introduction
5.2 Related Work
5.3 Mixed Membership Relational Clustering
5.4 Spectral Relational Clustering

6 Multiple-View Relational Data Clustering
6.1 Introduction
6.2 Related Work
6.3 Background and Model Formulation
6.3.1 A General Model for Multiple-View Unsupervised Learning
6.3.2 Two Specific Models: Multiple-View Clustering and Multiple-View Spectral Embedding

7 Evolutionary Data Clustering
7.1 Introduction
7.2 Related Work
7.3 Dirichlet Process Mixture Chain (DPChain)
7.3.1 DPChain Representation
7.4 HDP Evolutionary Clustering Model (HDP-EVO)
7.4.1 HDP-EVO Representation
7.4.2 Two-Level CRP for HDP-EVO
7.5 Infinite Hierarchical Hidden Markov State Model
7.5.1 iH2MS Representation
7.5.2 Extension of iH2MS
7.5.3 Maximum Likelihood Estimation of HTM
7.6 HDP Incorporated with HTM (HDP-HTM)
7.6.1 Model Representation

II Algorithms

8 Co-Clustering
8.1 Nonnegative Block Value Decomposition Algorithm
8.2 Proof of the Correctness of the NBVD Algorithm

9 Heterogeneous Relational Data Clustering
9.1 Relation Summary Network Algorithm
9.2 A Unified View to Clustering Approaches
9.2.1 Bipartite Spectral Graph Partitioning
9.2.2 Binary Data Clustering with Feature Reduction
9.2.3 Information-Theoretic Co-Clustering
9.2.4 K-Means Clustering

10 Homogeneous Relational Data Clustering
10.1 Hard CLGA Algorithm
10.2 Soft CLGA Algorithm
10.3 Balanced CLGA Algorithm

11 General Relational Data Clustering
11.1 Mixed Membership Relational Clustering Algorithm
11.1.1 MMRC with Exponential Families
11.1.2 Monte Carlo E-Step
11.1.3 M-Step
11.1.4 Hard MMRC Algorithm
11.2 Spectral Relational Clustering Algorithm
11.3 A Unified View to Clustering
11.3.1 Semi-Supervised Clustering
11.3.2 Co-Clustering
11.3.3 Graph Clustering

12 Multiple-View Relational Data Clustering
12.1 Algorithm Derivation
12.1.1 Multiple-View Clustering Algorithm
12.1.2 Multiple-View Spectral Embedding Algorithm
12.2 Extensions and Discussions
12.2.1 Evolutionary Clustering
12.2.2 Unsupervised Learning with Side Information

13 Evolutionary Data Clustering
13.1 DPChain Inference
13.2 HDP-EVO Inference
13.3 HDP-HTM Inference

III Applications

14 Co-Clustering
14.1 Data Sets and Implementation Details
14.2 Evaluation Metrics
14.3 Results and Discussion

15 Heterogeneous Relational Data Clustering
15.1 Data Sets and Parameter Setting
15.2 Results and Discussion

16 Homogeneous Relational Data Clustering
16.1 Data Sets and Parameter Setting
16.2 Results and Discussion

17 General Relational Data Clustering
17.1 Graph Clustering
17.2 Bi-Clustering and Tri-Clustering
17.3 A Case Study on Actor-Movie Data
17.4 Spectral Relational Clustering Applications
17.4.1 Clustering on Bi-Type Relational Data
17.4.2 Clustering on Tri-Type Relational Data

18 Multiple-View and Evolutionary Data Clustering
18.1 Multiple-View Clustering
18.1.1 Synthetic Data
18.1.2 Real Data
18.2 Multiple-View Spectral Embedding
18.3 Semi-Supervised Clustering
18.4 Evolutionary Clustering

IV Summary

References

Index


List of Tables

4.1 A list of variations of the CLGA model
9.1 A list of Bregman divergences and the corresponding convex functions
14.1 Data sets details. Each data set is randomly and evenly sampled from specific newsgroups
14.2 Both NBVD and NMF accurately recover the original clusters in the CLASSIC3 data set
14.3 A normalized block value matrix on the CLASSIC3 data set
14.4 NBVD extracts the block structure more accurately than NMF on the Multi5 data set
14.5 NBVD shows clear improvements on the micro-averaged-precision values on different newsgroup data sets over other algorithms
15.1 Parameters and distributions for synthetic bipartite graphs
15.2 Subsets of newsgroup data for constructing bipartite graphs
15.3 Parameters and distributions for synthetic tripartite graphs
15.4 Taxonomy structures of two data sets for constructing tripartite graphs
15.5 NMI scores of the algorithms on bipartite graphs
15.6 NMI scores of the algorithms on tripartite graphs
16.1 Summary of the graphs with general clusters
16.2 Summary of graphs based on text datasets
16.3 NMI scores on graphs of general clusters
16.4 NMI scores on graphs of text data
17.1 Summary of relational data for graph clustering
17.2 Subsets of newsgroup data for bi-type relational data
17.3 Taxonomy structures of two data sets for constructing tripartite relational data
17.4 Two clusters from actor-movie data
17.5 NMI comparisons of SRC, NC, and BSGP algorithms
17.6 Taxonomy structures for three data sets
17.7 NMI comparisons of SRC, MRK, and CBGC algorithms
18.1 Distributions and parameters to generate syn2 data


List of Figures

1.1 Relationships among the different areas of relational data clustering
2.1 The original data matrix (b) with a 2 × 2 block structure which is demonstrated by the permuted data matrix (a). The row-coefficient matrix R, the block value matrix B, and the column-coefficient matrix C give a reconstructed matrix (c) to approximate the original data matrix (b)
2.2 Illustration of the difference between BVD and SVD
3.1 A bipartite graph (a) and its relation summary network (b)
3.2 A tripartite graph (a) and its RSN (b)
3.3 The cluster structures of V2 and V3 affect the similarity between v11 and v12 through the hidden nodes
4.1 A graph with mixed cluster structures (a) and its cluster prototype graph (b)
4.2 A graph with virtual nodes
4.3 A graph with strongly connected clusters (a) and its cluster prototype graph (b); the graph affinity matrices for (a) and (b), (c) and (d), respectively
5.1 Examples of the structures of relational data
7.1 The DPChain model
7.2 The HDP-EVO model
7.3 The iH2MS model
7.4 The HDP-HTM model
9.1 An RSN equivalent to k-means
13.1 The illustrated example of global and local cluster correspondence
14.1 The coefficient of the variance for the columns of the mean block value matrix with the varying number of the word clusters using NBVD on different NG20 data sets
14.2 Micro-averaged-precision with the varying number of the word clusters using NBVD on different NG20 data sets
17.1 NMI comparison of SGP, METIS, and MMRC algorithms
17.2 NMI comparison among BSGP, RSN, and MMRC algorithms for bi-type data
17.3 NMI comparison of CBGC, RSN, and MMRC algorithms for tri-type data
17.4 (a), (b), and (c) are document embeddings of the multi2 data set produced by NC, BSGP, and SRC, respectively (u1 and u2 denote the first and second eigenvectors, respectively); (d) is an iteration curve for SRC
17.5 Three pairs of embeddings of documents and categories for the TM1 data set produced by SRC with different weights: (a) and (b) with w_a^(12) = 1, w_a^(23) = 1; (c) and (d) with w_a^(12) = 1, w_a^(23) = 0; (e) and (f) with w_a^(12) = 0, w_a^(23) = 1
18.1 A toy example that demonstrates that our MVC algorithm is able to learn the consensus pattern from multiple views with noise
18.2 NMI comparison on synthetic data
18.3 NMI comparison on real data
18.4 Four embeddings for the NGv3 data set
18.5 NMI comparison on spectral embedding of NGv3 and NGv4
18.6 Semi-supervised clustering results
18.7 Evolutionary clustering results


Preface

The world we live in today is full of data with relations: the Internet, social
networks, telecommunications, customer shopping patterns, and micro-array
data in bioinformatics research, to name just a few examples. This has given
rise to an active research area called relational data mining in the data
mining research field. Given that in many real-world applications we do not
have the luxury of any training data, or it would be extremely expensive to
obtain training data, relational data clustering has
recently caught substantial attention from the related research communities
and has thus emerged as a new and hot research topic in the area of relational
data mining. This book is the very first monograph on the topic of relational
data clustering written in a self-contained format. This book addresses both
the fundamentals and the applications of relational data clustering, including
the theoretical models and algorithms, as well as exemplar applications of
these models and algorithms to real-world problems.
The authors of this book have been actively working on the topic of rela-
tional data clustering for years, and this book is the culmination of their
research on this topic. This book may be used as a collection of
research notes for researchers interested in the research on this topic, a refer-
ence book for practitioners or engineers, as well as a textbook for a graduate
advanced seminar on the topic of relational data clustering. This book may
also be used for an introductory course for graduate students or advanced
undergraduate seniors. The references collected in this book may be used as
further reading lists or references for the readers.
Due to the extensive attention this topic has received in the literature, and
also due to its rapid development in recent years, this book is by no means
meant to be an exhaustive collection of the work on relational data clustering.
We intend to collect the most recent research of our
own on this topic in this book. For those who have already been in the area of
relational data mining or who already know what this area is about, this book
serves the purpose of a formal and systematic collection of part of the most
recent advances of the research on this topic. For those who are beginners to
the area of relational data mining, this book serves the purpose of a formal
and systematic introduction to relational data clustering.
It is not possible for us to accomplish this book without the great support
from a large group of people and organizations. In particular, we would like to
thank the publisher—Taylor & Francis/CRC Press for giving us the opportu-
nity to complete this book for the readers as one of the books in the Chapman
& Hall/CRC Data Mining and Knowledge Discovery series, with Prof. Vipin
Kumar at the University of Minnesota serving as the series editor. We would
like to thank this book’s editor of Taylor & Francis Group, Randi Cohen, for
her enthusiastic and patient support, effort, and advice; the project coordi-
nator of Taylor & Francis Group, Amber Donley, and the anonymous proof
reader for their meticulous effort in correcting typos and other errors of the
draft of the book; and Shashi Kumar of Glyph International for his prompt
technical support in formatting the book. We would like to thank Prof. Jiawei
Han at the University of Illinois at Urbana-Champaign and Prof. Jieping Ye
at Arizona State University as well as another anonymous reviewer for their
painstaking effort to review the book and their valuable comments to sub-
stantially improve the quality of this book. While this book is derived from
the original contributions by the authors of the book, part of the materials
of this book are also jointly contributed by their colleagues Xiaoyun Wu at
Google Research Labs and Tianbing Xu at SUNY Binghamton. This book
project is supported in part by the National Science Foundation under grant
IIS-0812114, managed by the program manager, Dr. Maria Zemankova. Any
opinions, findings, and conclusions or recommendations expressed in this ma-
terial are those of the authors and do not necessarily reflect the views of the
National Science Foundation.
Finally, we would like to thank our families for the love and support that
are essential for us to complete this book.



Part I

Models



Chapter 1
Introduction

1.1 Defining the Area


The clustering problem is an essential problem in data mining and machine
learning. Cluster analysis is a process that partitions a set of data objects
into clusters in such a way that objects from the same cluster are similar and
objects from different clusters are dissimilar [105].
Most clustering approaches in the literature focus on “flat” data, in which
each data object is represented as a fixed-length attribute vector [105]. How-
ever, many real-world data sets are much richer in structure, involving objects
of multiple types that are related to each other, such as documents and words
in a text corpus; Web pages, search queries, and Web users in a Web search
system; and shops, customers, suppliers, shareholders, and advertisement me-
dia in a marketing system. We refer to such data, in which data objects are
related to each other, as relational data.
Relational data have attracted more and more attention due to their phenom-
enal impact in various important applications, such as text analysis, recom-
mendation systems, Web mining, online advertising, bioinformatics, citation
analysis, and epidemiology. Different relational learning problems have been
addressed in different fields of data mining. One of the most important re-
lational learning tasks is to discover hidden groups (clusters) from relational
data, i.e., relational data clustering. The following are examples of relational
data clustering:

• Text analysis. To learn the document clusters and word clusters from bi-type document-word relational data.
• Recommendation system. Movie recommendation based on user clusters
(communities) and movie clusters learned from relational data involving
users, movies, and actors/actresses.
• Online advertisement. Based on the relational data, in which advertisers, bidded terms, and words are interrelated to each other, the clusters of advertisers and bidded terms can be learned for bidded term suggestion.
• Bioinformatics. Automatically identifying gene groups (clusters) from the relational data of genes, conditions, and annotation words.

• Research community mining and topic identification. To identify research communities (author clusters) and research topics (paper clusters) from relational data consisting of authors, papers, and key words.

In general, relational data contain three types of information: attributes for
individual objects, homogeneous relations between objects of the same type,
and heterogeneous relations between objects of different types. For example,
for a scientific publication relational data set of papers and authors, personal
information such as author affiliations constitutes the attributes; the citation
relations among papers are homogeneous relations; and the authorship relations
between papers and authors are heterogeneous relations. Such data violate
the classic IID assumption in machine learning and statistics and present
huge challenges to traditional clustering approaches.
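To make these three types of information concrete, here is a minimal sketch (with hypothetical toy values, not taken from the book) of how a papers-and-authors relational data set might be laid out as matrices:

```python
import numpy as np

# Attributes for individual objects: one feature vector per author
# (e.g., a two-dimensional affiliation indicator; values are illustrative).
author_attributes = np.array([[1, 0],   # author 0
                              [0, 1],   # author 1
                              [0, 1]])  # author 2

# Homogeneous relations: paper-by-paper citation matrix (papers 0..3).
citations = np.array([[0, 1, 0, 0],
                      [0, 0, 1, 0],
                      [1, 0, 0, 1],
                      [0, 0, 0, 0]])

# Heterogeneous relations: paper-by-author authorship matrix.
authorship = np.array([[1, 1, 0],
                       [0, 1, 0],
                       [0, 0, 1],
                       [1, 0, 1]])
```

A collective clustering approach operates on all three matrices at once, whereas an individual clustering approach would first flatten the relations into per-object feature vectors.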
There are two frameworks for the challenging problem of relational data
clustering: the individual clustering framework and the collective clustering
framework.
Under the individual clustering framework, we transform relational data
into flat data and then cluster each type of objects individually. An intuitive
way under this framework is to transform all the relations into features and
then to apply traditional clustering algorithms directly. On the other hand,
under the collective clustering framework, we cluster different types of data
objects simultaneously. Compared with the collective clustering framework,
the individual clustering framework has the following disadvantages.
First, the transformation causes the loss of relation and structure infor-
mation [48]. Second, traditional clustering approaches are unable to tackle
influence propagation in clustering relational data, i.e., the hidden patterns of
different types of objects could affect each other both directly and indirectly
(pass along relation chains). Third, in some data mining applications, users
are interested not only in the hidden structure for each type of objects, but
also in interaction patterns involving multi-types of objects. For example, in
document clustering, in addition to document clusters and word clusters, the
relationship between document clusters and word clusters is also useful infor-
mation. It is difficult to discover such interaction patterns by clustering each
type of objects individually.
On the other hand, the collective clustering framework has the obvious ad-
vantage of learning both local and global cluster structures based on all three
types of information. In this book, our main focus is on the collective cluster-
ing framework. Under the collective clustering framework, with different foci
on different types of relational data, there are different subfields of relational
data clustering: co-clustering on bi-type heterogeneous relational data, het-
erogeneous relational data clustering on multi-type heterogeneous relational
data, homogeneous relational data clustering on homogeneous relational data,
general relational data clustering on general relational data.
Another interesting observation is that a number of important clustering
problems, which have been of intensive interest in the literature, can be viewed
as special cases of relational clustering. For example, graph partitioning
[28, 75, 113] can be viewed as clustering on single-type relational data
consisting of only homogeneous relations (represented as a graph affinity
matrix); co-clustering [11, 44], which arises in important applications such as
document clustering and micro-array data clustering, can be formulated under
the collective clustering framework as clustering on bi-type relational data
consisting of only heterogeneous relations. Recently, semi-supervised clustering
[14, 124] has attracted significant attention; it is a special type of clustering
that uses both labeled and unlabeled data. In Section 11.3, we show that
semi-supervised clustering can be formulated as clustering on single-type
relational data consisting of attributes and homogeneous relations.

FIGURE 1.1: Relationships among the different areas of relational data clustering (collective clustering: co-clustering, heterogeneous relational clustering, homogeneous relational clustering, and general relational data clustering; individual clustering: multiple-view clustering).
Although this book is mainly focused on the collective clustering frame-
work, it also includes our most recent research on the individual clustering
framework, specifically multiple-view relational data clustering, since in some
applications, when a large number of types of objects in a relational data set
are related to each other in a complicated way, we may want to focus on a
certain type of data objects to reduce the model complexity.
Figure 1.1 shows the relations among different areas of relational data clus-
tering. In summary, as a recently booming area, relational data clustering
arises in a wide range of applications and is also related to a number of im-
portant clustering problems in the literature. Hence, there is a great need for
both practical algorithm derivation and theoretical framework construction
for relational data clustering, which is the main goal of this book.


1.2 The Content and the Organization of This Book


This book aims at introducing a novel theoretical framework for a new
data mining field, relational data clustering, and a family of new algorithms
for different relational clustering problems arising in a wide range of important
applications.
The organization of this book is as follows. The whole book contains four
parts: introduction, models, algorithms, and applications. The introduction
part defines the area of relational data clustering and outlines what this
book is about; the model part introduces different types of model formulations
for relational data clustering; the algorithm part presents various algorithms
for the corresponding models; the application part shows applications of the
models and algorithms by extensive experimental results. This book focuses
on six topics of relational data clustering.
The first topic is clustering on bi-type heterogeneous relational data, in
which there are heterogeneous relations between the two types of data objects.
For example, a text corpus can be formulated as a bi-type relational data set
of documents and words, in which there exist heterogeneous relations between
documents and words. Bi-type relational data clustering is also known as co-
clustering in the literature. We present a new co-clustering framework, Block
Value Decomposition (BVD), for bi-type heterogeneous relational data, which
factorizes the relational data matrix into three components: the row-coefficient
matrix R, the block value matrix B, and the column-coefficient matrix C. Un-
der this framework, we focus on a special yet very popular case—nonnegative
relational data, and propose a specific novel co-clustering algorithm that itera-
tively computes the three decomposition matrices based on the multiplicative
updating rules.
The second topic is about a more general case than bi-type heterogeneous
relational data: multi-type heterogeneous relational data, which form k-
partite graphs with various structures. In fact, many examples of real-world
data involve multiple types of data objects that are related to each other,
which naturally form k-partite graphs of heterogeneous types of data objects.
For example, documents, words, and categories in taxonomy mining, as well
as Web pages, search queries, and Web users in a Web search system all form
a tri-partite graph; papers, key words, authors, and publication venues in a
scientific publication archive form a quart-partite graph. We propose a gen-
eral model, the relation summary network, to find the hidden structures (the
local cluster structures and the global community structures) from a k-partite
heterogeneous relation graph. The model provides a principal framework for
unsupervised learning on k-partite heterogeneous relation graphs of various
structures. Under this model, we derive a novel algorithm to identify the hid-
den structures of a k-partite heterogeneous relation graph by constructing a
relation summary network to approximate the original k-partite heterogeneous
relation graph under a broad range of distortion measures.


For the third topic, this book presents homogeneous relational data clus-
tering. In heterogeneous relational data, we have heterogeneous relations be-
tween different types of data objects. On the other hand, in homogeneous
relational data, there are homogeneous relations between the data objects of
a single type. Homogeneous relational data also arise from important appli-
cations, such as Web mining, social network analysis, bioinformatics, Very
Large-Scale Integration (VLSI) design, and task scheduling. Graph partition-
ing in the literature can be viewed as a special case of homogeneous relational
data clustering. Basically, graph partitioning looks for dense clusters corre-
sponding to strongly intra-connected subgraphs. On the other hand, the goal
of homogeneous relational data clustering is more general and challenging. It
is to identify both dense clusters and sparse clusters. We propose a general
model based on graph approximation to learn relation-pattern-based cluster
structures from a graph. The model generalizes the traditional graph parti-
tioning approaches and is applicable to learning various cluster structures.
Under this model, we derive a family of algorithms that are flexible in learning
various cluster structures and can easily incorporate prior knowledge of the
cluster structures.
The fourth topic is clustering on the most general case of relational data,
which contain three types of information: attributes for individual objects,
homogeneous relations between objects of the same type, and heterogeneous
relations between objects of different types. How to make use of all three
types of information to cluster multi-type-related objects simultaneously is a
big challenge, since the three types of information have different forms and
very different statistical properties. We propose a probabilistic model for re-
lational clustering, which also provides a principled framework to unify various
important clustering problems, including traditional attributes-based cluster-
ing, semi-supervised clustering, co-clustering, and graph clustering. The pro-
posed model seeks to identify cluster structures for each type of data objects
and interaction patterns between different types of objects. Under this model,
we propose parametric hard and soft relational clustering algorithms under a
large number of exponential family distributions.
The fifth topic is about individual relational clustering framework. On this
topic, we propose a general model for multiple-view unsupervised learning.
The proposed model introduces the concept of a mapping function to make
the patterns from different pattern spaces comparable, so that an optimal
pattern can be learned from the multiple patterns of multiple
representations. Under this model, we formulate two specific models for two
important cases of unsupervised learning: clustering and spectral dimensionality
reduction; we derive an iterative algorithm for multiple-view clustering and
a simple algorithm providing a global optimum for multiple-view spectral
dimensionality reduction. We also extend the proposed model and algorithms
to evolutionary clustering and unsupervised learning with side information.
The sixth topic is about our most recent research on evolutionary clustering, which has great potential to incorporate time effects into relational
data clustering. Evolutionary clustering is a relatively new research topic in
data mining. Evolutionary clustering refers to the scenario where a collection
of data evolves over time; at each time, the collection of the data has a
number of clusters; when the collection of the data evolves from one time to
another, new data items may join the collection and existing data items may
disappear; similarly, new clusters may appear and at the same time existing
clusters may disappear. Consequently, both the data items and the clusters
of the collection may change over time, which poses a great challenge
to the problem of evolutionary clustering in comparison with the traditional
clustering. In this book, we introduce the evolutionary clustering models and
algorithms based on Dirichlet processes.

1.3 The Audience of This Book


This book is a monograph on the authors’ recent research in relational data
clustering, the recently emerging area of data mining and machine learning
related to a wide range of applications. Therefore, the expected readership of
this book includes all the researchers and system development engineers working
in areas including, but not limited to, data mining, machine learning,
computer vision, multimedia data mining, pattern recognition, statistics, as
well as other application areas that use relational data clustering techniques
such as Web mining, information retrieval, marketing, and bioinformatics.
Since this book is self-contained in the presentations of the materials, this
book also serves as an ideal reference book for people who are interested
in this new area of relational data clustering. Consequently, the readership
also includes anyone who has such an interest or works in a field that needs
this reference book. Finally, this book can be used as a reference
book for a graduate course on advanced topics of data mining and/or machine
learning, as it provides a systematic introduction to this booming new subarea
of data mining and machine learning.

1.4 Further Readings


As a newly emerging area of data mining and machine learning, relational
data clustering is still in its infancy; currently there is no dedicated,
premier venue for the publication of research in this area. Consequently,
the related work in this area, as the supplementary information to this book

© 2010 by Taylor and Francis Group, LLC


Further Readings 7

for further readings, may be found in the literature of the two parent areas.
In the data mining area, related work may be found in the premier conferences
such as ACM International Conference on Knowledge Discovery and Data
Mining (ACM KDD), IEEE International Conference on Data Mining (IEEE
ICDM), and SIAM International Conference on Data Mining (SDM). In par-
ticular, related work may be found in workshops dedicated to the area of
relational learning, such as the Statistical Relational Learning workshop. For jour-
nals, the premier journals in the data mining area may contain related work
in relational data clustering, including IEEE Transactions on Knowledge and
Data Engineering (IEEE TKDE), ACM Transactions on Data Mining (ACM
TDM), and Knowledge and Information Systems (KAIS).
In the machine learning area, related work may be found in the premier
conferences such as International Conference on Machine Learning (ICML),
Neural Information Processing Systems (NIPS), European Conference on Ma-
chine Learning (ECML), European Conference on Principles and Practice
of Knowledge Discovery in Databases (PKDD), International Joint Confer-
ence on Artificial Intelligence (IJCAI), and Conference on Learning Theory
(COLT). For journals, the premier journals in machine learning area may con-
tain related work in relational data clustering, including Journal of Machine
Learning Research (JMLR) and Machine Learning Journal (MLJ).



Chapter 2
Co-Clustering

A bi-type heterogeneous relational data set consists of two types of data ob-
jects with heterogeneous relations between them. Bi-type heterogeneous rela-
tional data are a very important special case of heterogeneous relational data,
since they arise frequently in various important applications. In bi-type het-
erogeneous relational data clustering, we are interested in clustering two types
of data objects simultaneously. This is also known as co-clustering in the liter-
ature. In this chapter, we present a new co-clustering framework, Block Value
Decomposition (BVD), for bi-type heterogeneous relational data, which fac-
torizes the relational data matrix into three components: the row-coefficient
matrix R, the block value matrix B, and the column-coefficient matrix C.

2.1 Introduction
In many applications, such as document clustering, collaborative filtering,
and micro-array analysis, the bi-type heterogeneous relational data can be
formulated as a two-dimensional matrix representing a set of dyadic data.
Dyadic data refer to a domain with two finite sets of objects in which ob-
servations are made for dyads, i.e., pairs with one element from either set.
For the dyadic data in these applications, co-clustering both dimensions of
the data matrix simultaneously is often more desirable than traditional one-
way clustering. This is because co-clustering exploits the duality between
rows and columns to effectively deal with
the high-dimensional and sparse data that are typical in many applications.
Moreover, there is an additional benefit for co-clustering to provide both row
clusters and column clusters at the same time. For example, we may be in-
terested in simultaneously clustering genes and experimental conditions in
bioinformatics applications [29, 31], simultaneously clustering documents and
words in text mining [44], and simultaneously clustering users and movies in
collaborative filtering.
In this chapter, we propose a new co-clustering framework called Block
Value Decomposition (BVD). The key idea is that the latent block structure
in a two-dimensional dyadic data matrix can be explored by its triple
decomposition. The dyadic data matrix is factorized into three components: the
row-coefficient matrix R, the block value matrix B, and the column-coefficient
matrix C. The coefficients denote the degrees of the rows and columns associ-
ated with their clusters, and the block value matrix is an explicit and compact
representation of the hidden block structure of the data matrix.
Under this framework, we develop a specific novel co-clustering algorithm
for a special yet very popular case, nonnegative dyadic data, that iteratively
computes the three decomposition matrices based on the multiplicative
updating rules derived from an objective criterion.
terings and the column clusterings at each iteration, the algorithm performs
an implicitly adaptive dimensionality reduction, which works well for typi-
cal high-dimensional and sparse data in many data mining applications. We
have proven the correctness of the algorithm by showing that the algorithm
is guaranteed to converge and have conducted extensive experimental evalua-
tions to demonstrate the effectiveness and potential of the framework and the
algorithm. As compared with the existing co-clustering methods in the liter-
ature, the BVD framework as well as the specific algorithm offers an extra
capability: it gives an explicit and compact representation of the hidden block
structures in the original data which helps understand the interpretability of
the data. For example, the block value matrix may be used to interpret the
explicit relationship or association between the document clusters and word
clusters in a document-word co-clustering.

2.2 Related Work


This work is primarily related to two main areas: co-clustering in data
mining and matrix decomposition in matrix computation.
Although most of the clustering literature focuses on one-sided clustering
algorithms [5], recently co-clustering has become a topic of extensive interest
due to its applications to many problems such as gene expression data analysis
[29, 31] and text mining [44]. A representative early work of co-clustering
was reported in [71] that identified hierarchical row and column clustering in
matrices by a local greedy splitting procedure. The BVD framework proposed
in this book is based on the partitioning-based co-clustering formulation first
introduced in [71].
The model-based clustering methods for a two-dimensional data matrix
represent another main direction in co-clustering research. These methods
(e.g., [66, 67]) typically have clear probabilistic interpretation. However, they
are all based on simplistic assumptions on data distributions, such as Gaussian
mixture models. There are no such assumptions in the BVD framework.
Recently, information-theory based co-clustering has attracted intensive
attention in the literature. The Information Bottleneck (IB) framework [122]
was first introduced for one-sided clustering. Later, an agglomerative hard
clustering version of the IB method was used in [117] to cluster documents
after clustering words. The work in [49] extended the above framework to
repeatedly cluster documents and then words. An efficient algorithm was pre-
sented in [44] that monotonically increases the preserved mutual information
by intertwining both the row and column clusterings at all stages. All these
methods suffer from the fundamental limitation for their applications to a co-
occurrence matrix since they interpret the data matrix as a joint distribution
of two discrete random variables. A more generalized co-clustering framework
was presented in [11] wherein any Bregman divergence can be used in the
objective function, and various conditional expectation based constraints can
be incorporated into the framework.
There have been many research studies that perform clustering based
on Singular Value Decomposition (SVD) or eigenvector-based decomposi-
tion [28, 38, 47, 113]. The latent semantic indexing method (LSI) [38] projects
each data vector into the singular vector space through the SVD, and then
conducts the clustering using traditional data clustering algorithms (such as
k-means) in the transformed space. The spectral clustering methods based on
the graph partitioning theory focus on finding the best cuts of a graph that op-
timize certain predefined criterion functions. The optimization of the criterion
functions usually leads to the computation of singular vectors or eigenvectors
of certain graph affinity matrices. Many criterion functions, such as the aver-
age cut [28], the average association [113], the normalized cut [113], and the
min-max cut [47], have been proposed along with the efficient algorithms for
finding the optimal solutions. Since the computed singular vectors or eigen-
vectors do not correspond directly to individual clusters, the decompositions
from SVD- or eigenvector-based methods are difficult to interpret and to map
to the final clusters; as a result, traditional data clustering methods such as
k-means must be applied in the transformed space.
Recently, another matrix decomposition formulation, Nonnegative Matrix
Factorization (NMF) [36], has been used for clustering [133]. NMF has the
intuitive interpretation for the result. However, it focuses on one dimension of
the data matrix and does not take advantage of the duality between the rows
and the columns of a matrix.

2.3 Model Formulation and Analysis


2.3.1 Block Value Decomposition
We start by reviewing the notion of dyadic data. The notion dyadic refers
to a domain with two sets of objects, X = {x1, . . . , xn} and Y = {y1, . . . , ym},
in which the observations are made for dyads (x, y). Usually a dyad is a scalar
value w(x, y), e.g., the frequency of co-occurrence, or the strength of prefer-
ence/association/expression level. For the scalar dyads, the data can always
be organized as an n-by-m two-dimensional matrix Z by mapping the row
indices into X and the column indices into Y. Then, each w(x, y) corresponds
to one element of Z.
We are interested in simultaneously clustering X into k disjoint clusters and
Y into l disjoint clusters. Let the k clusters of X be written as {x̂1, . . . , x̂k}
and the l clusters of Y be written as {ŷ1, . . . , ŷl}. In other words, we are
interested in finding mappings CX and CY:

CX : {x1, . . . , xn} → {x̂1, . . . , x̂k}
CY : {y1, . . . , ym} → {ŷ1, . . . , ŷl}
This is equivalent to finding the block structures of the matrix Z, i.e., finding
the k × l submatrices of Z such that the elements within each submatrix are
similar to each other and elements from different submatrices are dissimilar
to each other. This equivalence relation can be illustrated by the procedure
below.
Suppose that we are given the cluster labels of rows and columns. Let us
permute the rows and columns of Z such that the rows within the same cluster
are arranged together and the columns within the same cluster are arranged
together. Consequently, we have discovered the hidden block structure from the permuted
data matrix. On the other hand, if we are given the data matrix with block
structure, it is trivial to derive the clustering of rows and columns. The original
data matrix and the permuted data matrix in Figure 2.1 give an illustrative
example.
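As a small illustration of this equivalence (with made-up numbers, not taken from the book), reordering the rows and columns of a scrambled matrix by their cluster labels exposes the hidden block structure:

```python
import numpy as np

# A toy 4 x 6 matrix whose rows and columns are scrambled; values are illustrative.
Z = np.array([[5, 1, 5, 1, 1, 5],
              [1, 5, 1, 5, 5, 1],
              [5, 1, 5, 1, 1, 5],
              [1, 5, 1, 5, 5, 1]])

row_labels = np.array([0, 1, 0, 1])        # assumed row cluster labels
col_labels = np.array([0, 1, 0, 1, 1, 0])  # assumed column cluster labels

# Arrange rows within the same cluster together, and likewise for columns.
row_order = np.argsort(row_labels, kind="stable")
col_order = np.argsort(col_labels, kind="stable")
permuted = Z[row_order][:, col_order]
print(permuted)
# [[5 5 5 1 1 1]
#  [5 5 5 1 1 1]
#  [1 1 1 5 5 5]
#  [1 1 1 5 5 5]]  -> a 2 x 2 block structure becomes visible
```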
Since the elements within each block are similar to each other, we expect
one center to represent each block. Therefore a small k × l matrix is considered
as the compact representation for the original data matrix with a k × l block
structure. In the traditional one-way clustering, given the cluster centers and
the weights that denote degrees of observations associated with their clusters,
one can approximate the original data by linear combinations of the cluster
centers. Similarly, we should be able to “reconstruct” the original data matrix
by the linear combinations of the block centers. Based on this observation,
we formulate the problem of co-clustering dyadic data as the optimization
problem of matrix decomposition, i.e., block value decomposition (BVD).

DEFINITION 2.1 Block value decomposition of a data matrix Z ∈ ℝ^{n×m}
is given by the minimization of

f(R, B, C) = ‖Z − RBC‖²    (2.1)

subject to the constraints ∀ij : Rij ≥ 0 and Cij ≥ 0, where ‖·‖ denotes the
Frobenius matrix norm, R ∈ ℝ^{n×k}, B ∈ ℝ^{k×l}, C ∈ ℝ^{l×m}, k ≪ n, and
l ≪ m.



FIGURE 2.1: The original data matrix (b) with a 2 × 2 block structure which
is demonstrated by the permuted data matrix (a). The row-coefficient matrix
R, the block value matrix B, and the column-coefficient matrix C give a
reconstructed matrix (c) to approximate the original data matrix (b).


FIGURE 2.2: Illustration of the difference between BVD and SVD (left: directions found by BVD; right: directions found by SVD).

We call the elements of B the block values; B the block value matrix;
R the row-coefficient matrix; and C the column-coefficient matrix. As
discussed before, B may be considered as a compact representation of Z; R
denotes the degrees of rows associated with their clusters; and C denotes the
degrees of the columns associated with their clusters. We seek to approximate
the original data matrix by the reconstructed matrix, i.e., RBC, as illustrated
in Figure 2.1.
Under the BVD framework, the combinations of the components also have
an intuitive interpretation. RB is the matrix containing the basis for the
column space of Z and BC contains the basis for the row space of Z. For
example, for a word-by-document matrix Z, each column of RB captures a
base topic of a particular document cluster and each row of BC captures a
base topic of a word cluster.
Compared with SVD-based approaches, there are three main differences
between BVD and SVD. First, in BVD, it is natural to consider each row
or column of a data matrix as an additive combination of the block values
since BVD does not allow negative values in R and C. In contrast, since SVD
allows the negative values in each component, there is no intuitive interpre-
tation for the negative combinations. Second, unlike the singular vectors in
SVD, the basis vectors contained in RB and BC are not necessarily orthog-
onal. Although singular vectors in SVD have a statistical interpretation as
the directions of the variance, they typically do not have clear physical inter-
pretations. In contrast, the directions of the basis vectors in BVD have much
more straightforward correspondence to the clusters (Figure 2.2). Third, SVD
is a full rank decomposition whereas BVD is a reduced rank approximation.
Since the clustering task seeks the reduced or compact representation for the
original data, BVD achieves the objective directly, i.e., the final clusters can
be easily derived without additional clustering operations. In summary, com-
pared with SVD or eigenvector-based decomposition, the decomposition from
BVD has an intuitive interpretation, which is necessary for many data mining

applications.
BVD provides a general framework for co-clustering. Depending on differ-
ent data types in different applications, various formulations and algorithms
may be developed under the BVD framework. An interesting observation is
that the data matrices in many important applications are typically nonneg-
ative, such as the co-occurrence tables, the performance/rating matrices and
the proximity matrices. Some other data may be transformed into the non-
negative form, such as the gene expression data. Therefore, in the rest of
this chapter, we concentrate on developing a specific novel method under the BVD
framework, the nonnegative block value decomposition (NBVD).

2.3.2 NBVD Method


In this section we formulate NBVD as an optimization problem and discuss
several important properties of NBVD. The discussions in this section also
apply to the general BVD framework.

DEFINITION 2.2 Nonnegative block value decomposition of a nonnegative
data matrix Z ∈ ℝ^{n×m} (i.e., ∀ij : Zij ≥ 0) is given by the minimization
of

    f(R, B, C) = ‖Z − RBC‖²    (2.2)

subject to the constraints ∀ij : Rij ≥ 0, Bij ≥ 0 and Cij ≥ 0, where R ∈ ℝ^{n×k},
B ∈ ℝ^{k×l}, C ∈ ℝ^{l×m}, k ≪ n, and l ≪ m.
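As a rough illustration (not the derivation used for NBVD in this text), the sketch below applies NMF-style alternating multiplicative updates to the three factors; the update rules are an assumption by analogy with NMF, and a small constant guards the denominators against division by zero.

import numpy as np

def nbvd_multiplicative(Z, k, l, iters=200, eps=1e-9, seed=0):
    # Alternating multiplicative updates for Z ~ R B C with nonnegative factors.
    n, m = Z.shape
    rng = np.random.default_rng(seed)
    R = rng.random((n, k))
    B = rng.random((k, l))
    C = rng.random((l, m))
    for _ in range(iters):
        R *= (Z @ C.T @ B.T) / (R @ B @ C @ C.T @ B.T + eps)
        B *= (R.T @ Z @ C.T) / (R.T @ R @ B @ C @ C.T + eps)
        C *= (B.T @ R.T @ Z) / (B.T @ R.T @ R @ B @ C + eps)
    return R, B, C

# Row cluster labels can then be read off as np.argmax(R, axis=1)
# and column cluster labels as np.argmax(C, axis=0).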

Property 1: The solution to NBVD at a global minimum or a specific local


minimum is not unique.

If R, B, and C are a solution to the objective function defined in Equation 2.2
at a global minimum or a specific local minimum, then RU, U⁻¹BV, and V⁻¹C
are another solution to f at the same minimum for appropriate positive
invertible matrices U and V, because

    (RU)(U⁻¹BV)(V⁻¹C) = R Ik B Il C = RBC,

where Ik and Il are identity matrices.


Therefore, in order to make the solution unique, we may consider normal-
izing the components. However, the normalization may also change the clus-
tering result. For example, if we normalize R by the column, i.e., let R̃ = RU,
where U = (diag(Rᵀe))⁻¹ and e = [1, 1, . . . , 1]ᵀ, the cluster labels of some row
observations may be changed because the relative weight for association with
their row clusters may also be changed. Consequently, whether or not and
how to do normalization usually depend on the data and the specific applica-
tion. For example, in document clustering, typically each document vector is
normalized to have unit L2 norm. Thus, normalizing each column of RB to

have unit L2 norm is desirable, since RB consists of the basis vectors of the
document space. Assuming that RB is normalized to RBV, the cluster labels
for the documents are given by V⁻¹C instead of C.

Property 2: With the column-normalized R and the row-normalized C,


each block value in B may be interpreted as the sum of the elements of the
corresponding block.

Let us illustrate the property with an example. Consider a 6 × 6 word-by-document co-occurrence matrix below:

    ⎡ 0 1 0 6 6 5 ⎤
    ⎢ 0 0 0 6 6 6 ⎥
    ⎢ 5 5 5 0 0 0 ⎥
    ⎢ 5 5 5 0 0 0 ⎥
    ⎢ 4 0 4 3 3 3 ⎥
    ⎣ 3 4 4 3 0 3 ⎦

Clearly, the matrix may be divided into 3 × 2 = 6 blocks. A nonnegative block
value decomposition is given as follows:

    ⎡ 0.5  0    0   ⎤                    ⎡ 0.34  0    ⎤ T
    ⎢ 0.5  0    0   ⎥                    ⎢ 0.3   0    ⎥
    ⎢ 0    0.5  0   ⎥   ⎡  1  35 ⎤       ⎢ 0.36  0    ⎥
    ⎢ 0    0.5  0   ⎥ × ⎢ 30   0 ⎥  ×    ⎢ 0     0.36 ⎥  .
    ⎢ 0    0    0.5 ⎥   ⎣ 19  15 ⎦       ⎢ 0     0.3  ⎥
    ⎣ 0    0    0.5 ⎦                    ⎣ 0     0.34 ⎦

In the above decomposition, R is normalized by column and C (for formatting
reasons, it is shown as its transpose) is normalized by row, and each
block value of B is the sum of the elements of the corresponding block. In
this example, B can be intuitively interpreted as the co-occurrence matrix
of the word clusters and the document clusters. In fact, if we interpret the
data matrix as the joint distribution of the words and documents, all the
components have a clear probabilistic interpretation. The column-normalized
R may be considered as the conditional distribution p(word|word cluster),
the row-normalized C may be considered as the conditional distribution
p(document|document cluster), and B may be considered as the joint dis-
tribution of the word clusters and the document clusters.
By property 1, given an NBVD decomposition, we can always define the
column-normalized R, the row-normalized C, and the corresponding B as a
solution to the original problem. We denote B in this situation as Bs .
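Property 2 is easy to check numerically on the example above; the following sketch (our own code, with the cluster labels written out by hand) recomputes the block sums directly from the row and column cluster labels and recovers B.

import numpy as np

Z = np.array([[0, 1, 0, 6, 6, 5],
              [0, 0, 0, 6, 6, 6],
              [5, 5, 5, 0, 0, 0],
              [5, 5, 5, 0, 0, 0],
              [4, 0, 4, 3, 3, 3],
              [3, 4, 4, 3, 0, 3]])

row_labels = np.array([0, 0, 1, 1, 2, 2])   # three word clusters
col_labels = np.array([0, 0, 0, 1, 1, 1])   # two document clusters

# Entry (i, j) is the sum of Z over the rows in cluster i and the columns in cluster j.
B = np.zeros((3, 2))
for i in range(3):
    for j in range(2):
        B[i, j] = Z[np.ix_(row_labels == i, col_labels == j)].sum()
print(B)   # [[ 1. 35.] [30.  0.] [19. 15.]]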

Property 3: The variation of the mean block value matrix B̄ is a measure


of the quality of co-clustering, where B̄ is defined as dividing each element of

Bs by the cardinality of the corresponding block.

Intuitively, each element of B̄ represents the mean of the corresponding


block. Therefore, under the same objective value, the larger the variation of
B̄, the larger the difference among the blocks, and the better the separation of
the co-clustering.
We propose a simple statistic, the coefficient of variation (CV), to measure
the variation of B̄. Typically, CV is used to measure the variation of samples
from different populations, and thus it is appropriate to measure the variation
of B̄ with different dimensions. CV is defined as the ratio between the standard
deviation and the mean. Hence, the CV for the mean block value matrix B̄ is

    CV(B̄) = √( Σ_{i,j} (B̄ij − b̄)² / (kl − 1) ) / b̄,    (2.3)

where b̄ is the mean of B̄ and 1 ≤ i ≤ k, 1 ≤ j ≤ l. To measure the qualities of
the row clusters and the column clusters, respectively, we define the average CV
for the rows of B̄, CV(B̄r), and for the columns of B̄, CV(B̄c):

    CV(B̄r) = (1/l) Σ_j [ √( Σ_i (B̄ij − b̄j)² / (k − 1) ) / b̄j ],    (2.4)

    CV(B̄c) = (1/k) Σ_i [ √( Σ_j (B̄ij − b̄i)² / (l − 1) ) / b̄i ],    (2.5)

where b̄i is the mean of the ith row of B̄ and b̄j is the mean of the jth column
of B̄. It is necessary to define these statistics in order to provide certain useful
information (e.g., to find the optimal number of clusters).
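A direct implementation of these statistics might look as follows (a sketch only; B̄ is assumed to be given as a k × l array, and the unbiased standard deviation matches the divisors in the equations above).

import numpy as np

def cv_stats(B_bar):
    # B_bar: k x l matrix of per-block means.
    k, l = B_bar.shape
    b_bar = B_bar.mean()
    cv = np.sqrt(((B_bar - b_bar) ** 2).sum() / (k * l - 1)) / b_bar   # Eq. (2.3)
    col_means = B_bar.mean(axis=0)                                     # the column means b̄j
    row_means = B_bar.mean(axis=1)                                     # the row means b̄i
    col_cv = np.sqrt(((B_bar - col_means) ** 2).sum(axis=0) / (k - 1)) / col_means
    row_cv = np.sqrt(((B_bar - row_means[:, None]) ** 2).sum(axis=1) / (l - 1)) / row_means
    cv_r = col_cv.mean()                                               # Eq. (2.4)
    cv_c = row_cv.mean()                                               # Eq. (2.5)
    return cv, cv_r, cv_c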
Finally, we compare NBVD with Nonnegative Matrix Factorization (NMF)
[36]. Given a nonnegative data matrix V , NMF seeks to find an approximate
factorization V ≈ W H with nonnegative components W and H. Essentially,
NMF concentrates on one-sided, individual clustering and does not take
advantage of the duality between the row clustering and the column clustering.
In fact, NMF may be considered a special case of NBVD in the sense that
W H = W IH, where I is an identity matrix. Under this formulation, NMF
performs co-clustering with the additional restrictions that the number of row
clusters equals the number of column clusters and that each row cluster is
associated with exactly one column cluster. Clearly, NBVD is more flexible
than NMF in exploiting the hidden block structure of the original data matrix.
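The reduction of NMF to NBVD with an identity block value matrix can be checked in a few lines (an illustrative sketch only).

import numpy as np

rng = np.random.default_rng(1)
W = rng.random((6, 3))
H = rng.random((3, 8))
I = np.eye(3)

# NMF as NBVD with B = I: the two reconstructions coincide.
assert np.allclose(W @ H, W @ I @ H)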



Chapter 3
Heterogeneous Relational Data
Clustering

In more general cases, heterogeneous relational data consist of more than two
types of data objects. These multiple types of interrelated data objects form
a k-partite heterogeneous relation graph. The research on mining hidden
structures from k-partite heterogeneous relation graphs is still limited and
preliminary. In this chapter, we propose a general model, the relation summary
network, to find the hidden structures (the local cluster structures and the
global community structures) from a k-partite heterogeneous relation graph.
The model provides a principled framework for unsupervised learning on k-
partite heterogeneous relation graphs of various structures.

3.1 Introduction
Clustering approaches have traditionally focused on the homogeneous data
objects. However, many examples of real-world data involve objects of mul-
tiple types that are related to each other, which naturally form k-partite
heterogeneous relation graphs of heterogeneous types of nodes. For example,
documents, words, and categories in taxonomy mining, as well as Web pages,
search queries, and Web users in a Web search system all form a tripartite
graph; papers, keywords, authors, and publication venues in a scientific publication
archive form a quadripartite graph. In such scenarios, using traditional
approaches to cluster each type of objects (nodes) individually may not work
well due to the following reasons.
First, to apply traditional clustering approaches to each type of data ob-
jects individually, the relation information needs to be transformed into fea-
ture vectors for each type of objects. In general, this transformation results
in high-dimensional and sparse feature vectors, since after the transformation
the number of features for a single type of objects equals the number of all
objects that are possibly related to this type of objects. For example, if
we transform the links between Web pages and Web users as well as search
queries into the features for the Web pages, this leads to a huge number of
features with sparse values for each Web page. Second, traditional clustering

approaches are unable to tackle the interactions among the cluster structures
of different types of objects, since they cluster data of a single type based
on static features. Note that the interactions could pass along the relations,
i.e., there exists influence propagation in a k-partite heterogeneous relation
graph. Third, in some data mining applications, users are interested not only
in the local cluster structures for each type of objects, but also in the global
community structures involving multiple types of objects. For example, in doc-
ument clustering, in addition to document clusters and word clusters, the
relationship between the document clusters and the word clusters is also use-
ful information. It is difficult to discover such global structures by clustering
each type of objects individually.
An intuitive attempt to mine the hidden structures from k-partite heterogeneous
relation graphs is to apply existing graph partitioning approaches
to k-partite heterogeneous relation graphs. This idea may work in some
special and simple situations; however, in general it is infeasible. First, graph
partitioning theory focuses on finding the best cuts of a graph under a certain
criterion, and it is very difficult to cut different types of relations (links)
simultaneously to identify different hidden structures for different types of nodes.
Second, by partitioning an entire k-partite heterogeneous relation graph into
m subgraphs, one actually assumes that all different types of nodes have the
same number of clusters m, which in general is not true. Third, by simply
partitioning the entire graph into disjoint subgraphs, the resulting hidden
structures are rough. For example, the clusters of different types of nodes are
restricted to one-to-one associations.
Therefore, mining hidden structures from k-partite heterogeneous relation
graphs has presented a great challenge to traditional clustering approaches. In
this chapter, first we propose a general model, the relation summary network,
to find the hidden structures (the local cluster structures and the global com-
munity structures) from a k-partite heterogeneous relation graph. The basic
idea is to construct a new k-partite heterogeneous relation graph with hid-
den nodes, which “summarize” the link information in the original k-partite
heterogeneous relation graph and make the hidden structures explicit, to ap-
proximate the original graph. The model provides a principled framework for
unsupervised learning on k-partite heterogeneous relation graphs of various
structures. Second, under this model, based on the matrix representation of
a k-partite heterogeneous relation graph we reformulate the graph approxi-
mation as an optimization problem of matrix approximation and derive an
iterative algorithm to find the hidden structures from a k-partite heteroge-
neous relation graph under a broad range of distortion measures. By itera-
tively updating the cluster structures for each type of nodes, the algorithm
takes advantage of the interactions among the cluster structures of different
types of nodes and performs an implicit adaptive feature reduction for each
type of nodes. Experiments on both synthetic and real data sets demonstrate
the promise and effectiveness of the proposed model and algorithm. Third, we
also establish the connections between existing clustering approaches and the

proposed model to provide a unified view of the clustering approaches.

3.2 Related Work


Graph partitioning on homogeneous graphs has been studied for decades
and a number of different approaches, such as spectral approaches [28, 47,
113] and multilevel approaches [25, 63, 75], have been proposed. However, the
research on mining cluster structures from k-partite heterogeneous relation
graphs of heterogeneous types of nodes is limited. Several noticeable efforts
include [43, 69] and [56]. [43, 69] extend the spectral partitioning based on the
normalized cut to a bipartite graph. After the derivation, spectral partitioning
on the bipartite graph is converted to a singular value decomposition (SVD).
[56] partitions a star-structured k-partite heterogeneous relation graph based
on semi-definite programming. In addition to the restriction that they are only
applicable to the special cases of k-partite heterogeneous relation graphs, all
these algorithms have the restriction that the numbers of the clusters for
different types of nodes must be equal and the clusters for different types of
objects must have one-to-one associations.
The research on clustering multi-type interrelated objects is also related
to this study. Clustering on bi-type interrelated data objects, such as word-
document data, is called co-clustering or bi-clustering. Recently, co-clustering
has been addressed based on matrix factorization. Both [88] and [85] model
the co-clustering as an optimization problem involving a triple matrix factor-
ization. [88] proposes an EM-like algorithm based on multiplicative updating
rules and [85] proposes a hard clustering algorithm for binary data. [45] ex-
tends the nonnegative matrix factorization to symmetric matrices and shows
that it is equivalent to the kernel k-means and the Laplacian-based spectral
clustering.
Some efforts on latent variable discovery are also related to co-clustering.
PLSA [66] is a method based on a mixture decomposition derived from a latent
class model. A two-sided clustering model is proposed for collaborative filter-
ing by [67]. Information-theory based co-clustering has also attracted attention
in the literature. [49] extends the information bottleneck (IB) framework [122]
to repeatedly cluster documents and then words. [44] proposes a co-clustering
algorithm to maximize the mutual information between the clustered random
variables subject to the constraints on the number of row and column clusters.
A more generalized co-clustering framework is presented by [11] wherein any
Bregman divergence can be used in the objective function.
Compared with co-clustering, clustering on data consisting of more
than two types of data objects has not been well studied in the literature.
Several noticeable efforts are discussed as follows. [137] proposes a framework


FIGURE 3.1: A bipartite graph (a) and its relation summary network (b).

for clustering heterogeneous web objects, under which a layered structure with
the link information is used to iteratively project and propagate the cluster
results between layers. Similarly, [125] presents an approach named ReCom
to improve the cluster quality of interrelated data objects through an iter-
ative reinforcement clustering process. However, there is no sound objective
function and theoretical proof on the effectiveness of these algorithms. [87] for-
mulates heterogeneous relational data clustering as a collective factorization
on related matrices and derives a spectral algorithm to cluster multi-type in-
terrelated data objects simultaneously. The algorithm iteratively embeds each
type of data objects into low-dimensional spaces and benefits from the interac-
tions among the hidden structures of different types of data objects. Recently,
a general method based on matrix factorization was independently developed
by [115, 116], but it had not yet appeared at the time of this writing.
To summarize, unsupervised learning on k-partite heterogeneous relation
graphs has been touched from different perspectives due to its high impact in
various important applications. Yet, systematic research is still limited. This
chapter attempts to derive a theoretically sound general model and algorithm
for unsupervised learning on k-partite heterogeneous relation graphs of various
structures.

3.3 Relation Summary Network Model


In this section, we derive a general model based on graph approximation to
mine the hidden structures from a k-partite heterogeneous relation graph.
Let us start with an illustrative example. Figure 3.1a shows a bipartite
graph G = (V1 , V2 , E) where V1 = {v11 , . . . , v16 } and V2 = {v21 , . . . , v24 }
denote two types of nodes and E denotes the edges in G. Even though this
graph is simple, it is nontrivial to discover its hidden structures. In Figure

3.1b, we redraw the original graph by adding two sets of new nodes (called
hidden nodes), S1 = {s11 , s12 , s13 } and S2 = {s21 , s22 }. Based on the new
graph, the cluster structures for each type of nodes are straightforward; V1 has
three clusters: {v11 , v12 }, {v13 , v14 }, and {v15 , v16 }, and V2 has two clusters,
{v21 , v22 } and {v23 , v24 }. If we look at the subgraph consisting of only the
hidden nodes in Figure 3.1b, we see that it provides a clear skeleton for the
global structure of the whole graph, from which it is clear how the clusters of
different types of nodes are related to each other; for example, cluster s11 is
associated with cluster s21 and cluster s12 is associated with both clusters s21
and s22 . In other words, by introducing the hidden nodes into the original k-
partite heterogeneous relation graph, both the local cluster structures and the
global community structures become explicit. Note that if we apply a graph
partitioning approach to the bipartite graph in Figure 3.1a to find its hidden
structures, no matter how we cut the edges, it is impossible to identify all the
cluster structures correctly.
Based on the above observations, we propose a model, the relation summary
network (RSN), to mine the hidden structures from a k-partite heterogeneous
relation graph. The key idea of RSN is to add a small number of hidden
nodes to the original k-partite heterogeneous relation graph to make the hid-
den structures of the graph explicit. However, given a k-partite heterogeneous
relation graph, we are not interested in an arbitrary relation summary net-
work. To ensure a relation summary network to discover the desirable hidden
structures of the original graph, we must make RSN as “close” as possible to
the original graph. In other words, we aim at an optimal relation summary
network, from which we can reconstruct the original graph as precisely as
possible. Formally, we define an RSN as follows.

DEFINITION 3.1 Given a distance function D, a k-partite heterogeneous


relation graph G = (V1 , . . . , Vm , E), and m positive integers, k1 , . . . , km , the
relation summary network of G is a k-partite heterogeneous relation graph
Gs = (V1 , . . . , Vm , S1 , . . . , Sm , E s ), which satisfies the following conditions:
1. Each instance node in Vi is adjacent to one and only one hidden node
from Si for 1 ≤ i ≤ m with unit weight;
2. Si ∼ Sj in Gs if and only if Vi ∼ Vj in G for i ≠ j and 1 ≤ i, j ≤ m;
3. Gs = arg min_F D(G, F ),
where Si denotes a set of hidden nodes for Vi and |Si | = ki for 1 ≤
i ≤ m; Si ∼ Sj denotes that there exist edges between Si and Sj , and
similarly Vi ∼ Vj ; F denotes any k-partite heterogeneous relation graph
(V1 , . . . , Vm , S1 , . . . , Sm , E f ) satisfying Conditions 1 and 2.

In Definition 3.1, the first condition implies that in an RSN, the instance
nodes (the nodes in Vi ) are related to each other only through the hidden

FIGURE 3.2: A tripartite graph (a) and its RSN (b)

nodes. Hence, a small number of hidden nodes actually summarize the complex
relations (edges) in the original graph to make the hidden structures explicit.
Since in this study, our focus is to find disjoint clusters for each type of nodes,
the first condition restricts one instance node to be adjacent to only one hidden
node with unit weight; however, it is easy to modify this restriction to extend
the model to other cases of unsupervised learning on k-partite heterogeneous
relation graphs. The second condition implies that if two types of instance
nodes Vi and Vj are (or are not) related to each other in the original graph,
then the corresponding two types of hidden nodes Si and Sj in the RSN are
(or are not) related to each other. For example, Figure 3.2 shows a tripartite
graph and its RSN. In the original graph Figure 3.2a, V1 ∼ V2 and V1 ∼ V3 ,
and hence S1 ∼ S2 and S1 ∼ S3 in its RSN. The third condition states that
the RSN is an optimal approximation to the original graph under a certain
distortion measure.
Next, we need to define the distance between a k-partite heterogeneous
relation graph G and its RSN Gs . Without loss of generality, if Vi ∼ Vj
in G, we assume that edges between Vi and Vj are complete (if there is no
edge between vih and vjl , we can assume an edge with weight of zero or
other special value). Similarly for Si ∼ Sj in Gs . Let e(vih , vjl ) denote the
weight of the edge (vih , vjl ) in G. Similarly let es (sip , sjq ) be the weight of
the edge (sip , sjq ) in Gs . In the RSN, a pair of instance nodes vih and vjl are
connected through a unique path (vih , sip , sjq , vjl ), in which es (vih , sip ) = 1
and es (sjq , vjl ) = 1 according to Definition 3.1. The edge between two hidden
nodes (sip , sjq ) can be considered as the “summary relation” between two sets
of instance nodes, i.e., the instance nodes connecting with sip and the instance
nodes connecting with sjq . Hence, how well Gs approximates G depends on
how well es (sip , sjq ) approximates e(vih , vjl ) for vih and vjl which satisfy
es (vih , sip ) = 1 and es (sjq , vjl ) = 1, respectively. Therefore, we define the
distance between a k-partite heterogeneous relation graph G and its RSN Gs

as follows:

    D(G, Gs) = Σ_{i,j: Vi ∼ Vj}  Σ_{vih ∈ Vi, vjl ∈ Vj, es(vih, sip) = 1, es(sjq, vjl) = 1}  D(e(vih, vjl), es(sip, sjq)),    (3.1)

where 1 ≤ i, j ≤ m, 1 ≤ h ≤ |Vi |, 1 ≤ l ≤ |Vj |, 1 ≤ p ≤ |Si |, and 1 ≤ q ≤ |Sj |.


Let us have an illustrative example. Assume that the edges of the k-partite
heterogeneous relation graph in Figure 3.1a have unit weights. If there is no
edge between vih and vjl , we let e(vih , vjl ) = 0. Similarly for its RSN in
Figure 3.1b. Assume that D is the Euclidean distance function. Hence, based
on Equation (3.1), D(G, Gs ) = 0, i.e., from the RSN in Figure 3.1b, we can
reconstruct the original graph in Figure 3.1a without any error. For example,
the path (v13 , s12 , s21 , v22 ) in the RSN implies that there is an edge between
v13 and v22 in the original graph such that e(v13 , v22 ) = es (s12 , s21 ). Following
this procedure, the original graph can be reconstructed completely.
Note that different definitions of the distances between two graphs lead to
different algorithms. In this study, we focus on the definition given in Equa-
tion (3.1). One of the advantages of this definition is that it leads to a nice
matrix representation for the distance between two graphs, which facilitates
the derivation of the algorithm.
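For instance, for a bipartite graph and the squared Euclidean distance, Equation (3.1) reduces to a matrix approximation error; the sketch below uses our own (assumed) indicator-matrix notation, with A holding the edge weights e(v1h, v2l), 0-1 matrices R1 and R2 assigning instance nodes to hidden nodes, and S holding the hidden-edge weights es(s1p, s2q).

import numpy as np

def rsn_distance_bipartite(A, R1, S, R2):
    # A:  n1 x n2 matrix of edge weights e(v1h, v2l) in the original bipartite graph.
    # R1: n1 x k1 0-1 matrix; R1[h, p] = 1 iff instance node v1h attaches to hidden node s1p.
    # R2: n2 x k2 0-1 matrix; likewise for the second type of instance nodes.
    # S:  k1 x k2 matrix of hidden-edge weights es(s1p, s2q).
    E = A - R1 @ S @ R2.T
    return np.sum(E * E)   # the sum of squared differences in Equation (3.1)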
Definition 3.1 and Equation (3.1) provide a general model, the RSN model,
to mine the cluster structures for each type of nodes in a k-partite heteroge-
neous relation graph and the global structures for the whole graph. Compared
with the traditional clustering approaches, the RSN model is capable of mak-
ing use of the interactions (direct or indirect) among the hidden structures
of different types of nodes, and through the hidden nodes performing implicit
and adaptive feature reduction to overcome the typical high dimensionality
and sparsity. Figure 3.3 shows an illustrative example of how the cluster struc-
tures of two types of instance nodes affect the similarity between two instance
nodes of another type. Suppose that we are to cluster nodes in V1 (only two
nodes in V1 are shown in Figure 3.3a). Traditional clustering approaches deter-
mine the similarity between v11 and v12 based on their link features, [1, 0, 1, 0]
and [0, 1, 0, 1], respectively, and hence, their similarity is inappropriately con-
sidered as zero (lowest level). This is a typical situation in a large graph with
sparse links. Now suppose that we have derived hidden nodes for V2 and V3 as
in Figure 3.3b; through the hidden nodes the cluster structures of V2 change
the similarity between v11 and v12 to 1 (highest level), since the reduced link
features for both v11 and v12 are [1, 1], which is a more reasonable result,
since in a sparse k-partite heterogeneous relation graph we expect that two
nodes are similar when they are connected to similar nodes even though they
are not connected to the same nodes. If we continue this example, next, v11
and v12 are connected with the same hidden nodes in S1 (not shown in the
figure); then after the hidden nodes for V1 are derived, the cluster structures

FIGURE 3.3: The cluster structures of V2 and V3 affect the similarity between
v11 and v12 through the hidden nodes.

of V2 and V3 may be affected in return. In fact, this is the idea of the iterative
algorithm to construct an RSN for a k-partite heterogeneous relation graph,
which we discuss in the next section.
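The implicit feature reduction in this example amounts to a single matrix product. The sketch below is our own illustration; the membership grouping of V2's nodes is an assumption chosen to be consistent with the [1, 1] reduction described above.

import numpy as np

# Raw links from v11 and v12 to the four nodes of V2 (as in Figure 3.3a).
raw_links = np.array([[1, 0, 1, 0],
                      [0, 1, 0, 1]])

# Assumed membership of V2's nodes in its two hidden nodes s21 and s22 (Figure 3.3b).
membership_V2 = np.array([[1, 0],
                          [1, 0],
                          [0, 1],
                          [0, 1]])

reduced = raw_links @ membership_V2
print(reduced)   # [[1 1]
                 #  [1 1]]  -> v11 and v12 now have identical reduced link features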



Chapter 4
Homogeneous Relational Data
Clustering

Homogeneous relational data consist of only one type of data objects. In the
literature, a special case of homogeneous relational data clustering has been
studied as the graph partitioning problem. However, the research on the gen-
eral case is still limited. In this chapter, we propose a general model based
on graph approximation to learn relation-pattern-based cluster structures
from a graph. The model generalizes the traditional graph partitioning
approaches and is applicable to learning various cluster structures.

4.1 Introduction
Learning clusters from homogeneous relational graphs is an important prob-
lem in many applications, such as Web mining, social network analysis, bioin-
formatics, VLSI design, and task scheduling. In many applications, users are
interested in strongly intra-connected clusters in which the nodes are intra-
cluster close and intercluster loose. Learning this type of the clusters corre-
sponds to finding strongly connected subgraphs from a graph, which has been
studied for decades as a graph partitioning problem [28, 77, 113].
In addition to the strongly intra-connected clusters, other types of the clus-
ters also attract an intensive attention in many important applications. For
example, in Web mining, we are also interested in the clusters of Web pages
that sparsely link to each other but all densely link to the same Web pages [80],
such as a cluster of music “fans” Web pages which share the same taste in
music and are densely linked to the same set of music Web pages but sparsely
linked to each other. Learning this type of cluster corresponds to finding
dense bipartite subgraphs from a graph, which has been listed as one of the
five algorithmic challenges in Web search engines [64].
The strongly intra-connected clusters and weakly intra-connected clusters
are two basic cluster structures, and various types of clusters can be generated
based on them. For example, a Web cluster could take on different structures
during its development, i.e., in its early stage, it has the form of a bipartite
graph, since in this stage the members of the cluster share the same interests

FIGURE 4.1: A graph with mixed cluster structures (a) and its cluster pro-
totype graph (b).

(linked to the same Web pages) but have not known (linked to) each other;
in the later stage, as members of the cluster start linking to each other, the
cluster becomes a hybrid of the aforementioned two basic cluster structures;
in the final stage it develops into a larger strongly intra-connected cluster.
These various types of clusters can be unified into a general concept,
relation-pattern-based cluster. A relation-pattern-based cluster is a group of
nodes which have similar relation patterns, i.e., the nodes within a cluster
relate to other nodes in similar ways. Let us have an illustrative example. Fig-
ure 4.1a shows a graph of mixed types of clusters. There are four clusters in
Figure 4.1a: C1 = {v1 , v2 , v3 , v4 }, C2 = {v5 , v6 , v7 , v8 }, C3 = {v9 , v10 , v11 , v12 },
and C4 = {v13 , v14 , v15 , v16 }. Within the strongly intra-connected cluster C1 ,
the nodes have the similar relation patterns, i.e., they all strongly link to the
nodes in C1 (their own cluster) and C3 , and weakly link to the nodes in C2
and C4 ; within the weakly intra-connected cluster C3 , the nodes also have the
similar relation patterns, i.e., they all weakly link to the nodes in C3 (their
own cluster), and C2 , strongly link to the nodes in C1 and C4 ; Similarly for
the nodes in cluster C3 and the nodes in cluster C4 . Note that graph partition-
ing approaches cannot correctly identify the cluster structure of the graph in
Figure 4.1a, since they seek only strongly intra-connected clusters by cutting
a graph into disjoint subgraphs to minimize edge cuts.
In addition to unsupervised cluster learning applications, the concept of
the relation-pattern-based cluster also provides a simple approach for semi-
supervised learning on graphs. In many applications, graphs are very sparse
and there may exist a large number of isolated or nearly isolated nodes which
do not have cluster patterns. However, according to extra supervised infor-
mation (domain knowledge), these nodes may belong to certain clusters. To
incorporate the supervised information, a common approach is to manually
label these nodes. However, for a large graph, manually labeling is labor-
intensive and expensive. Furthermore, to make use of these labels, instead of
supervised learning algorithms, different semi-supervised learning algorithms

need to be designed. The concept of the relation-pattern-based cluster pro-


vides a simple way to incorporate supervised information by adding virtual
nodes to graphs. The idea is that if the nodes belong to the same cluster
according to the supervised information, they are linked to the same virtual
nodes. Then an algorithm which is able to learn general relation-pattern-based
clusters can be directly applied to the graphs with virtual nodes to make use
of the supervised information to learn cluster patterns.
For example, to find the hidden classes from a collection of documents, a
common approach is to represent the collection as a graph in which each node
denotes a document and each edge weight denotes the similarity between two
documents [43, 69]. Usually, the similarities are calculated based on the term-
frequency vectors of documents. However, there may exist documents which
share no or very few words with each other but still belong to the same cluster
according to extra domain information. Let us have an illustrative example. In
Figure 4.2, the dark color nodes (documents) do not share any words and are
not linked to each other. However, they all belong to the “vehicle” cluster. By
adding virtual nodes (documents) (light color nodes in Figure 4.2) which are
concept documents consisting of popular words for the “vehicle” cluster, the
originally isolated document nodes are linked to the virtual document nodes
and the supervised information is embedded into the relation patterns.
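A sketch of this construction is given below (the function and variable names are ours, and the graph is assumed to be given as a dense symmetric affinity matrix): for each group of nodes known to belong together, one virtual node is appended and every group member is linked to it.

import numpy as np

def add_virtual_nodes(A, groups, weight=1.0):
    # A: n x n symmetric affinity matrix of the original graph.
    # groups: list of lists; each inner list holds node indices known to share a cluster.
    n = A.shape[0]
    m = n + len(groups)
    A_aug = np.zeros((m, m))
    A_aug[:n, :n] = A
    for g, members in enumerate(groups):
        v = n + g                      # index of the new virtual node
        for i in members:
            A_aug[i, v] = weight       # link each labeled node to its virtual node
            A_aug[v, i] = weight
    return A_aug

# Example: nodes 0, 3, and 7 are known to belong to the same cluster.
# A_aug = add_virtual_nodes(A, [[0, 3, 7]])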
Therefore, various applications involving unsupervised as well as semi-
supervised cluster learning have presented a great need for relation-pattern-
based cluster learning algorithms. In this chapter, we propose a general model
based on graph approximation to learn the relation-pattern-based cluster
structures from a graph. By unifying the traditional edge cut objectives, the
model provides a new view to understand the graph partitioning approaches
and at the same time it is applicable to learning various cluster structures.
Under this model, we derive three novel algorithms to learn the general clus-
ter structures from a graph, which cover three main versions of unsupervised
learning algorithms: hard, soft, and balanced versions, to provide a complete
family of cluster learning algorithms. This family of algorithms has the fol-
lowing advantages: they are flexible to learn various types of clusters; when

FIGURE 4.2: A graph with virtual nodes.

applied to learning strongly intra-connected clusters, this family evolves into
a new family of effective graph partitioning algorithms; it is easy for the pro-
posed algorithms to incorporate the prior knowledge of the cluster structure
into the algorithms. Experimental evaluation and theoretical analysis show
the effectiveness and great potential of the proposed model and algorithms.

4.2 Related Work


Graph partitioning divides the nodes of a graph into clusters by finding
the best edge cuts of the graph. Several edge cut objectives, such as the
average cut [28], average association [113], normalized cut [113], and min-max
cut [47], have been proposed. Various spectral algorithms have been developed
for these objective functions [28,47,113]. These algorithms use the eigenvectors
of a graph affinity matrix, or a matrix derived from the affinity matrix, to
partition the graph. Since eigenvectors computed do not correspond directly
to individual partitions, a postprocessing approach [136], such as k-means,
must be applied to find the final partitions.
Multilevel methods have been used extensively for graph partitioning with
the Kernighan-Lin objective, which attempt to minimize the cut in the graph
while maintaining equal-sized clusters [25,63,75]. In multilevel algorithms, the
graph is repeatedly coarsened level by level until only a small number of nodes
are left. Then, an initial partitioning on this small graph is performed. Finally,
the graph is uncoarsened level by level, and, at each level, the partitioning
from the previous level is refined using a refinement algorithm.
Recently, graph partitioning with an edge cut objective has been shown to
be mathematically equivalent to an appropriately weighted kernel k-means
objective function [40, 41]. Based on this equivalence, the weighted kernel k-
means algorithm has been proposed for graph partitioning [40–42].
Learning clusters from a graph has also been intensively studied in the
context of social network analysis [109]. Hierarchical clustering [109, 128] has
been proposed to learn clusters. Recent algorithms [32, 60, 94] address several
problems related to the prior knowledge of cluster size, the precise definition
of the inter-node similarity measure, and improved computational efficiency [95].
However, their main focus is still learning strongly intra-connected clusters.
Some efforts [4, 55, 65, 65, 118] can be considered as cluster learning based on
stochastic block modeling.
There are efforts in the literature focusing on finding clusters based on
dense bipartite graphs [80, 104]. The trawling algorithm [80] extracts clus-
ters (which are called emerging clusters in [80] as the counterpart concept of
strongly intra-connected cluster) by first applying the Apriori algorithm to
find all possible cores (complete bipartite graphs) and then expanding each

core to a full-fledged cluster with the HITS algorithm [79]. [104] proposes a
different approach to extract the emerging clusters by finding all bipartite
graphs instead of finding cores.
In this chapter, we focus on how to divide the nodes of a homogeneous
relational graph into disjoint clusters based on relation patterns.

4.3 Community Learning by Graph Approximation


In this section, we propose a general model to learn relation-pattern-based
clusters from a homogeneous relational graph (for convenience, in the rest of
the chapter we simply use graph to refer to homogeneous relational graph).
To derive our model to learn latent cluster structure from a graph, we
start from the following simpler problem: if the relation-pattern-based cluster
structure of any graph is known, can we draw a simple graph with the explicit
latent cluster structure (latent relation patterns) to represent the original
graph? We present the concept of a cluster prototype graph as an answer. A
cluster prototype graph consists of a set of cluster nodes and a set of
links, including self-links for individual cluster nodes and inter-links for pairs
of cluster nodes.
For example, Figure 4.1b shows a cluster prototype graph for the graph
in Figure 4.1a. Note that for convenience, in all the examples, we use 0-1
graphs where the edge weight 0 denotes the absence of an edge between two
nodes and the edges with weight 0 are not shown in the graphs. However, all
the discussions are applicable to a general weighted graph. In Figure 4.1(b),
the top-left cluster node is associated with the nodes of C1 = {v1 , v2 , v3 , v4 }
from the original graph; the self-link of the top-left cluster node implies that
all its associated nodes are linked to each other; the inter-link between the
top-left cluster node and the bottom-left cluster node implies that the nodes
of C1 = {v1 , v2 , v3 , v4 } are linked to those of C3 = {v9 , v10 , v11 , v12 }. Hence,
the cluster prototype graph in Figure 4.1b provides a clear view of the cluster
structure and the relation patterns for the original graph in Figure 4.1a. Given
the cluster structures of any graph, we can always draw its cluster prototype
graph.
Therefore, learning the hidden cluster structures from a graph can be for-
mulated as finding its optimal cluster prototype graph which is the “closest”
to the original graph, i.e., based on this cluster prototype graph, the origi-
nal graph can be constructed most precisely. By representing a graph as an
affinity matrix, this problem can be formally formulated as an optimization
problem of matrix approximation,

    arg min_{A∗} ||A − A∗||²,    (4.1)
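One natural way to instantiate the prototype-graph matrix A∗ (an assumption on our part, not necessarily the exact parameterization derived for this model) is A∗ = C B Cᵀ, where C is a cluster-membership matrix and B holds the self-link and inter-link weights of the cluster prototype graph; the sketch below evaluates the corresponding approximation error.

import numpy as np

def prototype_graph_objective(A, C, B):
    # A: n x n affinity matrix of the original graph.
    # C: n x k cluster-membership matrix (e.g., 0-1 cluster indicators).
    # B: k x k matrix holding the self-link (diagonal) and inter-link (off-diagonal)
    #    weights of the cluster prototype graph.
    E = A - C @ B @ C.T
    return np.sum(E * E)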
