Relational Data Clustering: Models, Algorithms, and Applications, 1st Edition, Bo Long
SERIES EDITOR
Vipin Kumar
University of Minnesota
Department of Computer Science and Engineering
Minneapolis, Minnesota, U.S.A
PUBLISHED TITLES
UNDERSTANDING COMPLEX DATASETS: DATA MINING WITH MATRIX DECOMPOSITIONS
David Skillicorn
Relational
Data Clustering
Models, Algorithms,
and Applications
Bo Long
Zhongfei Zhang
Philip S. Yu
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts
have been made to publish reliable data and information, but the author and publisher cannot assume
responsibility for the validity of all materials or the consequences of their use. The authors and publishers
have attempted to trace the copyright holders of all material reproduced in this publication and apologize to
copyright holders if permission to publish in this form has not been obtained. If any copyright material has
not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented,
including photocopying, microfilming, and recording, or in any information storage or retrieval system,
without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood
Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and
registration for a variety of users. For organizations that have been granted a photocopy license by the CCC,
a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used
only for identification and explanation without intent to infringe.
QA76.9.D343R46 2010
005.75’6--dc22 2010009487
To my family
Philip S. Yu
List of Tables
Preface
1 Introduction
1.1 Defining the Area
1.2 The Content and the Organization of This Book
1.3 The Audience of This Book
1.4 Further Readings
I Models
2 Co-Clustering
2.1 Introduction
2.2 Related Work
2.3 Model Formulation and Analysis
2.3.1 Block Value Decomposition
2.3.2 NBVD Method
II Algorithms
8 Co-Clustering
8.1 Nonnegative Block Value Decomposition Algorithm
8.2 Proof of the Correctness of the NBVD Algorithm
IV Summary
References
Index
14.1 Data set details. Each data set is randomly and evenly sampled
from specific newsgroups
14.2 Both NBVD and NMF accurately recover the original clusters
in the CLASSIC3 data set
14.3 A normalized block value matrix on the CLASSIC3 data set
14.4 NBVD extracts the block structure more accurately than NMF
on the Multi5 data set
14.5 NBVD shows clear improvements on the micro-averaged-precision
values on different newsgroup data sets over other algorithms
2.1 The original data matrix (b) with a 2 × 2 block structure which
is demonstrated by the permuted data matrix (a). The row-coefficient
matrix R, the block value matrix B, and the column-coefficient
matrix C give a reconstructed matrix (c) to approximate the
original data matrix (b).
2.2 Illustration of the difference between BVD and SVD.
3.1 A bipartite graph (a) and its relation summary network (b).
3.2 A tripartite graph (a) and its RSN (b).
3.3 The cluster structures of V2 and V3 affect the similarity between
v11 and v12 through the hidden nodes.
4.1 A graph with mixed cluster structures (a) and its cluster
prototype graph (b).
4.2 A graph with virtual nodes.
4.3 A graph with strongly connected clusters (a) and its cluster
prototype graph (b); the graph affinity matrices for (a) and
(b), (c) and (d), respectively.
14.1 The coefficient of the variance for the columns of the mean
block value matrix with the varying number of the word clusters
using NBVD on different NG20 data sets.
The world we live in today is full of data with relations: the Internet,
social networks, telecommunications, customer shopping patterns, and
micro-array data in bioinformatics research, to name just a few examples.
This has given rise to an active research area called relational data mining
within the data mining field. In many real-world applications, training data
is either unavailable or extremely expensive to obtain; relational data
clustering has therefore recently attracted substantial attention from the
related research communities and has emerged as a new and hot research topic
in relational data mining. This book is the very first monograph on the topic
of relational data clustering written in a self-contained format. It addresses
both the fundamentals and the applications of relational data clustering,
including the theoretical models, the algorithms, and exemplar applications
of these models and algorithms to real-world problems.
The authors of this book have been actively working on relational data
clustering for years, and this book is the culmination of that research.
It may be used as a collection of research notes for researchers interested
in the topic, as a reference book for practitioners and engineers, or as a
textbook for an advanced graduate seminar on relational data clustering. It
may also be used for an introductory course for graduate students or advanced
undergraduate seniors. The references collected in this book may serve as
further reading lists for the readers.
Because this topic has received extensive attention in the literature and has
developed rapidly in recent years, this book is by no means meant to be an
exhaustive collection of information on relational data clustering. We intend
to collect our own most recent research on this topic in this book. For those
who are already in the area of relational data mining or who already know
what this area is about, this book serves as a formal and systematic
collection of part of the most recent advances in research on this topic.
For beginners to the area of relational data mining, this book serves as a
formal and systematic introduction to relational data clustering.
It would not have been possible for us to complete this book without the
great support of a large group of people and organizations. In particular,
we would like to thank the publisher, Taylor & Francis/CRC Press, for giving
us the opportunity to complete this book for the readers as one of the books
in the Chapman & Hall/CRC Data Mining and Knowledge Discovery series, with
Prof. Vipin Kumar at the University of Minnesota serving as the series editor.
We would like to thank this book's editor at Taylor & Francis Group, Randi
Cohen, for her enthusiastic and patient support, effort, and advice; the
project coordinator at Taylor & Francis Group, Amber Donley, and the
anonymous proofreader for their meticulous effort in correcting typos and
other errors in the draft of the book; and Shashi Kumar of Glyph
International for his prompt technical support in formatting the book. We
would like to thank Prof. Jiawei Han at the University of Illinois at
Urbana-Champaign and Prof. Jieping Ye at Arizona State University, as well as
another anonymous reviewer, for their painstaking effort in reviewing the
book and for their valuable comments, which substantially improved its
quality. While this book is derived from the original contributions of its
authors, part of the material was also jointly contributed by their
colleagues Xiaoyun Wu at Google Research Labs and Tianbing Xu at SUNY
Binghamton. This book project is supported in part by the National Science
Foundation under grant IIS-0812114, managed by the program manager, Dr. Maria
Zemankova. Any opinions, findings, and conclusions or recommendations
expressed in this material are those of the authors and do not necessarily
reflect the views of the National Science Foundation.
Finally, we would like to thank our families for the love and support that
are essential for us to complete this book.
Models
• Text analysis. Learning document clusters and word clusters from
the bi-type relational data of documents and words.
• Recommendation system. Movie recommendation based on user clusters
(communities) and movie clusters learned from relational data involving
users, movies, and actors/actresses.
• Online advertisement. Based on relational data in which advertisers,
bidded terms, and words are interrelated, clusters of advertisers and
bidded terms can be learned for bidded term suggestion.
• Bioinformatics. Automatically identifying gene groups (clusters) from
the relational data of genes, conditions, and annotation words.
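As a rough illustration of the text analysis case, the sketch below builds a
small document-word count matrix (bi-type relational data) and sums its
entries within each pair of document cluster and word cluster. The toy corpus,
the hand-picked cluster labels, and the helper name block_sums are all
assumptions made for this example; it shows only how such data and a given
clustering can be represented, not the algorithms developed in this book.

```python
from collections import Counter

# Toy corpus (hypothetical): each document is a list of words.
docs = [
    ["gene", "protein", "cell"],
    ["protein", "cell", "cell"],
    ["stock", "market", "trade"],
    ["market", "trade", "trade"],
]

# Sorted vocabulary so that column indices are deterministic.
vocab = sorted({word for doc in docs for word in doc})

# Bi-type relational data: rows are documents, columns are words,
# and each entry counts how often the word occurs in the document.
matrix = [[Counter(doc)[word] for word in vocab] for doc in docs]


def block_sums(matrix, row_labels, col_labels, n_row_clusters, n_col_clusters):
    """Sum the entries inside each (row cluster, column cluster) block."""
    blocks = [[0] * n_col_clusters for _ in range(n_row_clusters)]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            blocks[row_labels[i]][col_labels[j]] += value
    return blocks


# Cluster assignments chosen by hand for illustration (an assumption,
# not the output of any clustering algorithm).
doc_labels = [0, 0, 1, 1]
biology_words = {"gene", "protein", "cell"}
word_labels = [0 if word in biology_words else 1 for word in vocab]

blocks = block_sums(matrix, doc_labels, word_labels, 2, 2)
print(blocks)  # [[6, 0], [0, 6]]
```

The off-diagonal blocks are zero here because the two document groups share
no words, which is the kind of block structure that co-clustering tries to
uncover when the labels are not known in advance.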
[Figure 1.1: diagram with the labels Homogeneous relational clustering,
Co-clustering, Heterogeneous relational clustering, Multiple-view clustering,
and General relational data clustering.]
FIGURE 1.1: Relationships among the different areas of relational data
clustering.
Thus, then, Maria had passed six months. And on All Souls' Day she had
come to her mother's grave, among the poor and the good, to bring
her own tribute of prayer to that God who blesses the precious
sorrow of the little ones and dries their tears.
VIII.
THE MILLINER'S APPRENTICES.
A house of shabby appearance, its walls painted the color of time and
stripped of plaster, with a long gallery running along each of its
two floors and a wide, worm-eaten eave jutting outward like the
brim of a battered hat over a beggar's brow, looks onto a
secluded little square in a remote part of the city, near one of
our abandoned ramparts. On one side, the low wall of a kitchen garden
that bends into the adjoining alley; on the other, a long, low hovel,
pierced with doors and windows like a sieve, the cramped shelter of poor
folk; and nearby, an old hedge on a bank of earth that faces
a muddy, crooked road edged by a ditch. There are still
a few corners of our beautiful and rejuvenated Milan that
present so run-down and dismal an aspect as to seem truly the
witches' house; and whoever might wish to amuse himself by seeking out the
cluster of dwellings I describe should not wait until tomorrow; for perhaps,
where Signor Cipriano's house stands, he will find a handsome little mansion
with a bright, neat front and green shutters, and in place of the rough
tenement with the rotten ditch at its foot, he will see rising
opposite a white, new building of five stories, enough to make
envious anyone who owns two spans of land under the sun.
Signor Cipriano was a former chocolate maker who,
having put aside a good many thousand scudi and not wishing to die at his
trade, shut up shop and retired to enjoy in his last years the fruit
of his labors in blessed freedom. He had therefore bought that
house at half price; but since he was very stingy and had always
spent his lira for at least twenty soldi, he was reduced to leading a wretched
life in that crumbling rat hole, where he had once dreamed of playing the
great gentleman. And he felt he was touching the sky with his finger when,
stretched out on a little bench by the fire, with his faithful little flask
of Ossona wine at his side, he would mull over, between one half-measure and
the next, the account of the interest on his capital, at one or two percent
a month. When he crossed the little square to enter his door,
he walked puffed up, at a slow pace, with his hands clasped behind his
back; and, lifting his big belly and his knobby nose, he would ogle,
up at the windows and the little balconies, the plumpest and freshest
gossips of the neighborhood: everyone knew him and doffed their hats to him,
as if to the pasha of the quarter; for everyone supposed that he kept a fine
hoard buried in the cellar.
From the first to the last of the sixty years he had then reached, he had
always shunned every annoyance and every care; and if he never chose
to take a wife, it was so as not to have the worry of children and the
encumbrance of a woman, whom he used to call the most worthless merchandise
in the world. But a short time before, he had brought into his house Signora
Barbara, his sister, the widow of a bankrupt, and Savina, her daughter, the
only ones left to him of all his relatives, who were content to keep
house and pay rent; for the idea of coming one day or another into a
large inheritance was the anchor of their hope. At home, however, Signor
Cipriano had always wielded the ladle his own way; and well did
that boorish oaf Michele know it, who was the only servant,
when the master, endowed with a miserly memory enough to make one shudder,
made him account every day for the cross on the last farthing.
ROSA.