Data Management in the Semantic Web, 1st Edition, Hal Jin

The document provides information about the ebook 'Data Management in the Semantic Web' edited by Hal Jin, which discusses the importance of effective data management in the context of the Semantic Web. It includes various chapters covering topics such as web crawling, data representation, and knowledge sharing.

Author(s): Hal Jin
ISBN(s): 9781613247600, 1613247605
Edition: 1
Year: 2011
Language: English
Copyright © 2011. Nova Science Publishers, Incorporated. All rights reserved.

Data Management in the Semantic Web, Nova Science Publishers, Incorporated, 2011. ProQuest Ebook Central,
DISTRIBUTED, CLUSTER AND GRID COMPUTING

DATA MANAGEMENT
IN THE SEMANTIC WEB

No part of this digital document may be reproduced, stored in a retrieval system or transmitted in any form or
by any means. The publisher has taken reasonable care in the preparation of this digital document, but makes no
expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No
liability is assumed for incidental or consequential damages in connection with or arising out of information
contained herein. This digital document is sold with the clear understanding that the publisher is not engaged in
DISTRIBUTED, CLUSTER AND
GRID COMPUTING
Yi Pan (Series Editor)
(Georgia State University, GA, U. S.)

Advanced Parallel and Distributed Computing: Evaluation, Improvement and Practice
Yuan-Shun Dai, Yi Pan and Rajeev Raje (Editors)
2006. ISBN: 1-60021-202-6

Parallel and Distributed Systems: Evaluation and Improvement
Yuan-Shun Dai, Yi Pan and Rajeev Raje (Editors)
2006. ISBN: 1-60021-276-X

Performance Evaluation of Parallel, Distributed and Emergent Systems
Mohamed Ould-Khaoua and Geyong Min (Editors)
2007. ISBN: 1-59454-817-X

From Problem Toward Solution: Wireless Sensor Networks Security
Zhen Jiang and Yi Pan (Editors)
2009. ISBN: 978-1-60456-457-0

Congestion Control in Computer Networks: Theory, Protocols and Applications
Jianxin Wang
2010. ISBN: 978-1-61728-698-8

Data Management in the Semantic Web
Hal Jin, Hanhua Chen and Zehua Lv (Editors)
2011. ISBN: 978-1-61122-862-5

DISTRIBUTED, CLUSTER AND GRID COMPUTING

DATA MANAGEMENT
IN THE SEMANTIC WEB

HAL JIN,
HANHUA CHEN
AND
ZEHUA LV
EDITORS

New York

Copyright © 2012 by Nova Science Publishers, Inc.

All rights reserved. No part of this book may be reproduced, stored in a retrieval system or
transmitted in any form or by any means: electronic, electrostatic, magnetic, tape, mechanical
photocopying, recording or otherwise without the written permission of the Publisher.

For permission to use material from this book please contact us:
Telephone 631-231-7269; Fax 631-231-8175
Web Site: http://www.novapublishers.com

NOTICE TO THE READER


The Publisher has taken reasonable care in the preparation of this book, but makes no expressed or
implied warranty of any kind and assumes no responsibility for any errors or omissions. No
liability is assumed for incidental or consequential damages in connection with or arising out of
information contained in this book. The Publisher shall not be liable for any special,
consequential, or exemplary damages resulting, in whole or in part, from the readers’ use of, or
reliance upon, this material. Any parts of this book based on government reports are so indicated
and copyright is claimed for those parts to the extent applicable to compilations of such works.

Independent verification should be sought for any data, advice or recommendations contained in
this book. In addition, no responsibility is assumed by the publisher for any injury and/or damage
to persons or property arising from any methods, products, instructions, ideas or otherwise
contained in this publication.

This publication is designed to provide accurate and authoritative information with regard to the
subject matter covered herein. It is sold with the clear understanding that the Publisher is not
engaged in rendering legal or any other professional services. If legal or any other expert
assistance is required, the services of a competent person should be sought. FROM A
DECLARATION OF PARTICIPANTS JOINTLY ADOPTED BY A COMMITTEE OF THE
AMERICAN BAR ASSOCIATION AND A COMMITTEE OF PUBLISHERS.

Additional color graphics may be available in the e-book version of this book.

LIBRARY OF CONGRESS CATALOGING-IN-PUBLICATION DATA

Data management in the semantic web / editor, Hal Jin.


p. cm.
Includes bibliographical references and index.
ISBN (eBook)
1. Web databases. 2. Internet searching. 3. Semantic Web. I. Jin, Hal.
QA76.9.W43D38 2011
025.042'7--dc22
2010048369

Published by Nova Science Publishers, Inc. † New York

CONTENTS

Preface
Chapter 1 Interpretations of the Web of Data
Marko A. Rodriguez
Chapter 2 Toward Semantics-Aware Web Crawling
Lefteris Kozanidis, Sofia Stamou and Vasilis Megalooikonomou
Chapter 3 A Semantic Tree Representation for Document Categorization with a Composite Kernel
Sujeevan Aseervatham and Younès Bennani
Chapter 4 Ontology Reuse -- Is It Feasible?
Elena Simperl and Tobias Bürger
Chapter 5 Computational Logic and Knowledge Representation Issues in Data Analysis for the Semantic Web
J. Antonio Alonso-Jiménez, Joaquín Borrego-Díaz, Antonia M. Chávez-González and F. Jesús Martín-Mateos
Chapter 6 Applying Semantic Web Technologies to Biological Data Integration and Visualization
Claude Pasquier
Chapter 7 An Ontology and Peer-to-Peer Based Data and Service Unified Discovery System
Ying Zhang, Houkuan Huang and Youli Qu
Chapter 8 The Design and Development of a Semantic Environment for Holistic eGovernment Services
Luis Álvarez Sabucedo, Luis Anido Rifón, Rubén Míguez Pérez and Juan Santos Gago
Chapter 9 Semantic Topic Modeling and Its Application in Bioinformatics
B. Zheng and X. Lu
Chapter 10 Supporting a User in His Annotation and Browsing Activities in Folksonomies
G. Barillà, P. De Meo, G. Quattrone and D. Ursino
Chapter 11 Data Management in Sensor Networks using Semantic Web Technologies
Anastasios Zafeiropoulos, Dimitrios-Emmanuel Spanos, Stamatios Arkoulis, Nikolaos Konstantinou and Nikolas Mitrou
Chapter 12 Chinese Semantic Dependency Analysis
Jiajun Yan and David B. Bracewell
Chapter 13 Creating Personal Content Management Systems using Semantic Web Technologies
Chris Poppe, Gaëtan Martens, Erik Mannens and Rik Van de Walle
Chapter 14 A Hybrid Data Layer to Utilize Open Content for Higher-Layered Applications
M. Steinberg and J. Brehm
Chapter 15 Using Semantics Equivalences for MRL Queries Rewriting in Multi-data Source Fusion Systems
Gilles Nachouki and Mohamed Quafafou
Chapter 16 3D Star coordinate-based Visualization of Relation Clusters from OWL Ontologies
Cartik R. Kothari, Jahangheer S. Shaik, David J. Russomanno and M. Yeasin
Chapter 17 Annotating Semantics of Multidisciplinary Engineering Resources to Support Knowledge Sharing in Sustainable Manufacturing
Q. Z. Yang and X. Y. Zhang
Index

PREFACE

Effective and efficient data management is vital to today’s applications. Traditional data
management mainly focuses on information processing involving data within a single
organization. Data are unified according to the same schema and there exists an agreement
between the interacting units as to the correct mapping between these concepts. Nowadays,
data management systems have to handle a variety of data sources, from proprietary ones to
publicly available data. Investigating the relevance between data for information sharing has
become an essential challenge for data management. This book explores the technology and
application of semantic data management by bringing together various research studies in
different subfields.
As discussed in Chapter 1, the emerging Web of Data utilizes the web infrastructure to
represent and interrelate data. The foundational standards of the Web of Data include the
Uniform Resource Identifier (URI) and the Resource Description Framework (RDF). URIs
are used to identify resources and RDF is used to relate resources. While RDF has been
posited as a logic language designed specifically for knowledge representation and reasoning,
it is more generally useful if it can conveniently support other models of computing. In order
to realize the Web of Data as a general-purpose medium for storing and processing the
world's data, it is necessary to separate RDF from its logic language legacy and frame it
simply as a data model. Moreover, there is significant advantage in seeing the Semantic Web
as a particular interpretation of the Web of Data that is focused specifically on knowledge
representation and reasoning. By doing so, other interpretations of the Web of Data are
exposed that realize RDF in different capacities and in support of different computing models.
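
The idea of treating RDF simply as a data model can be sketched in a few lines. The example below is an illustration only, not code from Chapter 1: the `example.org` URIs, the `match` and `adjacency` helpers, and the sample statements are all hypothetical. It stores statements as plain subject-predicate-object triples and then reads the same data in two ways, once by pattern matching and once as a labeled directed graph.

```python
from collections import defaultdict

# Hypothetical namespace; example.org URIs are placeholders, not real resources.
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) triple of URIs.
triples = {
    (EX + "alice", EX + "knows", EX + "bob"),
    (EX + "alice", EX + "created", EX + "report"),
    (EX + "bob", EX + "created", EX + "report"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def adjacency(graph):
    """Reinterpret the same triples as a labeled directed graph."""
    out = defaultdict(list)
    for s, p, o in sorted(graph):
        out[s].append((p, o))
    return dict(out)


# Pattern-matching view: who created the report?
creators = [s for s, _, _ in match(triples, p=EX + "created", o=EX + "report")]

# Graph view: outgoing edges of one resource.
alice_edges = adjacency(triples)[EX + "alice"]
```

Nothing in this sketch involves inference or logic; the triples are just data that different computing models can consume.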
The rapid growth of the web poses scaling challenges for general-purpose web crawlers
that attempt to download large numbers of web pages and make them available to search
engine users. Perhaps the greatest challenge in harvesting web content is
ensuring that crawlers do not waste resources downloading pages that are of
little or no interest to web users. One way to download useful web data is to build
crawlers that prioritize the unvisited URLs so that pages of interest are
downloaded earlier. To this end, many approaches have been proposed for focusing web
crawls on topic-specific content. In Chapter 2, the authors build upon existing studies and
introduce a novel focused crawling approach that relies on the semantic
orientation of web pages to determine their crawling priority. The contribution of the
authors’ approach lies in the integration of a topical ontology and a passage
extraction algorithm into a common framework against which the crawler is trained, so that it can
detect pages of interest and determine their crawling priority. The evaluation of the
proposed approach demonstrates that semantics-aware focused crawling yields both
accurate and complete web crawls.
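
The general mechanism of prioritizing unvisited URLs can be sketched with a priority queue. This is a minimal illustration under stated assumptions, not the method of Chapter 2: the `TOPIC_TERMS` vocabulary, the `relevance` score (a simple word-overlap ratio standing in for the ontology and passage-extraction framework), the `Frontier` class, and the `example.org` URLs are all hypothetical.

```python
import heapq

# Hypothetical topic vocabulary; a real focused crawler would derive
# relevance from a topical ontology rather than a flat word list.
TOPIC_TERMS = {"ontology", "rdf", "semantic", "sparql"}


def relevance(text):
    """Fraction of words in `text` that belong to the topic vocabulary."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in TOPIC_TERMS for w in words) / len(words)


class Frontier:
    """Queue of unvisited URLs ordered by estimated topical relevance."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker so equal scores pop in insertion order

    def push(self, url, context_text):
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated to pop the best URL first.
        heapq.heappush(self._heap, (-relevance(context_text), self._counter, url))
        self._counter += 1

    def pop(self):
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


frontier = Frontier()
frontier.push("http://example.org/sports", "football scores today")
frontier.push("http://example.org/rdf-intro", "rdf and sparql tutorial")
frontier.push("http://example.org/ontology", "semantic ontology")

# URLs come out in decreasing order of estimated relevance.
order = [frontier.pop()[0] for _ in range(len(frontier))]
```

Swapping `relevance` for a better scoring function changes the crawl order without touching the queue logic, which is the design point of keeping the two separate.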
Semi-structured document formats, such as XML, are used in data
management to efficiently structure, store and share information between different users.
Although information can be accessed efficiently within a semi-structured document,
automatically retrieving the relevant information from a corpus remains a complex
problem, especially when the documents are semi-structured text documents. To tackle this,
the corpus can be partitioned according to the content of each document in order to make the
search efficient. In document categorization, a predefined partition is given and the problem
is to automatically assign the documents of the corpus to the relevant categories. The quality
of the categorization depends strongly on the data representation and on the similarity measure,
especially when dealing with complex data such as natural language text. In Chapter 3, the
authors present a semantic tree that semantically represents an XML text document and
propose a semantic kernel, which can be used with the semantic tree, to compute a
similarity measure. The semantic meanings of words are extracted using an ontology. The
authors use a text categorization problem in the biomedical field to illustrate their
method. The UMLS framework is used to extract the semantic information related to
the biomedical field. The authors have applied this kernel with an SVM classifier to a
real-world medical free-text categorization problem. The results show that their
method outperforms other common methods such as the linear kernel.
To understand the reuse process, the authors of Chapter 4 analyze the feasibility of ontology
reuse and, on that basis, discuss its economic aspects. By analogy with methods in the field of
software engineering, the authors relate the costs of ontology development to the level of ontology
reuse and to the costs of developing and maintaining reusable ontologies. Subsequently the
authors propose a cost model focusing on activities in ontology reuse, whose goal is to support
a trade-off analysis of reusable ontology development costs vs. the costs of development
from scratch. The research leading to this model aims to address the following questions:
1. Which factors influence the costs of reusing ontologies?
2. How can the benefits of ontology reuse be shown in monetary terms?
3. Which general statements can be made about the financial feasibility of ontology
reuse in specific settings?
In order to answer these questions the authors elaborate on an extension of the
ONTOCOM model for cost estimation of ontologies for reuse. The authors’ aim is to isolate
the costs of reuse of different artifacts on different levels of the ontology engineering process,
that is, reuse on the requirements, the conceptual, or the implementation level. Secondly,
following work by others on economic models for software reuse, the authors intend to show
the monetary value of ontology reuse in different reuse scenarios on these levels.
The remainder of Chapter 4 is organized as follows: Section 2 gives an overview of
ontology reuse including the reuse process, the current state of practice, and existing
methodologies for ontology reuse. Section 3 analyzes economic aspects of ontology
development and reuse, and presents an economic analysis and a cost estimation model whose
aim is to predict ontology reuse costs. Finally, Section 4 concludes the chapter.

In Chapter 5, the relationships between the logical and representational features of
ontologies are analysed. Through several questions, this chapter restates the use of
computational logic in ontology engineering.
Current research in biology heavily depends on the availability and efficient use of
information. In order to build new knowledge, various sources of biological data must often
be combined. Semantic Web technologies, which provide a common framework allowing
data to be shared and reused between applications, can be applied to the management of
disseminated biological data. However, owing to certain specificities of biological data, applying
these technologies to the life sciences is a real challenge. Chapter 6 shows that current Semantic
Web technologies are maturing and can be used to develop large applications.
However, to get the best from these technologies, improvements are needed both in
tool performance and in knowledge modeling.
The next-generation Internet has the potential to be a ubiquitous and pervasive
communication medium for all types of information. The World Wide Web is
emerging with a broader variety of resources that include both data and services. Yet much
research has focused on either service discovery or data discovery, although the two cannot be
separated from each other. In addition, the current network, due to its decentralized nature
and weak support for semantics, is still chaotic and lacks the ability to let users
discover, extract and integrate information of interest from heterogeneous sources.
In Chapter 7, the authors present a scalable, high-performance system for unified data and
service discovery. To increase the success rate, an ontology-based approach is used to
describe data and services. For services, the authors add quality of service (QoS)
information to OWL-S files to return more accurate results to users. Moreover, the authors
bring JXTA, a suitable foundation on which to build future computer systems, into
their system.
Most nations in the world are currently committed to providing advanced services for their
citizens. Developing software support capable of meeting the needs of this context
requires a long-term commitment. The success of this kind of project is closely tied to
the provision of solid back-office support for services and the use of open technologies for
the interconnection and interchange of information. Chapter 8 presents a formal
description of the business model involved in the provision of such a solution, using semantics
as a tool to share this conceptualization. This characterization of the problem to be solved is
derived from artifacts identified in the domain and described in relation to the interaction
required to fulfill the citizen's needs. On top of this description, an entire software
platform is described. Some useful conclusions are also presented for consideration in
further projects.
In Chapter 9, the authors discuss the utility of semantic topic modeling with the latent
Dirichlet allocation (LDA) model and its application in the bioinformatics domain. By
capturing the statistical structure of word usage patterns, LDA is capable of identifying
semantic topics from a collection of text documents in an unsupervised manner. The authors
show that semantic topic modeling with LDA can be used to automatically identify biological
concepts from corpora of biomedical literature, thus providing a more concise representation of
the biomedical knowledge. The authors further demonstrate that representing text documents
in semantic topic space facilitates classification of text documents. Finally, the authors show
that connecting proteins in the semantic topic space enables efficient evaluation of the
functional coherence of a group of proteins.

In Chapter 10 the authors present a new approach to helping users annotate and
browse resources referenced by a folksonomy. The authors’ approach proposes two hierarchical
structures and two related algorithms to arrange groups of semantically related tags in a
hierarchy; this allows users to visualize tags of interest at the desired semantic
granularity and then helps them find the tags that best express their information needs.
In this chapter the authors first illustrate the technical characteristics of their approach;
then they describe the prototype implementing it; after this, they illustrate various
experiments that test its performance; finally, they compare it with other
related approaches already proposed in the literature.
The increasing availability of small-size sensor devices during the last few years and the
large amount of data that they generate have led to the need for more efficient methods
of data management. In Chapter 11, the authors review the techniques that are being
used for data gathering and information management in sensor networks and the advantages
that are provided by the proliferation of Semantic Web technologies. The authors
present the current trends in the field of data management in sensor networks and propose a
flexible three-layer architecture intended to help developers as well as end users take
advantage of the full potential that modern sensor networks can offer. This architecture deals
with issues regarding data aggregation, data enrichment and, finally, data management and
querying using Semantic Web technologies. Semantics are used to extract
meaningful information from the sensors’ raw data and thus facilitate the development of
smart applications over large-scale sensor networks.
The first and overwhelmingly major challenge of the Semantic Web is annotating
semantic information in text. Semantic analysis is often used to address this problem by
automatically creating the semantic metadata that is needed. However, it has
proven difficult to obtain ideal results from semantic analysis because of two contentious
problems: the semantic scheme and classification. Chapter 12 presents an answer to these two
problems. For the semantic scheme, semantic dependency is chosen, and for classification a
number of machine learning approaches are examined and compared. Semantic dependency is
chosen because it gives a deeper
structure and better describes the richness of semantics in natural language. The classification
approaches encompass standard machine learning algorithms, such as Naive Bayes, Decision
Tree and Maximum Entropy, as well as multiple classification and rule-based correction
approaches. The best results reach a state-of-the-art accuracy of 85.1%. In addition, an
integrated system called SEEN (Semantic dEpendency parsEr for chiNese) is introduced,
which combines the research presented in this chapter with segmentation, part-of-speech,
and syntactic parsing modules that are freely available from other researchers.
The amount of multimedia resources that are created and need to be managed is
increasing considerably. Additionally, there is a significant increase in metadata, either
structured (metadata fields of standardized metadata formats) or unstructured (free tagging or
annotations). This increasing amount of data and metadata, combined with the
substantial diversity of metadata fields and constructs in use, makes it very difficult
to manage and retrieve these multimedia resources. Standardized metadata schemes
can be used, but the plethora of these schemes results in interoperability issues. In Chapter 13,
the authors propose a metadata model suited for personal content management systems. They
create a layered metadata service that implements the presented model as an upper layer and
combines different metadata schemes in the lower layers. Semantic Web technologies are used
to define and link formal representations of these schemes. Specifically, the authors create an
ontology for the DIG35 metadata standard and elaborate on how it is used within this
metadata service. To illustrate the service, the authors present a representative use case
scenario consisting of the upload, annotation, and retrieval of multimedia content within a
personal content management system.
Increasing heterogeneous Open Content is an ongoing trend in the current Social
Semantic Web (S2W). Generic concepts and how-tos for higher-layered reuse of this arbitrary
information overload for interactive knowledge transfer and learning, with reference to the
Internet of Services (IoS), are not yet covered very well. For further directed use of
distributed services and sources, inquiry, interlinking, analysis, and machine- and
human-interpretable representation are as essential as lightweight user-oriented interoperation and
competence in handling. The authors introduce the qKAI application
framework (qualifying Knowledge Acquisition and Inquiry) [3], a service-oriented, generic
and hybrid approach combining knowledge-related offers for convenient reuse. qKAI aims at
closing some residual gaps between the “sophisticated” Semantic Web and “hands-on” Web
2.0, enabling loosely coupled knowledge and information services focused on knowledge life
cycles, learning aspects and handy user interaction. Combining user interoperation with
standardized web techniques is a promising mixture for building a next generation of web
applications. The focus of Chapter 14 lies on the qKAI data layer as part of the application
framework and a basic prerequisite for building user interaction scenarios on top of it. The qKAI
data layer utilizes available, distributed semantic data sets in a practical manner using an
affordable quad-core hardware platform and preselected data dumps. Overall, the authors
boost Open Content as an inherent part of higher-layered, lightweight applications in
knowledge and information transfer via standard tasks of knowledge engineering and
augmented user interaction. Beyond giving an overview of research background and
periphery assumption, this chapter introduces the hybrid data concept (a minimalistic data
model with maximized depth), implementation results and lessons learned. The authors
discuss the Semantic Web query language SPARQL and the Resource Description Framework
(RDF) critically to highlight their limitations in current web application practice. qKAI
implements search space restriction (Points of Interest) by smart enabled RDF representation
and restriction to SQL as the internally used query language. Acquiring resources and
discovering the Web of Data is a massively multithreaded part of the qKAI application
framework that serves as the basis for further knowledge-based tasks.
Classical approaches to data integration, based on schema mediation, are not suitable for
the World Wide Web (WWW) environment, where data is frequently modified or deleted.
Chapter 15 describes a new approach to heterogeneous data source fusion called the Multi-data
source Fusion Approach (MFA). Data sources are either static or active: static data sources
can be structured or semi-structured, whereas active sources are services. The aim of MFA is
to facilitate data source fusion in dynamic contexts such as the Web. The authors introduce
an XML-based Multi-data source Fusion Language (MFL). MFL provides two sub-languages:
the Multi-data source Definition Language (MDL), used to define the multi-data
source, and the Multi-data source Retrieval Language (MRL), which aims to retrieve
conflicting data from multiple data sources.
The authors also study how to semantically reconcile data sources. This
study is based on OWL/RDF technologies. The authors’ main objective is to combine data
sources with minimal effort required from the user. This objective is crucial because, in the
authors’ context, the user is assumed not to be an expert in the domain of data
fusion but to understand the meaning of the data being integrated. The results of semantic
reconciliation between data sources are used to improve the rewriting of MRL
semantic queries into a set of equivalent sub-queries over the data sources. The authors show
the design of the Multi-Data Source Management System called MDSManager. Finally, the
authors give an evaluation of their MRL language. The results show that MRL
significantly improves on the XQuery language, especially in expressive power
and performance.
The recent proliferation of high-level and domain-specific ontologies has necessitated the
development of prudent integration strategies. Visualization techniques are an important tool
to support the data and knowledge integration initiative. Chapter 16 reviews a methodology to
visualize clusters of relations from ontologies specified using the Web Ontology Language
(OWL). The relations, which in OWL are referred to as object properties, from various
ontologies are organized into clusters based upon their intrinsic semantics. The intrinsic
semantics of every relation from an input ontology is explicitly specified by a framework of
32 common elements; each element captures a specific aspect of the relationship between a
relation’s domain and range. Using this framework, each relation can be represented in a 32-
dimensional “relation space.” Relation clusters in 32 dimensions are projected to 3
dimensions using an automated 3-dimensional (3D) star coordinate-based visualization
technique. Results from applying an algorithm to create and subsequently visualize relation
clusters formed from the IEEE Suggested Upper Merged Ontology (SUMO) are presented in
this chapter and discussed in the context of their potential utility for knowledge reuse and
interoperability on the Semantic Web.
Semantic annotation of digital engineering resources is regarded as an enabling
technology for knowledge sharing in sustainable manufacturing, where the economic,
environmental and social objectives are incorporated into technical solutions to achieve
competitive product advantages. The emerging needs for sustainability require seamless
sharing of product lifecycle information and machine-understandable semantics across design
and manufacturing networks. Towards this end, Chapter 17 proposes an ontology-driven
Copyright © 2011. Nova Science Publishers, Incorporated. All rights reserved.

approach to semantic annotation of multidisciplinary engineering resources. The targeted
application is for product knowledge sharing in sustainable manufacturing of consumer
electronics. The proposed approach is based on two mechanisms: 1) ontology modeling –
how to specify the meaning of annotations with ontological representations to enhance
sharability of the annotation content; and 2) ontology implementation – how to effectively
apply the meaning of annotations to heterogeneous computer-aided tools and lifecycle
processes on the basis of ontology. These semantic knowledge sharing mechanisms are
validated through annotation scenarios. Two use scenarios are illustrated in the chapter for
semantics sharing based on annotations in sustainable manufacturing of consumer products.

In: Data Management in the Semantic Web ISBN 978-1-61122-862-5
Editors: Hai Jin, et al. pp. 1-38 © 2011 Nova Science Publishers, Inc.

Chapter 1

INTERPRETATIONS OF THE WEB OF DATA


Marko A. Rodriguez∗
T-5, Center for Nonlinear Studies
Los Alamos National Laboratory
Los Alamos, New Mexico

Abstract
The emerging Web of Data utilizes the web infrastructure to represent and interre-
late data. The foundational standards of the Web of Data include the Uniform Resource
Identifier (URI) and the Resource Description Framework (RDF). URIs are used to
identify resources and RDF is used to relate resources. While RDF has been posited
as a logic language designed specifically for knowledge representation and reasoning,
it is more generally useful if it can conveniently support other models of computing. In
order to realize the Web of Data as a general-purpose medium for storing and process-
ing the world’s data, it is necessary to separate RDF from its logic language legacy and
frame it simply as a data model. Moreover, there is significant advantage in seeing the
Semantic Web as a particular interpretation of the Web of Data that is focused specif-
ically on knowledge representation and reasoning. By doing so, other interpretations
of the Web of Data are exposed that realize RDF in different capacities and in support
of different computing models.

1 Introduction
The common conception of the World Wide Web is that of a large-scale, distributed file
repository [6]. The typical files found on the World Wide Web are Hyper-Text Markup
Language (HTML) documents and other media such as image, video, and audio files. The
“World Wide” aspect of the World Wide Web pertains to the fact that all of these files have
an accessible location that is denoted by a Uniform Resource Locator (URL) [56]; a URL
denotes what physical machine is hosting the file (i.e. what domain name/IP address), where
in that physical machine the file is located (i.e. what directory), and finally, which protocol
to use to retrieve that file from that machine (e.g. http, ftp, etc.). The “Web” aspect of the
World Wide Web pertains to the fact that a file (typically an HTML document) can make
∗ E-mail address: marko@lanl.gov


reference (typically an href citation) to another file. In this way, a file on machine A can link
to a file on machine B and in doing so, a network/graph/web of files emerges. The ingenuity
of the World Wide Web is that it combines remote file access protocols and hypermedia and
as such, has fostered a revolution in the way in which information is disseminated and
retrieved—in an open, distributed manner. From this relatively simple foundation, a rich
variety of uses emerges: from the homepage, to the blog, to the online store.
The World Wide Web is primarily for human consumption. While HTML documents
are structured according to a machine understandable syntax, the content of the documents
is written in human readable/writable language (i.e. natural human language). It is only
through computationally expensive and relatively inaccurate text analysis algorithms that a
machine can determine the meaning of such documents. For this reason, computationally
inexpensive keyword extraction and keyword-based search engines are the most prevalent
means by which the World Wide Web is machine processed. However, the human-readable
World Wide Web is evolving to support a machine-readable Web of Data. The emerging
Web of Data utilizes the same referencing paradigm as the World Wide Web, but instead of
being focused primarily on URLs and files, it is focused on Uniform Resource Identifiers
(URI) [7] and data.1 The “Data” aspect of the Web of Data pertains to the fact that a URI
can denote anything that can be assigned an identifier: a physical entity, a virtual entity,
an abstract concept, etc. The “Web” aspect of the Web of Data pertains to the fact that
identified resources can be related to other resources by means of the Resource Description
Framework (RDF). Among other things, RDF is an abstract data model that specifies the
syntactic rules by which resources are connected. If U is the set of all URIs, B the set of all
blank or anonymous nodes, and L the set of all literals, then the Web of Data is defined as

W ⊆ ((U ∪ B) × U × (U ∪ B ∪ L)).
A single statement (or triple) in W is denoted (s, p, o), where s is called the subject, p the
predicate, and o the object. On the Web of Data

“[any man or machine can] start with one data source and then move through
a potentially endless Web of data sources connected by RDF links. Just as
the traditional document Web can be crawled by following hypertext links, the
Web of Data can be crawled by following RDF links. Working on the crawled
data, search engines can provide sophisticated query capabilities, similar to
those provided by conventional relational databases. Because the query results
themselves are structured data, not just links to HTML pages, they can be im-
mediately processed, thus enabling a new class of applications based on the
Web of Data.” [9]
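The definition of W given earlier, W ⊆ ((U ∪ B) × U × (U ∪ B ∪ L)), can be made concrete with a small sketch. The class names `URI`, `BlankNode`, and `Literal`, the function `is_valid_triple`, and the predicate `lanl:name` are illustrative assumptions and are not taken from any particular RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    """A member of U, the set of all URIs."""
    value: str


@dataclass(frozen=True)
class BlankNode:
    """A member of B, the set of blank (anonymous) nodes."""
    label: str


@dataclass(frozen=True)
class Literal:
    """A member of L, the set of literals."""
    value: str
    datatype: str = "xsd:string"


def is_valid_triple(s, p, o) -> bool:
    """Return True if (s, p, o) lies in (U ∪ B) × U × (U ∪ B ∪ L)."""
    return (
        isinstance(s, (URI, BlankNode))
        and isinstance(p, URI)
        and isinstance(o, (URI, BlankNode, Literal))
    )


marko = URI("lanl:marko")
name = URI("lanl:name")  # hypothetical predicate, for illustration only

ok = is_valid_triple(marko, name, Literal("Marko"))
bad_subject = is_valid_triple(Literal("Marko"), name, marko)   # a literal cannot be a subject
bad_predicate = is_valid_triple(marko, BlankNode("b1"), marko)  # a predicate must be a URI
```

The check is purely syntactic: it says nothing about what a statement means, which is the sense in which RDF is treated here as a data model only.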

As a data model, RDF can conveniently represent commonly used data structures. From
the knowledge representation and reasoning perspective, RDF provides the means to make
assertions about the world and infer new statements given existing statements. From the
network/graph analysis perspective, RDF supports the representation of various network
data structures. From the programming and systems engineering perspective, RDF can be
used to encode objects, instructions, stacks, etc. The Web of Data, with its general-purpose
1 The URI is the parent class of both the URL and the Uniform Resource Name (URN) [56].


data model and supporting technological infrastructure, provides various computing models
a shared, global, distributed space. Unfortunately, this general-purpose, multi-model vision
was not the original intention of the designers of RDF. RDF was created for the domain of
knowledge representation and reasoning. Moreover, it caters to a particular monotonic sub-
set of this domain [29]. RDF is not generally understood as supporting different computing
models. However, if the Web of Data is to be used as just that, a “web of data,” then it is up
to the applications leveraging this data to interpret what that data means and what it can be
used for.
The set of all URIs forms an address space. It is analogous, in many ways, to the
address space that exists in the local memory of the physical machines that support the
representation of the Web of Data. With physical memory, information is contained at an
address. For a 64-bit machine, that information is a 64-bit word. That 64-bit word can
be interpreted as a literal primitive (e.g. a byte, an integer, a floating point value) or yet
another 64-bit address (i.e. a pointer). This is how address locations denote data and link
to each other, respectively. On the Web of Data, a URI is simply an address as it does
not contain content.2 It is through RDF that a URI address has content. For instance,
with RDF, a URI can reference a literal (i.e. xsd:byte, xsd:integer, xsd:float) or
another URI. Thus, RDF, as a data model, has many similarities to typical local memory.
However, the benefit of URIs and RDF is that they create an inherently distributed and
theoretically infinite space. Thus, the Web of Data can be interpreted as a large-scale,
distributed memory structure. What is encoded and processed in that memory structure
should not be dictated at the level of RDF, but instead dictated by the domains that leverage
this medium for various application scenarios. The Web of Data should be realized as an
application agnostic memory structure that supports a rich variety of uses: from Semantic
Web reasoning, to Giant Global Graph analysis, to Web of Objects execution.
The intention of this article is to create a conceptual splinter that separates RDF from
its legacy use as a logic language and demonstrate that it is more generally applicable when
realized as only a data model. In this way, RDF as the foundational standard for the Web
of Data makes the Web of Data useful to anyone wishing to represent information and
compute in a global, distributed space. Three specific interpretations of the Web of Data are
presented in order to elucidate the many ways in which the Web of Data is currently being
used. Moreover, within these different presentations, various standards and technologies
are discussed. These presentations are provided as summaries, not full descriptions. In
short, this article is more of a survey of a very large and multi-domained landscape. The
three interpretations that will be discussed are enumerated below.

1. The Web of Data as a knowledge base (see §2).

• The Semantic Web is an interpretation of the Web of Data.
• RDF is the means by which a model of a world is created.
• There are many types of logic: logics of truth and logics of thought.
• Scalable solutions exist for reasoning on the Web of Data.
2 This is not completely true. Given that a URL is a subtype of a URI, and a URL can “contain” a file, it is
possible for a URI to “contain” information.


2. The Web of Data as a multi-relational network (see §3).

• The Giant Global Graph is an interpretation of the Web of Data.3
• RDF is the means by which vertices are connected together by labeled edges.
• Single-relational network analysis algorithms can be applied to multi-relational
networks.
• Scalable solutions exist for network analysis on the Web of Data.
3. The Web of Data as an object repository (see §4).

• The Web of Objects is an interpretation of the Web of Data.
• RDF is the means by which objects are represented and related to other objects.
• An object’s representation can include both its fields and its methods.
• Scalable solutions exist for object-oriented computing on the Web of Data.

The landscape presented in this article is by no means complete and only provides a glimpse
into these different areas. Moreover, within each of these three presented interpretations,
applications and use-cases are not provided. What is provided is a presentation of com-
mon computing models that have been mapped to the Web of Data in order to take unique
advantage of the Web as a computing infrastructure.

2 A Distributed Knowledge Base


The Web of Data can be interpreted as a distributed knowledge base—a Semantic Web. A
knowledge base is composed of a set of statements about some “world.” These statements
are written in some language. Inference rules designed for that language can be used to
derive new statements from existing statements. In other words, inference rules can be used
to make explicit what is implicit. This process is called reasoning. The Semantic Web
initiative is primarily concerned with this interpretation of the Web of Data.

“For the Semantic Web to function, computers must have access to structured
collections of information and sets of inference rules that they can use to con-
duct automated reasoning.” [8]

Currently, the Semantic Web interpretation of the Web of Data forces strict semantics on
RDF. That is, RDF is not simply a data model, but a logic language. As a data model, it
specifies how a statement τ is constructed (i.e. τ ∈ ((U ∪ B) × U × (U ∪ B ∪ L))). As a logic
language, it specifies language constructs and semantics: a way of interpreting what
statements mean. Because RDF was developed in concert with requirements provided by
the knowledge representation and reasoning community, RDF and the Semantic Web have
been very strongly aligned for many years. This is perhaps the largest conceptual stronghold
that exists as various W3C documents make this point explicit.
3 The term “Giant Global Graph” was popularized by Tim Berners-Lee on his personal blog at this URL

http://dig.csail.mit.edu/breadcrumbs/node/215.


“RDF is an assertional logic, in which each triple expresses a simple proposition.
This imposes a fairly strict monotonic discipline on the language, so
that it cannot express closed-world assumptions, local default preferences, and
several other commonly used non-monotonic constructs.” [29]

RDF is monotonic in that any asserted statement τ ∈ W cannot be made “false” by future
assertions. In other words, the truth-value of a statement, once stated, does not change.
RDF makes use of the open-world assumption in that if a statement is not asserted, this
does not entail that it is “false.” The open-world assumption is contrasted to the closed-
world assumption found in many systems, where the lack of data is usually interpreted as
that data being “false.”
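The contrast between the two assumptions can be sketched as two ways of answering the same query. This is a minimal illustration, assuming a knowledge base modelled as a set of string triples; the function names `ask_closed_world` and `ask_open_world` are hypothetical, and `None` is used here to stand for "unknown".

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]


def ask_closed_world(kb: Set[Triple], triple: Triple) -> bool:
    """Closed world: anything not asserted is treated as false."""
    return triple in kb


def ask_open_world(kb: Set[Triple], triple: Triple) -> Optional[bool]:
    """Open world: anything not asserted is unknown (None), not false."""
    return True if triple in kb else None


kb = {("lanl:marko", "rdf:type", "lanl:Person")}
query = ("lanl:fluffy", "rdf:type", "lanl:Dog")

closed_answer = ask_closed_world(kb, query)  # absence is read as "false"
open_answer = ask_open_world(kb, query)      # absence is read as "unknown"

# Monotonicity: adding statements never retracts what was already true.
kb_later = kb | {query}
still_true = ask_open_world(kb_later, ("lanl:marko", "rdf:type", "lanl:Person"))
```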
From this semantic foundation, extended semantics for RDF have been defined. The
two most prevalent language extensions are the RDF Schema (RDFS) [14] and the Web
Ontology Language (OWL) [39]. It is perhaps this stack of standards that forms the most
common conception of what the Semantic Web is. However, if the Semantic Web is to be
just that, a “semantic web,” then there should be a way to represent other languages with
different semantics. If RDF is forced to be a monotonic, open-world language, then this im-
mediately pigeonholes what can be represented on the Semantic Web. If RDF is interpreted
strictly as a data model, devoid of semantics, then any other knowledge representation lan-
guage can be represented in RDF and thus, contribute to the Semantic Web. This section
will discuss three logic languages: RDFS, OWL, and the Non-Axiomatic Logic (NAL) [59].
RDFS and OWL are generally understood in the Semantic Web community as these are the
primary logic languages used. However, NAL is a multi-valent, non-monotonic language
that, if it is to be implemented on the Semantic Web, requires that RDF be interpreted as a data
model, not as a logic language. Moreover, NAL is an attractive language for the Semantic
Web because its reasoning process is inherently distributed, can handle conflicting and
inconsistent data, and was designed on the assumption of insufficient knowledge and computing
resources.

2.1 RDF Schema


RDFS is a simple language with a small set of inference rules [14]. In RDF, resources
(e.g. URIs and blank nodes) maintain properties (i.e. rdf:Property). These properties are
used to relate resources to other resources and literals. In RDFS, classes and properties can
be formally defined. Class definitions organize resources into abstract categories. Property
definitions specify the way in which these resources are related to one another. For example,
it is possible to state there exist people and dogs (i.e. classes) and people have dogs as pets
(i.e. a property). This is represented in RDFS in Figure 1.
RDFS inference rules are used to derive new statements given existing statements that
use the RDFS language. RDFS inference rules make use of statements with the following
URIs:
• rdfs:Class: denotes a class as opposed to an instance.
• rdf:Property: denotes a property/role.
• rdfs:domain: denotes what a property projects from.
• rdfs:range: denotes what a property projects to.


[Figure 1 (graph diagram): lanl:Person and lanl:Dog are each rdf:type rdfs:Class; lanl:pet is
rdf:type rdf:Property, with rdfs:domain lanl:Person and rdfs:range lanl:Dog.]

Figure 1: An RDFS ontology that states that a person has a dog as a pet.

• rdf:type: denotes that an instance is a type of class.
• rdfs:subClassOf: denotes that a class is a subclass of another.
• rdfs:subPropertyOf: denotes that a property is a sub-property of another.
• rdfs:Resource: denotes a generic resource.
• rdfs:Datatype: denotes a literal primitive class.
• rdfs:Literal: denotes a generic literal class.

RDFS supports two general types of inference: subsumption and realization. Subsumption
determines which classes are a subclass of another. The RDFS inference rules that support
subsumption are

(?x, rdf:type, rdfs:Class) =⇒ (?x, rdfs:subClassOf, rdfs:Resource),
(?x, rdf:type, rdfs:Datatype) =⇒ (?x, rdfs:subClassOf, rdfs:Literal),
(?x, rdfs:subPropertyOf, ?y) ∧ (?y, rdfs:subPropertyOf, ?z) =⇒ (?x, rdfs:subPropertyOf, ?z),

and finally,

(?x, rdfs:subClassOf, ?y) ∧ (?y, rdfs:subClassOf, ?z) =⇒ (?x, rdfs:subClassOf, ?z).

Thus, if both

(lanl:Chihuahua, rdfs:subClassOf, lanl:Dog)


(lanl:Dog, rdfs:subClassOf, lanl:Mammal)

are asserted, then it can be inferred that

(lanl:Chihuahua, rdfs:subClassOf, lanl:Mammal).
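The transitivity rule for rdfs:subClassOf can be sketched as a simple fixed-point loop over a set of string triples. This is a naive illustration rather than how a production reasoner is implemented, and the function name `subclass_closure` is an assumption.

```python
def subclass_closure(triples):
    """Apply the rdfs:subClassOf transitivity rule until no new triple appears."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        # Snapshot the current subclass edges as (subclass, superclass) pairs.
        edges = [(s, o) for (s, p, o) in inferred if p == "rdfs:subClassOf"]
        for (x, y) in edges:
            for (y2, z) in edges:
                if y == y2:
                    new = (x, "rdfs:subClassOf", z)
                    if new not in inferred:
                        inferred.add(new)
                        changed = True
    return inferred


asserted = {
    ("lanl:Chihuahua", "rdfs:subClassOf", "lanl:Dog"),
    ("lanl:Dog", "rdfs:subClassOf", "lanl:Mammal"),
}
closure = subclass_closure(asserted)
```

Running the rule on the two asserted statements adds exactly the one statement that relates lanl:Chihuahua to lanl:Mammal.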


Next, realization is used to determine if a resource is an instance of a class. The RDFS
inference rules that support realization are

(?x, ?y, ?z) =⇒ (?x, rdf:type, rdfs:Resource),
(?x, ?y, ?z) =⇒ (?y, rdf:type, rdf:Property),
(?x, ?y, ?z) =⇒ (?z, rdf:type, rdfs:Resource),
(?x, rdf:type, ?y) ∧ (?y, rdfs:subClassOf, ?z) =⇒ (?x, rdf:type, ?z),
(?w, rdfs:domain, ?x) ∧ (?y, ?w, ?z) =⇒ (?y, rdf:type, ?x),

and finally,

(?w, rdfs:range, ?x) ∧ (?y, ?w, ?z) =⇒ (?z, rdf:type, ?x).

Thus if, along with the statements in Figure 1,

(lanl:marko, lanl:pet, lanl:fluffy)

is asserted, then it can be inferred that

(lanl:marko, rdf:type, lanl:Person)


(lanl:fluffy, rdf:type, lanl:Dog).
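The realization example can be reproduced with a small sketch. It assumes the standard RDFS reading in which rdfs:domain types the subject of a statement and rdfs:range types its object, and it implements only those two rules plus type inheritance through rdfs:subClassOf. The function name `realize` is hypothetical, and the ontology triples are a transcription of Figure 1.

```python
def realize(triples):
    """Apply the domain, range, and subclass typing rules to a fixed point."""
    inferred = set(triples)
    while True:
        new = set()
        for (s, p, o) in inferred:
            for (w, q, x) in inferred:
                if q == "rdfs:domain" and w == p:
                    new.add((s, "rdf:type", x))   # subject is typed by the domain
                elif q == "rdfs:range" and w == p:
                    new.add((o, "rdf:type", x))   # object is typed by the range
                elif p == "rdf:type" and q == "rdfs:subClassOf" and w == o:
                    new.add((s, "rdf:type", x))   # instance of a class is an instance of its superclass
        if new <= inferred:
            return inferred
        inferred |= new


ontology = {
    ("lanl:Person", "rdf:type", "rdfs:Class"),
    ("lanl:Dog", "rdf:type", "rdfs:Class"),
    ("lanl:pet", "rdf:type", "rdf:Property"),
    ("lanl:pet", "rdfs:domain", "lanl:Person"),
    ("lanl:pet", "rdfs:range", "lanl:Dog"),
}
facts = {("lanl:marko", "lanl:pet", "lanl:fluffy")}
result = realize(ontology | facts)
```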

Given a knowledge base containing statements, these inference rules continue to exe-
cute until they no longer produce novel statements. It is the purpose of an RDFS reasoner
to efficiently execute these rules. There are two primary ways in which inference rules are
executed: at insert time and at query time. With respect to insert time, if a statement is
inserted (i.e. asserted) into the knowledge base, then the RDFS inference rules execute to
determine what is entailed by this new statement. These newly entailed statements are then
inserted in the knowledge base and the process continues. While this approach ensures fast
query times (as all entailments are guaranteed to exist at query time), it greatly increases the
number of statements generated. For instance, given a deep class hierarchy, if a resource
is a type of one of the leaf classes, then it is asserted to be a type of all the super classes
of that leaf class. In order to alleviate the issue of “statement bloat,” inference can instead
occur at query time. When a query is executed, the reasoner determines what other implicit
statements should be returned with the query. The benefits and drawbacks of each approach
are benchmarked, like much of computing, according to space vs. time.
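The space versus time trade-off can be seen in a toy comparison of the two strategies. Both classes below are illustrative assumptions (`InsertTimeStore`, `QueryTimeStore`, and `types_of` are not names from any real triple store), and only the type-inheritance rule is implemented. The resource lanl:fluffy is typed as a lanl:Chihuahua purely for the sake of the example.

```python
class InsertTimeStore:
    """Materializes entailed rdf:type statements whenever a triple is inserted."""

    def __init__(self):
        self.triples = set()

    def insert(self, triple):
        self.triples.add(triple)
        changed = True
        while changed:  # re-run the rule until nothing new is entailed
            changed = False
            for (x, p, y) in list(self.triples):
                if p != "rdf:type":
                    continue
                for (y2, q, z) in list(self.triples):
                    entailed = (x, "rdf:type", z)
                    if q == "rdfs:subClassOf" and y2 == y and entailed not in self.triples:
                        self.triples.add(entailed)
                        changed = True

    def types_of(self, resource):
        # Fast lookup: every entailment is already stored.
        return {o for (s, p, o) in self.triples if s == resource and p == "rdf:type"}


class QueryTimeStore:
    """Stores only asserted triples and walks the class hierarchy per query."""

    def __init__(self):
        self.triples = set()

    def insert(self, triple):
        self.triples.add(triple)

    def types_of(self, resource):
        found = set()
        frontier = [o for (s, p, o) in self.triples if s == resource and p == "rdf:type"]
        while frontier:
            cls = frontier.pop()
            if cls in found:
                continue
            found.add(cls)
            frontier.extend(
                o for (s, p, o) in self.triples if s == cls and p == "rdfs:subClassOf"
            )
        return found


statements = [
    ("lanl:Chihuahua", "rdfs:subClassOf", "lanl:Dog"),
    ("lanl:Dog", "rdfs:subClassOf", "lanl:Mammal"),
    ("lanl:fluffy", "rdf:type", "lanl:Chihuahua"),
]
eager, lazy = InsertTimeStore(), QueryTimeStore()
for statement in statements:
    eager.insert(statement)
    lazy.insert(statement)
```

Both stores give the same answer to `types_of("lanl:fluffy")`, but the insert-time store holds more triples while the query-time store does more work per query.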

2.2 Web Ontology Language


The Web Ontology Language (OWL) is a more complicated language which extends RDFS
by providing more expressive constructs for defining classes [39]. Moreover, beyond sub-
sumption and realization, OWL provides inference rules to determine class and instance
equivalence. There are many OWL specific inference rules. In order to give the flavor of
OWL, without going into the many specifics, this subsection will only present some examples
of the more commonly used constructs. For a thorough, in-depth review of OWL, please
refer to [36].


Perhaps the most widely used language URI in OWL is owl:Restriction. In RDFS,
a property can only have a domain and a range. In OWL, a class can apply the following
restrictions to a property:

• owl:cardinality
• owl:minCardinality
• owl:maxCardinality
• owl:hasValue
• owl:allValuesFrom
• owl:someValuesFrom

Cardinality restrictions are used to determine equivalence and inconsistency. For example,
in an OWL ontology, it is possible to state that a country can only have one president. This
is expressed in OWL as diagrammed in Figure 2. The _:1234 resource is a blank node that
denotes a restriction on the country class’s lanl:president property.

[Figure 2 (graph diagram): the nodes lanl:Country, lanl:Person, lanl:president, the blank node
_:1234, owl:Restriction, and the literal "1"^^xsd:int, connected by rdfs:domain, rdfs:range,
rdfs:subClassOf, owl:onProperty, and owl:maxCardinality edges.]

Figure 2: An OWL ontology that states that the president of a country is a person and there
can be at most one president for a country.

Next, if usa:barack and usa:obama are both asserted to be the president of the United
States with the statements

(usa:barack, lanl:president, usa:United_States)


(usa:obama, lanl:president, usa:United_States),

then it can be inferred (according to OWL inference rules) that these resources are equiv-
alent. This equivalence relationship is made possible because the maximum cardinality of
the lanl:president property of a country is 1. Therefore, if there are “two” people that
are president, then they must be the same person. This is made explicit when the reasoner
asserts the statements

(usa:barack, owl:sameAs, usa:obama)


(usa:obama, owl:sameAs, usa:barack).


Next, if lanl:herbertv is asserted to be different from usa:barack (which, from the previous
example, was inferred to be the same as usa:obama) and lanl:herbertv is also asserted to
be the president of the United States, then an inconsistency is detected. Thus, given the
ontology asserted in Figure 2 and the previous assertions, asserting

(lanl:herbertv, owl:differentFrom, usa:barack)


(lanl:herbertv, lanl:president, usa:United_States)

causes an inconsistency. This inconsistency is due to the fact that a country can only have
one president and lanl:herbertv is not usa:barack.
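The equivalence and inconsistency examples can be approximated with a toy check. This is a sketch under stated assumptions, not an OWL reasoner: the maximum cardinality of 1 is hard-coded, the function names are hypothetical, and resources are grouped by the value in the object position because the example statements in this section place the country there.

```python
from collections import defaultdict
from itertools import combinations


def infer_same_as(triples, prop):
    """Resources sharing one country via `prop` must be the same (max cardinality 1)."""
    holders = defaultdict(set)
    for (s, p, o) in triples:
        if p == prop:
            holders[o].add(s)
    same = set()
    for group in holders.values():
        for a, b in combinations(sorted(group), 2):
            same.add((a, "owl:sameAs", b))
            same.add((b, "owl:sameAs", a))
    return same


def find_inconsistencies(triples, same):
    """A pair is inconsistent if it is both owl:differentFrom and inferred owl:sameAs."""
    return {
        (s, o)
        for (s, p, o) in triples
        if p == "owl:differentFrom" and (s, "owl:sameAs", o) in same
    }


asserted = {
    ("usa:barack", "lanl:president", "usa:United_States"),
    ("usa:obama", "lanl:president", "usa:United_States"),
}
same = infer_same_as(asserted, "lanl:president")

extended = asserted | {
    ("lanl:herbertv", "owl:differentFrom", "usa:barack"),
    ("lanl:herbertv", "lanl:president", "usa:United_States"),
}
conflicts = find_inconsistencies(extended, infer_same_as(extended, "lanl:president"))
```

With only the first two statements, the sketch infers the two owl:sameAs statements and finds no conflict; adding the lanl:herbertv statements produces a pair that is both "same" and "different".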
Two other useful language URIs for properties in OWL are

• owl:SymmetricProperty
• owl:TransitiveProperty

In short, if the property y is symmetric and (x, y, z) is asserted, then (z, y, x) can be inferred. Next,
if the property y is transitive and both (w, y, x) and (x, y, z) are asserted, then (w, y, z) can be
inferred.
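These two property rules can be sketched as follows. The sketch assumes that properties are declared symmetric or transitive with rdf:type statements; the function name `infer_property_rules` and the property lanl:ancestor are hypothetical, and treating lanl:friend as symmetric is an assumption made only for this example.

```python
def infer_property_rules(triples):
    """Apply the symmetric and transitive property rules to a fixed point."""
    inferred = set(triples)
    symmetric = {s for (s, p, o) in inferred
                 if p == "rdf:type" and o == "owl:SymmetricProperty"}
    transitive = {s for (s, p, o) in inferred
                  if p == "rdf:type" and o == "owl:TransitiveProperty"}
    while True:
        new = set()
        for (x, y, z) in inferred:
            if y in symmetric:
                new.add((z, y, x))
            if y in transitive:
                for (x2, y2, z2) in inferred:
                    if y2 == y and x2 == z:
                        new.add((x, y, z2))
        if new <= inferred:
            return inferred
        inferred |= new


asserted = {
    ("lanl:friend", "rdf:type", "owl:SymmetricProperty"),
    ("lanl:ancestor", "rdf:type", "owl:TransitiveProperty"),
    ("ucla:apepe", "lanl:friend", "lanl:marko"),
    ("lanl:a", "lanl:ancestor", "lanl:b"),
    ("lanl:b", "lanl:ancestor", "lanl:c"),
}
result = infer_property_rules(asserted)
```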
There are various reasoners that exist for the OWL language. A popular OWL reasoner
is Pellet [44]. The purpose of Pellet is to execute the OWL rules given existing statements in
the knowledge base. For many large-scale knowledge base applications (i.e. triple- or quad-
stores), the application provides its own reasoner. Popular knowledge bases that make use
of the OWL language are OWLim [34], Oracle Spatial [3], and AllegroGraph [1]. It is
noted that due to the complexity (in terms of implementation and running times), many
knowledge base reasoners only execute subsets of the OWL language. For instance, Al-
legroGraph’s reasoner is called RDFS++ as it implements all of the RDFS rules and only
some of the OWL rules. However, it is also noted that RacerPro [26] can be used with
AllegroGraph to accomplish complete OWL reasoning. Finally, OpenSesame [16] can be
used for RDFS reasoning. Because OpenSesame is both a knowledge base and an API,
knowledge base applications that implement the OpenSesame interfaces can automatically
leverage the OpenSesame RDFS reasoner; though there may be speed issues as the reasoner
is not natively designed for that knowledge base application.

2.3 Non-Axiomatic Logic


If RDF is strictly considered a monotonic, open-world logic language, then the Semantic
Web is solidified as an open-world, monotonic logic environment. If reasoning is restricted
to the legacy semantics of RDF, then it will become more difficult to reason on the Semantic
Web as it grows in size and as more inconsistent knowledge is introduced. Given the number
of statements on the Semantic Web, computational hurdles arise when reasoning with
RDFS and OWL. With inconsistent statements on the Semantic Web, it is difficult to reason
as inconsistencies are not handled gracefully in RDFS or OWL. In general, sound and
complete reasoning will not be feasible as the Semantic Web continues to grow. In order
to meet these challenges, the Large Knowledge Collider project (LarKC) is focused on
developing a reasoning platform to handle incomplete and inconsistent data [21].


“Researchers have developed methods for reasoning in rather small, closed,
trustworthy, consistent, and static domains. They usually provide a small set
of axioms and facts. [OWL] reasoners can deal with 10^5 axioms (concept
definitions), but they scale poorly for large instance sets. [...] There is a deep
mismatch between reasoning on a Web scale and efficient reasoning algorithms
over restricted subsets of first-order logic. This is rooted in underlying assump-
tions of current systems for computational logic: small set of axioms, small
number of facts, completeness of inference, correctness of inference rules and
consistency, and static domains.” [21]

There is a need for practical methods to reason on the Semantic Web. One promising
logic was founded on the assumption of insufficient knowledge and resources. This logic
is called the Non-Axiomatic Logic (NAL) [58]. Unfortunately for the Semantic Web as
it is now, NAL breaks the assumptions of RDF semantics as NAL is multi-valent, non-
monotonic, and makes use of statements with a subject-predicate form. However, if RDF is
considered simply a data model, then it is possible to represent NAL statements and make
use of its efficient, distributed reasoning system. Again, for the massive-scale, inconsistent
world of the Semantic Web, sound and complete approaches are simply becoming more
unreasonable.

2.3.1 The Non-Axiomatic Language


There are currently 9 NAL languages. Each language, from NAL-0 to NAL-8, builds on
the constructs of the previous in order to support more complex statements. The following
list itemizes the various languages and what can be expressed in each.

• NAL-0: binary inheritance.
• NAL-1: inference rules.
• NAL-2: sets and variants of inheritance.
• NAL-3: intersections and differences.
• NAL-4: products, images, and ordinary relations.
• NAL-5: statement reification.
• NAL-6: variables.
• NAL-7: temporal statements.
• NAL-8: procedural statements.

Every NAL language is based on a simple inheritance relationship. For example, in
NAL-0, which assumes all statements are binary,

lanl:marko → lanl:Person

states that Marko (subject) inherits (→) from person (predicate). Given that all subjects and
predicates are joined by inheritance, there is no need to represent the copula when formally
representing a statement.4 If RDF, as a data model, is to represent NAL, then one possible
representation for the above statement is

(lanl:marko, lanl:1234, lanl:Person),

where lanl:1234 serves as a statement pointer. This pointer could be, for example, a
128-bit Universally Unique Identifier (UUID) [37]. It is important to maintain a statement
pointer as beyond NAL-0, statements are not simply “true” or “false.” A statement’s truth
is not defined by its existence, but instead by extra numeric metadata associated with the
statement. NAL maintains an

“experience-grounded semantics [where] the truth value of a judgment indicates
the degree to which the judgment is supported by the system’s experience.
Defined in this way, truth value is system-dependent and time-dependent.
Different systems may have conflicting opinions, due to their different experi-
ences.” [59]

A statement has a particular truth value associated with it that is defined as the frequency of
supporting evidence (denoted f ∈ [0, 1]) and the confidence in the stability of that frequency
(denoted c ∈ [0, 1]). For example, beyond NAL-0, the statement “Marko is a person” is not
“100% true” simply because it exists. Instead, every time that aspects of Marko coincide
with aspects of person, then f increases. Likewise, every time aspects of Marko do not
coincide with aspects of person, f decreases.5 Thus, NAL is non-monotonic as its statement
evidence can increase and decrease. To demonstrate f and c, the above “Marko is a person”
statement can be represented in NAL-1 as

lanl:marko → lanl:Person < 0.9, 0.8 >,

where, for the sake of this example, f = 0.9 and c = 0.8. In an RDF representation, this can
be denoted

(lanl:marko, lanl:1234, lanl:Person)
(lanl:1234, nal:frequency, "0.9"^^xsd:float)
(lanl:1234, nal:confidence, "0.8"^^xsd:float),

where the lanl:1234 serves as a statement pointer allowing NAL’s nal:frequency and
nal:confidence constructs to reference the inheritance statement.
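The statement-pointer pattern above can be sketched in a few lines of Python. This is only an illustration: the function name nal_statement_to_triples and the urn:uuid: pointer scheme are assumptions made here, and the triples are plain tuples rather than objects from any RDF library.

```python
import uuid


def nal_statement_to_triples(subject, predicate, frequency, confidence, pointer=None):
    """Return RDF-style triples for an inheritance statement with <f, c>."""
    # Hypothetical pointer scheme: mint a UUID-based URN when none is supplied.
    if pointer is None:
        pointer = "urn:uuid:" + str(uuid.uuid4())
    return [
        (subject, pointer, predicate),                  # the inheritance statement
        (pointer, "nal:frequency", float(frequency)),   # f attached to the pointer
        (pointer, "nal:confidence", float(confidence)), # c attached to the pointer
    ]


triples = nal_statement_to_triples(
    "lanl:marko", "lanl:Person", 0.9, 0.8, pointer="lanl:1234"
)
```

Because the frequency and confidence hang off the pointer rather than off the subject or predicate, the same subject and predicate can take part in other statements with different truth values.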
NAL-4 supports statements that are more analogous to the subject-predicate-object form
of RDF. If Marko is denoted by the URI lanl:marko, Alberto by the URI ucla:apepe,
and friendship by the URI lanl:friend, then the statement “Alberto is a friend
of Marko” is denoted in RDF as
4 This is not completely true as different types of inheritance are defined in NAL-2 such as instance ◦→,
property →◦, and instance-property ◦→◦ inheritance. However, these 3 types of inheritance can also be rep-
resented using the basic → inheritance. Moreover, the RDF representation presented can support the explicit
representation of other inheritance relationships if desired.
5 The idea of “aspects coinciding” is formally defined in NAL, but is not discussed here for the sake of
brevity. In short, a statement’s f is modulated by both the system’s “external” experiences and “internal”
reasoning; both create new evidence. See [61] for an in-depth explanation.

(ucla:apepe, lanl:friend, lanl:marko).

In NAL-4 this is represented as

(ucla:apepe × lanl:marko) → lanl:friend < 0.8, 0.5 >,

where f = 0.8 and c = 0.5 are provided for the sake of the example. This statement states
that the set (ucla:apepe, lanl:marko) inherits the property of friendship to a certain de-
gree and stability as defined by f and c, respectively. The RDF representation of this NAL-4
construct can be denoted

(lanl:2345, nal:_1, ucla:apepe)
(lanl:2345, nal:_2, lanl:marko)
(lanl:2345, lanl:3456, lanl:friend)
(lanl:3456, nal:frequency, "0.8"^^xsd:float)
(lanl:3456, nal:confidence, "0.5"^^xsd:float).

In the triples above, lanl:2345 serves as a set and thus, this set inherits from friendship.
That is, Alberto and Marko inherit the property of friendship.

2.3.2 The Non-Axiomatic Reasoner


“In traditional logic, a ‘valid’ or ‘sound’ inference rule is one that never derives
a false conclusion (that is, it will be contradicted by the future experience of the
system) from true premises [19]. [In NAL], a ‘valid conclusion’ is one that is
most consistent with the evidence in the past experience, and a ‘valid inference
rule’ is one whose conclusions are supported by the premises used to derive
them.” [61]

Given that NAL is predicated on insufficient knowledge, there is no guarantee that reasoning
will produce “true” knowledge with respect to the world that the statements are modeling as
only a subset of that world is ever known. However, this does not mean that NAL reasoning
is random; instead, it is consistent with respect to what the system knows. In other words,

“the traditional definition of validity of inference rules—that is to get true
conclusions from true premises—no longer makes sense in [NAL]. With insufficient
knowledge and resources, even if the premises are true with respect to the
past experience of the system there is no way to get infallible predictions about
the future experience of the system even though the premises themselves may
be challenged by new evidence.” [59]

The inference rules in NAL are all syllogistic in that they are based on statements sharing
similar terms (i.e. URIs) [45]. The typical inference rule in NAL has the following form

$$(\tau_1 \langle f_1, c_1 \rangle \wedge \tau_2 \langle f_2, c_2 \rangle) \vdash \tau_3 \langle f_3, c_3 \rangle,$$

where τ1 and τ2 are statements that share a common term. There are four standard syllogisms
used in NAL reasoning. These are enumerated below.

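1. deduction: $(x \to y \langle f_1, c_1 \rangle \wedge y \to z \langle f_2, c_2 \rangle) \vdash x \to z \langle f_3, c_3 \rangle$.
2. induction: $(x \to y \langle f_1, c_1 \rangle \wedge z \to y \langle f_2, c_2 \rangle) \vdash x \to z \langle f_3, c_3 \rangle$.
3. abduction: $(x \to y \langle f_1, c_1 \rangle \wedge x \to z \langle f_2, c_2 \rangle) \vdash y \to z \langle f_3, c_3 \rangle$.
4. exemplification: $(x \to y \langle f_1, c_1 \rangle \wedge y \to z \langle f_2, c_2 \rangle) \vdash z \to x \langle f_3, c_3 \rangle$.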

Two other important inference rules not discussed here are choice (i.e. what to do with contradictory
evidence) and revision (i.e. how to update existing evidence with new evidence).
Each of the inference rules has a different formula for deriving $\langle f_3, c_3 \rangle$ from $\langle f_1, c_1 \rangle$
and $\langle f_2, c_2 \rangle$.6 These formulas are enumerated below.

1. deduction: $f_3 = f_1 f_2$ and $c_3 = f_1 c_1 f_2 c_2$.
2. induction: $f_3 = f_1$ and $c_3 = \frac{f_1 c_1 c_2}{f_1 c_1 c_2 + k}$.
3. abduction: $f_3 = f_2$ and $c_3 = \frac{f_2 c_1 c_2}{f_2 c_1 c_2 + k}$.
4. exemplification: $f_3 = 1$ and $c_3 = \frac{f_1 c_1 f_2 c_2}{f_1 c_1 f_2 c_2 + k}$.

The variable $k \in \mathbb{N}^+$ is a system-specific parameter used in the determination of confidence.


To demonstrate deduction, suppose the two statements

lanl:marko → lanl:Person < 0.5, 0.5 >
lanl:Person → lanl:Mammal < 0.9, 0.9 >.

Given these two statements and the inference rule for deduction, it is possible to infer

lanl:marko → lanl:Mammal < 0.45, 0.2025 >.

Next suppose the statement

lanl:Dog → lanl:Mammal < 0.9, 0.9 >.

Given the existing statements, induction, and k = 1, it is possible to infer

lanl:marko → lanl:Dog < 0.45, 0.0758 >.

Thus, while the system is not confident, according to all that the system knows, Marko is
a type of dog. This is because there are aspects of Marko that coincide with aspects of
dog—they are both mammals. However, future evidence, such as fur, four legs, sloppy
tongue, etc. will be further evidence that Marko and dog do not coincide and thus, the f of
lanl:marko → lanl:Dog will decrease.
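The arithmetic behind the two inferred statements can be checked with a short sketch. It assumes the deduction and induction truth-value formulas as they are read in this chapter; the function names are illustrative and do not come from any NAL implementation.

```python
def deduction(f1, c1, f2, c2):
    """Truth value of x -> z from x -> y <f1, c1> and y -> z <f2, c2>."""
    return f1 * f2, f1 * c1 * f2 * c2


def induction(f1, c1, f2, c2, k=1):
    """Truth value of x -> z from x -> y <f1, c1> and z -> y <f2, c2>."""
    # Per the chapter's rule, f2 does not enter the induction truth value.
    evidence = f1 * c1 * c2
    return f1, evidence / (evidence + k)


# marko -> Person <0.5, 0.5> and Person -> Mammal <0.9, 0.9>
f_mammal, c_mammal = deduction(0.5, 0.5, 0.9, 0.9)

# marko -> Mammal (just derived) and Dog -> Mammal <0.9, 0.9>, with k = 1
f_dog, c_dog = induction(f_mammal, c_mammal, 0.9, 0.9, k=1)
```

Rounded to four decimal places, the results are < 0.45, 0.2025 > for the deduction and < 0.45, 0.0758 > for the induction, matching the statements above. Note how the additive k in the denominator keeps the confidence of an inductive conclusion well below that of its premises.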
The significance of NAL reasoning is that all inference is based on local areas of the
knowledge base. That is, all inference requires only two degrees of separation from the
resource being inferred on. Moreover, reasoning is constrained by available computational
resources, not by a requirement for logical completeness. Because of these two proper-
ties, the implemented reasoning system is inherently distributed and when computational
resources are not available, the system does not break; it simply yields fewer conclusions. For
6 Note that when the entailed statement already exists, its < f3 ,c3 > component is revised according to the
revision rule. Revision is not discussed in this article.

the Semantic Web, it may be best to adopt a logic that is better able to take advantage of
its size and inconsistency. With a reasoner that is distributable and functions under variable
computational resources, and makes use of a language that is non-monotonic and supports
degrees of “truth,” NAL may serve as a more practical logic for the Semantic Web. How-
ever, this is only possible if the RDF data model is separated from the RDF semantics and
NAL’s subject-predicate form can be legally represented.
There are many other language constructs in NAL that are not discussed here. For
an in-depth review of NAL, please refer to the de facto reference at [61]. Moreover, for a
fine discussion of the difference between logics of truth (i.e. mathematical logic, such as modern
predicate logic) and logics of thought (i.e. cognitive logic, such as NAL), see [60].

3 A Distributed Multi-Relational Network


The Web of Data can be interpreted as a distributed multi-relational network: a Giant
Global Graph.7 A multi-relational network denotes a set of vertices (i.e. nodes) that are
connected to one another by a set of labeled edges (i.e. typed links).8 In the graph and network
theory community, the multi-relational network is less prevalent. The more commonly
used network data structure is the single-relational network, where all edges are of the same
type and thus, there is no need to label edges. Unfortunately, most network algorithms have
been developed for the single-relational network data structure. However, it is possible to
port all known single-relational network algorithms over to the multi-relational domain. In
doing so, it is possible to leverage these algorithms on the Giant Global Graph. The purpose
of this section is to

1. formalize the single-relational network (see §3.1),
2. formalize the multi-relational network (see §3.2),
3. present a collection of common single-relational network algorithms (see §3.3), and
then finally,
4. present a method for porting all known single-relational network algorithms over to
the multi-relational domain (see §3.4).

Network algorithms are useful in many respects and have been generally applied to
analysis and querying. If the network models an aspect of the world, then network analysis
techniques can be used to elucidate general structural properties of the network and thus, the
world. Moreover, network query algorithms have been developed for searching and ranking.
When these algorithms can be effectively and efficiently applied to the Giant Global Graph,
the Giant Global Graph can serve as a medium for network analysis and query.
7 The term “graph” is used in the mathematical domain of graph theory and the term “network” is used
primarily in the physics and computer science domain of network theory. In this chapter, both terms are used
depending on their source. Moreover, with regard to this article, these two terms are deemed synonymous with
each other.
8 A multi-relational network is also known as a directed labeled graph or semantic network.

3.1 Single-Relational Networks


The single-relational network represents a set of vertices that are related to one another by
a homogeneous set of edges. For instance, in a single-relational coauthorship network, all
vertices denote authors and all edges denote a coauthoring relationship. Coauthorship exists
between two authors if they have both written an article together. Moreover, coauthorship is
symmetric: if person x coauthored with person y, then person y has coauthored with person
x. In general, these types of symmetric networks are known as undirected, single-relational
networks and can be denoted

G′ = (V, E ⊆ {V × V}),

where V is the set of vertices and E is the set of undirected edges. The edge {i, j} ∈ E
states that vertex i and j are connected to each other. Figure 3 diagrams an undirected
coauthorship edge between two author vertices.

[Figure 3 diagram: lanl:marko -- lanl:coauthor -- rpi:josh]

Figure 3: An undirected edge between two authors in an undirected single-relational network.

Single-relational networks can also be directed. For instance, in a single-relational
citation network, the set of vertices denote articles and the set of edges denote citations
between the articles. In this scenario, the edges are not symmetric as one article citing
another does not imply that the cited article cites the citing article. Directed single-relational
networks can be denoted

G = (V, E ⊆ (V × V)),

where (i, j) ∈ E states that vertex i is connected to vertex j. Figure 4 diagrams a directed
citation edge between two article vertices.

[Figure 4 diagram: aaai:evidence -- lanl:cites --> joi:path_algebra]

Figure 4: A directed edge between two articles in a directed single-relational network.

Both undirected and directed single-relational networks have a convenient matrix representation.
This matrix is known as an adjacency matrix and is denoted

$$A_{i,j} = \begin{cases} 1 & \text{if } (i,j) \in E \\ 0 & \text{otherwise,} \end{cases}$$

where $A \in \{0,1\}^{|V| \times |V|}$. If $A_{i,j} = 1$, then vertex i is adjacent (i.e. connected) to vertex j. It
is important to note that there exists an information-preserving, bijective mapping between
the set-theoretic and matrix representations of a network. Throughout the remainder of this
section, depending on the algorithm presented, one or the other form of a network is used.
Finally, note that the remainder of this section is primarily concerned with directed networks
as a directed network can model an undirected network. In other words, the undirected edge
{i, j} can be represented as the two directed edges (i, j) and (j, i).
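The mapping between the set-theoretic and matrix forms can be illustrated with a minimal sketch. The helper names adjacency_matrix and edge_set are assumptions made for this example, and the matrix is a plain list of lists rather than an array from a numerical library.

```python
def adjacency_matrix(vertices, edges, directed=True):
    """Build A from an edge set; vertices fixes the row and column order."""
    index = {v: n for n, v in enumerate(vertices)}
    A = [[0] * len(vertices) for _ in vertices]
    for i, j in edges:
        A[index[i]][index[j]] = 1
        if not directed:
            # An undirected edge {i, j} becomes the directed edges (i, j) and (j, i).
            A[index[j]][index[i]] = 1
    return A


def edge_set(vertices, A):
    """Recover the directed edge set from A (the inverse mapping)."""
    return {
        (vertices[r], vertices[c])
        for r, row in enumerate(A)
        for c, value in enumerate(row)
        if value == 1
    }


authors = ["lanl:marko", "rpi:josh"]
coauthor = adjacency_matrix(authors, {("lanl:marko", "rpi:josh")}, directed=False)

articles = ["aaai:evidence", "joi:path_algebra"]
cites = adjacency_matrix(articles, {("aaai:evidence", "joi:path_algebra")})
```

The undirected coauthorship matrix is symmetric, while the directed citation matrix is not, which is the matrix counterpart of the distinction drawn between Figures 3 and 4.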

3.2 Multi-Relational Networks


The multi-relational network is a more complicated structure that can be used to represent
multiple types of relationships between vertices. For instance, it is possible to not only
represent researchers, but also their articles in a network of edges that represent authorship,
citation, etc. A directed multi-relational network can be denoted

M = (V, E = {E1, E2, . . ., Em ⊆ (V × V)}),

where E is a family of edge sets such that any Ek ∈ E : 1 ≤ k ≤ m is a set of edges with
a particular meaning (e.g. authorship, citation, etc.). A multi-relational network can be
interpreted as a collection of single-relational networks that all share the same vertex set.
Another representation of a multi-relational network is similar to the one commonly em-
ployed to define an RDF graph. This representation is denoted

M′ ⊆ (V × Ω × V),

where Ω is the set of edge labels. In this representation if i, j ∈ V and k ∈ Ω, then the triple
(i, k, j) states that vertex i is connected to vertex j by the relationship type k.
Figure 5 diagrams multiple relationship types between scholars and articles in a multi-
relational network.

[Figure 5 diagram: the scholar vertices lanl:marko and rpi:josh and the article vertices aaai:evidence and joi:path_algebra, connected by three lanl:authored edges and one lanl:cites edge (aaai:evidence lanl:cites joi:path_algebra).]

Figure 5: Multiple types of edges between articles and scholars in a directed multi-relational
network.

Like the single-relational network and its accompanying adjacency matrix, the multi-relational
network has a convenient 3-way tensor representation. This 3-way tensor is denoted

$$A^k_{i,j} = \begin{cases} 1 & \text{if } (i,j) \in E_k : 1 \leq k \leq m \\ 0 & \text{otherwise.} \end{cases}$$

This representation can be interpreted as a collection of adjacency matrix “slices,” where
each slice is a particular edge type. In other words, if $A^k_{i,j} = 1$, then (i, k, j) ∈ M′. Like the
relationship between the set-theoretic and matrix forms of a single-relational network, M,
M′, and A can all be mapped onto one another without loss of information. In this article,
each representation will be used depending on the usefulness of its form with respect to the
idea being expressed.
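As a rough illustration of moving between the triple form M′ and the tensor form, the following sketch stores one adjacency matrix slice per edge label. The function names are hypothetical, and the particular lanl:authored edges in the example are assumed for illustration only; they are not read off Figure 5.

```python
def tensor_slices(vertices, triples):
    """Map a triple set to a dict holding one adjacency matrix per edge label."""
    index = {v: n for n, v in enumerate(vertices)}
    slices = {}
    for i, k, j in triples:
        if k not in slices:
            slices[k] = [[0] * len(vertices) for _ in vertices]
        slices[k][index[i]][index[j]] = 1
    return slices


def triples_from_slices(vertices, slices):
    """Inverse mapping: recover the triple set from the labeled slices."""
    return {
        (vertices[r], k, vertices[c])
        for k, A in slices.items()
        for r, row in enumerate(A)
        for c, value in enumerate(row)
        if value == 1
    }


V = ["lanl:marko", "rpi:josh", "aaai:evidence", "joi:path_algebra"]
M_prime = {
    ("lanl:marko", "lanl:authored", "aaai:evidence"),    # assumed edge
    ("rpi:josh", "lanl:authored", "joi:path_algebra"),   # assumed edge
    ("aaai:evidence", "lanl:cites", "joi:path_algebra"),
}
A = tensor_slices(V, M_prime)
```

Each value in the returned dict is an ordinary single-relational adjacency matrix over the shared vertex set, which matches the interpretation of a multi-relational network as a collection of single-relational networks.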
On the Giant Global Graph, RDF serves as the specification for graphing resources.
Vertices are denoted by URIs, blank nodes, and literals and the edge labels are denoted by
URIs. Multi-relational network algorithms can be used to exploit the Giant Global Graph.
However, there are few algorithms dedicated specifically to multi-relational networks. Most
network algorithms have been designed for single-relational networks. The remainder of
this section will discuss some of the more popular single-relational network algorithms
and then present a method for porting these algorithms (as well as other single-relational
network algorithms) over to the multi-relational domain. This section concludes with a
distributable and scalable method for executing network algorithms on the Giant Global
Graph.

3.3 Single-Relational Network Algorithms


The design and study of graph and network algorithms is conducted primarily by mathe-
maticians (graph theory) [17], physicists and computer scientists (network theory) [12], and
social scientists (social network analysis) [62]. Many of the algorithms developed in these
domains can be used together and form the general-purpose “toolkit” for researchers do-
ing network analysis and for engineers developing network-based services. The following
itemized list presents a collection of the single-relational network algorithms that will be
reviewed in this subsection. As denoted with its name in the itemization, each algorithm
can be used to identify properties of vertices, paths, or the network. Vertex metrics assign
a real value to a vertex. Path metrics assign a real value to a path (i.e. an ordered set of
vertices). And finally, network metrics assign a real value to the network as a whole.

• shortest path: path metric (§3.3.1)
• eccentricity: vertex metric (§3.3.2)
• radius: network metric (§3.3.2)
• diameter: network metric (§3.3.2)
• closeness: vertex metric (§3.3.3)
• betweenness: vertex metric (§3.3.3)
• stationary probability distribution: vertex metric (§3.3.4)
• PageRank: vertex metric (§3.3.5)
• spreading activation: vertex metric (§3.3.6)
• assortative mixing: network metric (§3.3.7)

A simple intuitive approach to determine the appropriate algorithm to use for an appli-
cation scenario is presented in [35]. In short, various factors come into play when selecting
a network algorithm such as the topological features of the network (e.g. its connectivity
and its size), the computational requirements of the algorithms (e.g. its complexity), the
type of results that are desired (e.g. personalized or global), and the meaning of the algo-
rithm’s result (e.g. geodesic-based, flow-based, etc.). The following sections will point out
which features describe the presented algorithms.

3.3.1 Shortest Path


The shortest path metric is the foundation of all other geodesic metrics. The other geodesic
metrics discussed are eccentricity, radius, diameter, closeness, and betweenness. A shortest
path is defined for any two vertices i, j ∈ V such that the sink vertex j is reachable from
the source vertex i. If j is unreachable from i, then the shortest path between i and j is
undefined. Thus, for geodesic metrics, it is important to only consider strongly connected
networks, or strongly connected components of a network.9 The shortest path between any
two vertices i and j in a single-relational network is the smallest of the set of all paths
between i and j. If ρ : V × V → Q is a function that takes two vertices and returns the set of
all paths Q where for any q ∈ Q, q = (i, . . ., j), then the length of the shortest path between i
and j is $\min\left(\bigcup_{q \in Q} |q| - 1\right)$, where min returns the smallest value of its domain. The shortest
path function is denoted s : V × V → N with the function rule

$$s(i,j) = \min\left(\bigcup_{q \in \rho(i,j)} |q| - 1\right).$$

There are many algorithms to determine the shortest path between vertices in a net-
work. Dijkstra’s method is perhaps the most popular as it is the typical algorithm taught in
introductory algorithms classes [20]. However, if the network is unweighted, then a simple
breadth-first search is a more efficient way to determine the shortest path between i and
j. Starting from i a “fan-out” search for j is executed where at each time step, adjacent
vertices are traversed to. The first path that reaches j is the shortest path from i to j.
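The breadth-first "fan-out" described above can be sketched as follows for an unweighted directed network. The network is assumed to be given as a dict that maps each vertex to its outgoing neighbors; the function name and the small example network are illustrative.

```python
from collections import deque


def shortest_path_length(adj, source, sink):
    """Return s(source, sink) counted in edges, or None if sink is unreachable."""
    if source == sink:
        return 0
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        vertex, dist = frontier.popleft()
        for nxt in adj.get(vertex, ()):
            if nxt == sink:
                # The first time the sink is reached, the path is a shortest one.
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None


adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["a"], "e": []}
```

Here shortest_path_length(adj, "a", "d") is 2, while the call for the isolated vertex "e" returns None, which mirrors the statement that the shortest path is undefined when the sink is unreachable.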

3.3.2 Eccentricity, Radius, and Diameter


The radius and diameter of a network require the determination of the eccentricity of every
vertex in V. The eccentricity of a vertex i is the largest shortest path between i and all other
vertices in V such that the eccentricity function e : V → N has the rule

$$e(i) = \max\left(\bigcup_{j \in V} s(i,j) : i \neq j\right),$$

where max returns the largest value of its domain [28]. In terms of algorithmic complexity,
the eccentricity metric calculates |V| − 1 shortest paths for a particular vertex.
The radius of the network is the minimum eccentricity of all vertices in V [62]. The
function r : G → N has the rule

$$r(G) = \min\left(\bigcup_{i \in V} e(i)\right).$$

Finally, the diameter of a network is the maximum eccentricity of the vertices in V [62].
The function d : G → N has the rule

$$d(G) = \max\left(\bigcup_{i \in V} e(i)\right).$$
9 Do not confuse a strongly connected network with a fully connected network. A fully connected network
is where every vertex is connected to every other vertex directly. A strongly connected network is where every
vertex is connected to every other vertex indirectly (i.e. there exists a path from any i to any j).

Both radius and diameter require $|V|^2 - |V|$ shortest path calculations.
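A minimal sketch of eccentricity, radius, and diameter for an unweighted directed network is given below. It assumes the network is a dict that lists every vertex as a key with its outgoing neighbors, that it has at least two vertices, and that it is strongly connected; a ValueError is raised otherwise. All names are illustrative.

```python
from collections import deque


def distances_from(adj, source):
    """Breadth-first shortest path lengths from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def eccentricity(adj, i):
    """Largest shortest path length from i to any other vertex."""
    dist = distances_from(adj, i)
    if len(dist) < len(adj):
        raise ValueError("network is not strongly connected")
    return max(d for j, d in dist.items() if j != i)


def radius(adj):
    return min(eccentricity(adj, i) for i in adj)


def diameter(adj):
    return max(eccentricity(adj, i) for i in adj)


adj = {"a": ["b", "c"], "b": ["c"], "c": ["d"], "d": ["a"]}
```

One breadth-first pass per vertex yields all |V| − 1 shortest path lengths from that vertex at once, so the radius and diameter need |V| passes rather than one search per vertex pair.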


The diameter of a network is, in some cases, telling of the growth properties of the
network (i.e. the general principle by which new vertices and edges are added). For instance,
if the network is randomly generated (edges are randomly assigned between vertices), then
the diameter of the network is much larger than if the network is generated according to
a more “natural growth” function such as a preferential attachment model, where highly
connected vertices tend to get more edges (colloquially captured by the phrase “the rich
get richer”) [11]. Thus, in general, natural networks tend to have a much smaller diameter.
This was evinced by an empirical study of the World Wide Web citation network, where the
diameter of the network was concluded to be only 19 [2].

3.3.3 Closeness and Betweenness Centrality


Closeness and betweenness centrality are popular network metrics for determining the “centralness”
of a vertex and have been used in sociology [62], bioinformatics [43], and bibliometrics
[10]. Centrality is a loose term that describes the intuitive notion that some vertices
are more connected/integral/central/influential within the network than others. Closeness
centrality is one such centrality measure and is defined in terms of the shortest paths between
some vertex i and all the other vertices in V [5, 38, 53]: it is the reciprocal of the summed
shortest path lengths, so a vertex with shorter paths to the rest of the network scores higher.
The function c : V → R has the rule

$$c(i) = \frac{1}{\sum_{j \in V} s(i,j)}.$$
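The closeness rule can be sketched directly from its definition. The example assumes an unweighted, strongly connected directed network with at least two vertices, stored as a dict of outgoing neighbors; the helper names are illustrative.

```python
from collections import deque


def distances_from(adj, source):
    """Breadth-first shortest path lengths from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness(adj, i):
    """c(i): reciprocal of the summed shortest path lengths from i."""
    dist = distances_from(adj, i)
    if len(dist) < len(adj):
        raise ValueError("network is not strongly connected")
    return 1.0 / sum(d for j, d in dist.items() if j != i)


adj = {"a": ["b", "c"], "b": ["c"], "c": ["d"], "d": ["a"]}
scores = {v: closeness(adj, v) for v in adj}
```

In this small network vertex "a" reaches the other vertices in 1, 1, and 2 steps, giving a closeness of 0.25, the highest of the four vertices.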

Betweenness centrality is defined for a vertex in V [13, 23]. The betweenness of i ∈ V
is the number of shortest paths that exist between all vertices j, k ∈ V that have i in their
path divided by the total number of shortest paths between j and k, where i ≠ j ≠ k. If
σ : V × V → Q is the function that returns the set of shortest paths between any two vertices
j and k such that

$$\sigma(j,k) = \bigcup_{q \in \rho(j,k)} q : |q| - 1 = s(j,k)$$

and σ̂ : V × V × V → Q is the set of shortest paths between two vertices j and k that have i
in the path, where

$$\hat{\sigma}(j,k,i) = \bigcup_{q \in \rho(j,k)} q : (|q| - 1 = s(j,k) \wedge i \in q),$$

then the betweenness function b : V → R has the rule

$$b(i) = \sum_{i \neq j \neq k \in V} \frac{|\hat{\sigma}(j,k,i)|}{|\sigma(j,k)|}.$$
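A straightforward way to evaluate the betweenness rule on a small unweighted network is to count shortest paths with a breadth-first search from every vertex. The sketch below is not an optimized algorithm and the function names are assumptions; it relies on the fact that the number of shortest paths from j to k passing through i is the number from j to i times the number from i to k whenever s(j, i) + s(i, k) = s(j, k).

```python
from collections import deque


def bfs_counts(adj, source):
    """Return (dist, sigma): shortest path lengths and shortest path counts."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:
                # Every shortest path to v extends to a shortest path to w.
                sigma[w] += sigma[v]
    return dist, sigma


def betweenness(adj, i):
    """b(i): summed fraction of shortest j -> k paths that pass through i."""
    info = {v: bfs_counts(adj, v) for v in adj}
    dist_i, sigma_i = info[i]
    total = 0.0
    for j in adj:
        if j == i:
            continue
        dist_j, sigma_j = info[j]
        if i not in dist_j:
            continue
        for k in adj:
            if k == i or k == j or k not in dist_j or k not in dist_i:
                continue
            if dist_j[i] + dist_i[k] == dist_j[k]:
                total += sigma_j[i] * sigma_i[k] / sigma_j[k]
    return total


adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["a"]}
```

In this example there are two shortest paths from "a" to "d", one through "b" and one through "c", so each of those vertices receives a contribution of 0.5 from that pair, while every cycle back to "a" must pass through "d".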

There are many variations to the standard representations presented above. For a more
in depth review on these metrics, see [62] and [12]. Finally, centrality is not restricted only
to geodesic metrics. The next three algorithms are centrality metrics based on random walks
or “flows” through a network.

3.3.4 Stationary Probability Distribution


A Markov chain is used to model the states of a system and the probability of transition
between states [27]. A Markov chain is best represented by a probabilistic, single-relational
network where the states are vertices, the edges are transitions, and the edge weights denote
the probability of transition. A probabilistic, single-relational network can be denoted

G″ = (V, E ⊆ (V × V), ω : E → [0, 1])

where ω is a function that maps each edge in E to a probability value. The outgoing edges
of any vertex form a probability distribution that sums to 1.0. In this section, all outgoing
probabilities from a particular vertex are assumed to be equal. Thus, ∀ j, k ∈ Γ+ (i) : ω(i, j) =
ω(i, k), where Γ+ (i) ⊆ V is the set of vertices adjacent to i.
A random walker is a useful way to visualize the transitioning between vertices. A
random walker is a discrete element that exists at a particular i ∈ V at a particular point in
time t ∈ N+ . If the vertex at time t is i then the next vertex at time t + 1 will be one of the
vertices adjacent to i in Γ+ (i). In this manner, the random walker makes a probabilistic jump
to a new vertex at every time step. As time t goes to infinity a unique stationary probability
distribution emerges if and only if the network is aperiodic and strongly connected. The
stationary probability distribution expresses the probability that the random walker will be
at a particular vertex in the network. In matrix form, the stationary probability distribution is
represented by a row vector $\pi \in [0,1]^{|V|}$, where $\pi_i$ is the probability that the random walker
is at i and $\sum_{i \in V} \pi_i = 1.0$. If the network is represented by the row-stochastic adjacency
matrix

$$A_{i,j} = \begin{cases} \frac{1}{|\Gamma^+(i)|} & \text{if } (i,j) \in E \\ 0 & \text{otherwise} \end{cases}$$

and if the network is aperiodic and strongly connected, then there exists some π such that
πA = π. Thus, the stationary probability distribution is the primary eigenvector of A. The
primary eigenvector of a network is useful in ranking its vertices as those vertices that are
more central are those that have a higher probability in π. Thus, intuitively, where the
random walker is likely to be is an indicator of how central the vertex is. However, if the
network is not strongly connected (very likely for most natural networks), then a unique
stationary probability distribution is not guaranteed to exist.
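One common way to approximate π is power iteration: start from a uniform distribution and repeatedly multiply by A until the vector stops changing. The sketch below assumes a small, aperiodic, strongly connected network in which every vertex has at least one outgoing edge; the function names and the fixed step count are illustrative choices rather than part of the chapter.

```python
def row_stochastic(vertices, edges):
    """Build A with A[i][j] = 1 / |out-neighbors of i| for every edge (i, j)."""
    index = {v: n for n, v in enumerate(vertices)}
    out = {v: [j for (i, j) in edges if i == v] for v in vertices}
    n = len(vertices)
    A = [[0.0] * n for _ in range(n)]
    for v, targets in out.items():
        for w in targets:
            A[index[v]][index[w]] = 1.0 / len(targets)
    return A


def stationary_distribution(A, steps=1000):
    """Approximate the row vector pi with pi A = pi by repeated multiplication."""
    n = len(A)
    pi = [1.0 / n] * n
    for _ in range(steps):
        pi = [sum(pi[i] * A[i][j] for i in range(n)) for j in range(n)]
    return pi


vertices = ["a", "b", "c"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
A = row_stochastic(vertices, edges)
pi = stationary_distribution(A)
```

For this network the walker settles at roughly 0.4 for "a", 0.2 for "b", and 0.4 for "c". The cycles of length 2 and 3 through "a" make the network aperiodic, which is why the iteration converges instead of oscillating.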

3.3.5 PageRank
PageRank makes use of the random walker model previously presented [15]. However,
in PageRank, the random walker does not simply traverse the single-relational network by
moving between adjacent vertices, but instead has a probability of jumping, or “teleporting,”
to some random vertex in the network. In some instances, the random walker will follow
an outgoing edge from its current vertex location. In other instances, the random walker
will jump to some other random vertex in the network that is not necessarily adjacent to
it. The benefit of this model is that it ensures that the network is strongly connected and
aperiodic and thus, there exists a stationary probability distribution. In order to calculate
PageRank, two networks are used. The standard single-relational network is represented as
the row-stochastic adjacency matrix

$$A_{i,j} = \begin{cases} \frac{1}{|\Gamma^+(i)|} & \text{if } (i,j) \in E \\ \frac{1}{|V|} & \text{if } \Gamma^+(i) = \emptyset \\ 0 & \text{otherwise.} \end{cases}$$

Any i ∈ V where Γ+(i) = ∅ is called a “rank-sink.” Rank-sinks ensure that the network is
not strongly connected. To rectify this connectivity problem, all vertices that are rank-sinks
are connected to every other vertex with probability 1/|V|. Next, for teleportation, a fully
connected network is created that is denoted $B_{i,j} = \frac{1}{|V|}$.
The random walker will choose to use A or B at time step t as its transition network
depending on the probability value α ∈ (0, 1], where in practice, α = 0.85. This means that
85% of the time the random walker will use the edges in A to traverse, and the other 15% of
the time, the random walker will use the edges in B. The α-biased union of the networks A
and B guarantees that the random walker is traversing a strongly connected and aperiodic
network. The random walker’s traversal network can be expressed by the matrix

C = αA + (1 − α)B.

The PageRank row vector $\pi \in [0,1]^{|V|}$ has the property πC = π. Thus, the PageRank
vector is the primary eigenvector of the modified single-relational network. Moreover, π is
the stationary probability distribution of C. From a certain perspective, the primary contri-
bution of the PageRank algorithm is not in the way it is calculated, but in how the network
is modified to support a convergence to a stationary probability distribution. PageRank has
been popularized by the Google search engine and has been used as a ranking algorithm in
various domains. Relative to the geodesic centrality algorithms presented previously, PageRank
is a more efficient way to determine a centrality score for all vertices in a network.
However, calculating the stationary probability distribution of a network is not cheap and
for large networks, cannot be accomplished in real-time. Local rank algorithms are more
useful for real-time results in large-scale networks such as the Giant Global Graph.
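The construction of C = αA + (1 − α)B and the resulting ranking can be sketched with dense lists and power iteration. This is a small illustrative version, not how PageRank is computed at Web scale; the function name, the fixed number of steps, and the example networks are assumptions made here.

```python
def pagerank(vertices, edges, alpha=0.85, steps=200):
    """Approximate the PageRank vector pi with pi C = pi."""
    n = len(vertices)
    index = {v: k for k, v in enumerate(vertices)}
    out = {v: [] for v in vertices}
    for i, j in edges:
        out[i].append(j)
    # Row-stochastic A: rank-sinks are spread uniformly over all vertices.
    A = [[0.0] * n for _ in range(n)]
    for v in vertices:
        row = A[index[v]]
        if out[v]:
            for w in out[v]:
                row[index[w]] += 1.0 / len(out[v])
        else:
            for c in range(n):
                row[c] = 1.0 / n
    # C = alpha * A + (1 - alpha) * B, where every entry of B is 1 / |V|.
    C = [[alpha * A[r][c] + (1.0 - alpha) / n for c in range(n)] for r in range(n)]
    pi = [1.0 / n] * n
    for _ in range(steps):
        pi = [sum(pi[r] * C[r][c] for r in range(n)) for c in range(n)]
    return dict(zip(vertices, pi))


ranks = pagerank(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")])
sink_ranks = pagerank(["x", "y"], [("x", "y")])
```

In the first network vertex "d" has no incoming edges, so its score comes from teleportation alone: (1 − α)/|V| = 0.0375. In the second network "y" is a rank-sink, and the uniform row it receives in A is what keeps the iteration well defined.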

3.3.6 Spreading Activation


Both the stationary probability distribution and PageRank are global rank metrics. That is,
they rank all vertices relative to all vertices and as such, require a full network perspective.
However, for many applications, a local rank metric is desired. Local rank metrics rank a
subset of the set of all vertices in the network relative to some set of source vertices. Local
rank metrics have the benefit of being faster to compute and being relative to a particu-
lar area of the network. For large-scale networks, local rank metrics are generally more
practical for real-time queries.
Perhaps the most popular local rank metric is spreading activation. Spreading activation
is a network analysis technique that was inspired by the spreading activation potential found
in biological neural networks [4, 18, 30]. This algorithm (and its many variants) has been
used extensively in semantic network reasoning and recommender systems. The purpose
of the algorithm is to expose, in a computationally efficient manner, those vertices which
are closest (in terms of a flow distance) to a particular set of vertices. For example, given
i, j, k ∈ V, if there exist many short recurrent paths between vertex i and vertex j and not
so between i and k, then it can be assumed that vertex i is more “similar” to vertex j than k.
Thus, the returned ranking will rank j higher than k relative to i. In order to calculate this
distance, “energy” is assigned to vertex i. Let $x \in [0,1]^{|V|}$ denote the energy vector, where
at the first time step all energy is at i such that $x^1_i = 1.0$. The energy vector is propagated
over A for $\hat{t} \in \mathbb{N}^+$ number of steps by the equation $x^{t+1} = x^t A : t + 1 \leq \hat{t}$. Moreover, at every
time step, x is decayed some amount by δ ∈ [0, 1]. At the end of the process, the vertex that
had the most energy flow through it (as recorded by $\pi \in \mathbb{R}^{|V|}$) is considered the vertex that
is most related to vertex i. Algorithm 1 presents this spreading activation algorithm. The
resultant π provides a ranking of all vertices at most $\hat{t}$ steps away from i.

begin
    t = 1
    while t ≤ t̂ do
        π = π + x
        x = (δx)A
        t = t + 1
    end
    return π
end
Algorithm 1: A spreading activation algorithm.
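Algorithm 1 translates almost line for line into Python. The sketch below assumes that A is a row-stochastic matrix given as a list of lists, that the source vertex is identified by its row index, and that π starts as the zero vector, which Algorithm 1 leaves implicit. The function name and example values are illustrative.

```python
def spreading_activation(A, source, t_hat, delta):
    """Accumulate the energy that flows through each vertex over t_hat steps."""
    n = len(A)
    x = [0.0] * n
    x[source] = 1.0      # all energy starts at the source vertex
    pi = [0.0] * n       # assumed zero start; not stated in Algorithm 1
    for _ in range(t_hat):
        pi = [p + e for p, e in zip(pi, x)]                 # pi = pi + x
        decayed = [delta * e for e in x]                    # delta * x
        x = [sum(decayed[i] * A[i][j] for i in range(n))    # (delta * x) A
             for j in range(n)]
    return pi


# Rows and columns are ordered a, b, c with edges a->b, a->c, b->c, c->a.
A = [[0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]
pi = spreading_activation(A, source=0, t_hat=3, delta=0.5)
```

With these values the result is [1.125, 0.25, 0.375], so relative to "a" the vertex "c" ranks above "b": energy reaches "c" both directly and by way of "b".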

A class of algorithms known as “priors” algorithms performs computations similar to the
local rank spreading activation algorithm, but does so using a stationary probability distribution [63]. Much like the PageRank algorithm distorts the original network, priors algorithms
distort the local neighborhood of the graph and require at every time step, with some probability, that all random walkers return to their source vertex. The long run behavior of such
systems yields a ranking biased towards (or relative to) the source vertices and thus, can be
characterized as a local rank metric.
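
The restart behavior described above can be illustrated with a short power iteration. This sketch shows a generic random walk with restart, which is one common way to realize the idea; it is not claimed to be the specific formulation of [63]. The row-stochastic `transition` matrix, the `restart` probability, and the fixed iteration count are assumptions made for the example.

```python
def rank_with_priors(transition, sources, restart, iterations=100):
    """Approximate the stationary distribution of a walk that restarts at `sources`.

    transition: row-stochastic list-of-lists matrix (each row sums to 1)
    sources:    vertex indices that walkers return to
    restart:    probability of jumping back to a source at each step
    """
    n = len(transition)
    # The prior spreads the restart probability evenly over the source vertices.
    prior = [1.0 / len(sources) if v in sources else 0.0 for v in range(n)]
    x = prior[:]
    for _ in range(iterations):
        # One ordinary random-walk step.
        walked = [sum(x[i] * transition[i][j] for i in range(n))
                  for j in range(n)]
        # Mix in the forced return to the sources.
        x = [(1.0 - restart) * w + restart * p for w, p in zip(walked, prior)]
    return x


# Hypothetical directed cycle 0 -> 1 -> 2 -> 0, ranked relative to vertex 0.
cycle = [[0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0],
         [1.0, 0.0, 0.0]]
scores = rank_with_priors(cycle, sources=[0], restart=0.5)
```

In this example the scores decrease with distance from the source vertex, which is the sense in which the ranking is "relative to" the sources.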

3.3.7 Assortative Mixing


The final single-relational network algorithm discussed is assortative mixing. Assortative
mixing is a network metric that determines if a network is assortative (colloquially captured
by the phrase “birds of a feather flock together”), disassortative (colloquially captured by
the phrase “opposites attract”), or uncorrelated. An assortative mixing algorithm returns
values in [−1, 1], where 1 is assortative, −1 is disassortative, and 0 is uncorrelated. Given a
collection of vertices and metadata about each vertex, it is possible to determine the assor-
tative mixing of the network. There are two assortative mixing algorithms: one for scalar
or numeric metadata (e.g. age, weight, etc.) and one for nominal or categorical metadata
(e.g. occupation, sex, etc.). In general, an assortative mixing algorithm can be used to
answer questions such as:

• Do friends in a social network tend to be the same age?


• Do colleagues in a coauthorship network tend to be from the same university?
• Do relatives in a kinship network tend to like the same foods?

Note that to calculate the assortative mixing of a network, vertices must have metadata prop-
erties. The typical single-relational network G = (V, E) does not capture this information.
Therefore, assume some other data structure that stores metadata about each vertex.
The original publication defining the assortative mixing metric for scalar properties used
the parametric Pearson correlation of two vectors [40].¹⁰ One vector is the scalar value of
the vertex property for the vertices on the tail of all edges. The other vector is the scalar
value of the vertex property for the vertices on the head of all the edges. Thus, the length
of both vectors is |E| (i.e. the total number of edges in the network). Formally, the Pearson
correlation-based assortativity is defined as

$$r = \frac{|E| \sum_i j_i k_i - \sum_i j_i \sum_i k_i}{\sqrt{\left[ |E| \sum_i j_i^2 - \left( \sum_i j_i \right)^2 \right] \left[ |E| \sum_i k_i^2 - \left( \sum_i k_i \right)^2 \right]}},$$

where $j_i$ is the scalar value of the vertex on the tail of edge $i$, and $k_i$ is the scalar value of
the vertex on the head of edge $i$. For nominal metadata, the equation

$$r = \frac{\sum_p e_{pp} - \sum_p a_p b_p}{1 - \sum_p a_p b_p}$$

yields a value in [−1, 1] as well, where $e_{pp}$ is the fraction of edges in the network that have
property value $p$ on both ends, $a_p$ is the fraction of edges in the network that have property
value $p$ on their tail vertex, and $b_p$ is the fraction of edges that have property value $p$ on
their head vertex [41].
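
Both coefficients can be computed directly from an edge list and a metadata lookup. The sketch below follows the two formulas term by term; the edge list, the `age` and `university` dictionaries, and the function names are hypothetical. Both functions are undefined (division by zero) when the property takes a single value on every edge end.

```python
import math


def scalar_assortativity(edges, value):
    """Pearson correlation of tail and head property values over all edges."""
    m = len(edges)
    j = [value[tail] for tail, _ in edges]
    k = [value[head] for _, head in edges]
    numerator = m * sum(a * b for a, b in zip(j, k)) - sum(j) * sum(k)
    spread_j = m * sum(a * a for a in j) - sum(j) ** 2
    spread_k = m * sum(b * b for b in k) - sum(k) ** 2
    return numerator / math.sqrt(spread_j * spread_k)


def nominal_assortativity(edges, label):
    """Assortativity for categorical metadata, using fractions of edges."""
    m = len(edges)
    # e: fraction of edges whose two ends carry the same label.
    e = sum(1 for t, h in edges if label[t] == label[h]) / m
    # ab: sum over labels p of a_p * b_p.
    ab = 0.0
    for p in set(label.values()):
        a_p = sum(1 for t, _ in edges if label[t] == p) / m
        b_p = sum(1 for _, h in edges if label[h] == p) / m
        ab += a_p * b_p
    return (e - ab) / (1 - ab)


# Hypothetical friendship network stored as directed edges in both directions.
age = {"a": 20, "b": 20, "c": 40, "d": 40}
university = {"a": "x", "b": "x", "c": "y", "d": "y"}
same = [("a", "b"), ("b", "a"), ("c", "d"), ("d", "c")]
mixed = [("a", "c"), ("c", "a"), ("b", "d"), ("d", "b")]
```

With these inputs, the `same` edge list is perfectly assortative and the `mixed` edge list is perfectly disassortative under both metrics.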

3.4 Porting Single-Relational Algorithms to the Multi-Relational Domain


All the aforementioned algorithms are intended for single-relational networks. However,
it is possible to map these algorithms over to the multi-relational domain and thus, apply
them to the Giant Global Graph. In the simplest method, it is possible to ignore edge
labels and simply treat all edges in a multi-relational network as being “equal.” This method
represents a multi-relational network as a single-relational network and then uses the aforementioned single-relational network analysis algorithms on it. This method, of course, does
not take advantage of the rich structured data that multi-relational networks offer. Another
method is to only make use of a particular edge label of a multi-relational network. If only
a particular single-relational slice of the multi-relational network is desired (e.g. a citation
network, lanl:cites), then this single-relational component can be isolated and subjected
to the previously presented single-relational network algorithms. This method is limited in
that it ignores much of the information that is present in the original multi-relational network.
¹⁰ Note that for metadata property distributions that are not normally distributed, a non-parametric correlation
such as the Spearman ρ or Kendall τ may be the more useful correlation coefficient.

If a multi-relational network is to be generally useful, then a method that takes advantage of the various types of edges in the network is desired. The methods presented next
define abstract/implicit paths through a network. By doing so, a multi-relational network
can be redefined as a “semantically rich” single-relational network. For example, in Figure
5, there do not exist lanl:authorCites edges (i.e. if person i wrote an article that cites
the article of person j, then it is true that i lanl:authorCites j). However, this edge can
be generated/inferred by making use of both the lanl:authored and lanl:cites edges.
In this way, a breadth-first search or a random walk can use these generated edges to yield
“semantically rich” network analysis results. The remainder of this section will discuss this
idea in more depth.
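
The inference of lanl:authorCites edges can be sketched over a plain list of triples. This is only an illustration of the idea: the triple representation, the vertex identifiers, and the function name are assumptions, and self-citations are deliberately left in the result.

```python
def derive_author_cites(triples):
    """Infer (author, "lanl:authorCites", author) triples.

    An edge i -> j is generated whenever i authored an article that cites
    an article authored by j.
    """
    authored = [(s, o) for s, p, o in triples if p == "lanl:authored"]
    cites = {(s, o) for s, p, o in triples if p == "lanl:cites"}
    inferred = set()
    for citing_author, citing_article in authored:
        for cited_author, cited_article in authored:
            if (citing_article, cited_article) in cites:
                inferred.add((citing_author, "lanl:authorCites", cited_author))
    return inferred


# Hypothetical data: marko's article cites johan's article.
triples = [
    ("marko", "lanl:authored", "article_1"),
    ("johan", "lanl:authored", "article_2"),
    ("article_1", "lanl:cites", "article_2"),
]
author_cites = derive_author_cites(triples)
print(author_cites)  # {('marko', 'lanl:authorCites', 'johan')}
```

A single-relational algorithm can then be run over the inferred edges alone.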

3.4.1 A Multi-Relational Path Algebra


A path algebra is presented to map a multi-relational network to a single-relational network
in order to expose the multi-relational network to single-relational network algorithms. The
multi-relational path algebra summarized here is discussed at length in [51]. In short, the path
algebra manipulates a multi-relational tensor, $\mathcal{A} \in \{0, 1\}^{|V| \times |V| \times |E|}$, in order to derive a
semantically-rich, weighted single-relational adjacency matrix, $\mathbf{A} \in \mathbb{R}^{|V| \times |V|}$. Uses of the
algebra can be generally defined as

$$\Delta : \{0, 1\}^{|V| \times |V| \times |E|} \to \mathbb{R}^{|V| \times |V|},$$

where $\Delta$ is the user-defined path operation.


There are two primary operations used in the path algebra: traverse and filter.¹¹ These
operations are composed to create more complex operations. The traverse operation is
denoted $\cdot : \mathbb{R}^{|V| \times |V|} \times \mathbb{R}^{|V| \times |V|} \to \mathbb{R}^{|V| \times |V|}$ and uses standard matrix multiplication as its function rule.
Traverse is used to “walk” the multi-relational network. The idea behind traverse is first
described using a single-relational network example. If a single-relational adjacency matrix
is raised to the second power (i.e. multiplied with itself) then the resultant matrix denotes
how many paths of length 2 exist between vertices [17]. That is, $A^{(2)}_{i,j}$ (i.e. $(A \cdot A)_{i,j}$) denotes
how many paths of length 2 go from vertex $i$ to vertex $j$. In general, for any power $p$,

$$A^{(p)}_{i,j} = \sum_{l \in V} A^{(p-1)}_{i,l} \cdot A_{l,j} : p \geq 2.$$
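
As a small worked illustration (the three-vertex network here is an assumed example, not one taken from the text), consider a single-relational network with the edges 1 → 2, 2 → 3, and 1 → 3. Squaring its adjacency matrix gives

$$A = \begin{bmatrix} 0 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \qquad A^{(2)} = A \cdot A = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$

The only nonzero entry, $A^{(2)}_{1,3} = 1$, corresponds to the single path of length 2 in the network, namely 1 → 2 → 3.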

This property can be applied to a multi-relational tensor. If $\mathcal{A}^{1}$ and $\mathcal{A}^{2}$ are multiplied together then the resulting adjacency matrix denotes the number of paths of type 1 → 2 that exist
between vertices. For example, if $\mathcal{A}^{1}$ is the authorship (lanl:authored) adjacency matrix, then the adjacency matrix $Z = \mathcal{A}^{1} \cdot \left(\mathcal{A}^{1}\right)^{\top}$ denotes how many coauthorship paths exist between vertices,
where $\top$ transposes the matrix (i.e. inverts the edge directionality). In other words, if Marko
(vertex $i$) and Johan (vertex $j$) have written 19 papers together, then $Z_{i,j} = 19$. However,
given that the diagonal entry $Z_{i,i}$ may be greater than 0 (i.e. a person has coauthored with
themselves), it is important to remove all such reflexive coauthoring paths back to the original
author (as a person cannot coauthor with themselves). In order to do this, the filter operation
is used. Given the identity matrix $I$ and the all-ones matrix $\mathbf{1}$,

$$Z = \left( \mathcal{A}^{1} \cdot \left(\mathcal{A}^{1}\right)^{\top} \right) \circ \left( \mathbf{1} - I \right),$$

yields a true coauthorship adjacency matrix, where $\circ : \mathbb{R}^{|V| \times |V|} \times \mathbb{R}^{|V| \times |V|} \to \mathbb{R}^{|V| \times |V|}$ is the entry-wise
Hadamard matrix multiplication operation [31]. Hadamard matrix multiplication is defined as

$$A \circ B = \begin{bmatrix} A_{1,1} \cdot B_{1,1} & \cdots & A_{1,m} \cdot B_{1,m} \\ \vdots & \ddots & \vdots \\ A_{n,1} \cdot B_{n,1} & \cdots & A_{n,m} \cdot B_{n,m} \end{bmatrix}.$$

In this example, the Hadamard entry-wise multiplication operation applies an “identity filter” to $\mathcal{A}^{1} \cdot \left(\mathcal{A}^{1}\right)^{\top}$ that removes all paths back to the source vertices (i.e. back to the identity vertices) as it sets $Z_{i,i} = 0$. Filters are generally useful when particular paths through a
multi-relational network should be excluded from a computation. The presented example
demonstrates that a multi-relational network can be mapped to a semantically-rich, single-relational network. In the original multi-relational network, there exists no coauthoring relationship (e.g. no self-loops). However, this relation exists implicitly by means of traversing
and filtering particular paths.¹²

¹¹ Other operations not discussed in this section are merge and weight. For an in-depth presentation of the
multi-relational path algebra, see [51].

The benefit of the summarized path algebra is that it can express various abstract paths
through a multi-relational tensor in an algebraic form. Thus, given the theorems of the algebra, it is possible to simplify expressions in order to derive more computationally efficient
paths for deriving the same information. The primary drawback of the algebra is that it is
a matrix algebra that globally operates on adjacency matrix slices of the multi-relational
tensor $\mathcal{A}$. Given the size of the Giant Global Graph, it is not practical to execute global
matrix operations. However, these path expressions can be used as an abstract path that a
discrete “walker” can take when traversing local areas of the graph. This idea is presented
next.
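
The traverse and filter operations of this section can be sketched with plain nested lists. This is a toy, global computation of the coauthorship expression on a four-vertex network; the vertex ordering, the example data, and the helper names are assumptions, and the sketch is not the implementation from [51].

```python
def traverse(a, b):
    """Traverse: ordinary matrix multiplication of two square matrices."""
    n = len(a)
    return [[sum(a[i][l] * b[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(a):
    """Invert edge directionality."""
    return [list(row) for row in zip(*a)]


def hadamard(a, b):
    """Filter: entry-wise (Hadamard) product."""
    return [[x * y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def identity_filter(n):
    """The matrix 1 - I: ones everywhere except on the diagonal."""
    return [[0 if i == j else 1 for j in range(n)] for i in range(n)]


# Hypothetical vertex order: 0 = marko, 1 = johan, 2 = paper_a, 3 = paper_b.
authored = [
    [0, 0, 1, 1],  # marko authored paper_a and paper_b
    [0, 0, 1, 1],  # johan authored paper_a and paper_b
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
unfiltered = traverse(authored, transpose(authored))
coauthor = hadamard(unfiltered, identity_filter(4))
print(unfiltered[0][0], coauthor[0][0], coauthor[0][1])  # 2 0 2
```

Before filtering, each author appears to have coauthored two papers with themselves; after filtering, only the two papers shared between the two authors remain.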

3.4.2 Multi-Relational Grammar Walkers


Previously, the stationary probability distribution, PageRank, and spreading activation
were defined as matrix operations. However, it is possible to represent these algorithms
using discrete random walkers. In fact, in many cases, this is the more natural representation
both in terms of intelligibility and scalability. For many, it is more intuitive to think of these
algorithms as being executed by a discrete random walker moving from vertex to vertex
recording the number of times it has traversed each vertex. In terms of scalability, all of
these algorithms can be approximated by using fewer walkers and thus, fewer computational
resources. Moreover, when represented as a swarm of discrete walkers, the algorithm is
inherently distributed as a walker is only aware of its current vertex and those vertices
adjacent to it.
¹² While not explored in [51], it is possible to use the path algebra to create inference rules in a manner
analogous to the Semantic Web Rule Language (SWRL) [32]. Moreover, as explored in [51], it is possible to
perform any arbitrary SPARQL query [46] using the path algebra (save for greater-than/less-than comparisons
of and regular expressions on literals).

For multi-relational networks, this same principle applies. However, instead of randomly choosing an adjacent vertex to traverse to, the walker chooses a vertex that is dependent upon an abstract path description defined for the walker. Walkers of this form are
called grammar-based random walkers [48]. A path for a walker can be defined using any
language such as the path algebra presented previously or SPARQL [46]. The following examples are provided in SPARQL as it is the de facto query language for the Web of Data.
Given the coauthorship path description from the previous section,

$$\left( \mathcal{A}^{1} \cdot \left(\mathcal{A}^{1}\right)^{\top} \right) \circ \left( \mathbf{1} - I \right),$$

it is possible to denote this as a local walker computation in SPARQL as
SELECT ?dest WHERE {
  @ lanl:authored ?x .
  ?dest lanl:authored ?x .
  FILTER (@ != ?dest)
}
where the symbol @ denotes the current location of the walker (i.e. a parameter to the query)
and ?dest is a collection of potential locations for the walker to move to (i.e. the return set
of the query). It is important to note that the path algebra expression performs a global
computation while the SPARQL query representation distributes the computation to individual walkers. Given the set of resources that bind to ?dest, the walker selects a single
resource from that set and traverses to it, at which point @ is updated to the selected resource. This process continues indefinitely and, in the long run, the walker’s
location probability over V denotes the stationary distribution of the walker in the Giant
Global Graph according to the abstract coauthorship path description. The SPARQL query
redefines what is meant by an adjacent vertex by allowing longer paths to be represented as
single edges. Again, this is why it is stated that such mechanisms yield semantically rich,
single-relational networks.
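
The local computation performed by a single walker can be sketched without a SPARQL engine by evaluating the same pattern over an in-memory list of triples. Everything in this sketch is illustrative: the triple list, the vertex names, and the `coauthors` and `walk` functions are assumptions. One binding is returned per shared article, mirroring the fact that the query above does not use DISTINCT.

```python
import random


def coauthors(triples, at):
    """Bindings for ?dest given the walker's current location `at`."""
    articles = {o for s, p, o in triples
                if s == at and p == "lanl:authored"}
    return sorted(s for s, p, o in triples
                  if p == "lanl:authored" and o in articles and s != at)


def walk(triples, start, steps, rng):
    """Move a single walker for `steps` steps and count visits per vertex."""
    visits = {}
    at = start
    for _ in range(steps):
        candidates = coauthors(triples, at)
        if not candidates:  # nothing binds to ?dest, so the walker cannot move
            break
        at = rng.choice(candidates)
        visits[at] = visits.get(at, 0) + 1
    return visits


# Hypothetical authorship data.
triples = [
    ("marko", "lanl:authored", "paper_a"),
    ("johan", "lanl:authored", "paper_a"),
    ("marko", "lanl:authored", "paper_b"),
    ("johan", "lanl:authored", "paper_b"),
    ("carol", "lanl:authored", "paper_b"),
]
visits = walk(triples, "marko", steps=100, rng=random.Random(0))
```

Normalizing the visit counts by the number of steps gives an estimate of the walker's location probability, which approaches the stationary distribution as the number of steps grows.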
In the previous coauthorship example, the grammar walker, at every vertex it encounters, executes the same SPARQL query to locate “adjacent” vertices. In more complex
grammars, it is possible to chain together SPARQL queries into a graph of expressions such
that the walker moves not only through the Giant Global Graph, but also through a web of
SPARQL queries. Each SPARQL query defines a different abstract edge to be traversed.
This idea is diagrammed in Figure 6, where the grammar walker “walks” both the grammar
and the Giant Global Graph.
To demonstrate a multiple SPARQL query grammar, a PageRank coauthorship grammar
is defined using two queries. The first query was defined above and the second query is
SELECT ?dest WHERE {
  ?dest rdf:type lanl:Person
}
This rule serves as the “teleportation” function utilized in PageRank to ensure a strongly
connected network. Thus, if there is an α probability that the first query will be executed and
a (1 − α) probability that the second rule will be executed, then coauthorship PageRank in
the Giant Global Graph is computed. Of course, the second rule can be computationally
expensive, but it serves to elucidate the idea.¹³ It is noted that the stationary probability

¹³ Note that this description is not completely accurate as “rank sinks” in the first query (when ?dest = ∅)
will halt the process. Thus, in such cases, when the process halts, the second query should be executed, at
which point rank sinks are alleviated and PageRank is calculated.
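
The two-query grammar, including the rank-sink handling from the footnote, can be sketched as a single walker over an in-memory list of triples. This is an approximation by simulation under stated assumptions: the data, the function name, and the use of visit frequencies as the PageRank estimate are choices made for this example, and at least one lanl:Person is assumed to exist.

```python
import random


def coauthor_pagerank(triples, alpha, steps, rng):
    """Estimate coauthorship PageRank with one grammar walker."""
    # Second query: every resource typed as lanl:Person.
    people = sorted(s for s, p, o in triples
                    if p == "rdf:type" and o == "lanl:Person")
    visits = {person: 0 for person in people}
    at = rng.choice(people)
    for _ in range(steps):
        # First query: coauthors of the current location.
        articles = {o for s, p, o in triples
                    if s == at and p == "lanl:authored"}
        dest = [s for s, p, o in triples
                if p == "lanl:authored" and o in articles and s != at]
        if dest and rng.random() < alpha:
            at = rng.choice(dest)
        else:
            # Teleport with probability 1 - alpha, or always at a rank sink.
            at = rng.choice(people)
        visits[at] = visits.get(at, 0) + 1
    return {vertex: count / steps for vertex, count in visits.items()}


# Hypothetical data: carol has no coauthors and is therefore a rank sink.
triples = [
    ("marko", "rdf:type", "lanl:Person"),
    ("johan", "rdf:type", "lanl:Person"),
    ("carol", "rdf:type", "lanl:Person"),
    ("marko", "lanl:authored", "paper_a"),
    ("johan", "lanl:authored", "paper_a"),
]
rank = coauthor_pagerank(triples, alpha=0.85, steps=2000, rng=random.Random(1))
```

Because carol can only be reached by teleportation, her estimated rank stays well below that of the two coauthors.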

applause. The other five comedies met with equal commendation
from the Romans, though Volcatius[16], in his enumeration of them,
says,
Sumetur Hecyra sexta ex his fabula.
The Step-mother is reckoned the last of the six.
The Eunuch was acted twice in one day[17]; and the author
received for it a higher price than was ever paid for any comedy
before that time, viz., eight thousand sesterces[18]: on account of the
magnitude of the sum, it is mentioned in the title of that play.
Varro[19] even prefers the opening scenes of the Brothers of Terence
to the same part in Menander. The report that Terence was indebted
to Scipio and Lælius, with whom he was so intimate, for parts of his
comedies, is well known; and he himself scarcely seems to have
discouraged the assertion, as he never seriously denies it: witness
the Prologue to the Brothers:

Nam quod isti dicunt malevoli, homines nobiles


Eum adjutare, assidueque una scribere:
Quod illi maledictum vehemens existimant,
Eam laudem hic ducit maximam, cum illis placet
Qui vobis universis, et populo placent:
Quorum opera in bello, in otio, in negotio
Suo quisque tempore usus est sine superbia.

“And as for what those malicious railers say[20], who assert that
certain noble persons assist the poet, and very frequently write with
him, what they think a reproach, he considers as the highest praise;
that he should be thought to please those who please you, and all
Rome; those who have assisted every one in war, and peace, and
even in their private affairs, with the greatest services; and yet have
been always free from arrogance.” It is likely, that he might wish, in
some measure, to encourage this idea, because he knew that it
would not be displeasing to Scipio and Lælius: however, the opinion
has gained ground, and is strongly entertained even to the present
day. Quintus Memmius[21], in an oration in his own defence, says,
Publius Africanus, qui a Terentio personam mutuatus, quæ
domi luserat ipse, nomine illius in scenam detulit.——
“Publius Africanus, who borrowed the name of Terence for
those plays which he composed at home for his
diversion.——”
Cornelius Nepos[22] asserts, that he has it from the very first
authority, that Caius Lælius being at his country-house at [23]Puteoli,
on the first of March[24], and being called to supper by his wife at an
earlier hour than usual, requested that he might not be interrupted;
and afterwards coming to table very late, he declared that he had
scarcely ever succeeded better in composition than at that time; and,
being asked to repeat the verses, he read the following from the
Self-tormentor, Act IV, Scene III.

Satis pol proterve me Syri promissa huc induxerunt


Decem minas quas mihi dare pollicitus est, quod si is nunc me
Deceperit, sæpe obsecrans me, ut veniam, frustra veniet:
Aut, cum venturam dixero, et constituero, cum is certe
Renunciârit; Clitiphon cum in spe pendebit animi
Decipiam, ac non veniam; Syrus mihi tergo pænas pendet.

“Truly this Syrus has coaxed me hither, impertinently enough, with


his fine promises that I should receive ten minæ; but, if he deceives
me this time, ’twill be to no purpose to ask me to come again; or, if I
promise, and appoint to come, I’ll take good care to disappoint him.
Clitipho, who will be full of eager hope to see me, will I deceive, and
will not come; and Syrus’ back shall pay the penalty.”
Santra[25] thinks, that if Terence had required any assistance in
his comedies; he would not have requested it from Scipio and
Lælius, who were then extremely young[26]; but from [27]Caius
Sulpicius Gallus, a man of great learning, who also was the first
person who procured[28] the representation of comedies at the
consular games or from [29]Quintus Fabius Labeo; or from[30]
Marcus Popilius Lænas, two eminent poets, and persons[31] of
consular dignity: and Terence himself, speaking of those who were
reported to have assisted him, does not mention them as young
men, but as persons of weight and experience, who had served the
Romans in peace, in war, and in private business.
After the publication of his six comedies, he quitted Rome, in the
thirty-fifth year of his age, and returned no more. Some suppose that
he undertook this journey with a view to silence the reports of his
receiving assistance from others in the composition of his plays:
others, that he went with a design to inform himself more perfectly of
the manners and customs of Greece.
Volcatius speaks of his death as follows:

Sed ut Afer sex populo edidit comœdias


Iter hinc in Asiam fecit: navim cum semel
Conscendit, visus nunquam est. Sic vita vacat.

“Terence, after having written six comedies, embarked for Asia,


and was seen no more. He perished at sea.”
Quintus Consentius[32] writes, that he died at sea, as he was
returning from Greece, with one hundred and eight plays, translated
from Menander[33]. Other writers affirm, that he died at Stymphalus,
a town in Arcadia, or in Leucadia[34], in the consulate of[35] Cneus
Cornelius Dolabella and Marcus Fulvius Nobilior, and that his end
was hastened by extreme grief for the loss of the comedies which he
had translated, and some others which he had composed himself,
and sent before him in a vessel which was afterwards wrecked.
He is said to have been of a middle stature, well-shaped, and of a
dark complexion. He left one daughter, who was afterwards married
to [36]a Roman knight, and bequeathed to her a garden of [37]XX
jugera, near the Appian Way, and close to the [38]Villa Martis: it is
therefore surprising that Portius should write thus:

——nihil Publius
Scipio profuit, nihil ei Lælius, nihil Furius:
Tres per idem tempus qui agitabant nobiles facillime,
Eorum ille opera ne domum quidem habuit conductitiam:
Saltem ut esset, quo referret obitum domini servulus.

“His three great friends, Scipio, Lælius, and Furius, give him no
assistance, nor even enable him to hire a house, that there might at
least be a place where his slave might announce to Rome his
master’s death.”
Afranius[39] prefers Terence to all the comic poets, saying, in his
Compitalia[40].
Terentio non similem dices quempiam.

“Terence is without an equal.”


But Volcatius places him not only after [41]Nævius, [42]Plautus,
and [43]Cæcilius, but even after [44]Licinius. [45]Cicero, in his
ΛΕΙΜΩΝ, writes of Terence thus,

Tu quoque qui solus lecto sermone, Terenti,


Conversum, expressumque Latina voce Menandrum
In medio populi sedatis vocibus effers,
Quicquid come loquens, ac omnia dulcia dicens.

“And thou, also, O Terence, whose pure style alone could make
Menander speak the Latin tongue, thou, with the sweetest harmony
and grace, hast given him to Rome.”
Also Caius Julius Cæsar[46],
Tu quoque tu in Summis, O dimidiate Menander,
Poneris et merito, puri sermonis amator,
Lenibus atque utinam scriptis adjuncta foret vis
Comica ut æquato virtus polleret honore,
Cum Græcis neque in hac despectus parte jaceres,
Unum hoc maceror, et doleo tibi deesse Terenti.

“And thou, also, O thou half Menander, art justly placed among
the most divine poets, for the purity of thy style. O would that humour
had kept pace with ease in all thy writings; then thou wouldest not
have been compelled to yield even to the Greeks; nor could a single
defect have been objected to thee. But, as it is, thou hast this great
defect, and this, O Terence, I lament.”
THE ANDRIAN,
A Comedy,

ACTED AT

THE MEGALESIAN GAMES[47];


IN THE [48]CURULE ÆDILATE OF [49]MARCUS FULVIUS AND
MARCUS GLABRIO[50]; BY THE COMPANY[51] OF LUCIUS
AMBIVIUS TURPIO, AND LUCIUS ATTILIUS[52],
OF PRÆNESTE.

Flaccus, the Freedman of Claudius, composed the Music for


[53]equal Flutes, right and left handed.
[54]It is taken from the Greek, and was published during the
Consulate of Marcus Claudius Marcellus, and Cneus Sulpicius
Galba[55].
Year of Rome 587
Before Our Saviour 162
Author’s Age 27
THE ARGUMENT.
There were in Athens two brothers, Chremes and Phania. The
former making a voyage to Asia, left his infant daughter, named
Pasibula, under the protection of Phania; who, to avoid the
dangers of a war which shortly after convulsed the Grecian
States, quitted Athens, and embarked also for Asia with the infant
Pasibula, designing to rejoin his brother Chremes. His vessel
being wrecked off Andros, he was received and hospitably
entertained by an inhabitant of the island, where he died,
bequeathing his niece to his host, who generously educated her
with his own daughter Chrysis; changing her name from
Pasibula to Glycera. After some years he also died, and his
daughter Chrysis, finding herself reduced to poverty, and avoided
by her relations, removed to Athens, accompanied by her adopted
sister Glycera, or Pasibula. Here, supported by her industry, she
lived for some months in a virtuous seclusion; but after that period
became acquainted with several young Athenians of good family,
whose visits she admitted, hoping perhaps to accomplish an
advantageous marriage either for Glycera or for herself. She was
seduced by pleasure, and her conduct from that time became very
far from irreproachable. Meanwhile a young man, named
Pamphilus, is accidently introduced at her house, sees Glycera,
is enamoured of her; she returns his affections, and they are
privately betrothed; a short time previous to the death of Chrysis,
which happens about three years after her removal to Athens.
Chremes, whom we left in Asia, returned to Athens, and became
the father of another daughter, who was called Philumena; he had
long before formed a friendship with Simo, the father of
Pamphilus. Pamphilus being a youth of great worth and high
reputation, Chremes wishes to bestow on him the hand of his
daughter Philumena. Here the play opens. A report of the
connexion between Pamphilus and Glycera reaching the ears of
Chremes, he breaks off the marriage. Simo conceals this, and to
try the truth of the rumour, proposes Philumena again to his son,
and desires him to wed her instantly. Apprized by his servant
Davus of his father’s artful stratagem, Pamphilus professes his
willingness to marry, thinking by this measure to disappoint it; but
he defeats himself, for from his ready consent, Chremes
concludes the rumour false, and renews the treaty to the great
embarrassment of Pamphilus, which, with the artifices Davus
employs to extricate him, form the most diverting scenes of the
play. However, when the affairs of Pamphilus and Davus are
reduced to extremity, and a breach between father and son
appears inevitable on account of the marriage with Glycera, and
the refusal to accept Philumena, a stranger called Crito, most
opportunely arrives from Andros, and discovers Glycera to be
Pasibula, the daughter of Chremes, who willingly confirms her
the wife of Pamphilus, and bestows Philumena, his other
daughter, on Charinus, a friend of Pamphilus, to the great
satisfaction of all parties.
DRAMATIS PERSONÆ.

Simo, an old man, the father of Pamphilus.


Sosia, the freedman of Simo.
Pamphilus, the son of Simo.
Davus, servant to Pamphilus.
Charinus, a young man, the friend of Pamphilus.
Byrrhia, servant to Charinus.
Chremes, an old man, the friend of Simo.
Crito, a stranger, from the island of Andros.
Dromo, a servant.
Glycera, the Andrian.
Mysis, her maid.
Lesbia, a midwife.

MUTES.

Archillis, Glycera’s nurse.


Servants belonging to Simo.

The Scene lies in Athens, in a street between the


houses of Simo and Glycera.
The Time is about nine hours.
PROLOGUE[56].

Our poet, when first he bent his mind to write, thought that he
undertook no more than to compose Comedies which should please
the people. But he finds himself not a little deceived; and is
compelled to waste his time in making Prologues; not to narrate the
plot of his play, but to answer the snarling malice of an older poet[57].
And now, I pray you, Sirs, observe what they object against our
Author: Menander wrote the [58]Andrian and Perinthian: he who
knows one of them knows both, their plots are so very similar; but
they are different in dialogue, and in style. He confesses that
whatever seemed suitable to the Andrian, he borrowed from the
Perinthian, and used as his own: and this, forsooth, these railers
carp at, and argue against him that Comedies thus mixed are good
for nothing. But, in attempting to shew their wit, they prove their folly:
since, in censuring him, they censure Nævius, Plautus[59], Ennius,
who have given our author a precedent for what he has done: and
whose careless ease he would much rather imitate than their
obscure correctness. But henceforth let them be silent, and cease to
rail; or I give them warning, they shall hear their own faults
published. And now deign to favour the play with your attention; and
give it an impartial hearing, that you may know what is in future to be
expected from the poet, and whether the Comedies that he may
write hereafter, will be worthy to be accepted, or to be rejected by
you.
THE ANDRIAN.

ACT I.

Scene I.
Simo, Sosia, and Slaves, carrying Provisions.
Simo. [60]Carry in those things, directly. (Exeunt Slaves.) Do you
come hither Sosia; I have something to say to you.
Sosia. You mean, I suppose, that I should take care that these
provisions are properly drest.
Simo. No; it’s quite another matter.
Sosia. In what else can my skill be of any service?
Simo. There is no need of your skill in the management of the
affair I am now engaged in; all that I require of you is faithfulness and
secrecy; qualities I know you to possess.
Sosia. I long to hear your commands.
Simo. You well know, Sosia, that from the time when I first bought
you as my slave;[61] even from your childhood until the present
moment; I have been a just and gentle master: you served me with a
free spirit; and I gave you freedom; [62]as the greatest reward in my
power to bestow.
Sosia. Believe me, Sir, I have not forgotten it.
Simo. Nor have you given me any cause to repent that I did
so.[63]
Sosia. I am very glad, Simo, that my past, and present conduct
has been pleasing to you; and I am grateful for your goodness in
receiving my poor services so favourably: but it pains me to be thus
reminded of the benefits you have conferred upon me, as it seems to
upbraid me with having forgotten them.[64] Pray, Sir, let me request
to know your will at once.
Simo. You shall; but first I must inform you that my son’s
marriage, which you expect to take place, is only a feigned marriage.
Sosia. But why do you make use of this deceit?
Simo. [65]You shall hear every thing from the beginning; by which
means you will learn my son’s course of life, my intentions, and the
part I wish you to take in this affair. When my son, Pamphilus,
arrived at man’s estate,[66] of course he was able to live more
according to his own inclination: for, until a man has attained that
age, his disposition does not discover itself, being kept in check
either by his tutor, or by bashfulness, or by his tender years.
Sosia. That is very true.
Simo. Most young men attach themselves chiefly to one particular
pursuit; such, for instance, as breeding horses, keeping hounds, or
frequenting the schools of the philosophers.[67] He did not devote
himself entirely to any one of these: but employed a moderate
portion of his time in each; and I was much pleased to see it.
Sosia. As well you might, for I think that every man, in the conduct
of his life, should adhere to this precept, “Avoid excess.”
Simo. This was his way of life; he bore patiently with every one,
accommodated himself to the tempers of his associates; and fell in
with them in their pursuits; avoided quarrels; and never arrogantly
preferred himself before his companions. Conduct like this will
ensure a man praise without envy, and gain many friends.
Sosia. This was indeed a wise course of life; for in these times[68],
flattery makes friends; truth, foes.
Simo. Meantime, about three years ago, a certain woman,
exceedingly beautiful, and in the flower of her age, removed into this
neighbourhood; she came from the Island of Andros[69]; being
compelled to quit it by her poverty and the neglect of her
relations[70].
Sosia. I augur no good from this woman of Andros.
Simo. At first she lived chastely, and penuriously, and laboured
hard, managing with difficulty to gain a livelihood[71] with the distaff
and the loom: but soon afterwards several lovers made their
addresses to her[72]; promising to repay her favours with rich
presents; and as we all are naturally prone to pleasure, and averse
to labour, she was induced to accept their offers; and at last admitted
all her lovers without scruple. It happened that some of them with
much persuasion prevailed on my son to accompany them to her
house. Aha! thought I, he is caught[73]: he is certainly in love with
her. In the morning I watched their pages going to her house and
returning; I called one of them; Hark ye, boy, prithee tell me who was
the favourite of Chrysis, yesterday? For this was the Andrian’s name.
Sosia. I understand you, Sir.
Simo. I was answered that it was Phædrus, or Clinia, or
Niceratus; for all these were her lovers at that time: well, said I, and
what did Pamphilus there! oh! he paid[74] his share and supped with
the rest. Another day I inquired and received the same answer; and I
was extremely rejoiced that I could learn nothing to attach any blame
to my son. Then I thought that I had proved him sufficiently; and that
he was a miracle of chastity:—for he who has to contend against the
example of men of such vicious inclinations, and can preserve his
mind from its pernicious influence, may very safely be trusted with
the regulation of his own conduct. To increase my satisfaction, every
body joined as if with one voice in the praise of Pamphilus, every
one extolled his virtues, and my happiness, in possessing a son
endued with so excellent a disposition. In short, this his high
reputation induced my friend Chremes to come to me of his own
accord, and offer to give his daughter to Pamphilus with a large
dowry[75]. I contracted [76]my son, as I was much pleased with the
match, which was to have taken place on this very day.
Sosia. And what has happened to prevent it?
Simo. You shall hear: within a few days of this time our neighbour
Chrysis died.
Sosia. O happy news! I was still fearful of some mischief from this
Andrian.
Simo. Upon this occasion my son was continually at the house
with the lovers of Chrysis, and joined with them in the care of her
funeral; meantime he was sad, and sometimes would even weep.
Still I was pleased with all this; if, thought I, he is so much concerned
at the death of so slight an acquaintance, how would he be afflicted
at the loss of one whom he himself loved, or at my death. I attributed
every thing to his humane and affectionate disposition; in short, I
myself, for his sake, attended the funeral, even yet suspecting
nothing.
Sosia. Ah! what has happened then?
Simo. I will tell you. The corpse is carried out; we follow: in the
mean time, among the women who were there[77], I saw one young
girl, with a form so——
Sosia. Lovely, without doubt.
Simo. And with a face, Sosia, so modest, and so charming, that
nothing can surpass it; and as she appeared more afflicted than the
others who were there, and so pre-eminently beautiful[78], and of so
noble a carriage, I approach the women who were following the
body[79], and inquire who she is: they answer, The sister of the
deceased. Instantly the whole truth burst upon me at once: hence
then, thought I, proceed those tears; this sister it is, who is the cause
of all his affliction.
Sosia. How I dread to hear the end of all this!
Simo. In the mean time the procession advances; we follow, and
arrive at the tomb[80]: the corpse is placed on the pile[81], and quickly
enveloped in flames; they weep; while the sister I was speaking of,
rushed forward in an agony of grief toward the fire; and her
imprudence exposed her to great danger. Then, then it was, that
Pamphilus, half dead with terror, publicly betrayed the love he had
hitherto so well concealed: he flew to the spot, and throwing his arms
around her with all the tenderness imaginable; my dearest Glycera,
cried he, what are you about to do? Why do you rush upon
destruction? Upon which she threw herself weeping upon his bosom
in so affectionate a manner, that it was easy enough to perceive their
mutual love.
Sosia. How! is this possible!
Simo. I returned home, scarcely able to contain my anger; but yet
I had not sufficient cause to chide Pamphilus openly; as he might
have replied to me, What have I done amiss, my father? or how have
I offended you? of what am I guilty? I have preserved the life of one
who was going to throw herself into the flames: I prevented her: this
would have been a plausible excuse.
Sosia. You consider this rightly, Sir; for if he who has helped to
save a life is to be blamed for it; what must be done to him who is
guilty of violence and injustice?
Simo. The next day Chremes came to me, and complained of
being shamefully used, as he had discovered for a certainty that
Pamphilus had actually married this strange woman[82]. I positively
denied that this was the case, and he as obstinately insisted on the
truth of it: at last I left him, as he was absolutely resolved to break off
the match.
Sosia. Did you not then rebuke Pamphilus?
Simo. No: there was nothing yet so flagrant as to justify my
rebuke.
Sosia. How so, Sir, pray explain?
Simo. He might have answered me thus: you yourself, my father,
have fixed the time when this liberty must cease; and the period is at
hand when I must conform myself to the pleasure of another: permit
me then, I beseech you, for the short space that remains to me, to
live as my own will prompts me.
Sosia. True. What cause of complaint can you then find against
him?
Simo. If he is induced by his love for this stranger, to refuse to
marry Philumena in obedience to my commands, that offence will lay
him open to my anger; and I am now endeavouring by means of this
feigned marriage, to find a just cause of complaint against him: and,
at the same time, if that rogue Davus has any subtle scheme on foot,
this will induce him to bring it forward now, when it can do no harm;
as I believe that rascal will leave no stone unturned in the affair;
though more for the sake of tormenting me, than with a view to serve
or gratify my son.
Sosia. Why do you suspect that?
Simo. Why? because of a wicked mind one can expect nothing
but wicked intentions[83]. But if I catch him at his tricks—However,
’tis in vain to say more: if it appear, as I trust it will, that my son
makes no objection to the marriage, I have only to gain Chremes,
whom I must prevail upon by entreaty; and I have great hopes that I
shall accomplish it. What I wish you to do is, to assist me in giving
out this marriage for truth, to terrify Davus, and to watch the conduct
of my son, what he does; and what course he and his hopeful
servant resolve upon.
Sosia. It is enough, Sir; I will take care to obey you. Now, I
suppose, we may go in.
Simo. Go, I will follow presently[84].
[Exit Sosia.

Scene II.
Simo, Davus.
Simo. My son, I have no doubt, will refuse to marry; for I observed
that Davus seemed terribly perplexed just now, when he heard that
the match was to take place: but here he comes[85].
Davus. (not seeing Simo.) I wondered that this affair seemed
likely to pass off so easily! and always mistrusted the drift of my old
master’s extraordinary patience and gentleness; who, though he was
refused the wife he wished for, for his son, never mentioned a word
of it to us, or seemed to take any thing amiss.
Simo. (aside.) But now he will, as you shall feel, rascal.
Davus. His design was to entrap us while we were indulging in an
ill-founded joy, and fancied ourselves quite secure. He wished to
take advantage of our heedlessness, and make up the match before
we could prevent him: what a crafty old fellow!
Simo. How this rascal prates[86]!
Davus. Here is my master! he has overheard me! I never saw
him!
Simo. Davus.
Davus. Who calls Davus?
Simo. Come hither, sirrah.
Davus. (aside.) What can he want with me?
Simo. What were you saying?
Davus. About what, Sir?
Simo. About what, Sir? The world says that my son has an
intrigue.
Davus. Oh! Sir, the world cares a great deal about that, no doubt.
Simo. Are you attending to this, Sir?
Davus. Yes, Sir, certainly.
Simo. It does not become me to inquire too strictly into the truth of
these reports. I shall not concern myself in what he has done
hitherto; for as long as circumstances allowed of it, I left him to
himself: but it is now high time that he should alter and lead a new
life. Therefore, Davus, I command, and even entreat, that you will
prevail on him to amend his conduct.
Davus. What is the meaning of all this discourse?
Simo. Those who have love intrigues on their hands are generally
very averse to marriage.
Davus. So I have heard.
Simo. And if any of them manage such an affair after the counsel
of a knave, ’tis a hundred to one but the rogue will take advantage of
their weakness, and lead them a step further, from being love-sick to
some still greater scrape or imprudence.
Davus. Truly, Sir, I don’t understand what you said last.
Simo. No! not understand it!
Davus. No. I am not Œdipus[87] but Davus.
Simo. Then you wish that what I have to say should be explained
openly and without reserve.
Davus. Certainly I do.
Simo. Then, sirrah, if I discover that you endeavour to prevent my
son’s marriage by any of your crafty tricks; or interfere in this
business to show your cunning; you may rely on receiving a few
scores of lashes, and a situation in the grinding-house[88] for life:
upon this token, moreover, that when I liberate you from thence, I will
grind in your stead. Is this plain enough for you, or don’t you
understand yet?
Davus. Oh, perfectly! you come to the point at once: you don’t use
much circumlocution, i’faith.
Simo. Remember! In this affair above all others, if you begin
plotting, I will never forgive it.
Davus. Softly, worthy Sir, softly, good words I beg of you.
Simo. So! you are merry upon it, are you, but I am not to be
imposed upon. I advise you, finally, to take care what you do: you
cannot say you have not had fair warning.
[Exit.

Scene III[89].
Davus.
In truth, friend Davus, from what I have just heard from the old
man about the marriage, I think thou hast no time to lose. This affair
must be [90]handled dexterously, or either my young master or I must
be quite undone. Nor have I yet resolved which side to take; whether
I shall assist Pamphilus, or obey his father. If I abandon the son, I
fear his happiness will be destroyed: if I help him, I dread the threats
of the old man, who is as crafty as a fox. First, he has discovered his
son’s intrigue, and keeps a jealous eye upon me, lest I should set
some scheme a-foot to retard the marriage. If he finds out the least
thing, I am undone[91], for right or wrong, if he once takes the whim
into his head, he will soon find a pretence for sending me to grind in
the mill for my life; and, to crown our disasters, this Andrian,
Pamphilus’s wife or mistress, I know not which, is with child by him:
’tis strange enough to hear their presumption. I think their
[92]intentions savour more of madness than of any thing else: boy or
girl, say they, the child shall be brought up[93]. They have made up
among them too, some story or other, to prove that she is a citizen of
Athens[94]. Thus runs the tale. Once upon a time there was a certain
old merchant[95], who was shipwrecked upon the island of Andros,
where he afterwards died, and the father of Chrysis took in his
helpless little orphan, who was this very Glycera. Fables! for my part
I don’t believe a word of it: however, they themselves are vastly
pleased with the story. But here comes her maid Mysis. Well, I’ll
betake myself to the Forum[96], and look for Pamphilus: lest his
father should surprise him with this marriage before I can tell him any
thing of the matter.
[Exit.

Scene IV.
Mysis.
[97]I understand you, Archillis: you need not stun me with the
same thing over so often: you want me to fetch the midwife Lesbia:
in truth, she’s very fond of the dram-bottle, and very headstrong; and
I should think she was hardly skilful enough to attend a woman in her
first labour.—However, I’ll bring her.——Mark how [98]importunate
this [99]old baggage is to have her fellow-gossip, that they may tipple
together. Well, may Diana grant my [100]poor mistress a happy
minute; and that Lesbia’s want of skill may be shewn any where
rather than here. But what do I see? here comes Pamphilus,
seemingly half-distracted, surely something is the matter. I will stay
and see whether this agitation is not the forerunner of some
misfortune.

Scene V.
Pamphilus, Mysis[101].
Pam. Heavens! is it possible that any human being, much less a
father, could be guilty of an action like this?
Mysis. (aside.) What can be the matter?
Pam. By the faith of gods and men, if ever any one was
unworthily treated, I am. He peremptorily resolved that I should be
married on this very day. Why was not I informed of this before? Why
was not I consulted?
Mysis. (aside.) Miserable woman that I am! what do I hear?
Pam. And why has Chremes changed his mind, who obstinately
persisted in refusing me his daughter, after he heard of my
imprudence[102]? Can he do this to tear me from my dearest
Glycera? Alas! if I lose her, I am utterly undone. Was there ever such
an unfortunate lover?—was there ever such an unhappy man as I
am? Heavens and earth! will this persecution never end? Shall I