Visual Analytics and Interactive Technologies: Data, Text and Web Mining Applications (Premier Reference Source), 1st Edition, Qingyu Zhang
Visual Analytics and
Interactive Technologies:
Data, Text and Web Mining
Applications
Qingyu Zhang
Arkansas State University, USA
Richard S. Segall
Arkansas State University, USA
Mei Cao
University of Wisconsin-Superior, USA
Copyright © 2011 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in
any form or by any means, electronic or mechanical, including photocopying, without written permission from the publisher.
Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or
companies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark.
Visual analytics and interactive technologies : data, text, and web mining
applications / Qingyu Zhang, Richard Segall, and Mei Cao, editors.
p. cm.
Includes bibliographical references and index.
Summary: "This book is a comprehensive reference on concepts, algorithms,
theories, applications, software, and visualization of data mining, text
mining, Web mining and computing/supercomputing, covering state-of-the-art of
the theory and applications of mining"-- Provided by publisher.
ISBN 978-1-60960-102-7 (hardcover) -- ISBN 978-1-60960-104-1 (ebook) 1.
Data mining. I. Zhang, Qingyu, 1970- II. Segall, Richard, 1949- III. Cao,
Mei, 1969-
QA76.9.D343V568 2011
006.3'12--dc22
2010042271
All work contributed to this book is new, previously-unpublished material. The views expressed in this book are those of the
authors, but not necessarily of the publisher.
List of Reviewers
Mieczysław A. Kłopotek, Polish Academy of Sciences, Poland
N. Ranga Suri, Centre for Artificial Intelligence and Robotics, India
P. Alagambigai, Easwari Engineering College, India
Daniel Rivero, University of A Coruña, Spain
Tri Kurniawan Wijaya, Sekolah Tinggi Teknik Surabaya, Indonesia
Tzu-Liang (Bill) Tseng, The University of Texas at El Paso, USA
Marko Robnik-Šikonja, University of Ljubljana, Slovenia
Alan Olinsky, Bryant University, USA
Roberto Marmo, University of Pavia, Italy
H. Hannah Inbarani, Periyar University, India
Carson Kai-Sang Leung, The University of Manitoba, Canada
R. Roselin, Sri Sarada College for Women, India
Riadh Hammami, Université Laval, Canada
Anca Doloc-Mihu, Emory University, USA
Mei Cao, University of Wisconsin-Superior, USA
Richard S. Segall, Arkansas State University, USA
Qingyu Zhang, Arkansas State University, USA
Table of Contents
Section 1
Concepts, Algorithms, and Theory
Chapter 1
Towards the Notion of Typical Documents in Large Collections of Documents
Mieczysław A. Kłopotek, Polish Academy of Sciences, Poland & University of Natural
and Human Sciences, Poland
Sławomir T. Wierzchoń, Polish Academy of Sciences & University of Gdańsk, Poland
Krzysztof Ciesielski, Polish Academy of Sciences, Poland
Michał Dramiński, Polish Academy of Sciences, Poland
Dariusz Czerski, Polish Academy of Sciences, Poland
Chapter 2
Data Mining Techniques for Outlier Detection
N N R Ranga Suri, C V Raman Nagar, India
M Narasimha Murty, Indian Institute of Science, India
G Athithan, C V Raman Nagar, India
Chapter 3
Using an Ontology-Based Framework to Extract External Web Data for the Data Warehouse
Charles Greenidge, University of the West Indies, Barbados
Hadrian Peter, University of the West Indies, Barbados
Chapter 4
Dimensionality Reduction for Interactive Visual Clustering: A Comparative Analysis
P. Alagambigai, Easwari Engineering College, India
K. Thangavel, Periyar University, India
Chapter 5
Database Analysis with ANNs by Means of Graph Evolution
Daniel Rivero, University of A Coruña, Spain
Julián Dorado, University of A Coruña, Spain
Juan R. Rabuñal, University of A Coruña, Spain
Alejandro Pazos, University of A Coruña, Spain
Chapter 6
An Optimal Categorization of Feature Selection Methods for Knowledge Discovery
Harleen Kaur, Hamdard University, India
Ritu Chauhan, Hamdard University, India
M. A. Alam, Hamdard University, India
Chapter 7
From Data to Knowledge: Data Mining
Tri Kurniawan Wijaya, Sekolah Tinggi Teknik Surabaya, Indonesia
Section 2
Applications of Mining and Visualization
Chapter 8
Patent Infringement Risk Analysis Using Rough Set Theory
Chun-Che Huang, National Chi Nan University, Taiwan
Tzu-Liang (Bill) Tseng, The University of Texas at El Paso, USA
Hao-Syuan Lin, National Chi Nan University, Taiwan
Chapter 9
Visual Survey Analysis in Marketing
Marko Robnik-Šikonja, University of Ljubljana, Slovenia
Koen Vanhoof, University of Hasselt, Belgium
Chapter 10
Assessing Data Mining Approaches for Analyzing Actuarial Student Success Rate
Alan Olinsky, Bryant University, USA
Phyllis Schumacher, Bryant University, USA
John Quinn, Bryant University, USA
Chapter 11
A Robust Biclustering Approach for Effective Web Personalization
H. Hannah Inbarani, Periyar University, India
K. Thangavel, Periyar University, India
Chapter 12
Web Mining and Social Network Analysis
Roberto Marmo, University of Pavia, Italy
Section 3
Visual Systems, Software and Supercomputing
Chapter 13
iVAS: An Interactive Visual Analytic System for Frequent Set Mining
Carson Kai-Sang Leung, The University of Manitoba, Canada
Christopher L. Carmichael, The University of Manitoba, Canada
Chapter 14
Mammogram Mining Using Genetic Ant-Miner
K. Thangavel, Periyar University, India
R. Roselin, Sri Sarada College for Women, India
Chapter 15
Use of SciDBMaker as Tool for the Design of Specialized Biological Databases
Riadh Hammami, Université Laval, Canada
Ismail Fliss, Université Laval, Canada
Chapter 16
Interactive Visualization Tool for Analysis of Large Image Databases
Anca Doloc-Mihu, Emory University, USA
Chapter 17
Supercomputers and Supercomputing
Jeffrey S. Cook, Arkansas State University, USA
Section 1
Concepts, Algorithms, and Theory
Chapter 1
Towards the Notion of Typical Documents in Large Collections of Documents
Mieczysław A. Kłopotek, Polish Academy of Sciences, Poland & University of Natural
and Human Sciences, Poland
Sławomir T. Wierzchoń, Polish Academy of Sciences & University of Gdańsk, Poland
Krzysztof Ciesielski, Polish Academy of Sciences, Poland
Michał Dramiński, Polish Academy of Sciences, Poland
Dariusz Czerski, Polish Academy of Sciences, Poland
The chapter focuses on how best to represent a typical document in a large collection of objects (i.e.,
documents). The authors propose a new measure of document similarity, GNGrank, inspired by the
popular idea that links between documents reflect similar content. The idea is to create a rank measure,
based on the well-known PageRank algorithm, that exploits document similarity to insert links between
the documents. Various link-based similarity measures (e.g., PageRank) and GNGrank are compared in
the context of identifying a typical document of a collection. The experimental results suggest that
each algorithm measures a different aspect of the document space, and hence the respective degrees of
typicality do not correlate.
Chapter 2
Data Mining Techniques for Outlier Detection
N N R Ranga Suri, C V Raman Nagar, India
M Narasimha Murty, Indian Institute of Science, India
G Athithan, C V Raman Nagar, India
The chapter highlights some of the important research issues that determine the nature of the outlier
detection algorithm required for a typical data mining application. Detecting the objects in a data set
with unusual properties is important, since such outlier objects often contain useful information on
abnormal behavior of the system or its components described by the data set. The authors discuss issues
including methods of outlier detection, the size and dimensionality of the data set, and the nature of
the target application. They attempt to cover the challenges posed by large volumes of high-dimensional
data and possible research directions, with a survey of various data mining techniques that deal with
the outlier detection problem.
Chapter 3
Using an Ontology-Based Framework to Extract External Web Data for the Data Warehouse
Charles Greenidge, University of the West Indies, Barbados
Hadrian Peter, University of the West Indies, Barbados
The chapter proposes a meta-data engine for extracting external Web data for data warehouses, forming
a bridge between the data warehouse and search engine environments. The chapter also presents a
framework, named the semantic web application, that facilitates semi-automatic matching of instance
data from opaque web databases using ontology terms. The framework combines information retrieval,
information extraction, natural language processing, and ontology techniques to produce a viable
building block for semantic web applications. The application uses a query-modifying filter to maximize
efficiency in the search process. The ontology-based model consists of a pre-processing stage aimed at
filtering, basic and then more advanced matching phases, a combination of thresholds and a weighting
that produces a matrix that is further normalized, and a labeling process that matches data items to
ontology terms.
Chapter 4
Dimensionality Reduction for Interactive Visual Clustering: A Comparative Analysis
P. Alagambigai, Easwari Engineering College, India
K. Thangavel, Periyar University, India
The chapter discusses VISTA, a Visual Clustering Rendering System that can include algorithmic
clustering results and serve as an effective validation and refinement tool for irregularly shaped clusters.
Interactive visual clustering methods allow users to partition a data set into clusters that are appropriate
for their tasks and interests through an efficient visualization model, and they require effective
human-computer interaction. The chapter pursues reliable human-computer interaction through
dimensionality reduction by comparing three dimensionality reduction methods: (1) Entropy Weighting
Feature Selection (EWFS), (2) Outlier Score Based Feature Selection (OSFS), and (3) Contribution to
the Entropy based Feature Selection (CEFS). The performance of the three feature selection methods
is compared with clustering of the data set using the whole set of features. Performance is measured
with the Rand Index, a popular validity measure.
Chapter 5
Database Analysis with ANNs by Means of Graph Evolution
Daniel Rivero, University of A Coruña, Spain
Julián Dorado, University of A Coruña, Spain
Juan R. Rabuñal, University of A Coruña, Spain
Alejandro Pazos, University of A Coruña, Spain
The chapter proposes a new graph-evolution-based technique for developing Artificial Neural Networks
(ANNs) and compares it with other systems such as Connectivity Matrix, Pruning, Finding network
parameters, and Graph-rewriting grammar. Traditionally, the development of ANNs is a slow process
guided by expert knowledge. The chapter describes a new method that makes the development of ANNs
completely automated. Several tests were performed with some of the most widely used test databases
in data mining. The performance of the proposed system is better than or on par with that of the
other systems.
Chapter 6
An Optimal Categorization of Feature Selection Methods for Knowledge Discovery
Harleen Kaur, Hamdard University, India
Ritu Chauhan, Hamdard University, India
M. A. Alam, Hamdard University, India
The chapter focuses on several feature selection methods and their effectiveness in preprocessing input
medical data. Feature selection is an active research area in the pattern recognition and data mining
communities. The authors evaluate several feature selection algorithms, such as Mutual Information
Feature Selection (MIFS), Fast Correlation-Based Filter (FCBF), and Stepwise Discriminant Analysis
(STEPDISC), with the naive Bayesian machine learning algorithm and Linear Discriminant Analysis
techniques. The experimental analysis of feature selection techniques on medical databases shows that
a small number of informative features can be extracted, leading to improved medical diagnosis by
reducing the size of the data set, eliminating irrelevant features, and decreasing the processing time.
Chapter 7
From Data to Knowledge: Data Mining
Tri Kurniawan Wijaya, Sekolah Tinggi Teknik Surabaya, Indonesia
The chapter conceptually discusses techniques for mining the hidden information or knowledge that lies
in data. In addition to elaborating the concepts and theory, the author discusses the application and
implementation of data mining. The chapter starts with the differences among data, information, and
knowledge, then describes the process of gaining the hidden knowledge, and compares data mining
with other closely related terms such as data warehouse and OLAP.
Section 2
Applications of Mining and Visualization
Chapter 8
Patent Infringement Risk Analysis Using Rough Set Theory
Chun-Che Huang, National Chi Nan University, Taiwan
Tzu-Liang (Bill) Tseng, The University of Texas at El Paso, USA
Hao-Syuan Lin, National Chi Nan University, Taiwan
The chapter applies rough set theory (RST), which is suitable for processing qualitative information,
to induce rules and derive significant attributes for categorizing patent infringement risk. Patent
infringement risk is an important issue for firms due to the increased appreciation of intellectual
property rights. If a firm gives insufficient protection to its patents, it may lose both profits and
industry competitiveness. Rather than focusing on measuring patent trend indicators and patent monetary
value, the authors integrate RST with the concept hierarchy and the credibility index to enhance the
application of the final decision rules.
Chapter 9
Visual Survey Analysis in Marketing
Marko Robnik-Šikonja, University of Ljubljana, Slovenia
Koen Vanhoof, University of Hasselt, Belgium
The chapter uses the ordinal evaluation (OrdEval) algorithm as a visualization technique to study
questionnaire data on customer satisfaction in marketing. The OrdEval algorithm has many favorable
features, including context sensitivity, the ability to exploit the meaning of ordered features and
ordered responses, robustness to noise and missing values in the data, and visualization capability. The
authors choose customer satisfaction analysis as a case study and present visual analyses of two
applications, business-to-business and customer-to-business. They demonstrate some advantages offered
by the new methodology and visualization and show how to extract and interpret new insights not
available with the classical analytical toolbox.
Chapter 10
Assessing Data Mining Approaches for Analyzing Actuarial Student Success Rate
Alan Olinsky, Bryant University, USA
Phyllis Schumacher, Bryant University, USA
John Quinn, Bryant University, USA
The chapter uses several types of predictive models to perform data mining to evaluate student
retention and enrollment management for students selecting a major in Actuarial Science at a
medium-sized university. The predictive models used in this research include stepwise logistic
regression, neural networks, and decision trees. The chapter uses data mining to investigate the
percentage of students who begin in a certain major and graduate in the same major. This information
is important for individual academic departments in deciding how to allocate limited resources, such
as the appropriate number of classes and sections to offer and the number of faculty lines needed to
staff the department. The chapter details a study that uses data mining techniques to analyze the
characteristics of students who enroll as actuarial mathematics students and then either drop out of
the major or graduate as actuarial students.
Chapter 11
A Robust Biclustering Approach for Effective Web Personalization
H. Hannah Inbarani, Periyar University, India
K. Thangavel, Periyar University, India
The chapter proposes a robust Biclustering (RB) algorithm that discloses the correlation between users
and pages based on constant values, integrating user clustering and page clustering techniques. It is
followed by a recommendation system that can respond to the users’ individual interests. The proposed
method is compared with the Simple Biclustering (SB) method. To evaluate the effectiveness and
efficiency of the recommendations, experiments are conducted in terms of the recommendation accuracy
metric. The experimental results demonstrate that the proposed RB method is very simple and is able
to efficiently extract the needed usage knowledge and to make accurate web recommendations.
Chapter 12
Web Mining and Social Network Analysis
Roberto Marmo, University of Pavia, Italy
The chapter reviews and discusses the use of web mining techniques and social network analysis to
process and analyze large amounts of social data from sources such as blogtagging, online game playing,
and instant messaging. Social network analysis views social relationships in terms of network and graph
theory, with nodes (individual actors within the network) and ties (relationships between the actors).
In this way, social network mining can help in understanding social structure, social relationships, and
social behaviours. These algorithms differ from the established set of data mining algorithms developed
to analyze individual records, since social network datasets are relational, with relations among
entities at the center.
Section 3
Visual Systems, Software and Supercomputing
Chapter 13
iVAS: An Interactive Visual Analytic System for Frequent Set Mining
Carson Kai-Sang Leung, The University of Manitoba, Canada
Christopher L. Carmichael, The University of Manitoba, Canada
The chapter proposes an interactive visual analytic system called iVAS that provides visual analytic
solutions to the frequent set mining problem. The system enables the visualization and advanced analysis
of the original transaction databases as well as the frequent sets mined from these databases. Numerous
algorithms have been proposed for finding frequent sets of items, which are usually presented in a
lengthy textual list. However, visual representations can enhance user understanding of the
inherent relations among the frequent sets.
Chapter 14
Mammogram Mining Using Genetic Ant-Miner
K. Thangavel, Periyar University, India
R. Roselin, Sri Sarada College for Women, India
The chapter applies classification to image processing (e.g., mammogram processing) using genetic
Ant-Miner. Image mining deals with the extraction of implicit knowledge, image data relationships,
or other patterns not explicitly stored in the images. It is an extension of data mining to the image
domain and an interdisciplinary endeavor. The C4.5 and Ant-Miner algorithms are compared, and the
experimental results show that Ant-Miner performs better in the domain of biomedical image analysis.
Chapter 15
Use of SciDBMaker as Tool for the Design of Specialized Biological Databases
Riadh Hammami, Université Laval, Canada
Ismail Fliss, Université Laval, Canada
The chapter develops SciDBMaker, a tool for easily building new specialized protein knowledge bases.
The exponential growth of molecular biology research in recent decades has brought growth in the
number and size of genomic and proteomic databases that enhance the understanding of biological
processes. The chapter also suggests best practices for the design of specialized biological databases
and provides examples of implementing these practices.
Chapter 16
Interactive Visualization Tool for Analysis of Large Image Databases
Anca Doloc-Mihu, Emory University, USA
The chapter discusses an Adaptive Image Retrieval System (AIRS) used as a tool for actively
searching for information in large image databases. The chapter identifies two types of users of an
AIRS: an end-user who seeks images and a research-user who designs and researches the collection
and retrieval systems. The chapter focuses on the visualization techniques used by a Web-based AIRS
to allow different users to efficiently navigate, search, and analyze large image databases. Recent
advances in Internet technology require the development of advanced Web-based tools for efficiently
accessing images from tremendously large and continuously growing image collections. One such tool
for actively searching for information is an Image Retrieval System. The interface discussed in this
chapter illustrates different relationships between images by using visual attributes (colors, shape, and
proximities), and supports retrieval and learning as well as browsing, which makes it suitable for an
Adaptive Image Retrieval System.
Chapter 17
Supercomputers and Supercomputing
Jeffrey S. Cook, Arkansas State University, USA
The chapter describes the supercomputer as the fastest type of computer, used for specialized applications
that require a massive number of mathematical calculations. The term “supercomputer” was coined in
1929 by the New York World, referring to tabulators manufactured by IBM. Supercomputers represent
the cutting edge of technology, harnessing immense processing power so that they are incredibly
fast, sophisticated, and powerful. The chapter also discusses the use of supercomputing in data
mining.
Preface
Large volumes of data and complex problems inspire research in computing and in data, text, and web
mining. However, analyzing data is not sufficient: the results have to be presented visually with
analytical capabilities, i.e., a chart, diagram, or image illustration that enables humans to perceive,
relate, and conclude in the knowledge discovery process. In addition, how to use computing or
supercomputing techniques (e.g., distributed, parallel, and clustered computing) to improve the
effectiveness of data, text, and web mining is an important aspect of visual analytics and interactive
technology. This book extends visual analytics by using the tools of data, web, and text mining and
computing, and their associated software and technologies available today.
This is a comprehensive book on concepts, algorithms, theories, applications, software, and
visualization of data mining and computing. It provides a coherent set of related works on the state of
the art of the theory and applications of mining and its relations to computing, visualization, and other
fields. The intended audience is multi-disciplinary and includes researchers, practitioners, professionals,
and intellectuals in technical and non-technical fields. Because each chapter is designed to be
stand-alone, readers can focus on the topics that most interest them.
With a unique collection of recent developments, novel applications, and techniques for visual
analytics and interactive technologies, the book is organized into three sections: Concepts, Algorithms,
and Theory; Applications of Mining and Visualization; and Visual Systems, Software and Supercomputing.
The sections pertain to Data Mining, Web Mining, Data Visualization, Mining for Intelligence,
Supercomputing, Database, Ontology, Web Clustering, Classification, Pattern Recognition, Visualization
Approaches, Data and Knowledge Representation, and Web Intelligence.
Section 1 consists of seven chapters on concepts, algorithms, and theory of mining and visualizations.
Chapter 1, Towards the Notion of Typical Documents in Large Collections of Documents, by Mieczysław
A. Kłopotek, Sławomir T. Wierzchoń, Krzysztof Ciesielski, Michał Dramiński, and Dariusz Czerski,
focuses on how best to represent a typical document in a large collection of objects (i.e., documents).
The authors propose a new measure of document similarity, GNGrank, inspired by the popular idea
that links between documents reflect similar content. The idea is to create a rank measure, based on
the well-known PageRank algorithm, that exploits document similarity to insert links between the
documents. Various link-based similarity measures (e.g., PageRank) and GNGrank are compared in the
context of identifying a typical document of a collection. The experimental results suggest that each
algorithm measures a different aspect of the document space, and hence the respective
degrees of typicality do not correlate.
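The general idea of ranking documents over similarity-derived links can be illustrated with a small sketch. This is not GNGrank, whose details are not given here; it is plain PageRank run on links inserted wherever cosine similarity exceeds a threshold. The toy vectors, the 0.8 threshold, and all function names are assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_links(vectors, threshold):
    """Insert a link i -> j whenever documents i and j are similar enough."""
    n = len(vectors)
    links = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(n):
            if i != j and cosine(vectors[i], vectors[j]) >= threshold:
                links[i].append(j)
    return links


def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank; dangling nodes spread their rank evenly."""
    n = len(links)
    rank = {i: 1.0 / n for i in links}
    for _ in range(iterations):
        new = {i: (1.0 - damping) / n for i in links}
        for i, outs in links.items():
            targets = outs if outs else list(links)
            share = damping * rank[i] / len(targets)
            for j in targets:
                new[j] += share
        rank = new
    return rank


# Hypothetical term-weight vectors: three similar documents and one unrelated one.
docs = [
    [1.0, 1.0, 0.0],
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.0],
    [0.0, 0.1, 1.0],
]
links = similarity_links(docs, threshold=0.8)
ranks = pagerank(links)
most_typical = max(ranks, key=ranks.get)
```

In this toy collection the unrelated document receives no similarity links and therefore ranks lowest, while the three mutually similar documents share the top rank.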
Chapter 2, Data Mining Techniques for Outlier Detection, by N. Ranga Suri, M Narasimha Murty,
and G Athithan, highlights some of the important research issues that determine the nature of the outlier
detection algorithm required for a typical data mining application. Detecting the objects in a data set with
unusual properties is important, as such outlier objects often contain useful information on abnormal
behavior of the system or its components described by the data set. The authors discuss issues including
methods of outlier detection, the size and dimensionality of the data set, and the nature of the target
application. They attempt to cover the challenges posed by large volumes of high-dimensional data and
possible research directions, with a survey of various data mining techniques that deal with the outlier
detection problem.
Chapter 3, Using an Ontology-based Framework to Extract External Web Data for the Data Warehouse,
by Charles Greenidge and Hadrian Peter, proposes a meta-data engine for extracting external Web
data for data warehouses, forming a bridge between the data warehouse and search engine
environments. The chapter also presents a framework, named the semantic web application, that
facilitates semi-automatic matching of instance data from opaque web databases using ontology terms. The
framework combines information retrieval, information extraction, natural language processing, and
ontology techniques to produce a viable building block for semantic web applications. The application
uses a query-modifying filter to maximize efficiency in the search process. The ontology-based model
consists of a pre-processing stage aimed at filtering, basic and then more advanced matching phases,
a combination of thresholds and a weighting that produces a matrix that is further normalized, and a
labeling process that matches data items to ontology terms.
Chapter 4, Dimensionality Reduction for Interactive Visual Clustering: A Comparative Analysis,
by P. Alagambigai and K. Thangavel, discusses VISTA, a Visual Clustering Rendering System that
can include algorithmic clustering results and serve as an effective validation and refinement tool for
irregularly shaped clusters. Interactive visual clustering methods allow users to partition a data set into
clusters that are appropriate for their tasks and interests through an efficient visualization model, and
they require effective human-computer interaction. The chapter pursues reliable human-computer
interaction through dimensionality reduction by comparing three dimensionality reduction methods:
(1) Entropy Weighting Feature Selection (EWFS), (2) Outlier Score Based Feature
Selection (OSFS), and (3) Contribution to the Entropy based Feature Selection (CEFS). The performance
of the three feature selection methods is compared with clustering of the data set using the whole set of
features. Performance is measured with the Rand Index, a popular validity measure.
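For readers unfamiliar with the Rand Index, the following minimal sketch shows the standard pair-counting definition: the fraction of point pairs on which two clusterings agree (both place the pair together, or both place it apart). The label lists and the function name are illustrative assumptions, not data or code from the chapter.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two points")
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        # A pair counts as agreement when both clusterings group it together
        # or both clusterings keep it apart.
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)


# Hypothetical labels: a reference clustering and one obtained after feature selection.
reference = [0, 0, 1, 1]
candidate = [0, 0, 1, 2]
score = rand_index(reference, candidate)
```

Here five of the six point pairs are treated consistently, so the score is 5/6. Cluster names do not matter: relabeling a clustering leaves the index unchanged.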
Chapter 5, Database Analysis with ANNs by Means of Graph Evolution, by Daniel Rivero, Julián
Dorado, Juan R. Rabuñal, and Alejandro Pazos, proposes a new graph-evolution-based technique for
developing Artificial Neural Networks (ANNs) and compares it with other systems such as Connectivity
Matrix, Pruning, Finding network parameters, and Graph-rewriting grammar. Traditionally, the
development of ANNs is a slow process guided by expert knowledge. The chapter describes a new method
that makes the development of ANNs completely automated. Several tests were performed
with some of the most widely used test databases in data mining. The performance of the proposed
system is better than or on par with that of the other systems.
Chapter 6, An Optimal Categorization of Feature Selection Methods for Knowledge Discovery,
by Harleen Kaur, Ritu Chauhan, and M. A. Alam, focuses on several feature selection methods and
their effectiveness in preprocessing input medical data. Feature selection is an active research area in
the pattern recognition and data mining communities. The authors evaluate several feature selection
algorithms, such as Mutual Information Feature Selection (MIFS), Fast Correlation-Based Filter (FCBF),
and Stepwise Discriminant Analysis (STEPDISC), with the naive Bayesian machine learning algorithm
and Linear Discriminant Analysis techniques. The experimental analysis of feature selection techniques
on medical databases shows that a small number of informative features can be extracted, leading to
improved medical diagnosis by reducing the size of the data set, eliminating irrelevant features, and
decreasing the processing time.
Chapter 7, From Data to Knowledge: Data Mining, by Tri Kurniawan Wijaya, conceptually discusses
techniques for mining hidden information or knowledge that lies in data. In addition to elaborating
the concept and theory, the chapter discusses the application and implementation of data mining.
It starts with the differences among data, information, and knowledge, then describes the
process of gaining the hidden knowledge, and compares data mining with other closely related terms
such as data warehouse and OLAP.
Section 2 consists of five chapters on applications of mining and visualizations.
Chapter 8, Patent Infringement Risk Analysis Using Rough Set Theory, by Chun-Che Huang, Tzu-
Liang (Bill) Tseng, and Hao-Syuan Lin, applies rough set theory (RST), which is suitable for processing
qualitative information, to induce rules to derive significant attributes for categorization of the patent
infringement risk. Patent infringement risk is an important issue for firms due to the increased appreciation
of intellectual property rights. If a firm gives insufficient protection to its patents, it may lose both
profits and industry competitiveness. Rather than focusing on measuring the patent trend indicators and
the patent monetary value, the authors integrate RST with the concept hierarchy and the credibility
index to enhance the application of the final decision rules.
Chapter 9, Visual Survey Analysis in Marketing, by Marko Robnik-Šikonja and Koen Vanhoof, makes
use of the ordinal evaluation (OrdEval) algorithm as a visualization technique to study questionnaire
data of customer satisfaction in marketing. The OrdEval algorithm has many favorable features, including
context sensitivity, the ability to exploit the meaning of ordered features and ordered responses, robustness
to noise and missing values in the data, and visualization capability. The authors choose customer satisfaction
analysis as a case study and present visual analysis of two applications: business-to-business and
customer-to-business. They demonstrate some interesting advantages offered by the new methodology
and visualization and show how to extract and interpret new insights not available with the classical
analytical toolbox.
Chapter 10, Assessing Data Mining Approaches for Analyzing Actuarial Student Success Rate, by
Alan Olinsky, Phyllis Schumacher, and John Quinn, uses several types of predictive models
to evaluate the student retention rate and enrollment management for those
selecting a major in Actuarial Science at a medium-sized university. The predictive models utilized
in this research include stepwise logistic regression, neural networks, and decision trees.
This chapter uses data mining to investigate the percentage of students who begin in a
certain major and graduate in the same major. This information is important for individual academic
departments in allocating limited resources and in deciding on the appropriate
number of classes and sections to be offered and the number of faculty lines needed to staff the department.
The chapter details a study that utilizes data mining techniques to analyze the characteristics of
students who enroll as actuarial mathematics students and then either drop out of the major or graduate
as actuarial students.
Chapter 11, A Robust Biclustering Approach for Effective Web Personalization, by H. Hannah Inbarani
and K. Thangavel, proposes a Robust Biclustering (RB) algorithm to disclose the correlation between
users and pages based on constant values for integrating user clustering and page clustering techniques,
which is followed by a recommendation system that can respond to the users’ individual interests. The
proposed method is compared with the Simple Biclustering (SB) method. To evaluate the effectiveness and
efficiency of the recommendation, experiments are conducted in terms of the recommendation accuracy
metric. The experimental results demonstrate that the proposed RB method is very simple and is able
to efficiently extract needed usage knowledge and to accurately make web recommendations.
Chapter 12, Web Mining and Social Network Analysis, by Roberto Marmo, reviews and discusses
the use of web mining techniques and social network analysis to process and analyze large
amounts of social data from blog tagging, online game playing, instant messaging, etc. Social network
analysis views social relationships in terms of network and graph theory, with nodes (individual actors
within the network) and ties (relationships between the actors). In this way, social network mining can
help understand the social structure, social relationships and social behaviours. These algorithms differ
from the established set of data mining algorithms developed to analyze individual records, since social
network datasets are relational, with relations among entities being central.
Section 3 consists of five chapters on visual systems, software and supercomputing.
Chapter 13, iVAS: An Interactive Visual Analytic System for Frequent Set Mining, by Carson Kai-Sang
Leung and Christopher L. Carmichael, proposes an interactive visual analytic system called iVAS for
providing visual analytic solutions to the frequent set mining problem. The system enables the visualiza-
tion and advanced analysis of the original transaction databases as well as the frequent sets mined from
these databases. Numerous algorithms have been proposed for finding frequent sets of items, which are
usually presented in a lengthy textual list. However, the use of visual representations can enhance user
understanding of the inherent relations among the frequent sets.
Chapter 14, Mammogram Mining Using Genetic Ant-Miner, by K. Thangavel and R. Roselin, applies
a classification algorithm to image processing (e.g., mammogram processing) using genetic Ant-Miner.
Image mining deals with the extraction of implicit knowledge, image data relationships, or other patterns
not explicitly stored in the images. It is an extension of data mining to the image domain and an
interdisciplinary endeavor. The C4.5 and Ant-Miner algorithms are compared, and the experimental results
show that Ant-Miner performs better in the domain of biomedical image analysis.
Chapter 15, Use of SciDBMaker as Tool for the Design of Specialized Biological Databases, by Riadh
Hammami and Ismail Fliss, develops SciDBMaker to provide a tool for easy building of new specialized
protein knowledge bases. The exponential growth of molecular biology research in recent decades has
brought growth in the number and size of genomic and proteomic databases to enhance the understanding
of biological processes. This chapter also suggests best practices for the design of specialized biological
databases and provides examples of the implementation of these practices.
Chapter 16, Interactive Visualization Tool for Analysis of Large Image Databases, by Anca Doloc-Mihu,
discusses an Adaptive Image Retrieval System (AIRS) that is used as a tool for actively searching for
information in large image databases. This chapter identifies two types of users for an AIRS: an end-user
who seeks images and a research-user who designs and researches the collection and retrieval systems.
This chapter focuses on visualization techniques used by Web-based AIRS to allow different users to
efficiently navigate, search and analyze large image databases. Recent advances in Internet technology
require the development of advanced Web-based tools for efficiently accessing images from tremendously
large, and continuously growing, image collections. One such tool for actively searching for information
is an Image Retrieval System. The interface discussed in this chapter illustrates different relationships
between images by using visual attributes (colors, shape, and proximities), and supports retrieval and
learning, as well as browsing, which makes it suitable for an Adaptive Image Retrieval System.
Chapter 17, Supercomputers and Supercomputing, by Jeffrey S. Cook, describes the supercomputer as
the fastest type of computer, used for specialized applications that require a massive number of
mathematical calculations. The term “supercomputer” was coined in 1929 by the New York World, referring
to tabulators manufactured by IBM. These tabulators represented the cutting edge of technology,
harnessing immense processing power so that they were incredibly fast, sophisticated, and powerful. The use
of supercomputing in data mining is also discussed in the chapter.
All chapters went through a blind refereeing process before final acceptance. We hope these chapters
are informative, stimulating, and helpful to the readers.
Qingyu Zhang
Arkansas State University, USA
Richard S. Segall
Arkansas State University, USA
Mei Cao
University of Wisconsin-Superior, USA
Acknowledgment
The publication of a book is a cooperative and joint effort and involves many people. We wish to thank
all involved in the solicitation process of book chapters and the review process of the book, without
whose support the book could not have been completed.
Special thanks and gratitude go to the publishing team at IGI Global, in particular to the development
editor Joel Gamon and the acquisition editorial assistant Erika Carter, whose contributions throughout
the process of the book publication have been invaluable.
We want to thank all the authors for their excellent contributions to this book. We are also grateful to all
the reviewers, including most of the contributing authors, who served as referees for chapters written by
other authors, and provided constructive and comprehensive reviews in the double-blind review process.
Qingyu Zhang
Arkansas State University, USA
Richard S. Segall
Arkansas State University, USA
Mei Cao
University of Wisconsin-Superior, USA
May 2010
Section 1
Concepts, Algorithms,
and Theory
Chapter 1
Towards the Notion of
Typical Documents in Large
Collections of Documents
Mieczysław A. Kłopotek
Polish Academy of Sciences, Poland & University of Natural and Human Sciences, Poland
Sławomir T. Wierzchoń
Polish Academy of Sciences & University of Gdańsk, Poland
Krzysztof Ciesielski
Polish Academy of Sciences, Poland
Michał Dramiński
Polish Academy of Sciences, Poland
Dariusz Czerski
Polish Academy of Sciences, Poland
Abstract
This chapter presents a new measure of document similarity – the GNGrank that was inspired by the
popular opinion that links between the documents reflect similar content. The idea was to create a rank
measure based on the well known PageRank algorithm which exploits the document similarity to insert
links between the documents. A comparative study of various link- and content-based similarity mea-
sures, and GNGrank is performed in the context of identification of a typical document of a collection.
The study suggests that each group of them measures something different, a different aspect of document
space, and hence the respective degrees of typicality do not correlate. This may be an indication that
for different purposes different documents may be important. A deeper study of this phenomenon is our
future research goal.
DOI: 10.4018/978-1-60960-102-7.ch001
Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
w_{t,d} for a given term t for a given document d is a function of the frequency with which this term
occurs in the document and the collection. Then the execution of a query against the document
collection is reduced to transforming the query text into a vector in the very same vector space,
and the similarity between each vector and the query is computed as a dot-product of the query
vector and the document vector.
In the simplest case we assign w_{t,d} = 1 if the document contains at least one occurrence of
term t, and w_{t,d} = 0 otherwise. Such a measure does not take into account the fact that a term
used more frequently in the document is more important for it. So w_{t,d} = count(t in d), or, to get a
flatter dependence, w_{t,d} = log(count(t in d) + 1). This was still considered unsatisfactory, as
it gave more weight to common words than to content-expressing ones. So the researchers came
to the conclusion that a penalty for being a common word has to be included, and finally the
so-called tf-idf (term frequency – inverse document frequency) measure was introduced (Manning,
Raghavan & Schütze, 2009):

w_{t,d} = count(t in d) * log(count(docs in collection) / count(docs containing t))

With this formula the “general context” of a document becomes visible and more realistic query
results are achieved. Still, to get rid of the impact of document length on the similarity measure, the
vectors describing documents are normalized so that they are of unit length.
The text documents are not uniformly distributed in the space of terms. So usually, if a clustering
algorithm is run with the above-mentioned similarity measure (dot product of document vectors),
a collection would split into subsets of similar documents that are usually topically related.
At this point we come to a crucial insight. While looking at the similarity relations between
documents within the clusters, the usual approach is to look at the documents from the perspective of
the “general context”. So if by chance documents from the field of medicine and computer science
are present in the collection, then if one compares the medical documents, their similarity is
impaired by the fact that there are also computer science publications there. In our research we
considered this inappropriate and decided that for within-group comparisons we will use
not the general context, but rather the context of the group. So we redefined the tf-idf measure so
that the representation of a document within the group, called “context”, takes into account the
group and not the whole collection.
By producing contextual models we operate simultaneously in two spaces: the space of
documents and the (extended) space of terms. The whole algorithm iteratively adapts: (a) the
representation of documents, (b) the description of contexts by means of the histograms, and (c) the
degrees of membership of documents in the contexts as well as the weights of the terms. As a result of
such a procedure we obtain homogeneous groups of documents together with the description fitted
individually to each group.
Such an approach proved fruitful as it contributed to dimensionality reduction within the
context and to more robust behavior of the subsequent document map generation process and of
incremental updates of the document collection map (Ciesielski & Klopotek, 2007).

Competitive Clustering

The above-mentioned map of a document collection is to be understood as a flat representation
(that is, on a two-dimensional Euclidean plane) of the collection formed in such a way that documents
similar in the term space will appear closer together on the document map. So-called competitive
clustering algorithms are usually used to form such a map. Competitive clustering algorithms, like
WebSOM, GNG or aiNet, are attractive for at least two reasons. First, they adaptively fit
to the internal structure of the data. Second, they
offer a natural possibility of visualizing this structure by projecting high-dimensional input vectors
onto a two-dimensional grid structure (a map composed of discrete patches, called cells). This
map preserves most of the topological information of the input data.
Each cell of the map is described by a so-called reference vector, or “centroid”, being a concise
characteristic of the micro-group defined by such a cell. These centroids attract other input
vectors with a force proportional to their mutual similarity. In effect, weight vectors are ordered
according to their similarity to the cells of the map. Further, the distribution of weight vectors
reflects the density of the input space. Reference vectors of cells neighboring on the map are also
closer (in the original data space) to one another than those of distant cells.

Typical Document Issue

One of the major questions posed about document collections is how to summarize their content. For
purposes of speedy analysis in clustering algorithms the centroids (or eventually medoids) in the
term space are considered. Ciesielski & Klopotek (2007) proposed to extend this representation by
taking into account histograms of term weights (as defined above) within the clusters.
The idea behind the use of histograms may be explained as follows. Note that the weight of a term
in a document usually depends on three factors:

• the number of its occurrences in the document,
• the number of documents containing this term, and
• the length of the document.

A term which occurs several times in a short document and occurs in only a few
other documents is awarded a high weight, as it is characteristic for such a group. Terms that
occur everywhere, or those occurring one time in a very long document, will have low weight.
The histogram then reflects the probability distribution that a particular term occurs with a
given weight in the documents forming a particular context. The terms that have only low weights
in the documents are not important within the context. Those with a strong share of high-weight
occurrences can be considered important in discriminating the documents within the context.
Now, as “typical” we understand the document containing only those terms that are labeled as
important for a given context.
Analogously to content terms, one can build histograms of the distribution of additional
semantic attributes within the context, and the “typical” document can now be defined as one
sharing important (typical) semantic attributes with this context.
Whatever information is used, given the histogram profile of a cluster, typical documents are
filtered out as those most similar to the profile.

Medoidal Document Issue

In a map-like environment there exist new possibilities to analyze links between documents
that may lead to the inclusion of content information in rank computations in a PageRank manner.
For example, we can consider GNG nodes as a special type of “pages” that are linked to one another
according to the GNG links as well as to the documents assigned to them. Besides a link to the GNG node
containing it, a document may be deemed linked to other documents within the same GNG node via
their “natural” hyperlinks. We introduce further links via similarity relationships. A document
may be deemed linked to its k nearest neighboring documents within the same GNG node. These
links may be considered unidirectional. The reason for this is that the relation of “top ranked”
similarity is not a symmetric relation, so that if A is most similar to B among all pages similar to A,
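The asymmetry of nearest-neighbour links described above can be illustrated with a minimal sketch. It assumes the documents of a single GNG node are given as unit-length sparse vectors; the function name knn_links, the document labels and the weights are hypothetical and chosen only for illustration, not taken from the chapter.

```python
def knn_links(vectors, k):
    """Build directed similarity links inside one (assumed) GNG node.

    vectors: {doc_id: {term: weight}} with unit-length sparse vectors.
    Returns {doc_id: [ids of its k most similar documents]}.
    """
    def dot(u, v):
        # Dot product of two sparse vectors stored as dicts.
        return sum(w * v.get(t, 0.0) for t, w in u.items())

    links = {}
    for a, va in vectors.items():
        scored = [(dot(va, vb), b) for b, vb in vectors.items() if b != a]
        # Highest similarity first; ties broken by document id.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        links[a] = [b for _, b in scored[:k]]
    return links


# Hypothetical toy node with three documents over two terms.
node_docs = {
    "A": {"x": 1.0},
    "B": {"x": 0.8, "y": 0.6},
    "C": {"x": 0.6, "y": 0.8},
}
links = knn_links(node_docs, k=1)
print(links)
```

In this toy node, A links to B (its nearest neighbour), while B links to C rather than back to A, so the links have to be treated as directed. Such a directed link set could then be passed to a PageRank-style computation; how the chapter itself combines these links with hyperlinks and GNG links is not shown in this sketch.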
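The tf-idf weighting, unit-length normalization and dot-product similarity introduced earlier in this chapter can be sketched in a few lines. This is a minimal illustration under stated assumptions: documents are already tokenized, the natural logarithm is used, and the names tfidf_vectors and dot as well as the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one unit-length {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # w_{t,d} = count(t in d) * log(count(docs) / count(docs containing t))
        vec = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
        # Normalize to unit length to remove the effect of document length.
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm > 0:
            vec = {t: w / norm for t, w in vec.items()}
        vectors.append(vec)
    return vectors


def dot(u, v):
    """Dot-product similarity of two sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


# Hypothetical toy collection: two medical documents, one computer science document.
docs = [
    "heart disease treatment heart".split(),
    "heart surgery treatment".split(),
    "compiler parser compiler syntax".split(),
]
vecs = tfidf_vectors(docs)
print(dot(vecs[0], vecs[1]), dot(vecs[0], vecs[2]))
```

The two medical documents share weighted terms and obtain a positive similarity, while the medical and computer science documents share none and obtain zero. A contextual variant in the spirit of the chapter would compute the document frequencies within a group rather than over the whole collection; that variant is not implemented here.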
Random documents with unrelated
content Scribd suggests to you:
obtained, as may be plainly seen, from the mineral earths which are
found about the volcano of Virgenes." The paintings were not the
work of the natives found in possession of the country, at least so
the Spaniards decided, and it was considered remarkable that they
had remained through so many centuries fresh and uninjured by
time. The colors were yellow, red, green, and black, and many
designs were placed so high on cliffs that it seemed necessary to
some of the missionaries to suppose the agency of the giants that
were in 'those days.' Indeed, giants' bones were found on the
peninsula, as in all other parts of the country, and the natives are
said to have had a tradition that the paintings were the work of
giants who came from the north. Clavigero mentions one cave
whose walls and roof formed an arch resting on the floor. It was
about fifteen by eighty feet, and the pictures on its walls
represented men and women dressed like Mexicans, but barefooted.
The men had their arms raised and spread apart, and one woman
wore her hair loose and flowing down her back, and also had a
plume. Some animals were noted both native and foreign. One
author says they bore no resemblance to Mexican paintings. A series
of red hands are reported on a cliff near Santiago mission in the
south, and also, towards the sea, some painted fishes, bows,
arrows, and obscure characters. A rock-inscription near Purmo, thirty
leagues from Santiago, seemed to the Spanish observer to contain
Gothic, Hebrew, and Chaldean letters. From all that is known of the
Lower California rock-paintings and inscriptions, there is no reason
to suppose that they differ much from, or at least are superior to,
those in the New Mexican region, of which we shall find so many
specimens in the next chapter. It is not improbable that these ruder
inscriptions and pictures exist in the southern country already
passed over, to a much greater extent than appears in the preceding
pages, but have remained comparatively unnoticed by travelers in
search of more wonderful or perfect relics of antiquity.[X-53]
CERRO DE LAS TRINCHERAS.
Only one monument is known in Sonora, and that only through newspaper reports. It is known
as the Cerro de las Trincheras, and is situated
about fifty miles south-east of Altar. An isolated conical hill has a
spring of water on its summit, also some heaps of loose stones. The
sides of the cerro are encircled by fifty or sixty walls of rough stones;
each about nine feet high and from three to six feet thick, occurring
at irregular intervals of fifty to a hundred feet. Each wall, except that
at the base of the hill, has a gateway, but these entrances occur
alternately on opposite sides of the hill, so that to reach the summit
an enemy would have to fight his way about twenty-five times round
the circumference. One writer tells us that Las Trincheras were first
found—probably by the Spaniards—in 1650; according to another,
the natives say that the fortifications existed in their present state
long before the Spaniards came; and finally Sr C. M. Galan, ex-
governor of Sinaloa and Lower California, a gentleman well
acquainted with all the north-western region, informs me that there
is much doubt among the inhabitants of the locality whether the
walls have not been built since the Spanish Conquest. Sonora also
furnished its quota of giants' bones.[X-54]
Casas Grandes—Chihuahua.
The ruined casas are about half a mile from the modern Mexican
town of the same name, located in a finely chosen site, commanding
a broad view over the fertile valley of the Casas Grandes or San
Miguel river, which valley—or at least the river bottom—is here two
miles wide. This bottom is bounded by a plateau about twenty-five
feet higher, and the ruins are found partly on the bottom and partly
on the more sterile plateau above. They consist of walls, generally
fallen and crumbled into heaps of rubbish, but at some points, as at
the corners and where supported by partition walls, still standing to
a height of from five to thirty feet above the heaps of débris, and
some of them as high as fifty feet, if reckoned from the level of the
ground. The cuts on this and the opposite pages represent views of
the ruins from three different standpoints, as sketched by Mr
Bartlett.
CASAS GRANDES.
Casas Grandes—Chihuahua.
The material of the walls is sun-dried blocks of mud and gravel,
about twenty-two inches thick, and of irregular length, generally
about three feet, probably formed and dried in situ. Of this material
and method of construction more details will be given in the
following chapter on the New Mexican region, where the buildings
are of a similar nature. The walls are in some parts five feet thick,
but were so much damaged at the time of Mr Bartlett's visit that
nothing could be ascertained, at least without excavation, respecting
their finish on either surface. The author of the account in the Album
states that the plaster which covers the blocks is of powdered stone,
but this may be doubted. There is no doubt, however, that they were
plastered on both interior and exterior, with a composition much like
that of which the blocks were made; Escudero found some portions
of the plaster still in place, but does not state what was its
composition. The remains of the main structure, which was
rectangular in its plan, extend over an area measuring about eight
hundred feet from north to south, and two hundred and fifty from
east to west.[X-66] Within this area are three great heaps of ruined
walls, but low connecting lines of débris indicate that all formed one
edifice, or were at least connected by corridors. On the south the
wall, or the heaps indicating its existence, is continuous and regular;
of the northern side nothing is said; but on the east and west the
walls are very irregular, with many angles and projections.
The ground plan of the whole structure could not be made out, at
least in the limited time at Mr Bartlett's disposal. He found, however,
one row of apartments whose plan is shown in the cut. Each of the
six shown is ten by twenty feet, and the small structure in the corner
of each is a pen rather than a room, being only three or four feet
high. In the Album, the usual dimensions of the rooms are given as
about twelve and a half by sixteen and a half feet; one very perfect
room, however, being a little over four feet square. Bartlett found
many rooms altogether too small for sleeping apartments, some of
great size, whose dimensions are not given, and several enclosures
too large to have been covered by a roof, doubtless enclosed
courtyards. One portion of standing wall in the interior had a
doorway narrower at the top than at the bottom, and two circular
openings or windows above it. The explorer of 1842 speaks of
doorways long, square, and round, some of them being walled up at
the bottom so as to form windows.
Area enclosed by the Gila, Rio Grande del Norte, and Colorado—A Land of Mystery—
Wonderful Reports and Adventures of Missionaries, Soldiers, Hunters, Miners, and
Pioneers—Exploration—Railroad Surveys—Classification of Remains—Monuments of
the Gila Valley—Boulder-Inscriptions—The Casa Grande of Arizona—Early Accounts
and Modern Exploration—Adobe Buildings—View and Plans—Miscellaneous remains,
Acequias, and Pottery—Other Ruins on the Gila—Valley of the Rio Salado—Rio
Verde—Pueblo Creek—Upper Gila—Tributaries of the Colorado—Rock-Inscriptions,
Bill Williams Fork—Ruined Cities of the Colorado Chiquito—Rio Puerco—
Lithodendron Creek—Navarro Spring—Zuñi Valley—Arch Spring—Zuñi—Ojo del
Pescado—Inscription Rock—Rio San Juan—Ruins of the Chelly and Chaco Cañons—
Valley of the Rio Grande—Pueblo Towns, Inhabited and in Ruins—The Moqui Towns—
The Seven Cities of Cíbola—Résumé, Comparisons, and Conclusions.
Between the Pima villages and the junction of the San Pedro with
the Gila, stands the most famous ruin of the whole region—the Casa
Grande, or Casa de Montezuma, which it is safe to say has been
mentioned by every writer on American antiquity. Coronado during
his trip from Culiacan to the 'seven cities' in 1540, visited a building
called Chichilticale, or 'red house,' which is supposed with much
reason to have been the Casa Grande. The only account of
Coronado's trip which gives any description of the building is that of
Castañeda, who says, "Chichilticale of which so much had been said
[probably by the guides or natives] proved to be a house in ruins
and without a roof; which seemed, however, to have been fortified.
It was clear that this house, built of red earth, was the work of
civilized people who had come from far away." "A house which had
long been inhabited by a people who came from Cíbola. The earth in
this country is red. The house was large; it seemed to have served
as a fortress."[XI-3]
Father Kino heard of the ruin while visiting the northern missions of
Sonora in the early part of 1694. He was at first incredulous, but the
information having been confirmed by other reports of the natives,
he visited the Casa Grande later in the same year, and said mass
within its walls. Since Kino was not accompanied at the time by
Padre Mange, his secretary, who usually kept the diary of his
expeditions, no definite account resulted from this first visit.[XI-4]
In 1697, however, Padre Kino revisited the place, in company this
time with Mange, who in his diary of the trip wrote what may be
regarded as the first definite description.[XI-5]
CASA GRANDE OF THE GILA.
Padre Jacobo Sedelmair visited the Casa Grande in 1744, but in his narrative he copies Mange's
account. He went further, however, and
discovered other ruins.[XI-6]
AUTHORITIES ON THE CASA GRANDE.
Lieut C. M. Bernal seems to have been military commandant in Kino's expedition, and he also
describes the ruin in his report.[XI-7] Padres
Garcés and Font made a journey in 1775-6, under Capt. Anza, to the
Gila and Colorado valleys, and thence to the missions of Alta
California and the Moqui towns. Both mention the ruin in their
diaries, the latter giving quite a full account. I know not if Padre
Font's diary has ever been printed, but I have in my collection an
English manuscript translation from the original in the archives at
Guadalajara,—perhaps the same copy from which Mr Bartlett made
the extracts which he printed in his work.[XI-8] Font's plan is not
given with the translation, but in Beaumont's Crónica de Mechoacan,
a very important work never published, of which I have a copy made
from the original for the Mexican Imperial Library of Maximilian, I
find a description of the Casa Grande, which appears to have been
quoted literally from Font's diary, and which also contains the ground
plan of the ruined edifice. I shall notice hereafter its variations from
the plan which I shall copy.[XI-9] A brief account was given in the
Rudo Ensayo, written about 1761, and by Velarde in his notice of the
Pimería, written probably toward the close of the eighteenth
century; but neither of these descriptions contained any additional
information, having been made up probably from the preceding.[XI-10]
Finally the Casa Grande has been visited, sketched, and described by
Emory and Johnston, connected with Gen. Kearny's military
expedition to California in 1846; by Bartlett with the Mexican
Boundary Commission in 1852; and by Ross Browne in 1863.[XI-11]
The descriptions of different writers do not differ very materially one
from another, Bartlett's among the later, and Font's of the earlier
accounts being the most complete. From all the authorities I make
up the following description, although the extracts which I have
already given include nearly all that can be said on the subject. The
Casa Grande stands about two miles and a half south of the bank of
the Gila;—that is all the early writers call the distance about a
league; Bartlett and Emory say nothing of the distance, and Ross
Browne says it is half an hour's ride. The Gila valley in this region is
a level bottom of varying width, with nearly perpendicular banks of
earth. Opposite the ruin the bottom is about a mile wide on the
southern bank of the river, and the ruin itself stands on the raised
plateau beyond, surrounded by a thick growth of mesquite with an
occasional pitahaya. The height and nature of the ascent from the
bottom to the plateau at this particular point are not stated; but
from the fact that acequias are reported leading from the river to the
buildings, it would seem that the ascent must be very slight and
gradual.
The appearance of the ruins in 1863 is shown in the cut as sketched
by Ross Browne. Other sketches by Bartlett, Emory, and Johnston,
agree very well with the one given, but none of them indicate the
presence of the mesquite forest mentioned in Mr Bartlett's text. The
material of the buildings is adobe,[XI-12] that is, the ordinary mud of
the locality mixed with gravel. Most writers say nothing of its color,
although Bernal in 1697 pronounced it 'white clay,' and Johnston
also says it is white, probably with an admixture of lime, which, as
he states, is abundant in the vicinity. Mr Hutton, a civil engineer well
acquainted with the ruins, assured Mr Simpson that the surrounding
earth is of a reddish color, although by reason of the pebbles the
Casa has a whitish appearance in certain reflections. This matter of
color is of no great importance except to prove the identity of the
building with Castañeda's Chichilticale, which he expressly states to
have been built of red earth.[XI-13] The material instead of being
formed into small rectangular or brick-shaped blocks, as is
customary in all Spanish American countries to this day, seems in
this aboriginal structure to have been molded—perhaps by means of
wooden boxes—and dried where it was to remain in the walls, in
blocks of varying size, but generally four feet long by two feet in
width and thickness. The outer surface of the walls was plastered
with the same material which constituted the blocks, and the inner
walls were hard-finished with a finer composition of the same
nature, which in many parts has retained its smooth and even
polished surface. Adobe is a very durable building-material, so long
as a little attention is given to repairs, but it is really wonderful that
the walls of the Casa Grande have resisted, uncared for, the ravages
of time and the elements for over three hundred years of known
age, and of certainly a century—perhaps much more—of pre-Spanish
existence.
Casa Grande of the Gila.
The buildings that still have upright walls are three in number, and in
the largest of these both the exterior and interior walls are so nearly
perfect as to show accurately not only the original form and size, but
the division of the interior into apartments. Its dimensions on the
ground are fifty feet from north to south, by forty feet from east to
west. The outer wall is about five feet thick at the base, diminishing
slightly towards the top, in a curved line on the exterior, but
perpendicular on the inside.[XI-14] The interior is divided by partition
walls, slightly thinner than the others, into five apartments, as
shown in the accompanying ground plan taken from Bartlett. Font's
plan given by Beaumont agrees with this, except that additional
doors are represented at the points marked with a dot, and no
doorway is indicated at a. The three central rooms are each about
eight by fourteen feet, and the others ten by thirty-two feet, as
nearly as may be estimated from Bartlett's plan and the statements
of other writers.[XI-15] The doors in the centre of each façade are
three feet wide and five feet high, and somewhat narrower at the
top than at the bottom, except that on the western front, which is
two by seven or eight feet. There are some small windows, both
square and circular, in the outer and inner walls. The following cut
shows an elevation of the side and end, also from Bartlett.[XI-16]
Remains of floor timbers show that the main walls were three stories
high, or, as the lower rooms are represented by Font as about ten
English feet high, about thirty feet in height; while the central
portion is eight or ten feet—probably one story—higher. Mr Bartlett
judged from the mass of débris within that the main building had
originally four stories; but as the earliest visitors speak of three and
four stories—some referring to the central, others apparently to the
outer portions—there would seem to be no satisfactory evidence
that the building was over forty feet high, although it is possible that
the outer and inner walls were originally of the same height.
Respecting the arrangement of apartments in the upper stories,