Big Data Analytics in Cybersecurity First Edition Deng - The ebook in PDF format with all chapters is ready for download

The document provides information about various ebooks related to Big Data Analytics, particularly in the field of cybersecurity, available for download at textbookfull.com. It highlights several titles, including 'Big Data Analytics in Cybersecurity' and 'Leadership Strategies in the Age of Big Data Algorithms and Analytics,' among others. Additionally, it outlines the structure of a specific book on Big Data Analytics in Cybersecurity, detailing its chapters and the expertise of its contributors.


Big Data Analytics
in Cybersecurity
Data Analytics Applications
Series Editor: Jay Liebowitz

PUBLISHED

Actionable Intelligence for Healthcare
by Jay Liebowitz, Amanda Dawson
ISBN: 978-1-4987-6665-4

Data Analytics Applications in Latin America and Emerging Economies
by Eduardo Rodriguez
ISBN: 978-1-4987-6276-2

Sport Business Analytics: Using Data to Increase Revenue and Improve Operational Efficiency
by C. Keith Harrison, Scott Bukstein
ISBN: 978-1-4987-6126-0

Big Data and Analytics Applications in Government: Current Practices and Future Opportunities
by Gregory Richards
ISBN: 978-1-4987-6434-6

Data Analytics Applications in Education
by Jan Vanthienen and Kristoff De Witte
ISBN: 978-1-4987-6927-3

Big Data Analytics in Cybersecurity
by Onur Savas and Julia Deng
ISBN: 978-1-4987-7212-9

FORTHCOMING

Data Analytics Applications in Law
by Edward J. Walters
ISBN: 978-1-4987-6665-4

Data Analytics for Marketing and CRM
by Jie Cheng
ISBN: 978-1-4987-6424-7

Data Analytics in Institutional Trading
by Henri Waelbroeck
ISBN: 978-1-4987-7138-2
Big Data Analytics
in Cybersecurity

Edited by
Onur Savas
Julia Deng
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2017 by Taylor & Francis Group, LLC


CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper

International Standard Book Number-13: 978-1-4987-7212-9 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts
have been made to publish reliable data and information, but the author and publisher cannot assume
responsibility for the validity of all materials or the consequences of their use. The authors and publishers
have attempted to trace the copyright holders of all material reproduced in this publication and apologize
to copyright holders if permission to publish in this form has not been obtained. If any copyright material
has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter
invented, including photocopying, microfilming, and recording, or in any information storage or retrieval
system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood
Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and
registration for a variety of users. For organizations that have been granted a photocopy license by the
CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at


http://www.taylorandfrancis.com

and the CRC Press Web site at


http://www.crcpress.com
Contents

Preface
About the Editors
Contributors

Section I APPLYING BIG DATA INTO DIFFERENT CYBERSECURITY ASPECTS

1 The Power of Big Data in Cybersecurity
SONG LUO, MALEK BEN SALEM, AND YAN ZHAI

2 Big Data for Network Forensics
YI CHENG, TUNG THANH NGUYEN, HUI ZENG, AND JULIA DENG

3 Dynamic Analytics-Driven Assessment of Vulnerabilities and Exploitation
HASAN CAM, MAGNUS LJUNGBERG, AKHILOMEN ONIHA, AND ALEXIA SCHULZ

4 Root Cause Analysis for Cybersecurity
ENGIN KIRDA AND AMIN KHARRAZ

5 Data Visualization for Cybersecurity
LANE HARRISON

6 Cybersecurity Training
BOB POKORNY

7 Machine Unlearning: Repairing Learning Models in Adversarial Environments
YINZHI CAO

Section II BIG DATA IN EMERGING CYBERSECURITY DOMAINS

8 Big Data Analytics for Mobile App Security
DOINA CARAGEA AND XINMING OU

9 Security, Privacy, and Trust in Cloud Computing
YUHONG LIU, RUIWEN LI, SONGJIE CAI, AND YAN (LINDSAY) SUN

10 Cybersecurity in Internet of Things (IoT)
WENLIN HAN AND YANG XIAO

11 Big Data Analytics for Security in Fog Computing
SHANHE YI AND QUN LI

12 Analyzing Deviant Socio-Technical Behaviors Using Social Network Analysis and Cyber Forensics-Based Methodologies
SAMER AL-KHATEEB, MUHAMMAD HUSSAIN, AND NITIN AGARWAL

Section III TOOLS AND DATASETS FOR CYBERSECURITY

13 Security Tools
MATTHEW MATCHEN

14 Data and Research Initiatives for Cybersecurity Analysis
JULIA DENG AND ONUR SAVAS

Index
Preface

Cybersecurity is the protection of information systems, both hardware and software,
from theft, unauthorized access, and disclosure, as well as intentional or
accidental harm. It protects all segments pertaining to the Internet, from networks
themselves to the information transmitted over the network and stored in databases,
to various applications, and to devices that control equipment operations
via network connections. With the emergence of new advanced technologies such
as cloud, mobile computing, fog computing, and the Internet of Things (IoT), the
Internet has become, and will continue to become, more ubiquitous. While this ubiquity
makes our lives easier, it creates unprecedented challenges for cybersecurity. Nowadays
it seems that not a day goes by without a new story on the topic of cybersecurity: a
security incident involving information leakage, an abuse of an emerging technology
such as autonomous car hacking, or software we have been using for years that is
now deemed dangerous because of newly found security vulnerabilities.
So, why can’t these cyberattacks be stopped? The answer is complicated,
partly because of the dependency on legacy systems, human error,
or simple inattention to security. In addition, the changing and
increasingly complex threat landscape makes traditional cybersecurity mechanisms
inadequate and ineffective. Big data makes the situation worse and presents
additional challenges to cybersecurity. For example, the IoT will generate a
staggering 400 zettabytes (ZB) of data a year by 2018, according to a report from
Cisco. Self-driving cars will soon create significantly more data than people:
3 billion people’s worth of data, according to Intel. A car driven an average amount
will churn out 4000 GB of data per day, and that assumes just one hour of driving a day.
Big data analytics, as an emerging analytical technology, offers the capability
to collect, store, process, and visualize big data; therefore, applying big data
analytics in cybersecurity has become critical and is a growing trend. By exploiting
data from networks and computers, analysts can discover useful information
using analytic techniques and processes. Decision makers can then make better-informed
decisions by taking advantage of the analysis, including what actions
need to be performed and what improvements to recommend for policies, guidelines,
procedures, tools, and other aspects of the network processes.


This book provides comprehensive coverage of a wide range of complementary
topics in cybersecurity. The topics include but are not limited to network forensics,
threat analysis, vulnerability assessment, visualization, and cyber training. In addition,
emerging security domains such as the IoT, cloud computing, fog computing,
mobile computing, and cyber-social networks are studied. The target audience of
this book includes both beginners and more experienced security professionals. Readers
with data analytics experience but no cybersecurity or IT experience, or readers with
cybersecurity experience but no data analytics experience, will hopefully find the book informative.
The book consists of 14 chapters, organized into three parts, namely
“Applying Big Data into Different Cybersecurity Aspects,” “Big Data in Emerging
Cybersecurity Domains,” and “Tools and Datasets for Cybersecurity.” The first part
includes Chapters 1–7, focusing on how big data analytics can be used in different
cybersecurity aspects. The second part includes Chapters 8–12, discussing big
data challenges and solutions in emerging cybersecurity domains, and the last part,
Chapters 13 and 14, presents the tools and datasets for cybersecurity research. The
authors are experts in their respective domains, and are from academia, government
labs, and industry.
Chapter 1, “The Power of Big Data in Cybersecurity,” is written by Song Luo and
Malek Ben Salem of Accenture Technology Labs, and Yan Zhai of E8 Security
Inc. This chapter introduces big data analytics and highlights the need for and
importance of applying big data analytics in cybersecurity to fight the evolving
threat landscape. It also describes the typical usage of big data security analytics,
including its solution domains, architecture, typical use cases, and challenges.
Big data analytics, as an emerging analytical technology, offers the capability to
collect, store, process, and visualize big data, which are so large or complex that
traditional data processing applications are inadequate to deal with them. Cybersecurity,
at the same time, is experiencing the big data challenge due to the rapidly growing
complexity of networks (e.g., virtualization, smart devices, wireless connections,
and the Internet of Things) and increasingly sophisticated threats (e.g., malware,
multi-stage attacks, and advanced persistent threats [APTs]). Accordingly, this chapter
discusses the advantages that big data analytics technology brings, and why applying
big data analytics in cybersecurity is essential to cope with emerging threats.
Chapter 2, “Big Data for Network Forensics,” is written by scientists
Yi Cheng, Tung Thanh Nguyen, Hui Zeng, and Julia Deng from Intelligent
Automation, Inc. Network forensics plays a key role in network management and
cybersecurity analysis. Recently, it has been facing the new challenge of big data. Big
data analytics has shown promise in unearthing important insights from large
amounts of data that were previously impossible to find, which has attracted the
attention of researchers in network forensics, and a number of efforts have been initiated.
This chapter provides an overview of how to apply big data technologies to network
forensics. It first describes the terms and process of network forensics, presents
current practices and their limitations, and then discusses design considerations and
some experiences of applying big data analysis to network forensics.
Chapter 3, “Dynamic Analytics-Driven Assessment of Vulnerabilities and
Exploitation,” is written by U.S. Army Research Lab scientists Hasan Cam
and Akhilomen Oniha, and MIT Lincoln Laboratory scientists Magnus Ljungberg
and Alexia Schulz. This chapter presents vulnerability assessment, one of the essential
cybersecurity functions and requirements, and highlights how big data analytics could
potentially be leveraged for vulnerability assessment and causality analysis of vulnerability
exploitation in the detection of intrusions and vulnerabilities, so that cyber analysts can
investigate alerts and vulnerabilities more effectively and quickly. The authors present
novel models and data analytics approaches for dynamically building and analyzing
relationships, dependencies, and causality reasoning among the detected vulnerabilities,
intrusion detection alerts, and measurements. This chapter also provides a
detailed description of building an exemplary scalable data analytics system that implements
the proposed model and approaches by enriching, tagging, and indexing the
data of all observations and measurements, vulnerabilities, detection, and monitoring.
Chapter 4, “Root Cause Analysis for Cybersecurity,” is written by Amin
Kharraz and Professor Engin Kirda of Northeastern University. Recent years have
seen the rise of many classes of cyber attacks ranging from ransomware to advanced
persistent threats (APTs), which pose severe risks to companies and enterprises.
While static detection and signature-based tools are still useful in detecting already
observed threats, they lag behind in detecting such sophisticated attacks where
adversaries are adaptable and can evade defenses. This chapter intends to explain
how to analyze the nature of current multidimensional attacks, and how to identify
the root causes of such security incidents. The chapter also elaborates on how to
incorporate the acquired intelligence to minimize the impact of complex threats
and perform rapid incident response.
Chapter 5, “Data Visualization for Cybersecurity,” is written by Professor Lane
Harrison of Worcester Polytechnic Institute. This chapter is motivated by the fact
that data visualization is an indispensable means for analysis and communication,
particularly in cybersecurity. Promising techniques and systems for cyber data
visualization have emerged in the past decade, with applications ranging from
threat and vulnerability analysis to forensics and network traffic monitoring. In this
chapter, the author revisits several of these milestones. Beyond recounting the past,
however, the author uncovers and illustrates the emerging themes in new and ongoing
cyber data visualization research. The chapter also explores the need for principled
approaches to combining the strengths of the human perceptual system with
analytical techniques such as anomaly detection, as well as the increasingly
urgent challenge of combating suboptimal visualization designs, which
waste both analyst time and organization resources.
Chapter 6, “Cybersecurity Training,” is written by cognitive psychologist Bob
Pokorny of Intelligent Automation, Inc. This chapter presents training approaches
built on principles that are not commonly incorporated into training programs
but should be applied when constructing training for cybersecurity. It
should help you understand that training is more than (1) providing information
that the organization expects staff to apply; (2) assuming that new cybersecurity
staff who recently received degrees or certificates in cybersecurity will know what is
required; or (3) requiring cybersecurity personnel to read about new threats.
Chapter 7, “Machine Unlearning: Repairing Learning Models in Adversarial
Environments,” is written by Professor Yinzhi Cao of Lehigh University. Today’s
systems produce a rapidly exploding amount of data, and that data is used to
derive more data, forming a complex data propagation network
that we call the data’s lineage. Users may want systems to
forget certain data, including its lineage, for privacy, security, and usability reasons.
In this chapter, the author introduces a new concept, machine unlearning, or simply
unlearning, which is capable of forgetting certain data and their lineages in learning
models completely and quickly. The chapter presents a general, efficient unlearning
approach that transforms the learning algorithms used by a system into a summation form.
Chapter 8, “Big Data Analytics for Mobile App Security,” is written by
Professor Doina Caragea of Kansas State University, and Professor Xinming Ou of
the University of South Florida. This chapter describes mobile app security analysis,
one of the newly emerging cybersecurity issues, with rapidly increasing requirements
introduced by the predominant use of mobile devices in people’s daily lives, and
discusses how big data techniques such as machine learning (ML) can be leveraged for
analyzing mobile applications, such as Android apps, for security problems, in particular
malware detection. This chapter also demonstrates the impact of some challenges
on some existing machine learning-based approaches, and is particularly written to
encourage the practice of employing a better evaluation strategy and better designs
of future machine learning-based approaches for Android malware detection.
Chapter 9, “Security, Privacy, and Trust in Cloud Computing,” is written by
Professor Yuhong Liu, Ruiwen Li, and Songjie Cai of
Santa Clara University, and Professor Yan (Lindsay) Sun of the University of Rhode
Island. Cloud computing is revolutionizing cyberspace by enabling convenient,
on-demand network access to a large shared pool of configurable computing
resources (e.g., networks, servers, storage, applications, and services) that can be
rapidly provisioned and released. While cloud computing is gaining popularity, diverse
security, privacy, and trust issues are emerging, which hinder the rapid adoption of
this new computing paradigm. This chapter introduces important concepts, models,
key technologies, and unique characteristics of cloud computing, which helps
readers better understand the fundamental reasons for current security, privacy, and
trust issues in cloud computing. Furthermore, critical security, privacy, and trust
challenges and the corresponding state-of-the-art solutions are categorized and
discussed in detail, followed by future research directions.
Chapter 10, “Cybersecurity in Internet of Things (IoT),” is written by Wenlin Han
and Professor Yang Xiao of the University of Alabama. This chapter introduces the
IoT as one of the most rapidly expanding cybersecurity domains, and presents the
big data challenges faced by IoT, as well as various security requirements and issues
in IoT. IoT is a giant network containing various applications and systems with
heterogeneous devices, data sources, protocols, data formats, and so on. Thus, the
data in IoT is extremely heterogeneous and big, and this poses heterogeneous big data
security and management problems. This chapter describes current solutions and also
outlines how big data analytics can address security issues in IoT when facing big data.
Chapter 11, “Big Data Analytics for Security in Fog Computing,” is written by
Shanhe Yi and Professor Qun Li of the College of William and Mary. Fog comput-
ing is a new computing paradigm that can provide elastic resources at the edge of
the Internet to enable many new applications and services. This chapter discusses
how big data analytics can come out of the cloud and into the fog, and how security
problems in fog computing can be solved using big data analytics. The chapter also
discusses the challenges and potential solutions of each problem and highlights
some opportunities by surveying existing work in fog computing.
Chapter 12, “Analyzing Deviant Socio-Technical Behaviors Using Social
Network Analysis and Cyber Forensics-Based Methodologies,” is written by Samer
Al-khateeb, Muhammad Hussain, and Professor Nitin Agarwal of the University
of Arkansas at Little Rock. In today’s information technology age, our thinking
and behaviors are highly influenced by what we see online. However, misinformation
is rampant. Deviant groups use social media (e.g., Facebook) to coordinate
cyber campaigns to achieve strategic goals, influence mass thinking, and steer
behaviors or perspectives about an event. The chapter employs computational social
network analysis and cyber forensics-informed methodologies to study information
competitors who seek to take the initiative and the strategic message away from the
main event in order to further their own agenda (via misleading, deception, etc.).
Chapter 13, “Security Tools,” is written by Matthew Matchen
of Braxton-Grant Technologies. This chapter takes a purely practical approach to
cybersecurity. When people are prepared to apply cybersecurity ideas and theory to
practical applications in the real world, they equip themselves with tools to better
enable the successful outcome of their efforts. However, choosing the right tools
has always been a challenge. The focus of this chapter is to identify functional areas
in which cybersecurity tools are available and to list examples in each area to dem-
onstrate how tools are better suited to provide insight in one area over the other.
Chapter 14, “Data and Research Initiatives for Cybersecurity Analysis,” is written by the
editors of this book. We have been motivated by the fact that big data-based
cybersecurity analytics is a data-centric approach. Its ultimate goal is to utilize available
technology solutions to make sense of the wealth of relevant cyber data and turn
it into actionable insights that can be used to improve the current practices
of network operators and administrators. Hence, this chapter aims at introducing
relevant data sources for cybersecurity analysis, such as benchmark datasets for
cybersecurity evaluation and testing, and certain research repositories where real-world
cybersecurity datasets, tools, models, and methodologies can be found to
support research and development among cybersecurity researchers. In addition,
some insights are added on future directions for data sharing for big data-based
cybersecurity analysis.
About the Editors

Dr. Onur Savas is a data scientist at Intelligent Automation, Inc. (IAI), Rockville,
MD. As a data scientist, he performs research and development (R&D), leads a
team of data scientists, software engineers, and programmers, and contributes to
IAI’s increasing portfolio of products. He has more than 10 years of R&D expertise
in the areas of networks and security, social media, distributed algorithms, sensors,
and statistics. His recent work focuses on all aspects of big data analytics and
cloud computing with applications to network management, cybersecurity, and
social networks. Dr. Savas has a PhD in electrical and computer engineering from
Boston University, Boston, MA, and is the author of numerous publications in
leading journals and conferences. At IAI, he has been the recipient of various R&D
contracts from DARPA, ONR, ARL, AFRL, CTTSO, NASA, and other federal
agencies. His work at IAI has contributed to the development and commercialization
of IAI’s social media analytics tool Scraawl® (www.scraawl.com).

Dr. Julia Deng is a principal scientist and Sr. Director of the Network and Security
Group at Intelligent Automation, Inc. (IAI), Rockville, MD. She leads a team of
more than 40 scientists and engineers, and during her tenure at IAI, she has been
instrumental in growing IAI’s research portfolio in networks and cybersecurity. In
her role as a principal investigator and principal scientist, she initiated and directed
numerous R&D programs in the areas of airborne networks, cybersecurity, network
management, wireless networks, trusted computing, embedded systems, cognitive
radio networks, big data analytics, and cloud computing. Dr. Deng has a
PhD from the University of Cincinnati, Cincinnati, OH, and has published over
30 papers in leading international journals and conference proceedings.

Contributors

Nitin Agarwal
University of Arkansas at Little Rock
Little Rock, Arkansas

Samer Al-khateeb
University of Arkansas at Little Rock
Little Rock, Arkansas

Songjie Cai
Santa Clara University
Santa Clara, California

Hasan Cam
U.S. Army Research Lab
Adelphi, Maryland

Yinzhi Cao
Lehigh University
Bethlehem, Pennsylvania

Doina Caragea
Kansas State University
Manhattan, Kansas

Yi Cheng
Intelligent Automation, Inc.
Rockville, Maryland

Julia Deng
Intelligent Automation, Inc.
Rockville, Maryland

Wenlin Han
University of Alabama
Tuscaloosa, Alabama

Lane Harrison
Worcester Polytechnic Institute
Worcester, Massachusetts

Muhammad Hussain
University of Arkansas at Little Rock
Little Rock, Arkansas

Amin Kharraz
Northeastern University
Boston, Massachusetts

Engin Kirda
Northeastern University
Boston, Massachusetts

Qun Li
College of William and Mary
Williamsburg, Virginia

Ruiwen Li
Santa Clara University
Santa Clara, California

Yuhong Liu
Santa Clara University
Santa Clara, California

Magnus Ljungberg
MIT Lincoln Laboratory
Lexington, Massachusetts

Song Luo
Accenture Technology Labs
Washington, DC

Matthew Matchen
Braxton-Grant Technologies
Elkridge, Maryland

Tung Thanh Nguyen
Intelligent Automation, Inc.
Rockville, Maryland

Akhilomen Oniha
U.S. Army Research Lab
Adelphi, Maryland

Xinming Ou
University of South Florida
Tampa, Florida

Bob Pokorny
Intelligent Automation, Inc.
Rockville, Maryland

Malek Ben Salem
Accenture Technology Labs
Washington, DC

Onur Savas
Intelligent Automation, Inc.
Rockville, Maryland

Alexia Schulz
MIT Lincoln Laboratory
Lexington, Massachusetts

Yan (Lindsay) Sun
University of Rhode Island
Kingston, Rhode Island

Yang Xiao
University of Alabama
Tuscaloosa, Alabama

Shanhe Yi
College of William and Mary
Williamsburg, Virginia

Hui Zeng
Intelligent Automation, Inc.
Rockville, Maryland

Yan Zhai
E8 Security Inc.
Redwood City, California
Section I APPLYING BIG DATA INTO DIFFERENT CYBERSECURITY ASPECTS
Chapter 1

The Power of Big Data in Cybersecurity
Song Luo, Malek Ben Salem, and Yan Zhai

Contents
1.1 Introduction to Big Data Analytics
1.1.1 What Is Big Data Analytics?
1.1.2 Differences between Traditional Analytics and Big Data Analytics
1.1.2.1 Distributed Storage
1.1.2.2 Support for Unstructured Data
1.1.2.3 Fast Data Processing
1.1.3 Big Data Ecosystem
1.2 The Need for Big Data Analytics in Cybersecurity
1.2.1 Limitations of Traditional Security Mechanisms
1.2.2 The Evolving Threat Landscape Requires New Security Approaches
1.2.3 Big Data Analytics Offers New Opportunities to Cybersecurity
1.3 Applying Big Data Analytics in Cybersecurity
1.3.1 The Category of Current Solutions
1.3.2 Big Data Security Analytics Architecture
1.3.3 Use Cases
1.3.3.1 Data Retention/Access
1.3.3.2 Context Enrichment
1.3.3.3 Anomaly Detection
1.4 Challenges to Big Data Analytics for Cybersecurity
References

4 ◾ Big Data Analytics in Cybersecurity

This chapter introduces big data analytics and highlights the needs and importance
of applying big data analytics in cybersecurity to fight against the evolving threat
landscape. It also describes the typical usage of big data security analytics including
its solution domains, architecture, typical use cases, and the challenges. Big data
analytics, as an emerging analytical technology, offers the capability to collect,
store, process, and visualize big data, which are so large or complex that traditional
data processing applications are inadequate to deal with them. Cybersecurity, at
the same time, is experiencing the big data challenge due to the rapidly growing
complexity of networks (e.g., virtualization, smart devices, wireless connections,
Internet of Things, etc.) and increasingly sophisticated threats (e.g., malware,
multistage attacks, advanced persistent threats [APTs], etc.). Traditional
cybersecurity tools are ineffective and inadequate in addressing these challenges,
so applying big data analytics in cybersecurity has become both critical and a
growing trend.

1.1 Introduction to Big Data Analytics


1.1.1 What Is Big Data Analytics?
Big data is a term applied to data sets whose size or type is beyond the ability
of traditional relational databases to capture, manage, and process. As formally
defined by Gartner [1], “Big data is high-volume, high-velocity and/or high-variety
information assets that demand cost-effective, innovative forms of information
processing that enable enhanced insight, decision making, and process automation.”
The characteristics of big data are often referred to as 3Vs: Volume, Velocity, and
Variety. Big data analytics refers to the use of advanced analytic techniques on big
data to uncover hidden patterns, unknown correlations, market trends, customer
preferences and other useful business information. Advanced analytics techniques
include text analytics, machine learning, predictive analytics, data mining,
statistics, natural language processing, and so on. Analyzing big data allows analysts,
researchers, and business users to make better and faster decisions using data that
was previously inaccessible or unusable.

1.1.2 Differences between Traditional Analytics and Big Data Analytics
There is a big difference between big data analytics and handling a large amount
of data in a traditional manner. A traditional data warehouse focuses mainly
on structured data, relies on relational databases, and may not handle
semistructured and unstructured data well, whereas big data analytics offers the
key advantage of processing unstructured data using nonrelational databases.
Furthermore, data warehouses may not be able to handle the processing demands
posed by sets of big data that need to be updated frequently or even continually.
Big data analytics deals with them well by applying distributed storage and
distributed in-memory processing.

1.1.2.1 Distributed Storage
“Volume” is the first “V” of Gartner’s definition of big data. One key feature of big
data is that it usually relies on distributed storage systems because the data is
so massive (often at the petabyte or higher level) that it is impossible for a single
node to store or process it. Big data also requires the storage system to scale with
future growth. Hyperscale computing environments, used by major big data
companies such as Google, Facebook, and Apple, satisfy big data’s storage
requirements by combining a vast number of commodity servers with direct-attached
storage (DAS).
Many big data practitioners build their hyperscale computing environments
using Hadoop [2] clusters. Inspired by Google’s published work on the Google File
System and MapReduce, Apache Hadoop is an open-source
software framework for distributed storage and distributed processing of very large
data sets on computer clusters built from commodity hardware. There are two key
components in Hadoop:

◾◾ HDFS (Hadoop distributed file system): a distributed file system that stores
data across multiple nodes
◾◾ MapReduce: a programming model that processes data in parallel across
multiple nodes

Under MapReduce, queries are split and distributed across parallel nodes and
processed in parallel (the Map step). The results are then gathered and delivered (the
Reduce step). This approach takes advantage of data locality—nodes manipulating
the data they have access to—to allow the dataset to be processed faster and more
efficiently than it would be in conventional supercomputer architecture [3].
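
The Map and Reduce steps described above can be sketched in a few lines of Python. This is not Hadoop code: it only mimics the programming model in a single process, and the log format, the sample records, and the function names (`map_record`, `shuffle`, `reduce_counts`, `run_job`) are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical log records, already divided into splits as if stored on two nodes.
LOG_SPLITS = [
    ["10.0.0.1 login_failed", "10.0.0.2 login_ok"],
    ["10.0.0.1 login_failed", "10.0.0.3 login_failed"],
]

def map_record(record):
    """Map step: emit a (key, 1) pair for each failed login."""
    ip, action = record.split()
    if action == "login_failed":
        yield (ip, 1)

def shuffle(pairs):
    """Group intermediate values by key, as a framework would between the two steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_counts(key, values):
    """Reduce step: combine all values that share a key."""
    return key, sum(values)

def run_job(splits):
    # In a real cluster each split would be mapped on the node that stores it
    # (data locality); here the splits are simply processed one after another.
    mapped = chain.from_iterable(
        map_record(record) for split in splits for record in split
    )
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())

if __name__ == "__main__":
    print(run_job(LOG_SPLITS))
```

Because each call to `map_record` depends only on its own record, the map work can be spread across nodes without coordination; only the grouping step needs data to move between them.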

1.1.2.2 Support for Unstructured Data


Unstructured data is heterogeneous and variable in nature and comes in many
formats, including text, document, image, video, and more. The following lists a few
sources that generate unstructured data:

◾◾ Email and other forms of electronic communication


◾◾ Web-based content, including click streams and social media-related content
◾◾ Digitized audio and video
◾◾ Machine-generated data (RFID, GPS, sensor-generated data, log files, etc.)
and the Internet of Things

Unstructured data is growing faster than structured data. According to a 2011
IDC study [4], it will account for 90% of all data created in the next decade.
As a new, relatively untapped source of insight, unstructured data analytics can
reveal important interrelationships that were previously difficult or impossible to
determine.
However, relational databases and the technologies derived from them (e.g., data
warehouses) cannot manage unstructured and semistructured data well at large scale
because the data lacks a predefined schema. To handle the variety and complexity of
unstructured data, databases are shifting from relational to nonrelational. NoSQL
databases are broadly used in big data practice because they support dynamic
schema design, offering the potential for increased flexibility, scalability, and
customization compared to relational databases. They are designed with “big data”
needs in mind and usually support distributed processing very well.
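
As a rough illustration of what a dynamic schema means in practice, the sketch below keeps heterogeneous security events as JSON documents and queries them without declaring columns first. It uses plain Python dictionaries rather than any particular NoSQL product, and the event fields and the `find` helper are hypothetical.

```python
import json

# Hypothetical events: each document carries only the fields its source produces.
RAW_EVENTS = [
    '{"type": "firewall", "src_ip": "10.0.0.5", "dst_port": 22, "action": "deny"}',
    '{"type": "email", "sender": "a@example.com", "attachments": ["invoice.pdf"]}',
    '{"type": "firewall", "src_ip": "10.0.0.9", "dst_port": 443, "action": "allow"}',
]

def load_documents(lines):
    """Parse one JSON document per line; no schema is declared in advance."""
    return [json.loads(line) for line in lines]

def find(documents, **criteria):
    """Return documents matching every criterion; a missing field never matches."""
    missing = object()
    return [
        doc for doc in documents
        if all(doc.get(field, missing) == value for field, value in criteria.items())
    ]

docs = load_documents(RAW_EVENTS)
denied = find(docs, type="firewall", action="deny")
```

A new event source with different fields can be added to `RAW_EVENTS` without altering anything that already exists, which is the flexibility the paragraph above attributes to nonrelational stores.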

1.1.2.3 Fast Data Processing


Big data is not just big, it is also fast. Big data is sometimes created by a large
number of constant streams, which typically send in the data records simultaneously,
and in small sizes (on the order of kilobytes). Streaming data includes a wide variety
of data such as click-stream data, financial transaction data, log files generated by
mobile or web applications, sensor data from Internet of Things (IoT) devices,
in-game player activity, and telemetry from connected devices. The benefit of big data
analytics is limited if it cannot act on data as it arrives. Big data analytics has to
consider velocity as well as volume and variety, which is a key difference between
big data and a traditional data warehouse. The data warehouse, by contrast, is
usually better suited to analyzing historical data.
This streaming data needs to be processed sequentially and incrementally on
a record-by-record basis or over sliding time windows, and used for a wide variety
of analytics including correlations, aggregations, filtering, and sampling. Big data
technology unlocks the value in fast data processing with new tools and
methodologies. For example, Apache Storm [5] and Apache Kafka [6] are two popular
stream processing systems. Originally developed at BackType and open-sourced by
Twitter, Storm can reliably process unbounded streams of data at rates of millions
of messages per second. Kafka, developed by the engineering team at LinkedIn,
is a high-throughput distributed message queue system. Both streaming systems
address the need to deliver fast data.
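
The record-by-record, sliding-window style of processing mentioned above can be sketched without any streaming framework. The class below is a minimal single-process illustration, not Storm or Kafka code; the class name, the 60-second window, and the sample timestamps are assumptions, and events are assumed to arrive in timestamp order.

```python
from collections import deque

class SlidingWindowCounter:
    """Counts events seen in the last `window_seconds`, updated one record at a time."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def add(self, timestamp):
        """Ingest one event and return how many events are currently in the window."""
        self._timestamps.append(timestamp)
        # Evict events that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)

counter = SlidingWindowCounter(window_seconds=60)
counts = [counter.add(t) for t in (0, 10, 30, 65, 100)]
```

Each record is handled once and old records are discarded as the window advances, so memory use depends on the window size rather than on the total length of the stream.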
Neither traditional relational databases nor NoSQL databases are fully capable
of processing fast data. Traditional relational databases are limited in
performance, and NoSQL systems lack support for safe online transactions. However,
in-memory NewSQL solutions can satisfy the needs for both performance and
transactional complexity. NewSQL is a class of modern relational database
management systems that seek to provide the same scalable performance as NoSQL
systems for online transaction processing (OLTP) read-write workloads while still
maintaining the ACID (Atomicity, Consistency, Isolation, Durability) guarantees
of a traditional database system [7]. Some NewSQL systems are built with
shared-nothing clustering. Workload is distributed among cluster nodes for performance.
Data is replicated among cluster nodes for safety and availability. New nodes can
be transparently added to the cluster in order to handle increasing workloads.
NewSQL systems provide both high performance and scalability in online
transaction processing.
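
Atomicity, the first of the ACID guarantees, can be illustrated with Python's standard `sqlite3` module. SQLite is a single-node embedded database, not a NewSQL system; it is used here only as a convenient stand-in, and the `accounts` table, the account names, and the `transfer` function are hypothetical.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between accounts; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # both updates are committed
failed = transfer(conn, "alice", "bob", 500)  # both updates are rolled back together
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

The second transfer leaves no partial debit behind, which is the property a NewSQL system aims to preserve while spreading the same kind of workload across many nodes.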

1.1.3 Big Data Ecosystem


There are many big data technologies and products available in the market, and the
whole big data ecosystem can be divided generally into three categories:
infrastructure, analytics, and applications, as shown in Figure 1.1.

Figure 1.1 Big data landscape 2016 (version 3.0), by Matt Turck (@mattturck), Jim Hao (@jimrhao), and FirstMark Capital (@firstmarkcap); last updated 3/23/2016. Panels: Infrastructure, Analytics, Applications, Cross-infrastructure/analytics, Open source, Data sources and APIs, Incubators and schools.

◾◾ Infrastructure
Infrastructure is the fundamental part of the big data technology. It stores,
processes, and sometimes analyzes data. As discussed earlier, big data
infrastructure is capable of handling both structured and unstructured data at
large volumes and fast speed. It supports a vast variety of data, and makes it
possible to run applications on systems with thousands of nodes, potentially
involving thousands of terabytes of data. Key infrastructural technologies
include Hadoop, NoSQL, and massively parallel processing (MPP) databases.
◾◾ Analytics
Analytical tools are designed with data analysis capabilities on the big
data infrastructure. Some infrastructural technologies also incorporate data
analysis, but specifically designed analytical tools are more common. Big data
analytical tools can be further classified into the following sub-categories [8]:
1. Analytics platforms: Integrate and analyze data to uncover new insights,
and help companies make better-informed decisions. This field places a
particular focus on latency and on delivering insights to end users in the
timeliest manner possible.
2. Visualization platforms: Specifically designed, as the name might
suggest, for visualizing data; they take the raw data and present it in
complex, multidimensional visual formats to illuminate the information.
3. Business intelligence (BI) platforms: Used for integrating and analyzing
data specifically for businesses. BI platforms analyze data from multiple
sources to deliver services such as business intelligence reports,
dashboards, and visualizations.
4. Machine learning: Also falls under this category, but is dissimilar to the
others. Whereas the analytics platforms take processed data as input and
output analytics, dashboards, or visualizations to end users, the input of
machine learning is data that the algorithm “learns from,” and the
output depends on the use case. One of the most famous examples is IBM’s
supercomputer Watson, which has “learned” to scan vast amounts of
information to find specific answers, and can comb through 200 million
pages of structured and unstructured data in minutes.
◾◾ Applications
Big data applications are built on big data infrastructure and analytical
tools to deliver optimized insight to end users by analyzing business-specific
data. For example, one type of application analyzes customer online
behavior for retail companies to support effective marketing campaigns and
increase customer retention. Another example is fraud detection for
financial companies: big data analytics helps companies identify irregular
patterns within account accesses and transactions. As the big data infrastructure
and analytical tools have matured, big data applications have started
receiving more attention.

1.2 The Need for Big Data Analytics in Cybersecurity


While big data analytics has been continuously studied and applied in
different business sectors, cybersecurity, at the same time, is experiencing the big data
challenge due to the rapidly growing complexity of networks (e.g., virtualization,
smart devices, wireless connections, IoT, etc.) and increasingly sophisticated threats
(e.g., malware, multistage attacks, APTs, etc.). It is commonly believed that
cybersecurity is one of the most critical areas (if not the most critical) where
big data can be a barrier to understanding the true threat landscape.

1.2.1 Limitations of Traditional Security Mechanisms


The changing and increasingly complex threat landscape makes traditional
cybersecurity mechanisms inadequate and ineffective in protecting organizations and
ensuring the continuity of their business in a digital and connected context.
Many traditional security approaches, such as network-level and host-level
firewalls, have typically focused on preventing attacks. They take perimeter-based
defense techniques mimicking physical security approaches, which focus primarily
on preventing access from the outside and on defense along the perimeter. More
defense layers can be added around the most valuable assets in the network in
order to implement a defense-in-depth strategy. However, as attacks become more
advanced and sophisticated, organizations can no longer assume that they are
exposed to external threats only, nor can they assume that their defense layers can
effectively prevent all potential intrusions. Cyber defense efforts need to shift focus
from prevention to attack detection and mitigation. Traditional prevention-based
security approaches would then constitute only one piece of a much broader
security strategy that includes detection methods and potentially automated incident
response and recovery processes.
Traditional intrusion and malware detection solutions rely on known
signatures and patterns to detect threats. They are facing the challenge of detecting new
and never-before-seen attacks. More advanced detection techniques are seeking to
effectively distinguish normal and abnormal situations, behaviors, and activities,
either at the network traffic level or at the host activity level or at the user behavior
level. Abnormal behaviors can further be used as the indicator of malicious activity
for detecting never-before-seen attacks. A 2014 report from the security firm Enex
TestLab [9] indicated that malware generation outpaced security advancements
during the second half of 2014 to the point that in some of its monthly e-Threats
automated malware tests, solutions from major security vendors were not able to
detect any of the malware they were tested against.
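
The contrast between signature matching and behavior-based detection can be made concrete with a small sketch. Neither function reflects a specific product: the hash set, the traffic figures, and the three-standard-deviation threshold are assumptions chosen only to show why a never-before-seen sample slips past a signature lookup yet may still stand out against a baseline.

```python
from statistics import mean, pstdev

KNOWN_BAD_HASHES = {"hash-of-known-malware"}  # hypothetical signature database

def signature_match(file_hash):
    """Signature-based detection: flags only what is already in the database."""
    return file_hash in KNOWN_BAD_HASHES

def is_anomalous(baseline, observed, threshold=3.0):
    """Behavior-based detection: flag values far from the historical baseline."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        return observed != mu
    return abs(observed - mu) / sigma > threshold

# Hypothetical daily outbound traffic (MB) for one host.
baseline_mb = [120, 130, 125, 118, 127, 122, 131]

new_variant_caught_by_signature = signature_match("hash-of-new-variant")
exfiltration_flagged = is_anomalous(baseline_mb, 900)
```

A simple z-score is only one of many ways to define "abnormal"; it is used here because it needs no training data beyond the host's own history.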
Security information and event management (SIEM) solutions provide real-
time monitoring and correlation of security events as well as log management
and aggregation capabilities. By their very nature, these tools are used to confirm
a suspected breach rather than proactively detecting it. More advanced security
approaches are needed to monitor the behaviors of networks, systems, applications,
and users in order to detect early signs of a breach before cyber attackers can cause
any damage.

1.2.2 The Evolving Threat Landscape Requires New Security Approaches
New technologies, such as virtualization technologies, smartphones, IoT devices,
and their accelerated pace of change are driving major security challenges for
organizations. Similarly, the huge scale of organizations’ software operations is
adding to the complexity that cyber defenders have to deal with. Furthermore, the
expanded attack surface and the increasingly sophisticated threat landscape pose
the most significant challenges to traditional cybersecurity tools.
For example, the rapid growth of IoT connects a huge number of vulnerable
devices to the Internet and therefore exponentially expands the attack surface for
hackers. The IDC study of the worldwide IoT market predicts that the installed base
of IoT endpoints will grow from 9.7 billion in 2014 to more than 25.6 billion
in 2019, hitting 30 billion in 2020 [10]. A recent study released by
Hewlett Packard [11] showed that 70% of IoT devices contain serious
vulnerabilities. The scale of IoT and the expanded attack surface make traditional
network-based security controls unmanageable and unable to secure all communications
generated by the connected devices. The convergence of information technology
and operations technology driven by the IoT further complicates the task of
network administrators.
As another example, the advanced persistent threat (APT) has become a serious
threat to business, but traditional detection methods are not effective in defending
against it. An APT is characterized by being “advanced” in terms of using sophisticated
malware to exploit system vulnerabilities and being “persistent” in terms of using
an external command and control system to continuously monitor and extract data
from a specific target. Traditional security is not effective against APTs because:

◾◾ APT often uses zero-day vulnerabilities to compromise the target. Traditional
signature-based defense does not work on those attacks.
◾◾ Malware used by APT usually initiates communication to the command and
control server from inside, which makes perimeter-based defense ineffective.
◾◾ APT communications are often encrypted using SSL tunnels, which makes
traditional IDS/firewall unable to inspect its contents.
◾◾ APT attacks usually hide in the network for a long time and operate in stealth
mode. Traditional security, which lacks the ability to retain and correlate
events from different sources over a long time, is not capable enough to detect
them.
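
One way to picture what retaining and correlating events over a long period makes possible is a search for "beaconing": outbound connections from an internal host that recur at near-constant intervals. The heuristic below is an illustrative assumption, not a method described in this chapter; the tuple layout, the thresholds, and the sample log are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def find_beacons(connections, min_events=5, max_jitter=0.1):
    """Return (host, destination) pairs whose connections recur at near-constant intervals.

    `connections` is an iterable of (timestamp_seconds, host, destination) tuples.
    """
    by_pair = defaultdict(list)
    for timestamp, host, destination in connections:
        by_pair[(host, destination)].append(timestamp)

    beacons = []
    for pair, timestamps in by_pair.items():
        if len(timestamps) < min_events:
            continue
        timestamps.sort()
        gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
        avg_gap = mean(gaps)
        # Coefficient of variation: low values suggest machine-like regularity.
        if avg_gap > 0 and pstdev(gaps) / avg_gap <= max_jitter:
            beacons.append(pair)
    return beacons

# Hypothetical retained connection log: host-a calls out roughly once an hour for a day,
# while host-b connects at irregular, human-looking times.
log = [(t * 3600 + (t % 2), "host-a", "203.0.113.7") for t in range(24)]
log += [(t, "host-b", "198.51.100.4") for t in (5, 400, 460, 9000, 9100, 30000)]
```

The pattern only becomes visible when many hours of connection records are kept and examined together, which is the capability the last bullet says traditional tools lack.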

In short, new cybersecurity challenges make traditional security mechanisms
less effective in many cases, especially when big data is involved.

1.2.3 Big Data Analytics Offers New Opportunities to Cybersecurity
Big data analytics offers the opportunity to collect, store, and process enormous
volumes of cybersecurity data. This means that security analytics is no longer
limited to analyzing alerts and logs generated by firewalls, proxy servers, IDSs,
and web application firewalls (WAFs). Instead, security analysts can analyze a range
of new datasets over a long time period, which gives them more visibility into
what’s happening on their network. For example, they can analyze network flows and
full packet captures for network traffic monitoring. They can use communication
data (including email, voice, and social networking activity), user identity context
data, as well as web application logs and file access logs for advanced user
behavior analytics.
Furthermore, business process data, threat intelligence, and configuration
information of the assets on the network can be used together for risk assessments.
Malware information and external threat feeds (including blacklists and watch-
lists), GeoIP data, and system and audit trails may help with cyber investigations.
The aggregation and correlation of these various types of data provide more
context information that helps broaden situational awareness, minimize cyber risk, and
improve incident response. New use cases are enabled through big data’s
capabilities to perform comprehensive analyses through distributed processing and with
affordable storage and computational resources.
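
The aggregation and correlation described above can be sketched as a simple enrichment step that attaches asset, threat-feed, and GeoIP context to a raw alert. All lookup tables, field names, and the scoring rule below are hypothetical; real deployments would draw this context from their own inventories and feeds.

```python
# Hypothetical context sources; none of these records come from the text.
ASSET_INVENTORY = {"10.0.0.5": {"owner": "finance", "criticality": "high"}}
THREAT_BLACKLIST = {"203.0.113.7"}
GEOIP = {"203.0.113.7": "ZZ"}

def enrich(alert):
    """Return a copy of the alert with asset, threat-feed, and GeoIP context attached."""
    enriched = dict(alert)
    enriched["asset"] = ASSET_INVENTORY.get(alert["src_ip"], {})
    enriched["dst_blacklisted"] = alert["dst_ip"] in THREAT_BLACKLIST
    enriched["dst_country"] = GEOIP.get(alert["dst_ip"], "unknown")
    return enriched

def priority(enriched):
    """Toy scoring rule: a blacklisted destination and a critical asset both raise priority."""
    score = 0
    if enriched["dst_blacklisted"]:
        score += 2
    if enriched["asset"].get("criticality") == "high":
        score += 1
    return score

alert = {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.7", "signature": "outbound-connection"}
result = enrich(alert)
```

The same raw alert ranks very differently once it is known that the source is a high-criticality asset and the destination appears on a blacklist, which is the added context the paragraph refers to.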

1.3 Applying Big Data Analytics in Cybersecurity


1.3.1 The Category of Current Solutions
Existing efforts to apply big data analytics to cybersecurity can be grouped
into the following three major categories [12]:

◾◾ Enhance the accuracy and intelligence of existing security systems.
Security analytics solutions in this category use ready-to-use analytics to
make existing systems more intelligent and less noisy so that the most
egregious events are highlighted and prioritized in queues, while alert volume
is reduced. The big data aspect of this solution domain comes in a more
advanced phase of deployment, where data and alerts from separate systems,
e.g., data loss prevention (DLP), SIEM, identity and access management
(IAM), or endpoint protection platform (EPP), are enriched with contextual
information, combined and correlated using canned analytics. This gives an
enterprise a more intelligent and holistic view of the security events in its
organization.
◾◾ Combine data and correlated activities using custom or ad hoc analytics.
Enterprises use big data analytics solutions or services to integrate internal and
external data, structured as well as unstructured, and apply their own customized
or ad hoc analytics against these big data sets to find security or fraud events.
◾◾ External cyber threat and fraud intelligence.
Security analytics solutions apply big data analytics to external data on
threats and bad actors, and, in some cases, combine external data with other
relevant data sources, like supply chains, vendor ranking, and social media.
Most vendors of these solutions also create and support communities of
interest where threat intelligence and analytics are shared across customers.
Vendors in this category actively find malicious activities and threats from
the Internet, turn this information into actionable data such as IP addresses
of known bad servers or malware signatures, and share with their customers.

1.3.2 Big Data Security Analytics Architecture


In general, a big data security analytics platform should have five core components
as shown in Figure 1.2.

◾◾ A basic data storage platform to support long-term log data retention and
batch processing jobs. There are a few offerings in the market that skip this
layer and use a single NoSQL database to support all the data retention,
investigation access, and analytics. However, considering all the available
open-source applications in the Hadoop ecosystem, a Hadoop-based
platform still gives a more economical, reliable, and flexible data solution for
larger data sets.
◾◾ A data access layer with fast query response performance to support
investigation queries and drill-downs. Because the data access inside Hadoop

Figure 1.2 Big data security analytics architecture. (Diagram labels: Services/apps, Data presentation, Integration, Data consumption, Data access, Data storage.)


Other documents randomly have
different content
pues lo avia de perder
todo junto y en un dia
o muerte porque no vienes
y llevas esta alma mia
de aqueste cuerpo mezquino
pues se te agradeceria?

61 This is one of the best pieces of the kind.

Vitorioso buelve el Cid
a san Pedro de Cardeña,
de las guerras que ha tenido
con los Moros de Valencia.
Las trompetas van sonando,
por dar aviso que llega,
y entre todos se señalan
los relinchos de Babieca.
El Abad, y monjes salen
a recebirlo a la puerta,
dando alabanças a Dios,
y al Cid mil enorabuenas.
Apeose del calvallo,
y antes de entrar en la Iglesia,
tomò el pendon en sus manos,
y dize desta manera.
Sali de ti templo santo
desterrado de mi tierra,
mas ya buelvo a visitarte
acogido en las agenas.
Desterrome el Rey Alphonso,
porque alla en Santagadea
le tomè el juramento
con mas rigor que el quisiera.
Las leyes eran del pueblo,
que no excedi un punto dellas,
pues como leal vassallo
saquè a mi rey desospecha.
O embidiosos Castellanos,
quan mal pagays la defensa
que tuvistes en mi espada,
ensanchando vuestra cerca.
Veys aqui os traygo ganado
otro reyno, y mil fronteras,
que os quiero dar tierras mias
aunque me echeys de las vuestras.
Pudiera dezirlo a estraños,
mas para cosas tan feas
soy Rodrigo de Bivar
Castellano a las derechas.

The concluding line:—Castellano a las derechas, (the
Castilian as he ought to be) is a description of the Cid,
which was well adapted to produce an impression on the
hearts of the people to whom it was addressed.

62 The following is the commencement of this romance:—

De los trofeos de amor
ya coronadas sus sienes,
muy gallardo entra Ganzul
a jugar cañas a Gelves,
en un hovero furioso,
que al ayre en su curso excede,
y en su pujança y rigor
un leve freno detiene.
La librea de los pajes
es roxa, morada, y verde,
divisa cierta y colores
de la que en su alma tiene:
todos con lanças leonadas
en corredores ginetes,
adornados de penachos,
y de costosos jaezes:
el mismo se trae la adarga,
en quien un fenix parece,
que en vivas llamas se abrasa,
y en ceniza se resuelve;
la letra si bien me acuerdo,
dize: Es inconveniente
poderse dissimular
el fuego que amor enciende, &c.

63

El que poblò las masmorras
De Christianos Caballeros.

64 The subjoined passage forms the latter part of this
romance.

La hermosissima Balaja,
que llorosa en su aposento
las sinrazones del Rey
le pagavan sus cabellos
como tanto estruendo oyò
a un valcon salio corriendo,
y enmudecida le dixo,
dando vozes con silencio:
Vete en paz, que no vas solo,
y en mi ausencia ten consuelo,
que quien te echò de Xerez,
vno te echara de mi pecho:
El con la vista responde,
yo me voy, y no te dexo.
De las agravios de Rey
para tu firmeza a pelo,
Con esto passò la calle,
los ojos atras bolviendo
dos mil vezes: y de Andujar
tomò el camino derecho.

65 Such, for example, is the following ludicrous description of
Hector’s funeral.

En las obsequias de Hector
esta la reyna Troyana
con la linda Policena
y con otras muchas damas
tambien estavan los Griegos
sino Achiles que faltava
que fue a la postre de todos
y en el tempo se assentava
frontero la reyna Elena
que por Hector lamentava
mirando su hermosura
con gran cuydado pensava
si Menelao no fuera
rey Griego la conquistara
para casarse con ella
segun era muy loçana
y assí triste y pensativo
no podia echar la habla
quando miro a Policena
en la coraçon le pesara, &c.
66

Con ravia esta el rey David
rasgando su coraçon
sabiendo que alli en la lid
le mataron a Absalon
cubriose la su cabeça
y subiose a un mirador
con lagrimas de sus ojos
sus canas regadas son
hablando de la su boca
dize esta lamentacion
o fili mi fili mi
o fili mi Absalon
que es de la tu hermosura
tu estremada perficion
los tus cabellos dorados
parecian rayos de sol
tus ojos lindos azules
que jacinta de Sion
o manos que tal hizieron
enemigos de razon, &c.

Any person who in those times was capable of making
redondilla verses, must have found it very easy to produce
such romances as this.

67 No vale las coplas de la Sarabanda, is a proverb of
precisely the same signification as—No vale las coplas de
Calainos, according to Sarmiento. See the remark, page
55. The two proverbs have probably been confounded, for
the romance of Calainos is not in coplas.

68 The following is one of those pieces which may be
regarded as untranslatable.
Rosafresca Rosafresca
tan garrida y con amor
quando y’os tuve en mis braços
no os sabia servir no
y agora que os servira
no os puedo yo averno.
Vuestra fue la culpa amigo
vuestra fue que mia no
embiastes me una carta
con un vuestro servidor
y en lugar de recaudar
el dixera otra razon
qu’erades casado amigo
alla en tierras de Leon
que teneys muger hermosa
y hijos como una flor.
Quien os lo dixo señora
no os dixera verdad no
que yo nunca entre en Castilla
ni alla en tierras de Leon
sino quando era pequeño
que no sabio de amor.

A piece, which is a companion to the above, commences
thus:

Frontefrida, Frontefrida,
Frontefrida, y con amor,
Do todas las avecicas
Van tomar consolacion, &c.

The fiction on which this second song is founded must,
notwithstanding its native beauty, appear a very absurd
fancy to the naturalist, as it describes a nightingale
wooing a turtle dove.

69 “Fizo assaz buenas canciones,” says the Marquis of
Santillana, in his antiquated Spanish, speaking of his
grandfather. The remaining notices which he gives of the
origin of Spanish poetry communicate nothing, in addition
to what has been already mentioned, on those things
respecting which it is most desirable to be informed.

70 See Velasquez, according to Dieze, page 302.

71 See Sarmiento, page 345.

72 See the observations of Sarmiento, page 352.

73 An extract made from this treatise of the Marquis of Villena
by Gregorio Mayans, may be found in the Origines de la
lengua Española, tom. ii. pag. 321. The whole work
probably exists in manuscript in Spanish libraries.

74 Tanto es el provecho, que viene desta dotrina a la vida
civil, quitando ocio y ocupando los generosos ingenios en
tan honesta investigacion, que las otras naciones
desearon y procuraron haver entre si escuela desta
dotrina, y por esso fue ampliada por el mundo en diversas
partes.—The measure of this sonorous period will not be
overlooked.

75 Temporum iniquitate sublimi virtute superata, honorem
vitæ ac bonum nomen fallacibus delinimentis omnibus,
quæ magnam quamque fortunam velut pedissequi
comitantur, præferebat, says, in allusion to him, Nicolas
Antonio, who at the same time refers to the Chronicles,
from which he had drawn his information respecting the
Marquis of Santillana.
76 This elegy is inserted along with other poems by the
Marquis in all the editions of the Cancionero general,
immediately after the spiritual poems. No complete
collection of the works of this celebrated man has yet
been printed.

77 That the Marquis had read Dante can scarcely be doubted,
for he quotes him in this poem:—

Assi conseguimos de aquella manera,
Hasta que llegamos en somo del monte,
No menos cansados que Dante Acheronte.

78 Thus the two following stanzas are crowded with the
names of authors, ancient and modern, with the view of
shewing the loss which Spanish literature had sustained
by the death of Villena.

Perdimos a Homero que mucho honorana
este sacro monte do nos habitamos
perdimos a Ovidio el que coronamos
del arbol laureo que muchos amava
Perdimos Horacio que nos invocava
en todos exordios de su poesia
assi disminuye la nuestra valia
que antiguos tiempos tanto prosperava.
Perdimos a Livio y a Mantuano
Macrobio, Valerio, Salustio, Magneo
pues no olvidemos al moral Agneo
de quien se loava el pueblo Romano
Perdimos a Julio y a Casaliano
Alano, Boecio, Petrarcha, Fulgencio
Perdimos a Dante, Gaufre, Terencio
Juvenal, Estacio, y Quintiliano.
79 Stanzas, like the following, deserve to be extracted from
this work, as they are calculated to shew what might have
been expected of the Marquis of Santillana, had he
cultivated his talent for poetry under more favourable
circumstances.

Mas yo a ti sola me plaze llamar,
o cithara dulce, mas que la d’Orfeo;
que tu sola ayuda, no dudo, mas creo
mi rustica mano podra ministrar.
O Biblioteca de mortal cantar,
fuente meliflua de magna eloquencia,
infunde tu grande y sacra prudencia
en mi, porque yo pueda tu planto esplicar.
A tiempo a la hora suso memorado,
assi como niño que sacan de cuna,
no se falsamente, o si por fortuna,
me vi todo solo al pie de un collado,
Salvatico espesso lexano a poblado
agreste desierto y tan espantable,
que temo verguenza, no siendo culpable,
quando por extenso lo aure recontado.
No vi la carrera de gentes cursada,
ni rastro exercido por do me guiasse,
ni persona alguna a quien demandasse
consejo a mi cuyta tan desmesurada;
Mas sola una senda poco visitada
al medio de aquella tan gran espessura,
bien como adarmento subiente a l’altura
de rayo Dianeo me fue demostrada.

80 Don Alvaro de Luna begins to speak in the first stanzas:—

Vi tesoros ayuntados
por gran daño de su dueño.
Assi como sombra o sueño
son nuestros dias contados:—
Y si fueron prorogados
por sus lagrimas algunos
desto no vemos ningunos
por nuestros negros pecados.
Abrid abrid vuestros ojos,
gentios, mirad a mi,
quanto vistes, quanto vi,
fantasmas fueron y antojos.
Con trabajos con enojos
usurpe tal señoria,
que si fue no era mia
mas endevidos despojos.
Casa, casa, guay de mi!
campo a campo alleguè
casa agena no dexè,
tanto quise quanto vi.
Agora pues ved aqui,
quanto valen mis riquezas
tierras villas fortalezas
tras quien mi tiempo perdi.

81 There is a singular pedantry, with a happy turn of
versification, in a song which commences thus:—

Antes el rodante cielo
tornara manso y quieto,
y sera piadoso Aleto,
y pavoroso Metello.
Que yo jamas olvidasse
tu virtud,
vida mia y mi salud,
ni te dexasse.
Cesar afortunado
cessara de combatir,
y harian desdezir
al Priamides armado—
Quando yo te dexarè,
ydola mia,
ni la tu philosomia
olvidarè; &c.

82 It commences thus:

Gozate, gozosa, madre,
gozo de la humanidad,
templo de la Trinidad,
elegida por dios padre,
Virgen que por el oydo
concebiste,
gaude, virgen, mater Christi,
y nuestro gozo infinido!
Gozate, luz reverida,
segun el Evangelista
por la madre del Baptista
anunciado la venida,
de nuestro gozo Señora
que trayas
vaso de nuestro mexias
gozate pulchra y decora, &c.

In this way the Gozate is repeated through a series of
stanzas.
83 Dieze, in his remarks on Velasquez, erroneously refers to
the publication of Gregorio Mayans, for the proverbs in
verse; but only the original proverbs, without versification,
(refranes que dicen las viejas tras el huego) as collected
by the Marquis, are given in the second volume of that
work, p. 179. The greater part deserve to be better
known, but many of them are unintelligible to foreigners.

84 See the note, page 24.

85 E que cosa es la poesia, que en nuestra vulgar (there is
something equivocal here, for this term was not
vernacular in the Castilian language) llamamos gaya
sciencia, sino un fingimiento de cosas utiles, è veladas con
muy fermosa cobertura, compuestas, distinguidas,
escondidas, por certo cuento, peso, è medida.

86 He appeals to St. Isidore, whom he cites as a guarantee
for this origin of poetry:—Isidro Cartaginès, santo
Arzobispo Hispalense, assi lo pruebra y testifica, e quiere,
que el primero que fizo rythmos y cantó en metro hay sido
Moysen, y despues Joshue, David, Salomon, y Job.

87 Honestæ conditionis, says Nicolas Antonio, speaking of his
family.

88 Only the supplement to this poem is contained in the
Cancionero general. The poem itself was probably too
long to be included in that collection. However, in the
editions of the collected works of Mena (for instance, that
which I have now before me, intitled—Todas las obras del
famosissimo poeta Juan de Mena, &c. Anveres, 1552, 8º)
which Dieze notices, it fills the greater portion of the
volume, and is accompanied by a copious commentary by
Fernan Nuñez.
89 The emphatic praise bestowed on this poem in Dieze’s
observations on Velasquez, (page 168), according to
which Juan de Mena “maintains to his advantage a
comparison with all the poets of all ages,” is sufficient to
prove Dieze’s deficiency in sound criticism.

90 The second stanza contains the theme, but it is very
imperfectly expressed:—

Tus casos fallaces, Fortuna, cantamos
Estados de gentes que giras y trocas,
Tus muchas mudanzas, tus firmezas pocas,
Y las que en tu rueda quexosos hallamos.

91 Mena, politely enough, solicits permission of Fortune to
read her a lesson:

Dame licencia, mudable Fortuna,
Porque yo blasme de ti lo que devo.

Then, in well turned antitheses, he allows her a sort of
regularity which contradicts itself:—

Que tu firmeza es, no ser constante,
Tu temperamento es destemplanza,
Tu mas cierto orden es desordenanza, &c.

92 Providence appears as a most beautiful young woman:—

Una donzella tan mucho hermosa,
Que ante su gesto es loco quien osa
Otras beldades loar de mayores.

93 In the fourth stanza a patriotic flight seems to promise the
recurrence of similar passages:

Como que creo, que fossen menores,
Que los Africanos, los hechos del Cid?
Ni que feroces menos en la lid
Entrassen los nuestros que los Agenores? &c.

On another occasion the author addresses an invocation
to his native city Cordova:

O flor de saber y cabelleria,
Cordova madre, tu hijo perdona,
Si en los cantares, que agora pregona,
No divulgarè tu sabiduria, &c.

94 From the following stanzas the degree of talent possessed
by Juan de Mena for the poetical description of natural
objects, without allegory, may be fairly estimated.

Bien como medico mucho famoso
Que trae el estilo por mano seguido
En cuerpo de golpes diversos herido
Luego socorre alo mas peligroso,
Assi aquel pueblo maldito sañoso
Sintiendo mas daño de parte del Conde
Con todas sus fuerças juntando responde
Alli do el peligro mas era dañoso.

Alli disparavan bombardas y truenos
Y los trabucos tiravan ya luego
Piedras y dardos y hachas de fuego
Con que los nuestros hazian ser menos.
de Moros tenidos por buenos
Lançan temblando las sus azagayas,
Passan las lindes palenques y rayas,
Doblan sus fuerças con miedos agenos.

Mientra morian y mientra matavan
De parte del agua ya crecen las ondas
Y cobran las mares sobervias y hondas
Los campos que ante los muros estavan,
Tanto que los que de alli peleavan
A los navios si se retrayan,
Las aguas crescidas les ya defendian
Tornar a las fustas que dentro dexavan.

95 When the poet, in his ideal world, sees Don Alvaro, by a
singular fancy he pretends not to know him, in order that
he may question his guide (Providence) respecting him, in
imitation of a similar passage in Homer:—

Tu, Providencia, declara de nuevo,
Quien es aquel Caballero, que veo,
Que mucho en el cuerpo parece a Tydeo,
E en consejo a Nestor el longevo.

Among other things Providence replies:—

Este cavalga sobre la Fortuna
Y doma su cuello con asperas riendas,
Y aunque del tenga tan muchas deprendas,
Ella no le osa tocar de ninguna.
Miralo, miralo en platica alguna,
Con ojos humildes, no tanto feroces!
Como, indiscreto, y tu no conoces
Al Condestable Alvaro de Luna?

96 For instance, the word longevo in the verses quoted
above.

97 The opening stanzas may be regarded as a poetic preface
or dedication; but they gain nothing by that.

Al muy prepotente Don Juan el Segundo,
Aquel, con quien Jupiter tuvo tal zelo,
Que tanta de parte le haze del mundo,
Quanta a si misme se haze en el cielo;
Al gran d’España, al Cesar novelo,
Al que es con fortuna bien afortunado
Aquel, con quien cabe virtud y reynado,
A el las rodillas hincadas por suelo.

98 This poem is not to be found in the Cancionero general,
but it is included in the Obras, mentioned in the note,
page 92. Juan de Mena gave it the absurd title of
Calamicleos, compounded from the Latin calamitas and the
Greek κλεος. It was afterwards called, simply, La
Coronacion.

99 Most of these questions were not very difficult to answer;
for instance, the following, which is preceded by three
introductory stanzas in a very courtly style:—

Mostradme qual es aquel animal,
que luego se mueve en los quatro pies,
despues se sostiene en solos los tres,
despues en los dos va muy mas ygual.
Sin ser del especie quadrupedal
el curso que hizo despues reytera
assi que en los quatro d’aquesta manera
fenece el que nace de su natural.
Del hombre se halla ser gran enemigo,
porque lo hiere do nunca sospecha,
y donde mas plaze menos aprovecha
tanta ponçoña derrama consigo.
Dad vos Señor pues un tal castigo,
o de virtudes tal arma que vista,
porque alomenos punando resista
contra quien tiene tal guerra comigo.

100 The poem commences thus:—

Canta tu, Christiana musa,
La mas que civil batalla,
Que entre voluntad se halla
Y Razon, que nos accusa.

101 Nicolas Antonio, whom Dieze follows in his remarks on
Velasquez, is the authority for these notices.

102 In the beginning of the sixteenth century, Spanish books
were printed in Seville by German printers. At the end of
an edition, probably the first, of the proverbs collected by
the Marquis of Santillana, (see page 88,) are the following
words, which Mayans y Siscar has reprinted:—Aqui se
acaben los refranes—imprimidos en la muy noble y leal
civdad de Sevilla por Jacobo Comberger, Aleman, año
1508.
103 On this subject Nicolas Antonio’s Bib. Hisp. vet. lib. x. cap.
6. may be compared with Velasquez and Dieze, page 165.

104 To this number they amount in the old folio edition,
printed with gothic characters, which forms one of the
literary curiosities of the library of Gottingen. Dieze, in his
observations on Velasquez, page 177, gives a particular
account of this, as well as of the succeeding editions of
the Cancionero general.

105 With this spiritual composition, the Cancionero general
commences. The reader will have enough in the first
stanza:—

Enantes, que culpa fuesso cansada,
Tu, Virgen benigna, ya yves delante,
Tan lexos del crimen y del semejante,
Que sola quedaste daquel libertada, &c.

106 This silly conceit, which consists only of eight lines,
commences thus:—

La M madre te muestra,
La A te manda adorar, &c.

107 The Ave begins thus:—

Ave, preciosa Maria,
Que se deve interpretar
Trasmontana de la mar,
Que los mareantes guia.
108 In the third strophe he thus addresses king Ferdinand:—

Gran señor, los, que creyeron
Estas consejeros tales,
De sus culmines reales
En lo mas hondo cayeron.
Si esto contradiran
Algunos con ambicion,
Testigos se les daran.
Uno sera Roboan,
Hijo del rey Solomon.

109 A new edition of Jorge Manrique’s Coplas, with glosses or
poetic paraphrases by various authors, appeared at
Madrid in 1779.

The following are the two first strophes, and the rhythmic
structure of the rest is not less beautiful.

Recuerde el alma dormida,
avive el seso y despierte
contemplando
como se pasa la vida,
como se viene la muerte
tan callando:
quan presto se va el placer,
como despues de acordado
da dolor,
como a nuestro parescer
qualquiera tiempo pasado
fue mejor.
Pues que vemos lo presente
quan en un punto se es ido
y acabado,
si juzgamos sabiamente,
daremos lo no venido
por pasado
No se engañe nadie, no,
pensando que ha de durar
lo que espera,
mas que duro lo que viò
pues que todo ha de pasar
por tal manera.

110 For instance, the following passage from a song by Juan
de Mena:—

Ya dolor del dolorido,
Que con olvido cuydado,
Pues que antes olvidado
Me veo, que fallecido.
Ya fallece mi sentido &c.

Or:—

Cuydar me hace cuydado
Lo que cuydar no devria,
Y cuydando en lo passado
Por mi no passa alegria.

Such plays of words are to be found throughout the whole
Cancionero.

111 The commencement of one of his songs, the two first
strophes of which are subjoined, is exceedingly beautiful;
but in the sequel the lyric spark is extinguished by
pedantry.

Muy mas clara que la luna
sola una
en el mundo vos nacistes,
tan gentil, que no vecistes
ni tuvistes
competidora ninguna,
Desde niñez en la cuna
cobrastes fama, beldad,
con tanta graciosidad,
que vos doto la fortuna.
Que assi vos organizo
y formò
la composicion humana,
que vos soys la mas loçana,
soberana
que la natura criò.
Quien sino vos mereciò
de virtudes ser monarcha?
Quanto bien dixo Petrarcha,
por vos lo profetizo.

It would be absurd to attempt the translation of many of
the specimens which are necessary to the illustration of
this work; and with respect to these lines the tender
breathing of the poetry would be entirely lost in a literal
version.

112 Reason, like a talkative person, commences the dialogue,
and has also the last word; she thus addresses her
opponent:—
Pensamiento, pues mostrays
en vos misma claro el daño,
pregunt’os, que me digays
camino de tanto engaño,
do venis o donde vays
a tierra, que desconoce
muy presto la gente della
donde nace una querella,
y quien bien no le conoce
vive en ella.
Porque en ella ay una suerte,
d’una engañosa esparança
que el plazer nos da muerte,
por do el fin de su holgura
en trabajo se convierte.
Do sus glorias alcançadas,
puesto ya que sean seguras,
o con quantas amarguras
hallaras que son mezcladas
sus dulçuras!

113 He is particularly successful in expressing with old Spanish
plainness the emotions of passion; as for instance in the
following concluding strophes of a farewell song.

De vos me parto, quexando,
y de mi, muy descontento
de mi triste pensamiento.
Mi vivir lo va llorando
vuestro mal conocimiento.
Assi que por sola vos
yo de todos vo enemigo,
pues me parto, como digo,
mal con vos y mal con Dios,
y mal comigo.
Aunque desto en la verdad
poca culpa tengo yo,
que mi fé no se mudò,
vuesta mala voluntad
m’a traido en lo qu’ estò.
Por do mis cuytas agora
vuestras seran desde aqui,
pues por vos a vos perdi,
y por vos a Dios, señora,
y mas a mi.

114 What a picturesque storm of passion appears under the
antiquated garb of the following stanzas! and with what a
fantastic play of words are they interspersed!

La fuerça del fuego, que alumbra, que ciega
mi cuerpo, mi alma, mi muerte, mi vida,
do entrado hiere, do toca, do llega,
mata y no muere su llama encendida.
Pues que harè, triste, que todo me ofende?
Lo bueno y lo malo me causan congoxa,
quemandome el fuego que mata, qu’ enciende,
su fuerça que fuerça que ata, que prende,
que prende, que suelta, que tira, que afloxa.
Aso yre triste, que alegre me halle
pues tantos peligros me tienen en medio,
que llore, que ria, que grite, que calle,
ni tengo, ni quiero, ni espero remedio?
Ni quiero que quiera, ni quiero querer,
pues tanto me quiere tan raviosa plaga,
ni ser yo vencido, ni quiero vencer,
ni quiero pesar, ni quiero plazer,
ni se que me diga, ni se que me haga.

115 The following are the first and second strophes of this
song. Love is here a hell, in which the thoughts burn.

Que tu beldad fue querer!
Mas a ti que a mi me quiero.
Tu beldad fue mensagero
de morir en tu poder.
Tu nubloso disfavor
me cerco sin fin eterno
d’unos fuegos qu’es amor
cuyo nombre es el infierno.
Qu’en su encendida casa
se queman mis pensamientos,
alli montan los tormentos
mis entrañas hazen brasa.
Alli sospiro los dias,
que morir no puede luego
alli las lagrimas mias
fortalezen mas en fuego.

116 This curious composition begins like a testamentary
arrangement, and then immediately takes a poetic turn:—

Pues Amor quiere que muera,
y de tan penada muerte,
en tal edad,
pues que yo en tiempo tan fuerte,
quiero ordenar mi postrera
voluntad.
Pero ya que tal me siento,
que no lo podre hazer,
la que causa mi tormento
pues que tiene mi poder
ordene mi testamento.
Y pues mi ventura quiso
mis pensamientos tornar
ciegos, vanos,
no quiero otro paraiso,
sino mi alma dexar
en sus manos.
Pero que lleve de claro
la misma forma y tenor,
d’aquel que hizo d’amor
don Diego Lopez de Haro,
pues que yo muero amador.

117 The following is by a poet named Tapia.

Gran congoxa es esperar,
quando tarda el esperança,
mas quien tiene confiança
por tardar,
no deve desesperar.
Assi que vos, pensamiento,
que passays pena esperando,
galardon se va negando,
bien lo siento,
mas tened vos sufrimiento.
Y quiça podreys ganar
con firmeza sin dudança
lo cierto del esparança
que el tardar
no lo puede desviar.

118 The author of the following Villancico is named Escriva.

Que sentis, coraçon mio,
no dezis,
que mal es el que sentis.
Que sentistes aquel dia,
quando mi señora vistes,
que perdistes alegria,
y descando despedistes,
como a mi nunca bolvistes.
no dezis,
donde estays que no venis.
Qu’ es de vos, qu’ en mi nos fallo,
coraçon, quien os agena?
Qu’ es de vos, que aunque callo,
vuestro mal tambien me pena?
Quien os atò tal cadena.
no dezis,
que mal es el que sentis.

119 These glosses, which certainly belong to the fifteenth
century, prove the still higher antiquity of the glossed
romances. As a proof of this, we may quote the
commencement of a gloss of the Rosa fresca, (see p. 74),
though it is not one of the most successful productions of
this class.

LA GLOSA DE PINAR.

Quando y os quise querida,
si supiera conoceros,
n’os tuviera yo perdida
ni acuciara yo la vida
agora para quereros.
Y porqu’ es bien que padezca
desta causa mi dolor,
llam’os yo sin qu’ os merezca,
Rosa fresca, rosa fresca,
tan garrida y con amor.
Llam’os yo con voz plañida,
llena de gran compassion,
con el alma entristecida
del angustia dolorida,
que ha sufrido el coraçon.
Que le haze mil pedaços,
yo muero do quier que vò
pues que por mis embaraços.
Quando y’os tuve en mis braços
no vos supe servir, no.
No porque os uviesse errado,
con pensamiento de errar,
mas si me days por culpado,
pues publico mi pecado
deveys me de perdonar.
No porque quando os servia
mi querer os desirvio,
mas porque passo solia,
Y agora que os serviria,
no vos puedo yo aver, no.

120 The device of an enamoured knight in the true Spanish
style: WITHOUT THEE I AM WITHOUT GOD, AND WITHOUT
MYSELF, was thus glossed.

Mote.

Sin vos, y sin Dios y mi.

Glosa de Don Jorge Manrique.

Yo soy quien libre me vi,
yo quien pudiera olvidaros,
yo so el que por amaros
estoy desque os conoci
sin Dios y sin vos y mi.
Sin Dios, porque en vos adoro
sin vos, pues no me quereys,
pues sin mi ya esto decoro,
que vos soys quien me teneys.
Assi que triste naci,
pues que pudiera olvidaros,
yo soy el que por amaros
esto desque os conoci
sin Dios y sin vos y mi.

121 An accurate idea of all the romances of this class may be
derived from the Historia de los Vandos de los Zegris y
Abencerrages, Caballeros Moros de Granada, a work well
known to those who are acquainted with Spanish
literature. It has been several times printed. The edition
which I have now before me (Lisboa 1616,) seems to be
one of the latest. On the title page the author styles
himself, Ginez Perez de Hita, and on that page also appear
the words, Aora nuevamente sacado de un libro Arabigo.
The German critic Blankenburgh, is of opinion, that there
is no more reason for supposing this work to be a
translation from the Arabic, than that Don Quixote was
derived from a similar source. But the word sacado on the
title page, by no means indicates that it is a translation.
The author has evidently derived much of his information,
such for instance, as the genealogical register of the
families, from Moorish sources. He has probably availed