1st Edition
Practical Graph Mining with R
Discover Novel and Insightful Knowledge from Data Represented as a Graph
Practical Graph Mining with R presents a "do-it-yourself" approach to extracting interesting patterns from graph data. It covers many basic and advanced techniques for the identification of anomalous or frequently recurring patterns in a graph, the discovery of groups or clusters of nodes that share common patterns of attributes and relationships, the extraction of patterns that distinguish one category of graphs from another, and the use of those patterns to predict the category of new graphs.
Hands-On Application of Graph Data Mining
Each chapter in the book focuses on a graph mining task, such as link analysis, cluster analysis, and classification. Through applications using real data sets, the book demonstrates how computational techniques can help solve real-world problems. The applications covered include network intrusion detection, tumor cell diagnostics, face recognition, predictive toxicology, mining metabolic and protein-protein interaction networks, and community detection in social networks.
Develops Intuition through Easy-to-Follow Examples and Rigorous Mathematical Foundations
Every algorithm and example is accompanied by R code. This allows readers to see how the algorithmic techniques correspond to the process of graph data analysis and to use the graph mining techniques in practice. The text also gives a rigorous, formal explanation of the underlying mathematics of each technique.
Makes Graph Mining Accessible to Various Levels of Expertise
Assuming no prior knowledge of mathematics or data mining, this self-contained book is accessible to students, researchers, and practitioners of graph data mining. It is suitable as a primary textbook for graph mining or as a supplement to a standard data mining course. It can also be used as a reference for researchers in computer, information, and computational science as well as a handy guide for data analytics practitioners.
Introduction Kanchana Padmanabhan, William Hendrix, and Nagiza F. Samatova
Graph Mining Applications
Book Structure
An Introduction to Graph Theory Stephen Ware
What Is a Graph?
Vertices and Edges
Comparing Graphs
Directed Graphs
Families of Graphs
Weighted Graphs
Graph Representations
An Introduction to R Neil Shah
What Is R?
What Can R Do?
R Packages
Why Use R?
Common R Functions
R Installation
An Introduction to Kernel Functions John Jenkins
Kernel Methods on Vector Data
Extending Kernel Methods to Graphs
Choosing Suitable Graph Kernel Functions
Kernels in This Book
Link Analysis Arpan Chakraborty, Kevin Wilson, Nathan Green, Shravan Kumar Alur, Fatih Ergin, Karthik Gurumurthy, Romulo Manzano, and Deepti Chinta
Introduction
Analyzing Links
Metrics for Analyzing Networks
The PageRank Algorithm
Hyperlink-Induced Topic Search (HITS)
Link Prediction
Applications
Graph-Based Proximity Measures Kevin A. Wilson, Nathan D. Green, Laxmikant Agrawal, Xibin Gao, Dinesh Madhusoodanan, Brian Riley, and James P. Sigmon
Defining the Proximity of Vertices in Graphs
Evaluating Relatedness Using Neumann Kernels
Applications
Frequent Subgraph Mining Brent E. Harrison, Jason C. Smith, Stephen G. Ware, Hsiao-Wei Chen, Wenbin Chen, and Anjali Khatri
About Frequent Subgraph Mining
The gSpan Algorithm
The SUBDUE Algorithm
Mining Frequent Subtrees with SLEUTH
Applications
Cluster Analysis Kanchana Padmanabhan, Brent Harrison, Kevin Wilson, Michael L. Warren, Katie Bright, Justin Mosiman, Jayaram Kancherla, Hieu Phung, Benjamin Miller, and Sam Shamseldin
Introduction
Minimum Spanning Tree Clustering
Shared Nearest Neighbor Clustering
Betweenness Centrality Clustering
Highly Connected Subgraph Clustering
Maximal Clique Enumeration
Clustering Vertices with Kernel k-Means
Application
How to Choose a Clustering Technique
Classification Srinath Ravindran, John Jenkins, Huseyin Sencan, Jay Prakash Goel, Saee Nirgude, Kalindi K. Raichura, Suchetha M. Reddy, and Jonathan S. Tatagiri
Overview of Classification
Classification of Vector Data: Support Vector Machines
Classifying Graphs and Vertices
Applications
Dimensionality Reduction Madhuri R. Marri, Lakshmi Ramachandran, Pradeep Murukannaiah, Padmashree Ravindra, Amrita Paul, Da Young Lee, David Funk, Shanmugapriya Murugappan, and William Hendrix
Multidimensional Scaling
Kernel Principal Component Analysis
Linear Discriminant Analysis
Applications
Graph-Based Anomaly Detection Kanchana Padmanabhan, Zhengzhang Chen, Sriram Lakshminarasimhan, Siddarth Shankar Ramaswamy, and Bryan Thomas Richardson
Types of Anomalies
Random Walk Algorithm
GBAD Algorithm
Tensor-Based Anomaly Detection Algorithm
Applications
Performance Metrics for Graph Mining Tasks Kanchana Padmanabhan and John Jenkins
Introduction
Supervised Learning Performance Metrics
Unsupervised Learning Performance Metrics
Optimizing Metrics
Statistical Significance Techniques
Model Comparison
Handling the Class Imbalance Problem in Supervised Learning
Other Issues
Application Domain-Specific Measures
Introduction to Parallel Graph Mining William Hendrix, Mekha Susan Varghese, Nithya Natesan, Kaushik Tirukarugavur Srinivasan, Vinu Balajee, and Yu Ren
Parallel Computing Overview
Embarrassingly Parallel Computation
Calling Parallel Codes in R
Creating Parallel Codes in R Using Rmpi
Practical Issues in Parallel Programming
Index
Exercises and Bibliography appear at the end of each chapter.
Biography
Nagiza F. Samatova is an associate professor of computer science at North Carolina State University and a senior research scientist at Oak Ridge National Laboratory.
"The authors provide a tour de force introduction to the different data representations (vectors, matrices), and introduce graph structures and the questions that can be answered with them. ... The book has many strong points. There is a companion website that hosts slide presentations for almost all chapters, as well as the R code needed to run the example code. The impatient reader can start going through the presentations and experimenting with the code right away. The more patient reader can read the book from cover to cover. For many reader categories, this summary of existing relevant work and approaches for data mining graph structures is a welcome addition, for which the authors deserve much praise."
--Radu State, Computing Reviews