Community Detection in the Setting of Generalized Random Dot Product Graphs
Abstract: Graph and network data, in which samples are represented not as a collection of feature vectors but as relationships between pairs of observations, are increasingly widespread in fields ranging from social science data analysis to the training of machine learning models for artificial intelligence tasks. One common goal of graph analysis is community detection, or graph clustering, in which the graph is partitioned into subgraphs in an unsupervised yet meaningful manner (e.g., by optimizing an objective function or by recovering unobserved labels). Because traditional clustering techniques were developed for data that can be represented as vectors, they cannot be applied directly to graphs. In this research, we investigate a family of spectral-decomposition-based approaches to community detection for block models (random graph models with inherent community structure). We first show how, under the generalized random dot product graph (GRDPG) framework, all block models can be represented as a collection of feature vectors organized by community. We then apply clustering methods to these feature vector representations by exploiting the linear structures that the block models induce, and finally we derive the asymptotic properties of these methods. We further extend this connection between block models and community-organized GRDPGs to propose more flexible, nonlinear community structures, using real graphs with nonlinear structures as motivating examples.
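For readers new to the GRDPG, the following is a minimal sketch of the general idea, not the speaker's method. In a GRDPG, the edge probability between vertices i and j is P_ij = x_i^T I_{p,q} x_j, where I_{p,q} is diagonal with p entries equal to +1 followed by q entries equal to -1. In the simplest block model, the stochastic block model, every vertex in a community shares a single latent position. The sketch samples such a graph, embeds it by adjacency spectral embedding (keeping the eigenvalues that are largest in absolute value, so that negative eigenvalues are retained), and clusters the embedded points. The function names (sample_sbm, adjacency_spectral_embedding, kmeans), the block matrix B, and the community sizes are hypothetical choices for illustration. Plain k-means is a swapped-in stand-in that only suits this point-mass case. It does not attempt the linear structures mentioned in the abstract; for example, in the degree-corrected block model each community's latent positions lie along a ray through the origin rather than at a single point.

```python
import numpy as np


def sample_sbm(B, sizes, rng):
    """Sample an undirected, hollow adjacency matrix from a stochastic block model."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    P = B[np.ix_(labels, labels)]  # P[i, j] = B[z_i, z_j]
    upper = np.triu(rng.random(P.shape) < P, k=1)  # independent edges above the diagonal
    A = (upper | upper.T).astype(float)  # symmetrize; the diagonal stays zero
    return A, labels


def adjacency_spectral_embedding(A, d):
    """GRDPG-style ASE: keep the d eigenvalues largest in absolute value, so
    negative eigenvalues (which an ordinary RDPG cannot represent) are retained."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        # Distance from each point to its nearest already-chosen center.
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        new_centers = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign


# Hypothetical example: a disassortative two-block SBM. B has eigenvalues
# 0.8 and -0.4, so it is a GRDPG with signature (p, q) = (1, 1), not an RDPG.
rng = np.random.default_rng(0)
B = np.array([[0.2, 0.6], [0.6, 0.2]])
A, labels = sample_sbm(B, [200, 200], rng)
X_hat = adjacency_spectral_embedding(A, d=2)
pred = kmeans(X_hat, k=2)

# Community labels are only identifiable up to a permutation; with two blocks
# that means possibly swapping 0 and 1.
accuracy = max(np.mean(pred == labels), np.mean(pred != labels))
print(f"accuracy up to label swap: {accuracy:.3f}")
```

Because B here has one positive and one negative eigenvalue, an embedding that kept only the largest positive eigenvalues would discard the direction that separates the two communities. Ranking eigenvalues by magnitude is what lets the GRDPG cover such block models.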
About the speaker: John Koo is a Senior Collaborative Statistician for the Biostatistics Consulting and Collaboration Core at NYU GPH. He earned his PhD in the Department of Statistics at Indiana University Bloomington, where he worked on the theory and analysis of graph and network statistics. He then completed a postdoctoral fellowship in the Department of Biostatistics and Health Data Science at the Indiana University School of Medicine. In addition to his academic background, John has held various data science roles, primarily in agriculture, in which he advised farmers on designing experiments, monitoring their fields, and analyzing crop, soil, and climate data.
This event is open to the NYU community for in-person attendance. The general public may RSVP for virtual attendance.