Da Yan, associate professor of Computer Science at the Luddy School of Informatics, Computing, and Engineering, collaborated on research that received best demo runner-up status at the 40th IEEE International Conference on Data Engineering 2024 in the Netherlands for the paper, “FSM-Explorer: An Interactive Tool for Frequent Subgraph Pattern Mining from a Big Graph.”
The demo paper was led by Yan and former Ph.D. student Jalal Khalil, who is now an assistant professor at St. Cloud State University, along with members of Yan’s group and other researchers from Auburn University and Singapore’s Nanyang Technological University.
Yan said FSM-Explorer showcases how users can easily mine frequent subgraph patterns from a big real graph by tuning parameters in the system, and how they can conveniently examine the many matched instances one batch at a time to improve productivity.
In other words, it can lead to faster, more efficient problem solving.
Yan said the backend of FSM-Explorer is built on T-FSM, a system for the parallel mining of frequent subgraph patterns in a big graph. T-FSM was developed by Yan’s group in work led by his Ph.D. student, Lyuheng Yuan, and published at the SIGMOD 2023 Conference.
“T-FSM features its ideal speedup ratio with a number of CPU cores used,” Yan said, “and a new anti-monotonic support measure called Fraction-Score that is more accurate than the conventional measure called MNI.”
Fraction-Score is a generalized support measure designed to capture all possible instances and handle overlapping instances. It was originally used to develop efficient algorithms for co-location pattern mining problems, and was later extended to mine frequent subgraph patterns in the T-FSM work.
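For readers unfamiliar with the conventional MNI (minimum image-based) measure that Yan contrasts with Fraction-Score, the following is a minimal Python sketch of how MNI support is commonly defined: for each pattern vertex, count the distinct data-graph vertices it is matched to across all instances, then take the minimum of those counts. It is illustrative only and is not code from T-FSM or FSM-Explorer; the function name `mni_support` and the toy instances are assumptions made for this example, and the sketch does not implement Fraction-Score.

```python
from collections import defaultdict


def mni_support(embeddings):
    """Return the minimum image-based (MNI) support of a pattern.

    `embeddings` is a list of dicts (a hypothetical representation); each
    dict maps a pattern vertex to the data-graph vertex it is matched to
    in one instance of the pattern.
    """
    # Collect, for every pattern vertex, the set of distinct data vertices
    # it is mapped to across all matched instances.
    images = defaultdict(set)
    for embedding in embeddings:
        for pattern_vertex, data_vertex in embedding.items():
            images[pattern_vertex].add(data_vertex)
    # MNI support is the size of the smallest image set (0 if no instances).
    return min((len(s) for s in images.values()), default=0)


# Toy example: a single-edge pattern (a, b) matched three times, where
# every instance reuses data vertex 1 for pattern vertex "a".
embeddings = [
    {"a": 1, "b": 2},
    {"a": 1, "b": 3},
    {"a": 1, "b": 4},
]
support = mni_support(embeddings)
print(support)  # 1
```

In this toy case the three matched instances all share the same data vertex for `a`, so the MNI support is 1 rather than 3.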
Yan said FSM-Explorer also adds user-friendly graphical interfaces on top of T-FSM so that users can easily find frequent subgraph patterns of interest in real applications with real graph data.
Yan’s group has worked on various graph-parallel systems -- including the highly regarded Pregel+, Blogel and G-thinker -- for the last decade. Yan said the results have been published in the top conferences in database systems and parallel computing.
He said recent systems use a task-based programming paradigm called T-thinker, which was featured as a “great innovative idea” by the Computing Community Consortium. The T-FSM system and its demonstration software, FSM-Explorer, are built on the T-thinker paradigm.
“Our long-term goal on graph-parallel system research is to integrate various parallel graph processing and mining operations on top of a real industrial graph platform such as PuppyGraph to allow convenient use, and to provide user-friendly GUI for these operations to broaden its user community,” Yan said.
The bottom-line goal is to integrate the graph querying and mining tools the group has developed into industrial software such as PuppyGraph to solve complex graph problems in real applications.
The International Conference on Data Engineering, one of the premier conferences in data and information engineering, annually draws more than 500 researchers and developers from academia, industry and government around the world. Focused on research in designing, building, managing and evaluating advanced data-intensive systems and applications, it allows researchers and developers to explore cutting-edge ideas and discuss tools, techniques and experiences.