The-Anh Vu-Le (Nah)

Greetings.

I am a third-year Ph.D. student at the University of Illinois at Urbana-Champaign (UIUC).

I recently found myself thrown into the world of network science, particularly modeling and community detection in complex networks. I am blessed to be advised by Professor Tandy Warnow, who is both a great mentor and a great scientist.

Research: My main research focus is network science. Previously, I studied theoretical aspects of machine learning and related fields, especially deep learning, generative models, reinforcement learning, and many other cool things.

Past study: I received my Bachelor’s degree from the Honors Program of the Department of Information Technology, University of Science, Vietnam National University, Ho Chi Minh City (VNU-HCMUS) in Nov 2020. In 2021, I took graduate-level courses from a Master’s program at the John Von Neumann Institute, Vietnam National University, Ho Chi Minh City (JVN) but did not finish the degree.

In my undergraduate years, I was a research assistant at the SELab, advised by Professor Minh-Triet Tran. After graduation and before joining UIUC, I was an intern/resident at VinAI Research, advised by Professor Minh-Hoai Nguyen, Professor Tung Pham, Professor Viet-Anh Nguyen, and Professor Nhat Ho. All of the people mentioned are those to whom I owe a debt of gratitude.

news

Feb 05, 2025	Submitted 3 manuscripts to different journals! Fingers crossed! 🤞
Oct 04, 2024	2 papers accepted to CNA 2024!
Jul 27, 2023	1 paper accepted to ACM MM 2023!
Dec 15, 2022	The start of a new blog!

selected publications

EC-SBM
EC-SBM Synthetic Network Generator
In submission to Applied Network Science, 2025
Generating high-quality synthetic networks with realistic community structure is vital to effectively evaluate community detection algorithms. In this study, we propose a new synthetic network generator called the Edge-Connected Stochastic Block Model (EC-SBM). The goal of EC-SBM is to take a given clustered real-world network and produce a synthetic network that resembles the clustered real-world network with respect to both network and community-specific criteria. In particular, we focus on simulating the internal edge connectivity of the clusters in the reference clustered network. Our extensive performance study on large real-world networks shows that EC-SBM has high accuracy in both network and community-specific criteria, and is generally more accurate than current alternative approaches for this problem. Furthermore, EC-SBM is fast enough to scale to real-world networks with millions of nodes.
RECCS
RECCS: Realistic Cluster Connectivity Simulator for Synthetic Network Generation
Lahari Anne, The-Anh Vu-Le, Minhyuk Park, and 2 more authors
In submission to Advances in Complex Systems, 2025
The limited availability of useful ground-truth communities in real-world networks presents a challenge to evaluating and selecting a "best" community detection method for a given network or family of networks. The use of synthetic networks with planted ground-truths is one way to address this challenge. While several synthetic network generators can be used for this purpose, Stochastic Block Models (SBMs), when provided input parameters from real-world networks and clusterings, are well suited to producing networks that retain the properties of the network they are intended to model. We report, however, that SBMs can produce disconnected ground-truth clusters, even under conditions where the input clusters are connected. In this study, we describe the REalistic Cluster Connectivity Simulator (RECCS), which, while retaining approximately the same quality for other network and cluster parameters, creates an SBM synthetic network and then modifies it to ensure an improved fit to cluster connectivity. We report results using parameters obtained from clustered real-world networks ranging up to 13.9 million nodes in size, and demonstrate an improvement over the unmodified use of SBMs for network generation.
SBM+WCC
Improved Community Detection using Stochastic Block Models
Minhyuk Park, Daniel Wang Feng, Siya Digra, and 4 more authors
In submission to PLOS Complex Systems, 2025
Identifying edge-dense communities that are also well-connected is an important aspect of understanding community structure. Prior work has shown that community detection methods can produce poorly connected communities, and some can even produce internally disconnected communities. In this study we evaluate the connectivity of communities obtained using Stochastic Block Models. We find that SBMs produce internally disconnected communities from real-world networks. We present a simple technique, Well-Connected Clusters (WCC), which repeatedly removes small edge cuts until the communities meet a user-specified threshold for well-connectivity. Our study using a large collection of synthetic networks based on clustered real-world networks shows that using WCC as a post-processing tool with SBM community detection typically improves clustering accuracy. WCC is fast enough to use on networks with millions of nodes and is freely available in open source form.
GroundedBERT
Expand BERT Representation with Visual Information via Grounded Language Learning with Multimodal Partial Alignment
The-Anh Vu-Le*, Cong-Duy Nguyen*, Thong Nguyen, and 2 more authors
In Proceedings of the 31st ACM International Conference on Multimedia, 2023
Language models have been supervised with both a language-only objective and visual grounding in existing studies of visual-grounded language learning. However, due to differences in the distribution and scale of visual-grounded datasets and language corpora, the language model tends to mix up the context of the tokens that occur in the grounded data with those that do not. As a result, during representation learning, there is a mismatch between the visual information and the contextual meaning of the sentence. To overcome this limitation, we propose GroundedBERT, a grounded language learning method that enhances the BERT representation with visually grounded information. GroundedBERT comprises two components: (i) the original BERT which captures the contextual representation of words learned from the language corpora, and (ii) a visual grounding module which captures visual information learned from visual-grounded datasets. Moreover, we employ Optimal Transport (OT), specifically its partial variant, to solve the fractional alignment problem between the two modalities. Our proposed method significantly outperforms the baseline language models on various language tasks of the GLUE and SQuAD datasets.
m-POT
Improving Mini-batch Optimal Transport via Partial Transportation
Khai Nguyen*, Dang Nguyen*, The-Anh Vu-Le, and 2 more authors
In Proceedings of the 39th International Conference on Machine Learning, 2022
Mini-batch optimal transport (m-OT) has been widely used recently to deal with the memory issue of OT in large-scale applications. Despite its practicality, m-OT suffers from misspecified mappings, namely, mappings that are optimal on the mini-batch level but are partially wrong in comparison with the optimal transportation plan between the original measures. Motivated by the misspecified mappings issue, we propose a novel mini-batch method by using partial optimal transport (POT) between mini-batch empirical measures, which we refer to as mini-batch partial optimal transport (m-POT). Leveraging the insight from partial transportation, we explain the source of misspecified mappings from m-OT and motivate why limiting the amount of transported mass among mini-batches via POT can alleviate the incorrect mappings. Finally, we carry out extensive experiments on various applications such as deep domain adaptation, partial domain adaptation, deep generative models, color transfer, and gradient flow to demonstrate the favorable performance of m-POT compared to current mini-batch methods.