Facilitating Online Healthcare Support Group Formation Using Topic Modeling

Published at MedInfo 2025 (Studies in Health Technology and Informatics)
Pronob Kumar Barman, Tera L. Reynolds, James Foulds
University of Maryland, Baltimore County

The Problem

Patients increasingly rely on online health forums to find peer support, share experiences, and seek advice throughout their healthcare journeys. However, these large-scale platforms often fall short — inconsistent engagement, overwhelming volume, and a lack of personalization mean that many patients never find the meaningful connections that research has shown can significantly improve health outcomes. Manually forming cohesive support groups is labor-intensive, doesn’t scale, and struggles to account for the diverse demographics, health concerns, and interaction patterns across a patient community.

Approach and Methodology

We developed the Group-specific Dirichlet Multinomial Regression (gDMR) model, a probabilistic generative framework that automates personalized support group formation by jointly modeling three complementary signals: user-generated content (posts and questions), demographic features (age, gender, country, membership history), and interaction patterns within the community.
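To make the regression idea concrete, here is a minimal sketch of the log-linear prior used in standard Dirichlet Multinomial Regression, where each user's feature vector sets the Dirichlet parameters over groups. The exact gDMR parameterization is not given in this summary, so treat the form below as an assumption; the feature layout and coefficient values are hypothetical and chosen only for illustration.

```python
import math

def dmr_prior(features, coefficients):
    """Return one Dirichlet parameter per group: alpha_k = exp(x . lambda_k)."""
    return [
        math.exp(sum(x * w for x, w in zip(features, lam)))
        for lam in coefficients
    ]

def expected_membership(alpha):
    """Mean of Dirichlet(alpha): the prior expected group proportions."""
    total = sum(alpha)
    return [a / total for a in alpha]

# Hypothetical user: [bias, scaled age, is_female, one embedding dimension].
user_features = [1.0, 0.4, 1.0, -0.2]

# Hypothetical coefficients, one row per group (made up, not learned values).
group_coefficients = [
    [0.0, 1.0, 0.5, 0.0],
    [0.0, -1.0, 0.0, 2.0],
    [0.0, 0.0, 0.0, 0.0],
]

alpha = dmr_prior(user_features, group_coefficients)
prior_membership = expected_membership(alpha)
print(prior_membership)
```

Because the prior is a function of the features, two users with similar demographics and embeddings start from similar expected group proportions before their posts are taken into account.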

The core methodological innovation is integrating node embeddings into the topic modeling pipeline. We construct a directed interaction graph where users are nodes and edges represent replies or comments, weighted by frequency. Using Node2Vec, we generate 64-dimensional embeddings that capture both local and global network structure, so users who frequently interact around shared health concerns receive similar embeddings and are more likely to be grouped together. These embeddings are incorporated alongside demographics as regression features in the gDMR generative process. Inference combines collapsed Gibbs sampling with BFGS optimization to learn group-specific word distributions and membership coefficients.
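The graph construction and the walk step behind Node2Vec can be sketched in plain Python. This is an illustrative sketch, not the authors' code: the edge direction (replier to original poster), the user names, and the helper names are assumptions, and the neighbor check treats an edge in either direction as "close to the previous node", which is a simplification. In Node2Vec the resulting walks are then fed to a skip-gram model to produce the embeddings; that training step is omitted here.

```python
import random
from collections import Counter

def build_interaction_graph(replies):
    """Directed graph {src: {dst: weight}}, weight = number of replies."""
    graph = {}
    for (src, dst), count in Counter(replies).items():
        graph.setdefault(src, {})[dst] = count
        graph.setdefault(dst, {})  # keep users who only receive replies
    return graph

def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """One second-order biased walk with return parameter p and in-out parameter q."""
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = list(graph[cur])
        if not neighbors:
            break  # dead end: no outgoing edges
        if len(walk) == 1:
            weights = [graph[cur][n] for n in neighbors]
        else:
            prev = walk[-2]
            weights = []
            for n in neighbors:
                w = graph[cur][n]
                if n == prev:
                    weights.append(w / p)   # step back to the previous node
                elif n in graph[prev] or prev in graph[n]:
                    weights.append(w)       # stay near the previous node
                else:
                    weights.append(w / q)   # move further away
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk

# Hypothetical (replier, original poster) pairs.
replies = [("ana", "bo"), ("ana", "bo"), ("ana", "cy"),
           ("bo", "cy"), ("cy", "ana"), ("dee", "ana")]
graph = build_interaction_graph(replies)
walk = node2vec_walk(graph, "dee", length=6, p=0.5, q=2.0, rng=random.Random(0))
print(walk)
```

A low q pushes walks outward to capture broader structure, while a high q keeps them close to the starting neighborhood. The values used above are arbitrary.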

The learned group membership distributions are then refined through constrained K-Means clustering. The clustering uses a weighted combination of thematic similarity (computed from TF-IDF, weighted at 70%) and feature similarity (embeddings and demographics, weighted at 30%). The final groups are balanced in size (10–22 members), country-specific for cultural relevance, and thematically coherent.
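The 70/30 weighting and the group constraints can be illustrated with a short sketch. It assumes cosine similarity for both components, which this summary does not confirm, and the function names and toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def combined_similarity(tfidf_a, tfidf_b, feat_a, feat_b,
                        w_theme=0.7, w_feat=0.3):
    """Weighted mix of thematic (TF-IDF) and feature similarity."""
    return w_theme * cosine(tfidf_a, tfidf_b) + w_feat * cosine(feat_a, feat_b)

def satisfies_constraints(member_countries, min_size=10, max_size=22):
    """Check the size bounds and that all members share one country."""
    size_ok = min_size <= len(member_countries) <= max_size
    return size_ok and len(set(member_countries)) == 1

# Toy vectors: different topics (orthogonal TF-IDF) but identical features.
score = combined_similarity([1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.5, 0.5])
print(score)
print(satisfies_constraints(["TR"] * 12))
```

With the thematic term weighted more than twice as heavily as the feature term, two users who write about different topics score low even when their demographics and embeddings match.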

Key Findings and Contributions

Experiments on a large-scale dataset from MedHelp, comprising over 2 million questions and 8 million answers, demonstrated that gDMR with node embeddings significantly outperforms standard baselines across all evaluation metrics. The model achieved 52.2% lower perplexity than DMR and 38.5% lower perplexity than gDMR without embeddings, indicating substantially better generalization to held-out text. It also achieved the best topic coherence score (−2.401 vs. −4.488 for LDA and −4.789 for DMR, where higher is better), producing more interpretable and semantically consistent group themes. The highest held-out log-likelihood (−403.350) further confirmed superior predictive accuracy. Within-group cosine similarities were notably higher than random baselines, validating that the formed groups are semantically cohesive.
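For readers unfamiliar with the metrics, the sketch below shows how perplexity is conventionally derived from held-out log-likelihood and how a relative reduction such as "52.2% lower" is computed. The perplexity values in the example are hypothetical and chosen only to illustrate the arithmetic; they are not results from the paper.

```python
import math

def perplexity(total_log_likelihood, num_tokens):
    """Perplexity = exp(-average log-likelihood per held-out token)."""
    return math.exp(-total_log_likelihood / num_tokens)

def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline

# Sanity check: a model that spreads probability uniformly over 8 words
# assigns log(1/8) to every token, so its perplexity is 8.
uniform_ppl = perplexity(5 * math.log(1 / 8), num_tokens=5)

# Hypothetical perplexities for a baseline and an improved model.
reduction = relative_reduction(baseline=1000.0, improved=478.0)
print(uniform_ppl, reduction)
```

Lower perplexity means the model is less "surprised" by unseen posts, which is why it is read as a sign of better generalization.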

The resulting support groups are both thematically focused and demographically grounded — for example, a digestive health group for users in Russia, a cardiovascular health group for users in Turkey, and a liver health group for users in Southeast Asia.

Real-World Applications and Impact

This work bridges explainable AI and healthcare, two domains where interpretability and trust are essential. Unlike black-box clustering, gDMR produces transparent group assignments grounded in identifiable topics and demographic coefficients, allowing clinicians and platform designers to understand why a patient was placed in a particular group.

Beyond online forums, the framework has potential to integrate with Electronic Health Record (EHR) systems, enabling healthcare providers to automate patient support group formation — whether in-person, online, or hybrid — based on clinical notes, demographics, and care interaction histories. This could meaningfully enhance patient-centered care, particularly for chronic disease management where sustained peer support has demonstrated clinical benefits.

Future Directions

Our ongoing research extends this work along several fronts. We are conducting human-centered user studies to understand how patients perceive and engage with AI-formed support groups, ensuring the technology aligns with actual user needs and preferences. We are also developing neural collaborative filtering approaches to address the cold-start problem — recommending appropriate groups for new users who lack interaction history. Critically, we are incorporating fairness-aware learning techniques and continuous demographic monitoring to mitigate risks of algorithmic bias, as group assignments based on demographics and behavior can inadvertently marginalize underrepresented populations. The long-term goal is a complete, human-centered pipeline from algorithmic group formation to sustained, equitable patient engagement.

About the Author: Pronob Kumar Barman is a PhD Candidate in Information Systems at the University of Maryland, Baltimore County. His research focuses on human-centered AI for online health communities, spanning probabilistic generative modeling, fairness-aware algorithms, and neural collaborative filtering for healthcare applications.

Reference: Barman, P. K., Reynolds, T. L., & Foulds, J. (2025). Facilitating Online Healthcare Support Group Formation Using Topic Modeling. Studies in Health Technology and Informatics, 329, 1049-1053.
