Published at MedInfo 2025 (Studies in Health Technology and Informatics)
Pronob Kumar Barman, Tera L. Reynolds, James Foulds
University of Maryland, Baltimore County
The Problem
Patients increasingly rely on online health forums to find peer support, share experiences, and seek advice throughout their healthcare journeys. However, these large-scale platforms often fall short: inconsistent engagement, overwhelming volume, and a lack of personalization mean that many patients never find the meaningful connections that research has shown can significantly improve health outcomes. Manually forming cohesive support groups is labor-intensive, does not scale, and struggles to account for the diverse demographics, health concerns, and interaction patterns across a patient community.
Approach and Methodology
We developed the Group-specific Dirichlet Multinomial Regression (gDMR) model, a probabilistic generative framework that automates personalized support group formation by jointly modeling three complementary signals: user-generated content (posts and questions), demographic features (age, gender, country, membership history), and interaction patterns within the community.
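To make the regression idea concrete: in standard Dirichlet Multinomial Regression, each user's Dirichlet prior over groups is a log-linear function of that user's features, alpha_k = exp(x . lambda_k). The sketch below shows only that standard DMR form; the exact gDMR parameterization may differ, and the feature layout, coefficient values, and function names are hypothetical.

```python
import math

def dmr_prior(features, coefficients):
    """Standard DMR prior: one concentration per group, alpha_k = exp(x . lambda_k)."""
    return [
        math.exp(sum(x * w for x, w in zip(features, lam)))
        for lam in coefficients
    ]

def expected_membership(alpha):
    """Mean of a Dirichlet(alpha): the prior expected group proportions."""
    total = sum(alpha)
    return [a / total for a in alpha]

# Hypothetical user feature vector: [bias, scaled age, gender indicator]
user = [1.0, 0.5, 1.0]

# Hypothetical learned coefficients, one row per group
lambdas = [
    [0.0, 0.0, 0.0],
    [0.2, 1.0, -0.4],
    [-0.3, -1.0, 0.9],
]

alpha = dmr_prior(user, lambdas)
membership = expected_membership(alpha)
```

Because the prior depends on the feature vector, two users who write similar posts can still receive different prior group proportions when their demographics differ.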
The core methodological innovation is integrating node embeddings into the topic modeling pipeline. We construct a directed interaction graph where users are nodes and edges represent replies or comments, weighted by frequency. Using Node2Vec, we generate 64-dimensional embeddings that capture both local and global network structures — meaning users who frequently interact around shared health concerns are more likely to be grouped together. These embeddings are incorporated alongside demographics as regression features in the gDMR generative process, which uses collapsed Gibbs sampling combined with BFGS optimization to learn group-specific word distributions and membership coefficients.
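The graph construction and the walk step behind Node2Vec can be sketched in a few lines of plain Python. This is a minimal illustration, not the pipeline used in the study: the reply data, function names, and the p and q values are hypothetical, and a full Node2Vec run would go on to feed many such walks into a skip-gram model to obtain the 64-dimensional vectors.

```python
import random
from collections import Counter, defaultdict

def build_interaction_graph(replies):
    """Build {source: {target: weight}} from (replier, original_poster) pairs.

    The edge weight is the number of times the replier responded to that poster.
    """
    graph = defaultdict(dict)
    for (src, dst), count in Counter(replies).items():
        graph[src][dst] = count
    return dict(graph)

def biased_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """One Node2Vec-style second-order random walk over a weighted directed graph."""
    rng = rng or random.Random(0)
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbors = graph.get(current, {})
        if not neighbors:
            break  # dead end: no outgoing edges
        previous = walk[-2] if len(walk) > 1 else None
        candidates, weights = [], []
        for nxt, weight in neighbors.items():
            if previous is None:
                bias = 1.0        # first step: edge weights only
            elif nxt == previous:
                bias = 1.0 / p    # return to the previous node
            elif nxt in graph.get(previous, {}):
                bias = 1.0        # stay close to the previous node
            else:
                bias = 1.0 / q    # move further away
            candidates.append(nxt)
            weights.append(weight * bias)
        walk.append(rng.choices(candidates, weights=weights, k=1)[0])
    return walk

# Hypothetical reply log: (replier, original poster)
replies = [("ana", "ben"), ("ana", "ben"), ("ben", "ana"),
           ("ben", "cai"), ("cai", "ana")]
graph = build_interaction_graph(replies)
walk = biased_walk(graph, "ana", length=5)
```

A small p keeps walks near their starting point and emphasizes local structure, while a small q pushes them outward toward more global structure.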
The learned group membership distributions are then refined through constrained K-Means clustering driven by a weighted combination of two similarity measures: thematic similarity (computed from TF-IDF, weighted at 70%) and feature similarity (computed from embeddings and demographics, weighted at 30%). The final groups are balanced in size (10–22 members), country-specific for cultural relevance, and thematically coherent.
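The 70/30 weighting and the size constraint can be illustrated with a short sketch. It assumes cosine similarity for both components, which the description above does not specify, and the vectors and function names are hypothetical. It scores a pair of users and checks a group size; it is not a constrained K-Means implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def combined_similarity(tfidf_a, tfidf_b, feat_a, feat_b,
                        w_theme=0.7, w_feat=0.3):
    """Weighted mix of thematic (TF-IDF) and feature (embedding + demographic) similarity."""
    return w_theme * cosine(tfidf_a, tfidf_b) + w_feat * cosine(feat_a, feat_b)

def size_ok(group, min_size=10, max_size=22):
    """Check the group-size constraint (10 to 22 members)."""
    return min_size <= len(group) <= max_size

# Hypothetical users: same topics, unrelated feature vectors
score = combined_similarity([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
```

With these weights, two users who discuss identical topics but share no feature similarity still score 0.7, so thematic overlap dominates the grouping.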
Key Findings and Contributions
Experiments on a large-scale dataset from MedHelp — comprising over 2 million questions and 8 million answers — demonstrated that gDMR with node embeddings significantly outperforms standard baselines across all evaluation metrics. The model achieved 52.2% lower perplexity compared to DMR and 38.5% lower than gDMR without embeddings, indicating substantially better generalization. It also achieved the best topic coherence score (−2.401 vs. −4.488 for LDA and −4.789 for DMR), producing more interpretable and semantically consistent group themes. The highest held-out log-likelihood (−403.350) further confirmed superior predictive accuracy. Within-group cosine similarities were notably higher than random baselines, validating that the formed groups are genuinely semantically cohesive.
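For readers less familiar with these metrics, perplexity is conventionally the exponential of the negative per-token held-out log-likelihood, and "X% lower" is a relative reduction against the baseline. The sketch below uses that conventional definition with purely hypothetical numbers; it does not reproduce the reported values, and the normalization behind the reported log-likelihood is not stated here.

```python
import math

def perplexity(log_likelihood, num_tokens):
    """Conventional perplexity: exp of the negative average log-likelihood per token."""
    return math.exp(-log_likelihood / num_tokens)

def relative_reduction(baseline, model):
    """Fraction by which the model's value is lower than the baseline's."""
    return (baseline - model) / baseline

# Hypothetical perplexities, for illustration only
baseline_ppl = 200.0
model_ppl = 100.0
reduction = relative_reduction(baseline_ppl, model_ppl)  # 0.5, i.e. 50% lower
```

Lower perplexity means the model assigns higher probability to unseen text, which is why it is read as a sign of better generalization.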
The resulting support groups are both thematically focused and demographically grounded — for example, a digestive health group for users in Russia, a cardiovascular health group for users in Turkey, and a liver health group for users in Southeast Asia.
Real-World Applications and Impact
This work bridges explainable AI and healthcare, two domains where interpretability and trust are essential. Unlike black-box clustering, gDMR produces transparent group assignments grounded in identifiable topics and demographic coefficients, allowing clinicians and platform designers to understand why a patient was placed in a particular group.
Beyond online forums, the framework has potential to integrate with Electronic Health Record (EHR) systems, enabling healthcare providers to automate patient support group formation — whether in-person, online, or hybrid — based on clinical notes, demographics, and care interaction histories. This could meaningfully enhance patient-centered care, particularly for chronic disease management where sustained peer support has demonstrated clinical benefits.
Future Directions
Our ongoing research extends this work along several fronts. We are conducting human-centered user studies to understand how patients perceive and engage with AI-formed support groups, ensuring the technology aligns with actual user needs and preferences. We are also developing neural collaborative filtering approaches to address the cold-start problem — recommending appropriate groups for new users who lack interaction history. Critically, we are incorporating fairness-aware learning techniques and continuous demographic monitoring to mitigate risks of algorithmic bias, as group assignments based on demographics and behavior can inadvertently marginalize underrepresented populations. The long-term goal is a complete, human-centered pipeline from algorithmic group formation to sustained, equitable patient engagement.
About the Author: Pronob Kumar Barman is a PhD Candidate in Information Systems at the University of Maryland, Baltimore County. His research focuses on human-centered AI for online health communities, spanning probabilistic generative modeling, fairness-aware algorithms, and neural collaborative filtering for healthcare applications.
