
From GANs to Diffusion: GDA for Perception Tasks
19 Nov 2025
Highlights the use of GDA in perception tasks (segmentation, detection) but notes that filtering and effective utilization of the generated data remain underexplored.

Enhancing Long-Tailed Segmentation with Gradient Cache and BSGAL
19 Nov 2025
Proposes BSGAL, a Generative Active Learning algorithm that uses a gradient cache to filter unlimited synthetic data for long-tailed instance segmentation.

Cross-Model Validation: MIVPG's Efficacy on Encoder-Decoder vs. Decoder-Only LLMs
19 Nov 2025
MIVPG's CSA module remains effective when the LLM is switched from the encoder-decoder FLAN-T5-XL to the decoder-only OPT-2.7b.

Theoretical Proof: CSA Module Maintains MIL Properties
19 Nov 2025
Provides the theoretical proof for Proposition 2, establishing that the Correlated Self-Attention (CSA) module in MIVPG maintains permutation equivariance.

Visual Prompt Generation: Cross-Attention in Q-Former
19 Nov 2025
Details the Q-Former architecture: a 12-layer BERT-based model using 32 learnable query embeddings.

Future MLLMs: Contribution of MIL-Based Techniques and Enriched Visual Signals
18 Nov 2025
This paper concludes that MIVPG is a general, powerful component for fusing enriched visual representations in MLLMs.

MIVPG on E-commerce: Multi-Image/Multi-Patch Aggregation for Captioning
18 Nov 2025
MIVPG uses hierarchical MIL to outperform patch concatenation and single-image baselines, showing that CSA is key to capturing correlations across images and patches.

Gigapixel Pathology: MIVPG Outperforms Baselines in Medical Captioning
18 Nov 2025
MIVPG significantly outperforms baselines by exploiting correlation among instances and shows strong domain adaptation over training epochs.

Data Scarcity and MLLMs: Using MIL to Uncover Latent Patterns in Single-Image Tasks
18 Nov 2025
Enhancements from PPEG and MIL are critical for discerning patterns in small datasets, mitigating the impact of data scarcity on MLLM performance.