From GANs to Diffusion: GDA for Perception Tasks

19 Nov 2025

Highlights the use of GDA in perception tasks (segmentation, detection) but notes that filtering and effective utilization remain underexplored.

Enhancing Long-Tailed Segmentation with Gradient Cache and BSGAL

19 Nov 2025

Proposes BSGAL, a Generative Active Learning algorithm that uses a gradient cache to filter unlimited synthetic data for long-tailed instance segmentation.

Cross-Model Validation: MIVPG's Efficacy on Encoder-Decoder vs. Decoder-Only LLMs

19 Nov 2025

MIVPG's CSA module remains effective when switching from FLAN-T5-XL to the OPT-2.7b LLM architecture.

Theoretical Proof: CSA Module Maintains MIL Properties

19 Nov 2025

Provides the theoretical proof for Proposition 2, establishing that the Correlated Self-Attention (CSA) module in MIVPG maintains permutation equivalence.

Visual Prompt Generation: Cross-Attention in Q-Former

19 Nov 2025

Details the Q-Former architecture: a 12-layer BERT-based model using 32 learnable query embeddings.

Future MLLMs: Contribution of MIL-Based Techniques and Enriched Visual Signals

18 Nov 2025

This paper concludes that MIVPG is a general, powerful component for fusing enriched visual representations in MLLMs.

MIVPG on E-commerce: Multi-Image/Multi-Patch Aggregation for Captioning

18 Nov 2025

MIVPG uses hierarchical MIL to outperform patch-concatenation and single-image baselines, showing that CSA is key to capturing instance correlation.

Gigapixel Pathology: MIVPG Outperforms Baselines in Medical Captioning

18 Nov 2025

MIVPG significantly outperforms baselines by using instance correlation and shows strong domain adaptation over epochs.

Data Scarcity and MLLMs: Using MIL to Uncover Latent Patterns in Single-Image Tasks

18 Nov 2025

Enhancements from PPEG and MIL are critical for discerning patterns in small datasets, mitigating the impact of data scarcity on MLLM performance.