
Evaluating Visual Adapters: MIVPG Performance on Single and Multi-Image Inputs
15 Nov 2025
Details MIVPG experiments across single- and multi-image scenarios. The LLM and visual encoder stay frozen, and only the MIVPG is updated for efficiency.

MIVPG and Instance Correlation: Enhanced Multi-Instance Learning
15 Nov 2025
MIVPG uses a Correlated Self-Attention (CSA) module to unveil instance correlation, fulfilling all MIL properties while outperforming Q-Former.

Multimodal Fusion: MIVPG's Hierarchical MIL Approach for Multi-Image Samples
15 Nov 2025
Details MIVPG's hierarchical approach to MIL for multi-image samples. It treats both image patches and whole images as 'instances' for feature aggregation.

MIL Perspective: Analyzing Q-Former as a Multi-Head Mechanism
14 Nov 2025
Proves Q-Former is a Multi-Head MIL module due to permutation invariance in its cross-attention.

Visual Prompt Generators (VPGs): Encoding Images to LLM Tokens
14 Nov 2025
Explains how MLLMs use VPGs and cross-attention with learnable query embeddings to extract essential visual tokens from image patches for LLM input.

Multiple Instance Learning: Review of Instance and Embedding Level Approaches
13 Nov 2025
Reviews Multiple Instance Learning, contrasting instance-level and embedding-level approaches, while focusing on neural network pooling methods.

MLLM Adapters: Review of VPGs and Multimodal Fusion
12 Nov 2025
Reviews state-of-the-art MLLMs. Highlights the challenge of expanding current models beyond the simple one-to-one image-text relationship.

Dusted Input Images: Visualizing Decision Boundary Distillation
12 Nov 2025
This article explains and visualizes the use of "dusted input images" (inputs perturbed with strong Gaussian noise) to distill the model's decision boundary.

Network Size and Task Number: Ablation Study on IIL Performance and Stability
12 Nov 2025
This article presents an ablation study showing that the proposed IIL method performs well with larger networks.