
Future MLLMs: Contribution of MIL-Based Techniques and Enriched Visual Signals
18 Nov 2025
This paper concludes that MIVPG is a general, powerful component for fusing enriched visual representations in MLLMs.

MIVPG on E-commerce: Multi-Image/Multi-Patch Aggregation for Captioning
18 Nov 2025
MIVPG uses hierarchical multiple-instance learning (MIL) to outperform patch-concatenation and single-image baselines, showing that correlated self-attention (CSA) is key to capturing inter-instance correlation.

Gigapixel Pathology: MIVPG Outperforms Baselines in Medical Captioning
18 Nov 2025
MIVPG significantly outperforms baselines by exploiting instance correlation, and adapts strongly to the pathology domain over training epochs.

Data Scarcity and MLLMs: Using MIL to Uncover Latent Patterns in Single-Image Tasks
18 Nov 2025
Enhancements from PPEG and MIL prove critical for discerning latent patterns in small datasets, mitigating the impact of data scarcity on MLLM performance.

IGQ-ViT: Instance-Aware Group Quantization for Low-Bit Vision Transformers
17 Nov 2025
IGQ-ViT speeds up Vision Transformers with dynamic channel grouping, low-bit precision, and minimal latency overhead across real hardware.

Why Dynamic Grouping Beats Traditional Quantizers for Vision Transformers
17 Nov 2025
Instance-aware group quantization (IGQ-ViT) improves ViT accuracy by dynamically grouping channels and tokens to handle scale variation efficiently.

Instance-Aware Grouped Quantization (IGQ-ViT) Sets New Benchmarks for ViT PTQ
17 Nov 2025
IGQ-ViT delivers state-of-the-art low-bit quantization for ViTs, achieving strong accuracy on ImageNet and COCO with smarter group allocation.

Why Uniform Quantizers Break ViTs
17 Nov 2025
Adaptive quantization for ViTs: IGQ-ViT improves accuracy by handling channel and token scale variations with smarter grouping.

What Makes Vision Transformers Hard to Quantize?
17 Nov 2025
A breakdown of modern neural network and transformer quantization, from QAT and PTQ to new dynamic grouping methods that shrink model size without significant accuracy loss.
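
The dynamic grouping idea the entries above describe can be sketched in a few lines: for each input instance, channels are sorted by their observed dynamic range, split into groups, and each group shares one uniform low-bit quantizer. This is a minimal NumPy illustration of that principle, not the official IGQ-ViT implementation; the helper name `igq_quantize` and all parameter choices here are hypothetical.

```python
import numpy as np

def igq_quantize(x, num_groups=4, bits=4):
    """Sketch of instance-aware group quantization (hypothetical helper,
    not the IGQ-ViT code): channels are grouped per input instance by
    their dynamic range, and each group shares one uniform quantizer."""
    qmax = 2 ** bits - 1
    # Per-channel range for THIS instance (input-dependent, hence "dynamic").
    lo, hi = x.min(axis=0), x.max(axis=0)
    order = np.argsort(hi - lo)                 # sort channels by range
    groups = np.array_split(order, num_groups)  # contiguous range-based groups
    out = np.empty_like(x)
    for g in groups:
        g_lo, g_hi = lo[g].min(), hi[g].max()
        scale = max(g_hi - g_lo, 1e-8) / qmax   # one scale per group
        q = np.clip(np.round((x[:, g] - g_lo) / scale), 0, qmax)
        out[:, g] = q * scale + g_lo            # dequantize for comparison
    return out

# Usage: when channel scales vary widely (common in ViT activations),
# grouping by range gives lower reconstruction error than a single quantizer.
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8)) * np.array([0.1] * 4 + [10.0] * 4)
err_grouped = np.abs(igq_quantize(x, num_groups=4) - x).mean()
err_single = np.abs(igq_quantize(x, num_groups=1) - x).mean()
assert err_grouped < err_single
```

The toy input mixes small-scale and large-scale channels; with a single shared quantizer the large channels dictate the step size and wash out the small ones, which is exactly the failure mode the "Why Uniform Quantizers Break ViTs" entry refers to.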