Future MLLMs: Contribution of MIL-Based Techniques and Enriched Visual Signals

18 Nov 2025

This paper concludes that MIVPG (Multi-Instance Visual Prompt Generator) is a general, powerful component for fusing enriched visual representations into MLLMs.

MIVPG on E-commerce: Multi-Image/Multi-Patch Aggregation for Captioning

18 Nov 2025

MIVPG aggregates multiple images and their patches with hierarchical MIL, outperforming patch-concatenation and single-image baselines and showing that its CSA module is key to modeling instance correlations.
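
For readers unfamiliar with the aggregation idea these articles rely on, the sketch below shows generic attention-based MIL pooling (in the style of Ilse et al., 2018) applied hierarchically: patches within an image, then images within a listing. It illustrates the concept only; the class name, dimensions, and sample counts are illustrative and not MIVPG's actual architecture.

```python
import torch
import torch.nn as nn

class AttnMILPool(nn.Module):
    """Attention-based MIL pooling: scores each instance embedding,
    then takes a weighted sum to produce one bag embedding."""
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_instances, dim) -> one bag embedding of shape (dim,)
        weights = torch.softmax(self.score(x), dim=0)  # (num_instances, 1)
        return (weights * x).sum(dim=0)

# Hierarchy: patches -> per-image embedding, images -> per-sample embedding.
dim = 768
patch_pool, image_pool = AttnMILPool(dim), AttnMILPool(dim)

# Hypothetical input: 4 product images, each split into 196 ViT patches.
images = [torch.randn(196, dim) for _ in range(4)]
image_embs = torch.stack([patch_pool(p) for p in images])  # (4, dim)
sample_emb = image_pool(image_embs)                        # (768,)
```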

Gigapixel Pathology: MIVPG Outperforms Baselines in Medical Captioning

18 Nov 2025

By modeling instance correlations, MIVPG significantly outperforms baselines on gigapixel pathology captioning and adapts strongly to the medical domain over training epochs.

Data Scarcity and MLLMs: Using MIL to Uncover Latent Patterns in Single-Image Tasks

18 Nov 2025

Enhancements from PPEG (Pyramid Position Encoding Generator) and MIL are critical for discerning latent patterns in small datasets, mitigating the impact of data scarcity on MLLM performance.

IGQ-ViT: Instance-Aware Group Quantization for Low-Bit Vision Transformers

17 Nov 2025

IGQ-ViT speeds up Vision Transformers through dynamic channel grouping and low-bit precision, adding only minimal latency overhead on real hardware.

Why Dynamic Grouping Beats Traditional Quantizers for Vision Transformers

17 Nov 2025

Instance-aware group quantization (IGQ-ViT) improves quantized ViT accuracy by dynamically grouping channels and tokens so that values with similar scales share a quantizer.
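
To make "dynamic channel grouping" concrete, here is a toy sketch under stated assumptions: for each input, channels are sorted by dynamic range and split into equal groups, and each group gets its own symmetric uniform quantizer. The function name and shapes are hypothetical, and the real IGQ-ViT additionally learns how many groups each layer receives under a compute budget.

```python
import torch

def instance_group_quantize(x: torch.Tensor, num_groups: int = 4,
                            num_bits: int = 4) -> torch.Tensor:
    """Toy per-instance group quantization: channels with similar dynamic
    ranges share one symmetric uniform quantizer. x: (tokens, channels)."""
    qmax = 2 ** (num_bits - 1) - 1                 # e.g. 7 for 4 bits
    ranges = x.abs().amax(dim=0)                   # per-channel range, this input
    groups = ranges.argsort().chunk(num_groups)    # cluster similar scales
    out = torch.empty_like(x)
    for idx in groups:
        scale = x[:, idx].abs().max().clamp(min=1e-8) / qmax  # one step per group
        q = torch.clamp(torch.round(x[:, idx] / scale), -qmax, qmax)
        out[:, idx] = q * scale                    # fake-quantize (dequantized)
    return out

# Hypothetical activations: 197 tokens x 768 channels with wildly mixed scales.
acts = torch.randn(197, 768) * torch.logspace(-2, 1, 768)
err = (acts - instance_group_quantize(acts)).abs().mean()
```

Grouping per input matters because the channels with large ranges differ from image to image; a single static grouping would waste precision on whichever channels happen to be small for that input.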

Instance-Aware Group Quantization (IGQ-ViT) Sets New Benchmarks for ViT PTQ

17 Nov 2025

IGQ-ViT delivers state-of-the-art low-bit quantization for ViTs, achieving strong accuracy on ImageNet and COCO through adaptive group allocation.

Why Uniform Quantizers Break ViTs

17 Nov 2025

Adaptive quantization for ViTs: IGQ-ViT improves low-bit accuracy by grouping channels and tokens that share similar scale variations.

What Makes Vision Transformers Hard to Quantize?

17 Nov 2025

A breakdown of modern neural network and transformer quantization, from quantization-aware training (QAT) and post-training quantization (PTQ) to new dynamic group methods that shrink model size with minimal accuracy loss.
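
As background for the QAT-vs-PTQ distinction these pieces cover, here is the textbook affine uniform quantizer both approaches build on (a generic sketch, not code from any of the articles). PTQ calibrates the scale and zero point after training; QAT simulates this rounding during training so the network learns to tolerate it.

```python
import torch

def uniform_affine_quantize(x: torch.Tensor, num_bits: int = 8):
    """Textbook asymmetric uniform quantizer: floats in [min, max] map
    onto integers in [0, 2^b - 1]; the core of most PTQ pipelines."""
    qmax = 2 ** num_bits - 1
    scale = (x.max() - x.min()) / qmax           # real-valued step size
    zero_point = torch.round(-x.min() / scale)   # integer that represents 0.0
    q = torch.clamp(torch.round(x / scale) + zero_point, 0, qmax)
    return q, scale, zero_point

x = torch.randn(1000)
q, s, z = uniform_affine_quantize(x)
x_hat = (q - z) * s                              # dequantize
max_err = (x - x_hat).abs().max()                # bounded by about s / 2
```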