Image and Video Matting Benchmarks: Performance Analysis of MaGGIe

19 Dec 2025

Abstract and 1. Introduction

  2. Related Works

  3. MaGGIe

    3.1. Efficient Masked Guided Instance Matting

    3.2. Feature-Matte Temporal Consistency

  4. Instance Matting Datasets

    4.1. Image Instance Matting and 4.2. Video Instance Matting

  5. Experiments

    5.1. Pre-training on image data

    5.2. Training on video data

  6. Discussion and References

Supplementary Material

  7. Architecture details

  8. Image matting

    8.1. Dataset generation and preparation

    8.2. Training details

    8.3. Quantitative details

    8.4. More qualitative results on natural images

  9. Video matting

    9.1. Dataset generation

    9.2. Training details

    9.3. Quantitative details

    9.4. More qualitative results

5. Experiments

We developed our model using PyTorch [20] and the sparse convolution library Spconv [10]. Our codebase builds on the publicly available implementations of MGM [56] and OTVM [45]. In Sec. 5.1 we discuss the results of pre-training on the image matting dataset, and Sec. 5.2 presents performance on the video dataset. All training settings are reported in the supplementary material.
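Spconv appears in the stack because the model restricts heavy computation to sparse sets of pixels rather than the full frame. Below is a minimal, hypothetical sketch of that pattern using spconv's 2D submanifold convolution; the uncertainty threshold, shapes, and module names are illustrative assumptions, not the paper's actual code (spconv requires a CUDA device):

```python
import torch
import spconv.pytorch as spconv

# Hypothetical example: run convolutions only at "uncertain" matte pixels.
B, C, H, W = 1, 32, 64, 64
dense_feats = torch.randn(B, C, H, W).cuda()
uncertain = torch.rand(B, H, W).cuda() > 0.9   # assumed uncertainty mask

# Gather active sites into spconv's sparse format:
# indices are (N, 3) int32 rows of (batch, y, x).
coords = uncertain.nonzero().int()
feats = dense_feats.permute(0, 2, 3, 1)[uncertain]
x = spconv.SparseConvTensor(feats, coords, (H, W), B)

# Submanifold convs keep the active-site pattern fixed, so cost scales
# with the number of uncertain pixels rather than with H * W.
net = spconv.SparseSequential(
    spconv.SubMConv2d(C, C, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    spconv.SubMConv2d(C, 1, kernel_size=3, padding=1),
).cuda()

refined = net(x).dense()  # (B, 1, H, W); zeros at pixels never touched
```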

Table 2. Details of Video Instance Matting Training and Testing Sets. V-HIM2K5 is used for training and V-HIM60 for evaluation; each video contains 30 frames.

Table 3. Superiority of Mask Embedding over Stacking on HIM2K+M-HIM2K. Our mask embedding outperforms the traditional mask-stacking approach.
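For intuition on what Table 3 compares: stacking concatenates each binary guidance mask with the input, so the network must run once per instance, while mask embedding projects all masks into the feature space so every instance can be handled in a single pass. A rough sketch under assumed shapes (the conv embedder and fusion by addition are illustrative stand-ins, not MaGGIe's exact design):

```python
import torch
import torch.nn as nn

# Hypothetical shapes: B images, N instances each, feature dim C.
B, N, C, H, W = 1, 3, 64, 32, 32
image_feats = torch.randn(B, C, H, W)            # assumed backbone features
masks = (torch.rand(B, N, H, W) > 0.5).float()   # binary guidance masks

# (a) Stacking baseline: concatenate one mask as an extra channel,
# forcing a separate forward pass per instance.
stacked = torch.cat([image_feats, masks[:, 0:1]], dim=1)   # (B, C+1, H, W)

# (b) Mask embedding (sketch): project every mask into feature space with
# a learned embedder and fuse it with shared image features, so all N
# instances can be processed together.
mask_embed = nn.Conv2d(1, C, kernel_size=3, padding=1)     # hypothetical embedder
embedded = mask_embed(masks.reshape(B * N, 1, H, W)).reshape(B, N, C, H, W)
inst_feats = image_feats.unsqueeze(1) + embedded           # (B, N, C, H, W)
```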


Authors:

(1) Chuong Huynh, University of Maryland, College Park ([email protected]);

(2) Seoung Wug Oh, Adobe Research ([email protected]);

(3) Abhinav Shrivastava, University of Maryland, College Park ([email protected]);

(4) Joon-Young Lee, Adobe Research ([email protected]).


This paper is available on arXiv under the CC BY 4.0 Deed (Attribution 4.0 International) license.