Efficient Instance Matting: Analyzing Speed, Memory, and Accuracy in MaGGIe

19 Dec 2025

Abstract and 1. Introduction

  2. Related Works

  3. MaGGIe

    3.1. Efficient Masked Guided Instance Matting

    3.2. Feature-Matte Temporal Consistency

  4. Instance Matting Datasets

    4.1. Image Instance Matting and 4.2. Video Instance Matting

  5. Experiments

    5.1. Pre-training on image data

    5.2. Training on video data

  6. Discussion and References

Supplementary Material

  7. Architecture details

  8. Image matting

    8.1. Dataset generation and preparation

    8.2. Training details

    8.3. Quantitative details

    8.4. More qualitative results on natural images

  9. Video matting

    9.1. Dataset generation

    9.2. Training details

    9.3. Quantitative details

    9.4. More qualitative results

5.1. Pre-training on image data

Quantitative results. We evaluated our model against previous baselines after retraining them on our I-HIM50K dataset. Besides the original works, we modified SparseMat’s first layer to accept a single mask input. Additionally, we expanded MGM to handle up to 10 instances, denoted as MGM⋆. We also include the public weights of InstMatt [49] and MGM-in-the-wild [39]. Performance with the different mask sets of M-HIM2K is reported in Table 5. The public InstMatt showed the best performance, but this comparison may not be entirely fair since it was trained on private external data. Our model demonstrated comparable results on the composite and natural sets, achieving the lowest error in most metrics. MGM⋆ also performed well, suggesting that processing multiple masks simultaneously can facilitate instance interaction, although this approach slightly hurt the Grad metric, which reflects the output’s detail.

Table 5. Comparative performance on HIM2K+M-HIM2K. Our method outperforms the baselines; average results (large numbers) and standard deviations (small numbers) are reported on the benchmark. The upper group contains methods that predict each instance separately, while the lower group contains models that utilize instance information. Gray rows denote public weights trained on external data, not retrained on I-HIM50K. MGM† denotes MGM-in-the-wild. MGM⋆ refers to MGM with all masks stacked with the input image. Models are tested on images with a short side of 576 px. Bold and underline highlight the best and second-best models per metric, respectively.
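To make the MGM⋆ setup concrete, here is a minimal sketch of how a backbone’s first convolution can be widened to accept an RGB image stacked with up to 10 instance masks. This is our illustration, not the authors’ code: the ResNet-34 backbone, channel layout, and zero-initialization strategy are assumptions. The same widening with a single extra channel mirrors the modified SparseMat input layer.

```python
# Hypothetical sketch (not the authors' code): widen a backbone's first
# convolution so it accepts an RGB image stacked with instance masks.
import torch
import torch.nn as nn
from torchvision.models import resnet34

MAX_INSTANCES = 10  # MGM* is reported to handle up to 10 instances


def widen_first_conv(conv: nn.Conv2d, extra_in: int) -> nn.Conv2d:
    """Copy `conv` with `extra_in` additional input channels. The new kernel
    slices are zero-initialized so pretrained RGB weights are preserved."""
    new_conv = nn.Conv2d(
        conv.in_channels + extra_in,
        conv.out_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        bias=conv.bias is not None,
    )
    with torch.no_grad():
        new_conv.weight.zero_()
        new_conv.weight[:, : conv.in_channels] = conv.weight
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv


backbone = resnet34(weights="IMAGENET1K_V1")
backbone.conv1 = widen_first_conv(backbone.conv1, MAX_INSTANCES)  # 3 -> 13 ch

# Stack the image (B,3,H,W) with padded instance masks (B,10,H,W).
image = torch.randn(1, 3, 576, 576)
masks = torch.zeros(1, MAX_INSTANCES, 576, 576)  # unused instance slots stay zero
out = backbone(torch.cat([image, masks], dim=1))
```

Zero-padding the unused mask slots keeps the input shape fixed regardless of the actual instance count, which is what lets MGM⋆ process all instances in a single forward pass.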

We also measure model memory usage and speed on the M-HIM2K natural set in Fig. 4. While the inference time of InstMatt, MGM, and SparseMat grows linearly with the number of instances, MGM⋆ and our model maintain steady memory consumption and speed.

Figure 4. Our model keeps steady memory and time complexity as the number of instances increases, whereas InstMatt’s complexity grows linearly with the number of instances.
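The sketch below is our illustrative reconstruction of the kind of measurement protocol behind Fig. 4, not the authors’ script: it records wall-clock time and peak GPU memory as the number of guidance masks grows, assuming a hypothetical `model` that takes an image and a stack of instance masks.

```python
# Illustrative benchmark sketch (our reconstruction, not the paper's script):
# time and peak GPU memory vs. number of instance masks in one forward pass.
import time
import torch


@torch.no_grad()
def profile(model, image, instance_masks):
    """image: (1,3,H,W) CUDA tensor; instance_masks: list of (1,1,H,W) masks."""
    results = []
    for n in range(1, len(instance_masks) + 1):
        masks = torch.cat(instance_masks[:n], dim=1)  # guide with n instances
        torch.cuda.reset_peak_memory_stats()
        torch.cuda.synchronize()
        start = time.perf_counter()
        model(image, masks)  # one forward pass covering all n instances
        torch.cuda.synchronize()
        seconds = time.perf_counter() - start
        peak_mb = torch.cuda.max_memory_allocated() / 2**20
        results.append((n, seconds, peak_mb))
    return results
```

For the per-instance baselines (InstMatt, MGM, SparseMat), the forward pass inside the loop would instead run once per mask, which is exactly where the linear growth in time reported in Fig. 4 comes from.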

Qualitative results. MaGGIe’s ability to capture fine details and effectively separate instances is showcased in Fig. 5. At the same resolution, our model not only achieves highly detailed results comparable to running MGM separately for each instance, but also surpasses both the public and retrained versions of InstMatt. A key strength of our approach is its proficiency in distinguishing between instances. This is particularly evident in comparison to MGM, where we observed overlapping instances, and to MGM⋆, which suffers from noise caused by processing multiple masks simultaneously. Our model’s refined instance separation highlights its effectiveness in complex matting scenarios.

Authors:

(1) Chuong Huynh, University of Maryland, College Park ([email protected]);

(2) Seoung Wug Oh, Adobe Research ([email protected]);

(3) Abhinav Shrivastava, University of Maryland, College Park ([email protected]);

(4) Joon-Young Lee, Adobe Research ([email protected]).


This paper is available on arXiv under the CC BY 4.0 Deed (Attribution 4.0 International) license.