- Runzhong Wang, Shanghai Jiao Tong University, China
- Junchi Yan (corresponding author), Shanghai Jiao Tong University, China
- Xiaokang Yang, Shanghai Jiao Tong University, China
This project page includes the dataset of our PAMI paper on unsupervised learning of deep graph matching models, covering two-graph matching, multi-graph matching and matching with a mixture of modes.
Abstract
Graph matching (GM) has been a long-standing combinatorial problem due to its NP-hard nature. Recently, (deep) learning-based approaches have shown superiority over traditional solvers, yet these methods are almost all based on supervised learning, which can be expensive or even impractical. We develop a unified unsupervised framework from matching two graphs to multiple graphs, without correspondence ground truth for training. Specifically, a Siamese-style unsupervised learning framework is devised and trained by minimizing the discrepancy between a second-order classic solver and a first-order (differentiable) Sinkhorn net, which serve as the two branches for matching prediction. The two branches share the same CNN backbone for visual graph matching. Our framework further allows unsupervised learning with graphs from a mixture of modes, which is ubiquitous in reality. Specifically, we develop and unify the graduated assignment (GA) strategy for matching two graphs, multiple graphs, and graphs from a mixture of modes, whereby the two-way constraint and the clustering confidence (for the mixture case) are modulated by two separate annealing parameters, respectively. Moreover, for partial and outlier matching, an adaptive reweighting technique is developed to suppress the over-matching issue. Experimental results on real-world benchmarks, including natural image matching, show that our unsupervised method performs comparably with, and even better than, two-graph based supervised approaches.
- Two Graph Matching (2GM): Matching two graphs from the same category;
- Multi-Graph Matching (MGM): Joint matching of more than two graphs from the same category;
- Multi-Graph Matching with a Mixture of Modes (MGM3): Joint matching of more than two graphs from different categories.
In addition, this paper considers the more challenging settings of partial & outlier matching and matching with a mixture of modes:
Figure 1. Comparison of the four GM settings over multiple graphs (the same color denotes correspondence): (a) the classical full matching where all node pairs have a valid matching, (b) partial matching where some inlier nodes are occluded and all nodes partially correspond to a universe (here the universe has 4 nodes), (c) outlier matching where the noisy outliers do not correspond to any nodes in the universe, (d) the mix of both partial and outlier matching, which is the most challenging yet most realistic setting. Most existing papers only consider (a), some recent efforts also consider (b), and in this paper we propose a robust approach covering (a), (b), and (d). (c) is less common because outliers usually appear together with partial matching.
Figure 2. Example of matching with a mixture of two modes. Within mode 1 or mode 2 (intra-mode) there is full matching, while between mode 1 and mode 2 (inter-mode) there is no valid matching.
Our proposed unsupervised learning framework is a discrepancy-minimization framework: The learning target is to minimize the discrepancy between a Sinkhorn branch and a GM solver branch.
The Sinkhorn branch is differentiable, and the GM solver branch naturally truncates the gradient, prohibiting the Siamese framework from degenerating to trivial solutions.
Here the GM solver (white box) can be any solver for matching two graphs, multiple graphs, or graphs from a mixture of modes. The CNN features are learned within this unsupervised framework.
Figure 3. The red modules support gradient back-propagation, and the white modules do not.
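The loss below is a minimal sketch of this discrepancy-minimization idea, assuming a PyTorch-style pipeline; the helper names (`sinkhorn`, `classic_gm_solver`, `unsupervised_loss`) and the cross-entropy form of the discrepancy are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn.functional as F

def sinkhorn(scores, n_iters=20, tau=0.05):
    """Differentiable Sinkhorn branch: alternate row/column normalization
    (in the log domain for stability) yields a doubly-stochastic soft matching."""
    s = scores / tau
    for _ in range(n_iters):
        s = s - torch.logsumexp(s, dim=-1, keepdim=True)  # normalize rows
        s = s - torch.logsumexp(s, dim=-2, keepdim=True)  # normalize columns
    return s.exp()

def unsupervised_loss(feat1, feat2, classic_gm_solver):
    """Discrepancy between the differentiable Sinkhorn branch and a
    non-differentiable GM solver branch that share the same CNN features."""
    scores = feat1 @ feat2.transpose(-1, -2)        # node-to-node affinities
    soft_match = sinkhorn(scores)                   # branch 1: gradients flow
    with torch.no_grad():                           # branch 2: gradient truncated
        pseudo_match = classic_gm_solver(scores)    # e.g. graduated assignment
    # discrepancy written here as a cross-entropy against the solver's output
    return F.binary_cross_entropy(soft_match.clamp(1e-7, 1 - 1e-7), pseudo_match)
```

Because the solver branch sits inside `torch.no_grad()`, its output acts as a pseudo-label and the gradient only flows through the Sinkhorn branch into the shared CNN backbone, which prevents the Siamese framework from collapsing to a trivial solution.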
To realize the unified framework for all graph matching settings, we resort to the classic graduated assignment (GA) algorithm as the embodiment of the GM solver. Specifically, we combine the classic GA algorithm with modern deep learning models and propose three unsupervised graduated assignment models (a minimal sketch of the GA loop is given after this list):
- GANN-2GM for two graph matching (2GM);
- GANN-MGM for multi-graph matching (MGM);
- GANN-MGM3 for multi-graph matching with a mixture of modes (MGM3).
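For reference, below is a minimal sketch of the classic graduated assignment (softassign) loop for two-graph matching that the GANN models build upon; the Lawler-QAP affinity matrix `K`, the row-major vectorization, and the annealing schedule are assumptions of the sketch, not the exact GANN-2GM code.

```python
import torch

def graduated_assignment(K, n1, n2, tau0=0.5, tau_min=0.05, gamma=0.8, inner_iters=10):
    """Classic graduated assignment for two-graph matching (sketch).

    K: (n1*n2, n1*n2) quadratic affinity matrix (Lawler's QAP form, row-major vec).
    The annealing parameter tau is gradually decreased, sharpening the soft
    assignment; the two-way constraint is enforced by Sinkhorn normalization."""
    X = torch.full((n1, n2), 1.0 / n2)               # uniform initial assignment
    tau = tau0
    while tau >= tau_min:
        for _ in range(inner_iters):
            # first-order update direction of the QAP objective vec(X)^T K vec(X)
            V = (K @ X.reshape(-1, 1)).reshape(n1, n2)
            # softassign: exponentiate with temperature tau (shifted for stability),
            # then project back toward the doubly-stochastic constraint set
            X = torch.exp((V - V.max()) / tau)
            for _ in range(10):
                X = X / X.sum(dim=1, keepdim=True)   # row normalization
                X = X / X.sum(dim=0, keepdim=True)   # column normalization
        tau *= gamma                                  # anneal: sharpen the assignment
    return X
```

GANN-MGM and GANN-MGM3 extend the same loop to multiple graphs and to joint matching-and-clustering, where the clustering confidence is controlled by a second annealing parameter.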
A brief summary of results:
The performance of the unsupervised models is comparable to, or even better than, that of state-of-the-art supervised graph matching models.
Experiment Results on Willow Object Class dataset (2GM and MGM): unsupervised models are as strong as supervised models.
method | learning | car | duck | face | mbike | wbottle | mean |
---|---|---|---|---|---|---|---|
Learning-free Solvers | |||||||
MatchLift (Chen et al. ICML 2014) | free | 0.665 | 0.554 | 0.931 | 0.296 | 0.700 | 0.629 |
MatchALS (Zhou et al. ICCV 2015) | free | 0.629 | 0.525 | 0.934 | 0.310 | 0.669 | 0.613 |
MSIM (Wang et al. CVPR 2018) | free | 0.750 | 0.732 | 0.937 | 0.653 | 0.814 | 0.777 |
MGM-Floyd (Jiang et al. PAMI 2020) | free | 0.850 | 0.793 | 1.000 | 0.843 | 0.931 | 0.883 |
HiPPI (Bernard et al. ICCV 2019) | free | 0.740 | 0.880 | 1.000 | 0.840 | 0.950 | 0.882 |
VGG16+Sinkhorn | free | 0.776±0.238 | 0.732±0.253 | 0.990±0.044 | 0.525±0.205 | 0.824±0.190 | 0.769 |
GA-GM (ours) | free | 0.814±0.300 | 0.885±0.197 | 1.000±0.000 | 0.809±0.243 | 0.954±0.113 | 0.892 |
GA-MGM (ours) | free | 0.746±0.153 | 0.900±0.106 | 0.997±0.021 | 0.892±0.139 | 0.937±0.072 | 0.894 |
Supervised Methods | |||||||
HARG-SSVM (Cho et al. ICCV 2013) | supervised | 0.584 | 0.552 | 0.912 | 0.444 | 0.666 | 0.632 |
GMN (Zanfir et al. CVPR 2018) | supervised | 0.743 | 0.828 | 0.993 | 0.714 | 0.767 | 0.809 |
PCA-GM (Wang et al. ICCV 2019) | supervised | 0.840 | 0.935 | 1.000 | 0.767 | 0.969 | 0.902 |
DGMC (Fey et al. ICLR 2020) | supervised | 0.903 | 0.890 | 1.000 | 0.921 | 0.971 | 0.937 |
CIE-H (Yu et al. ICLR 2020) | supervised | 0.822 | 0.812 | 1.000 | 0.900 | 0.976 | 0.902 |
BBGM (Rolinek et al. ECCV 2020) | supervised | 0.968 | 0.899 | 1.000 | 0.998 | 0.994 | 0.972 |
NMGM (Wang et al. PAMI 2021) | supervised | 0.973 | 0.854 | 1.000 | 0.860 | 0.977 | 0.933 |
Unsupervised Methods | |||||||
GANN (with HiPPI) | unsupervised | 0.845±0.313 | 0.868±0.157 | 0.992±0.010 | 0.911±0.172 | 0.954±0.090 | 0.914 |
GANN-GM (ours) | unsupervised | 0.854±0.261 | 0.898±0.226 | 1.000±0.000 | 0.886±0.186 | 0.964±0.104 | 0.920 |
GANN-MGM (ours) | unsupervised | 0.964±0.058 | 0.949±0.057 | 1.000±0.000 | 1.000±0.000 | 0.978±0.035 | 0.978 |
Experiment Results on Pascal VOC Keypoint dataset (2GM and MGM, unfiltered setting): unsupervised pretraining + supervised finetuning is better than training from scratch (a weight-loading sketch follows the table below).
model | learning | backbone | #params | aero | bike | bird | boat | bottle | bus | car | cat | chair | cow | table | dog | horse | mbike | person | plant | sheep | sofa | train | tv | mean |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
GMN (Zanfir et al. CVPR 2018) | sup. | VGG16 | 12.9M | 28.0 | 55.0 | 33.1 | 27.0 | 79.1 | 52.2 | 26.0 | 40.2 | 28.4 | 36.0 | 29.8 | 33.7 | 39.4 | 43.0 | 22.1 | 71.8 | 30.8 | 25.9 | 58.8 | 78.0 | 41.9 |
PCA-GM (Wang et al. ICCV 2019) | sup. | VGG16 | 41.7M | 27.5 | 56.5 | 36.6 | 27.7 | 77.8 | 49.2 | 23.9 | 42.3 | 27.4 | 38.2 | 38.7 | 36.5 | 39.3 | 42.8 | 25.6 | 74.3 | 32.6 | 24.7 | 51.5 | 74.3 | 42.4 |
GANN-GM (ours) | unsup. | VGG16 | 12.4M | 12.6 | 19.5 | 16.6 | 18.5 | 41.1 | 32.4 | 19.3 | 12.3 | 24.3 | 17.2 | 38.0 | 12.2 | 15.9 | 18.2 | 19.4 | 35.5 | 14.8 | 15.4 | 41.5 | 60.8 | 24.3 |
GANN-MGM (ours) | unsup. | VGG16 | 12.4M | 18.1 | 33.4 | 20.2 | 28.2 | 71.7 | 33.9 | 22.0 | 24.7 | 23.5 | 22.4 | 50.2 | 19.6 | 20.9 | 27.7 | 19.5 | 74.5 | 19.3 | 26.5 | 39.8 | 72.7 | 33.4 |
GANN-MGM (ours) | unsup. | ResNet34 | 21.3M | 16.4 | 37.4 | 23.2 | 29.5 | 80.8 | 33.6 | 24.1 | 17.8 | 26.9 | 22.6 | 53.4 | 16.9 | 21.8 | 30.1 | 23.3 | 86.0 | 19.7 | 31.7 | 39.0 | 71.5 | 35.3 |
NGMv2 (Wang et al. PAMI 2021) | sup. | VGG16 | 71.4M | 45.9 | 66.6 | 57.2 | 47.3 | 87.4 | 64.8 | 50.5 | 59.9 | 39.7 | 60.9 | 42.1 | 58.3 | 58.5 | 61.9 | 44.6 | 94.5 | 50.1 | 35.2 | 73.1 | 82.1 | 59.0 |
NGMv2 (Wang et al. PAMI 2021) | sup. | ResNet34 | 77.9M | 45.1 | 65.5 | 52.7 | 44.0 | 87.3 | 69.4 | 56.1 | 62.2 | 45.7 | 63.6 | 61.9 | 59.6 | 59.2 | 67.8 | 54.4 | 96.9 | 57.0 | 45.9 | 74.3 | 83.6 | 62.6 |
BBGM (Rolinek et al. ECCV 2020) | sup. | VGG16 | 71.4M | 41.3 | 65.7 | 55.0 | 43.4 | 86.8 | 61.1 | 35.5 | 59.0 | 40.2 | 60.0 | 29.7 | 57.1 | 57.5 | 65.9 | 37.7 | 95.8 | 52.6 | 30.3 | 74.4 | 84.1 | 56.7 |
BBGM (Rolinek et al. ECCV 2020) | sup. | ResNet34 | 77.9M | 38.1 | 69.1 | 54.2 | 45.0 | 87.0 | 74.7 | 43.3 | 62.3 | 48.3 | 63.7 | 63.8 | 60.9 | 60.4 | 65.4 | 50.2 | 97.1 | 56.2 | 45.9 | 78.4 | 82.2 | 62.3 |
GANN-MGM (ours) + NGMv2 (Wang et al. PAMI 2021) | unsup. + sup. | VGG16 | 71.4M | 47.0 | 69.4 | 53.7 | 46.3 | 85.7 | 67.6 | 59.0 | 60.2 | 45.9 | 61.0 | 29.9 | 57.9 | 59.5 | 63.2 | 47.4 | 92.2 | 51.5 | 39.9 | 71.6 | 78.3 | 59.4 (+0.4) |
GANN-MGM (ours) + NGMv2 (Wang et al. PAMI 2021) | unsup. + sup. | ResNet34 | 77.9M | 46.5 | 66.2 | 56.5 | 46.5 | 85.9 | 73.8 | 57.4 | 61.4 | 47.3 | 65.7 | 63.9 | 59.4 | 60.1 | 70.6 | 54.7 | 94.3 | 57.0 | 51.8 | 74.9 | 82.4 | 63.8 (+1.2) |
GANN-MGM (ours) + BBGM (Rolinek et al. ECCV 2020) | unsup. + sup. | VGG16 | 71.4M | 43.0 | 69.2 | 55.3 | 46.3 | 85.4 | 66.7 | 53.3 | 61.4 | 46.7 | 64.0 | 33.7 | 61.7 | 60.7 | 64.0 | 43.9 | 94.0 | 54.9 | 52.3 | 78.0 | 80.0 | 60.7 (+4.0) |
GANN-MGM (ours) + BBGM (Rolinek et al. ECCV 2020) | unsup. + sup. | ResNet34 | 77.9M | 39.6 | 68.9 | 56.2 | 46.7 | 87.4 | 71.9 | 44.5 | 62.0 | 47.9 | 65.8 | 50.8 | 62.6 | 61.6 | 62.9 | 49.4 | 96.9 | 59.6 | 49.7 | 80.6 | 85.3 | 62.5 (+0.2) |
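A minimal sketch of how the unsupervised-pretrained backbone can be handed over to a supervised model, assuming PyTorch; the checkpoint path and the copy-only-matching-parameters strategy are illustrative assumptions rather than the released training scripts.

```python
import torch

def load_pretrained_backbone(supervised_model, ckpt_path="gann_mgm_vgg16.pt"):
    """Initialize the CNN backbone of a supervised matcher (e.g. NGMv2 or BBGM)
    with weights pretrained by the unsupervised GANN-MGM model.
    Only parameters whose names and shapes match are copied; everything else
    keeps its default (e.g. ImageNet) initialization, so no architecture changes."""
    pretrained = torch.load(ckpt_path, map_location="cpu")
    model_state = supervised_model.state_dict()
    matched = {k: v for k, v in pretrained.items()
               if k in model_state and v.shape == model_state[k].shape}
    model_state.update(matched)
    supervised_model.load_state_dict(model_state)
    return supervised_model
```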
Experiment Results on CUB2011 dataset (2GM and MGM): unsupervised models are comparable to supervised models.
Figure 4. Mean and STD of precision, recall, and F1-score of graph matching with a single mode on the CUB2011 dataset. MGM statistics are computed from all graph pairs in each category, and two-graph matching statistics are computed from 1000 random graph pairs. An improvement can be seen from our learning-free methods to our unsupervised methods, which are comparable to recent supervised learning methods on the test set.
Experiment Results on Willow Object Class dataset (MGM3): the unsupervised model outperforms learning-free algorithms.
In the table below, the first block of metrics (CP, RI, CA, MA, time) is measured on 8 Cars, 8 Ducks, 8 Motorbikes, and the second block on 40 Cars, 50 Ducks, 40 Motorbikes.

method | learning | CP | RI | CA | MA | time (s) | CP | RI | CA | MA | time (s) |
---|---|---|---|---|---|---|---|---|---|---|---|
RRWM (Cho et al. ECCV 2010) | free | 0.879 | 0.871 | 0.815 | 0.748 | 0.4 | 0.962 | 0.949 | 0.926 | 0.751 | 8.8 |
CAO-C (Yan et al. PAMI 2016) | free | 0.908 | 0.903 | 0.860 | 0.878 | 3.3 | 0.971 | 0.960 | 0.956 | 0.906 | 1051.5 |
CAO-PC (Yan et al. PAMI 2016) | free | 0.887 | 0.883 | 0.831 | 0.870 | 1.8 | 0.971 | 0.960 | 0.956 | 0.886 | 184.0 |
DPMC (Wang et al. AAAI 2020) | free | 0.931 | 0.923 | 0.890 | 0.872 | 1.2 | 0.969 | 0.959 | 0.948 | 0.941 | 97.5 |
GA-MGM3 (ours) | free | 0.921 | 0.905 | 0.893 | 0.653 | 10.6 | 0.890 | 0.871 | 0.850 | 0.669 | 107.8 |
GANN-MGM3 (ours) | unsupervised | 0.976 | 0.970 | 0.963 | 0.896 | 5.2 | 0.974 | 0.968 | 0.956 | 0.906 | 80.7 |
Experiment Results on CUB2011 and Pascal VOC Keypoint dataset (MGM3)
Figure 5 & 6. Experiment results on the Pascal VOC Keypoint dataset when initializing the VGG16 weights of NGMv2 and BBGM with the proposed unsupervised learning model GANN-MGM3. Without modifying any model architecture, unsupervised pretraining by GANN-MGM3 leads to faster convergence and better performance compared to the versions initialized with ImageNet classification weights.