Interactive Steering of Hierarchical Clustering

Weikai Yang1   Xiting Wang2   Jie Lu1   Wenwen Dou3   Shixia Liu1

1Tsinghua University       2Microsoft Research Asia       3University of North Carolina at Charlotte

Teaser Image

Reweighter: (a) The reweighting relationships between 3 (out of 14) validation sample clusters and 6 (out of 35) training sample clusters. V1 and V2 contain low-quality validation samples, resulting in many inconsistent training samples in S1 and S2. (b) After correcting the noisy labels of low-quality validation samples, increasing the weights of high-quality validation samples, and verifying inconsistent training samples, the reweighting results are improved (S1′′ and S2′).


Label quality issues, such as noisy labels and imbalanced class distributions, have negative effects on model performance. Automatic reweighting methods identify problematic samples with label quality issues by recognizing their negative effects on validation samples and assigning lower weights to them. However, these methods fail to achieve satisfactory performance when the validation samples are of low quality. To tackle this, we develop Reweighter, a visual analysis tool for sample reweighting. The reweighting relationships between validation samples and training samples are modeled as a bipartite graph. Based on this graph, a validation sample improvement method is developed to improve the quality of validation samples. Since the automatic improvement may not always be perfect, a co-cluster-based bipartite graph visualization is developed to illustrate the reweighting relationships and support interactive adjustments to validation samples and reweighting results. These adjustments are converted into constraints for the validation sample improvement method to further improve the validation samples. We demonstrate the effectiveness of Reweighter in improving reweighting results through a quantitative evaluation and two case studies.
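To make the bipartite-graph view of reweighting concrete, the following is a minimal sketch and not the method used in Reweighter. It assumes each edge carries a signed influence score between one validation sample and one training sample, and that a training sample's weight is the clipped, normalized sum of its incoming influences scaled by the validation sample weights. The function name `reweight`, the edge format, and the aggregation rule are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical edge format: (validation_id, training_id, influence).
# Assumption: a positive influence means the training sample helps that
# validation sample, a negative influence means it hurts.
Edge = Tuple[str, str, float]


def reweight(edges: List[Edge], val_weights: Dict[str, float]) -> Dict[str, float]:
    """Aggregate per-edge influence into one weight per training sample."""
    raw: Dict[str, float] = {}
    for val_id, train_id, influence in edges:
        # Validation samples missing from val_weights default to weight 1.0.
        scaled = val_weights.get(val_id, 1.0) * influence
        raw[train_id] = raw.get(train_id, 0.0) + scaled
    # Clip negative totals to zero, then normalize so the weights sum to 1.
    clipped = {t: max(0.0, w) for t, w in raw.items()}
    total = sum(clipped.values())
    if total == 0.0:
        return clipped
    return {t: w / total for t, w in clipped.items()}


# Toy graph with two validation samples and three training samples.
edges = [
    ("v1", "s1", 1.0),
    ("v1", "s2", -0.5),
    ("v2", "s2", 1.0),
    ("v2", "s3", 0.5),
]

# Equal trust in both validation samples.
weights = reweight(edges, {"v1": 1.0, "v2": 1.0})

# Treat v1 as low quality by setting its weight to zero.
weights_without_v1 = reweight(edges, {"v1": 0.0, "v2": 1.0})
```

In this toy graph, setting the weight of `v1` to zero removes its influence entirely, so `s1` (supported only by `v1`) drops to zero weight. This mirrors, in a simplified form, why the quality of validation samples matters for the reweighting results.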
