Speeding Up Graph Similarity Matching with Efficient Tensor Ops
Issue
The graph similarity algorithm for matching image-text graph pairs was too slow, particularly in the pairwise comparison step.
Each image-text pair requires computing a similarity matrix between the merged nodes of the image graph and the text graph. When extended to batch inference over many candidate graphs, the computational cost scales with the product of the number of graph pairs and the node counts of both graphs: for every image graph, we must compute the similarity between every one of its nodes and every node in the corresponding text graphs.
The initial implementation handled this with nested Python for-loops and inefficient per-pair matrix operations, creating a bottleneck that made the system unusable for large-scale retrieval.
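For context, here is a minimal sketch of the kind of per-pair loop the original code performed (the function name and the dot-product similarity are illustrative assumptions, not the exact original code):

import torch

def naive_similarity(G_img, G_txt):
    # G_img: [B, N, D] image node embeddings; G_txt: [B, M, D] text node embeddings
    B, N, D = G_img.shape
    M = G_txt.shape[1]
    sims = torch.zeros(B, N, M)
    for b in range(B):          # loop over graph pairs
        for n in range(N):      # loop over image nodes
            for m in range(M):  # loop over text nodes
                sims[b, n, m] = torch.dot(G_img[b, n], G_txt[b, m])
    return sims

Every similarity here is produced by a separate Python-level call, so interpreter overhead alone dominates the runtime long before the arithmetic does.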
Solution
Replaced the nested similarity computation with a batched and vectorized tensor operation using torch.einsum.
We represent the node embeddings of the image graphs as a 3D tensor G_img with shape [B, N, D] and the text graph nodes as G_txt with shape [B, M, D], where:

- B = batch size
- N, M = number of nodes in the image and text graphs
- D = embedding dimension
To compute pairwise similarities efficiently, we use:
# Compute pairwise dot-product similarity between all nodes in image and text graphs
similarity_matrix = torch.einsum('bnd,bmd->bnm', G_img, G_txt)
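A quick self-contained check (the tensor sizes here are illustrative assumptions) shows the call in context and confirms it matches an equivalent batched matrix multiply:

import torch

B, N, M, D = 32, 10, 12, 256  # assumed sizes for illustration
G_img = torch.randn(B, N, D)  # image graph node embeddings
G_txt = torch.randn(B, M, D)  # text graph node embeddings

# 'bnd,bmd->bnm': for each pair b, dot every image node n with every text node m
similarity_matrix = torch.einsum('bnd,bmd->bnm', G_img, G_txt)
print(similarity_matrix.shape)  # torch.Size([32, 10, 12])

# The same contraction expressed as a batched matrix multiply
assert torch.allclose(similarity_matrix, torch.bmm(G_img, G_txt.transpose(1, 2)), atol=1e-5)

torch.bmm(G_img, G_txt.transpose(1, 2)) produces the identical result; one reason to prefer einsum here is that the subscript string documents the contraction explicitly, and for this pattern PyTorch executes it with the same optimized batched-matmul kernel, so there is no Python-level loop either way.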