Speeding Up Graph Similarity Matching with Efficient Tensor Ops
Issue
The graph similarity algorithm for matching image-text graph pairs was too slow, particularly in the pairwise comparison step.
Each image-text pair requires computing a similarity matrix between the merged nodes of the image graph and the text graph. When extended to batch inference over many candidate graphs, the computational cost scales poorly: for every image graph, we must compute the similarity between all node pairs in the corresponding text graphs.
The initial implementation used nested for-loops or inefficient matrix operations, creating a bottleneck that made the system unusable for large-scale retrieval.
Solution
Replaced the nested similarity computation with a batched and vectorized tensor operation using torch.einsum.
We represent node embeddings of image graphs as a 3D tensor G_img with shape [B, N, D] and text graph nodes as G_txt with shape [B, M, D], where:
B= batch sizeN,M= number of nodes in image and text graphsD= embedding dimension
To compute pairwise similarities efficiently, we use:
# Compute pairwise dot-product similarity between all nodes in image and text graphs
similarity_matrix = torch.einsum('bnd,bmd->bnm', G_img, G_txt)
