Orthogonal Procrustes

Tamanna Hossain-Kay / 2021-06-28


I came across the Orthogonal Procrustes technique when reading the paper Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change. It was used to align word embeddings from different runs of SVD (Singular Value Decomposition) and SGNS (Skip-gram with Negative Sampling), since both methods can produce word vectors that differ by an arbitrary orthogonal transformation from run to run.

So what is Orthogonal Procrustes?

Problem
Solution
Proof of Solution
Example

Problem

Given two matrices $A$ and $B$, find a matrix $R$ that maps $A$ to $B$. Specifically,

$$R = \underset{\Omega}{\operatorname{arg\,min}} \; \|\Omega A - B\|_F \quad \text{subject to} \quad \Omega^T \Omega = I$$

where $\|\cdot\|_F$ denotes the Frobenius norm.

Solution

Let $M = BA^T$ and let $M = U\Sigma V^T$ be its singular value decomposition (SVD). Then the matrix mapping $A$ to $B$ is $R = UV^T$.
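
In code the solution is just a few lines. Here is a minimal NumPy sketch (the helper name procrustes_map is mine, not from any library); SciPy also ships a ready-made scipy.linalg.orthogonal_procrustes, though note that it uses the opposite convention, mapping A to B by right-multiplication, AR ≈ B.

import numpy as np

def procrustes_map(A, B):
    # Orthogonal R minimizing ||R A - B||_F, assuming the columns of A, B are points
    U, _, Vt = np.linalg.svd(np.dot(B, A.transpose()))  # SVD of M = B A^T
    return np.dot(U, Vt)                                # R = U V^T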

Proof of Solution

This proof is based on the one found on the Wikipedia page Orthogonal Procrustes problem, with additional exposition added for my own understanding:

$$
\begin{aligned}
R &= \underset{\Omega}{\operatorname{arg\,min}} \; \|\Omega A - B\|_F^2 \\
  &= \underset{\Omega}{\operatorname{arg\,min}} \; \langle \Omega A - B,\, \Omega A - B \rangle_F && \text{Frobenius inner product induces the norm} \\
  &= \underset{\Omega}{\operatorname{arg\,min}} \; \|\Omega A\|_F^2 + \|B\|_F^2 - 2\langle \Omega A, B \rangle_F \\
  &= \underset{\Omega}{\operatorname{arg\,min}} \; \|A\|_F^2 + \|B\|_F^2 - 2\langle \Omega A, B \rangle_F && \Omega \text{ drops out of the first norm since } \Omega^T \Omega = I \\
  &= \underset{\Omega}{\operatorname{arg\,min}} \; -2\langle \Omega A, B \rangle_F && \text{first two terms don't contain } \Omega \\
  &= \underset{\Omega}{\operatorname{arg\,max}} \; \langle \Omega A, B \rangle_F && \text{sign change; the constant is not relevant to the optimization}
\end{aligned}
$$

But note that,

$$
\begin{aligned}
\langle \Omega A, B \rangle_F &= \operatorname{tr}\!\left((\Omega A)^T B\right) && \text{definition of the Frobenius inner product} \\
&= \operatorname{tr}\!\left(A^T \Omega^T B\right) \\
&= \operatorname{tr}\!\left(\Omega^T B A^T\right) && \text{invariance of the trace under cyclic permutations} \\
&= \operatorname{tr}\!\left(\Omega^T U \Sigma V^T\right) && \text{SVD of } M = BA^T \\
&= \operatorname{tr}\!\left(V^T \Omega^T U \Sigma\right) \\
&= \langle U^T \Omega V, \Sigma \rangle_F && \text{definition of the Frobenius inner product}
\end{aligned}
$$

Thus,

$$R = \underset{\Omega}{\operatorname{arg\,max}} \; \langle U^T \Omega V, \Sigma \rangle_F = \underset{\Omega}{\operatorname{arg\,max}} \; \langle S, \Sigma \rangle_F \quad \text{where } S = U^T \Omega V$$

But note that $S$ is an orthogonal matrix, since it is a product of orthogonal matrices: $\Omega$ by the constraint, and $U$, $V$ from the SVD of $M$. Since $\Sigma$ is diagonal with non-negative entries $\sigma_i$, we have $\langle S, \Sigma \rangle_F = \sum_i s_{ii}\sigma_i$, and the entries of an orthogonal matrix are at most $1$ in absolute value, so the expression is maximized when every $s_{ii} = 1$, that is, when $S$ equals the identity matrix $I$.

Thus $I = U^T R V$, which gives $R = UV^T$ (multiply by $U$ on the left and $V^T$ on the right), where $R$ is the optimal value of $\Omega$ that minimizes $\|\Omega A - B\|_F^2$.
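
As a concrete illustration of the last maximization step (a small worked example of my own, not from the Wikipedia proof), take the $2 \times 2$ case with singular values $\sigma_1, \sigma_2 \ge 0$:

$$\langle S, \Sigma \rangle_F = s_{11}\sigma_1 + s_{22}\sigma_2 \le \sigma_1 + \sigma_2,$$

with equality exactly when $s_{11} = s_{22} = 1$, which for an orthogonal $S$ forces $S = I$.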

Example

Consider a $2 \times N$ matrix $A$ whose columns are 2D points tracing an eighth note. The matrix (eighth_note.csv), along with all the code used here to implement Orthogonal Procrustes, can be found on GitHub. The repo also contains the code used to generate the matrix, plus a matrix representing a smiley face (smiley.csv) as an alternative :)

A is plotted in pink below.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Read data: each row of the CSV is a 2D point; transpose so columns are points
A = pd.read_csv('eighth_note.csv')
A = A.to_numpy().transpose()

#Plot
plt.scatter(A.transpose()[:,0], A.transpose()[:,1], c='#ea99b2')
plt.xlim([-650,580])
plt.ylim([0,550])
plt.show()

Create a matrix B, which is a rotation of A with some noise. B is plotted in lavender below.

# Create B: rotate A by theta, then add Gaussian noise

theta = np.pi * 0.6
R_known = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])

epsilon = 0.01  # noise scale
B = np.dot(R_known, A) + epsilon * np.random.normal(0, 1, A.shape)

#Plot
plt.scatter(A.transpose()[:,0], A.transpose()[:,1], c='#ea99b2')
plt.scatter(B.transpose()[:,0], B.transpose()[:,1], c='#8b7fa4')

plt.xlim([-650,580])
plt.ylim([0,550])

plt.show()

Now Orthogonal Procrustes is used to find the matrix $R$ that maps $A$ to $B$.

# Find the orthogonal matrix R = U V^T from the SVD of M = B A^T

from numpy.linalg import svd

M = np.dot(B, A.transpose())
u, s, vh = svd(M)   # vh is V^T
R = np.dot(u, vh)
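
As a quick sanity check (my own addition, not part of the original walkthrough), R should be orthogonal, and since the noise is small it should closely recover the rotation used to build B; both checks should print True here.

# Check R^T R = I and R ≈ R_known
print(np.allclose(np.dot(R.transpose(), R), np.eye(2)))
print(np.allclose(R, R_known, atol=1e-3))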

Transform $A$ using $R$ to map it to $B$: the transformed matrix is $A_{new} = RA$.

$A_{new}$ (new_A in the code) is plotted in cyan below: it overlaps $B$ so closely that the lavender plot of $B$ underneath is completely hidden.

# Mapped matrix
new_A=np.dot(R,A)

plt.scatter(B.transpose()[:,0], B.transpose()[:,1], c='#8b7fa4')
plt.scatter(A.transpose()[:,0], A.transpose()[:,1], c='#ea99b2')
plt.scatter(new_A.transpose()[:,0], new_A.transpose()[:,1], c='#55c3b8')

plt.xlim([-650,580])
plt.ylim([0,550])

plt.show()
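
Finally, to quantify the fit rather than eyeballing the overlap (another small addition of mine), compare the Frobenius-norm residual before and after alignment; after applying R it should drop to roughly the scale of the added noise.

# Alignment error before and after applying R
print(np.linalg.norm(A - B))             # large: A and B differ by a rotation
print(np.linalg.norm(np.dot(R, A) - B))  # small: roughly the noise level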