Geodesic interpolation of frame-wise speaker embeddings for the...
Speaker embedding regularizer for overlapped speech
Through this loss, the frame-wise speaker embeddings are encouraged to lie on the geodesic, i.e., the shortest path along the hypersphere, between the single-speaker embeddings of the two active speakers.
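To make "the path along the hypersphere" concrete, the sketch below traces the geodesic (great-circle arc) between two embeddings with spherical linear interpolation (slerp). This is only an illustration of the geometry under the assumption that embeddings are compared by direction; slerp is a standard formula and is not claimed to be how the paper parameterizes the path. The names `normalize` and `slerp` are hypothetical helpers.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def slerp(d1, d2, t):
    """Point at fraction t (0..1) along the geodesic from d1 to d2 on the unit hypersphere."""
    u1, u2 = normalize(d1), normalize(d2)
    # Angle between the two directions (clamped to guard against rounding).
    cos_omega = max(-1.0, min(1.0, sum(a * b for a, b in zip(u1, u2))))
    omega = math.acos(cos_omega)
    s = math.sin(omega)
    if s < 1e-8:
        if cos_omega > 0:
            return u1  # identical directions: the arc collapses to a point
        raise ValueError("antipodal embeddings: the geodesic is not unique")
    # Spherical weights keep every intermediate point on the unit sphere.
    w1 = math.sin((1.0 - t) * omega) / s
    w2 = math.sin(t * omega) / s
    return [w1 * a + w2 * b for a, b in zip(u1, u2)]
```

For example, `slerp([1.0, 0.0], [0.0, 1.0], 0.5)` returns the unit vector halfway along the quarter circle, approximately `[0.7071, 0.7071]`. Unlike a plain linear interpolation, every intermediate point keeps unit norm.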
Mixture model-based clustering to handle overlapping speakers during clustering
2023, Frame-wise and overlap-robust speaker embeddings [ICASSP]
Despite being trained only on single-speaker data, the frame-wise embeddings were shown to transition smoothly from one speaker to the other in overlapping regions.
Problem
However, there was no control over how exactly this transition occurred.
Objective
Explicitly encourage the frame-wise embeddings to reflect both competing speakers in overlap regions.
Proposal
Replace the MSE loss used in the previous paper with a geodesic loss.
<aside> 💡 More precisely, it adds a condition that applies to overlap regions.
</aside>
In overlap regions, the condition is that $\mathbf{d}(\tilde\alpha_t)$ is the linear interpolation of $\mathbf{d}_1$ and $\mathbf{d}_2$ with the minimum Euclidean distance to $\hat{\mathbf{d}}_t$.
The optimal $\tilde{\alpha}_t$ can be obtained by solving a constrained least-squares problem.
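For two embeddings the constrained least-squares problem has a closed form. The derivation below assumes the parameterization $\mathbf{d}(\alpha) = \alpha\,\mathbf{d}_1 + (1-\alpha)\,\mathbf{d}_2$ with $0 \le \alpha \le 1$ and $\mathbf{d}_1 \neq \mathbf{d}_2$; the note does not state which speaker $\alpha$ weights, so this convention is an assumption.

$$
\tilde{\alpha}_t = \arg\min_{0 \le \alpha \le 1} \left\| \alpha\,\mathbf{d}_1 + (1-\alpha)\,\mathbf{d}_2 - \hat{\mathbf{d}}_t \right\|_2^2
$$

Setting the derivative of the unconstrained objective to zero gives

$$
2\,(\mathbf{d}_1 - \mathbf{d}_2)^\top \left( \mathbf{d}_2 - \hat{\mathbf{d}}_t + \alpha\,(\mathbf{d}_1 - \mathbf{d}_2) \right) = 0
\;\Rightarrow\;
\alpha^\ast = \frac{(\hat{\mathbf{d}}_t - \mathbf{d}_2)^\top (\mathbf{d}_1 - \mathbf{d}_2)}{\left\| \mathbf{d}_1 - \mathbf{d}_2 \right\|_2^2}
$$

Because the objective is a convex quadratic in the single variable $\alpha$, the constrained minimizer is the unconstrained one clipped to the feasible interval:

$$
\tilde{\alpha}_t = \min\!\left(1,\, \max\!\left(0,\, \alpha^\ast\right)\right)
$$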
The interpolated point $\mathbf{d}(\tilde\alpha_t)$ is then rescaled to the scale of the teacher embeddings using $r$.
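The steps above can be sketched for a single frame as follows. This is a minimal, hypothetical illustration, not the paper's implementation. Assumptions: $\alpha$ weights $\mathbf{d}_1$; the interpolated point is projected onto the sphere of radius $r$ by normalizing it and multiplying by $r$ (a positive combination of $\mathbf{d}_1$ and $\mathbf{d}_2$, once normalized, lies on the arc between their directions); and the per-frame loss is the squared distance to this target in overlap frames and to the teacher embedding $\mathbf{d}_1$ in single-speaker frames. The functions `optimal_alpha`, `geodesic_target`, and `frame_loss` are hypothetical names.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def optimal_alpha(d1, d2, d_hat):
    """Closed-form solution of min_{0<=a<=1} ||a*d1 + (1-a)*d2 - d_hat||^2."""
    diff = [a - b for a, b in zip(d1, d2)]
    denom = _dot(diff, diff)
    if denom == 0.0:
        return 0.5  # d1 == d2: every alpha yields the same point
    alpha = _dot([h - b for h, b in zip(d_hat, d2)], diff) / denom
    # Convex 1-D quadratic: clipping the unconstrained optimum is exact.
    return min(1.0, max(0.0, alpha))


def geodesic_target(d1, d2, d_hat, r):
    """Hypothetical overlap-frame target: closest interpolation point, rescaled to norm r."""
    alpha = optimal_alpha(d1, d2, d_hat)
    p = [alpha * a + (1.0 - alpha) * b for a, b in zip(d1, d2)]
    norm = math.sqrt(_dot(p, p))
    if norm == 0.0:
        raise ValueError("interpolated point is zero (d1 and d2 are opposite)")
    return alpha, [r * x / norm for x in p]


def frame_loss(d_hat, d1, d2=None, r=1.0):
    """Squared distance to the target; d2=None marks a single-speaker frame (assumed)."""
    if d2 is None:
        target = d1  # assumed: plain distance to the teacher embedding
    else:
        _, target = geodesic_target(d1, d2, d_hat, r)
    return sum((h - t) ** 2 for h, t in zip(d_hat, target))
```

For a student frame $\hat{\mathbf{d}}_t = [0.9, 0.3]$ between $\mathbf{d}_1 = [1, 0]$ and $\mathbf{d}_2 = [0, 1]$, `optimal_alpha` returns 0.8, and `geodesic_target(..., r=1.0)` returns a unit-norm target that leans toward $\mathbf{d}_1$. Clipping $\alpha$ keeps the target between the two speakers even when the student estimate lies outside that segment.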