AA+DR+NS: "Adapting Speaker Embeddings for Speaker Diarisation", in Proc. Interspeech, 2021. (Naver) [Paper]
Introduction
Background
- The quality of the speaker embeddings largely determines speaker diarization performance
Problem
Goal (Objective)
- Adapt speaker verification (SV) embeddings to obtain speaker embeddings better suited to speaker diarization
Method
- Three techniques are proposed to adapt speaker embeddings for diarization
- Dimensionality Reduction (DR)
- Attention-based embedding Aggregation (AA)
- Non-speech Clustering (NS)
2. Baseline System

2.2 Speaker embedding extraction
- Uses a pre-trained ResNet34-based model trained on VoxCeleb1 and VoxCeleb2
- Extraction for diarization
- Segment window size: 1.5s
- Segment hop size: 0.5s
- Embedding dim: 256-D
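The window and hop settings above imply overlapping segments, each of which yields one 256-D embedding. A minimal sketch of the segmentation step follows; the function name `segment_bounds` is hypothetical, and dropping a trailing partial window is an assumption, since the notes do not say how the end of a recording is handled.

```python
def segment_bounds(duration_s, window_s=1.5, hop_s=0.5):
    """Return (start, end) times in seconds for each full sliding window.

    Assumption: a trailing window shorter than window_s is dropped.
    """
    bounds = []
    i = 0
    # Small tolerance so a window ending exactly at duration_s is kept.
    while i * hop_s + window_s <= duration_s + 1e-9:
        start = i * hop_s
        bounds.append((start, start + window_s))
        i += 1
    return bounds


if __name__ == "__main__":
    # Each printed pair is one segment fed to the embedding extractor.
    for start, end in segment_bounds(3.0):
        print(f"{start:.1f}s - {end:.1f}s")
```

With a 0.5s hop and a 1.5s window, consecutive segments share 1.0s of audio, so neighbouring embeddings are strongly correlated.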
2.3 Clustering
1. AHC (agglomerative hierarchical clustering)
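
To make the AHC step concrete, here is a small pure-Python sketch that clusters segment embeddings bottom-up. The cosine distance, average linkage, and the stopping threshold of 0.5 are all assumptions chosen for illustration; the notes do not state which distance, linkage, or threshold the paper uses, and the names `cosine_distance` and `ahc` are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two vectors (assumed distance metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def ahc(embeddings, threshold=0.5):
    """Average-linkage AHC; returns one cluster label per embedding.

    Assumption: merging stops once the closest pair of clusters is
    farther apart than `threshold`.
    """
    # Start with every embedding in its own cluster.
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        dists = [cosine_distance(embeddings[i], embeddings[j])
                 for i in c1 for j in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        if best[0] > threshold:
            break
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]

    # Convert the cluster membership lists into per-embedding labels.
    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


if __name__ == "__main__":
    toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(ahc(toy))
```

This brute-force version recomputes every linkage on each pass and is only meant to show the merge loop; a practical system would typically rely on an optimized library implementation instead.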
