
Doc2Vec & Others

frances._.sb 2022. 3. 2. 13:26

This is a short summary of Professor Pilsung Kang's lecture at Korea University.


This post looks at embeddings at the sentence, paragraph, and document level.


[2015] Document Embedding

  • Paragraph Vector model: Distributed Memory (PV-DM) model

- Paragraph vectors are shared for all windows generated from the same paragraph, but not across paragraphs

  The paragraph ID always keeps the same vector while the words of that paragraph are being modeled.

- Word vectors are shared across all paragraphs
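To make the PV-DM bullets concrete, here is a minimal NumPy sketch. It is not the paper's or any library's implementation. Assumptions, all specific to this sketch: the paragraph vector is averaged with the vectors of the `window` preceding words (the paper also allows concatenation), the output layer is a plain full softmax rather than the hierarchical softmax or negative sampling used in practice, and `train_pv_dm`, `softmax`, and the toy corpus are hypothetical. The thing to notice is that row `D[p]` is reused for every window drawn from paragraph `p`, while the word matrix `W` is shared by all paragraphs.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train_pv_dm(docs, dim=16, window=2, lr=0.05, epochs=200, seed=0):
    """Toy PV-DM (hypothetical helper): average the paragraph vector with the
    vectors of the `window` preceding words, then predict the next word.
    Assumption: averaging instead of concatenation, full softmax output."""
    rng = np.random.default_rng(seed)
    vocab = sorted({w for doc in docs for w in doc})
    idx = {w: i for i, w in enumerate(vocab)}
    D = rng.normal(0, 0.1, (len(docs), dim))   # one row per paragraph ID
    W = rng.normal(0, 0.1, (len(vocab), dim))  # word vectors, shared by all paragraphs
    U = rng.normal(0, 0.1, (dim, len(vocab)))  # softmax output weights
    b = np.zeros(len(vocab))
    losses = []
    for _ in range(epochs):
        total, n = 0.0, 0
        for p, doc in enumerate(docs):
            ids = [idx[w] for w in doc]
            for t in range(window, len(ids)):
                ctx = ids[t - window:t]
                # The same D[p] is used for every window of paragraph p
                h = (D[p] + W[ctx].sum(axis=0)) / (window + 1)
                probs = softmax(h @ U + b)
                total -= np.log(probs[ids[t]])
                n += 1
                g = probs.copy()
                g[ids[t]] -= 1.0              # cross-entropy gradient w.r.t. logits
                gh = (U @ g) / (window + 1)   # gradient reaching each averaged input
                U -= lr * np.outer(h, g)
                b -= lr * g
                D[p] -= lr * gh
                for c in ctx:
                    W[c] -= lr * gh
        losses.append(total / n)              # mean loss over the epoch
    return D, W, vocab, losses

docs = [["the", "cat", "sat", "on", "the", "mat"],
        ["dogs", "chase", "cats", "in", "the", "park"]]
D, W, vocab, losses = train_pv_dm(docs)
print(D.shape, W.shape)  # (2, 16) (10, 16)
```

For an unseen paragraph, the paper obtains its vector at inference time by adding a new row to `D` and running gradient descent on that row alone, keeping `W`, `U`, and `b` fixed. The sketch leaves that step out.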


  • Paragraph Vector model: Distributed Bag of Words (PV-DBOW) model

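- Ignore the context words in the input, and force the model to predict words randomly sampled from the paragraph in the output

  Whereas the PV-DM model predicts which word comes next, this model does not care about word order: the target words are simply sampled at random from the paragraph.

- It does not need word vectors as input; only the paragraph vector is fed to the model.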
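A matching NumPy sketch of PV-DBOW, under the same hedges. The names are hypothetical, a full softmax stands in for the hierarchical softmax or negative sampling used in practice, and the number of sampled target words per paragraph per epoch (`samples`) is an arbitrary choice of this sketch. Only the paragraph matrix `D` feeds the output layer: there is no word-vector input and no notion of a "next" word.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train_pv_dbow(docs, dim=16, samples=4, lr=0.05, epochs=200, seed=0):
    """Toy PV-DBOW (hypothetical helper): the paragraph vector alone predicts
    words sampled at random from that paragraph. Assumption: full softmax."""
    rng = np.random.default_rng(seed)
    vocab = sorted({w for doc in docs for w in doc})
    idx = {w: i for i, w in enumerate(vocab)}
    D = rng.normal(0, 0.1, (len(docs), dim))   # paragraph vectors: the only input
    U = rng.normal(0, 0.1, (dim, len(vocab)))  # softmax weights
    b = np.zeros(len(vocab))
    losses = []
    for _ in range(epochs):
        total = 0.0
        for p, doc in enumerate(docs):
            ids = [idx[w] for w in doc]
            # Random target words from paragraph p; their order is irrelevant
            for t in rng.choice(ids, size=samples):
                probs = softmax(D[p] @ U + b)
                total -= np.log(probs[t])
                g = probs.copy()
                g[t] -= 1.0                   # cross-entropy gradient w.r.t. logits
                gh = U @ g                    # gradient for the paragraph vector
                U -= lr * np.outer(D[p], g)
                b -= lr * g
                D[p] -= lr * gh
        losses.append(total / (samples * len(docs)))
    return D, vocab, losses

docs = [["the", "cat", "sat", "on", "the", "mat"],
        ["dogs", "chase", "cats", "in", "the", "park"]]
D, vocab, losses = train_pv_dbow(docs)
print(D.shape)  # (2, 16)
```

Because PV-DBOW keeps only `D` and the softmax weights, with no word matrix, it stores less than PV-DM, a point the original paper also makes.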

- PV-DM alone usually works well for most tasks, but combining PV-DM and PV-DBOW is recommended.
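A common way to combine the two models is to concatenate each paragraph's PV-DM vector with its PV-DBOW vector. Below is a minimal sketch of that step. It assumes both sets of vectors are already trained and stored row-aligned by paragraph. Random arrays stand in for them here, and `combine_paragraph_vectors` and `cosine_similarity_matrix` are hypothetical helpers. In gensim, the two inputs would roughly correspond to training `Doc2Vec` once with `dm=1` and once with `dm=0` and reading the vectors from `model.dv`.

```python
import numpy as np

def combine_paragraph_vectors(dm_vecs, dbow_vecs):
    """Concatenate each paragraph's PV-DM and PV-DBOW vectors row by row.
    Assumption: combination = concatenation, rows aligned by paragraph."""
    dm_vecs, dbow_vecs = np.asarray(dm_vecs), np.asarray(dbow_vecs)
    if dm_vecs.shape[0] != dbow_vecs.shape[0]:
        raise ValueError("both models must cover the same paragraphs in the same order")
    return np.concatenate([dm_vecs, dbow_vecs], axis=1)

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

# Stand-in vectors: in practice they come from two separately trained models
rng = np.random.default_rng(0)
dm = rng.normal(size=(3, 100))
dbow = rng.normal(size=(3, 100))
combined = combine_paragraph_vectors(dm, dbow)
sims = cosine_similarity_matrix(combined)
print(combined.shape)  # (3, 200)
```

If one model's vectors have a much larger norm than the other's, that half will dominate cosine similarities. L2-normalizing each half before concatenating is one optional adjustment, not something the lecture prescribes.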




[2016+] Let's Embed Everything!

  • Supervised Paragraph Vector (SPV) for Class Embedding


