[2018] ELMo : Embeddings from Language Models

frances._.sb 2022. 3. 18. 15:02

Pre-trained word representations

- A key component of many neural language understanding models used for downstream NLP tasks (QA, classification, etc.)

High quality representations should ideally model

- the complex characteristics of word use (e.g., syntax and semantics)

- how these uses vary across linguistic contexts (i.e., to model polysemy)

※ An example of polysemy: the Korean word 눈 can mean either "eye" or "snow".

GloVe vs. ELMo

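- For the source word "play", GloVe's nearest neighbors are [playing, game, games, played, player, ...], which are all sport-related.

- The biLM looks at the context in which "play" is used and can tell which sense is meant (to play a game, to play an instrument, and so on).

※ The biLM (bidirectional language model) is the actual architecture underlying ELMo.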
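
The contrast above can be sketched with a toy example. This is only an illustration under loud assumptions: the 2-d vectors are invented (they are not real GloVe values), and `contextual_embed` is a hypothetical stand-in that mixes in a sentence average, not the biLM itself. It only shows that a static lookup returns the same vector for "play" in every sentence, while a function of the whole sentence does not.

```python
# Toy 2-d vectors invented for illustration only (not real GloVe values).
STATIC = {
    "they": [0.0, 0.0],
    "play": [1.0, 1.0],
    "football": [2.0, 0.0],
    "violin": [0.0, 2.0],
}

def static_embed(tokens, i):
    """Context-independent lookup: one vector per word type."""
    return STATIC[tokens[i]]

def contextual_embed(tokens, i):
    """Hypothetical stand-in for a contextual encoder (NOT the biLM):
    the token's own vector plus the mean vector of the whole sentence."""
    own = STATIC[tokens[i]]
    dim = len(own)
    mean = [sum(STATIC[t][d] for t in tokens) / len(tokens) for d in range(dim)]
    return [o + m for o, m in zip(own, mean)]

sports = ["they", "play", "football"]
music = ["they", "play", "violin"]

# "play" sits at index 1 in both sentences.
same_static = static_embed(sports, 1) == static_embed(music, 1)
same_contextual = contextual_embed(sports, 1) == contextual_embed(music, 1)
print(same_static, same_contextual)
```

The static lookup cannot separate the two senses of "play", whereas any representation computed from the full sentence at least has the chance to.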

Features

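- Each representation is a function of the entire input sentence.

- The embedding vectors are learned from a bidirectional LSTM trained as a language model.

- The representations are deep in the sense that they combine the hidden vectors of all internal layers.

▷ A linear combination of the vectors stacked above each input word is learned for each end task, rather than using only the top LSTM layer.

▷ As a result, the higher-level LSTM states capture context-dependent aspects of word meaning, while the lower-level states capture aspects of syntax.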

Graphical illustration

- Language modeling here means predicting the next word given the sequence of words seen so far.

- A bidirectional model uses both a forward and a backward language model.

: For the token "stick", the forward and backward models are read at the same position. The hidden states (or embedding vectors) at the same layer of the two models are concatenated side by side, giving one concatenated hidden state per layer.

Taking a weighted sum of these per-layer vectors then gives the ELMo embedding of "stick".

Here s0, s1, and s2 are parameters that are learned jointly with the downstream task.

(For syntactic parsing or POS tagging, the weight s0 would be expected to grow; for classification or sentiment analysis, the weight s2 would.)
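
The concatenate-then-weight procedure described above can be sketched in plain Python. This is a minimal sketch under stated assumptions: the function names `softmax` and `elmo_vector`, the toy 2-d layer states, and the choice of `gamma` are all made up for illustration, and the real model produces these states with LSTMs rather than taking them as given.

```python
import math

def softmax(xs):
    """Normalize raw scores into positive weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def elmo_vector(forward_layers, backward_layers, raw_weights, gamma=1.0):
    """Combine per-layer states for ONE token.

    forward_layers / backward_layers: lists of equal-length vectors,
    index 0 = token embedding layer, 1.. = LSTM layers.
    raw_weights: one unnormalized score per layer (softmax gives s0, s1, ...).
    """
    if not (len(forward_layers) == len(backward_layers) == len(raw_weights)):
        raise ValueError("need one forward state, backward state and weight per layer")
    # Concatenate forward and backward states of the same layer side by side.
    layers = [f + b for f, b in zip(forward_layers, backward_layers)]
    s = softmax(raw_weights)
    dim = len(layers[0])
    # Weighted sum over layers, then scale the whole vector by gamma.
    return [gamma * sum(s_j * layer[i] for s_j, layer in zip(s, layers))
            for i in range(dim)]

# Toy states for one token (hypothetical numbers, three layers).
forward = [[1.0, 0.0], [0.0, 2.0], [4.0, 4.0]]
backward = [[1.0, 0.0], [2.0, 0.0], [0.0, 0.0]]
vec = elmo_vector(forward, backward, raw_weights=[0.0, 0.0, 0.0], gamma=3.0)
print(vec)
```

With equal raw weights every layer contributes equally. During task training the raw weights would move so that the layers most useful for that task dominate the sum.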

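ELMo equations

- R_k : the set of layer representations for token k, as defined by the first equation

- s : softmax-normalized weights

- h : the forward and backward hidden states at each layer

- γ : a scalar that lets the task model scale the entire ELMo vector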
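
For reference, the two equations these symbols belong to can be written as below, following the notation of the ELMo paper (Peters et al., 2018). Treat the typesetting as a reconstruction: here L is the number of LSTM layers, h_{k,0} denotes the token embedding layer x_k, and for j ≥ 1 each h_{k,j} is the concatenation of the forward and backward hidden states.

```latex
\begin{align}
R_k &= \{\, x_k^{LM},\ \overrightarrow{h}_{k,j}^{LM},\ \overleftarrow{h}_{k,j}^{LM} \mid j = 1, \dots, L \,\}
     = \{\, h_{k,j}^{LM} \mid j = 0, \dots, L \,\} \\
\mathrm{ELMo}_k^{task} &= E(R_k;\, \Theta^{task})
     = \gamma^{task} \sum_{j=0}^{L} s_j^{task}\, h_{k,j}^{LM}
\end{align}
```

With L = 2 the sum runs over three layers, which matches the weights s0, s1, and s2 in the illustration.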
