[2019] T5 : text to text transformer


frances._.sb 2022. 3. 23. 22:18


[2019] Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer

: When there are many tasks, the task itself is converted to text and fed in together with the input, and the output is also produced as text.

main idea

① text to text

e.g.)

- grammar check dataset (CoLA)

original input : sentence : "I am a great man."

original target : 1

processed input : CoLA sentence : "I am a great man."

processed target : acceptable

- Sentiment dataset (SST2)

original input : sentence : "it confirms fincher's status as a film maker who artfully bends technical know-how to the service of psychological insight."

original target : 1

processed input : SST2 sentence : "it confirms fincher's status as a film maker who artfully bends technical know-how to the service of psychological insight."

processed target : positive

∴ Every task becomes the same problem: given a question posed as text, find the answer as text.
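The conversion in the examples above can be sketched in a few lines of Python. This is only an illustration: the function name `to_text_to_text` and the `LABELS` table are hypothetical, and the label words for class 0 ("unacceptable", "negative") are assumptions, since the examples above only show class 1.

```python
# Hypothetical label tables. Only the class-1 words appear in the examples above;
# the class-0 words are assumed for illustration.
LABELS = {
    "CoLA": {0: "unacceptable", 1: "acceptable"},
    "SST2": {0: "negative", 1: "positive"},
}

def to_text_to_text(task, sentence, label):
    """Turn (task name, sentence, integer label) into an (input text, target text) pair."""
    processed_input = f'{task} sentence : "{sentence}"'  # task name becomes a text prefix
    processed_target = LABELS[task][label]               # integer label becomes a word
    return processed_input, processed_target

processed_input, processed_target = to_text_to_text("CoLA", "I am a great man.", 1)
print(processed_input)
print(processed_target)
```

Because the task name is just part of the input string, the same function covers every dataset once its label table is added.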

② transfer learning in NLP

- BERT-style models are encoder-only: they produce either a single prediction per input token or a single prediction for the entire input sequence, which is enough for classification or span prediction. That restriction is a major drawback.

- The T5 model uses an encoder-decoder structure, so the same model, loss, and hyperparameters can be used for every NLP task.

Let's take a brief look at the methodologies explored for this.

ⓐ model architecture : the basic transformer (encoder-decoder) structure performs better than encoder-only or decoder-only models.

ⓑ pre-training objectives : denoising a noised input and predicting the missing words during pre-training is highly efficient.

ⓓ training strategies : multitask learning performs comparably to unsupervised pre-training, but the proportion of each task has to be set during training.

ⓔ scaling : experiments scaled up the model size or tried ensembling, and found that training a smaller model on more data is effective.

ⓕ pushing the limits : a model with 11 billion parameters was trained and reached SOTA. It was trained on more than 1 trillion tokens.
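For the per-task proportions mentioned under training strategies, one simple scheme is to sample each task in proportion to its dataset size, with a cap so that very large datasets do not dominate the mix. The sketch below assumes that scheme; the function name, the `limit` parameter, and the dataset sizes are all hypothetical.

```python
def examples_proportional_rates(num_examples, limit):
    """Sampling rate per task, proportional to dataset size capped at `limit`."""
    capped = {task: min(n, limit) for task, n in num_examples.items()}
    total = sum(capped.values())
    return {task: c / total for task, c in capped.items()}

# Made-up dataset sizes, for illustration only.
sizes = {"task_a": 1_000, "task_b": 9_000, "task_c": 1_000_000}
rates = examples_proportional_rates(sizes, limit=10_000)
for task, rate in rates.items():
    print(task, rate)
```

Without the cap, `task_c` would take almost all of the sampling budget; with it, the largest task is treated as if it had only `limit` examples.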

③ training objective : modified MLM

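- MLM uses a bidirectional model structure.

- BERT masks individual tokens, whereas T5 replaces each run of consecutive tokens with a single mask.

- With the encoder-decoder structure, every training example has an input and a target.

- The target has to recover the parts that were masked out of the input.

- At the output level, a FFNN + softmax generates the sequence.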
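The input/target construction described in the bullets above can be sketched as follows. This is a minimal illustration, not the original preprocessing code: the spans are passed in by hand rather than sampled, and the mask-token names `<extra_id_N>` are an assumed naming convention.

```python
def span_corrupt(tokens, spans):
    """Replace each span with one mask token and build the matching target.

    `spans` is a sorted list of non-overlapping (start, end) index pairs,
    with `end` exclusive.
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        mask = f"<extra_id_{i}>"           # assumed mask-token naming
        inputs.extend(tokens[cursor:start])  # keep the unmasked tokens
        inputs.append(mask)                  # one mask for the whole span
        targets.append(mask)                 # target repeats the mask...
        targets.extend(tokens[start:end])    # ...followed by the removed tokens
        cursor = end
    inputs.extend(tokens[cursor:])
    targets.append(f"<extra_id_{len(spans)}>")  # closing mask token ends the target
    return inputs, targets

tokens = "Thank you for inviting me to your party last week .".split()
inputs, targets = span_corrupt(tokens, [(2, 4), (8, 9)])
print(" ".join(inputs))
print(" ".join(targets))
```

The encoder reads the corrupted input, and the decoder is trained to emit the target, so only the removed tokens have to be generated rather than the whole sentence.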

④ structure of model

⑤ corruption

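: The paper argues that the most efficient setup is BERT-style + Replace spans + 15% + 3, that is, a BERT-style denoising objective, span replacement as the corruption strategy, a 15% corruption rate, and a span length of 3.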
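To get a feel for what a 15% corruption rate with span length 3 means for one sequence, the arithmetic can be sketched like this. The function name and the rounding choices are assumptions made for illustration, not taken from the paper.

```python
def corruption_budget(seq_len, corruption_rate=0.15, mean_span_length=3):
    """Rough count of tokens to mask and spans to cut for one sequence."""
    # Rounding and the floor of 1 are illustrative choices.
    num_noise_tokens = max(1, round(seq_len * corruption_rate))
    num_spans = max(1, round(num_noise_tokens / mean_span_length))
    return num_noise_tokens, num_spans

num_noise_tokens, num_spans = corruption_budget(100)
print(num_noise_tokens, num_spans)
```

For a 100-token sequence this gives 15 masked tokens grouped into 5 spans, so the input carries 5 mask tokens instead of 15.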

experiment and result
