Posts in the 'Lecture Review/DSBA' category (Page 2)

Topic Modeling - 1

Related example code can be found here. Topic Model - By defining k topics over the words in a corpus, we can see which words occur frequently in each topic. - We can see how much weight each topic carries in the mixture. Disadvantage of LSA - It assumes the data are normally distributed. - However, term occurrences do not follow a normal distribution. - Still, it performs well when a tf-idf (weighted) matrix is used. Probabilistic Topic Model: Generative Approach - A document is a distribution over topics, and a topic is a distribution over words. - statistica..

Lecture Review/DSBA 2022.03.08
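
The generative view in this post (a document is a distribution over topics, a topic is a distribution over words) can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's code: it assumes LDA-style symmetric Dirichlet priors with hypothetical concentration parameters `alpha` and `beta`, which the excerpt itself does not name, and `generate_corpus` is a hypothetical helper. Each document first draws its own topic mixture; every word slot then picks a topic from that mixture and a word from that topic's word distribution.

```python
import numpy as np

def generate_corpus(n_docs, doc_len, vocab_size, k, alpha=0.5, beta=0.1, seed=0):
    """Sample a toy corpus from a generative topic model.

    Assumption: Dirichlet priors (LDA-style) with hypothetical
    concentration parameters alpha (doc-topic) and beta (topic-word).
    """
    rng = np.random.default_rng(seed)
    # Each topic is a distribution over the vocabulary: shape (k, vocab_size)
    topic_word = rng.dirichlet(np.full(vocab_size, beta), size=k)
    # Each document is a distribution over the k topics: shape (n_docs, k)
    doc_topic = rng.dirichlet(np.full(k, alpha), size=n_docs)
    docs = []
    for d in range(n_docs):
        # For every word slot: draw a topic from the document's mixture ...
        topics = rng.choice(k, size=doc_len, p=doc_topic[d])
        # ... then draw a word id from that topic's word distribution
        docs.append([int(rng.choice(vocab_size, p=topic_word[t])) for t in topics])
    return topic_word, doc_topic, docs

topic_word, doc_topic, docs = generate_corpus(n_docs=3, doc_len=10, vocab_size=20, k=2)
print(doc_topic.round(2))  # topic proportions of each document
print(docs[0])             # word ids of the first document
```

Topic model inference runs this story in reverse: given only `docs`, it estimates `topic_word` and `doc_topic`.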

Dimensionality Reduction

[1990] LSA (Latent Semantic Analysis) Singular Value Decomposition (SVD) - Factorizes a real or complex matrix - $A = U \Sigma V^T$ - $A$: $m \times n$ ($m > n$) / $U$: $m \times m$ / $\Sigma$: $m \times n$ / $V^T$: $n \times n$ Properties of SVD - The singular vectors (the columns of $U$ and $V$) are orthonormal - The number of positive singular values in $\Sigma$ equals $\mathrm{rank}(A)$ Reduced SVDs - Thin SVD: $\Sigma$ becomes a square ($n \times n$) matrix - Compact SVD: remove zero singular values and corresp..

Lecture Review/DSBA 2022.03.04
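
A small NumPy check of the SVD facts in this post, assuming NumPy is available; `full_svd`, `compact_svd`, and `truncated_svd` are hypothetical helper names, and the tolerance `tol` for treating a singular value as zero is an assumption. `np.linalg.svd` returns the singular values in descending order, and `full_matrices=False` gives the thin SVD. The rank-k truncation is the step LSA uses for dimensionality reduction.

```python
import numpy as np

def full_svd(A):
    """Full SVD A = U Sigma V^T: U is m x m, Sigma is m x n, V^T is n x n."""
    m, n = A.shape
    U, s, Vt = np.linalg.svd(A, full_matrices=True)
    Sigma = np.zeros((m, n))
    Sigma[: len(s), : len(s)] = np.diag(s)  # singular values on the diagonal
    return U, Sigma, Vt

def compact_svd(A, tol=1e-10):
    """Thin SVD with the zero singular values and their vectors removed."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # thin: U is m x n when m > n
    r = int(np.sum(s > tol))  # number of positive singular values = rank(A)
    return U[:, :r], s[:r], Vt[:r, :]

def truncated_svd(A, k):
    """Rank-k approximation used by LSA: keep the k largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is sorted in descending order
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))  # 6 x 4 matrix of rank 2
U, Sigma, Vt = full_svd(A)
print(np.allclose(A, U @ Sigma @ Vt))                          # reconstruction
print(np.allclose(U.T @ U, np.eye(6)), np.allclose(Vt @ Vt.T, np.eye(4)))  # orthonormality
Uc, sc, Vtc = compact_svd(A)
print(len(sc), np.linalg.matrix_rank(A))                       # both equal rank(A)
```

Because `A` is the product of a 6 x 2 and a 2 x 4 matrix, its rank is 2, so `compact_svd` keeps two singular values and `truncated_svd(A, 2)` reproduces `A` exactly.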

Doc2Vec & Others

This is a short summary of Prof. Pilsung Kang's (Korea University) lecture. We look at embeddings at the sentence/paragraph/document level. [2015] Document Embedding Paragraph Vector model: Distributed Memory (PV-DM) model - Paragraph vectors are shared for all windows generated from the same paragraph, but not across paragraphs; the paragraph ID always takes the same value when modeling the words of that paragraph. - Word vectors are shared across all paragraphs Paragraph Vector model: Distribu..

Lecture Review/DSBA 2022.03.02
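
A minimal NumPy sketch of the PV-DM sharing rule, under stated assumptions: the context is the previous `window` words, the paragraph vector and word vectors are averaged (PV-DM can also concatenate them), and a full softmax with plain SGD stands in for the more efficient output layers typically used in practice. `PVDM` and `make_windows` are hypothetical names. The point to notice is which rows get updated: every window from paragraph `p` reads and updates the same row `D[p]`, while the word matrix `W` is shared by all paragraphs.

```python
import numpy as np

def make_windows(paragraphs, window=2):
    """Yield (paragraph_id, context_ids, target_id); every window cut from
    paragraph p carries the same paragraph id p."""
    for p, words in enumerate(paragraphs):
        for i in range(window, len(words)):
            yield p, words[i - window:i], words[i]

class PVDM:
    """Toy PV-DM (assumptions: averaged inputs, full softmax, plain SGD)."""

    def __init__(self, n_paragraphs, vocab_size, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.D = rng.normal(0, 0.1, (n_paragraphs, dim))  # one row per paragraph
        self.W = rng.normal(0, 0.1, (vocab_size, dim))    # shared across all paragraphs
        self.U = rng.normal(0, 0.1, (dim, vocab_size))    # softmax weights
        self.b = np.zeros(vocab_size)

    def _hidden(self, p, context):
        # Average the paragraph vector with the context word vectors
        return np.vstack([self.D[p], self.W[context]]).mean(axis=0)

    def predict_proba(self, p, context):
        logits = self._hidden(p, context) @ self.U + self.b
        e = np.exp(logits - logits.max())  # numerically stable softmax
        return e / e.sum()

    def loss(self, p, context, target):
        return -np.log(self.predict_proba(p, context)[target])

    def sgd_step(self, p, context, target, lr=0.1):
        h = self._hidden(p, context)
        dlogits = self.predict_proba(p, context)
        dlogits[target] -= 1.0              # d(cross-entropy)/d(logits)
        dh = self.U @ dlogits               # gradient w.r.t. the averaged input
        n_in = 1 + len(context)
        self.U -= lr * np.outer(h, dlogits)
        self.b -= lr * dlogits
        self.D[p] -= lr * dh / n_in         # only paragraph p's vector moves
        np.add.at(self.W, context, -lr * dh / n_in)  # repeated words accumulate

paragraphs = [[0, 1, 2, 3, 0, 1, 2, 3], [4, 5, 6, 7, 4, 5, 6, 7]]
windows = list(make_windows(paragraphs, window=2))
model = PVDM(n_paragraphs=2, vocab_size=8, dim=16)
for epoch in range(300):
    for p, ctx, tgt in windows:
        model.sgd_step(p, ctx, tgt)
```

After training, `model.D` holds one learned vector per paragraph, which is the document embedding.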

NNLM/Word2Vec/GloVe/FastText

A short summary of Prof. Pilsung Kang's (Korea University) lecture. [2003] NNLM (Neural Network Language Model) Purpose: to solve the curse of dimensionality of one-hot vectors. - Each word can be represented as a distributed word feature vector. - The probability function of word sequences can be expressed in terms of these feature vectors. - The parameters of the probability function and the word feature vectors can be learned simultaneously. Why it works? - It can generalize over sentences whose words play similar roles (semantically and syntactically)..

Lecture Review/DSBA 2022.03.02
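
A forward-pass sketch of a Bengio-style NNLM in NumPy, written from the standard formulation $y = b + Wx + U\tanh(d + Hx)$ with $P(w_t \mid w_{t-n+1}, \dots, w_{t-1}) = \mathrm{softmax}(y)$, where $x$ concatenates the feature vectors $C(w)$ of the $n-1$ previous words. The class name, sizes, and initialization are illustrative assumptions, and training is omitted. It shows how the dimensionality problem shrinks: the input is `context_size * dim` numbers rather than one-hot vectors of vocabulary size, and the single matrix `C` is shared across all context positions, so its rows would be learned jointly with the other parameters.

```python
import numpy as np

class NNLM:
    """Forward pass of a Bengio-style NNLM (training omitted):
    y = b + W x + U tanh(d + H x),  P(w_t | previous n-1 words) = softmax(y)."""

    def __init__(self, vocab_size, dim, context_size, hidden, direct=True, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = context_size * dim                        # x = concatenated features
        self.context_size = context_size
        self.C = rng.normal(0, 0.1, (vocab_size, dim))     # word feature vectors, shared by all positions
        self.H = rng.normal(0, 0.1, (hidden, in_dim))      # input -> hidden
        self.d = np.zeros(hidden)
        self.U = rng.normal(0, 0.1, (vocab_size, hidden))  # hidden -> output
        # Optional direct input -> output connections (all zeros when disabled)
        self.W = rng.normal(0, 0.1, (vocab_size, in_dim)) if direct else np.zeros((vocab_size, in_dim))
        self.b = np.zeros(vocab_size)

    def next_word_proba(self, context):
        if len(context) != self.context_size:
            raise ValueError(f"expected {self.context_size} context words")
        x = self.C[list(context)].reshape(-1)  # look up and concatenate C(w) for each word
        y = self.b + self.W @ x + self.U @ np.tanh(self.d + self.H @ x)
        e = np.exp(y - y.max())                # numerically stable softmax
        return e / e.sum()

    def sentence_log_prob(self, words):
        """Sum of log P(w_t | w_{t-n+1..t-1}) over positions with a full context."""
        n = self.context_size
        return sum(np.log(self.next_word_proba(words[i - n:i])[words[i]])
                   for i in range(n, len(words)))

lm = NNLM(vocab_size=10, dim=4, context_size=3, hidden=8)
print(lm.next_word_proba([1, 2, 3]).round(3))  # distribution over the 10-word vocabulary
print(lm.sentence_log_prob([1, 2, 3, 4, 5]))
```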

Subeen lab

