Machine Reading Comprehension QA:
Search or NLP?
Name: Minjoon Seo (서민준)
Affiliation: NAVER / Clova ML
QA = Question Answering
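You're in big trouble. Top Hanggong (탑항공) went out of business!
*This actually happened
What? Really?
Why did it close down?
No idea.
Can I get a refund on my ticket?
Help me, NAVER!
Not a very helpful friend.
And nobody picks up the phone…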
CONTENTS
1. QA that “finds” with search – 10 min
2. QA that “reads” with NLP – 10 min
3. Where search and NLP meet – 20 min
4. Q&A – 5 min
1. QA that “finds” with search
Query: Top Hanggong closure
• Relevance of the content and title
• Documents read by users who ran similar searches
• Trustworthiness of the website
• Popularity of the document
• Information about the searcher
• …
All of these are weighed together!
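Word Matching
Retrieves the documents that contain the searched words
• Ctrl-F
• Quite effective when applied to titles only
“Is it true that Top Hanggong closed down?”
“Top Hanggong closure, for real?”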
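Word matching can be sketched in a few lines. This is a minimal illustration, assuming whitespace tokenization and a rule that every query word must appear in the document; the helper names and the toy titles are hypothetical.

```python
def tokenize(text):
    # Lowercase and split on whitespace; a real system would use a proper tokenizer.
    return text.lower().split()

def word_match(query, documents):
    """Return the indices of documents that contain every query word."""
    query_words = set(tokenize(query))
    return [i for i, doc in enumerate(documents)
            if query_words <= set(tokenize(doc))]

# Hypothetical document titles.
titles = [
    "top hanggong closure confirmed",
    "how to get a refund on a plane ticket",
    "travel agency closure list",
]
matches = word_match("top hanggong closure", titles)
# A paraphrase with different words finds nothing, which is the weakness of exact matching.
paraphrase_matches = word_match("top hanggong shut down", titles)
```

The second query shows why differently worded questions are a problem for pure word matching.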
TF-IDF
Term Frequency – Inverse Document Frequency
• Gives a higher weight to important keywords (uncommon words).
• Essential once questions get longer and the search covers document bodies
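A small sketch of the idea, assuming raw term counts for TF and log(N / df) for IDF (one of several common variants; the slide does not say which one is meant). The function name and documents are made up for illustration.

```python
import math
from collections import Counter

def tf_idf_scores(query, documents):
    """Score each document by the sum of TF * IDF over the query words."""
    docs = [doc.lower().split() for doc in documents]
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for word in query.lower().split():
            if word in tf:
                idf = math.log(n_docs / df[word])  # common words get IDF near 0
                score += tf[word] * idf
        scores.append(score)
    return scores

documents = [
    "the agency announced the closure",
    "the agency sells the tickets",
    "the weather is nice",
]
scores = tf_idf_scores("the closure", documents)
```

Here "the" occurs in every document, so its IDF is zero and only the rare word "closure" contributes to the score.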
Okapi BM25
“Best Matching” (Robertson et al., 1970s)
• An “upgraded version” of TF-IDF
• Changes the TF part
Why all the adding
and subtracting?
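The formula on the slide did not survive extraction, so the sketch below uses one widely used textbook form of BM25 as an assumption: a saturating, length-normalized TF term and a smoothed IDF. The parameter defaults k1=1.5 and b=0.75 are conventional choices, not values from the talk.

```python
import math
from collections import Counter

def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score documents with a common BM25 variant (assumed, not taken from the slide)."""
    docs = [doc.lower().split() for doc in documents]
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for word in query.lower().split():
            if word not in tf:
                continue
            # Smoothed IDF: the +0.5 and +1 terms keep it positive and finite.
            idf = math.log((n_docs - df[word] + 0.5) / (df[word] + 0.5) + 1.0)
            # TF saturates as it grows and is normalized by document length.
            norm = tf[word] + k1 * (1.0 - b + b * len(doc) / avg_len)
            score += idf * tf[word] * (k1 + 1.0) / norm
        scores.append(score)
    return scores

documents = [
    "closure closure closure closure",
    "closure notice",
    "ticket refund policy",
]
scores = bm25_scores("closure", documents)
```

Repeating a word four times raises the score, but by far less than four times: that saturation is the main change to the TF part.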
LSA
Latent Semantic Analysis (Deerwester et al., 1988)
• Bag of words (sparse) → dense vector via SVD
• Attaches abstract “tags” to each word
• The abstract “tags” make it possible to compare different words with each other.
• “closure” ~ “go under” ~ “downfall”
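Can't you just tell me exactly the thing I want to know?
I can find documents for you, but…
Search != Reading comprehension
Limits of search
Search does not “read” sentences
• It can pick up word-level (lexical) information, but…
• It cannot grasp the syntactic (grammatical) or semantic (meaning) context.
• It is hard to pinpoint an answer at anything finer than the document or paragraph level.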
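2. QA that “reads” with NLP
What we lazy people want
Why did it close down?
I read it for you:
it closed down because the business environment
got worse, both internally and externally.
Smart!
The first step of machine learning:
define the input and the output
Inputs: Why did Top Hanggong close down?
Output: Worsening business environment, both internal and external
First, let's define
the input and
the output!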
Generative or extractive?
Generative models look attractive, but…
Problems with generative models
The quality is not good enough for a service.
• They give off-the-wall answers far too often.
• Data quality is hard to control (e.g. MS MARCO¹).
¹ Nguyen et al. MS MARCO: A human generated machine reading
comprehension dataset. 2016.
Evaluation is hard, too.
• There is BLEU, but…
So, in the end: extractive
Neural Extractive QA Trend
in 5 minutes
7 Milestones in Extractive QA
1. Sentence-level QA (May 2015)
2. Phrase-level QA (May 2016)
3. Cross-attention (Nov 2016)
4. Self-attention (Mar 2017)
5. Transfer learning (Nov 2017)
6. Super-human level (Jan 2018)
7. What’s next? (Nov 2018)
Task definition
Models
1. Sentence-level QA
Second Epistle to the Corinthians The
Second Epistle to the Corinthians, often
referred to as Second Corinthians (and
written as 2 Corinthians), is the eighth book
of the New Testament of the Bible. Paul the
Apostle and “Timothy our brother” wrote
this epistle to “the church of God which is at
Corinth, with all the saints which are in all
Achaia”.
Who wrote second
Corinthians?
Yang et al. WikiQA: A Challenge Dataset for Open-domain
Question Answering. EMNLP 2015.
But the answer is too long…
Can't we show just the answer?
2. Phrase-level QA
Second Epistle to the Corinthians The
Second Epistle to the Corinthians, often
referred to as Second Corinthians (and
written as 2 Corinthians), is the eighth book
of the New Testament of the Bible. Paul the
Apostle and “Timothy our brother” wrote
this epistle to “the church of God which is at
Corinth, with all the saints which are in all
Achaia”.
Who wrote second
Corinthians?
Rajpurkar et al. SQuAD: 100,000+ Questions for Machine
Comprehension of Text. EMNLP 2016
SQuAD: 100,000+
In 2 years:
100+ models!
3. Cross-attention
Second Epistle to the Corinthians The
Second Epistle to the Corinthians, often
referred to as Second Corinthians (and
written as 2 Corinthians), is the eighth book
of the New Testament of the Bible. Paul the
Apostle and “Timothy our brother” wrote
this epistle to “the church of God which is at
Corinth, with all the saints which are in all
Achaia”.
Who wrote second
Corinthians?
Refer to the question while reading the document
Refer to the document while reading the question
Seo et al. Bi-directional attention flow for machine
comprehension. ICLR 2017.
4. Self-attention
Second Epistle to the Corinthians The
Second Epistle to the Corinthians, often
referred to as Second Corinthians (and
written as 2 Corinthians), is the eighth book
of the New Testament of the Bible. Paul the
Apostle and “Timothy our brother” wrote
this epistle to “the church of God which is at
Corinth, with all the saints which are in all
Achaia”.
Who wrote second
Corinthians?
Clark & Gardner. Simple and effective multi-paragraph reading
comprehension. 2017
Refer to other parts of the document
while reading the document
Can't we make use of
an unlabeled corpus?
5. Transfer learning
Unlabeled: 3 billion words
Labeled: 2 million words
Language model
Peters et al. Deep contextualized word representations.
NAACL 2018.
Can a computer
beat a human?
6. Super-human level
• Ensemble
• NLP tools (POS, parser, etc.)
• Data Augmentation
• A lot of layers
Hi, nice to meet you! → MT → 안녕, 반가워! → MT → Hello, great to see you!
• This, that, and a bit of everything else…
Yu et al. QANet: Combining local convolution with global self-
attention for reading comprehension. ICLR 2018.
Today at 4 PM,
Track 4
5% higher
than humans.
7. What’s next? (Nov 2018)
QuAC (Conversational)
Choi et al., EMNLP 2018
HotpotQA (Reasoning)
Yang et al., EMNLP 2018
Accuracy is great, but how long does it take?
Hmm… with a GPU, about 0.1 seconds to read one document?
0.1 seconds
But it cannot escape
the shackles of linear time.
Microsoft Research Asia. R-Net: machine reading comprehension
with self matching networks. 2017.
5.6 million documents
3 billion words
So… about 6 days.
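The estimate is a straight multiplication of the two numbers given in the talk. The sketch below only spells out that arithmetic; the function name is hypothetical, and treating the cost as exactly 0.1 seconds per document is a simplification.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def days_to_read(num_documents, seconds_per_document=0.1):
    """Total time to read every document one after another, in days."""
    return num_documents * seconds_per_document / SECONDS_PER_DAY

days_full = days_to_read(5_600_000)      # all 5.6 million documents
days_rounded = days_to_read(5_000_000)   # rounding the corpus to 5 million
```

With 5.6 million documents this gives roughly six and a half days; rounding the corpus to 5 million gives roughly 5.8 days. Either way it is about a week for a single question.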
3. Where search and NLP meet
A whole week for one question?
!#$@*%(@*@
Oh, then I'll use search to find the documents
and read only those!
Solution 1: Find first, then read!
1961
Chen et al. Reading Wikipedia to Answer Open-Domain
Questions. ACL 2017.
Wait, but what if the search engine returns
the wrong result?
“Is it true that Top Hanggong closed down?”
On top of error propagation,
it lacks elegance
*Addicted to end-to-end
1961
One clean model
?
5.8 days
5.6 million documents
5,000,000x shorter, and 4,000x slower
0.1 seconds
CPU
Titan Xp
20 billion times faster?!
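The final figure appears to be the product of the two factors shown on the slide. The transcript does not make clear which system each factor describes, so the comments below are only one reading.

```python
length_factor = 5_000_000  # a single document vs. the whole corpus (factor from the slide)
speed_factor = 4_000       # the second factor from the slide
required_speedup = length_factor * speed_factor  # 20 billion
```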
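Solution 2: Find and read at the same time?
How does search find documents so quickly?
So this is the Green Factory library.
History
Art
IT
.
.
.
Fiction
When did the Korean War break out?
A
B
K
.
.
.
Let's organize the vectors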
[0.3, 0.5, …]
[0.7, 0.1, …]
[0.6, 0.2, …]
.
.
.
[0.4, 0.4, …]
When did the Korean War
break out?
[…]
[…]
[…]
.
.
.
[0.5, 0.1, …]
[0.3, 0.4, …]
[0.4, 0.5, …]
[0.8, 0.1, …]
[0.4, 0.4, …]
[0.4, 0.3, …]
Locality-Sensitive Hashing
Maximize collisions between similar items
MIPS
Locality-Sensitive Hashing (LSH)
• Symmetric: distance functions (Nearest Neighbor Search)
• L2
• L1
• Cosine
• Asymmetric: inner product (MIPS)
• Dot product
$O(d N^{\rho} \log N)$
$O(d N)$
$\rho$ = approximation factor (<1)
Shrivastava and Li. Asymmetric lsh (alsh) for sublinear time maximum
inner product search (mips). NIPS 2014.
A sublinear-time approximation.
Very fast.
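As an illustration of the hashing idea, here is a minimal sketch of the symmetric cosine case using random hyperplanes (sign random projections). It is not the asymmetric ALSH scheme from the cited paper, and all names and parameters are hypothetical.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes, each given by its normal vector."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )

def build_index(vectors, hyperplanes):
    """Group vector ids into buckets keyed by their signature."""
    buckets = {}
    for idx, vec in enumerate(vectors):
        buckets.setdefault(lsh_signature(vec, hyperplanes), []).append(idx)
    return buckets

vectors = [[0.3, 0.5], [0.7, 0.1], [0.6, 0.2], [0.4, 0.4]]
planes = make_hyperplanes(num_bits=4, dim=2)
index = build_index(vectors, planes)

# The query points in the same direction as vectors[0], so it lands in the same bucket.
query = [0.6, 1.0]
candidates = index.get(lsh_signature(query, planes), [])
```

At query time only the vectors in the matching bucket are compared, instead of every vector in the collection. Vectors pointing in similar directions are likely, though not guaranteed, to share a bucket.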
Document → Phrase?
Super Bowl 50 → phrase vector
American football game → phrase vector
National Football League → phrase vector
Denver Broncos → phrase vector
…
Which NFL team represented the AFC at Super Bowl 50? → query vector
MIPS
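Comparison with the existing approach, in equations
• Given a document d and a query q:
$\hat{a} = \operatorname{argmax}_{a} P_{\theta}(a \mid q; d)$
where
Existing: $P_{\theta}(a \mid q; d) \propto \exp(F_{\theta}(a, q, d))$
F has to be recomputed for every new question.
Proposed: $P_{\theta}(a \mid q; d) \propto \exp(G_{\theta}(q) \cdot H_{\theta}(a, d))$
H can be computed in advance and can also be indexed (hashed).
Decomposition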
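Once the score is an inner product between a question vector and a phrase vector, answering reduces to a maximum inner product search over vectors computed ahead of time. A toy sketch follows: the vectors are invented, and a brute-force `max` stands in for a real MIPS index.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def answer(query_vec, phrase_index):
    """Return the phrase whose precomputed vector has the largest inner product with the query."""
    return max(phrase_index, key=lambda phrase: dot(query_vec, phrase_index[phrase]))

# Offline: one vector per candidate phrase, computed once (hypothetical values).
phrase_index = {
    "Denver Broncos": [0.9, 0.1, 0.0],
    "Super Bowl 50": [0.1, 0.8, 0.1],
    "National Football League": [0.2, 0.2, 0.6],
}

# Online: only the question is encoded for each new query (hypothetical value).
query_vec = [1.0, 0.0, 0.2]
best = answer(query_vec, phrase_index)
```

Because exp is monotonic, taking the argmax of the inner product gives the same phrase as taking the argmax of the probability.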
However,
the decomposition is very hard
A new research problem:
Phrase-Indexed QA (PIQA)
A year of PIQA trial and error, in 5 minutes
1. The baseline was not that hard
2. Exploiting duality
3. Multimodality…
4. Sparsity: +9% in one go!
5. Scalability: possible, but far from easy…
Since June of last year
Baseline 1: LSTM
Document: … water transforms into steam within a boiler …
Question: What does water turn into when heated?
Document → Bi-LSTM
Question → Bi-LSTM → one vector per word → Weighted Sum → question vector
Nearest Neighbor
Baseline 2: Self-Attention
Document: … water transforms into steam within a boiler …
Question: What does water turn into when heated?
Document phrase: type = “steam”, clue = “water” + “transform” + “boiler”
Question: type = “What”, clue = “water” + “turn” + “heated”
The phrase vector and the question vector are compared with a dot product
SQuAD            F1 (%)   EM (%)
First Baseline   51.0     40.0
SOTA             91.2     85.4
PI-SQuAD         F1 (%)   EM (%)
LSTM             57.2     46.8
LSTM+SA          59.8     49.0
Seo et al. Phrase-indexed question answering: a new challenge
toward scalable document comprehension. EMNLP 2018.
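For readers unfamiliar with the two metrics, here is a simplified sketch of SQuAD-style exact match (EM) and token-level F1 for a single gold answer. The normalization only approximates the official evaluation script, and the function names are hypothetical.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop ASCII punctuation and English articles, then split into tokens."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()

def exact_match(prediction, truth):
    return float(normalize(prediction) == normalize(truth))

def f1_score(prediction, truth):
    pred, gold = normalize(prediction), normalize(truth)
    # Count tokens shared by prediction and gold answer (with multiplicity).
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

gold = "Paul the Apostle and Timothy our brother"
em = exact_match("Paul the Apostle", gold)
f1 = f1_score("Paul the Apostle", gold)
```

A partially correct span scores zero on EM but gets partial credit on F1, so for non-empty answers F1 is never lower than EM.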
Duality & Multimodality
Barack Obama was the 44th president from 2009 to 2017.
One-to-many relationship
$(a, d) \rightarrow H \rightarrow H(a, d)$
Q1: Who was president in 2009?
Q2: Who was the 44th president?
Can we exploit duality?
Duality: Question Reconstruction
Question: What does water turn into when heated?
Question → Bi-LSTM → one vector per word → Weighted Sum → question vector → Nearest Neighbor
Generation: seq2seq decoder (without attention)
SQuAD            F1 (%)   EM (%)
First Baseline   51.0     40.0
SOTA             91.2     85.4
PI-SQuAD         F1 (%)   EM (%)
LSTM             57.2     46.8
LSTM+SA          59.8     49.0
LSTM+SA+Dual     63.2     52.0
How can we solve
multimodality?
Barack Obama was the 44th president from 2009 to 2017.
Who was president in 2009? Who was the 44th president?
Multimodality
A theoretical solution
Barack Obama was the 44th president from 2009 to 2017.
Q1: Who was president in 2009?
Q2: Who was the 44th president?
Just use a latent variable?
$(a, d) \rightarrow H \rightarrow H(a, z_1, d),\ H(a, z_2, d)$
So, the things we tried (for a whole year!)
1. Multiple identical models (ensemble)
2. Orthogonality regularization
3. Sequential decoding
4. Latent variable from Gaussian distribution
5. Latent variable from surrounding words
Improves accuracy a little,
but needs more than 30x the storage.
Doesn't work…
SQuAD                F1 (%)   EM (%)
First Baseline       51.0     40.0
SOTA                 91.2     85.4
PI-SQuAD             F1 (%)   EM (%)
LSTM                 57.2     46.8
LSTM+SA              59.8     49.0
LSTM+SA+Dual         63.2     52.0
LSTM+SA+Multi-mode   66.5     55.1
Dense + Sparse?
Phrase: type = “steam”, clue = “water” + “transform” + “boiler”
Sparse vector: nonzero entries for steam, boiler, water, transform
Dense vector
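One simple way to combine the two is to concatenate them, so that the overall inner product is the dense inner product plus the sparse one. The sketch below assumes this combination and uses made-up weights; the talk does not say how the sparse weights are actually computed.

```python
def dense_dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sparse_dot(u, v):
    # Sparse vectors stored as {word: weight} dicts; only shared words contribute.
    return sum(weight * v[word] for word, weight in u.items() if word in v)

def score(query, phrase):
    # Inner product of the concatenation [dense; sparse].
    return (dense_dot(query["dense"], phrase["dense"])
            + sparse_dot(query["sparse"], phrase["sparse"]))

# Hypothetical vectors for the phrase "steam" and the question about heated water.
phrase = {
    "dense": [0.3, 0.5],
    "sparse": {"steam": 1.0, "water": 1.0, "transform": 1.0, "boiler": 1.0},
}
query = {
    "dense": [0.4, 0.4],
    "sparse": {"what": 1.0, "water": 1.0, "turn": 1.0, "heated": 1.0},
}
total = score(query, phrase)
```

The shared word "water" adds an exact lexical-match signal on top of the dense similarity.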
SQuAD                 F1 (%)   EM (%)
First Baseline        51.0     40.0
SOTA                  91.2     85.4
PI-SQuAD              F1 (%)   EM (%)
LSTM                  57.2     46.8
LSTM+SA               59.8     49.0
LSTM+SA+Dual          63.2     52.0
LSTM+SA+Multi-mode    66.5     55.1
LSTM+SA+Sparse+ELMo   69.3     58.7
To be on arXiv soon
Scalability consideration 1
• SQuAD looks at only a single document → it is mostly a benchmark
• It is not a real QA scenario.
• Is there any guarantee that end-to-end will beat a pipeline?
More experiments are needed!
Scalability consideration 2
• Words in English Wikipedia: 3 billion
• Phrases per word: 7 on average
• Vector dimension per phrase: 1024
• Float32: 4 bytes
About 90 TB (21 billion phrases)
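The storage estimate multiplies the four numbers listed above. A quick check, using decimal terabytes (the variable names are only for illustration):

```python
words = 3_000_000_000     # words in English Wikipedia
phrases_per_word = 7      # average number of phrases per word
dim = 1024                # vector dimension per phrase
bytes_per_float = 4       # float32

num_phrases = words * phrases_per_word             # 21 billion phrases
total_bytes = num_phrases * dim * bytes_per_float
total_tb = total_bytes / 10**12                    # decimal terabytes
```

This comes to about 86 TB, which the slide rounds to roughly 90 TB.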
This can be optimized.
What meaning is captured in a phrase embedding?
According to the American Library
Association, this makes …
… tasked with drafting a European
Charter of Human Rights, …
Proper nouns of a similar type (lexical)
The LM engines were successfully test-fired
and restarted, …
Steam turbines were extensively
applied …
Similar semantic (meaning) and syntactic (grammar) structure
… primarily accomplished through the
ductile stretching and thinning.
… directly derived from the
homogeneity or symmetry of space …
Similar syntactic (grammar) structure
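So what's the conclusion?
A beautiful harmony of search and NLP
There is still a long way to go, but let's research it
and think it through together!
I need something that works
right now.
I'll do both (ㅜㅜ)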
tl;dr:
Representing the world knowledge
in an elegant way
Thank you
Q & A
Please leave your questions on Slido.
sli.do
#deview
TRACK 2
We are Hiring!
Domains
• Speech Recognition
• Speech Synthesis
• Computer Vision
• Natural Language
• NSML / AutoML
• Finance AI
• App/Web Services
Positions
• Research Scientist
• Research Engineer
• SW Engineer
• Android / iOS Engineer
• Backend Engineer
• Data Engineer
• UI/UX Engineer
• Internship Member
• Global Residency
clova-jobs@navercorp.com

Contenu connexe

Tendances

오토인코더의 모든 것
오토인코더의 모든 것오토인코더의 모든 것
오토인코더의 모든 것NAVER Engineering
 
RoFormer: Enhanced Transformer with Rotary Position Embedding
RoFormer: Enhanced Transformer with Rotary Position EmbeddingRoFormer: Enhanced Transformer with Rotary Position Embedding
RoFormer: Enhanced Transformer with Rotary Position Embeddingtaeseon ryu
 
자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.
자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.
자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.Yongho Ha
 
All-but-the-Top: Simple and Effective Postprocessing for Word Representations
All-but-the-Top: Simple and Effective Postprocessing for Word RepresentationsAll-but-the-Top: Simple and Effective Postprocessing for Word Representations
All-but-the-Top: Simple and Effective Postprocessing for Word RepresentationsMakoto Takenaka
 
論文紹介: Fast R-CNN&Faster R-CNN
論文紹介: Fast R-CNN&Faster R-CNN論文紹介: Fast R-CNN&Faster R-CNN
論文紹介: Fast R-CNN&Faster R-CNNTakashi Abe
 
[DL輪読会]Deep Learning 第5章 機械学習の基礎
[DL輪読会]Deep Learning 第5章 機械学習の基礎[DL輪読会]Deep Learning 第5章 機械学習の基礎
[DL輪読会]Deep Learning 第5章 機械学習の基礎Deep Learning JP
 
파이썬 데이터과학 1일차 - 초보자를 위한 데이터분석, 데이터시각화 (이태영)
파이썬 데이터과학 1일차 - 초보자를 위한 데이터분석, 데이터시각화 (이태영)파이썬 데이터과학 1일차 - 초보자를 위한 데이터분석, 데이터시각화 (이태영)
파이썬 데이터과학 1일차 - 초보자를 위한 데이터분석, 데이터시각화 (이태영)Tae Young Lee
 
Enliple BERT-Small을 이용한 KorQuAD 모델
Enliple BERT-Small을 이용한 KorQuAD 모델Enliple BERT-Small을 이용한 KorQuAD 모델
Enliple BERT-Small을 이용한 KorQuAD 모델KwangHyeonPark
 
機械学習プロフェッショナルシリーズ輪読会 #2 Chapter 5 「自己符号化器」 資料
機械学習プロフェッショナルシリーズ輪読会 #2 Chapter 5 「自己符号化器」 資料機械学習プロフェッショナルシリーズ輪読会 #2 Chapter 5 「自己符号化器」 資料
機械学習プロフェッショナルシリーズ輪読会 #2 Chapter 5 「自己符号化器」 資料at grandpa
 
제 15회 보아즈(BOAZ) 빅데이터 컨퍼런스 - [YouPlace 팀] : 카프카와 스파크를 활용한 유튜브 영상 속 제주 명소 검색
제 15회 보아즈(BOAZ) 빅데이터 컨퍼런스 - [YouPlace 팀] : 카프카와 스파크를 활용한 유튜브 영상 속 제주 명소 검색 제 15회 보아즈(BOAZ) 빅데이터 컨퍼런스 - [YouPlace 팀] : 카프카와 스파크를 활용한 유튜브 영상 속 제주 명소 검색
제 15회 보아즈(BOAZ) 빅데이터 컨퍼런스 - [YouPlace 팀] : 카프카와 스파크를 활용한 유튜브 영상 속 제주 명소 검색 BOAZ Bigdata
 
미등록단어 문제 해결을 위한 비지도학습 기반 한국어자연어처리 방법론 및 응용
미등록단어 문제 해결을 위한 비지도학습 기반 한국어자연어처리 방법론 및 응용미등록단어 문제 해결을 위한 비지도학습 기반 한국어자연어처리 방법론 및 응용
미등록단어 문제 해결을 위한 비지도학습 기반 한국어자연어처리 방법론 및 응용NAVER Engineering
 
한글 검색 질의어 오타 패턴 분석과 사용자 로그를 이용한 질의어 오타 교정 시스템 구축
한글 검색 질의어 오타 패턴 분석과 사용자 로그를 이용한  질의어 오타 교정 시스템 구축한글 검색 질의어 오타 패턴 분석과 사용자 로그를 이용한  질의어 오타 교정 시스템 구축
한글 검색 질의어 오타 패턴 분석과 사용자 로그를 이용한 질의어 오타 교정 시스템 구축Heewon Jeon
 
自然言語処理における深層学習を用いた予測の不確実性 - Predictive Uncertainty in NLP -
自然言語処理における深層学習を用いた予測の不確実性  - Predictive Uncertainty in NLP -自然言語処理における深層学習を用いた予測の不確実性  - Predictive Uncertainty in NLP -
自然言語処理における深層学習を用いた予測の不確実性 - Predictive Uncertainty in NLP -tmtm otm
 
TOROS N2 - lightweight approximate Nearest Neighbor library
TOROS N2 - lightweight approximate Nearest Neighbor libraryTOROS N2 - lightweight approximate Nearest Neighbor library
TOROS N2 - lightweight approximate Nearest Neighbor libraryif kakao
 
(2021.3) 不均一系触媒研究のための機械学習と最適実験計画
(2021.3) 不均一系触媒研究のための機械学習と最適実験計画(2021.3) 不均一系触媒研究のための機械学習と最適実験計画
(2021.3) 不均一系触媒研究のための機械学習と最適実験計画Ichigaku Takigawa
 
머신러닝 해외 취업 준비: 닳고 닳은 이력서와 고통스러웠던 면접을 돌아보며 SNU 2018
머신러닝 해외 취업 준비: 닳고 닳은 이력서와 고통스러웠던 면접을 돌아보며 SNU 2018머신러닝 해외 취업 준비: 닳고 닳은 이력서와 고통스러웠던 면접을 돌아보며 SNU 2018
머신러닝 해외 취업 준비: 닳고 닳은 이력서와 고통스러웠던 면접을 돌아보며 SNU 2018Taehoon Kim
 
Greed is Good: 劣モジュラ関数最大化とその発展
Greed is Good: 劣モジュラ関数最大化とその発展Greed is Good: 劣モジュラ関数最大化とその発展
Greed is Good: 劣モジュラ関数最大化とその発展Yuichi Yoshida
 
인공지능추천시스템 airs개발기_모델링과시스템
인공지능추천시스템 airs개발기_모델링과시스템인공지능추천시스템 airs개발기_모델링과시스템
인공지능추천시스템 airs개발기_모델링과시스템NAVER D2
 
感覚運動随伴性、予測符号化、そして自由エネルギー原理 (Sensory-Motor Contingency, Predictive Coding and ...
感覚運動随伴性、予測符号化、そして自由エネルギー原理 (Sensory-Motor Contingency, Predictive Coding and ...感覚運動随伴性、予測符号化、そして自由エネルギー原理 (Sensory-Motor Contingency, Predictive Coding and ...
感覚運動随伴性、予測符号化、そして自由エネルギー原理 (Sensory-Motor Contingency, Predictive Coding and ...Masatoshi Yoshida
 

Tendances (20)

오토인코더의 모든 것
오토인코더의 모든 것오토인코더의 모든 것
오토인코더의 모든 것
 
RoFormer: Enhanced Transformer with Rotary Position Embedding
RoFormer: Enhanced Transformer with Rotary Position EmbeddingRoFormer: Enhanced Transformer with Rotary Position Embedding
RoFormer: Enhanced Transformer with Rotary Position Embedding
 
자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.
자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.
자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.
 
All-but-the-Top: Simple and Effective Postprocessing for Word Representations
All-but-the-Top: Simple and Effective Postprocessing for Word RepresentationsAll-but-the-Top: Simple and Effective Postprocessing for Word Representations
All-but-the-Top: Simple and Effective Postprocessing for Word Representations
 
論文紹介: Fast R-CNN&Faster R-CNN
論文紹介: Fast R-CNN&Faster R-CNN論文紹介: Fast R-CNN&Faster R-CNN
論文紹介: Fast R-CNN&Faster R-CNN
 
[DL輪読会]Deep Learning 第5章 機械学習の基礎
[DL輪読会]Deep Learning 第5章 機械学習の基礎[DL輪読会]Deep Learning 第5章 機械学習の基礎
[DL輪読会]Deep Learning 第5章 機械学習の基礎
 
機械学習と主成分分析
機械学習と主成分分析機械学習と主成分分析
機械学習と主成分分析
 
파이썬 데이터과학 1일차 - 초보자를 위한 데이터분석, 데이터시각화 (이태영)
파이썬 데이터과학 1일차 - 초보자를 위한 데이터분석, 데이터시각화 (이태영)파이썬 데이터과학 1일차 - 초보자를 위한 데이터분석, 데이터시각화 (이태영)
파이썬 데이터과학 1일차 - 초보자를 위한 데이터분석, 데이터시각화 (이태영)
 
Enliple BERT-Small을 이용한 KorQuAD 모델
Enliple BERT-Small을 이용한 KorQuAD 모델Enliple BERT-Small을 이용한 KorQuAD 모델
Enliple BERT-Small을 이용한 KorQuAD 모델
 
機械学習プロフェッショナルシリーズ輪読会 #2 Chapter 5 「自己符号化器」 資料
機械学習プロフェッショナルシリーズ輪読会 #2 Chapter 5 「自己符号化器」 資料機械学習プロフェッショナルシリーズ輪読会 #2 Chapter 5 「自己符号化器」 資料
機械学習プロフェッショナルシリーズ輪読会 #2 Chapter 5 「自己符号化器」 資料
 
제 15회 보아즈(BOAZ) 빅데이터 컨퍼런스 - [YouPlace 팀] : 카프카와 스파크를 활용한 유튜브 영상 속 제주 명소 검색
제 15회 보아즈(BOAZ) 빅데이터 컨퍼런스 - [YouPlace 팀] : 카프카와 스파크를 활용한 유튜브 영상 속 제주 명소 검색 제 15회 보아즈(BOAZ) 빅데이터 컨퍼런스 - [YouPlace 팀] : 카프카와 스파크를 활용한 유튜브 영상 속 제주 명소 검색
제 15회 보아즈(BOAZ) 빅데이터 컨퍼런스 - [YouPlace 팀] : 카프카와 스파크를 활용한 유튜브 영상 속 제주 명소 검색
 
미등록단어 문제 해결을 위한 비지도학습 기반 한국어자연어처리 방법론 및 응용
미등록단어 문제 해결을 위한 비지도학습 기반 한국어자연어처리 방법론 및 응용미등록단어 문제 해결을 위한 비지도학습 기반 한국어자연어처리 방법론 및 응용
미등록단어 문제 해결을 위한 비지도학습 기반 한국어자연어처리 방법론 및 응용
 
한글 검색 질의어 오타 패턴 분석과 사용자 로그를 이용한 질의어 오타 교정 시스템 구축
한글 검색 질의어 오타 패턴 분석과 사용자 로그를 이용한  질의어 오타 교정 시스템 구축한글 검색 질의어 오타 패턴 분석과 사용자 로그를 이용한  질의어 오타 교정 시스템 구축
한글 검색 질의어 오타 패턴 분석과 사용자 로그를 이용한 질의어 오타 교정 시스템 구축
 
自然言語処理における深層学習を用いた予測の不確実性 - Predictive Uncertainty in NLP -
自然言語処理における深層学習を用いた予測の不確実性  - Predictive Uncertainty in NLP -自然言語処理における深層学習を用いた予測の不確実性  - Predictive Uncertainty in NLP -
自然言語処理における深層学習を用いた予測の不確実性 - Predictive Uncertainty in NLP -
 
TOROS N2 - lightweight approximate Nearest Neighbor library
TOROS N2 - lightweight approximate Nearest Neighbor libraryTOROS N2 - lightweight approximate Nearest Neighbor library
TOROS N2 - lightweight approximate Nearest Neighbor library
 
(2021.3) 不均一系触媒研究のための機械学習と最適実験計画
(2021.3) 不均一系触媒研究のための機械学習と最適実験計画(2021.3) 不均一系触媒研究のための機械学習と最適実験計画
(2021.3) 不均一系触媒研究のための機械学習と最適実験計画
 
머신러닝 해외 취업 준비: 닳고 닳은 이력서와 고통스러웠던 면접을 돌아보며 SNU 2018
머신러닝 해외 취업 준비: 닳고 닳은 이력서와 고통스러웠던 면접을 돌아보며 SNU 2018머신러닝 해외 취업 준비: 닳고 닳은 이력서와 고통스러웠던 면접을 돌아보며 SNU 2018
머신러닝 해외 취업 준비: 닳고 닳은 이력서와 고통스러웠던 면접을 돌아보며 SNU 2018
 
Greed is Good: 劣モジュラ関数最大化とその発展
Greed is Good: 劣モジュラ関数最大化とその発展Greed is Good: 劣モジュラ関数最大化とその発展
Greed is Good: 劣モジュラ関数最大化とその発展
 
인공지능추천시스템 airs개발기_모델링과시스템
인공지능추천시스템 airs개발기_모델링과시스템인공지능추천시스템 airs개발기_모델링과시스템
인공지능추천시스템 airs개발기_모델링과시스템
 
感覚運動随伴性、予測符号化、そして自由エネルギー原理 (Sensory-Motor Contingency, Predictive Coding and ...
感覚運動随伴性、予測符号化、そして自由エネルギー原理 (Sensory-Motor Contingency, Predictive Coding and ...感覚運動随伴性、予測符号化、そして自由エネルギー原理 (Sensory-Motor Contingency, Predictive Coding and ...
感覚運動随伴性、予測符号化、そして自由エネルギー原理 (Sensory-Motor Contingency, Predictive Coding and ...
 

Similaire à [223]기계독해 QA: 검색인가, NLP인가?

Sue Bell AAA 2016
Sue Bell AAA 2016Sue Bell AAA 2016
Sue Bell AAA 2016Ray Poynter
 
Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Mach...
Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Mach...Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Mach...
Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Mach...Antonio Toral
 
Natural Language Processing.pptx
Natural Language Processing.pptxNatural Language Processing.pptx
Natural Language Processing.pptxPriyadharshiniG41
 
Natural Language Processing.pptx
Natural Language Processing.pptxNatural Language Processing.pptx
Natural Language Processing.pptxPriyadharshiniG41
 
What an old bird says 2010
What an old bird says 2010What an old bird says 2010
What an old bird says 2010Zhibo Xiao
 
Introduction to NLP.pptx
Introduction to NLP.pptxIntroduction to NLP.pptx
Introduction to NLP.pptxbuivantan_uneti
 
What’s wrong with research papers - and (how) can we fix it?
What’s wrong with research papers -  and (how) can we fix it?What’s wrong with research papers -  and (how) can we fix it?
What’s wrong with research papers - and (how) can we fix it?Anita de Waard
 
Tips on Transcribing Qualitative Interviews
Tips on Transcribing Qualitative InterviewsTips on Transcribing Qualitative Interviews
Tips on Transcribing Qualitative InterviewsCelia Emmelhainz
 
Putting the science in computer science
Putting the science in computer sciencePutting the science in computer science
Putting the science in computer scienceFelienne Hermans
 
Data Designed for Discovery
Data Designed for DiscoveryData Designed for Discovery
Data Designed for DiscoveryOCLC
 
Live Usability Lab: See One, Do One & Take One Home
Live Usability Lab: See One, Do One & Take One HomeLive Usability Lab: See One, Do One & Take One Home
Live Usability Lab: See One, Do One & Take One HomeStephanie Brown
 
Text as Data: processing the Hebrew Bible
Text as Data: processing the Hebrew BibleText as Data: processing the Hebrew Bible
Text as Data: processing the Hebrew BibleDirk Roorda
 
[KDD 2018 tutorial] End to-end goal-oriented question answering systems
[KDD 2018 tutorial] End to-end goal-oriented question answering systems[KDD 2018 tutorial] End to-end goal-oriented question answering systems
[KDD 2018 tutorial] End to-end goal-oriented question answering systemsQi He
 
Question Answering - Application and Challenges
Question Answering - Application and ChallengesQuestion Answering - Application and Challenges
Question Answering - Application and ChallengesJens Lehmann
 
2nd Spinoza workshop: Looking at the Long Tail - introductory slides
2nd Spinoza workshop: Looking at the Long Tail - introductory slides2nd Spinoza workshop: Looking at the Long Tail - introductory slides
2nd Spinoza workshop: Looking at the Long Tail - introductory slidesFilip Ilievski
 
Broad Twitter Corpus: A Diverse Named Entity Recognition Resource
Broad Twitter Corpus: A Diverse Named Entity Recognition ResourceBroad Twitter Corpus: A Diverse Named Entity Recognition Resource
Broad Twitter Corpus: A Diverse Named Entity Recognition ResourceLeon Derczynski
 
PyCon APAC 2016 Keynote
PyCon APAC 2016 KeynotePyCon APAC 2016 Keynote
PyCon APAC 2016 KeynoteWes McKinney
 

Similaire à [223]기계독해 QA: 검색인가, NLP인가? (20)

Sue Bell AAA 2016
Sue Bell AAA 2016Sue Bell AAA 2016
Sue Bell AAA 2016
 
Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Mach...
Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Mach...Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Mach...
Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Mach...
 
Natural Language Processing.pptx
Natural Language Processing.pptxNatural Language Processing.pptx
Natural Language Processing.pptx
 
Natural Language Processing.pptx
Natural Language Processing.pptxNatural Language Processing.pptx
Natural Language Processing.pptx
 
What an old bird says 2010
What an old bird says 2010What an old bird says 2010
What an old bird says 2010
 
Introduction to NLP.pptx
Introduction to NLP.pptxIntroduction to NLP.pptx
Introduction to NLP.pptx
 
Machine Translation: The Neural Frontier
Machine Translation: The Neural FrontierMachine Translation: The Neural Frontier
Machine Translation: The Neural Frontier
 
Lo "AI-infused interfaces for reading AI preprints"
Lo "AI-infused interfaces for reading AI preprints"Lo "AI-infused interfaces for reading AI preprints"
Lo "AI-infused interfaces for reading AI preprints"
 
What’s wrong with research papers - and (how) can we fix it?
What’s wrong with research papers -  and (how) can we fix it?What’s wrong with research papers -  and (how) can we fix it?
What’s wrong with research papers - and (how) can we fix it?
 
Tips on Transcribing Qualitative Interviews
Tips on Transcribing Qualitative InterviewsTips on Transcribing Qualitative Interviews
Tips on Transcribing Qualitative Interviews
 
Putting the science in computer science
Putting the science in computer sciencePutting the science in computer science
Putting the science in computer science
 
Data Designed for Discovery
Data Designed for DiscoveryData Designed for Discovery
Data Designed for Discovery
 
Live Usability Lab: See One, Do One & Take One Home
Live Usability Lab: See One, Do One & Take One HomeLive Usability Lab: See One, Do One & Take One Home
Live Usability Lab: See One, Do One & Take One Home
 
Text as Data: processing the Hebrew Bible
Text as Data: processing the Hebrew BibleText as Data: processing the Hebrew Bible
Text as Data: processing the Hebrew Bible
 
1004-nlp.ppt
1004-nlp.ppt1004-nlp.ppt
1004-nlp.ppt
 
[KDD 2018 tutorial] End to-end goal-oriented question answering systems
[KDD 2018 tutorial] End to-end goal-oriented question answering systems[KDD 2018 tutorial] End to-end goal-oriented question answering systems
[KDD 2018 tutorial] End to-end goal-oriented question answering systems
 
Question Answering - Application and Challenges
Question Answering - Application and ChallengesQuestion Answering - Application and Challenges
Question Answering - Application and Challenges
 
2nd Spinoza workshop: Looking at the Long Tail - introductory slides
2nd Spinoza workshop: Looking at the Long Tail - introductory slides2nd Spinoza workshop: Looking at the Long Tail - introductory slides
2nd Spinoza workshop: Looking at the Long Tail - introductory slides
 
Broad Twitter Corpus: A Diverse Named Entity Recognition Resource
Broad Twitter Corpus: A Diverse Named Entity Recognition ResourceBroad Twitter Corpus: A Diverse Named Entity Recognition Resource
Broad Twitter Corpus: A Diverse Named Entity Recognition Resource
 
PyCon APAC 2016 Keynote
PyCon APAC 2016 KeynotePyCon APAC 2016 Keynote
PyCon APAC 2016 Keynote
 

Plus de NAVER D2

[211] 인공지능이 인공지능 챗봇을 만든다
[211] 인공지능이 인공지능 챗봇을 만든다[211] 인공지능이 인공지능 챗봇을 만든다
[211] 인공지능이 인공지능 챗봇을 만든다NAVER D2
 
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...NAVER D2
 
[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기NAVER D2
 
[245]Papago Internals: 모델분석과 응용기술 개발
[245]Papago Internals: 모델분석과 응용기술 개발[245]Papago Internals: 모델분석과 응용기술 개발
[245]Papago Internals: 모델분석과 응용기술 개발NAVER D2
 
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈NAVER D2
 
[235]Wikipedia-scale Q&A
[235]Wikipedia-scale Q&A[235]Wikipedia-scale Q&A
[235]Wikipedia-scale Q&ANAVER D2
 
[244]로봇이 현실 세계에 대해 학습하도록 만들기
[244]로봇이 현실 세계에 대해 학습하도록 만들기[244]로봇이 현실 세계에 대해 학습하도록 만들기
[244]로봇이 현실 세계에 대해 학습하도록 만들기NAVER D2
 
[243] Deep Learning to help student’s Deep Learning
[243] Deep Learning to help student’s Deep Learning[243] Deep Learning to help student’s Deep Learning
[243] Deep Learning to help student’s Deep LearningNAVER D2
 
[234]Fast & Accurate Data Annotation Pipeline for AI applications
[234]Fast & Accurate Data Annotation Pipeline for AI applications[234]Fast & Accurate Data Annotation Pipeline for AI applications
[234]Fast & Accurate Data Annotation Pipeline for AI applicationsNAVER D2
 
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load BalancingOld version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load BalancingNAVER D2
 
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지NAVER D2
 
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기NAVER D2
 
[224]네이버 검색과 개인화
[224]네이버 검색과 개인화[224]네이버 검색과 개인화
[224]네이버 검색과 개인화NAVER D2
 
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)NAVER D2
 
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기NAVER D2
 
[213] Fashion Visual Search
[213] Fashion Visual Search[213] Fashion Visual Search
[213] Fashion Visual SearchNAVER D2
 
[232] TensorRT를 활용한 딥러닝 Inference 최적화
[232] TensorRT를 활용한 딥러닝 Inference 최적화[232] TensorRT를 활용한 딥러닝 Inference 최적화
[232] TensorRT를 활용한 딥러닝 Inference 최적화NAVER D2
 
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지NAVER D2
 
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터NAVER D2
 
[231] Clova 화자인식
[231] Clova 화자인식[231] Clova 화자인식
[231] Clova 화자인식NAVER D2
 

[223] Machine Reading Comprehension QA: Search or NLP?

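  • 1. Machine Reading Comprehension QA: Search or NLP? Name: Minjoon Seo (서민준). Affiliation: NAVER / Clova ML
  • 2. QA = Question Answering
  • 3. “You're in big trouble. Top Air (탑항공) went out of business!” (*this actually happened) “What? Really? Why did it close?” “No idea.” “Can I get my ticket refunded?” “Help me, NAVER!” “Well, you're no help.” “They aren't even answering the phone…”
  • 4. CONTENTS 1. QA that “finds” with search – 10 min 2. QA that “reads” with NLP – 10 min 3. Where search and NLP meet – 20 min 4. Q&A – 5 min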
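  • 7. Query: “Top Air closure” • Relevance of the content and title • Documents read by users who ran similar searches • Trustworthiness of the website • Popularity of the document • Information about the searcher • … All of these are weighed together!
  • 8. Query: “Top Air closure” • Relevance of the content and title • Documents read by users who ran similar searches • Trustworthiness of the website • Popularity of the document • Information about the searcher • …
  • 9. Word Matching: retrieves the documents that contain the searched words • Ctrl-F • Quite effective when applied to titles only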
  • 11. TF-IDF: Term Frequency – Inverse Document Frequency • Gives higher weight to important keywords (uncommon words). • Essential once questions get longer and the search covers document bodies.
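The weighting idea on this slide can be sketched in a few lines. This is a minimal illustration, assuming whitespace tokenization and the plain `tf * log(N / df)` form; the function name `tf_idf_scores` and the toy documents are hypothetical and not from the talk.

```python
import math
from collections import Counter


def tf_idf_scores(query, docs):
    """Score each document against the query with a plain TF-IDF sum."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term] > 0:
                # Rare terms (small df) receive a larger IDF weight.
                score += tf[term] * math.log(n_docs / df[term])
        scores.append(score)
    return scores


# Toy corpus (made up for illustration).
docs = [
    "top air closure confirmed",
    "the airline news of the day",
    "the closure of the local bakery",
]
scores = tf_idf_scores("top air closure", docs)
```

The first document matches all three query terms, and the rare terms "top" and "air" contribute more than the more common "closure", so it should rank first.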
  • 12. Okapi BM25: “Best Matching” (Robertson et al., 1970s) • An “upgraded version” of TF-IDF • Changes the TF part. “Why all the adding and subtracting?”
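To make "changes the TF part" concrete, here is a sketch of one widely used BM25 formulation. The defaults `k1=1.5` and `b=0.75` and the non-negative IDF variant are conventional choices assumed here, not values given in the talk; `bm25_scores` is a hypothetical name.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score documents with a common BM25 variant (assumed parameters)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Length normalization: long documents are penalized when b > 0.
        norm = k1 * (1 - b + b * len(tokens) / avg_len)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # The TF part saturates: repeating a term has diminishing returns.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


# Toy corpus (made up for illustration).
example = bm25_scores(
    "closure", ["closure", "closure closure closure closure", "news"]
)
```

Unlike raw TF-IDF, a document that repeats the query term four times does not score four times higher; the term-frequency contribution is bounded by `k1 + 1`.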
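  • 13. LSA: Latent Semantic Analysis (Deerwester et al., 1988) • Bag of words (sparse) → dense vector via SVD • Attaches abstract “tags” to each word • The abstract “tags” make it possible to compare different words with one another. • “폐업” (going out of business) ~ “망하다” (to go under) ~ “몰락” (collapse)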
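  • 14. “Can't you just pinpoint the thing I actually want to know?” “I can find you the documents, but…”
  • 16. Limits of search: it does not “read” sentences • It can pick up word-level (lexical) information, but… • It cannot grasp syntactic or semantic context. • It is hard to pinpoint an answer at a granularity finer than a document or paragraph.
  • 19. “Why did it close?” “I read up on it: it closed because of a deteriorating business environment at home and abroad.” “Smart!”
  • 20. The first step of machine learning: defining the inputs and outputs
  • 21. Inputs: “Why did Top Air close?” Output: “A deteriorating business environment at home and abroad”. “Let's define the inputs and outputs first!”
  • 24. Problems with generative models: service-level quality is out of reach. • Far too many off-target answers. • Data quality control is hard (e.g., MS MARCO [1]). Evaluation is hard too. • There is BLEU, but… [1] Nguyen et al. MS MARCO: A human generated machine reading comprehension dataset. 2016.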
  • 27. 7 Milestones in Extractive QA 1. Sentence-level QA (May 2015) 2. Phrase-level QA (May 2016) 3. Cross-attention (Nov 2016) 4. Self-attention (Mar 2017) 5. Transfer learning (Nov 2017) 6. Super-human level (Jan 2018) 7. What’s next? (Nov 2018) Task definition Models
  • 29. 1. Sentence-level QA Second Epistle to the Corinthians The Second Epistle to the Corinthians, often referred to as Second Corinthians (and written as 2 Corinthians), is the eighth book of the New Testament of the Bible. Paul the Apostle and “Timothy our brother” wrote this epistle to “the church of God which is at Corinth, with all the saints which are in all Achaia”. Who wrote second Corinthians? Yang et al. WikiQA: A Challenge Dataset for Open-domain Question Answering. EMNLP 2015.
  • 32. Can't we show just the answer itself?
  • 33. 2. Phrase-level QA Second Epistle to the Corinthians The Second Epistle to the Corinthians, often referred to as Second Corinthians (and written as 2 Corinthians), is the eighth book of the New Testament of the Bible. Paul the Apostle and “Timothy our brother” wrote this epistle to “the church of God which is at Corinth, with all the saints which are in all Achaia”. Who wrote second Corinthians? Rajpurkar et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text. EMNLP 2016
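Phrase-level extraction is commonly framed as choosing a start and an end token. The sketch below assumes a model has already produced per-token start and end scores; the scores here are made up for illustration, and `best_span` is a hypothetical helper rather than anything from the talk.

```python
def best_span(start_scores, end_scores, max_len=10):
    """Return (start, end) maximizing start_scores[i] + end_scores[j], i <= j."""
    best, best_score = (0, 0), float("-inf")
    for i, s in enumerate(start_scores):
        # Only consider spans of at most max_len tokens.
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best


tokens = "Paul the Apostle and Timothy our brother wrote this epistle".split()
# Hypothetical model outputs: one start score and one end score per token.
start = [2.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
end = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]

i, j = best_span(start, end)
answer = " ".join(tokens[i : j + 1])
```

With these scores the selected span runs from "Paul" to "brother", which is the kind of phrase a sentence-level system could not isolate.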
  • 37. 7 Milestones in Extractive QA 1. Sentence-level QA (May 2015) 2. Phrase-level QA (May 2016) 3. Cross-attention (Nov 2016) 4. Self-attention (Mar 2017) 5. Transfer learning (Nov 2017) 6. Super-human level (Jan 2018) 7. What’s next? (Nov 2018)
  • 38. 3. Cross-attention Second Epistle to the Corinthians The Second Epistle to the Corinthians, often referred to as Second Corinthians (and written as 2 Corinthians), is the eighth book of the New Testament of the Bible. Paul the Apostle and “Timothy our brother” wrote this epistle to “the church of God which is at Corinth, with all the saints which are in all Achaia”. Who wrote second Corinthians?
  • 39. Consult the question while reading the document; consult the document while reading the question
  • 40. 3. Cross-attention Second Epistle to the Corinthians The Second Epistle to the Corinthians, often referred to as Second Corinthians (and written as 2 Corinthians), is the eighth book of the New Testament of the Bible. Paul the Apostle and “Timothy our brother” wrote this epistle to “the church of God which is at Corinth, with all the saints which are in all Achaia”. Who wrote second Corinthians? Seo et al. Bi-directional attention flow for machine comprehension. ICLR 2017.
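The two directions of cross-attention can be illustrated with plain dot-product attention. This is a generic sketch, not the architecture of the cited paper; the two-dimensional vectors are toy values and `attend` is a hypothetical helper.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(queries, keys):
    """For each query vector, return the attention-weighted average of keys."""
    dim = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) for k in keys])
        out.append([sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)])
    return out


# Toy word vectors (made up): three document words, two question words.
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question_vecs = [[4.0, 0.0], [0.0, 4.0]]

# Reading the document while consulting the question, and the reverse.
doc_reads_question = attend(doc_vecs, question_vecs)
question_reads_doc = attend(question_vecs, doc_vecs)
```

Calling `attend(doc_vecs, doc_vecs)` gives the self-attention case, where each document word consults the other parts of the same document.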
  • 42. 4. Self-attention Second Epistle to the Corinthians The Second Epistle to the Corinthians, often referred to as Second Corinthians (and written as 2 Corinthians), is the eighth book of the New Testament of the Bible. Paul the Apostle and “Timothy our brother” wrote this epistle to “the church of God which is at Corinth, with all the saints which are in all Achaia”. Who wrote second Corinthians? Clark & Gardner. Simple and effective multi-paragraph reading comprehension. 2017
  • 43. Consult other parts of the document while reading the document
  • 47. 5. Transfer learning • 3 billion words, unlabeled • 2 million words, labeled • Language model. Peters et al. Deep contextualized word representations. NAACL 2018.
  • 51. 6. Super-human level • Ensemble • NLP tools (POS, parser, etc.) • Data Augmentation • A lot of layers • “Hi, Nice to meet you!” → MT → “안녕, 반가워!” → MT → “Hello, great to see you!” • This… • thing… • that… • thing… Yu et al. QANet: Combining local convolution with global self-attention for reading comprehension. ICLR 2018.
  • 55. 7 Milestones in Extractive QA 1. Sentence-level QA (May 2015) 2. Phrase-level QA (May 2016) 3. Cross-attention (Nov 2016) 4. Self-attention (Mar 2017) 5. Transfer learning (Nov 2017) 6. Super-human level (Jan 2018) 7. What’s next? (Nov 2018)
  • 56. QuAC (Conversational) Choi et al., EMNLP 2018 HotpotQA (Reasoning) Yang et al., EMNLP 2018
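  • 57. “Accuracy is great, but how long does it take?” “Hmm… with a GPU, about 0.1 seconds to read one document?”
  • 59. But there is no escaping the shackles of linear time. Microsoft Research Asia. R-Net: machine reading comprehension with self matching networks. 2017.
  • 61. “Accuracy is great, but how long does it take?” “Hmm… with a GPU, about 0.1 seconds to read one document?” “So… about 6 days.”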
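The "about 6 days" figure follows from linear-time reading. The transcript gives only the 0.1-second and 6-day numbers, so the corpus size below is an assumption: 5 million documents, roughly the scale of English Wikipedia.

```python
SECONDS_PER_DOC = 0.1    # from the slide: about 0.1 s per document on a GPU
NUM_DOCS = 5_000_000     # assumption: roughly Wikipedia-scale corpus
SECONDS_PER_DAY = 86_400

# A linear-time reader has to read every document for every question.
total_seconds = SECONDS_PER_DOC * NUM_DOCS
total_days = total_seconds / SECONDS_PER_DAY

print(f"{total_days:.1f} days per question")
```

Under that assumption a single question costs a little under six days of GPU time, which is why reading everything is not an option.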
  • 63. “A week for a single question? !#$@*%(@*@” “Oh, then I'll use search to find the documents and read only those!”
  • 65. 1961 Chen et al. Reading Wikipedia to Answer Open-Domain Questions. ACL 2017.
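The retrieve-then-read idea can be sketched as a two-stage pipeline. Everything here is a toy stand-in: the retriever ranks by word overlap, the "reader" just picks the best-matching sentence, and the function names and documents are hypothetical rather than taken from the cited system.

```python
import re


def tokens(text):
    return set(re.findall(r"\w+", text.lower()))


def overlap(a, b):
    return len(tokens(a) & tokens(b))


def retrieve(question, docs, k=1):
    """Stage 1 (search): keep only the k documents sharing the most words."""
    return sorted(docs, key=lambda d: overlap(question, d), reverse=True)[:k]


def read(question, doc):
    """Stage 2 (stand-in reader): pick the best-matching sentence."""
    sentences = [s.strip() for s in doc.split(".") if s.strip()]
    return max(sentences, key=lambda s: overlap(question, s))


def answer(question, docs, k=1):
    candidates = [read(question, d) for d in retrieve(question, docs, k)]
    return max(candidates, key=lambda s: overlap(question, s))


# Toy corpus (made up for illustration).
docs = [
    "Water transforms into steam within a boiler. Steam turbines drive generators",
    "The Denver Broncos represented the AFC at Super Bowl 50. The game was played in California",
]
result = answer("Which NFL team represented the AFC at Super Bowl 50?", docs)
```

The expensive reader runs only on the `k` retrieved documents, so its cost no longer grows with the size of the corpus. The price is that a retrieval miss cannot be recovered by the reader.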
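  • 66. Wait, but what if the search engine returns the wrong result? “Is it true that Top Air went out of business?”
  • 72. Solution 2: Find and read at the same time?
  • 73. How does search find documents so quickly?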
  • 77. [Figure: the question “When did the Korean War break out?” and every indexed item are encoded as vectors (e.g., [0.3, 0.5, …]). Locality-Sensitive Hashing: maximize collisions between similar items. Lookup via MIPS.]
  • 78. Locality-Sensitive Hashing (LSH) • Symmetric: distance functions (Nearest Neighbor Search) • L2 • L1 • Cosine • Asymmetric: inner product (MIPS) • Dot product
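For the cosine case listed on this slide, one standard LSH family is random hyperplanes: each bit of the hash records which side of a random hyperplane a vector falls on. The sketch below is a minimal version of that scheme; the function names, bit count, and toy vectors are assumptions.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals from a Gaussian."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, hyperplanes):
    """One bit per hyperplane: which side of it the vector lies on."""
    return tuple(
        1 if sum(v * h for v, h in zip(vec, plane)) >= 0 else 0
        for plane in hyperplanes
    )


def build_index(vectors, hyperplanes):
    """Group vector ids into buckets by signature."""
    buckets = {}
    for idx, vec in enumerate(vectors):
        buckets.setdefault(signature(vec, hyperplanes), []).append(idx)
    return buckets


planes = make_hyperplanes(dim=4, n_bits=8, seed=0)
vectors = [[0.3, 0.5, 0.1, 0.9], [0.6, 1.0, 0.2, 1.8], [-0.7, 0.1, -0.4, 0.2]]
index = build_index(vectors, planes)

# At query time only the query's own bucket is scanned, not the whole corpus.
candidates = index.get(signature([0.3, 0.5, 0.1, 0.9], planes), [])
```

Vectors pointing in the same direction always share a signature, and vectors at a small angle share one with high probability, which is the "maximize collisions between similar items" property. Inner-product search (MIPS) needs the asymmetric variant and is not covered by this sketch.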
  • 79. $O(d\,n^{\rho} \log n)$ vs. $O(dn)$, where $\rho$ = approximation factor (< 1). Shrivastava and Li. Asymmetric LSH (ALSH) for sublinear time maximum inner product search (MIPS). NIPS 2014.
  • 81. Documents → phrases?
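  • 82. [Figure: each phrase in the document (“Super Bowl 50”, “American football game”, “National Football League”, “Denver Broncos”, …) is indexed as its own vector; the question “Which NFL team represented the AFC at Super Bowl 50?” is encoded as a single query vector and matched against the phrase vectors via MIPS.]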
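  • 83. Comparison with the existing approach, in equations • Given a document $d$ and a query $q$: $\hat{a} = \operatorname{argmax}_{a} P_\theta(a \mid q; d)$ • Existing: $P_\theta(a \mid q; d) \propto \exp(F_\theta(a, q, d))$; $F$ has to be recomputed for every new question. • Proposed (decomposition): $P_\theta(a \mid q; d) \propto \exp(G_\theta(q) \cdot H_\theta(a, d))$; $H$ can be computed in advance and can also be indexed (hashed).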
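The practical consequence of the decomposition is that the phrase side is encoded once, offline, and each question costs one encoding plus an inner-product search. The sketch below uses a toy bag-of-words encoder as a stand-in for the neural encoders G and H; the encoder, the names, and the two context sentences are assumptions for illustration.

```python
import re


def words(text):
    return re.findall(r"\w+", text.lower())


def build_vocab(texts):
    vocab = {}
    for text in texts:
        for w in words(text):
            vocab.setdefault(w, len(vocab))
    return vocab


def embed(text, vocab):
    """Toy encoder: word counts over a fixed vocabulary."""
    vec = [0.0] * len(vocab)
    for w in words(text):
        if w in vocab:
            vec[vocab[w]] += 1.0
    return vec


# Each entry pairs a phrase a with its context taken from document d.
phrases = [
    ("steam", "water transforms into steam within a boiler"),
    ("Denver Broncos", "the Denver Broncos represented the AFC at Super Bowl 50"),
]
vocab = build_vocab(ctx for _, ctx in phrases)

# H(a, d): computed once, before any question arrives.
index = [(a, embed(ctx, vocab)) for a, ctx in phrases]


def answer(question):
    g = embed(question, vocab)  # G(q): the only per-question encoding
    best = max(index, key=lambda item: sum(x * y for x, y in zip(g, item[1])))
    return best[0]
```

Both `answer("What does water turn into when heated?")` and `answer("Which NFL team represented the AFC at Super Bowl 50?")` reuse the same `index`. In a real system the linear scan inside `max` would be replaced by an approximate MIPS lookup.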
  • 86. A year of PIQA trial and error, told in 5 minutes (since June of last year) 1. The baseline was not that hard 2. Making use of duality 3. Multimodality… 4. Sparsity: up 9% in one go! 5. Scalability: possible, but far from easy…
  • 87. Baseline 1: LSTM. [Figure: the document (“… water transforms into steam within a boiler …”) and the question (“What does water turn into when heated?”) are each encoded by a Bi-LSTM; a weighted sum of the hidden states gives a single vector, which is matched by nearest-neighbor search.]
  • 88. Baseline 2: Self-Attention. [Figure: same document and question. The phrase “steam” is represented by a type part (“steam”) and a clue part (“water” + “transform” + “boiler”); the question is represented by a type part (“What”) and a clue part (“water” + “turn” + “heated”); the two are compared by dot product.]
  • 89. SQuAD (F1 / EM, %): First baseline 51.0 / 40.0 • SOTA 91.2 / 85.4. PI-SQuAD (F1 / EM, %): LSTM 57.2 / 46.8 • LSTM+SA 59.8 / 49.0. Seo et al. Phrase-indexed question answering: a new challenge toward scalable document comprehension. EMNLP 2018.
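The two metrics in this table can be sketched as follows, in the style of the usual SQuAD evaluation (lowercase, strip punctuation and articles, then compare). This is an illustrative reimplementation, not the official script, and the example strings are made up.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    return float(normalize(prediction) == normalize(gold))


def f1(prediction, gold):
    """Token-level F1: harmonic mean of precision and recall."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    n_same = sum(common.values())
    if n_same == 0:
        return 0.0
    precision = n_same / len(pred_tokens)
    recall = n_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


em_full = exact_match("The Denver Broncos", "Denver Broncos")
em_partial = exact_match("Paul the Apostle", "Paul the Apostle and Timothy")
f1_partial = f1("Paul the Apostle", "Paul the Apostle and Timothy")
```

A partially correct span gets no credit under EM but partial credit under F1, so for any system F1 is at least as high as EM.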
  • 91. One-to-many relation: “Barack Obama was 44th president from 2009 to 2017.” $(a, d) \to H \to H(a, d)$. Q1: Who was president in 2009? Q2: Who was the 44th president?
  • 93. Duality: Question Reconstruction. [Figure: the question “What does water turn into when heated?” → Bi-LSTM → weighted sum of hidden states → question vector → nearest neighbor. Generation: seq2seq decoder (without attention).]
  • 95. SQuAD (F1 / EM, %): First baseline 51.0 / 40.0 • SOTA 91.2 / 85.4. PI-SQuAD (F1 / EM, %): LSTM 57.2 / 46.8 • LSTM+SA 59.8 / 49.0 • LSTM+SA+Dual 63.2 / 52.0
  • 97. Multimodality: “Barack Obama was 44th president from 2009 to 2017.” Who was president in 2009? Who was the 44th president?
  • 99. “Barack Obama was 44th president from 2009 to 2017.” Q1: Who was president in 2009? Q2: Who was the 44th president? So we just use a latent variable? $(a, d) \to H \to H(a, z_1, d),\ H(a, z_2, d)$
  • 100. So, what we tried (for a whole year!) 1. Multiple identical models (ensemble) 2. Orthogonality regularization 3. Sequential decoding 4. Latent variable from Gaussian distribution 5. Latent variable from surrounding words
  • 101. So, what we tried (for a whole year!) 1. Multiple identical models (ensemble) 2. Orthogonality regularization 3. Sequential decoding 4. Latent variable from Gaussian distribution 5. Latent variable from surrounding words. Raises accuracy a little, but needs more than 30× the storage. No good…
  • 103. SQuAD (F1 / EM, %): First baseline 51.0 / 40.0 • SOTA 91.2 / 85.4. PI-SQuAD (F1 / EM, %): LSTM 57.2 / 46.8 • LSTM+SA 59.8 / 49.0 • LSTM+SA+Dual 63.2 / 52.0 • LSTM+SA+Multi-mode 66.5 / 55.1
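  • 105. [Figure: the phrase “steam” has a type part (“steam”) and a clue part (“water” + “transform” + “boiler”); it is represented by a dense vector together with a sparse vector over words (steam, boiler, water, transform).]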
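One simple way to use a dense and a sparse representation together is to add their inner products, which is equivalent to taking the inner product of the concatenated vectors. The transcript does not say how the talk combines them, so the scheme below, the `sparse_weight` parameter, and all values are assumptions for illustration.

```python
def dense_dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sparse_dot(u, v):
    """Inner product of two sparse vectors stored as {word: weight} dicts."""
    return sum(weight * v.get(word, 0.0) for word, weight in u.items())


def score(q_dense, q_sparse, p_dense, p_sparse, sparse_weight=1.0):
    # Equivalent to one inner product over [dense ; sparse_weight * sparse].
    return dense_dot(q_dense, p_dense) + sparse_weight * sparse_dot(q_sparse, p_sparse)


# Toy question: "What does water turn into when heated?"
q_dense = [1.0, 0.0]
q_sparse = {"water": 1.0, "heated": 1.0}

# Two toy phrases with identical dense vectors but different sparse clues.
p1_dense, p1_sparse = [0.5, 0.5], {"water": 1.0, "transform": 1.0, "boiler": 1.0}
p2_dense, p2_sparse = [0.5, 0.5], {"football": 1.0}

s1 = score(q_dense, q_sparse, p1_dense, p1_sparse)
s2 = score(q_dense, q_sparse, p2_dense, p2_sparse)
```

The dense parts cannot tell the two phrases apart here; the exact lexical match on "water" in the sparse part breaks the tie. Because the combined score is still a single inner product, it stays compatible with MIPS-style indexing.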
  • 107. SQuAD (F1 / EM, %): First baseline 51.0 / 40.0 • SOTA 91.2 / 85.4. PI-SQuAD (F1 / EM, %): LSTM 57.2 / 46.8 • LSTM+SA 59.8 / 49.0 • LSTM+SA+Dual 63.2 / 52.0 • LSTM+SA+Multi-mode 66.5 / 55.1 • LSTM+SA+Sparse+ELMo 69.3 / 58.7 (to be on arXiv soon)
  • 108. Scalability consideration 1 • SQuAD looks at a single document. → It is more of a benchmark. • It is not a real QA scenario. • Is there any guarantee that end-to-end beats a pipeline? More experiments are needed!
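  • 109. Scalability consideration 2 • Words in English Wikipedia: 3 billion • Phrases per word: 7 on average • Vector dimension per phrase: 1024 • Float32: 4 bytes → about 90 TB (21 billion phrases)
  • 110. Scalability consideration 2 • Words in English Wikipedia: 3 billion • Phrases per word: 7 on average • Vector dimension per phrase: 1024 • Float32: 4 bytes → about 90 TB (21 billion phrases). Can be optimized.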
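The storage estimate can be reproduced directly from the four numbers on the slide. The only assumption in this sketch is the unit: decimal terabytes (10^12 bytes).

```python
WORDS = 3_000_000_000    # words in English Wikipedia (from the slide)
PHRASES_PER_WORD = 7     # average phrases per word (from the slide)
DIM = 1024               # vector dimension per phrase (from the slide)
BYTES_PER_FLOAT = 4      # float32

num_phrases = WORDS * PHRASES_PER_WORD
total_bytes = num_phrases * DIM * BYTES_PER_FLOAT
total_tb = total_bytes / 1e12  # assumption: decimal terabytes

print(f"{num_phrases:,} phrases, {total_tb:.1f} TB")
```

That gives 21 billion phrases and about 86 TB, which the slide rounds to roughly 90 TB. Halving the dimension or storing vectors in 8 bits instead of 32 would each cut this figure proportionally.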
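  • 112. [Figure: each phrase in the document (“Super Bowl 50”, “American football game”, “National Football League”, “Denver Broncos”, …) is indexed as its own vector; the question “Which NFL team represented the AFC at Super Bowl 50?” is encoded as a single query vector and matched against the phrase vectors via MIPS.]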
  • 113. “According to the American Library Association, this makes …” / “… tasked with drafting a European Charter of Human Rights, …”: proper nouns of a similar type (lexical)
  • 114. “The LM engines were successfully test-fired and restarted, …” / “Steam turbines were extensively applied …”: similar semantic and syntactic structure
  • 115. “… primarily accomplished through the ductile stretching and thinning.” / “… directly derived from the homogeneity or symmetry of space …”: similar syntactic structure
  • 116. “So what's the conclusion?” “A beautiful harmony of search and NLP. There is still a long way to go, but let's research it and think it through together!” “I need something that works right now.” “I'll do both (sob).”
  • 118. tl;dr: Representing the world knowledge in an elegant way
  • 120. Q & A
  • 122. We are Hiring! Domains • Speech Recognition • Speech Synthesis • Computer Vision • Natural Language • NSML / AutoML • Finance AI • App/Web Services Positions • Research Scientist • Research Engineer • SW Engineer • Android / iOS Engineer • Backend Engineer • Data Engineer • UI/UX Engineer • Internship Member • Global Residency clova-jobs@navercorp.com