SlideShare une entreprise Scribd logo
1  sur  62
Télécharger pour lire hors ligne
Approximate nearest
neighbors & vector
models
I’m Erik
• @fulhack
• Author of Annoy, Luigi
• Currently CTO of Better
• Previously 5 years at Spotify
What’s nearest
neighbor(s)
• Let’s say you have a bunch of points
Grab a bunch of points
5 nearest neighbors
20 nearest neighbors
100 nearest neighbors
…But what’s the point?
• vector models are everywhere
• lots of applications (language processing,
recommender systems, computer vision)
MNIST example
• 28x28 = 784-dimensional dataset
• Define distance in terms of pixels:
MNIST neighbors
…Much better approach
1. Start with high dimensional data
2. Run dimensionality reduction to 10-1000 dims
3. Do stuff in a small dimensional space
Deep learning for food
• Deep model trained on a GPU on 6M random pics
downloaded from Yelp
156x156x32
154x154x32
152x152x32
76x76x64
74x74x64
72x72x64
36x36x128
34x34x128
32x32x128
16x16x256
14x14x256
12x12x256
6x6x512
4x4x512
2x2x512
2048
2048
128
1244
3x3 convolutions
2x2 maxpool
fully
connected
with dropout
bottleneck
layer
Distance in smaller space
1. Run image through the network
2. Use the 128-dimensional bottleneck layer as an
item vector
3. Use cosine distance in the reduced space
Nearest food pics
Vector methods for text
• TF-IDF (old) – no dimensionality reduction
• Latent Semantic Analysis (1988)
• Probabilistic Latent Semantic Analysis (2000)
• Semantic Hashing (2007)
• word2vec (2013), RNN, LSTM, …
Represent documents and/or
words as f-dimensional vector
Latentfactor1
Latent factor 2
banana
apple
boat
Vector methods for
collaborative filtering
• Supervised methods: See everything from the
Netflix Prize
• Unsupervised: Use NLP methods
CF vectors – examplesIPMF item item:
P(i ! j) = exp(bT
j bi)/Zi =
exp(bT
j bi)
P
k exp(bT
k bi)
VECTORS:
pui = aT
u bi
simij = cos(bi, bj) =
bT
i bj
|bi||bj|
O(f)
i j simi,j
2pac 2pac 1.0
2pac Notorious B.I.G. 0.91
2pac Dr. Dre 0.87
2pac Florence + the Machine 0.26
Florence + the Machine Lana Del Rey 0.81
IPMF item item MDS:
P(i ! j) = exp(bT
j bi)/Zi =
exp( |bj bi|
2
)
P
k exp( |bk bi|
2
)
simij = |bj bi|
2
(u, i, count)
@L
Geospatial indexing
• Ping the world: https://github.com/erikbern/ping
• k-NN regression using Annoy
Nearest neighbors the
brute force way
• we can always do an exhaustive search to find the
nearest neighbors
• imagine MySQL doing a linear scan for every
query…
Using word2vec’s brute
force search
$ time echo -e "chinese rivernEXITn" | ./distance GoogleNews-
vectors-negative300.bin	
!
Qiantang_River		 0.597229	
Yangtse		 0.587990	
Yangtze_River		 0.576738	
lake		 0.567611	
rivers		 0.567264	
creek		 0.567135	
Mekong_river		 0.550916	
Xiangjiang_River		 0.550451	
Beas_river		 0.549198	
Minjiang_River		 0.548721	
real	2m34.346s	
user	1m36.235s	
sys	0m16.362s
Introducing Annoy
• https://github.com/spotify/annoy
• mmap-based ANN library
• Written in C++, with Python and R bindings
• 585 stars on Github
Using Annoy’s search
$ time echo -e "chinese rivernEXITn" | python
nearest_neighbors.py ~/tmp/word2vec/GoogleNews-vectors-
negative300.bin 100000	
Yangtse	 0.907756	
Yangtze_River	 0.920067	
rivers	 0.930308	
creek	 0.930447	
Mekong_river	 0.947718	
Huangpu_River	 0.951850	
Ganges	 0.959261	
Thu_Bon	 0.960545	
Yangtze	 0.966199	
Yangtze_river	 0.978978	
real	0m0.470s	
user	0m0.285s	
sys	0m0.162s
Using Annoy’s search
$ time echo -e "chinese rivernEXITn" | python
nearest_neighbors.py ~/tmp/word2vec/GoogleNews-vectors-
negative300.bin 1000000	
Qiantang_River	 0.897519	
Yangtse	 0.907756	
Yangtze_River	 0.920067	
lake	 0.929934	
rivers	 0.930308	
creek	 0.930447	
Mekong_river	 0.947718	
Xiangjiang_River	 0.948208	
Beas_river	 0.949528	
Minjiang_River	 0.950031	
real	0m2.013s	
user	0m1.386s	
sys	0m0.614s
(performance)
1. Building an Annoy
index
Start with the point set
Split it in two halves
Split again
Again…
…more iterations later
Side note: making trees
small
• Split until K items in each leaf (K~100)
• Takes (n/K) memory instead of n
Binary tree
2. Searching
Nearest neighbors
Searching the tree
Problemo
• The point that’s the closest isn’t necessarily in the
same leaf of the binary tree
• Two points that are really close may end up on
different sides of a split
• Solution: go to both sides of a split if it’s close
Trick 1: Priority queue
• Traverse the tree using a priority queue
• sort by min(margin) for the path from the root
Trick 2: many trees
• Construct trees randomly many times
• Use the same priority queue to search all of them at
the same time
heap + forest = best
• Since we use a priority queue, we will dive down
the best splits with the biggest distance
• More trees always helps!
• Only constraint is more trees require more RAM
Annoy query structure
1. Use priority queue to search all trees until we’ve
found k items
2. Take union and remove duplicates (a lot)
3. Compute distance for remaining items
4. Return the nearest n items
Find candidates
Take union of all leaves
Compute distances
Return nearest neighbors
“Curse of dimensionality”
Are we screwed?
• Would be nice if the data is has a much smaller
“intrinsic dimension”!
Improving the algorithm
Queries/s
1-NN accuracy
more accurate
faster
• https://github.com/erikbern/ann-benchmarks
ann-benchmarks
perf/accuracy tradeoffs
Queries/s
1-NN accuracy
search more nodes
more trees
Things that work
• Smarter plane splitting
• Priority queue heuristics
• Search more nodes than number of results
• Align nodes closer together
Things that don’t work
• Use lower-precision arithmetic
• Priority queue by other heuristics (number of trees)
• Precompute vector norms
Things for the future
• Use a optimization scheme for tree building
• Add more distance functions (eg. edit distance)
• Use a proper KV store as a backend (eg. LMDB) to
support incremental adds, out-of-core, arbitrary
keys: https://github.com/Houzz/annoy2
Thanks!
• https://github.com/spotify/annoy
• https://github.com/erikbern/ann-benchmarks
• https://github.com/erikbern/ann-presentation
• erikbern.com
• @fulhack

Contenu connexe

Tendances

GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...Balázs Hidasi
 
Scala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyScala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyNeville Li
 
Recommender system algorithm and architecture
Recommender system algorithm and architectureRecommender system algorithm and architecture
Recommender system algorithm and architectureLiang Xiang
 
Simple Matrix Factorization for Recommendation in Mahout
Simple Matrix Factorization for Recommendation in MahoutSimple Matrix Factorization for Recommendation in Mahout
Simple Matrix Factorization for Recommendation in MahoutData Science London
 
Machine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data MeetupMachine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data MeetupAndy Sloane
 
Homepage Personalization at Spotify
Homepage Personalization at SpotifyHomepage Personalization at Spotify
Homepage Personalization at SpotifyOguz Semerci
 
Interactive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyInteractive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyChris Johnson
 
Machine Learning at Netflix Scale
Machine Learning at Netflix ScaleMachine Learning at Netflix Scale
Machine Learning at Netflix ScaleAish Fenton
 
딥러닝과 강화 학습으로 나보다 잘하는 쿠키런 AI 구현하기 DEVIEW 2016
딥러닝과 강화 학습으로 나보다 잘하는 쿠키런 AI 구현하기 DEVIEW 2016딥러닝과 강화 학습으로 나보다 잘하는 쿠키런 AI 구현하기 DEVIEW 2016
딥러닝과 강화 학습으로 나보다 잘하는 쿠키런 AI 구현하기 DEVIEW 2016Taehoon Kim
 
Recsys 2014 Tutorial - The Recommender Problem Revisited
Recsys 2014 Tutorial - The Recommender Problem RevisitedRecsys 2014 Tutorial - The Recommender Problem Revisited
Recsys 2014 Tutorial - The Recommender Problem RevisitedXavier Amatriain
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender SystemsJustin Basilico
 
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Xavier Amatriain
 
Introduction to Diffusion Models
Introduction to Diffusion ModelsIntroduction to Diffusion Models
Introduction to Diffusion ModelsSangwoo Mo
 
자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.
자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.
자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.Yongho Ha
 
Scala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music RecommendationsScala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music RecommendationsChris Johnson
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature EngineeringHJ van Veen
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkYan Xu
 
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...StreamNative
 

Tendances (20)

GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
 
Scala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyScala Data Pipelines @ Spotify
Scala Data Pipelines @ Spotify
 
Recommender system algorithm and architecture
Recommender system algorithm and architectureRecommender system algorithm and architecture
Recommender system algorithm and architecture
 
Simple Matrix Factorization for Recommendation in Mahout
Simple Matrix Factorization for Recommendation in MahoutSimple Matrix Factorization for Recommendation in Mahout
Simple Matrix Factorization for Recommendation in Mahout
 
Machine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data MeetupMachine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data Meetup
 
Homepage Personalization at Spotify
Homepage Personalization at SpotifyHomepage Personalization at Spotify
Homepage Personalization at Spotify
 
Interactive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyInteractive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and Spotify
 
Machine Learning at Netflix Scale
Machine Learning at Netflix ScaleMachine Learning at Netflix Scale
Machine Learning at Netflix Scale
 
딥러닝과 강화 학습으로 나보다 잘하는 쿠키런 AI 구현하기 DEVIEW 2016
딥러닝과 강화 학습으로 나보다 잘하는 쿠키런 AI 구현하기 DEVIEW 2016딥러닝과 강화 학습으로 나보다 잘하는 쿠키런 AI 구현하기 DEVIEW 2016
딥러닝과 강화 학습으로 나보다 잘하는 쿠키런 AI 구현하기 DEVIEW 2016
 
Recsys 2014 Tutorial - The Recommender Problem Revisited
Recsys 2014 Tutorial - The Recommender Problem RevisitedRecsys 2014 Tutorial - The Recommender Problem Revisited
Recsys 2014 Tutorial - The Recommender Problem Revisited
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
 
Introduction to Diffusion Models
Introduction to Diffusion ModelsIntroduction to Diffusion Models
Introduction to Diffusion Models
 
자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.
자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.
자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.
 
Shap
ShapShap
Shap
 
Scala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music RecommendationsScala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music Recommendations
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
 
Lstm
LstmLstm
Lstm
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural Network
 
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
 

En vedette

김병관 성공캠프 SNS팀 자원봉사 후기
김병관 성공캠프 SNS팀 자원봉사 후기김병관 성공캠프 SNS팀 자원봉사 후기
김병관 성공캠프 SNS팀 자원봉사 후기Harns (Nak-Hyoung) Kim
 
Developing Success in Mobile with Unreal Engine 4 | David Stelzer
Developing Success in Mobile with Unreal Engine 4 | David StelzerDeveloping Success in Mobile with Unreal Engine 4 | David Stelzer
Developing Success in Mobile with Unreal Engine 4 | David StelzerJessica Tams
 
영상 데이터의 처리와 정보의 추출
영상 데이터의 처리와 정보의 추출영상 데이터의 처리와 정보의 추출
영상 데이터의 처리와 정보의 추출동윤 이
 
Behavior Tree in Unreal engine 4
Behavior Tree in Unreal engine 4Behavior Tree in Unreal engine 4
Behavior Tree in Unreal engine 4Huey Park
 
NDC16 스매싱더배틀 1년간의 개발일지
NDC16 스매싱더배틀 1년간의 개발일지NDC16 스매싱더배틀 1년간의 개발일지
NDC16 스매싱더배틀 1년간의 개발일지Daehoon Han
 
게임회사 취업을 위한 현실적인 전략 3가지
게임회사 취업을 위한 현실적인 전략 3가지게임회사 취업을 위한 현실적인 전략 3가지
게임회사 취업을 위한 현실적인 전략 3가지Harns (Nak-Hyoung) Kim
 
Custom fabric shader for unreal engine 4
Custom fabric shader for unreal engine 4Custom fabric shader for unreal engine 4
Custom fabric shader for unreal engine 4동석 김
 
Deep learning as_WaveExtractor
Deep learning as_WaveExtractorDeep learning as_WaveExtractor
Deep learning as_WaveExtractor동윤 이
 
Luigi presentation NYC Data Science
Luigi presentation NYC Data ScienceLuigi presentation NYC Data Science
Luigi presentation NYC Data ScienceErik Bernhardsson
 
Profiling - 실시간 대화식 프로파일러
Profiling - 실시간 대화식 프로파일러Profiling - 실시간 대화식 프로파일러
Profiling - 실시간 대화식 프로파일러Heungsub Lee
 
자동화된 소스 분석, 처리, 검증을 통한 소스의 불필요한 #if - #endif 제거하기 NDC2012
자동화된 소스 분석, 처리, 검증을 통한 소스의 불필요한 #if - #endif 제거하기 NDC2012자동화된 소스 분석, 처리, 검증을 통한 소스의 불필요한 #if - #endif 제거하기 NDC2012
자동화된 소스 분석, 처리, 검증을 통한 소스의 불필요한 #if - #endif 제거하기 NDC2012Esun Kim
 
[야생의 땅: 듀랑고] 지형 관리 완전 자동화 - 생생한 AWS와 Docker 체험기
[야생의 땅: 듀랑고] 지형 관리 완전 자동화 - 생생한 AWS와 Docker 체험기[야생의 땅: 듀랑고] 지형 관리 완전 자동화 - 생생한 AWS와 Docker 체험기
[야생의 땅: 듀랑고] 지형 관리 완전 자동화 - 생생한 AWS와 Docker 체험기Sumin Byeon
 
Re:Zero부터 시작하지 않는 오픈소스 개발
Re:Zero부터 시작하지 않는 오픈소스 개발Re:Zero부터 시작하지 않는 오픈소스 개발
Re:Zero부터 시작하지 않는 오픈소스 개발Chris Ohk
 
NDC17 게임 디자이너 커리어 포스트모템: 8년, 3개의 회사, 4개의 게임
NDC17 게임 디자이너 커리어 포스트모템: 8년, 3개의 회사, 4개의 게임NDC17 게임 디자이너 커리어 포스트모템: 8년, 3개의 회사, 4개의 게임
NDC17 게임 디자이너 커리어 포스트모템: 8년, 3개의 회사, 4개의 게임Imseong Kang
 
레퍼런스만 알면 언리얼 엔진이 제대로 보인다
레퍼런스만 알면 언리얼 엔진이 제대로 보인다레퍼런스만 알면 언리얼 엔진이 제대로 보인다
레퍼런스만 알면 언리얼 엔진이 제대로 보인다Lee Dustin
 
PyCon 2017 프로그래머가 이사하는 법 2 [천원경매]
PyCon 2017 프로그래머가 이사하는 법 2 [천원경매]PyCon 2017 프로그래머가 이사하는 법 2 [천원경매]
PyCon 2017 프로그래머가 이사하는 법 2 [천원경매]Sumin Byeon
 
Online game server on Akka.NET (NDC2016)
Online game server on Akka.NET (NDC2016)Online game server on Akka.NET (NDC2016)
Online game server on Akka.NET (NDC2016)Esun Kim
 
8년동안 테라에서 배운 8가지 교훈
8년동안 테라에서 배운 8가지 교훈8년동안 테라에서 배운 8가지 교훈
8년동안 테라에서 배운 8가지 교훈Harns (Nak-Hyoung) Kim
 
버텍스 셰이더로 하는 머리카락 애니메이션
버텍스 셰이더로 하는 머리카락 애니메이션버텍스 셰이더로 하는 머리카락 애니메이션
버텍스 셰이더로 하는 머리카락 애니메이션동석 김
 

En vedette (20)

김병관 성공캠프 SNS팀 자원봉사 후기
김병관 성공캠프 SNS팀 자원봉사 후기김병관 성공캠프 SNS팀 자원봉사 후기
김병관 성공캠프 SNS팀 자원봉사 후기
 
Developing Success in Mobile with Unreal Engine 4 | David Stelzer
Developing Success in Mobile with Unreal Engine 4 | David StelzerDeveloping Success in Mobile with Unreal Engine 4 | David Stelzer
Developing Success in Mobile with Unreal Engine 4 | David Stelzer
 
영상 데이터의 처리와 정보의 추출
영상 데이터의 처리와 정보의 추출영상 데이터의 처리와 정보의 추출
영상 데이터의 처리와 정보의 추출
 
Behavior Tree in Unreal engine 4
Behavior Tree in Unreal engine 4Behavior Tree in Unreal engine 4
Behavior Tree in Unreal engine 4
 
NDC16 스매싱더배틀 1년간의 개발일지
NDC16 스매싱더배틀 1년간의 개발일지NDC16 스매싱더배틀 1년간의 개발일지
NDC16 스매싱더배틀 1년간의 개발일지
 
게임회사 취업을 위한 현실적인 전략 3가지
게임회사 취업을 위한 현실적인 전략 3가지게임회사 취업을 위한 현실적인 전략 3가지
게임회사 취업을 위한 현실적인 전략 3가지
 
Custom fabric shader for unreal engine 4
Custom fabric shader for unreal engine 4Custom fabric shader for unreal engine 4
Custom fabric shader for unreal engine 4
 
Deep learning as_WaveExtractor
Deep learning as_WaveExtractorDeep learning as_WaveExtractor
Deep learning as_WaveExtractor
 
Luigi presentation NYC Data Science
Luigi presentation NYC Data ScienceLuigi presentation NYC Data Science
Luigi presentation NYC Data Science
 
Profiling - 실시간 대화식 프로파일러
Profiling - 실시간 대화식 프로파일러Profiling - 실시간 대화식 프로파일러
Profiling - 실시간 대화식 프로파일러
 
자동화된 소스 분석, 처리, 검증을 통한 소스의 불필요한 #if - #endif 제거하기 NDC2012
자동화된 소스 분석, 처리, 검증을 통한 소스의 불필요한 #if - #endif 제거하기 NDC2012자동화된 소스 분석, 처리, 검증을 통한 소스의 불필요한 #if - #endif 제거하기 NDC2012
자동화된 소스 분석, 처리, 검증을 통한 소스의 불필요한 #if - #endif 제거하기 NDC2012
 
[야생의 땅: 듀랑고] 지형 관리 완전 자동화 - 생생한 AWS와 Docker 체험기
[야생의 땅: 듀랑고] 지형 관리 완전 자동화 - 생생한 AWS와 Docker 체험기[야생의 땅: 듀랑고] 지형 관리 완전 자동화 - 생생한 AWS와 Docker 체험기
[야생의 땅: 듀랑고] 지형 관리 완전 자동화 - 생생한 AWS와 Docker 체험기
 
Re:Zero부터 시작하지 않는 오픈소스 개발
Re:Zero부터 시작하지 않는 오픈소스 개발Re:Zero부터 시작하지 않는 오픈소스 개발
Re:Zero부터 시작하지 않는 오픈소스 개발
 
NDC17 게임 디자이너 커리어 포스트모템: 8년, 3개의 회사, 4개의 게임
NDC17 게임 디자이너 커리어 포스트모템: 8년, 3개의 회사, 4개의 게임NDC17 게임 디자이너 커리어 포스트모템: 8년, 3개의 회사, 4개의 게임
NDC17 게임 디자이너 커리어 포스트모템: 8년, 3개의 회사, 4개의 게임
 
레퍼런스만 알면 언리얼 엔진이 제대로 보인다
레퍼런스만 알면 언리얼 엔진이 제대로 보인다레퍼런스만 알면 언리얼 엔진이 제대로 보인다
레퍼런스만 알면 언리얼 엔진이 제대로 보인다
 
PyCon 2017 프로그래머가 이사하는 법 2 [천원경매]
PyCon 2017 프로그래머가 이사하는 법 2 [천원경매]PyCon 2017 프로그래머가 이사하는 법 2 [천원경매]
PyCon 2017 프로그래머가 이사하는 법 2 [천원경매]
 
Docker
DockerDocker
Docker
 
Online game server on Akka.NET (NDC2016)
Online game server on Akka.NET (NDC2016)Online game server on Akka.NET (NDC2016)
Online game server on Akka.NET (NDC2016)
 
8년동안 테라에서 배운 8가지 교훈
8년동안 테라에서 배운 8가지 교훈8년동안 테라에서 배운 8가지 교훈
8년동안 테라에서 배운 8가지 교훈
 
버텍스 셰이더로 하는 머리카락 애니메이션
버텍스 셰이더로 하는 머리카락 애니메이션버텍스 셰이더로 하는 머리카락 애니메이션
버텍스 셰이더로 하는 머리카락 애니메이션
 

Similaire à Approximate nearest neighbor methods and vector models – NYC ML meetup

Erik Bernhardsson, CTO, Better Mortgage
Erik Bernhardsson, CTO, Better MortgageErik Bernhardsson, CTO, Better Mortgage
Erik Bernhardsson, CTO, Better MortgageMLconf
 
Nearest Neighbor Customer Insight
Nearest Neighbor Customer InsightNearest Neighbor Customer Insight
Nearest Neighbor Customer InsightMapR Technologies
 
Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford MapR Technologies
 
Lecture 7: Data-Intensive Computing for Text Analysis (Fall 2011)
Lecture 7: Data-Intensive Computing for Text Analysis (Fall 2011)Lecture 7: Data-Intensive Computing for Text Analysis (Fall 2011)
Lecture 7: Data-Intensive Computing for Text Analysis (Fall 2011)Matthew Lease
 
Deep learning trends
Deep learning trendsDeep learning trends
Deep learning trends준호 김
 
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...huguk
 
Understanding Basics of Machine Learning
Understanding Basics of Machine LearningUnderstanding Basics of Machine Learning
Understanding Basics of Machine LearningPranav Ainavolu
 
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...Databricks
 
Machine Learning from a Software Engineer's perspective
Machine Learning from a Software Engineer's perspectiveMachine Learning from a Software Engineer's perspective
Machine Learning from a Software Engineer's perspectiveMarijn van Zelst
 
Machine learning from a software engineer's perspective - Marijn van Zelst - ...
Machine learning from a software engineer's perspective - Marijn van Zelst - ...Machine learning from a software engineer's perspective - Marijn van Zelst - ...
Machine learning from a software engineer's perspective - Marijn van Zelst - ...Codemotion
 
Applying your Convolutional Neural Networks
Applying your Convolutional Neural NetworksApplying your Convolutional Neural Networks
Applying your Convolutional Neural NetworksDatabricks
 
Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...
Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...
Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...Neelabha Pant
 
Could a Data Science Program use Data Science Insights?
Could a Data Science Program use Data Science Insights?Could a Data Science Program use Data Science Insights?
Could a Data Science Program use Data Science Insights?Zachary Thomas
 
Evolution of Deep Learning and new advancements
Evolution of Deep Learning and new advancementsEvolution of Deep Learning and new advancements
Evolution of Deep Learning and new advancementsChitta Ranjan
 
Oxford 05-oct-2012
Oxford 05-oct-2012Oxford 05-oct-2012
Oxford 05-oct-2012Ted Dunning
 
prace_days_ml_2019.pptx
prace_days_ml_2019.pptxprace_days_ml_2019.pptx
prace_days_ml_2019.pptxssuserf583ac
 
prace_days_ml_2019.pptx
prace_days_ml_2019.pptxprace_days_ml_2019.pptx
prace_days_ml_2019.pptxRohanBorgalli
 

Similaire à Approximate nearest neighbor methods and vector models – NYC ML meetup (20)

Erik Bernhardsson, CTO, Better Mortgage
Erik Bernhardsson, CTO, Better MortgageErik Bernhardsson, CTO, Better Mortgage
Erik Bernhardsson, CTO, Better Mortgage
 
Nearest Neighbor Customer Insight
Nearest Neighbor Customer InsightNearest Neighbor Customer Insight
Nearest Neighbor Customer Insight
 
Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford
 
Clustering - ACM 2013 02-25
Clustering - ACM 2013 02-25Clustering - ACM 2013 02-25
Clustering - ACM 2013 02-25
 
Lecture 7: Data-Intensive Computing for Text Analysis (Fall 2011)
Lecture 7: Data-Intensive Computing for Text Analysis (Fall 2011)Lecture 7: Data-Intensive Computing for Text Analysis (Fall 2011)
Lecture 7: Data-Intensive Computing for Text Analysis (Fall 2011)
 
Deep learning trends
Deep learning trendsDeep learning trends
Deep learning trends
 
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
 
Understanding Basics of Machine Learning
Understanding Basics of Machine LearningUnderstanding Basics of Machine Learning
Understanding Basics of Machine Learning
 
Paris Data Geeks
Paris Data GeeksParis Data Geeks
Paris Data Geeks
 
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
 
Machine Learning from a Software Engineer's perspective
Machine Learning from a Software Engineer's perspectiveMachine Learning from a Software Engineer's perspective
Machine Learning from a Software Engineer's perspective
 
Machine learning from a software engineer's perspective - Marijn van Zelst - ...
Machine learning from a software engineer's perspective - Marijn van Zelst - ...Machine learning from a software engineer's perspective - Marijn van Zelst - ...
Machine learning from a software engineer's perspective - Marijn van Zelst - ...
 
Applying your Convolutional Neural Networks
Applying your Convolutional Neural NetworksApplying your Convolutional Neural Networks
Applying your Convolutional Neural Networks
 
Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...
Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...
Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...
 
Could a Data Science Program use Data Science Insights?
Could a Data Science Program use Data Science Insights?Could a Data Science Program use Data Science Insights?
Could a Data Science Program use Data Science Insights?
 
Evolution of Deep Learning and new advancements
Evolution of Deep Learning and new advancementsEvolution of Deep Learning and new advancements
Evolution of Deep Learning and new advancements
 
Oxford 05-oct-2012
Oxford 05-oct-2012Oxford 05-oct-2012
Oxford 05-oct-2012
 
ACM 2013-02-25
ACM 2013-02-25ACM 2013-02-25
ACM 2013-02-25
 
prace_days_ml_2019.pptx
prace_days_ml_2019.pptxprace_days_ml_2019.pptx
prace_days_ml_2019.pptx
 
prace_days_ml_2019.pptx
prace_days_ml_2019.pptxprace_days_ml_2019.pptx
prace_days_ml_2019.pptx
 

Dernier

Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncssuser2ae721
 
Comparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization TechniquesComparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization Techniquesugginaramesh
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .Satyam Kumar
 

Dernier (20)

Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
 
Comparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization TechniquesComparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization Techniques
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .
 

Approximate nearest neighbor methods and vector models – NYC ML meetup