Presented by David Giffin | Etsy. See conference video - http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012
Search at Etsy poses significant challenges. Our marketplace is filled with millions of unique, short-lived items, and people search for them over 13 million times a day. In this session we'll discuss many of the solutions we've engineered to meet these challenges, including the evolution of indexing at Etsy, how HBase and Hadoop have taken indexing from hours to minutes, how and why we use BitTorrent for Solr replication, how we track search performance, our approach to shaving crucial milliseconds off every search, and an overview of our continuous deployment strategy, web/search config integration, and A/B testing and analytics.
57. HBase + Hadoop
$ ./compare
ERROR: please provide two index directories
example: ./compare -p 0.1 -i user_id ./index ./index-1332867952588
options:
-p --percent= percent of the index to check
-i --id= primary key id field in the index
-h --hash= comparison or hash field in the index
<index> <index>
Monday, May 14, 12
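The slide shows only the tool's usage text, not its source. Below is a minimal Python sketch of the check it appears to perform. It assumes three things. First, each index can be read as a mapping from the primary key (-i, e.g. user_id) to the comparison/hash field (-h). Second, -p is a percentage: the example passes -p 0.1, and the next slide reports "percentage: 0.0010". Third, the sample is drawn from the first index. `compare_indexes` and `sample_size` are hypothetical names, and plain dicts stand in for the Lucene index directories.

```python
import math
import random


def sample_size(total_docs, percent):
    """Docs to check, treating -p/--percent as a percentage (0.1 -> 0.001)."""
    fraction = percent / 100.0
    return min(total_docs, max(1, math.ceil(total_docs * fraction)))


def compare_indexes(old, new, percent, seed=None):
    """Check a random sample of `old` against `new`; return (same, checked).

    `old` and `new` are hypothetical stand-ins for the two index directories:
    dicts mapping the primary key (the -i/--id field, e.g. user_id) to the
    stored value of the comparison field (the -h/--hash field).
    """
    rng = random.Random(seed)
    ids = rng.sample(sorted(old), sample_size(len(old), percent))
    # A doc counts as "the same" only if the new index has it with an equal hash.
    same = sum(1 for doc_id in ids if new.get(doc_id) == old[doc_id])
    return same, len(ids)


if __name__ == "__main__":
    old = {uid: f"hash-{uid}" for uid in range(10_000)}
    new = dict(old)
    new[42] = "hash-changed"  # simulate one document that differs after reindexing
    same, checked = compare_indexes(old, new, percent=100, seed=1)
    print(f"{same} of {checked} documents are the same")
```

Spot-checking a random sample keeps the comparison cheap on multi-million-document indexes. Comparing a single precomputed hash field also avoids diffing every stored field. Checking in this direction only catches documents that changed or went missing. Documents that exist only in the second index go unnoticed.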
58. HBase + Hadoop
$ ./compare
/search/data/person/index-1332867952588/
/search/data/person/index-1335378487672
id field: user_id
hash field: hash
percentage: 0.0010
files: /search/data/person/index-1332867952588/ /search/data/person/index-1335378487672
/search/data/person/index-1332867952588 contains 1515512 docs
/search/data/person/index-1335378487672 contains 14837972 docs
1516 of 1516 documents are the same
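The sample count in this run lines up with the first index's size. Assume "percentage: 0.0010" is a fraction applied to that index's 1,515,512 docs; the usage example's -p 0.1 suggests the flag itself takes a percent. The arithmetic then gives exactly the 1,516 documents reported. Sizing the sample from the second index would instead give roughly 14,838. This output can't show whether the tool rounds up or to the nearest integer, since both give 1516.

```latex
\begin{aligned}
0.0010 \times 1\,515\,512 &= 1515.512 \;\Rightarrow\; 1516 \text{ documents checked} \\
0.0010 \times 14\,837\,972 &= 14837.972 \;\Rightarrow\; \approx 14\,838 \text{ (if sized from the second index)}
\end{aligned}
```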
71. Replication
Fork of ttorrent: https://github.com/etsy/ttorrent
Multi-File Support
Large File Support
Fork of BitTorrent: Coming Soon
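The slides don't show how the fork adds multi-file and large-file support. In BitTorrent v1 (BEP 3), a multi-file torrent treats its files as one byte stream in the listed order. It splits that stream into fixed-size pieces and stores each piece's SHA-1 digest in the metainfo. Below is a minimal Python sketch of that piece-hashing step. The `piece_hashes` helper and its 256 KiB default piece length are hypothetical and illustrative, not taken from ttorrent. It streams files in chunks, so large index files never need to fit in memory.

```python
import hashlib
import os
import tempfile


def piece_hashes(paths, piece_length=256 * 1024):
    """Return the concatenated 20-byte SHA-1 digests of every piece.

    The files are treated as one continuous byte stream in the given order,
    so a piece may span the end of one file and the start of the next.
    Files are read in piece-sized chunks rather than loaded whole.
    """
    digests = []
    buf = bytearray()
    for path in paths:
        with open(path, "rb") as fh:
            while True:
                chunk = fh.read(piece_length - len(buf))
                if not chunk:
                    break  # end of this file; carry any partial piece over
                buf.extend(chunk)
                if len(buf) == piece_length:
                    digests.append(hashlib.sha1(buf).digest())
                    buf.clear()
    if buf:
        digests.append(hashlib.sha1(buf).digest())  # last piece may be shorter
    return b"".join(digests)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, data in [("a.bin", b"x" * 5), ("b.bin", b"y" * 7)]:
            path = os.path.join(tmp, name)
            with open(path, "wb") as fh:
                fh.write(data)
            paths.append(path)
        pieces = piece_hashes(paths, piece_length=4)
        print(len(pieces) // 20, "pieces")  # 12 bytes / 4-byte pieces
```

In the usage example, the second piece ("xyyy") straddles both files. A downloader therefore has to map piece offsets back to (file, offset) pairs when writing data to disk. In a language like Java, byte offsets past 2 GiB overflow a 32-bit `int`, which is one common reason large files need special handling. The slide doesn't say whether that is what the fork addresses.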