Algolia is a distributed Search-as-a-Service API that processes more than 4 billion user-generated queries per month. Performance is in Algolia's DNA: the service is optimized to reply in milliseconds from any location worldwide while maintaining high availability. Sylvain will detail Algolia's architecture, showing how the team designed its fault-tolerant service and how they cracked website and in-app search.
http://www.meetup.com/Enterprise-Search-and-Analytics-Meetup/events/223926122/
Algolia - Hosted Search API
1. Instant Search API
Build Unique Search Experiences
Sylvain Utard
VP of Engineering
sylvain@algolia.com
@sylvainutard
Enterprise Search and Analytics
2. @algolia
Who am I?
5 years @ Exalead, leading the core-engine & NLP teams
• C++
• ExaScript (RIP)
• Java
2 years @ Algolia, VP of Engineering
• C++
• Ruby
• Java
• and 10+ other languages…
18. @algolia
Unique set of constraints
High volume of Read & Write operations
High-availability
Worldwide data distribution
19. @algolia
API Software Stack
Started as a mobile offline SDK
Written in C++
Search code embedded in Nginx as a module
Indexing is done in a separate process
Two redis instances
20. @algolia
API Hardware
Fast CPU (Xeon E5 >3.5GHz)
In Memory (128GB)
Backed by high-end SSDs in RAID-0 (800GB)
Specific kernel settings
22. @algolia
What is a cluster
Master-Master
Stream of writes via Consensus
At least 3 machines
23. @algolia
A write in practice
One of the machines accepts
the write operation via the API (HTTPS)
/1/indexes/MyFirstIndex/batch
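As an illustration, such a write could be built as below. The slide only shows the path; the host pattern, the two `X-Algolia-*` headers and the `requests` body shape follow Algolia's public REST API as understood here and should be treated as assumptions, and the application ID and key are placeholders. The request is built but not sent.

```python
import json
import urllib.request


def build_batch_request(app_id, api_key, index_name, records):
    """Build (but do not send) a batch write request for one index."""
    url = f"https://{app_id}.algolia.net/1/indexes/{index_name}/batch"
    # One "addObject" operation per record (assumed body shape).
    payload = {"requests": [{"action": "addObject", "body": r} for r in records]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "X-Algolia-Application-Id": app_id,  # placeholder credentials
            "X-Algolia-API-Key": api_key,
            "Content-Type": "application/json",
        },
    )


req = build_batch_request(
    "YourAppID", "YourWriteKey", "MyFirstIndex", [{"name": "first record"}]
)
# urllib.request.urlopen(req) would actually send the write.
```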
24. @algolia
A write in practice
The file is saved on the three machines
as a temporary file
tmp1265
tmp7864
tmp2357
25. @algolia
A write in practice
Launch the consensus by contacting
the Raft master
startConsensus(tmp2357, tmp7864, tmp1265)
26. @algolia
A write in practice
1 - Master sends the commit order to all nodes
2 - Each node returns the next job ID to master
3 - If there is a majority, the file is committed
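The three steps above can be sketched as a small in-memory simulation. This is neither Raft itself nor Algolia's code: the `Node` class, the `start_consensus` signature and the rule of taking the highest proposed ID are assumptions made for illustration only.

```python
class Node:
    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.last_job_id = 0
        self.committed = {}  # job ID -> temporary file name

    def propose_next_job_id(self):
        """Step 2: answer the commit order with the next job ID (None if down)."""
        return self.last_job_id + 1 if self.up else None

    def commit(self, job_id, temp_file):
        self.committed[job_id] = temp_file
        self.last_job_id = job_id


def start_consensus(nodes, temp_files):
    """temp_files maps node name -> temp file. Returns the job ID, or None."""
    # Step 1: the master sends the commit order to every node.
    replies = [node.propose_next_job_id() for node in nodes]
    answers = [job_id for job_id in replies if job_id is not None]
    # Step 3: commit only if a strict majority of the cluster answered.
    if len(answers) <= len(nodes) // 2:
        return None
    job_id = max(answers)  # assumed tie-breaking rule
    for node in nodes:
        if node.up:
            node.commit(job_id, temp_files[node.name])
    return job_id
```

With three nodes, a write still commits when one node is down, and is refused when two are down.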
27. @algolia
A write in practice
Same job ID on all hosts
Sent to the replicas in parallel
Processed in parallel on all hosts
job42
job42
job42
28. @algolia
In case one host is down
Continue to accept writes
The two other hosts keep jobs
Jobs are sequential, will catch up at restart
job42
job42
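A minimal sketch of this catch-up behaviour, assuming a shared in-memory job log standing in for the jobs kept by the live hosts. The `Host` and `Cluster` classes and their methods are hypothetical names, not part of Algolia's implementation.

```python
class Host:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.applied = 0  # ID of the last job applied on this host

    def apply(self, job_id):
        # Jobs are sequential: a host may only apply the next ID in line.
        if job_id != self.applied + 1:
            raise ValueError(f"{self.name}: expected job {self.applied + 1}")
        self.applied = job_id


class Cluster:
    def __init__(self, names):
        self.hosts = [Host(n) for n in names]
        self.log = []  # job IDs in commit order, kept while a host is down

    def write(self):
        live = [h for h in self.hosts if h.up]
        if len(live) <= len(self.hosts) // 2:
            raise RuntimeError("no majority, write refused")
        job_id = len(self.log) + 1
        self.log.append(job_id)
        for host in live:
            host.apply(job_id)
        return job_id

    def restart(self, name):
        """Bring a host back and replay, in order, every job it missed."""
        host = next(h for h in self.hosts if h.name == name)
        host.up = True
        for job_id in self.log[host.applied:]:
            host.apply(job_id)
```

Because job IDs are sequential, a restarted host only needs to know the last ID it applied to find where to resume.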
35. @algolia
Distributed Search Network - Worldwide Synchronization
• 13 locations = 25 datacenters
• No ideal worldwide provider
• AWS is not in India, Eastern EU, Africa…
• Need to handle several providers
• Anticipate long deliveries / customs
• Keep as few providers as possible