From beta service to a worldwide distributed API.
From 1 to over 2000 customers all around the world.
This presentation walks through 12 of the steps Algolia took to build and scale its hosted search API, with high availability and speed as constant priorities.
Algolia's Fury Road to a Worldwide API - Take Off Conference 2016
Build Unique Search Experiences
Olivier Lance
Solutions Engineer
olivier.lance@algolia.com
@olance
Take Off Conference 2016
Step 1: March 2013
High availability was designed… but not implemented
One machine in each of 2 locations: Canada/East and Europe/West
Focus on performance: search prioritized over indexing
First customer in prod
RAM: 32GB
Proc: 4 cores, 3.4-3.8 GHz
SSD: 2x 120 GB RAID 0 (Intel 320)
Step 2: June 2013
Implementation of high availability in our architecture
3 machines with a consensus on write… but in the same data center
API clients handled automatic retries in case of error
APPID-1.algolia.io, APPID-2.algolia.io, APPID-3.algolia.io
RAM: 64GB
Proc: 6 cores, 3.2-3.8 GHz
SSD: 2x 300 GB RAID 0 (Intel 320)
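The slide only says the 3 machines reach "a consensus on write"; it doesn't name the protocol. Below is a minimal sketch, assuming a simple rule where a write commits once a strict majority of replicas acknowledges it. Real consensus protocols such as Raft or Paxos also handle leader election and log ordering, which this omits. `Replica` and `commit_write` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """One machine of a hypothetical 3-node cluster."""
    name: str
    up: bool = True

    def ack(self, op: str) -> bool:
        # A live replica acknowledges the write; a failed one never answers.
        return self.up


def commit_write(replicas: list[Replica], op: str) -> bool:
    """Commit `op` only if a strict majority of replicas acknowledge it."""
    acks = sum(1 for r in replicas if r.ack(op))
    return acks > len(replicas) // 2


cluster = [Replica("APPID-1"), Replica("APPID-2"), Replica("APPID-3")]
cluster[2].up = False
print(commit_write(cluster, "write-1"))  # True: 2 of 3 acknowledged
cluster[1].up = False
print(commit_write(cluster, "write-2"))  # False: only 1 of 3 acknowledged
```

Because all three machines shared one data center at this stage, a single data-center outage would remove the whole majority at once.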
Step 3: August 2013
Official launch of the service
Two locations: Europe/West and Canada/East
Same provider, but different network equipment and power units (cheap multi-AZ)
10 API clients, developed by hand (HTTPS keep-alive, correct TLS usage, retry strategy…)
RAM: 128GB
Proc: 8 cores, 3.1-3.8 GHz
SSD: 2x 300 GB RAID 0 (Intel S3500)
Step 4: January 2014
Deployment is a big risk for high availability
Agile development, 6000+ unit tests, 200+ non-regression tests…
But no instant rollback! Result: 8 minutes of indexing downtime ☂
From then on:
- deploy to test clusters first
- instant rollback
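The slide doesn't describe how instant rollback was implemented. One common pattern, sketched here as an assumption, is to keep previous builds installed so that rolling back only switches which build serves traffic instead of reinstalling anything. `ReleaseManager` and the build names are hypothetical.

```python
from typing import Optional


class ReleaseManager:
    """Hypothetical deploy helper: every installed build is remembered so
    that rolling back is a pointer swap, not a rebuild or redeploy."""

    def __init__(self) -> None:
        self.releases: list[str] = []      # installed builds, oldest first
        self.active: Optional[int] = None  # index of the build serving traffic

    def deploy(self, version: str) -> None:
        # Install the new build next to the old ones, then switch traffic to it.
        self.releases.append(version)
        self.active = len(self.releases) - 1

    def rollback(self) -> str:
        # Point traffic back at the previous build; nothing is reinstalled.
        if self.active is None or self.active == 0:
            raise RuntimeError("no previous release to roll back to")
        self.active -= 1
        return self.releases[self.active]

    @property
    def current(self) -> str:
        if self.active is None:
            raise RuntimeError("nothing deployed yet")
        return self.releases[self.active]


rm = ReleaseManager()
rm.deploy("build-41")
rm.deploy("build-42")  # a bad build reaches production
rm.rollback()          # traffic is back on build-41 immediately
print(rm.current)      # build-41
```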
Step 5: October 2014
Automation via Chef
Significant increase in managed machines
Shell Scripts -> Chef
Automation is great but s**t happens…
A typo in a cookbook nearly broke our prod!
From then on: 2 versions of the cookbooks, deployed to different servers of the same cluster
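One way to read the cookbook rule: give the new cookbook version to a strict minority of each cluster's servers, so a typo like the one above breaks at most a minority while the rest keep running the known-good version. The sketch below works under that assumption; `assign_cookbooks` and the version labels are hypothetical, and it only computes a plan rather than talking to Chef.

```python
def assign_cookbooks(servers: list[str], stable: str, candidate: str) -> dict[str, str]:
    """Hypothetical rollout plan: a strict minority of the cluster's servers
    gets the candidate cookbook version, the rest keep the stable one."""
    if len(servers) < 3:
        raise ValueError("need at least 3 servers to keep a stable majority")
    n_candidate = (len(servers) - 1) // 2  # strict minority, >= 1 when n >= 3
    return {
        server: candidate if i < n_candidate else stable
        for i, server in enumerate(sorted(servers))
    }


plan = assign_cookbooks(["APPID-1", "APPID-2", "APPID-3"], "v41", "v42")
print(plan)  # {'APPID-1': 'v42', 'APPID-2': 'v41', 'APPID-3': 'v41'}
```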
Step 6: November 2014
DNS is a single point of failure (SPOF) in the architecture
Service was intermittently slow in Asia…
Culprit: the .io TLD
Migration to .net TLD and a new DNS provider
Extensive testing, but… nothing goes as planned!
☁ Black Thursday ☁
(see http://bit.ly/algoliablackthursday)
Step 7: February 2015
Launch of our synchronized worldwide infrastructure
8 new regions!
Low latency everywhere with automatic replication
12 regions
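The slide doesn't say how a request reaches the closest of the 12 regions; a client-side choice or geo-aware DNS are both possibilities. Here is a minimal sketch of the selection rule itself, assuming measured round-trip times are available: among the regions holding a replica of the index, pick the fastest. `nearest_region`, the region names, and the latencies are all made up.

```python
def nearest_region(latencies_ms: dict[str, float], replicated: set[str]) -> str:
    """Pick the lowest-latency region among those holding a replica of the index."""
    candidates = {r: ms for r, ms in latencies_ms.items() if r in replicated}
    if not candidates:
        raise ValueError("no replicated region was measured")
    return min(candidates, key=lambda r: candidates[r])


# Hypothetical round-trip times measured from a user in Singapore.
rtt = {"eu-west": 180.0, "us-east": 230.0, "asia-southeast": 8.0}
print(nearest_region(rtt, {"eu-west", "us-east", "asia-southeast"}))  # asia-southeast
print(nearest_region(rtt, {"eu-west", "us-east"}))                    # eu-west
```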
Step 8: March 2015
Better high availability per region
Spread our US clusters across two completely different providers:
• 2 different data centers in nearby locations (24 miles apart, 1 ms latency)
• 3 different machines
• 2 completely different autonomous systems
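The placement rules listed above can be checked mechanically before a cluster goes live. A sketch with a hypothetical `Machine` record and made-up data-center names; AS numbers 64512 and 64513 come from the private-use range, not from real providers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    name: str
    datacenter: str
    asn: int  # autonomous system number announcing the machine's network


def check_placement(cluster: list[Machine]) -> list[str]:
    """Return the March 2015 placement rules that `cluster` violates."""
    problems = []
    if len(cluster) != 3:
        problems.append("expected 3 machines")
    if len({m.datacenter for m in cluster}) < 2:
        problems.append("needs at least 2 data centers")
    if len({m.asn for m in cluster}) < 2:
        problems.append("needs at least 2 autonomous systems")
    return problems


# Hypothetical cluster spread over two data centers and two AS numbers.
cluster = [
    Machine("APPID-1", "dc-a", 64512),
    Machine("APPID-2", "dc-a", 64512),
    Machine("APPID-3", "dc-b", 64513),
]
print(check_placement(cluster))  # []
```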
Step 9: May 2015
Introducing several DNS providers
Retry strategy in API clients, again!
1. Try APPID-dsn.algolia.net first
2. On failure, retry in random order on:
APPID-1.algolianet.com
APPID-2.algolianet.com
APPID-3.algolianet.com
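A sketch of the client-side host order described above: the DSN host first, then the three fallback hosts shuffled. The fallbacks live under a different domain (algolianet.com rather than algolia.net), presumably so that an outage of the DNS provider or TLD serving one domain doesn't take out the other; shuffling spreads retries across machines instead of piling them onto APPID-1. `hosts_for`, `request_with_retries`, and the injected `send` callback are hypothetical; a real client would issue HTTPS requests with timeouts at that point.

```python
import random
from typing import Callable, Optional


def hosts_for(app_id: str, rng: Optional[random.Random] = None) -> list[str]:
    """Host order from the May 2015 slide: DSN host first, then the three
    algolianet.com hosts in random order."""
    if rng is None:
        rng = random.Random()
    fallbacks = [f"{app_id}-{i}.algolianet.com" for i in (1, 2, 3)]
    rng.shuffle(fallbacks)  # randomize so retries spread across machines
    return [f"{app_id}-dsn.algolia.net"] + fallbacks


def request_with_retries(app_id: str, send: Callable[[str], str],
                         rng: Optional[random.Random] = None) -> str:
    """Try each host in turn. `send` stands in for the real HTTP call and is
    expected to raise OSError (timeout, DNS failure, refused connection...)."""
    last_error: Optional[Exception] = None
    for host in hosts_for(app_id, rng):
        try:
            return send(host)
        except OSError as err:  # network-level failure: move on to the next host
            last_error = err
    raise RuntimeError("all hosts failed") from last_error


def send(host: str) -> str:
    # Simulate an outage of the DNS provider serving algolia.net.
    if host.endswith(".algolia.net"):
        raise OSError(f"cannot resolve {host}")
    return f"200 OK from {host}"


print(request_with_retries("APPID", send))  # answered by one of the algolianet.com hosts
```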
Step 10: July 2015
Three completely independent providers per cluster
With 2 providers, we could still lose indexing
Clusters now span multiple data centers, autonomous systems and upstream providers.
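One plausible reading of why 2 providers were not enough, assuming writes need a majority of the 3 machines as in the June 2013 setup: with 3 machines on 2 providers, one provider must host 2 of them, and losing it leaves 1 of 3, below a majority. With one machine per provider, any single outage still leaves 2 of 3. The sketch below checks that property; `survives_any_provider_outage` and the provider names are placeholders.

```python
from collections import Counter


def survives_any_provider_outage(placement: dict[str, str]) -> bool:
    """`placement` maps machine -> provider. True if, whichever single provider
    fails, a strict majority of machines is still up to agree on writes."""
    total = len(placement)
    machines_per_provider = Counter(placement.values())
    return all(total - lost > total // 2 for lost in machines_per_provider.values())


two = {"APPID-1": "provider-a", "APPID-2": "provider-a", "APPID-3": "provider-b"}
three = {"APPID-1": "provider-a", "APPID-2": "provider-b", "APPID-3": "provider-c"}
print(survives_any_provider_outage(two))    # False: losing provider-a leaves 1 of 3
print(survives_any_provider_outage(three))  # True: any outage leaves 2 of 3
```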
Step 11: April 2016
Finer grained monitoring
Our monitoring had minute-level granularity (with ServerDensity)
Moved to Wavefront to enable drilling down to per-second granularity (on demand)
500 metrics monitored per server
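To illustrate why granularity matters: averaging per-second samples into one-minute points can dilute a short stall until it looks harmless. The latency values are invented, and `downsample` is a hypothetical helper, not a Wavefront API. The last lines show the raw volume if every metric were kept at 1 s resolution, which is perhaps one reason second-level detail is "on demand".

```python
from statistics import mean


def downsample(samples: list[float], window: int) -> list[float]:
    """Average consecutive groups of `window` samples (e.g. 60 one-second
    samples -> one per-minute point)."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]


# Hypothetical minute of per-second search latency (ms): steady 2 ms with a
# 5-second stall at 300 ms.
latency = [2.0] * 60
latency[30:35] = [300.0] * 5

per_minute = downsample(latency, 60)
print(max(latency))              # 300.0: the stall is obvious at 1 s granularity
print(round(per_minute[0], 1))   # 26.8: diluted into an average at 1 min granularity

# Volume if all 500 metrics of one server were stored at 1 s resolution.
points_per_day = 500 * 86_400
print(points_per_day)            # 43200000
```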
Step 12: September 2016
Algolia Vault
Algolia’s response to the security challenges of larger organizations
Restrict API access to specific IP addresses to get your own "private cloud"
Encryption at rest for sensitive data, in addition to encrypting all communications
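The slide describes Vault's IP restriction only at a high level. Here is a sketch of the kind of allowlist check a server could apply, using Python's standard `ipaddress` module. `is_allowed` is a hypothetical helper, and the ranges are reserved documentation prefixes (RFC 5737 and RFC 3849), not real customer networks.

```python
import ipaddress


def is_allowed(client_ip: str, allowlist: list[str]) -> bool:
    """Return True if `client_ip` falls inside any CIDR range of the allowlist.
    An IPv4 address is never inside an IPv6 network (and vice versa)."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(net) for net in allowlist)


allowlist = ["203.0.113.0/24", "2001:db8::/32"]
print(is_allowed("203.0.113.42", allowlist))  # True
print(is_allowed("198.51.100.7", allowlist))  # False
```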