Algolia's Fury Road to a Worldwide API - Take Off Conference 2016

From beta service to a worldwide distributed API. From 1 to over 2000 customers all around the world. This presentation takes you through 12 of the steps Algolia has been through to build and scale its hosted search API, always keeping in mind high availability and speed.

ALGOLIA’S FURY ROAD TO A WORLDWIDE API
Build Unique Search Experiences
Olivier Lance
Solutions Engineer
olivier.lance@algolia.com
@olance
Take Off Conference 2016

@algolia
A hosted search API
that focuses on Developer and User Experience

A hosted search API
With intuitive relevance
From anywhere
Replies in milliseconds

Algolia Today
15 regions, 47 data centers
2000+ customers in 100+ countries
30B+ write operations per month
15B+ user-generated queries per month

.1 March 2013
High Availability was designed…
but not implemented
A single machine in 2 different locations:
Canada/East and Europe/West
Focus on performance, searching over indexing
First customer in prod
RAM: 32GB
Proc: 4 cores, 3.4-3.8 GHz
SSD: 2x 120 GB RAID 0 (Intel 320)

.2 June 2013
Implementation of high availability in our architecture
3 machines with a consensus on write… but in the same data center
API clients handled automatic retries in case of error
APPID-1.algolia.io, APPID-2.algolia.io, APPID-3.algolia.io
RAM: 64GB
Proc: 6 cores, 3.2-3.8 GHz
SSD: 2x 300 GB RAID 0 (Intel 320)
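
A minimal Python sketch of the client-side retry described on this slide. The hostname pattern comes from the slide; the function names, the `send` callable and the use of `ConnectionError` as the failure signal are assumptions made for illustration, not Algolia's actual client code.

```python
from typing import Callable, Iterable, List


def cluster_hosts(app_id: str, domain: str = "algolia.io", size: int = 3) -> List[str]:
    """Build the per-machine hostnames: APPID-1.<domain> ... APPID-3.<domain>."""
    return [f"{app_id}-{n}.{domain}" for n in range(1, size + 1)]


def request_with_retry(hosts: Iterable[str], send: Callable[[str], str]) -> str:
    """Try each host in turn and return the first successful response.

    `send` is a hypothetical callable that performs the request against one
    host and raises ConnectionError when that host cannot be reached.
    """
    last_error = None
    for host in hosts:
        try:
            return send(host)
        except ConnectionError as exc:
            last_error = exc  # remember the failure and move on to the next host
    raise ConnectionError("all hosts failed") from last_error
```

With three machines behind three hostnames, a single unreachable machine costs one extra attempt instead of a failed request.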

.3 August 2013
Official launch of the service
Two locations: Europe/West and Canada/East
Same provider but different network equipment and power units (cheap multi-AZ)
10 API clients, developed manually
(HTTPS keep-alive, using TLS correctly, retry strategy…)
RAM: 128GB
Proc: 8 cores, 3.1-3.8 GHz
SSD: 2x 300 GB RAID 0 (Intel S3500)

.4 January 2014
Deployment is a big risk for high availability
Agile development, 6000+ unit tests, 200+ non-regression tests…
But no instant rollback! Result: 8 minutes of indexing downtime ☂
From then on:
- start with test clusters
- instant rollback
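
The two rules on this slide can be pictured with a short Python sketch. It is only an illustration under assumptions: the cluster names, the version labels and the `healthy` check are hypothetical, and the slides do not describe how Algolia actually implements its rollback.

```python
from typing import Callable, Dict, List


def staged_rollout(
    versions: Dict[str, str],
    order: List[str],
    new_version: str,
    healthy: Callable[[str, str], bool],
) -> bool:
    """Deploy cluster by cluster, test clusters first.

    `versions` maps cluster name -> deployed version and is updated in place.
    On the first failed health check, every cluster touched so far is put
    back on the version it had before the rollout started.
    """
    previous: Dict[str, str] = {}
    for cluster in order:
        previous[cluster] = versions[cluster]
        versions[cluster] = new_version
        if not healthy(cluster, new_version):
            versions.update(previous)  # instant rollback to the saved versions
            return False
    return True
```

Putting the test clusters at the front of `order` means a bad build is caught, and reverted, before any production cluster is touched.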

.5 October 2014
Automation via Chef
Significant increase in managed machines
Shell Scripts -> Chef
Automation is great but s**t happens…
A typo in a cookbook nearly broke our prod!
From then on: 2 versions of the cookbooks deployed to different servers of the same cluster
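
As a rough Python sketch of that idea, the function below gives one server of a cluster the newer cookbook version and keeps the others on the stable one. The function name, the server names and the version labels are hypothetical, and the choice of a single "candidate" server is an assumption; the slide only says that two versions run on different servers of the same cluster.

```python
from typing import Dict, List


def assign_cookbook_versions(servers: List[str], stable: str, candidate: str) -> Dict[str, str]:
    """Map each server of one cluster to a cookbook version.

    The first server receives the candidate version, the rest stay on the
    stable version, so one bad cookbook cannot reach the whole cluster.
    """
    if len(servers) < 2:
        raise ValueError("need at least two servers to mix versions")
    return {
        server: (candidate if index == 0 else stable)
        for index, server in enumerate(servers)
    }
```

For a three-machine cluster this leaves two machines on the known-good cookbook if the newer one turns out to contain a typo.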

.6 November 2014
DNS is a SPOF in the architecture
Service was intermittently slow in Asia…
Culprit = .io TLD
Migration to .net TLD and a new DNS provider
Extensive testing but… nothing goes as planned!
☁ Black Thursday ☁
(see http://bit.ly/algoliablackthursday)

.7 February 2015
Launch of our synchronized worldwide infrastructure
8 new regions!
Low latency everywhere with automatic replication
12 regions

Distributed Search Network - Worldwide Synchronization

.8 March 2015
Better high availability per region
Spread our US clusters across two completely different providers
• 2 different data centers in close locations (24 miles, 1 ms latency)
• 3 different machines
• 2 completely different autonomous systems

.9 May 2015
Introducing several DNS providers
Retry strategy in API clients, again!
1. APPID-dsn.algolia.net
2. Retry randomly among:
   APPID-1.algolianet.com
   APPID-2.algolianet.com
   APPID-3.algolianet.com
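
The host order on this slide can be sketched in Python as follows. The hostnames are the ones shown above; the function name and the injectable `rng` parameter are assumptions added so the sketch can be tested, and real API clients may order their retries differently.

```python
import random
from typing import List, Optional


def host_order(app_id: str, rng: Optional[random.Random] = None) -> List[str]:
    """Return the hosts to try: the DSN host first, then the fallbacks shuffled.

    The primary host and the fallback hosts sit on two different domains
    (algolia.net and algolianet.com), so that a failure affecting one domain
    still leaves hosts on the other one to retry against.
    """
    rng = rng or random.Random()
    fallbacks = [f"{app_id}-{n}.algolianet.com" for n in range(1, 4)]
    rng.shuffle(fallbacks)  # random order spreads retries across the machines
    return [f"{app_id}-dsn.algolia.net"] + fallbacks
```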

.10 July 2015
Three completely independent providers per cluster
With 2 providers we could still lose indexing
Clusters spanning multiple data centers, autonomous systems and upstream providers.
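
Why two providers are not enough can be checked with a little arithmetic, sketched here in Python. It assumes that the write consensus needs a strict majority of a cluster's machines; the slides mention "a consensus on write" without spelling out the rule. The provider and machine names are placeholders.

```python
from collections import Counter
from typing import Dict


def survives_any_single_provider_outage(placement: Dict[str, str]) -> bool:
    """Check whether writes stay possible when any one provider goes down.

    `placement` maps machine name -> provider name. Writes are assumed to
    need a strict majority (quorum) of all machines in the cluster.
    """
    quorum = len(placement) // 2 + 1
    machines_per_provider = Counter(placement.values())
    return all(
        len(placement) - lost >= quorum
        for lost in machines_per_provider.values()
    )


# Three machines on two providers: one provider necessarily hosts two of them.
two_providers = {"m1": "A", "m2": "A", "m3": "B"}
# Three machines on three providers: any outage removes a single machine.
three_providers = {"m1": "A", "m2": "B", "m3": "C"}
```

Under that assumption, losing provider A in the first layout leaves one machine out of three, below the quorum of two, so indexing stops while search can still be served. In the second layout any single outage leaves two machines.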

.11 April 2016
Finer-grained monitoring
Our monitoring was at minute granularity (with ServerDensity)
Moved to Wavefront to enable drilling down to the second (on demand)
500 metrics monitored per server
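
A small worked example, in Python, of what minute granularity can hide. The latency values are invented for illustration and do not come from the talk.

```python
from statistics import mean
from typing import List, Tuple


def minute_summary(per_second_ms: List[float]) -> Tuple[float, float]:
    """Return (average, worst) latency over one minute of per-second samples."""
    return mean(per_second_ms), max(per_second_ms)


# Hypothetical minute: 59 seconds at 10 ms and a single second at 1000 ms.
samples = [10.0] * 59 + [1000.0]
average_ms, worst_ms = minute_summary(samples)
# average_ms is (59 * 10 + 1000) / 60 = 26.5, worst_ms is 1000.0
```

A per-minute average of 26.5 ms looks healthy, while the per-second series shows one full second at 1000 ms. That is the kind of event per-second drill-down makes visible.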

.12 September 2016
Algolia Vault
Algolia’s response to the security challenges of larger organizations
Restrict API access to specific IP addresses to get your own "private cloud"
Encryption at rest for sensitive data, in addition to encrypting all communications
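
The IP restriction can be illustrated with Python's standard `ipaddress` module. This is a generic allow-list check under assumptions, not a description of how Algolia Vault is implemented; the function name is hypothetical and the addresses used with it should be replaced by real ranges.

```python
import ipaddress
from typing import Iterable


def is_allowed(client_ip: str, allowed_networks: Iterable[str]) -> bool:
    """Return True when client_ip belongs to one of the allowed networks.

    Each entry of `allowed_networks` is a single address ("198.51.100.7")
    or a CIDR range ("203.0.113.0/24").
    """
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(net) for net in allowed_networks)
```

For example, `is_allowed("203.0.113.42", ["203.0.113.0/24"])` accepts the request, while an address outside the listed ranges is rejected before the API key is even considered.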

Design early
Do not over-engineer
Focus on execution
Building an HA architecture takes time

THANK YOU!
QUESTIONS?
olivier.lance@algolia.com
Full version on Medium http://bit.ly/algoliafuryroad
