SlideShare une entreprise Scribd logo
1  sur  33
Télécharger pour lire hors ligne
Maintaining Spatial Data Infrastructures (SDIs)
using distributed task queues
Paolo Corti and Ben Lewis
Harvard Center for Geographic Analysis
2017 FOSS4G
Boston
Background
Harvard Center for Geographic Analysis
• WorldMap http://worldmap.harvard.edu
– Biggest GeoNode instance on the planet
– https://github.com/cga-harvard/cga-worldmap
• HHypermap http://hh.worldmap.harvard.edu
– Map service registry
– https://github.com/cga-harvard/HHypermap
Note
Billion Object Platform (BOP)
https://github.com/cga-harvard/hhypermap-bop
Demo of WorldMap / HHypermap
The need for an asynchronous processor
In WorldMap and HHypermap there are operations run by users which are
time consuming and cannot be handled in the context of a web request
● Harvest the metadata of a service and its layers
● Synchronize the metadata of a new or updated layer to the search
engine
● Feed a gazetteer when a new layer is uploaded or updated
● Upload a spatial datasets to the server
● Create a new layer using a table join
HTTP request/response cycle must be fast
● In web applications the HTTP
request/response cycle can be
synchronous as long as there are
very quick interactions between
the client and the server
● unfortunately there are cases
when the cycle become slower
● In these situations the best
practice for a web application is
to process asynchronously these
tasks using a task queue
Task Queues
Asynchronous processing in a web application can be
delegated to a task queue, which is a system for parallel
execution of tasks in a non-blocking fashion
Asynchronous processing model
Asynchronous processing model
● The asynchronous processing model is composed by services that
produce processing tasks (producers) and by services which
consume and process these tasks (consumers) accordingly
● A message queue is a broker which facilitates message passing by
providing a protocol or interface which other services can access.
Work can be distributed across threads or machines
● In the context of a web application the producer is the client
application that creates messages based on the user interaction. The
consumer is a daemon process that can consume the messages and
run the needed process
Glossary
● Task Queue: a system for parallel execution of tasks in a non-blocking
fashion
● Broker or Message Queue: provides a protocol or interface for messages
exchanging between different services and applications
● Producer: the code that places the tasks to be executed later in the broker
● Consumer or Worker: takes tasks from the broker and process them
● Exchange: takes a message from a producer and route it to zero or more
queues (messages routing)
Tasks must be consumed faster than being produced. If not, add more workers
Use cases for task queues
● in web applications some process is taking too much time
and must be processed asynchronously
● heterogeneous applications/services in a given system
architecture need an easy way to reliably communicate
between each other
● periodic operations (vs crontab)
● a way of parallelizing tasks in multi processors
● monitor processes and analyze failing tasks (and execute
them again)
Typical use cases for a task queue in a web application
● Thumbnails generation
● Sending bulk email
● Fetching large amounts of data from APIs
● Performing time-intensive calculations
● Expensive queries
● Search engine index synchronization
● Interaction with another application/service
● Replacing cron jobs (backups, maintenance, etc…)
Typical use cases for a task queue in a GIS Portal/SDI
● Upload a shapefile to the server (GeoNode)
● Thumbnails generation for layers and maps (GeoNode)
● OGC services harvesting (Harvard Hypermap)
● Geoprocessing operations
● Geospatial data maintenance
Producer, broker and consumer architecture
Producer
Consumer
Producer
Broker
Consumer
Producer
Broker
Consumer
Producer
Broker
Consumer
Producer
Message brokers implementations
Most of them are open source!
● RabbitMQ (AMQP, STOMP, JMS)
● Apache ActiveMQ (STOMP, JMS)
● Amazon Simple Queue Service (JMS)
● Apache Kafka
Several standard protocols:
● AMQP, STOMP, JMS, MSMQ (Microsoft .NET)
Tasks (Jobs) queues implementations
● Celery (RabbitMQ, Redis, Amazon SQS, Zookeeper)
● Redis Queue (Redis)
● Resque (Redis)
● Kue (Redis)
And many others!
Celery
● asynchronous task queue based on distributed message
passing
● focused on real-time operation, but supports scheduling
as well
● the execution units, called tasks, are executed
concurrently on a single or more worker servers
● it supports many message brokers (RabbitMQ, Redis,
MongoDB, CouchDB, ...)
● written in Python but it can operate with other languages
● great integration with Django!
● great monitoring tools (Flower, django-celery-results)
RabbitMQ
● RabbitMQ is a message broker: it accepts and
forwards messages
● most widely deployed open source broker (35k+
deployments)
● support many message protocols
● supported by many operating systems and
languages
● Written in Erlang
Architecture of Celery/RabbitMQ
https://tests4geeks.com/python-celery-rabbitmq-tutorial/
A real use case: Harvard Hypermap
HHypermap (Harvard Hypermap)
Registry is a platform that manages
OWS, Esri REST, and other types of
map service harvesting, and
orchestration and maintains uptime
statistics for services and layers.
Where possible, layers are cached
by MapProxy.
HHypermap provides thousands of
remote layers to WorldMap users
Harvard
Hypermap
WorldMap
Architecture
HHypermap interface
Need for a task queue
SLOW!!!
Producer
Is the code that
places the tasks to
be executed later
in the broker
Celery messages
Consumer
Takes tasks from
the broker and
process them in a
worker
Replacing cron jobs
Replacing cron jobs
Workers and threads with htop
Monitoring
Monitoring a task
Thanks!
Question and Answer

Contenu connexe

Tendances

Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019
Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019
Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019Shubham Tagra
 
Flink Forward San Francisco 2019: Developing and operating real-time applicat...
Flink Forward San Francisco 2019: Developing and operating real-time applicat...Flink Forward San Francisco 2019: Developing and operating real-time applicat...
Flink Forward San Francisco 2019: Developing and operating real-time applicat...Flink Forward
 
Flink Forward San Francisco 2019: Apache Beam portability in the times of rea...
Flink Forward San Francisco 2019: Apache Beam portability in the times of rea...Flink Forward San Francisco 2019: Apache Beam portability in the times of rea...
Flink Forward San Francisco 2019: Apache Beam portability in the times of rea...Flink Forward
 
Flink Forward San Francisco 2019: The Trade Desk's Year in Flink - Jonathan ...
Flink Forward San Francisco 2019: The Trade Desk's Year in Flink -  Jonathan ...Flink Forward San Francisco 2019: The Trade Desk's Year in Flink -  Jonathan ...
Flink Forward San Francisco 2019: The Trade Desk's Year in Flink - Jonathan ...Flink Forward
 
Flink Forward Berlin 2017: Ruben Casado Tejedor - Flink-Kudu connector: an op...
Flink Forward Berlin 2017: Ruben Casado Tejedor - Flink-Kudu connector: an op...Flink Forward Berlin 2017: Ruben Casado Tejedor - Flink-Kudu connector: an op...
Flink Forward Berlin 2017: Ruben Casado Tejedor - Flink-Kudu connector: an op...Flink Forward
 
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward
 
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...Flink Forward
 
Stream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsStream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsTim Ysewyn
 
gRPC Design and Implementation
gRPC Design and ImplementationgRPC Design and Implementation
gRPC Design and ImplementationVarun Talwar
 
Virtual Flink Forward 2020: Build your next-generation stream platform based ...
Virtual Flink Forward 2020: Build your next-generation stream platform based ...Virtual Flink Forward 2020: Build your next-generation stream platform based ...
Virtual Flink Forward 2020: Build your next-generation stream platform based ...Flink Forward
 
Dynamo db and Cross Region Migration
Dynamo db and Cross Region MigrationDynamo db and Cross Region Migration
Dynamo db and Cross Region MigrationAnamika Gupta
 
Monitoring Cassandra With An EYE
Monitoring Cassandra With An EYEMonitoring Cassandra With An EYE
Monitoring Cassandra With An EYEKnoldus Inc.
 
Flink Forward Berlin 2017: Roberto Bentivoglio, Saverio Veltri - NSDB (Natura...
Flink Forward Berlin 2017: Roberto Bentivoglio, Saverio Veltri - NSDB (Natura...Flink Forward Berlin 2017: Roberto Bentivoglio, Saverio Veltri - NSDB (Natura...
Flink Forward Berlin 2017: Roberto Bentivoglio, Saverio Veltri - NSDB (Natura...Flink Forward
 
Flink Connector Development Tips & Tricks
Flink Connector Development Tips & TricksFlink Connector Development Tips & Tricks
Flink Connector Development Tips & TricksEron Wright
 
Flink Forward San Francisco 2019: Building Financial Identity Platform using ...
Flink Forward San Francisco 2019: Building Financial Identity Platform using ...Flink Forward San Francisco 2019: Building Financial Identity Platform using ...
Flink Forward San Francisco 2019: Building Financial Identity Platform using ...Flink Forward
 
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...Flink Forward
 
Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured StreamingKnoldus Inc.
 
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...Flink Forward
 
Benchmarking Apache Samza: 1.2 million messages per sec per node
Benchmarking Apache Samza: 1.2 million messages per sec per nodeBenchmarking Apache Samza: 1.2 million messages per sec per node
Benchmarking Apache Samza: 1.2 million messages per sec per nodeTao Feng
 
Stateful stream processing with Apache Flink
Stateful stream processing with Apache FlinkStateful stream processing with Apache Flink
Stateful stream processing with Apache FlinkKnoldus Inc.
 

Tendances (20)

Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019
Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019
Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019
 
Flink Forward San Francisco 2019: Developing and operating real-time applicat...
Flink Forward San Francisco 2019: Developing and operating real-time applicat...Flink Forward San Francisco 2019: Developing and operating real-time applicat...
Flink Forward San Francisco 2019: Developing and operating real-time applicat...
 
Flink Forward San Francisco 2019: Apache Beam portability in the times of rea...
Flink Forward San Francisco 2019: Apache Beam portability in the times of rea...Flink Forward San Francisco 2019: Apache Beam portability in the times of rea...
Flink Forward San Francisco 2019: Apache Beam portability in the times of rea...
 
Flink Forward San Francisco 2019: The Trade Desk's Year in Flink - Jonathan ...
Flink Forward San Francisco 2019: The Trade Desk's Year in Flink -  Jonathan ...Flink Forward San Francisco 2019: The Trade Desk's Year in Flink -  Jonathan ...
Flink Forward San Francisco 2019: The Trade Desk's Year in Flink - Jonathan ...
 
Flink Forward Berlin 2017: Ruben Casado Tejedor - Flink-Kudu connector: an op...
Flink Forward Berlin 2017: Ruben Casado Tejedor - Flink-Kudu connector: an op...Flink Forward Berlin 2017: Ruben Casado Tejedor - Flink-Kudu connector: an op...
Flink Forward Berlin 2017: Ruben Casado Tejedor - Flink-Kudu connector: an op...
 
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
 
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
 
Stream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsStream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka Streams
 
gRPC Design and Implementation
gRPC Design and ImplementationgRPC Design and Implementation
gRPC Design and Implementation
 
Virtual Flink Forward 2020: Build your next-generation stream platform based ...
Virtual Flink Forward 2020: Build your next-generation stream platform based ...Virtual Flink Forward 2020: Build your next-generation stream platform based ...
Virtual Flink Forward 2020: Build your next-generation stream platform based ...
 
Dynamo db and Cross Region Migration
Dynamo db and Cross Region MigrationDynamo db and Cross Region Migration
Dynamo db and Cross Region Migration
 
Monitoring Cassandra With An EYE
Monitoring Cassandra With An EYEMonitoring Cassandra With An EYE
Monitoring Cassandra With An EYE
 
Flink Forward Berlin 2017: Roberto Bentivoglio, Saverio Veltri - NSDB (Natura...
Flink Forward Berlin 2017: Roberto Bentivoglio, Saverio Veltri - NSDB (Natura...Flink Forward Berlin 2017: Roberto Bentivoglio, Saverio Veltri - NSDB (Natura...
Flink Forward Berlin 2017: Roberto Bentivoglio, Saverio Veltri - NSDB (Natura...
 
Flink Connector Development Tips & Tricks
Flink Connector Development Tips & TricksFlink Connector Development Tips & Tricks
Flink Connector Development Tips & Tricks
 
Flink Forward San Francisco 2019: Building Financial Identity Platform using ...
Flink Forward San Francisco 2019: Building Financial Identity Platform using ...Flink Forward San Francisco 2019: Building Financial Identity Platform using ...
Flink Forward San Francisco 2019: Building Financial Identity Platform using ...
 
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...
 
Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured Streaming
 
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
 
Benchmarking Apache Samza: 1.2 million messages per sec per node
Benchmarking Apache Samza: 1.2 million messages per sec per nodeBenchmarking Apache Samza: 1.2 million messages per sec per node
Benchmarking Apache Samza: 1.2 million messages per sec per node
 
Stateful stream processing with Apache Flink
Stateful stream processing with Apache FlinkStateful stream processing with Apache Flink
Stateful stream processing with Apache Flink
 

Similaire à Maintaining SDIs Using Distributed Task Queues

Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019Thomas Weise
 
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...Flink Forward
 
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...Flink Forward
 
HBase Coprocessors @ HUG NYC
HBase Coprocessors @ HUG NYCHBase Coprocessors @ HUG NYC
HBase Coprocessors @ HUG NYCmlai
 
Big data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on dockerBig data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on dockerFederico Palladoro
 
Cloudgene - A MapReduce based Workflow Management System
Cloudgene - A MapReduce based Workflow Management SystemCloudgene - A MapReduce based Workflow Management System
Cloudgene - A MapReduce based Workflow Management SystemLukas Forer
 
Shaping the Future: To Globus Compute and Beyond!
Shaping the Future: To Globus Compute and Beyond!Shaping the Future: To Globus Compute and Beyond!
Shaping the Future: To Globus Compute and Beyond!Globus
 
Building high performance microservices in finance with Apache Thrift
Building high performance microservices in finance with Apache ThriftBuilding high performance microservices in finance with Apache Thrift
Building high performance microservices in finance with Apache ThriftRX-M Enterprises LLC
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopHortonworks
 
3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...Timothy McCormick
 
[WSO2Con Asia 2018] Architecting for Container-native Environments
[WSO2Con Asia 2018] Architecting for Container-native Environments[WSO2Con Asia 2018] Architecting for Container-native Environments
[WSO2Con Asia 2018] Architecting for Container-native EnvironmentsWSO2
 
Geospatial web services using little-known GDAL features and modern Perl midd...
Geospatial web services using little-known GDAL features and modern Perl midd...Geospatial web services using little-known GDAL features and modern Perl midd...
Geospatial web services using little-known GDAL features and modern Perl midd...Ari Jolma
 
February 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and InsidesFebruary 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and InsidesYahoo Developer Network
 
A performance analysis of OpenStack Cloud vs Real System on Hadoop Clusters
A performance analysis of OpenStack Cloud vs Real System on Hadoop ClustersA performance analysis of OpenStack Cloud vs Real System on Hadoop Clusters
A performance analysis of OpenStack Cloud vs Real System on Hadoop ClustersKumari Surabhi
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformApache Apex
 
The new (is it really ) api stack
The new (is it really ) api stackThe new (is it really ) api stack
The new (is it really ) api stackLuca Mattia Ferrari
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.pptSathish24111
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingApache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingHortonworks
 

Similaire à Maintaining SDIs Using Distributed Task Queues (20)

Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019
 
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
 
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
 
HBase Coprocessors @ HUG NYC
HBase Coprocessors @ HUG NYCHBase Coprocessors @ HUG NYC
HBase Coprocessors @ HUG NYC
 
Big data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on dockerBig data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on docker
 
Cloudgene - A MapReduce based Workflow Management System
Cloudgene - A MapReduce based Workflow Management SystemCloudgene - A MapReduce based Workflow Management System
Cloudgene - A MapReduce based Workflow Management System
 
Shaping the Future: To Globus Compute and Beyond!
Shaping the Future: To Globus Compute and Beyond!Shaping the Future: To Globus Compute and Beyond!
Shaping the Future: To Globus Compute and Beyond!
 
Building high performance microservices in finance with Apache Thrift
Building high performance microservices in finance with Apache ThriftBuilding high performance microservices in finance with Apache Thrift
Building high performance microservices in finance with Apache Thrift
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
 
3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...
 
[WSO2Con Asia 2018] Architecting for Container-native Environments
[WSO2Con Asia 2018] Architecting for Container-native Environments[WSO2Con Asia 2018] Architecting for Container-native Environments
[WSO2Con Asia 2018] Architecting for Container-native Environments
 
Flour
FlourFlour
Flour
 
Geospatial web services using little-known GDAL features and modern Perl midd...
Geospatial web services using little-known GDAL features and modern Perl midd...Geospatial web services using little-known GDAL features and modern Perl midd...
Geospatial web services using little-known GDAL features and modern Perl midd...
 
KrakenD API Gateway
KrakenD API GatewayKrakenD API Gateway
KrakenD API Gateway
 
February 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and InsidesFebruary 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and Insides
 
A performance analysis of OpenStack Cloud vs Real System on Hadoop Clusters
A performance analysis of OpenStack Cloud vs Real System on Hadoop ClustersA performance analysis of OpenStack Cloud vs Real System on Hadoop Clusters
A performance analysis of OpenStack Cloud vs Real System on Hadoop Clusters
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
The new (is it really ) api stack
The new (is it really ) api stackThe new (is it really ) api stack
The new (is it really ) api stack
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.ppt
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingApache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 

Plus de Paolo Corti

State of GeoNode 2019
State of GeoNode 2019State of GeoNode 2019
State of GeoNode 2019Paolo Corti
 
Harvard Hypermap: An Open Source Framework for Making the World’s Geospatial ...
Harvard Hypermap: An Open Source Framework for Making the World’s Geospatial ...Harvard Hypermap: An Open Source Framework for Making the World’s Geospatial ...
Harvard Hypermap: An Open Source Framework for Making the World’s Geospatial ...Paolo Corti
 
Making Temporal Search Central in a Spatial Data Infrastructure
Making Temporal Search Central in a Spatial Data InfrastructureMaking Temporal Search Central in a Spatial Data Infrastructure
Making Temporal Search Central in a Spatial Data InfrastructurePaolo Corti
 
Status of WorldMap, 2016
Status of WorldMap, 2016Status of WorldMap, 2016
Status of WorldMap, 2016Paolo Corti
 
GeoNode per il Supporto alle Emergenze Umanitarie
GeoNode per il Supporto alle Emergenze UmanitarieGeoNode per il Supporto alle Emergenze Umanitarie
GeoNode per il Supporto alle Emergenze UmanitariePaolo Corti
 
Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...
Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...
Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...Paolo Corti
 
Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...
Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...
Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...Paolo Corti
 
GeoNode intro and demo
GeoNode intro and demoGeoNode intro and demo
GeoNode intro and demoPaolo Corti
 
GeoNode for Humanitarian Crisis and Risk Reduction
GeoNode for Humanitarian Crisis and Risk ReductionGeoNode for Humanitarian Crisis and Risk Reduction
GeoNode for Humanitarian Crisis and Risk ReductionPaolo Corti
 
L'utilizzo di software fee and open source nello European Forest Fire Informa...
L'utilizzo di software fee and open source nello European Forest Fire Informa...L'utilizzo di software fee and open source nello European Forest Fire Informa...
L'utilizzo di software fee and open source nello European Forest Fire Informa...Paolo Corti
 
Fire news management in the context of the European Forest Fire Information S...
Fire news management in the context of the European Forest Fire Information S...Fire news management in the context of the European Forest Fire Information S...
Fire news management in the context of the European Forest Fire Information S...Paolo Corti
 
Developing Geospatial software with Python, Part 1
Developing Geospatial software with Python, Part 1Developing Geospatial software with Python, Part 1
Developing Geospatial software with Python, Part 1Paolo Corti
 

Plus de Paolo Corti (13)

State of GeoNode 2019
State of GeoNode 2019State of GeoNode 2019
State of GeoNode 2019
 
Harvard Hypermap: An Open Source Framework for Making the World’s Geospatial ...
Harvard Hypermap: An Open Source Framework for Making the World’s Geospatial ...Harvard Hypermap: An Open Source Framework for Making the World’s Geospatial ...
Harvard Hypermap: An Open Source Framework for Making the World’s Geospatial ...
 
Making Temporal Search Central in a Spatial Data Infrastructure
Making Temporal Search Central in a Spatial Data InfrastructureMaking Temporal Search Central in a Spatial Data Infrastructure
Making Temporal Search Central in a Spatial Data Infrastructure
 
Status of WorldMap, 2016
Status of WorldMap, 2016Status of WorldMap, 2016
Status of WorldMap, 2016
 
GeoNode per il Supporto alle Emergenze Umanitarie
GeoNode per il Supporto alle Emergenze UmanitarieGeoNode per il Supporto alle Emergenze Umanitarie
GeoNode per il Supporto alle Emergenze Umanitarie
 
Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...
Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...
Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...
 
Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...
Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...
Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...
 
GeoNode intro and demo
GeoNode intro and demoGeoNode intro and demo
GeoNode intro and demo
 
GeoNode for Humanitarian Crisis and Risk Reduction
GeoNode for Humanitarian Crisis and Risk ReductionGeoNode for Humanitarian Crisis and Risk Reduction
GeoNode for Humanitarian Crisis and Risk Reduction
 
Geonode 2.0
Geonode 2.0Geonode 2.0
Geonode 2.0
 
L'utilizzo di software fee and open source nello European Forest Fire Informa...
L'utilizzo di software fee and open source nello European Forest Fire Informa...L'utilizzo di software fee and open source nello European Forest Fire Informa...
L'utilizzo di software fee and open source nello European Forest Fire Informa...
 
Fire news management in the context of the European Forest Fire Information S...
Fire news management in the context of the European Forest Fire Information S...Fire news management in the context of the European Forest Fire Information S...
Fire news management in the context of the European Forest Fire Information S...
 
Developing Geospatial software with Python, Part 1
Developing Geospatial software with Python, Part 1Developing Geospatial software with Python, Part 1
Developing Geospatial software with Python, Part 1
 

Dernier

The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 

Dernier (20)

The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 

Maintaining SDIs Using Distributed Task Queues

  • 1. Maintaining Spatial Data Infrastructures (SDIs) using distributed task queues Paolo Corti and Ben Lewis Harvard Center for Geographic Analysis 2017 FOSS4G Boston
  • 2. Background Harvard Center for Geographic Analysis • WorldMap http://worldmap.harvard.edu – Biggest GeoNode instance on the planet – https://github.com/cga-harvard/cga-worldmap • HHypermap http://hh.worldmap.harvard.edu – Map service registry – https://github.com/cga-harvard/HHypermap
  • 3. Note Billion Object Platform (BOP) https://github.com/cga-harvard/hhypermap-bop
  • 4. Demo of WorldMap / HHypermap
  • 5.
  • 6. The need for an asynchronous processor In WorldMap and HHypermap there are operations run by users which are time consuming and cannot be handled in the context of a web request ● Harvest the metadata of a service and its layers ● Synchronize the metadata of a new or updated layer to the search engine ● Feed a gazetteer when a new layer is uploaded or updated ● Upload a spatial datasets to the server ● Create a new layer using a table join
  • 7. HTTP request/response cycle must be fast ● In web applications the HTTP request/response cycle can be synchronous as long as there are very quick interactions between the client and the server ● unfortunately there are cases when the cycle become slower ● In these situations the best practice for a web application is to process asynchronously these tasks using a task queue
  • 8. Task Queues Asynchronous processing in a web application can be delegated to a task queue, which is a system for parallel execution of tasks in a non-blocking fashion
  • 10. Asynchronous processing model ● The asynchronous processing model is composed by services that produce processing tasks (producers) and by services which consume and process these tasks (consumers) accordingly ● A message queue is a broker which facilitates message passing by providing a protocol or interface which other services can access. Work can be distributed across threads or machines ● In the context of a web application the producer is the client application that creates messages based on the user interaction. The consumer is a daemon process that can consume the messages and run the needed process
  • 11. Glossary ● Task Queue: a system for parallel execution of tasks in a non-blocking fashion ● Broker or Message Queue: provides a protocol or interface for messages exchanging between different services and applications ● Producer: the code that places the tasks to be executed later in the broker ● Consumer or Worker: takes tasks from the broker and process them ● Exchange: takes a message from a producer and route it to zero or more queues (messages routing) Tasks must be consumed faster than being produced. If not, add more workers
  • 12. Use cases for task queues ● in web applications some process is taking too much time and must be processed asynchronously ● heterogeneous applications/services in a given system architecture need an easy way to reliably communicate between each other ● periodic operations (vs crontab) ● a way of parallelizing tasks in multi processors ● monitor processes and analyze failing tasks (and execute them again)
  • 13. Typical use cases for a task queue in a web application ● Thumbnails generation ● Sending bulk email ● Fetching large amounts of data from APIs ● Performing time-intensive calculations ● Expensive queries ● Search engine index synchronization ● Interaction with another application/service ● Replacing cron jobs (backups, maintenance, etc…)
  • 14. Typical use cases for a task queue in a GIS Portal/SDI ● Upload a shapefile to the server (GeoNode) ● Thumbnails generation for layers and maps (GeoNode) ● OGC services harvesting (Harvard Hypermap) ● Geoprocessing operations ● Geospatial data maintenance
  • 15. Producer, broker and consumer architecture Producer Consumer Producer Broker Consumer Producer Broker Consumer Producer Broker Consumer Producer
  • 16. Message brokers implementations Most of them are open source! ● RabbitMQ (AMQP, STOMP, JMS) ● Apache ActiveMQ (STOMP, JMS) ● Amazon Simple Queue Service (JMS) ● Apache Kafka Several standard protocols: ● AMQP, STOMP, JMS, MSMQ (Microsoft .NET)
  • 17. Tasks (Jobs) queues implementations ● Celery (RabbitMQ, Redis, Amazon SQS, Zookeeper) ● Redis Queue (Redis) ● Resque (Redis) ● Kue (Redis) And many others!
  • 18. Celery ● asynchronous task queue based on distributed message passing ● focused on real-time operation, but supports scheduling as well ● the execution units, called tasks, are executed concurrently on a single or more worker servers ● it supports many message brokers (RabbitMQ, Redis, MongoDB, CouchDB, ...) ● written in Python but it can operate with other languages ● great integration with Django! ● great monitoring tools (Flower, django-celery-results)
  • 19. RabbitMQ ● RabbitMQ is a message broker: it accepts and forwards messages ● most widely deployed open source broker (35k+ deployments) ● support many message protocols ● supported by many operating systems and languages ● Written in Erlang
  • 21. A real use case: Harvard Hypermap HHypermap (Harvard Hypermap) Registry is a platform that manages OWS, Esri REST, and other types of map service harvesting, and orchestration and maintains uptime statistics for services and layers. Where possible, layers are cached by MapProxy. HHypermap provides thousands of remote layers to WorldMap users
  • 24. Need for a task queue SLOW!!!
  • 25. Producer Is the code that places the tasks to be executed later in the broker
  • 27. Consumer Takes tasks from the broker and process them in a worker
  • 30. Workers and threads with htop