SlideShare une entreprise Scribd logo
1  sur  60
Enabling Efficient Cloud Workflows
Eugene Cheipesh
echeipesh@azavea.com
GitHub: @echeipesh
www.azavea.com
Cloud Native Workflows
“Hey, let's not download the whole file every time.”
Cloud Native
● Network is between storage and computation
● Storage is chunked and probably Key/Value
● Computation must scale
● Coordination is limited and risky
Cloud Optimized GeoTiff
● Defines an internal structure of GeoTiffs that is
optimized for streaming reading from blob
storage like S3 or Google Cloud Storage.
● That is, it’s best utilized for requests that use
HTTP GET Range Requests
https://trac.osgeo.org/gdal/wiki/CloudOptimizedGeoTIFF
http://www.cogeo.org/
“A GeoTIFF file is a TIFF 6.0 file, and
inherits the file structure as described
in the corresponding portion of the
TIFF spec. All GeoTIFF specific
information is encoded in several
additional reserved TIFF tags, and
contains no private Image File
Directories (IFD's), binary structures
or other private information invisible
to standard TIFF readers.”
http://geotiff.maptools.org/spec/geotiff2.4.html#2.4
GeoTiff
TIFF
● Tagged Image File Format
● Binary image is separated out into
“segments”
● Contains metadata in a sequence of “tags”
● Can contain metadata for multiple images in
different “image file directories” (IDFs)
GeoTiff Custom Tags
● GeoTiff adds custom tag GeoKeys
○ non-TIFF keys for CRS map space etc.
● This is how we know where the pixels are on
the surface of the earth (the geospatial bit).
http://geotiff.maptools.org/spec/geotiff6.html#6.1
GeoTiff Streaming Read
● Step 1: Fetch the header
● Step 2: Decide what segments you want
● Step 3: Read the segments
Request: BBOX Mean
GeoTIFF: All of the above
COG: IFDs First - Rule #1
TIFF Segments
● Chunks of an image are stored at different
locations. This allows for parts of the image to
be decompressed and loaded separately.
● There are two methods for segment layout:
Striped and Tiled
TIFF: Strip Segments
COG: Strip Segments - Bad
TIFF: Tile Segments
COG: Use Tiled Segments - Rule #2
Request: BBOX Mean at 60m
Request: BBOX Mean at 60m
GeoTIFF: Has Overviews
COG: Provide Overviews - Rule #3
COG Rules
● Put IFDs first
● Sort IFDs by resolution
● Use Tiled segments
● Include overviews (in file)
COG Rules Continued ?
● Provenance Metadata
● Include Summary Statistics
● File naming convention
● Catalog suggestion
COG “Profiles”: Slippy Map
● Restrict CRS as EPSG:3857
● Align GeoTiff segments with slippy tile grid
● Align GeoTiff files with tile boundaries
● Serve tile endpoints directly from COG
COG World Coverage?
“You’re going to need more than one GeoTIFF.”
s3://[bucket]/[Shard]_[GeoHash]_[ISO_TIME]_[ID].tiff
GeoTrellis & COGs
COG: Tile Request
COG Tiler
● Application specific catalog
● Reprojection on read
● Memache gives responsiveness
● Raster metadata pushed into files
GeoTrellis Layers
GeoTrellis: Avro Layers
GeoTrellis Avro Layers
● Many small files, one per tile
○ PUT requests get expensive
● Data duplication hazzard
○ Probably not authoritative
● Limited access
○ Requires GeoTrellis code to read
GeoTrellis: COG Layers
GeoTrellis Strucuted COG
● Base zoom tiles are in base GeoTiff layer
● Introduces “DATE_TIME” tag
● Includes VRT for layer files
● GeoTiff segments match tile size and layout
● Additional zoom levels stored in overviews
● Pyramid up when overviews “run out”
GeoTrellis: COG Pyramid
Unstructured COGs
Cloud Native thinking
GeoTrellis Unstructured COG
● Lots of good raster sources are already COG
● They don’t have a strict layout though
● Lets “inline” ingest process into the batch job
● Slower but reduces data duplication
COG: Tile Request (Again)
GeoTrellis Unstructured COG
● Requires mechanism to query for raster names
○ WFS, STAC, PostgreSQL, Scan
● COG provies “gradual” wins over GeoTiff
● Reproject on read to match workflow
Coregistration Workflow
● Working with multiple raster layers with differing:
○ CRS, Resolution, Extent
● Must pick the working/target layout
● We can “push-down” this process into IO
Coregisteration Workflow
COGs for GeoTrellis
● Treating GeoTiff as a Key/Value tile store
○ With built in notion level of detail
● Turns an ingest workflow into query workflow
○ Use the metadata to plan the read
● Opens up the results
○ QGIS, GeoServer, GDAL
COGs in General
● GeoJSON of raster formats
● Gradated spec
● Set of best practices
● Driven by implementations
THANKS
Q: Is this a good idea ?
Q: So is this a standard ?
Q: Why not use MRF ?

Contenu connexe

Tendances

Terraform GitOps on Codefresh
Terraform GitOps on CodefreshTerraform GitOps on Codefresh
Terraform GitOps on CodefreshCodefresh
 
stupid-simple-kubernetes-final.pdf
stupid-simple-kubernetes-final.pdfstupid-simple-kubernetes-final.pdf
stupid-simple-kubernetes-final.pdfDaniloQueirozMota
 
NGINX Ingress Controller for Kubernetes
NGINX Ingress Controller for KubernetesNGINX Ingress Controller for Kubernetes
NGINX Ingress Controller for KubernetesNGINX, Inc.
 
Prometheus on NKS
Prometheus on NKSPrometheus on NKS
Prometheus on NKSJo Hoon
 
GitOps Toolkit (Cloud Native Nordics Tech Talk)
GitOps Toolkit (Cloud Native Nordics Tech Talk)GitOps Toolkit (Cloud Native Nordics Tech Talk)
GitOps Toolkit (Cloud Native Nordics Tech Talk)Weaveworks
 
Netflix - Productionizing Spark On Yarn For ETL At Petabyte Scale
Netflix - Productionizing Spark On Yarn For ETL At Petabyte ScaleNetflix - Productionizing Spark On Yarn For ETL At Petabyte Scale
Netflix - Productionizing Spark On Yarn For ETL At Petabyte ScaleJen Aman
 
Processing Rasters from Satellites, Drones, & More
Processing Rasters from Satellites, Drones, & MoreProcessing Rasters from Satellites, Drones, & More
Processing Rasters from Satellites, Drones, & MoreSafe Software
 
Introduction to Kubernetes and GKE
Introduction to Kubernetes and GKEIntroduction to Kubernetes and GKE
Introduction to Kubernetes and GKEOpsta
 
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...Marcin Bielak
 
DevOps Fest 2020. Дмитрий Кудрявцев. Реализация GitOps на Kubernetes. ArgoCD
DevOps Fest 2020. Дмитрий Кудрявцев. Реализация GitOps на Kubernetes. ArgoCDDevOps Fest 2020. Дмитрий Кудрявцев. Реализация GitOps на Kubernetes. ArgoCD
DevOps Fest 2020. Дмитрий Кудрявцев. Реализация GitOps на Kubernetes. ArgoCDDevOps_Fest
 
Amazon EKS를 활용한 기계 학습 모델 서버 확장하기 - 유홍근, LG전자 :: AWS Summit Seoul 2019
Amazon EKS를 활용한 기계 학습 모델 서버 확장하기 - 유홍근, LG전자 :: AWS Summit Seoul 2019Amazon EKS를 활용한 기계 학습 모델 서버 확장하기 - 유홍근, LG전자 :: AWS Summit Seoul 2019
Amazon EKS를 활용한 기계 학습 모델 서버 확장하기 - 유홍근, LG전자 :: AWS Summit Seoul 2019Amazon Web Services Korea
 
비용 관점에서 AWS 클라우드 아키텍처 디자인하기::류한진::AWS Summit Seoul 2018
비용 관점에서 AWS 클라우드 아키텍처 디자인하기::류한진::AWS Summit Seoul 2018비용 관점에서 AWS 클라우드 아키텍처 디자인하기::류한진::AWS Summit Seoul 2018
비용 관점에서 AWS 클라우드 아키텍처 디자인하기::류한진::AWS Summit Seoul 2018Amazon Web Services Korea
 
Openshift serverless Solution
Openshift serverless SolutionOpenshift serverless Solution
Openshift serverless SolutionRyan ZhangCheng
 
GitOps is the best modern practice for CD with Kubernetes
GitOps is the best modern practice for CD with KubernetesGitOps is the best modern practice for CD with Kubernetes
GitOps is the best modern practice for CD with KubernetesVolodymyr Shynkar
 
Chcnav moblie mapping solution2
Chcnav moblie mapping solution2Chcnav moblie mapping solution2
Chcnav moblie mapping solution2GeoMedeelel
 
From airflow to google cloud composer
From airflow to google cloud composerFrom airflow to google cloud composer
From airflow to google cloud composerBruce Kuo
 

Tendances (18)

Terraform GitOps on Codefresh
Terraform GitOps on CodefreshTerraform GitOps on Codefresh
Terraform GitOps on Codefresh
 
stupid-simple-kubernetes-final.pdf
stupid-simple-kubernetes-final.pdfstupid-simple-kubernetes-final.pdf
stupid-simple-kubernetes-final.pdf
 
Spark vs Hadoop
Spark vs HadoopSpark vs Hadoop
Spark vs Hadoop
 
NGINX Ingress Controller for Kubernetes
NGINX Ingress Controller for KubernetesNGINX Ingress Controller for Kubernetes
NGINX Ingress Controller for Kubernetes
 
Prometheus on NKS
Prometheus on NKSPrometheus on NKS
Prometheus on NKS
 
GitOps Toolkit (Cloud Native Nordics Tech Talk)
GitOps Toolkit (Cloud Native Nordics Tech Talk)GitOps Toolkit (Cloud Native Nordics Tech Talk)
GitOps Toolkit (Cloud Native Nordics Tech Talk)
 
Netflix - Productionizing Spark On Yarn For ETL At Petabyte Scale
Netflix - Productionizing Spark On Yarn For ETL At Petabyte ScaleNetflix - Productionizing Spark On Yarn For ETL At Petabyte Scale
Netflix - Productionizing Spark On Yarn For ETL At Petabyte Scale
 
Processing Rasters from Satellites, Drones, & More
Processing Rasters from Satellites, Drones, & MoreProcessing Rasters from Satellites, Drones, & More
Processing Rasters from Satellites, Drones, & More
 
Introduction to Kubernetes and GKE
Introduction to Kubernetes and GKEIntroduction to Kubernetes and GKE
Introduction to Kubernetes and GKE
 
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...
 
DevOps Fest 2020. Дмитрий Кудрявцев. Реализация GitOps на Kubernetes. ArgoCD
DevOps Fest 2020. Дмитрий Кудрявцев. Реализация GitOps на Kubernetes. ArgoCDDevOps Fest 2020. Дмитрий Кудрявцев. Реализация GitOps на Kubernetes. ArgoCD
DevOps Fest 2020. Дмитрий Кудрявцев. Реализация GitOps на Kubernetes. ArgoCD
 
Amazon EKS를 활용한 기계 학습 모델 서버 확장하기 - 유홍근, LG전자 :: AWS Summit Seoul 2019
Amazon EKS를 활용한 기계 학습 모델 서버 확장하기 - 유홍근, LG전자 :: AWS Summit Seoul 2019Amazon EKS를 활용한 기계 학습 모델 서버 확장하기 - 유홍근, LG전자 :: AWS Summit Seoul 2019
Amazon EKS를 활용한 기계 학습 모델 서버 확장하기 - 유홍근, LG전자 :: AWS Summit Seoul 2019
 
비용 관점에서 AWS 클라우드 아키텍처 디자인하기::류한진::AWS Summit Seoul 2018
비용 관점에서 AWS 클라우드 아키텍처 디자인하기::류한진::AWS Summit Seoul 2018비용 관점에서 AWS 클라우드 아키텍처 디자인하기::류한진::AWS Summit Seoul 2018
비용 관점에서 AWS 클라우드 아키텍처 디자인하기::류한진::AWS Summit Seoul 2018
 
Openshift serverless Solution
Openshift serverless SolutionOpenshift serverless Solution
Openshift serverless Solution
 
(STG402) Amazon EBS Deep Dive
(STG402) Amazon EBS Deep Dive(STG402) Amazon EBS Deep Dive
(STG402) Amazon EBS Deep Dive
 
GitOps is the best modern practice for CD with Kubernetes
GitOps is the best modern practice for CD with KubernetesGitOps is the best modern practice for CD with Kubernetes
GitOps is the best modern practice for CD with Kubernetes
 
Chcnav moblie mapping solution2
Chcnav moblie mapping solution2Chcnav moblie mapping solution2
Chcnav moblie mapping solution2
 
From airflow to google cloud composer
From airflow to google cloud composerFrom airflow to google cloud composer
From airflow to google cloud composer
 

Similaire à Cloud Optimized GeotTIFFs: enabling efficient cloud workflows

GeoServer on Steroids
GeoServer on Steroids GeoServer on Steroids
GeoServer on Steroids GeoSolutions
 
Geospatial web services using little-known GDAL features and modern Perl midd...
Geospatial web services using little-known GDAL features and modern Perl midd...Geospatial web services using little-known GDAL features and modern Perl midd...
Geospatial web services using little-known GDAL features and modern Perl midd...Ari Jolma
 
Managing 100s of PetaBytes of data in Cloud
Managing 100s of PetaBytes of data in CloudManaging 100s of PetaBytes of data in Cloud
Managing 100s of PetaBytes of data in Cloudlohitvijayarenu
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataSafe Software
 
What's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow MeetupWhat's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow MeetupKaxil Naik
 
State of GeoServer
State of GeoServerState of GeoServer
State of GeoServerJody Garnett
 
Upcoming features in Airflow 2
Upcoming features in Airflow 2Upcoming features in Airflow 2
Upcoming features in Airflow 2Kaxil Naik
 
GeoServer on steroids
GeoServer on steroidsGeoServer on steroids
GeoServer on steroidsGeoSolutions
 
Netflix running Presto in the AWS Cloud
Netflix running Presto in the AWS CloudNetflix running Presto in the AWS Cloud
Netflix running Presto in the AWS CloudZhenxiao Luo
 
LocationTech Projects
LocationTech ProjectsLocationTech Projects
LocationTech ProjectsJody Garnett
 
State of GeoServer - FOSS4G 2016
State of GeoServer - FOSS4G 2016State of GeoServer - FOSS4G 2016
State of GeoServer - FOSS4G 2016GeoSolutions
 
[FOSDEM 2020] Lazy distribution of container images
[FOSDEM 2020] Lazy distribution of container images[FOSDEM 2020] Lazy distribution of container images
[FOSDEM 2020] Lazy distribution of container imagesAkihiro Suda
 
August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation Yahoo Developer Network
 
Encompassing Information Integration
Encompassing Information IntegrationEncompassing Information Integration
Encompassing Information Integrationnguyenfilip
 
Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache HadoopTez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache HadoopDataWorks Summit
 
Introducing Container Technology to TSUBAME3.0 Supercomputer
Introducing Container Technology to TSUBAME3.0 SupercomputerIntroducing Container Technology to TSUBAME3.0 Supercomputer
Introducing Container Technology to TSUBAME3.0 SupercomputerAkihiro Nomura
 

Similaire à Cloud Optimized GeotTIFFs: enabling efficient cloud workflows (20)

Git Presentation
Git PresentationGit Presentation
Git Presentation
 
GeoServer on Steroids
GeoServer on Steroids GeoServer on Steroids
GeoServer on Steroids
 
Geospatial web services using little-known GDAL features and modern Perl midd...
Geospatial web services using little-known GDAL features and modern Perl midd...Geospatial web services using little-known GDAL features and modern Perl midd...
Geospatial web services using little-known GDAL features and modern Perl midd...
 
Managing 100s of PetaBytes of data in Cloud
Managing 100s of PetaBytes of data in CloudManaging 100s of PetaBytes of data in Cloud
Managing 100s of PetaBytes of data in Cloud
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
 
What's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow MeetupWhat's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow Meetup
 
ZGC-SnowOne.pdf
ZGC-SnowOne.pdfZGC-SnowOne.pdf
ZGC-SnowOne.pdf
 
State of GeoServer
State of GeoServerState of GeoServer
State of GeoServer
 
Upcoming features in Airflow 2
Upcoming features in Airflow 2Upcoming features in Airflow 2
Upcoming features in Airflow 2
 
GeoServer on steroids
GeoServer on steroidsGeoServer on steroids
GeoServer on steroids
 
Netflix running Presto in the AWS Cloud
Netflix running Presto in the AWS CloudNetflix running Presto in the AWS Cloud
Netflix running Presto in the AWS Cloud
 
LocationTech Projects
LocationTech ProjectsLocationTech Projects
LocationTech Projects
 
State of GeoServer - FOSS4G 2016
State of GeoServer - FOSS4G 2016State of GeoServer - FOSS4G 2016
State of GeoServer - FOSS4G 2016
 
week1slides1704202828322.pdf
week1slides1704202828322.pdfweek1slides1704202828322.pdf
week1slides1704202828322.pdf
 
Git-Basics
Git-BasicsGit-Basics
Git-Basics
 
[FOSDEM 2020] Lazy distribution of container images
[FOSDEM 2020] Lazy distribution of container images[FOSDEM 2020] Lazy distribution of container images
[FOSDEM 2020] Lazy distribution of container images
 
August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation
 
Encompassing Information Integration
Encompassing Information IntegrationEncompassing Information Integration
Encompassing Information Integration
 
Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache HadoopTez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop
 
Introducing Container Technology to TSUBAME3.0 Supercomputer
Introducing Container Technology to TSUBAME3.0 SupercomputerIntroducing Container Technology to TSUBAME3.0 Supercomputer
Introducing Container Technology to TSUBAME3.0 Supercomputer
 

Dernier

Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonApplitools
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shardsChristopher Curtin
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITmanoharjgpsolutions
 
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxThe Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxRTS corp
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptxVinzoCenzo
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jNeo4j
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingShane Coughlan
 
Patterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencePatterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencessuser9e7c64
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slidesvaideheekore1
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldRoberto Pérez Alcolea
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogueitservices996
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...OnePlan Solutions
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...OnePlan Solutions
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolsosttopstonverter
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecturerahul_net
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorTier1 app
 

Dernier (20)

Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh IT
 
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxThe Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
 
Patterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencePatterns for automating API delivery. API conference
Patterns for automating API delivery. API conference
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogue
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration tools
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryError
 

Cloud Optimized GeotTIFFs: enabling efficient cloud workflows

Notes de l'éditeur

  1. This is original formalization of the idea by Even Rouault, its authoritative. Includes benchmarks using GDAL showing the difference in range read speeds.
  2. This is site managed by Chris Holmes, it is better overview and critical includes links to implementations and tools relevant to COG.
  3. TIFF header. Arrows are offsets in the TIFF file No order is implied Header is minimal does not have image information, IDFs have image information
  4. All of these are valid structures for TIFF. Left, easy to read, all the headers are at the top. (IDF should be seen as headers) Center, looks good but not clear what the advantage of this organization is. Right, easy to write if you don’t know when the image stream will end.
  5. Only the left organization is good for us because we get all the relevant header information at the top.
  6. COG spec is not firm about compression, deflate is common but slow, LZW is probably the most conservative choice. In any case the choice must be something that is understood by GDAL across all installs to preserve interoperability.
  7. In reality strips can even be one or two pixels high for large files.
  8. Fetching larger AOI will require a lot of IO time and network traffic
  9. Using overviews when downsampled information is sufficient, it very often is, saves the IO.
  10. Provenance is a huge problem, temptation is to use TIFF tags to keep track, but no standard exists yet. Histograms are super useful if you ever intend to view the file File naming convention seems tempting but impossible to enforce and thus expect Catalogs are the only other option
  11. This COG profile optimizes the IO to 1 request = 1 segment No reason to require this for all COGs Conclusion is that there is room to optimize COG layout for each system or application
  12. STAC static files provide reliant way to serve some index which is described using GeoJSON files REST and WFS 3.0 Services on top of static index provide an integration point for application
  13. You can play games with the names of the GeoTIFF files to come up with a scheme of finding the file we want only from request information, ID and geospatial bbox. You should consider sharding the scheme to avoid hotspotting your Blob store like S3. (See S3 best practices). People from GeoWave/GeoMesa/GeoTrellis project will have a lot of lessons about building static index like this that they would be happy to share. Also there are reusable components available from those projects.
  14. RasterFoundry is the raster management and manipulation system built on top of GeoTrellis.
  15. Level 0 support for COGs is serving tiles. RF does this using GeoTrellis.
  16. Tile request from RasterFoundry. Catalog is application specific COG Header is hot information, must be cached first Fetching COG segments may be too slow for quick screen refreshes
  17. An alternative view of tile request. Highlight here is that we’re reprojecting the COG on the fly per tile request.
  18. GeoTrellis layers are the constructs that represent a distributed raster collection that we can work over in cluster memory.
  19. Avro layers work really great on sorted stores like Accumulo but on S3 we end up with lots of small files.
  20. COG becomes our meta tile, its great. As a bonus the layers are now accessible from non GeoTrellis applications. This has potential to drastically reduce the data duplication.
  21. Note we had to add the DATE_TIME tag to keep track of the actual time. VRT files tie them together and make them accessible.
  22. There is only so much you can resample a set of tiles before you don’t have enough information to save At that point you need to collect the tops of the pyramid and reshuffle them into higher levels GeoTrellis handles this process when it writes out pyramided COG layers.
  23. Snapshot from OSM / RasterFrames / GeoTrellis machine learning project Best part is that now every layer written by GeoTrellis is accessible through QGIS This is a workflow where we combine source data, produces GeoTrellis layer and published OSM layer
  24. Thinking about COGs as mini-image databases
  25. We can put tile reading process in front of a distributed batch job. Instead of running it per request we run it per record. This effectively inlines the ingest process into the batch job itself.
  26. Same concerns that we talked about for making viewable COG catalogs are in place for making distributed imagery piplines sourcing COGs.
  27. Often we need more than one layer to do something interesting
  28. Fetch the metadata from the catalog to find what we want Fetch the metadata from the COGs to find out what it is Filter the potential IO using the COG headers Fetch the segments as tiles we know we need in a way we want
  29. Proof is in the pudding This GeoTrellis Landsat tiler coregisters scenes from multiple CRCs Reponsivness here gives us the idea that cost in a batch process is actually minimal.