This document discusses new features and approaches for Lucene/Solr spatial search in 2015. It summarizes new capabilities like heatmaps, GeoJSON support, and more accurate indexed geometries. It also covers new approaches like using dimensional values indexes for spatial data and a new GeoPointField. Some pending spatial TODOs are outlined, like JTS-free polygon support in Spatial4j and Geo3D point field adapters. The document concludes by providing contact information for the author to discuss Lucene/Solr guidance or custom development needs.
2. About David Smiley
Freelance Search Developer/Consultant
Expert Lucene/Solr development skills,
advice (consulting), training
Java, spatial, and full-stack experience
Apache Lucene/Solr committer & PMC member
Primary author of “Apache Solr Enterprise Search Server”
3. More Spatial Contributors!
Spatial4j Lucene Solr
David Smiley ✔️ ✔️ ✔️
Ryan McKinley ✔️
Justin Deoliveira ✔️
Mike McCandless ✔️
Nick Knize ✔️
Karl Wright ✔️
Ishan Chattopadhyaya ✔️
5. Lucene’s Spatial Module
• Multiple approaches to index spatial data
abstract class SpatialStrategy
(5+ concrete implementations)
• RecursivePrefixTreeStrategy (RPT) is most prominent, versatile
• Grid based
• Uses Spatial4j lib for shapes, distance calculations, and WKT
• Uses JTS Topology Suite lib for polygons
[Diagram: Shape; SpatialPrefixTree / Cell (Geohash | Quad); PrefixTreeStrategy with IntersectsPrefixTreeFilter, Contains…, Within…]
6. Topic: New Features
Heatmaps / grid faceting — Lucene, Solr
Surface-of-sphere shapes (Geo3d) — Lucene
Accurate indexed geometries — Lucene, Solr
GeoJSON read/write — Spatial4j
7. Heatmaps: Spatial Grid Faceting
Spatial density summary grid faceting,
also useful for point-plotting search results
Usually rendered with a gradient radius
Lucene & Solr APIs
Scalable & fast usually…
v5.2
8. Heatmaps Under the Hood
Requires a PrefixTreeStrategy Lucene field — grid based
Algorithm enumerates the underlying cell/terms and accumulates
the counter in a corresponding grid
Conceptually facet.method=enum for spatial
Works on non-point indexed shapes too
Complexity: O(cells * cellDepthFactor) not O(docs)
No/low memory; mainly the grid of integers
Solr will distribute to shards and merge
Could be faster still; a BFS (vs DFS) layout would be perfect
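The accumulation step described above can be pictured with a small sketch. This is not Lucene's heatmap code: the class `HeatmapSketch`, the `CellCount` type, and the idea that each indexed cell has already been resolved to a range of heatmap grid squares are all assumptions made for illustration. It shows why the cost follows the number of cells rather than the number of documents, and why the only real memory is the grid of integers.

```java
/** Hypothetical sketch of heatmap accumulation; names and types are illustrative only. */
final class HeatmapSketch {

    /** An indexed cell, expressed as an inclusive range of heatmap grid squares, with its doc count. */
    static final class CellCount {
        final int minCol, maxCol, minRow, maxRow, docCount;

        CellCount(int minCol, int maxCol, int minRow, int maxRow, int docCount) {
            this.minCol = minCol;
            this.maxCol = maxCol;
            this.minRow = minRow;
            this.maxRow = maxRow;
            this.docCount = docCount;
        }
    }

    /** Adds every cell's doc count to each grid square the cell covers. */
    static int[][] accumulate(int columns, int rows, CellCount[] cells) {
        int[][] grid = new int[rows][columns];
        for (CellCount c : cells) {
            // Clamp to the grid so cells that overlap the edge are only partly counted.
            int r0 = Math.max(0, c.minRow), r1 = Math.min(rows - 1, c.maxRow);
            int c0 = Math.max(0, c.minCol), c1 = Math.min(columns - 1, c.maxCol);
            for (int r = r0; r <= r1; r++) {
                for (int col = c0; col <= c1; col++) {
                    grid[r][col] += c.docCount;
                }
            }
        }
        return grid;
    }

    public static void main(String[] args) {
        CellCount[] cells = {
            new CellCount(0, 0, 0, 0, 3), // a cell at the grid's own resolution
            new CellCount(0, 1, 0, 1, 2)  // a coarser cell covering four grid squares
        };
        for (int[] row : accumulate(4, 2, cells)) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

A coarser cell (for example from a non-point shape) simply contributes to every grid square it covers, which is the assumption this sketch makes about how non-point data shows up in the counts.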
9. Solr Heatmap Faceting
On an RPT field
(SpatialRecursivePrefixTreeFieldType)
prefixTree="packedQuad" (optional)
Query:
/select?facet=true
&facet.heatmap=geo_rpt
&facet.heatmap.geom=
["-180 -90" TO "180 90"]
&facet.heatmap.format=ints2D or png
// Normal Solr response...
"facet_counts":{
... // facet response fields
"facet_heatmaps":{
"geo_rpt":[
"gridLevel",2,
"columns",32,
"rows",32,
"minX",-180.0,
"maxX",180.0,
"minY",-90.0,
"maxY",90.0,
"counts_ints2D", [null, null, [0, 0, ... ]]
...
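The `counts_ints2D` value in the response above is sparse: a row that is entirely zero appears as `null`. A client typically expands it before rendering. The sketch below assumes the JSON has already been parsed into an `int[][]` with `null` rows; `HeatmapCounts`, `densify`, and `total` are hypothetical helper names, not part of Solr or SolrJ.

```java
/** Hypothetical client-side helper for a parsed counts_ints2D grid. */
final class HeatmapCounts {

    /** Expands a sparse grid into rows x columns; a null row (or null grid) is treated as all zeros. */
    static int[][] densify(int[][] sparse, int rows, int columns) {
        int[][] dense = new int[rows][columns];
        if (sparse == null) {
            return dense;
        }
        for (int r = 0; r < rows && r < sparse.length; r++) {
            if (sparse[r] != null) {
                System.arraycopy(sparse[r], 0, dense[r], 0, Math.min(columns, sparse[r].length));
            }
        }
        return dense;
    }

    /** Sums every square of the grid. */
    static long total(int[][] dense) {
        long sum = 0;
        for (int[] row : dense) {
            for (int count : row) {
                sum += count;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[][] dense = densify(new int[][] {null, null, {0, 7, 1}}, 3, 3);
        System.out.println("total = " + total(dense));
    }
}
```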
11. Geo3D: Shapes on the Surface of a Sphere
… or Ellipsoid of configurable axis
Not a general 3D space geometry lib
Internally uses geocentric X, Y, Z coordinates (hence 3D) with
3D planar geometry mathematics
Shapes: Point, Lat-Lon Rect, Circle, Polygons, Path (LineString)
with optional buffer
Distance computations: Arc (angular or surface), Linear (straight-line), Normal
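To make the geocentric idea concrete, here is a minimal sketch on a unit sphere. It is not the Geo3D API: `GeocentricSketch` and its methods are hypothetical, and a perfect sphere is assumed where Geo3D also supports an ellipsoid. It shows the arc and linear distances, and how a circle membership test reduces to one dot product once the cosine of the radius is known.

```java
/** Hypothetical unit-sphere sketch of geocentric X, Y, Z math; not the Geo3D API. */
final class GeocentricSketch {

    /** Converts latitude/longitude in degrees to a unit vector {x, y, z}. */
    static double[] toUnitVector(double latDeg, double lonDeg) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        double cosLat = Math.cos(lat);
        return new double[] {cosLat * Math.cos(lon), cosLat * Math.sin(lon), Math.sin(lat)};
    }

    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    /** Arc (angular) distance in radians between two unit vectors. */
    static double arcDistance(double[] a, double[] b) {
        // Clamp guards against rounding that pushes the dot product just outside [-1, 1].
        return Math.acos(Math.max(-1.0, Math.min(1.0, dot(a, b))));
    }

    /** Linear (straight-line chord) distance between two unit vectors. */
    static double linearDistance(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    /** Circle membership with no per-point trigonometry; cosRadius is computed once per query. */
    static boolean withinCircle(double[] center, double cosRadius, double[] point) {
        return dot(center, point) >= cosRadius;
    }

    public static void main(String[] args) {
        double[] a = toUnitVector(0, 0);
        double[] b = toUnitVector(0, 90);
        System.out.println("arc = " + arcDistance(a, b) + ", linear = " + linearDistance(a, b));
    }
}
```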
12. All 2D Maps of the Earth Distort Straight Lines
A straight bird-flies path from Anchorage to Miami doesn’t actually cross the ocean!
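The distortion can be checked numerically. The sketch below compares the midpoint of the great-circle path with the midpoint of the straight line drawn on a flat lat/lon map. The city coordinates are approximate values supplied for illustration, a unit sphere is assumed, and `GreatCircleMidpoint` is a hypothetical name.

```java
/** Hypothetical sketch: great-circle midpoint on a unit sphere. */
final class GreatCircleMidpoint {

    private static double[] unit(double latDeg, double lonDeg) {
        double lat = Math.toRadians(latDeg), lon = Math.toRadians(lonDeg);
        return new double[] {Math.cos(lat) * Math.cos(lon), Math.cos(lat) * Math.sin(lon), Math.sin(lat)};
    }

    /** Returns {latDeg, lonDeg} of the great-circle midpoint; undefined for antipodal points. */
    static double[] midpoint(double lat1, double lon1, double lat2, double lon2) {
        double[] a = unit(lat1, lon1), b = unit(lat2, lon2);
        // The normalized sum of two unit vectors bisects the arc between them.
        double x = a[0] + b[0], y = a[1] + b[1], z = a[2] + b[2];
        double norm = Math.sqrt(x * x + y * y + z * z);
        return new double[] {Math.toDegrees(Math.asin(z / norm)), Math.toDegrees(Math.atan2(y, x))};
    }

    public static void main(String[] args) {
        // Approximate coordinates (assumed): Anchorage 61.22N 149.90W, Miami 25.76N 80.19W.
        double[] mid = midpoint(61.22, -149.90, 25.76, -80.19);
        double flatMapLat = (61.22 + 25.76) / 2;
        System.out.println("great-circle midpoint: lat " + mid[0] + ", lon " + mid[1]);
        System.out.println("flat-map midpoint latitude: " + flatMapLat);
    }
}
```

The great-circle midpoint sits several degrees north of the flat-map midpoint, which is the bulge that a 2D projection hides.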
13. Geo3D, continued…
Benefits
Inherently more accurate than 2D projected spatial
especially for big shapes or near poles
Many computations are fast; no expensive trigonometry
An alternative to JTS without the LGPL license (still)
Has its own Lucene module (spatial3d), and thus its own jar file
Maven groupId: org.apache.lucene, artifactId: lucene-spatial3d
No Solr integration yet; pending more Spatial4j integration
In progress!
14. Index & Search Geo3D Geometries
Spatial4j Geo3dShape wrapper with RPT (v5.2)
• In Lucene-spatial for now
• Index Geo3d shapes
• Limited to grid accuracy
• Query by Geo3d shape
• Limited distance sort
• Heatmaps
Geo3DPointField & PointInGeo3DShapeQuery (v5.4)
• Based on a 3D BKD index
• In spatial3d module
• Index points-only
• Query by Geo3d shape
• No distance sort
• Leaner & faster than RPT?
15. RPT/SpatialPrefixTrees and Accuracy
RecursivePrefixTree (RPT) uses Lucene’s index as a PrefixTree
Thus represents shapes as grid cells of varying precision by prefix
Example, a point shape:
D, DR, DRT, DRT2, DRT2Y
More accuracy scales
Example, a polygon shape:
Too many to list… 508 cells
More accuracy does NOT scale
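The point example above is a geohash and its prefixes. A minimal sketch of how such terms could be produced follows; it implements the standard geohash encoding (lowercase base-32, longitude bit first) and is not Lucene's GeohashPrefixTree. `GeohashPrefixes` is a hypothetical name.

```java
/** Hypothetical sketch: standard geohash encoding plus the prefix terms for a point. */
final class GeohashPrefixes {

    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Encodes a point by repeatedly halving the lon/lat ranges, 5 bits per character. */
    static String encode(double lat, double lon, int length) {
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        StringBuilder hash = new StringBuilder(length);
        boolean lonTurn = true; // bits alternate, starting with longitude
        int bits = 0, value = 0;
        while (hash.length() < length) {
            if (lonTurn) {
                double mid = (minLon + maxLon) / 2;
                if (lon >= mid) { value = (value << 1) | 1; minLon = mid; }
                else            { value <<= 1;              maxLon = mid; }
            } else {
                double mid = (minLat + maxLat) / 2;
                if (lat >= mid) { value = (value << 1) | 1; minLat = mid; }
                else            { value <<= 1;              maxLat = mid; }
            }
            lonTurn = !lonTurn;
            if (++bits == 5) {
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }

    /** One term per precision level: "d", "dr", "drt", ... */
    static String[] prefixes(String hash) {
        String[] terms = new String[hash.length()];
        for (int i = 0; i < terms.length; i++) {
            terms[i] = hash.substring(0, i + 1);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", prefixes(encode(57.64911, 10.40744, 5))));
    }
}
```

A point costs one term per level, so extra precision is cheap. A polygon needs many cells at the finest level to follow its edge, which is why its cell count grows quickly with accuracy.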
16. Combining RPT with Serialized Geometry
RPT (RecursivePrefixTreeStrategy) is the grid index (inaccurate)
SDV (SerializedDVStrategy) stores serialized geometry (accurate)
RPT + SDV → CompositeSpatialStrategy
Accuracy & speed & smaller indexes
Optimized intersects predicate avoids some geometry checks
> 80% faster intersects queries, 75% smaller index
Solr adapter: RptWithGeometrySpatialField
Compatible with the Heatmaps feature
Includes a shape cache (per-segment); configurable
v5.2
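The two-phase idea can be sketched without Lucene. In this illustration a coarse approximation (here just a bounding box, standing in for the grid index) filters candidates, and the exact geometry is consulted only when the coarse answer is inconclusive. The shortcut shown, skipping the exact check when the query fully contains the coarse approximation, is one logically safe optimization chosen for the example; it is not claimed to be the optimization CompositeSpatialStrategy uses. All names are hypothetical.

```java
/** Hypothetical sketch of coarse-then-exact intersects matching. */
final class TwoPhaseIntersects {

    static final class Rect {
        final double minX, minY, maxX, maxY;

        Rect(double minX, double minY, double maxX, double maxY) {
            this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
        }

        boolean intersects(Rect o) {
            return minX <= o.maxX && o.minX <= maxX && minY <= o.maxY && o.minY <= maxY;
        }

        boolean contains(Rect o) {
            return minX <= o.minX && o.maxX <= maxX && minY <= o.minY && o.maxY <= maxY;
        }
    }

    /** The "accurate" document geometry: a circle. */
    static final class Circle {
        final double x, y, r;

        Circle(double x, double y, double r) { this.x = x; this.y = y; this.r = r; }

        Rect boundingBox() { return new Rect(x - r, y - r, x + r, y + r); }

        /** Exact test: distance from the center to the nearest point of the rectangle. */
        boolean intersects(Rect q) {
            double dx = x - Math.max(q.minX, Math.min(x, q.maxX));
            double dy = y - Math.max(q.minY, Math.min(y, q.maxY));
            return dx * dx + dy * dy <= r * r;
        }
    }

    /** Returns {matchCount, exactChecksPerformed}. */
    static int[] search(Circle[] docs, Rect query) {
        int matches = 0, exactChecks = 0;
        for (Circle doc : docs) {
            Rect coarse = doc.boundingBox();
            if (!coarse.intersects(query)) continue;             // phase 1: certain miss
            if (query.contains(coarse)) { matches++; continue; } // phase 1: certain hit
            exactChecks++;                                       // phase 2: verify with real geometry
            if (doc.intersects(query)) matches++;
        }
        return new int[] {matches, exactChecks};
    }

    public static void main(String[] args) {
        Circle[] docs = {new Circle(5, 5, 1), new Circle(11, 11, 1), new Circle(20, 20, 1)};
        int[] result = search(docs, new Rect(0, 0, 10, 10));
        System.out.println("matches=" + result[0] + ", exact checks=" + result[1]);
    }
}
```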
18. DimensionalValues (BKD Index)
New Lucene index type for numeric values
Including multi-dimensional values!
Old: IntField, FloatField etc.; trie indexing is now legacy
New: DimensionalIntField, DimensionalFloatField, etc. with DimensionalRangeQuery, …
Implemented using a BKD Index
Paper: https://www.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf
Much faster and more compact than trie/prefix-tree based indexes
Whither term auto-prefixing? LUCENE-5879 Defunct?
v6.0
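For intuition about why a block k-d tree answers range queries quickly, here is a minimal in-memory sketch. It is not Lucene's BKD implementation: it alternates split dimensions rather than choosing one adaptively, keeps everything on the heap, and uses a tiny leaf size. `BlockKdSketch` is a hypothetical name. The point is the pruning: whole subtrees are skipped or counted from their bounding boxes alone.

```java
/** Hypothetical in-memory block k-d tree sketch; assumes at least one point. */
final class BlockKdSketch {

    private static final int LEAF_SIZE = 4;

    private final double[] min, max;   // bounding box of every point under this node
    private final int size;            // number of points under this node
    private final double[][] leaf;     // points, for leaf nodes only
    private final BlockKdSketch left, right;

    static BlockKdSketch build(double[][] points) {
        double[][] copy = points.clone(); // sorting must not reorder the caller's array
        return new BlockKdSketch(copy, 0, copy.length, 0);
    }

    private BlockKdSketch(double[][] pts, int from, int to, int depth) {
        int dims = pts[from].length;
        size = to - from;
        min = new double[dims];
        max = new double[dims];
        java.util.Arrays.fill(min, Double.POSITIVE_INFINITY);
        java.util.Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (int i = from; i < to; i++) {
            for (int d = 0; d < dims; d++) {
                min[d] = Math.min(min[d], pts[i][d]);
                max[d] = Math.max(max[d], pts[i][d]);
            }
        }
        if (size <= LEAF_SIZE) {
            leaf = java.util.Arrays.copyOfRange(pts, from, to);
            left = null;
            right = null;
        } else {
            final int dim = depth % dims; // simplification: alternate dimensions
            java.util.Arrays.sort(pts, from, to, (a, b) -> Double.compare(a[dim], b[dim]));
            int mid = (from + to) >>> 1;
            leaf = null;
            left = new BlockKdSketch(pts, from, mid, depth + 1);
            right = new BlockKdSketch(pts, mid, to, depth + 1);
        }
    }

    /** Counts points inside the inclusive box [qMin, qMax]. */
    int count(double[] qMin, double[] qMax) {
        boolean inside = true;
        for (int d = 0; d < min.length; d++) {
            if (max[d] < qMin[d] || min[d] > qMax[d]) return 0;        // disjoint: prune
            if (min[d] < qMin[d] || max[d] > qMax[d]) inside = false;
        }
        if (inside) return size;                                       // fully covered: no point checks
        if (leaf == null) return left.count(qMin, qMax) + right.count(qMin, qMax);
        int hits = 0;
        for (double[] p : leaf) {
            boolean ok = true;
            for (int d = 0; d < p.length && ok; d++) {
                ok = p[d] >= qMin[d] && p[d] <= qMax[d];
            }
            if (ok) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        double[][] pts = {{1, 1}, {2, 5}, {3, 3}, {8, 2}, {9, 9}, {4, 7}};
        System.out.println(build(pts).count(new double[] {2, 2}, new double[] {5, 8}));
    }
}
```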
19. …continued
Multiple Fields/Queries using this:
(1D) DimensionalIntField
(2D) DimensionalLatLonField
(3D) Geo3DPointField (previously described)
And you can write your own
20. DimensionalValues 1D
Efficient range search on single/multi-valued numbers or terms
Could be used for numbers, dates, IPv6 bytes, …
Alternatives: LegacyIntField etc. (trie), DateRangeField (RPT)
Would love to see a benchmark!
How-To:
Dimensional___Field: Int, Long, Float, Double, Binary
DimensionalRangeQuery (or DimensionalQuery?)
v5.3
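One way a single byte-oriented index can serve ints, dates, and IPv6 addresses alike is to store every value as fixed-width bytes whose unsigned byte order matches the value's natural order. The sketch below assumes that approach for signed ints; `SortableBytes` and its methods are hypothetical names used for illustration, not a description of the Dimensional field API.

```java
/** Hypothetical sketch: order-preserving byte encoding for signed ints. */
final class SortableBytes {

    /** Flips the sign bit so negative values sort before positive ones, then writes big-endian. */
    static byte[] intToSortableBytes(int value) {
        int flipped = value ^ 0x80000000;
        return new byte[] {
            (byte) (flipped >>> 24), (byte) (flipped >>> 16), (byte) (flipped >>> 8), (byte) flipped
        };
    }

    /** Compares equal-length arrays byte by byte, treating each byte as unsigned. */
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) return diff;
        }
        return 0;
    }

    /** A range test on encoded values needs only byte comparisons. */
    static boolean inRange(byte[] value, byte[] lower, byte[] upper) {
        return compareUnsigned(lower, value) <= 0 && compareUnsigned(value, upper) <= 0;
    }

    public static void main(String[] args) {
        byte[] v = intToSortableBytes(-5);
        System.out.println(inRange(v, intToSortableBytes(-10), intToSortableBytes(3)));
    }
}
```

An IPv6 address is already 16 big-endian bytes, so under this scheme it would need no transformation at all.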
21. DimensionalValues 2D: DimensionalLatLonField
Efficient 2D geospatial point index
Alternative to RPT or GeoPointField
In lucene-sandbox
No Lucene-spatial module SpatialStrategy wrappers yet, thus neither Spatial4j Shape integration nor Solr integration yet
How-To:
Index: DimensionalLatLonField
Query:
DimensionalPointInBBoxQuery
DimensionalPointInPolygonQuery
point-radius (circle): in progress, LUCENE-6698
Cool video: https://www.youtube.com/watch?v=x9WnzOvsGKs
v5.3
22. GeoPointField
2D geospatial point field
Indexed point-only data, single/multi-valued
Spatial 2D Trie/PrefixTree terms index
But not affiliated with Lucene-spatial SpatialPrefixTree/RPT
Configurable 2^x grid size (defaults to 512)
Compact bit interleaved Z-order encoding
Re-uses much of Lucene’s numeric precisionStep & MultiTermQuery logic
2-phase grid/postings then doc-values algorithm
v5.3
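Bit-interleaved Z-order (Morton) encoding can be sketched in a few lines. This is a generic illustration, not GeoPointField's actual encoder: the 32-bit quantization, the choice to put longitude in the even bit positions, and the name `ZOrderSketch` are all assumptions, and inputs are assumed to lie within the valid lon/lat ranges.

```java
/** Hypothetical sketch of Z-order (Morton) encoding for a lon/lat point. */
final class ZOrderSketch {

    /** Spreads the low 32 bits of v across the even bit positions of a long. */
    static long spread(long v) {
        v &= 0xFFFFFFFFL;
        v = (v | (v << 16)) & 0x0000FFFF0000FFFFL;
        v = (v | (v << 8))  & 0x00FF00FF00FF00FFL;
        v = (v | (v << 4))  & 0x0F0F0F0F0F0F0F0FL;
        v = (v | (v << 2))  & 0x3333333333333333L;
        v = (v | (v << 1))  & 0x5555555555555555L;
        return v;
    }

    /** Interleaves two 32-bit values: x in even bits, y in odd bits. */
    static long interleave(long x, long y) {
        return spread(x) | (spread(y) << 1);
    }

    /** Maps value in [min, max] onto the integer range [0, 2^32 - 1]. */
    static long quantize(double value, double min, double max) {
        return (long) ((value - min) / (max - min) * 4294967295.0);
    }

    /** Encodes a point as a single 64-bit sortable key. */
    static long encode(double lon, double lat) {
        return interleave(quantize(lon, -180, 180), quantize(lat, -90, 90));
    }

    public static void main(String[] args) {
        System.out.println(Long.toBinaryString(interleave(3, 0)));
        System.out.println(Long.toHexString(encode(-71.06, 42.36)));
    }
}
```

Because the high-order bits alternate between the two axes, keys that share a long prefix fall in the same grid cell, which is what lets a trie of key prefixes act as a spatial grid.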
23. …continued
Has no affiliation with Spatial4j, RPT, JTS, or SpatialStrategy
No Heatmaps, No custom Shape implementations
No Solr support yet
No dependencies
Easy to use compared to RPT; simpler internally too
How-To:
doc.add(new GeoPointField(name, lon, lat, Store.YES))
GeoPointDistanceQuery (sphere only) or GeoPointInBBoxQuery or
GeoPointInPolygonQuery or GeoPointDistanceRangeQuery
Cool video: https://www.youtube.com/watch?v=l2zB9TDUAL4
24. Topic: Some Pending Spatial TODOs
Spatial4j
• JTS-free polygon API (in-progress)
• Geo3D adapter
Lucene
• FlexPrefixTree (LUCENE-4922)
• Heatmap optimized FlexPrefixTree (Breadth First Search layout)
• SpatialStrategy adapters for GeoPointField, DimensionalLatLonField, Geo3DPointField
Solr
• Better spatial Solr QParsers (SOLR-4242)
• GeoJSON parsing
• More FieldType adapters for latest Lucene spatial
• Nearest-neighbor search
• DateRangeField faceting
25. That’s all for now; thanks for coming!
Need Lucene/Solr guidance or custom development?
Contact me!
Email: dsmiley@apache.org
LinkedIn: http://www.linkedin.com/in/davidwsmiley
G+: +DavidSmiley
Twitter: @DavidWSmiley
Editor's notes
There was a “hit by a bus” syndrome until now.
I’m going to be presenting a lot of stuff I did not work on.
And list what this talk is *not*. Not a spatial overview
Also, “Spaceman Steve” is a freelancer offering to do heatmap and other Solr/Geo work.
Thanks to Karl Wright (Nokia/HERE)!
The only surface-of-sphere shape supported prior to Geo3D was a circle.
From https://www.reddit.com/r/MapPorn/comments/1p8dba/you_can_theoretically_drive_in_a_straight_line/
But don’t harp on this too much; 2D spatial is still useful.
Geo3dShape w/ RPT more flexible
Geo3DPointField is new & faster; more to come
Neither has Solr support yet.
Suggest QuadPrefixTree for non-point indices like this.
Also: Supports most spatial predicates
Theoretically could work well for point-data too; I haven’t tried.
This is for 6.0. Some BKD versions existed in recent 5.x releases in lucene-sandbox
Will include non-range (exact lookup) optimization / API convenience.
Unknown if this is faster for date ranges than DateRangePrefixTree. Likely smaller indices.
Will point-radius (circle) have a flat and surface-of-sphere version?
Perf? Likely faster than RPT. Indexing is certainly cheaper than Quad or even Geohash when configured with a high precisionStep.
No promises!
Some of these are new, brought on by new features. (e.g. Lucene then Solr adapter).
This list is biased to my interests/awareness.