SlideShare une entreprise Scribd logo
1  sur  101
Advanced Procedural Rendering
in DirectX 11

Matt Swoboda
Principal Engineer, SCEE R&D PhyreEngine™ Team
Demo Coder, Fairlight
Aim
●   More dynamic game worlds.
Demoscene?
make demos
   ●   “Like games, crossed with music videos”


inear, non-interactive, scripted

ll generated in real-time
   ●   On consumer-level PC hardware
DirectX 11?
irectX 9 is very old
   ●   We are all very comfortable with it..
   ●   .. But does not map well to modern graphics hardware

irectX 11 lets you use same hardware smarter
   ●   Compute shaders
   ●   Much improved shading language
   ●   GPU-dispatched draw calls
   ●   .. And much more
Procedural Mesh Generation


A reasonable result from random formulae
          (Hopefully a good result from sensible formulae)
Signed Distance Fields (SDFs)

istance function:
  ●   Returns the closest distance to the surface from a
      given point


igned distance function:
  ●   Returns the closest distance from a point to the
      surface, positive if the point is outside the shape and
      negative if inside
Signed Distance Fields
●   Useful tool for procedural geometry creation
    ●   Easy to define in code ..
    ●   .. Reasonable results from “random formulae”
●   Can create from meshes, particles, fluids, voxels
●   CSG, distortion, repeats, transforms all easy
●   No concerns with geometric topology
    ●   Just define the field in space, polygonize later
A Box
Box(pos, size)
{
  a = abs(pos-size) - size;
  return max(a.x,a.y,a.z);
}




                              *Danger: may not actually compile
Cutting with Booleans
d = Box(pos)
c = fmod(pos * A, B)
subD = max(c.y,min(c.y,c.z))
d = max(d, -subD)
More Booleans
d = Box(pos)
c = fmod(pos * A, B)
subD = max(c.y,min(c.y,c.z))
subD = min(subD,cylinder(c))
subD = max(subD, Windows())
d = max(d, -subD)
Repeated Booleans
d = Box(pos)
e = fmod(pos + N, M)
floorD = Box(e)
d = max(d, -floorD)
Cutting Holes
d = Box(pos)
e = fmod(pos + N, M)
floorD = Box(e)
floorD = min(floorD,holes())
d = max(d, -floorD)
Combined Result
d = Box(pos)
c = fmod(pos * A, B)
subD = max(c.y,min(c.y,c.z))
subD = min(subD,cylinder(c))
subD = max(subD, Windows())
e = fmod(pos + N, M)
floorD = Box(e)
floorD = min(floorD,holes())
d = max(d, -subD)
d = max(d, -floorD)
Repeating the Space
pos.y = frac(pos.y)
d = Box(pos)
c = fmod(pos * A, B)
subD = max(c.y,min(c.y,c.z))
subD = min(subD,cylinder(c))
subD = max(subD, Windows())
e = fmod(pos + N, M)
floorD = Box(e)
floorD = min(floorD,holes())
d = max(d, -subD)
d = max(d, -floorD)
Repeating the Space
pos.xy = frac(pos.xy)
d = Box(pos)
c = fmod(pos * A, B)
subD = max(c.y,min(c.y,c.z))
subD = min(subD,cylinder(c))
subD = max(subD, Windows())
e = fmod(pos + N, M)
floorD = Box(e)
floorD = min(floorD,holes())
d = max(d, -subD)
d = max(d, -floorD)
Details
AddDetails()
Details
DoLighting()
ToneMap()
Details
AddDeferredTexture()
AddGodRays()
Details
MoveCamera()
MakeLookGood()




Ship It.
Procedural SDFs in Practice

enerated scenes probably won’t replace 3D artists
Procedural SDFs in Practice

enerated scenes probably won’t replace 3D artists




                        
Procedural SDFs in Practice

enerated scenes probably won’t replace 3D artists

enerated SDFs good proxies for real meshes
  ●   Code to combine a few primitives cheaper than art data


ombine with artist-built meshes converted to SDFs
  ●   Boolean, modify, cut, distort procedurally
Video
●   (Video Removed)
●   (It’s a cube morphing into a mesh. You know, just for fun etc.)
SDFs From Triangle Meshes
SDFs from Triangle Meshes

onvert triangle mesh to SDF in 3D texture
   ●   32^3 – 256^3 volume texture typical
   ●   SDFs interpolate well..           bicubic interpolation
   ●   .. Low resolution 3D textures still work well
   ●   Agnostic to poly count (except for processing time)


an often be done offline
SDFs from Triangle Meshes




         A mesh converted to a 64x64x64 SDF and polygonised.
                It’s two people doing yoga, by the way.
SDFs from Triangle Meshes

aïve approach?
 ●       Compute distance from every cell to every triangle
 ●       Very slow but accurate


oxelize mesh to grid, then sweep?  UGLY
     ●    Sweep to compute signed distance from voxels to cells
     ●    Voxelization too inaccurate near surface..
     ●    ..But near-surface distance is important - interpolation
Geometry Stages
●   Bind 3D texture target
●   VS transforms to SDF space
●   Geometry shader replicates
    triangle to affected slices
    ●   Flatten triangle to 2D
    ●   Output positions as TEXCOORDs..
    ●   .. All 3 positions for each vertex
Pixel Shader Stage

alculates distance from 3D pixel to triangle
   ●   Compute closest position on triangle
   ●   Evaluate vertex normal using barycentric

valuate distance sign using weighted normal

rite signed distance to output color, distance to depth

epth test keeps closest distance
Post Processing Step

ells around mesh surface now contain accurate
signed distance

est of grid is empty

ill out rest of the grid in post process CS

ast Sweeping algorithm
Fast Sweeping
●       Requires ability to read      d = maxPossibleDistance
        and write same buffer         for i = 0 to row length
●       One thread per row                  d += cellSize
    ●    Thread R/W doesn’t overlap         if(abs(cell[i]) > abs(d))
    ●    No interlock needed                       cell[i] = d
●       Sweep forwards then                 else
        backwards on same axis                     d = cell[i]
●       Sweep each axis in turn
SDFs from Particle Systems
SDFs From Particle Systems
●   Naïve: treat each particle as a sphere
        ●   Compute min distance from point to particles
●   Better: use metaball blobby equation
    ●   Density(P) = Sum[ (1 – (r2/R2))3 ] for all particles
            ●   R : radius threshold
            ●   r : distance from particle to point P

●   Problem: checking all particles per cell
Evaluating Particles Per Cell
●   Bucket sort particles into grid cells in CS
●   Evaluate a kernel around each cell
    ●   Sum potentials from particles in neighbouring cells
    ●   9x9x9 kernel typical
    ●   (729 cells, containing multiple particles per cell, evaluated for ~2 million grid cells)

●   Gives accurate result .. glacially
    ●   > 200ms on Geforce 570
Evaluating Particles, Fast
●   Render single points into grid
    ●   Write out particle position with additive blend
    ●   Sum particle count in alpha channel
●   Post process grid
    ●   Divide by count: get average position of particles in cell
●   Evaluate potentials with kernel - grid cells only
    ●   Use grid cell average position as proxy for particles
Evaluating Particles, Faster

valuating potentials accurately far too slow
 ●   Summing e.g. 9x9x9 cell potentials for each cell..
 ●   Still > 100 ms for our test cases


se separable blur to spread potentials instead
 ●   Not quite 100% accurate.. But close enough
 ●   Calculate blur weights with potential function to at least feign
     correctness
Visualising Distance Fields

     Ray Tracing & Polygonisation
Ray Casting
See: ray marching; sphere tracing

●   SDF(P) = Distance to closest
    point on surface
    ●   (Closest point’s actual location not known)

●   Step along ray by SDF(P)
    until SDF(P)~0
●   Skips empty space!
Ray Casting
●   Accuracy depends on iteration count
●   Primary rays require high accuracy
    ●   50-100 iterations -> slow
    ●   Result is transitory, view dependent
●   Useful for secondary rays
    ●   Can get away with fewer iterations
●   Do something else for primary hits
Polygonisation / Meshing
●   Generate triangle mesh from SDF
●   Rasterise as for any other mesh
    ●   Suits 3D hardware
    ●   Integrate with existing render pipeline
    ●   Reuse mesh between passes / frames
    ●   Speed not dependent on screen resolution
●   Use Marching Cubes
Marching Cubes In One Slide
●   Operates on a discrete grid
●   Evaluate field F() at 8 corners of each
    cubic cell
    ●   Generate sign flag per corner, OR together
●   Where sign(F) changes across corners,
    triangles are generated
    ●   5 per cell max
●   Lookup table defines triangle pattern
Marching Cubes Issues
●   Large number of grid cells
    ●   128x128x128 = 2 million cells
    ●   Only process whole grid when necessary
●   Triangle count varies hugely by field contents
    ●   Can change radically every frame
    ●   Upper bound very large: -> size of grid
    ●   Most cells empty: actual output count relatively small
●   Traditionally implemented on CPU
Geometry Shader Marching Cubes

PU submits a large, empty draw call
  ●    One point primitive per grid cell (i.e. a lot)
  ●    VS minimal: convert SV_VertexId to cell position

S evaluates marching cubes for cell
  ●    Outputs 0 to 5 triangles per cell

ar too slow: 10ms - 150ms                (128^3 grid, architecture-dependent)


      ●Workper GS instance varies greatly: poor parallelism
      ●Some GPU architectures handle GS very badly
Stream Compaction on GPU
Stream Compaction
●   Take a sparsely populated array




●   Push all the filled elements together
    ●   Remember count & offset mapping
●   Now only have to process filled part of array
Stream Compaction
●   Counting pass - parallel reduction
    ●   Iteratively halve array size (like mip chain)
    ●   Write out the sum of the count of parent cells
    ●   Until final step reached: 1 cell, the total count
●   Offset pass - iterative walk back up
    ●   Cell offset = parent position + sibling positions
●   Histopyramids: stream compaction in 3D
Histopyramids

um down mip chain in blocks




                              (Imagine it in 3D)
Histopyramids
●   Count up from base to calculate offsets
Histopyramids In Use
●   Fill grid volume texture with active mask
    ●   0 for empty, 1 for active
●   Generate counts in mip chain downwards
●   Use 2nd volume texture for cell locations
    ●   Walk up the mip chain
Compaction In Action

se histopyramid to compact active cells
   ●Active   cell count now known too


PU dispatches drawcall only for # active cells
   ●Use   DrawInstancesIndirect


S determines grid position from cell index
   ●Use   histopyramid for this
Compaction Reaction

uge improvement over brute force
  ●   ~5 ms – down from 11 ms
  ●   Greatly improves parallelism
  ●   Reduced draw call size


eometry still generated in GS
  ●   Runs again for each render pass
  ●   No indexing / vertex reuse
Geometry Generation
Generating Geometry
ish to pre-generate geometry (no GS)
   ●   Reuse geometry between passes; allow indexed vertices

irst generate active vertices
   ●   Intersection of grid edges with 0 potential contour
   ●   Remember vertex index per grid edge in lookup table
   ●   Vertex count & locations still vary by potential field contents

hen generate indices
   ●   Make use of the vertex index lookup
Generating Vertex Data
rocess potential grid in CS
   ●   One cell per thread
   ●   Find active edges in each cell

utput vertices per cell
   ●   IncrementCounter() on vertex buffer
        ●Returns   current num vertices written
   ●   Write vertex to end of buffer at current counter
   ●   Write counter to edge index lookup: scattered write

r use 2nd histopyramid for vertex data instead
Generating Geometry

ow generate index data with another CS
  ●   Histopyramid as before..
  ●   .. But use edge index grid lookup to locate indices
  ●   DispatchIndirect to limit dispatch to # active cells


ender geom: DrawIndexedInstancedIndirect
  ●   GPU draw call: index count copied from histopyramid
Meshing Improvements
Smoothing
Smoothing More
Smoothing

aplacian smooth
●   Average vertices along edge connections
●   Key for improving quality of fluid dynamics meshing


ust know vertex edge connections
●   Generate from index buffer in post process
Bucket Sorting Arrays
●   Need to bucket elements of an array?
    ●   E.g. Spatial hash; particles per grid cell;
        triangles connected to each vertex
●   Each bucket has varying # elements
●   Don’t want to over-allocate buckets
    ●   Allocate only # elements in array
Counting Sort
●   Use Counting Sort
●   Counting pass – count # elements per bucket
    ●   Use atomics for parallel op – InterlockedAdd()
●   Compute Parallel Prefix Sum
    ●   Like a 1d histopyramid.. See CUDA SDK
    ●   Finds offset for each bucket in element array
●   Then assign elements to buckets
    ●   Reuse counter buffer to track idx in bucket
Smoothing Process
●   Use Counting Sort: bucket triangles per
    vertex
●   Post-process: determine edges per vertex
●   Smooth vertices
    ●   (Original Vertex * 4 + Sum[Connected Vertices]) / (4
        + Connected Vertex Count)
    ●   Iterate smooth process to increase smoothness
0 Smoothing Iterations
4 Smoothing Iterations
8 Smoothing Iterations
16 Smoothing Iterations
Subdivision, Smooth Normals
●   Use existing vertex connectivity data
●   Subdivision: split edges, rebuild indices
    ●   1 new vertex per edge
    ●   4 new triangles replace 1 old triangle
●   Calc smooth vertex normals from final mesh
    ●   Use vertex / triangle connectivity data
    ●   Average triangle face normals per vertex
    ●   Very fast – minimal overhead on total generation cost
Performance
●   Same scene, 128^3 grid, Geforce 570
●   Brute force GS version: 11 ms per pass
    ●   No reuse – shadowmap passes add 11ms each
●   Generating geometry in CS: 2 ms + 0.4 ms per
    pass
    ●   2ms to generate geometry in CS; 0.4ms to render it
    ●   Generated geometry reused between shadow passes
Video
●   (Video Removed)
●   (A tidal wave thing through a city. It was well cool!!!!!1)
Wait, Was That Fluid Dynamics?

           Yes, It Was.
Smoothed Particle Hydrodynamics

olver works on particles
   ●   Particles represent point samples of fluid in space


ocate local neighbours for each particle
   ●   Find all particles inside a particle’s smoothing radius
   ●   Neighbourhood search – can be expensive


olve fluid forces between particles within radius
Neighbourhood Search

patially bucket particles using spatial hash

eturn of Counting Sort - with a histopyramid
   ●   In this case: hash is quantised 3D position
   ●   Bucket particles into hashed cells
SPH Process – Step by Step

ucket particles into cells

valuate all particles..

ind particle neighbours from cell structure
  ●   Must check all nearby cells inside search radius too


um forces on particles from all neighbours
SPH Performance

erformance depends on # neighbours evaluated
  ●   Determined by cell granularity, particle search radius,
      number of particles in system, area covered by system


avour small cell granularity
  ●   Easier to reduce # particles tested at cell level


alance particle radius by hand
SPH Performance
●   In practice this is still far, far too slow     (>200ms)
    ●   Can check > 100 cells, too many particle interactions
●   So we cheat..
    ●   Average particle positions + velocities in each cell
    ●   Use average value for particles vs distant cells
    ●   Force vectors produced close enough to real values..
●   Only use real particle positions for close cells
Illumination

  The Rendering Pipeline of the Future
Rendering Pipeline of the Future
●   Primary rays are rasterised
    ●   Fast: rasterisation still faster for typical game meshes
    ●   Use for camera / GBuffers, shadow maps
●   Secondary rays are traced
    ●   Use GBuffers to get starting point
    ●   Global illumination / ambient occlusion, reflections
    ●   Paths are complex – bounce, scatter, diverge
    ●   Needs full scene knowledge – hard for rasterisation
    ●   Tend to need less accuracy / sharpness..
Ambient Occlusion Ray Tracing
●   Cast many random rays out from surface
    ●   Monte-Carlo style
●   AO result = % of rays that reach sky
●   Slow..
    ●   Poor ray coherence
    ●   Lots of rays per pixel needed for good result
●   Some fakes available
    ●   SSAO & variants – largely horrible.. 
Ambient Occlusion with SDFs
●   Raytrace SDFs to calculate AO
●   Accuracy less important (than primary rays)
    ●   Less SDF iterations – < 20, not 50-100
●   Limit ray length
●   We don’t really “ray cast”..
    ●   Just sample multiple points along ray
    ●   Ray result is a function of SDF distance at points
4 Rays Per Pixel
16 Rays Per Pixel
64 Rays Per Pixel
256 Rays Per Pixel
Ambient Occlusion Ray Tracing
●   Good performance: 4 rays; Quality: 64 rays
●   Try to plug quality/performance gap
●   Could bilateral filter / blur
    ●   Few samples, smooth results spatially   (then add noise)

●   Or use temporal reprojection
    ●   Few samples, refine results temporally
    ●   Randomise rays differently every frame
Temporal Reprojection
●   Keep previous frame’s data
    ●   Previous result buffer, normals/depths, view matrix
●   Reproject current frame  previous frame
    ●   Current view position * view inverse * previous view
    ●   Sample previous frame’s result, blend with current
    ●   Reject sample if normals/depths differ too much
●   Problem: rejected samples / holes
Video
●   (Video Removed)
●   (Basically it looks noisy, then temporally refines, then when the camera moves you see holes)
Temporal Reprojection: Good
Temporal Reprojection: Holes
Hole Filling
●   Reprojection works if you can fill holes nicely
●   Easy to fill holes for AO: just cast more rays
    ●   Cast 16 rays for pixels in holes, 1 for the rest
●   Adversely affects performance
    ●   Work between local pixels differs greatly
    ●   CS thread groups wait on longest thread
    ●   Some threads take 16x longer than others to complete
Video
●   (Video Removed)
●   (It looks all good cos the holes are filled)
Rays Per Thread
Hole Filling
●   Solution: balance rays across threads in CS
●   16x16 pixel tiles: 256 threads in group
●   Compute & sum up required rays in tile
    ●   1 pixel per thread
    ●   1 for reprojected pixels; 16 for hole pixels
●   Spread ray evaluation across cores evenly
    ●   N rays per thread
Rays Per Thread - Tiles
Video
●   (Video Removed)
●   (It still looks all good cos the holes are filled, by way of proof I’m not lying about the technique)
Performance
●   16 rays per pixel: 30 ms
●   1 ray per pixel, reproject: 2 ms
●   1 + 16 in holes, reproject: 12 ms
●   1 + 16 rays, load balanced tiles: 4 ms
    ●   ~ 2 rays per thread typical!
Looking Forward
Looking Forward
●   Multiple representations of same world
    ●   Geometry + SDFs
    ●   Rasterise them
    ●   Trace them
    ●   Collide with them
●    World can be more dynamic.
http://directtovideo.wordpress.com
Thanks
●   Jani Isoranta, Kenny Magnusson for 3D
●   Angeldawn for the Fairlight logo
●   Jussi Laakonen, Chris Butcher for actually making this talk
    happen
●   SCEE R&D for allowing this to happen
●   Guillaume Werle, Steve Tovey, Rich Forster, Angelo Pesce,
    Dominik Ries for slide reviews
References
●   High-speed Marching Cubes using Histogram Pyramids; Dyken, Ziegler et al.
●   Sphere Tracing: a geometric method for the antialiased ray tracing of implicit
    surfaces; John C. Hart
●   Rendering Worlds With Two Triangles; Inigo Quilezles
●   Fast approximations for global illumination on dynamic scenes; Alex Evans

Contenu connexe

Tendances

FlameWorks GTC 2014
FlameWorks GTC 2014FlameWorks GTC 2014
FlameWorks GTC 2014Simon Green
 
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunFive Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunElectronic Arts / DICE
 
SPU Assisted Rendering
SPU Assisted RenderingSPU Assisted Rendering
SPU Assisted RenderingSteven Tovey
 
Triangle Visibility buffer
Triangle Visibility bufferTriangle Visibility buffer
Triangle Visibility bufferWolfgang Engel
 
GDC 2014 - Deformable Snow Rendering in Batman: Arkham Origins
GDC 2014 - Deformable Snow Rendering in Batman: Arkham OriginsGDC 2014 - Deformable Snow Rendering in Batman: Arkham Origins
GDC 2014 - Deformable Snow Rendering in Batman: Arkham OriginsColin Barré-Brisebois
 
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadOpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadTristan Lorach
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologyTiago Sousa
 
Deferred shading
Deferred shadingDeferred shading
Deferred shadingFrank Chao
 
High Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in FrostbiteHigh Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in FrostbiteElectronic Arts / DICE
 
Stable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal FilteringStable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal FilteringElectronic Arts / DICE
 
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...Colin Barré-Brisebois
 
Star Ocean 4 - Flexible Shader Managment and Post-processing
Star Ocean 4 - Flexible Shader Managment and Post-processingStar Ocean 4 - Flexible Shader Managment and Post-processing
Star Ocean 4 - Flexible Shader Managment and Post-processingumsl snfrzb
 
OpenGL 4.4 - Scene Rendering Techniques
OpenGL 4.4 - Scene Rendering TechniquesOpenGL 4.4 - Scene Rendering Techniques
OpenGL 4.4 - Scene Rendering TechniquesNarann29
 
Calibrating Lighting and Materials in Far Cry 3
Calibrating Lighting and Materials in Far Cry 3Calibrating Lighting and Materials in Far Cry 3
Calibrating Lighting and Materials in Far Cry 3stevemcauley
 
Optimizing the graphics pipeline with compute
Optimizing the graphics pipeline with computeOptimizing the graphics pipeline with compute
Optimizing the graphics pipeline with computeWuBinbo
 
GDC16: Improving geometry culling for Deus Ex: Mankind Divided by Nicolas Trudel
GDC16: Improving geometry culling for Deus Ex: Mankind Divided by Nicolas TrudelGDC16: Improving geometry culling for Deus Ex: Mankind Divided by Nicolas Trudel
GDC16: Improving geometry culling for Deus Ex: Mankind Divided by Nicolas TrudelUmbra Software
 
Modern Graphics Pipeline Overview
Modern Graphics Pipeline OverviewModern Graphics Pipeline Overview
Modern Graphics Pipeline Overviewslantsixgames
 
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect AndromedaElectronic Arts / DICE
 
A Bit More Deferred Cry Engine3
A Bit More Deferred   Cry Engine3A Bit More Deferred   Cry Engine3
A Bit More Deferred Cry Engine3guest11b095
 

Tendances (20)

FlameWorks GTC 2014
FlameWorks GTC 2014FlameWorks GTC 2014
FlameWorks GTC 2014
 
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunFive Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
 
SPU Assisted Rendering
SPU Assisted RenderingSPU Assisted Rendering
SPU Assisted Rendering
 
Triangle Visibility buffer
Triangle Visibility bufferTriangle Visibility buffer
Triangle Visibility buffer
 
GDC 2014 - Deformable Snow Rendering in Batman: Arkham Origins
GDC 2014 - Deformable Snow Rendering in Batman: Arkham OriginsGDC 2014 - Deformable Snow Rendering in Batman: Arkham Origins
GDC 2014 - Deformable Snow Rendering in Batman: Arkham Origins
 
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadOpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics Technology
 
Deferred shading
Deferred shadingDeferred shading
Deferred shading
 
High Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in FrostbiteHigh Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in Frostbite
 
Stable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal FilteringStable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal Filtering
 
Beyond porting
Beyond portingBeyond porting
Beyond porting
 
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...
 
Star Ocean 4 - Flexible Shader Managment and Post-processing
Star Ocean 4 - Flexible Shader Managment and Post-processingStar Ocean 4 - Flexible Shader Managment and Post-processing
Star Ocean 4 - Flexible Shader Managment and Post-processing
 
OpenGL 4.4 - Scene Rendering Techniques
OpenGL 4.4 - Scene Rendering TechniquesOpenGL 4.4 - Scene Rendering Techniques
OpenGL 4.4 - Scene Rendering Techniques
 
Calibrating Lighting and Materials in Far Cry 3
Calibrating Lighting and Materials in Far Cry 3Calibrating Lighting and Materials in Far Cry 3
Calibrating Lighting and Materials in Far Cry 3
 
Optimizing the graphics pipeline with compute
Optimizing the graphics pipeline with computeOptimizing the graphics pipeline with compute
Optimizing the graphics pipeline with compute
 
GDC16: Improving geometry culling for Deus Ex: Mankind Divided by Nicolas Trudel
GDC16: Improving geometry culling for Deus Ex: Mankind Divided by Nicolas TrudelGDC16: Improving geometry culling for Deus Ex: Mankind Divided by Nicolas Trudel
GDC16: Improving geometry culling for Deus Ex: Mankind Divided by Nicolas Trudel
 
Modern Graphics Pipeline Overview
Modern Graphics Pipeline OverviewModern Graphics Pipeline Overview
Modern Graphics Pipeline Overview
 
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
 
A Bit More Deferred Cry Engine3
A Bit More Deferred   Cry Engine3A Bit More Deferred   Cry Engine3
A Bit More Deferred Cry Engine3
 

En vedette

Moving Frostbite to Physically Based Rendering
Moving Frostbite to Physically Based RenderingMoving Frostbite to Physically Based Rendering
Moving Frostbite to Physically Based RenderingElectronic Arts / DICE
 
Around the World in 80 Shaders
Around the World in 80 ShadersAround the World in 80 Shaders
Around the World in 80 Shadersstevemcauley
 
Physically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in FrostbitePhysically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in FrostbiteElectronic Arts / DICE
 
GPU HistoPyramid Based Fluid Simulation and Rendering
GPU HistoPyramid Based Fluid Simulation and RenderingGPU HistoPyramid Based Fluid Simulation and Rendering
GPU HistoPyramid Based Fluid Simulation and RenderingJoão Vicente P. Reis Fo.
 
WPF - the future of GUI is near
WPF - the future of GUI is nearWPF - the future of GUI is near
WPF - the future of GUI is nearBartlomiej Filipek
 
TokyoDemoFest2012 Raymarching Tutorial
TokyoDemoFest2012 Raymarching TutorialTokyoDemoFest2012 Raymarching Tutorial
TokyoDemoFest2012 Raymarching TutorialMasaki Sasaki
 
Hello, DirectCompute
Hello, DirectComputeHello, DirectCompute
Hello, DirectComputedasyprocta
 
GDC 2016 End-to-End Approach to Physically Based Rendering
GDC 2016 End-to-End Approach to Physically Based RenderingGDC 2016 End-to-End Approach to Physically Based Rendering
GDC 2016 End-to-End Approach to Physically Based RenderingWes McDermott
 
Manuale di scrittura creativa
Manuale di scrittura creativaManuale di scrittura creativa
Manuale di scrittura creativaDavide Nonino
 
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornDirect3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornAMD Developer Central
 
5 Major Challenges in Interactive Rendering
5 Major Challenges in Interactive Rendering5 Major Challenges in Interactive Rendering
5 Major Challenges in Interactive RenderingElectronic Arts / DICE
 
Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)Tiago Sousa
 
物理ベースレンダラedupt解説
物理ベースレンダラedupt解説物理ベースレンダラedupt解説
物理ベースレンダラedupt解説h013
 
5 Major Challenges in Real-time Rendering (2012)
5 Major Challenges in Real-time Rendering (2012)5 Major Challenges in Real-time Rendering (2012)
5 Major Challenges in Real-time Rendering (2012)Electronic Arts / DICE
 
中級グラフィックス入門~シャドウマッピング総まとめ~
中級グラフィックス入門~シャドウマッピング総まとめ~中級グラフィックス入門~シャドウマッピング総まとめ~
中級グラフィックス入門~シャドウマッピング総まとめ~ProjectAsura
 
CEDEC 2016 Metal と Vulkan を用いた水彩画レンダリング技法の紹介
CEDEC 2016 Metal と Vulkan を用いた水彩画レンダリング技法の紹介CEDEC 2016 Metal と Vulkan を用いた水彩画レンダリング技法の紹介
CEDEC 2016 Metal と Vulkan を用いた水彩画レンダリング技法の紹介Drecom Co., Ltd.
 

En vedette (20)

Moving Frostbite to Physically Based Rendering
Moving Frostbite to Physically Based RenderingMoving Frostbite to Physically Based Rendering
Moving Frostbite to Physically Based Rendering
 
Around the World in 80 Shaders
Around the World in 80 ShadersAround the World in 80 Shaders
Around the World in 80 Shaders
 
Stochastic Screen-Space Reflections
Stochastic Screen-Space ReflectionsStochastic Screen-Space Reflections
Stochastic Screen-Space Reflections
 
Physically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in FrostbitePhysically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in Frostbite
 
GPU HistoPyramid Based Fluid Simulation and Rendering
GPU HistoPyramid Based Fluid Simulation and RenderingGPU HistoPyramid Based Fluid Simulation and Rendering
GPU HistoPyramid Based Fluid Simulation and Rendering
 
GPU - how can we use it?
GPU - how can we use it?GPU - how can we use it?
GPU - how can we use it?
 
WPF - the future of GUI is near
WPF - the future of GUI is nearWPF - the future of GUI is near
WPF - the future of GUI is near
 
TokyoDemoFest2012 Raymarching Tutorial
TokyoDemoFest2012 Raymarching TutorialTokyoDemoFest2012 Raymarching Tutorial
TokyoDemoFest2012 Raymarching Tutorial
 
3D User Interface
3D User Interface3D User Interface
3D User Interface
 
Hello, DirectCompute
Hello, DirectComputeHello, DirectCompute
Hello, DirectCompute
 
GDC 2016 End-to-End Approach to Physically Based Rendering
GDC 2016 End-to-End Approach to Physically Based RenderingGDC 2016 End-to-End Approach to Physically Based Rendering
GDC 2016 End-to-End Approach to Physically Based Rendering
 
Manuale di scrittura creativa
Manuale di scrittura creativaManuale di scrittura creativa
Manuale di scrittura creativa
 
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornDirect3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
 
5 Major Challenges in Interactive Rendering
5 Major Challenges in Interactive Rendering5 Major Challenges in Interactive Rendering
5 Major Challenges in Interactive Rendering
 
Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)
 
GPU最適化入門
GPU最適化入門GPU最適化入門
GPU最適化入門
 
物理ベースレンダラedupt解説
物理ベースレンダラedupt解説物理ベースレンダラedupt解説
物理ベースレンダラedupt解説
 
5 Major Challenges in Real-time Rendering (2012)
5 Major Challenges in Real-time Rendering (2012)5 Major Challenges in Real-time Rendering (2012)
5 Major Challenges in Real-time Rendering (2012)
 
中級グラフィックス入門~シャドウマッピング総まとめ~
中級グラフィックス入門~シャドウマッピング総まとめ~中級グラフィックス入門~シャドウマッピング総まとめ~
中級グラフィックス入門~シャドウマッピング総まとめ~
 
CEDEC 2016 Metal と Vulkan を用いた水彩画レンダリング技法の紹介
CEDEC 2016 Metal と Vulkan を用いた水彩画レンダリング技法の紹介CEDEC 2016 Metal と Vulkan を用いた水彩画レンダリング技法の紹介
CEDEC 2016 Metal と Vulkan を用いた水彩画レンダリング技法の紹介
 

Similaire à GDC 2012: Advanced Procedural Rendering in DX11

Smedberg niklas bringing_aaa_graphics
Smedberg niklas bringing_aaa_graphicsSmedberg niklas bringing_aaa_graphics
Smedberg niklas bringing_aaa_graphicschangehee lee
 
Cliff Click Explains GBM at Netflix October 10 2013
Cliff Click Explains GBM at Netflix October 10 2013Cliff Click Explains GBM at Netflix October 10 2013
Cliff Click Explains GBM at Netflix October 10 2013Sri Ambati
 
Screen Space Reflections in The Surge
Screen Space Reflections in The SurgeScreen Space Reflections in The Surge
Screen Space Reflections in The SurgeMichele Giacalone
 
Practical spherical harmonics based PRT methods.ppsx
Practical spherical harmonics based PRT methods.ppsxPractical spherical harmonics based PRT methods.ppsx
Practical spherical harmonics based PRT methods.ppsxMannyK4
 
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14AMD Developer Central
 
Practical Spherical Harmonics Based PRT Methods
Practical Spherical Harmonics Based PRT MethodsPractical Spherical Harmonics Based PRT Methods
Practical Spherical Harmonics Based PRT MethodsNaughty Dog
 
Rendering Tech of Space Marine
Rendering Tech of Space MarineRendering Tech of Space Marine
Rendering Tech of Space MarinePope Kim
 
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)Universitat Politècnica de Catalunya
 
Computer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming IComputer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming I💻 Anton Gerdelan
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningCastLabKAIST
 
GBM in H2O with Cliff Click: H2O API
GBM in H2O with Cliff Click: H2O APIGBM in H2O with Cliff Click: H2O API
GBM in H2O with Cliff Click: H2O APISri Ambati
 
The Technology behind Shadow Warrior, ZTG 2014
The Technology behind Shadow Warrior, ZTG 2014The Technology behind Shadow Warrior, ZTG 2014
The Technology behind Shadow Warrior, ZTG 2014Jarosław Pleskot
 
D3 D10 Unleashed New Features And Effects
D3 D10 Unleashed   New Features And EffectsD3 D10 Unleashed   New Features And Effects
D3 D10 Unleashed New Features And EffectsThomas Goddard
 
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
Dictionary Learning in Games - GDC 2014
Dictionary Learning in Games - GDC 2014Dictionary Learning in Games - GDC 2014
Dictionary Learning in Games - GDC 2014Manchor Ko
 
Optimized Multi-agent Box-pushing - 2017-10-24
Optimized Multi-agent Box-pushing - 2017-10-24Optimized Multi-agent Box-pushing - 2017-10-24
Optimized Multi-agent Box-pushing - 2017-10-24Aritra Sarkar
 
Efficient LDI Representation (TPCG 2008)
Efficient LDI Representation (TPCG 2008)Efficient LDI Representation (TPCG 2008)
Efficient LDI Representation (TPCG 2008)Matthias Trapp
 

Similaire à GDC 2012: Advanced Procedural Rendering in DX11 (20)

Pathfinding in games
Pathfinding in gamesPathfinding in games
Pathfinding in games
 
Smedberg niklas bringing_aaa_graphics
Smedberg niklas bringing_aaa_graphicsSmedberg niklas bringing_aaa_graphics
Smedberg niklas bringing_aaa_graphics
 
Cliff Click Explains GBM at Netflix October 10 2013
Cliff Click Explains GBM at Netflix October 10 2013Cliff Click Explains GBM at Netflix October 10 2013
Cliff Click Explains GBM at Netflix October 10 2013
 
Screen Space Reflections in The Surge
Screen Space Reflections in The SurgeScreen Space Reflections in The Surge
Screen Space Reflections in The Surge
 
Practical spherical harmonics based PRT methods.ppsx
Practical spherical harmonics based PRT methods.ppsxPractical spherical harmonics based PRT methods.ppsx
Practical spherical harmonics based PRT methods.ppsx
 
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
 
Practical Spherical Harmonics Based PRT Methods
Practical Spherical Harmonics Based PRT MethodsPractical Spherical Harmonics Based PRT Methods
Practical Spherical Harmonics Based PRT Methods
 
Rendering Tech of Space Marine
Rendering Tech of Space MarineRendering Tech of Space Marine
Rendering Tech of Space Marine
 
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
 
Offscreenparticle
OffscreenparticleOffscreenparticle
Offscreenparticle
 
Computer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming IComputer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming I
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine Learning
 
GBM in H2O with Cliff Click: H2O API
GBM in H2O with Cliff Click: H2O APIGBM in H2O with Cliff Click: H2O API
GBM in H2O with Cliff Click: H2O API
 
The Technology behind Shadow Warrior, ZTG 2014
The Technology behind Shadow Warrior, ZTG 2014The Technology behind Shadow Warrior, ZTG 2014
The Technology behind Shadow Warrior, ZTG 2014
 
Lecture set 5
Lecture set 5Lecture set 5
Lecture set 5
 
D3 D10 Unleashed New Features And Effects
D3 D10 Unleashed   New Features And EffectsD3 D10 Unleashed   New Features And Effects
D3 D10 Unleashed New Features And Effects
 
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
 
Dictionary Learning in Games - GDC 2014
Dictionary Learning in Games - GDC 2014Dictionary Learning in Games - GDC 2014
Dictionary Learning in Games - GDC 2014
 
Optimized Multi-agent Box-pushing - 2017-10-24
Optimized Multi-agent Box-pushing - 2017-10-24Optimized Multi-agent Box-pushing - 2017-10-24
Optimized Multi-agent Box-pushing - 2017-10-24
 
Efficient LDI Representation (TPCG 2008)
Efficient LDI Representation (TPCG 2008)Efficient LDI Representation (TPCG 2008)
Efficient LDI Representation (TPCG 2008)
 

Dernier

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 

Dernier (20)

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

GDC 2012: Advanced Procedural Rendering in DX11

  • 1. Advanced Procedural Rendering in DirectX 11 Matt Swoboda Principal Engineer, SCEE R&D PhyreEngine™ Team Demo Coder, Fairlight
  • 2.
  • 3. Aim ● More dynamic game worlds.
  • 4. Demoscene? make demos ● “Like games, crossed with music videos” inear, non-interactive, scripted ll generated in real-time ● On consumer-level PC hardware
  • 5. DirectX 11? irectX 9 is very old ● We are all very comfortable with it.. ● .. But does not map well to modern graphics hardware irectX 11 lets you use same hardware smarter ● Compute shaders ● Much improved shading language ● GPU-dispatched draw calls ● .. And much more
  • 6. Procedural Mesh Generation A reasonable result from random formulae (Hopefully a good result from sensible formulae)
  • 7. Signed Distance Fields (SDFs) istance function: ● Returns the closest distance to the surface from a given point igned distance function: ● Returns the closest distance from a point to the surface, positive if the point is outside the shape and negative if inside
  • 8. Signed Distance Fields ● Useful tool for procedural geometry creation ● Easy to define in code .. ● .. Reasonable results from “random formulae” ● Can create from meshes, particles, fluids, voxels ● CSG, distortion, repeats, transforms all easy ● No concerns with geometric topology ● Just define the field in space, polygonize later
  • 9. A Box Box(pos, size) { a = abs(pos-size) - size; return max(a.x,a.y,a.z); } *Danger: may not actually compile
  • 10. Cutting with Booleans d = Box(pos) c = fmod(pos * A, B) subD = max(c.y,min(c.y,c.z)) d = max(d, -subD)
  • 11. More Booleans d = Box(pos) c = fmod(pos * A, B) subD = max(c.y,min(c.y,c.z)) subD = min(subD,cylinder(c)) subD = max(subD, Windows()) d = max(d, -subD)
  • 12. Repeated Booleans d = Box(pos) e = fmod(pos + N, M) floorD = Box(e) d = max(d, -floorD)
  • 13. Cutting Holes d = Box(pos) e = fmod(pos + N, M) floorD = Box(e) floorD = min(floorD,holes()) d = max(d, -floorD)
  • 14. Combined Result d = Box(pos) c = fmod(pos * A, B) subD = max(c.y,min(c.y,c.z)) subD = min(subD,cylinder(c)) subD = max(subD, Windows()) e = fmod(pos + N, M) floorD = Box(e) floorD = min(floorD,holes()) d = max(d, -subD) d = max(d, -floorD)
  • 15. Repeating the Space pos.y = frac(pos.y) d = Box(pos) c = fmod(pos * A, B) subD = max(c.y,min(c.y,c.z)) subD = min(subD,cylinder(c)) subD = max(subD, Windows()) e = fmod(pos + N, M) floorD = Box(e) floorD = min(floorD,holes()) d = max(d, -subD) d = max(d, -floorD)
  • 16. Repeating the Space pos.xy = frac(pos.xy) d = Box(pos) c = fmod(pos * A, B) subD = max(c.y,min(c.y,c.z)) subD = min(subD,cylinder(c)) subD = max(subD, Windows()) e = fmod(pos + N, M) floorD = Box(e) floorD = min(floorD,holes()) d = max(d, -subD) d = max(d, -floorD)
  • 21. Procedural SDFs in Practice enerated scenes probably won’t replace 3D artists
  • 22. Procedural SDFs in Practice enerated scenes probably won’t replace 3D artists 
  • 23. Procedural SDFs in Practice enerated scenes probably won’t replace 3D artists enerated SDFs good proxies for real meshes ● Code to combine a few primitives cheaper than art data ombine with artist-built meshes converted to SDFs ● Boolean, modify, cut, distort procedurally
  • 24. Video ● (Video Removed) ● (It’s a cube morphing into a mesh. You know, just for fun etc.)
  • 26. SDFs from Triangle Meshes onvert triangle mesh to SDF in 3D texture ● 32^3 – 256^3 volume texture typical ● SDFs interpolate well..  bicubic interpolation ● .. Low resolution 3D textures still work well ● Agnostic to poly count (except for processing time) an often be done offline
  • 27. SDFs from Triangle Meshes A mesh converted to a 64x64x64 SDF and polygonised. It’s two people doing yoga, by the way.
  • 28. SDFs from Triangle Meshes aïve approach? ● Compute distance from every cell to every triangle ● Very slow but accurate oxelize mesh to grid, then sweep?  UGLY ● Sweep to compute signed distance from voxels to cells ● Voxelization too inaccurate near surface.. ● ..But near-surface distance is important - interpolation
  • 29. Geometry Stages ● Bind 3D texture target ● VS transforms to SDF space ● Geometry shader replicates triangle to affected slices ● Flatten triangle to 2D ● Output positions as TEXCOORDs.. ● .. All 3 positions for each vertex
  • 30. Pixel Shader Stage alculates distance from 3D pixel to triangle ● Compute closest position on triangle ● Evaluate vertex normal using barycentric valuate distance sign using weighted normal rite signed distance to output color, distance to depth epth test keeps closest distance
  • 31. Post Processing Step ells around mesh surface now contain accurate signed distance est of grid is empty ill out rest of the grid in post process CS ast Sweeping algorithm
  • 32. Fast Sweeping ● Requires ability to read d = maxPossibleDistance and write same buffer for i = 0 to row length ● One thread per row d += cellSize ● Thread R/W doesn’t overlap if(abs(cell[i]) > abs(d)) ● No interlock needed cell[i] = d ● Sweep forwards then else backwards on same axis d = cell[i] ● Sweep each axis in turn
  • 34. SDFs From Particle Systems ● Naïve: treat each particle as a sphere ● Compute min distance from point to particles ● Better: use metaball blobby equation ● Density(P) = Sum[ (1 – (r2/R2))3 ] for all particles ● R : radius threshold ● r : distance from particle to point P ● Problem: checking all particles per cell
  • 35. Evaluating Particles Per Cell ● Bucket sort particles into grid cells in CS ● Evaluate a kernel around each cell ● Sum potentials from particles in neighbouring cells ● 9x9x9 kernel typical ● (729 cells, containing multiple particles per cell, evaluated for ~2 million grid cells) ● Gives accurate result .. glacially ● > 200ms on Geforce 570
  • 36. Evaluating Particles, Fast ● Render single points into grid ● Write out particle position with additive blend ● Sum particle count in alpha channel ● Post process grid ● Divide by count: get average position of particles in cell ● Evaluate potentials with kernel - grid cells only ● Use grid cell average position as proxy for particles
  • 37. Evaluating Particles, Faster valuating potentials accurately far too slow ● Summing e.g. 9x9x9 cell potentials for each cell.. ● Still > 100 ms for our test cases se separable blur to spread potentials instead ● Not quite 100% accurate.. But close enough ● Calculate blur weights with potential function to at least feign correctness
  • 38. Visualising Distance Fields Ray Tracing & Polygonisation
  • 39. Ray Casting See: ray marching; sphere tracing ● SDF(P) = Distance to closest point on surface ● (Closest point’s actual location not known) ● Step along ray by SDF(P) until SDF(P)~0 ● Skips empty space!
  • 40. Ray Casting ● Accuracy depends on iteration count ● Primary rays require high accuracy ● 50-100 iterations -> slow ● Result is transitory, view dependent ● Useful for secondary rays ● Can get away with fewer iterations ● Do something else for primary hits
  • 41. Polygonisation / Meshing ● Generate triangle mesh from SDF ● Rasterise as for any other mesh ● Suits 3D hardware ● Integrate with existing render pipeline ● Reuse mesh between passes / frames ● Speed not dependent on screen resolution ● Use Marching Cubes
  • 42. Marching Cubes In One Slide ● Operates on a discrete grid ● Evaluate field F() at 8 corners of each cubic cell ● Generate sign flag per corner, OR together ● Where sign(F) changes across corners, triangles are generated ● 5 per cell max ● Lookup table defines triangle pattern
  • 43. Marching Cubes Issues ● Large number of grid cells ● 128x128x128 = 2 million cells ● Only process whole grid when necessary ● Triangle count varies hugely by field contents ● Can change radically every frame ● Upper bound very large: -> size of grid ● Most cells empty: actual output count relatively small ● Traditionally implemented on CPU
  • 44. Geometry Shader Marching Cubes PU submits a large, empty draw call ● One point primitive per grid cell (i.e. a lot) ● VS minimal: convert SV_VertexId to cell position S evaluates marching cubes for cell ● Outputs 0 to 5 triangles per cell ar too slow: 10ms - 150ms (128^3 grid, architecture-dependent) ●Workper GS instance varies greatly: poor parallelism ●Some GPU architectures handle GS very badly
  • 46. Stream Compaction ● Take a sparsely populated array ● Push all the filled elements together ● Remember count & offset mapping ● Now only have to process filled part of array
  • 47. Stream Compaction ● Counting pass - parallel reduction ● Iteratively halve array size (like mip chain) ● Write out the sum of the count of parent cells ● Until final step reached: 1 cell, the total count ● Offset pass - iterative walk back up ● Cell offset = parent position + sibling positions ● Histopyramids: stream compaction in 3D
  • 48. Histopyramids um down mip chain in blocks (Imagine it in 3D)
  • 49. Histopyramids ● Count up from base to calculate offsets
  • 50. Histopyramids In Use ● Fill grid volume texture with active mask ● 0 for empty, 1 for active ● Generate counts in mip chain downwards ● Use 2nd volume texture for cell locations ● Walk up the mip chain
  • 51. Compaction In Action se histopyramid to compact active cells ●Active cell count now known too PU dispatches drawcall only for # active cells ●Use DrawInstancesIndirect S determines grid position from cell index ●Use histopyramid for this
  • 52. Compaction Reaction uge improvement over brute force ● ~5 ms – down from 11 ms ● Greatly improves parallelism ● Reduced draw call size eometry still generated in GS ● Runs again for each render pass ● No indexing / vertex reuse
  • 54. Generating Geometry ish to pre-generate geometry (no GS) ● Reuse geometry between passes; allow indexed vertices irst generate active vertices ● Intersection of grid edges with 0 potential contour ● Remember vertex index per grid edge in lookup table ● Vertex count & locations still vary by potential field contents hen generate indices ● Make use of the vertex index lookup
  • 55. Generating Vertex Data rocess potential grid in CS ● One cell per thread ● Find active edges in each cell utput vertices per cell ● IncrementCounter() on vertex buffer ●Returns current num vertices written ● Write vertex to end of buffer at current counter ● Write counter to edge index lookup: scattered write r use 2nd histopyramid for vertex data instead
  • 56. Generating Geometry ow generate index data with another CS ● Histopyramid as before.. ● .. But use edge index grid lookup to locate indices ● DispatchIndirect to limit dispatch to # active cells ender geom: DrawIndexedInstancedIndirect ● GPU draw call: index count copied from histopyramid
  • 60. Smoothing aplacian smooth ● Average vertices along edge connections ● Key for improving quality of fluid dynamics meshing ust know vertex edge connections ● Generate from index buffer in post process
  • 61. Bucket Sorting Arrays ● Need to bucket elements of an array? ● E.g. Spatial hash; particles per grid cell; triangles connected to each vertex ● Each bucket has varying # elements ● Don’t want to over-allocate buckets ● Allocate only # elements in array
  • 62. Counting Sort ● Use Counting Sort ● Counting pass – count # elements per bucket ● Use atomics for parallel op – InterlockedAdd() ● Compute Parallel Prefix Sum ● Like a 1d histopyramid.. See CUDA SDK ● Finds offset for each bucket in element array ● Then assign elements to buckets ● Reuse counter buffer to track idx in bucket
  • 63. Smoothing Process ● Use Counting Sort: bucket triangles per vertex ● Post-process: determine edges per vertex ● Smooth vertices ● (Original Vertex * 4 + Sum[Connected Vertices]) / (4 + Connected Vertex Count) ● Iterate smooth process to increase smoothness
  • 68. Subdivision, Smooth Normals ● Use existing vertex connectivity data ● Subdivision: split edges, rebuild indices ● 1 new vertex per edge ● 4 new triangles replace 1 old triangle ● Calc smooth vertex normals from final mesh ● Use vertex / triangle connectivity data ● Average triangle face normals per vertex ● Very fast – minimal overhead on total generation cost
  • 69. Performance ● Same scene, 128^3 grid, Geforce 570 ● Brute force GS version: 11 ms per pass ● No reuse – shadowmap passes add 11ms each ● Generating geometry in CS: 2 ms + 0.4 ms per pass ● 2ms to generate geometry in CS; 0.4ms to render it ● Generated geometry reused between shadow passes
  • 70. Video ● (Video Removed) ● (A tidal wave thing through a city. It was well cool!!!!!1)
  • 71. Wait, Was That Fluid Dynamics? Yes, It Was.
  • 72. Smoothed Particle Hydrodynamics olver works on particles ● Particles represent point samples of fluid in space ocate local neighbours for each particle ● Find all particles inside a particle’s smoothing radius ● Neighbourhood search – can be expensive olve fluid forces between particles within radius
  • 73. Neighbourhood Search patially bucket particles using spatial hash eturn of Counting Sort - with a histopyramid ● In this case: hash is quantised 3D position ● Bucket particles into hashed cells
  • 74. SPH Process – Step by Step ucket particles into cells valuate all particles.. ind particle neighbours from cell structure ● Must check all nearby cells inside search radius too um forces on particles from all neighbours
  • 75. SPH Performance erformance depends on # neighbours evaluated ● Determined by cell granularity, particle search radius, number of particles in system, area covered by system avour small cell granularity ● Easier to reduce # particles tested at cell level alance particle radius by hand
  • 76. SPH Performance ● In practice this is still far, far too slow (>200ms) ● Can check > 100 cells, too many particle interactions ● So we cheat.. ● Average particle positions + velocities in each cell ● Use average value for particles vs distant cells ● Force vectors produced close enough to real values.. ● Only use real particle positions for close cells
  • 77. Illumination The Rendering Pipeline of the Future
  • 78. Rendering Pipeline of the Future ● Primary rays are rasterised ● Fast: rasterisation still faster for typical game meshes ● Use for camera / GBuffers, shadow maps ● Secondary rays are traced ● Use GBuffers to get starting point ● Global illumination / ambient occlusion, reflections ● Paths are complex – bounce, scatter, diverge ● Needs full scene knowledge – hard for rasterisation ● Tend to need less accuracy / sharpness..
  • 79. Ambient Occlusion Ray Tracing ● Cast many random rays out from surface ● Monte-Carlo style ● AO result = % of rays that reach sky ● Slow.. ● Poor ray coherence ● Lots of rays per pixel needed for good result ● Some fakes available ● SSAO & variants – largely horrible.. 
  • 80. Ambient Occlusion with SDFs ● Raytrace SDFs to calculate AO ● Accuracy less important (than primary rays) ● Less SDF iterations – < 20, not 50-100 ● Limit ray length ● We don’t really “ray cast”.. ● Just sample multiple points along ray ● Ray result is a function of SDF distance at points
  • 81. 4 Rays Per Pixel
  • 82. 16 Rays Per Pixel
  • 83. 64 Rays Per Pixel
  • 84. 256 Rays Per Pixel
  • 85. Ambient Occlusion Ray Tracing ● Good performance: 4 rays; Quality: 64 rays ● Try to plug quality/performance gap ● Could bilateral filter / blur ● Few samples, smooth results spatially (then add noise) ● Or use temporal reprojection ● Few samples, refine results temporally ● Randomise rays differently every frame
  • 86. Temporal Reprojection ● Keep previous frame’s data ● Previous result buffer, normals/depths, view matrix ● Reproject current frame  previous frame ● Current view position * view inverse * previous view ● Sample previous frame’s result, blend with current ● Reject sample if normals/depths differ too much ● Problem: rejected samples / holes
  • 87. Video ● (Video Removed) ● (Basically it looks noisy, then temporally refines, then when the camera moves you see holes)
  • 90. Hole Filling ● Reprojection works if you can fill holes nicely ● Easy to fill holes for AO: just cast more rays ● Cast 16 rays for pixels in holes, 1 for the rest ● Adversely affects performance ● Work between local pixels differs greatly ● CS thread groups wait on longest thread ● Some threads take 16x longer than others to complete
  • 91. Video ● (Video Removed) ● (It looks all good cos the holes are filled)
  • 93. Hole Filling ● Solution: balance rays across threads in CS ● 16x16 pixel tiles: 256 threads in group ● Compute & sum up required rays in tile ● 1 pixel per thread ● 1 for reprojected pixels; 16 for hole pixels ● Spread ray evaluation across cores evenly ● N rays per thread
  • 94. Rays Per Thread - Tiles
  • 95. Video ● (Video Removed) ● (It still looks all good cos the holes are filled, by way of proof I’m not lying about the technique)
  • 96. Performance ● 16 rays per pixel: 30 ms ● 1 ray per pixel, reproject: 2 ms ● 1 + 16 in holes, reproject: 12 ms ● 1 + 16 rays, load balanced tiles: 4 ms ● ~ 2 rays per thread typical!
  • 98. Looking Forward ● Multiple representations of same world ● Geometry + SDFs ● Rasterise them ● Trace them ● Collide with them ●  World can be more dynamic.
  • 100. Thanks ● Jani Isoranta, Kenny Magnusson for 3D ● Angeldawn for the Fairlight logo ● Jussi Laakonen, Chris Butcher for actually making this talk happen ● SCEE R&D for allowing this to happen ● Guillaume Werle, Steve Tovey, Rich Forster, Angelo Pesce, Dominik Ries for slide reviews
  • 101. References ● High-speed Marching Cubes using Histogram Pyramids; Dyken, Ziegler et al. ● Sphere Tracing: a geometric method for the antialiased ray tracing of implicit surfaces; John C. Hart ● Rendering Worlds With Two Triangles; Inigo Quilezles ● Fast approximations for global illumination on dynamic scenes; Alex Evans

Notes de l'éditeur

  1. http://directtovideo.wordpress.com
  2. DirectX 9 is 9 years old Comfortable, well-understood API on PC &amp; console A PI itself largely an incremental update to DirectX 8 – released in 2000 Ingrained in our codebases (and minds) Shader language now has integers, atomics. Can now write to buffers by side effect in pixel shaders and a wealth of other new things besides.
  3. The separation of generation of the space and the topology is key in achieving flexibility – topology is evil.
  4. CSG subtract operator – just max with negative distance
  5. Repeat is just frac (or fmod) on the space. Always transform the space, not the shape. The same goes for twists, bends, distorts, transforms.
  6. “ Textures” are generated in realtime too – positions and normals are read from Gbuffers. The texture generation isnt dissimilar to the distance field generation, but separating them makes it easier to get performance (the detail is done in the texture so it doesn’t weigh down the distance field shader).
  7. Don’t expect everyone to fire all artists and generate worlds in code
  8. Bicubic interpolation – there’s a good gems articile and some nice shader code around which can do bicubic 3d texture interpolation with 8 taps (all linear filtered though) from a 3d texture. Pretty good actually!
  9. Geometry shaders suck, though. Will move this to compute-side expansion at some point.
  10. Can evaluate color if necessary by sampling texture OK – I lied. We don’t use a 3d texture target because depth stencil doesn’t work. We actually use a 2d texture array so we can have a matching depthstencil.
  11. The problem comes when rays “just miss” the surface – they have to march a lot of iterations to get past the surface because SDF(P) is very small near to the surface Because the result is transitory it can’t be cached between frames or passes – it has to be regenerated continuously.
  12. Out of patent since 2005! Wikipedia
  13. Look, GS is really shit on some architectures. On older cards like my geforce 9800 in a work machine, the performance was tragic (the 150ms figure quoted – much better on the same card with histopyramids). It seems better on more modern cards and on one vendor than another, though.
  14. We use stream compaction for many things .. Like particle depth sorting, SPH, triangles in grid cell, you name it.
  15. Histopyramids – thanks to Gernot Ziegler and Chris Dyken.
  16. Offset = parent offset in mip chain + cell’s sibling counts (that come before it in sequence)
  17. Append buffers vs counter buffers: append buffers do not return the count
  18. *your mileage may vary – a lot Note that it’s possible to use an appendbuffer instead of a histopyramid for generating indices here; performance vs histopyramid depends on architecture. Architecture-dependent is something that comes up a lot. Sucks, don’t it?
  19. 0 iterations, 1 iteration
  20. 4 iterations, 16 iterations
  21. We also use this algorithm for depth sorting particles.
  22. Counting sort is the first pass of radix sort. Atomics: I’ve found that performance of atomics depends on contention and differs greatly between hardware vendors. Hard to judge, hard to predict; definitely not free.
  23. Why bucket triangles not connected vertices? Because it causes less writes! I also used a linked list using a counter buffer and IncrementCounter(). This was much less successful because the linked list meant that the indices per vertex were scattered all around memory – it was much slower to walk the list. Changing from a linked list to buckets gave approximately 10x speedup on some hardware.
  24. GS approach scales linearly with grid size : 64^3 grid is 1.4ms The CS approach is without smoothing, subdivision or normal calculation to give a fair comparison (as the GS one doesn’t do it).
  25. The particle behaviours I describe here are interesting because they demonstrate uses of spatial databases and interaction between particles and the world, and particles with other particles.
  26. Particles are easier to collide with environment than grid based systems.
  27. First, pre-determine bucket sizes Then compact bucket buffer Finally, put particles into correct buckets We use a histopyramid for the compaction. We actually render particles into a grid using vertex,geometry and pixel shader and use a UAV to write out the counts – it’s faster than using CS for the lot because blending beats atomics here, although that’s a case by case thing (or so I’ve found) so you need to check both ways.
  28. 2x cell kernel radius = 5x5x5 cells = 125 cells! Using averaged values for distant cells works because the force direction produced doesn’t change much between the particles in the cell against the particle, and because the weight is low (1/d^2 falloff). This is something like what they do to solve N-body problems in astrophysics. 2 kernels – the inner is high quality and slow but small, the outer is large and approximated and fast.
  29. Use mip levels to reduce cost of texture sampling at distance
  30. About 7-8ms on a Geforce 570. We cast “nice” rays with a lot of taps and no mip filtering bias tricks. You could use a lot less taps and sacrifice some quality, but increase speed accordingly. 8 taps is pretty good and it’ll complete in more like 3ms.
  31. About 30ms on a Geforce 570 – we cast nice rays
  32. About 60ms on a Geforce 570
  33. About 3 dog years on a Geforce 570
  34. In reality what this means is, for every pixel in every tile – if the tile contains a white pixel, it will take the time of casting 16 rays for a pixel FOR THE WHOLE TILE. i.e. its slow.
  35. To explain via code, the Compute Shader does this: Process 1 pixel: read pixel, reproject, determine if in hole, add required rays to list in local memory Sync For(I = threadIndex; I &lt; numRays; I += NumThreads) { Process ray; } Sync Output lit pixel
  36. The rays are balanced across the tiles, so there’s no harsh peaks.
  37. Without balancing the 1+16 is only about twice as fast as doing 16 rays per pixel; with balancing its much faster.