Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Upcoming SlideShare
[Kgc2013] 모바일 엔진 개발기
Next
Download to read offline and view in fullscreen.

23

Share

Download to read offline

Z Buffer Optimizations

Download to read offline

Related Books

Free with a 30 day trial from Scribd

See all

Z Buffer Optimizations

  1. 1. Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.
  2. 2. Overview  Z-Buffer Review  Hardware: Early-Z  Software: Front-to-Back Sorting  Hardware: Double-Speed Z-Only  Software: Early-Z Pass  Software: Deferred Shading  Hardware: Buffer Compression  Hardware: Fast Clear  Hardware: Z-Cull  Future: Programmable Culling Unit
  3. 3. Z-Buffer Review  Also called Depth Buffer  Fragment vs Pixel  Alternatives: Painter’s, Ray Casting, etc
  4. 4. Z-Buffer History  “Brute-force approach”  “Ridiculously expensive”  Sutherland, Sproull, and, Schumacker, “A Characterization of Ten Hidden-Surface Algorithms”, 1974
  5. 5. Z-Buffer Quiz  10 triangles cover a pixel. Rendering these in random order with a Z-buffer, what is the average number of times the pixel’s z-value is written? See Subtle Tools Slides: erich.realtimerendering.com
  6. 6. Z-Buffer Quiz  1st triangle writes depth  2nd triangle has 1/2 chance of writing depth  3rd triangle has 1/3 chance of writing depth  1 + 1/2 + 1/3 + …+ 1/10 = 2.9289… See Subtle Tools Slides: erich.realtimerendering.com
  7. 7. Z-Buffer Quiz Harmonic Series # Triangles # Depth Writes 1 1 4 2.08 11 3.02 31 4.03 83 5 12,367 10 See Subtle Tools Slides: erich.realtimerendering.com
  8. 8. Z-Test in the Pipeline  When is the Z-Test? Fragment Shader Fragment Shader Z-Test Z-Test or
  9. 9. Early-Z  Avoid expensive fragment shaders  Reduce bandwidth to frame buffer Writes not reads Fragment Shader Z-Test
  10. 10. Early-Z  Automatically enabled on GeForce (8?) unless1 Fragment shader discards or write depth Depth writes and alpha-test2 are enabled  Fine-grained as opposed to Z-Cull  ATI: “Top of the Pipe Z Reject” Fragment Shader Z-Test 1 See NVIDIA GPU Programming Guide for exact details 2 Alpha-test is deprecated in GL 3
  11. 11. Front-to-Back Sorting  Utilize Early-Z for opaque objects  Old hardware still has less z-buffer writes  CPU overhead. Need efficient sorting Bucket Sort Octtree  Conflicts with state sorting 0 - 0.25 0.25 – 0.5 0.5 – 0.75 0.75 - 1 0 1 1 2
  12. 12. Double Speed Z-Only  GeForce FX and later render at double speed when writing only depth or stencil  Enabled when Color writes are disabled Fragment shader discards or write depth Alpha-test is disabled See NVIDIA GPU Programming Guide for exact details
  13. 13. Early-Z Pass  Software technique to utilize Early-Z and Double Speed Z-Only  Two passes Render depth only. “Lay down depth” – Double Speed Z-Only Render with full shaders and no depth – Early-Z (and Z-Cull)
  14. 14. Early-Z Pass  Optimizations Depth pass • Coarse sort front-to-back • Only render major occluders Shade pass • Sort by state • Render non-occluders depth
  15. 15. Deferred Shading  Similar to Early-Z Pass 1st Pass: Visibility tests 2nd Pass: Shading  Different than Early-Z Pass Geometry is only transformed once
  16. 16. Deferred Shading  1st Pass Render geometry into G-Buffers: Images from Tabula Rasa. See Resources. Fragment Colors Normals Depth Edge Weight
  17. 17. Deferred Shading  2nd Pass Shading == post processing effects Render full screen quads that read from G-Buffers Objects are no longer needed
  18. 18. Deferred Shading  Light Accumulation Result Image from Tabula Rasa. See Resources.
  19. 19. Deferred Shading  Eliminates shading fragments that fail Z-Test  Increases video memory requirement  How does it affect bandwidth?
  20. 20. Buffer Compression  Reduce depth buffer bandwidth  Generally does not reduce memory usage of actual depth buffer  Same architecture applies to other buffers, e.g. color and stencil
  21. 21. Buffer Compression  Tile Table: Status for nxn tile of depths, e.g. n=8 [state, zmin, zmax] state is either compressed, uncompressed, or cleared 0.1 0.5 0.5 0.1 0.5 0.5 0.1 0.8 0.8 0.8 0.8 0.5 0.5 0.5 0.5 0.1 [uncompressed, 0.1, 0.8]
  22. 22. Buffer Compression Tile Table Decompress Compress Compressed Z-Buffer Rasterizer updated z-values updated z-max nxn uncompressed z values [zmin, zmax]
  23. 23. Buffer Compression  Depth Buffer Write Rasterizer modifies copy of uncompressed tile Tile is lossless compressed (if possible) and sent to actual depth buffer Update Tile Table • zmin and zmax • status: compressed or decompressed
  24. 24. Buffer Compression  Depth Buffer Read Tile Status • Uncompressed: Send tile • Compressed: Decompress and send tile • Cleared: See Fast Clear
  25. 25. Buffer Compression  ATI: Writing depth interferes with compression Render those objects last  Minimize far/near ratio Improves Zmin , Zmax precision
  26. 26. Fast Clear  Don’t touch depth buffer  glClear sets state of each tile to cleared  When the rasterizer reads a cleared buffer A tile filled with GL_DEPTH_CLEAR_VALUE is sent Depth buffer is not accessed
  27. 27. Fast Clear  Use glClear Not full screen quads Not the skybox No "one frame positive, one frame negative“ trick  Clear stencil together with depth – they are stored in the same buffer
  28. 28. Z-Cull  Cull blocks of fragments before shading  Coarse-grained as opposed to Early-Z  Also called Hierarchical Z Fragment Shader Z-Cull Ztriangle min > tile’s zmax ztriangle min
  29. 29. Z-Cull  Zmax-Culling Rasterizer fetches zmax for each tile it processes Compute ztriangle min for a triangle Culled if ztriangle min > zmax Fragment Shader Z-Cull Ztriangle min > tile’s zmax ztriangle min
  30. 30. Z-Cull  Zmin-Culling Support different depth tests Avoid depth buffer reads If triangle is in front of tile, depth tests for each pixel is unnecessary Fragment Shader Z-Cull Ztriangle max < tile’s zmin ztriangle max
  31. 31. Z-Cull  Automatically enabled on GeForce (6?) cards unless  glClear isn’t used  Fragment shader writes depth (or discards?)  Direction of depth test is changed. Why?  ATI: avoid = and != depth compares on old cards  ATI: avoid stencil fail and stencil depth fail operations  Less efficient when depth varies a lot within a few pixels See NVIDIA GPU Programming Guide for exact details
  32. 32. ATI HyperZ  HyperZ = Early Z + Z Compression + Fast Z clear + Hierarchical Z See ATI's Depth-in-depth
  33. 33. Programmable Culling Unit  Cull before fragment shader even if the shader writes depth or discards  Run part of shader over an entire tile to determine lower bound z value  Hasselgren and Akenine-Möller, “PCU: The Programmable Culling Unit,” 2007
  34. 34. Summary  What was once “ridiculously expensive” is now the primary visible surface algorithm for rasterization
  35. 35. Resources www.realtimerendering.com Sections 7.9.2 and 18.3
  36. 36. Resources developer.nvidia.com/object/gpu_programming_guide.html GeForce 8 Guide: sections 3.4.9, 3.6, and 4.8 GeForce 7 Guide: section 3.6
  37. 37. Resources http://developer.amd.com/media/gpu_assets/Depth_in-depth.pdf Depth In-depth
  38. 38. Resources http://www.graphicshardware.org/previous/www_2000/presentations/ATIHot3D.pdf ATI Radeon HyperZ Technology Steve Morein
  39. 39. Resources http://ati.amd.com/developer/dx9/ATI-DX9_Optimization.pdf Performance Optimization Techniques for ATI Graphics Hardware with DirectX® 9.0 Guennadi Riguer Sections 6.5 and 8
  40. 40. Resources developer.nvidia.com/object/gpu_gems_home.html Chapter 28: Graphics Pipeline Performance
  41. 41. Resources developer.nvidia.com/object/gpu-gems-3.html Chapter 19: Deferred Shading in Tabula Rasa
  • KellieMbih

    Mar. 11, 2020
  • netpyoung

    Jun. 14, 2019
  • ssuserb0e71a

    Dec. 3, 2018
  • yumios

    Nov. 8, 2018
  • ChihChungCheng

    Mar. 15, 2018
  • gexiang

    Feb. 6, 2018
  • qq2862134638

    Jan. 27, 2018
  • SongHong6

    Jan. 20, 2018
  • FurkancanTuran

    Nov. 9, 2017
  • IvoYankulovski

    Sep. 16, 2017
  • YazKhabiri

    Apr. 10, 2017
  • ssuser4e9ae9

    Feb. 7, 2017
  • L1048576

    Jan. 7, 2017
  • yukiito7359

    Nov. 25, 2016
  • ssuser2e676d

    Nov. 25, 2016
  • tuankuranes

    Jan. 18, 2016
  • AndreyKopysov

    Jan. 6, 2016
  • hanecci

    Jan. 25, 2014
  • thomasschlogl5

    Sep. 19, 2013
  • bm82009072

    May. 20, 2013

Views

Total views

22,497

On Slideshare

0

From embeds

0

Number of embeds

1,207

Actions

Downloads

258

Shares

0

Comments

0

Likes

23

×