Lustre client performance comparison and tuning (1.8.x to 2.x)


In this presentation from the Lustre User Group 2014, John Fragalla from Xyratex presents: Lustre Client Performance Comparison and Tuning (1.8.x to 2.x).

Watch the video presentation: http://wp.me/p3RLHQ-bJ3
Learn more: http://www.opensfs.org/lug-2014-agenda


Slide 1: Lustre 1.8.9 - 2.x Client Performance Comparison and Tuning
John Fragalla, HPC Principal Architect, Xyratex, A Seagate Company
Lustre User Group Conference, April 8-10, 2014
Slide 2: Agenda
• Benchmark Setup
• IOR Parameters and Settings
• Lustre Clients and Methodology
• Single Thread Performance
• Throughput Performance Results and Data
• Summary
Slide 3: Benchmark Setup
• Storage architecture
  – A ClusterStor 6000 SSU with GridRAID OSTs
    • Rated storage performance ≥ 6 GB/s read or write with IOR
  – InfiniBand FDR interconnect
  – Xyratex Lustre 2.1.0.x4-74
• Client hardware
  – 8 clients, each configured with QDR IB, 48 GB memory, and 12 cores
  – Scientific Linux 6.5 with stock OFED
Slide 4: IOR Parameters and Settings
• Used mpirun to execute IOR with --byslot distribution
• IOR parameters held constant:
  – -F: file per process
  – -B: direct I/O
  – -t 64m: 64m transfer size per task
  – -b: 1024g block size for single thread, 512g for multiple tasks
  – -D: stonewall option; write for 4 minutes, read for 2 minutes
• Lustre settings:
  – Stripe count of 1
  – Stripe size of 1m
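The flags listed on this slide can be assembled into an invocation like the following sketch of a write-phase run. The task count, hostfile, IOR binary path, mount point, and the 240-second stonewall value (4 minutes expressed in seconds, as IOR's -D takes) are illustrative assumptions, not taken from the slides.

```shell
#!/bin/bash
# Sketch of the write-phase benchmark invocation using the constant
# IOR flags from the slide. NP, the hostfile, and the output path are
# assumptions for illustration.
NP=96                    # total MPI tasks (assumed)
BLOCK=512g               # 1024g for single-thread runs, 512g for multi-task
DEADLINE=240             # -D stonewall: write phase capped at 4 minutes

CMD="mpirun --byslot -np $NP --hostfile ./hosts \
ior -F -B -t 64m -b $BLOCK -D $DEADLINE -w -o /mnt/lustre/ior.dat"
echo "$CMD"
```

A read-phase run would swap -w for -r and drop the deadline to 120 seconds, matching the 2-minute read window described above.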
Slide 5: Lustre Clients
• Compared the following clients:
  – 1.8.9, 2.1.6, 2.4.3, and 2.5.1
• Collected raw performance data, with and without checksums enabled, using the following client settings:
  – max_rpcs_in_flight / max_dirty_mb:
    • 32 / 128
    • 64 / 256
    • 128 / 512
    • 256 / 1024
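Each tested combination can be applied on a client at runtime with lctl set_param; a minimal sketch, assuming root access and the standard osc.* wildcard (these settings do not persist across a remount):

```shell
# Apply one tested combination on a Lustre client (run as root).
# Values shown are the largest pair from the slide; the osc.* wildcard
# matches every OST connection on this client.
lctl set_param osc.*.max_rpcs_in_flight=256
lctl set_param osc.*.max_dirty_mb=1024

# Toggle wire checksums for the with/without-checksums comparison.
lctl set_param osc.*.checksums=0   # 0 = disabled, 1 = enabled

# Verify the current values.
lctl get_param osc.*.max_rpcs_in_flight osc.*.max_dirty_mb osc.*.checksums
```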
Slide 6: Single Thread Write Performance
[Chart: single-thread client write performance (MB/s), checksums disabled, against max_rpcs_in_flight/max_dirty_mb settings of 32/128, 64/256, 128/512, and 256/1024, one series each for clients 1.8.9, 2.1.6, 2.4.3, and 2.5.1; y-axis 0 to 1,200 MB/s]
Slide 7: Single Thread Read Performance
[Chart: single-thread client read performance (MB/s), checksums disabled, against max_rpcs_in_flight/max_dirty_mb settings of 32/128, 64/256, 128/512, and 256/1024, one series each for clients 1.8.9, 2.1.6, 2.4.3, and 2.5.1; y-axis 0 to 1,400 MB/s]
Slide 8: Impact on Single Thread Performance when Checksums were Enabled

Version | Metric                               | Reads  | Writes
1.8.9   | Performance impact difference (MB/s) | 217.80 | 336.84
        | Percentage reduction                 | 19%    | 35%
2.1.6   | Performance impact difference (MB/s) | 37.70  | 27.81
        | Percentage reduction                 | 3%     | 6%
2.4.3   | Performance impact difference (MB/s) | 45.51  | 9.14
        | Percentage reduction                 | 6%     | 1%
2.5.1   | Performance impact difference (MB/s) | 44.91  | 6.50
        | Percentage reduction                 | 6%     | 1%
Slide 9: 1.8.9 Client Performance Results
[Charts: 1.8.9 write and read throughput (MB/s), checksums disabled, across 1-8 nodes and 4-96 tasks, one series per max_rpcs_in_flight/max_dirty_mb setting (32/128, 64/256, 128/512, 256/1024); writes y-axis to 7,000 MB/s, reads to 8,000 MB/s. Key: max_rpcs_in_flight / max_dirty_mb / checksums]
Slide 10: 2.1.6 Client Performance Results
[Charts: 2.1.6 write and read throughput (MB/s), checksums disabled, across 1-8 nodes and 4-96 tasks, one series per max_rpcs_in_flight/max_dirty_mb setting (32/128, 64/256, 128/512, 256/1024); writes y-axis to 4,500 MB/s, reads to 6,000 MB/s. Key: max_rpcs_in_flight / max_dirty_mb / checksums]
Slide 11: 2.4.3 Client Performance Results
[Charts: 2.4.3 write and read throughput (MB/s), checksums disabled, across 1-8 nodes and 4-96 tasks, one series per max_rpcs_in_flight/max_dirty_mb setting (32/128, 64/256, 128/512, 256/1024); writes y-axis to 7,000 MB/s, reads to 8,000 MB/s. Key: max_rpcs_in_flight / max_dirty_mb / checksums]
Slide 12: 2.5.1 Client Performance Results
[Charts: 2.5.1 write and read throughput (MB/s), checksums disabled, across 1-8 nodes and 4-96 tasks, one series per max_rpcs_in_flight/max_dirty_mb setting (32/128, 64/256, 128/512, 256/1024); writes y-axis to 7,000 MB/s, reads to 8,000 MB/s. Key: max_rpcs_in_flight / max_dirty_mb / checksums]
Slide 13: Average Negative Impact on Performance when Checksums Enabled

Average results using 4-96 threads across 1-8 clients; columns are max_rpcs_in_flight/max_dirty_mb settings.

Version | Metric                       | Read 32/128 | Write 32/128 | Read 64/256 | Write 64/256 | Read 128/512 | Write 128/512 | Read 256/1024 | Write 256/1024
1.8.9   | Avg. impact diff. (MB/s)     | 203.40      | 398.09       | 3.12        | 434.41       | 71.03        | 523.17        | 31.19         | 502.31
        | Avg. impact percentage       | 2%          | 10%          | -1%         | 10%          | 0%           | 13%           | 0%            | 12%
2.1.6   | Avg. impact diff. (MB/s)     | 541.64      | 269.93       | 446.17      | 196.06       | 445.79       | 189.33        | 453.34        | 180.07
        | Avg. impact percentage       | 22%         | 12%          | 19%         | 9%           | 18%          | 8%            | 9%            | 8%
2.4.3   | Avg. impact diff. (MB/s)     | 333.18      | 94.53        | 277.32      | 77.16        | 380.47       | 194.21        | 326.76        | 315.71
        | Avg. impact percentage       | 7%          | 3%           | 5%          | 2%           | 7%           | 4%            | 5%            | 6%
2.5.1   | Avg. impact diff. (MB/s)     | 486.11      | 161.09       | 245.24      | 154.50       | 366.05       | 256.85        | 346.78        | 423.12
        | Avg. impact percentage       | 12%         | 5%           | 5%          | 4%           | 6%           | 6%            | 6%            | 9%
Slide 14: 1.8.9, 2.4.3, 2.5.1 Overall Write Comparison
[Chart: write throughput (MB/s), checksums disabled, across 1-8 nodes and 4-96 tasks, one series per client (1.8.9, 2.4.3, 2.5.1) and max_rpcs_in_flight/max_dirty_mb setting (32/128, 64/256, 128/512, 256/1024); y-axis 0 to 7,000 MB/s]
Slide 15: 1.8.9, 2.4.3, 2.5.1 Overall Read Comparison
[Chart: read throughput (MB/s), checksums disabled, across 1-8 nodes and 4-96 tasks, one series per client (1.8.9, 2.4.3, 2.5.1) and max_rpcs_in_flight/max_dirty_mb setting (32/128, 64/256, 128/512, 256/1024); y-axis 0 to 8,000 MB/s]
Slide 16: Maximizing Performance for Each Client
[Chart: maximum read and write throughput (MB/s) per client, checksums disabled; y-axis 0 to 8,000 MB/s. Best results by client, shown as thread count and max_rpcs_in_flight/max_dirty_mb setting: 1.8.9 at 32 threads with 32/128; 2.1.6 at 96 threads with 128/512; 2.4.3 at 32 threads with 256/1024; 2.5.1 at 32 threads with 256/1024]
Slide 17: Interesting Results Analyzing Raw Data
• In general, the 1.8.9 client performed the same regardless of client settings
• 2.4.3 and 2.5.1 followed the same performance curve
  – Client settings need to be increased to achieve maximum storage throughput
  – Both max_rpcs_in_flight and max_dirty_mb need to be increased to at least 256
    • Anything less than 256 will result in less than optimal storage performance
• The rule of thumb max_dirty_mb = max_rpcs_in_flight * 4 is not holding true with client versions 2.4.3 and 2.5.1
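The rule of thumb mentioned above is a simple multiplier, which the tested pairs (32/128 through 256/1024) all follow; a small sketch of a helper that derives max_dirty_mb from a chosen max_rpcs_in_flight:

```shell
#!/bin/bash
# Rule-of-thumb helper: max_dirty_mb = max_rpcs_in_flight * 4.
# Per the slides, treat the result as a starting point only: on 2.4.3
# and 2.5.1 clients both values should be raised to at least 256/1024
# to reach maximum storage throughput.
rule_of_thumb_dirty_mb() {
    echo $(( $1 * 4 ))
}

for rpcs in 32 64 128 256; do
    echo "max_rpcs_in_flight=$rpcs -> max_dirty_mb=$(rule_of_thumb_dirty_mb "$rpcs")"
done
```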
Slide 18: Summary
• 1.8.9 single-thread performance results are the highest, but 2.4.3 and 2.5.1 improved over previous 2.x client versions
• With the right client settings, the 2.4.3 and 2.5.1 clients can maximize storage throughput, as 1.8.9 clients can
• The 2.1.6 client underperformed regardless of client tuning
• The checksums impact varied with the number of threads but, on average, was not a "big" performance impact
  – The biggest impact with checksums enabled was on the single-thread tests
• 2.4.3 and 2.5.1 performance results were similar across all thread counts and client tuning parameters
Slide 19: Thank You
John_Fragalla@xyratex.com