Little’s Law in 3D and Storage Performance
                                NorCal CMG Meeting


                                    Dr. Neil Gunther

                                    Performance Dynamics


                                      August 7, 2012



                                                                 SM




c 2012 Performance Dynamics   Little’s Law in 3D and Storage Performance   August 7, 2012   1 / 34
Background


Outline

1     Background
        Review Little’s Law
        The Utilization Law
2     Throughput-Delay Plots
        Need for Speed
        Benchmarking Paradox
        Paradox Resolved
3     Storage Performance
        Throughput
        Latency
        Concurrency
4     Conclusion

Background


Little’s Law



 1   What is it?
       N = XR
       An immutable law of performance [1]
 2   Why is it important?
       L = λW proven 1961
       Algebraic simplification
       Cross-checking

J.D.C. Little’s lore (in his own words): perfdynamics.blogspot.com/2011/07/

 [1] If your data don’t fit LL, change your data!
Background


A Little Historical Perspective

     LL is not based on queueing theory
     LL relates inventory and manufacturing cycle time
     John Little (now 84) is not a computer performance analyst
     Prof. Little did not invent his own law
     LL was known to A. K. Erlang more than 100 years ago
     There are actually two versions of Little’s law


A Paradox
 1   LL expresses the fact that R decreases with increasing X
 2   Benchmarks show R increases with increasing throughput X


Background


Purpose of This Talk




 1   Review LL (both versions)
 2   Resolve the XR paradox by introducing 3D version of LL
 3   Apply LL to understand IOPS bottleneck




Background     Review Little’s Law


Little’s Law at the System Level
In steady state, the mean rate of arrival (λ) of customers into a system is equal to the mean
output rate or throughput (X ) of customers departing the system.
                                                λ=X                                             (1)




The total number of customers, requests, processes, threads (N) in the system is given by:

                                           N = λR = XR                                          (2)
where R is the mean total time spent in the system.

Classic Little’s law
N is the mean number of customers/requests in residence.
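
As a quick numerical check of equation (2), here is a minimal Python sketch. The function name and the sample numbers are illustrative assumptions, not values from the talk.

```python
def mean_in_system(throughput: float, residence_time: float) -> float:
    """Little's law at the system level: N = X * R."""
    return throughput * residence_time


# Hypothetical steady-state measurement: 50 requests/s, 0.2 s spent in the system.
n = mean_in_system(throughput=50.0, residence_time=0.2)
print(f"N = {n:.1f} requests in residence")
```

The same function can be used for cross-checking: if measured N, X and R do not satisfy it, one of the measurements is suspect.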

Background                                                         Review Little’s Law


Little’s Law at the Device Level

If the system is like a grocery store, the device level is like a checkout lane.




At any device (labelled k = 1, 2, . . .), equation (2) yields the local number of customers/requests
(Qk) enqueued:

                                                Qk = λRk                                          (3)
where Rk is the time in residence at the device. Rk is defined as the sum of the service time
(Sk) at the cashier and the time (Wk) spent waiting to get serviced by the cashier:

                                                Rk = Wk + Sk                                      (4)

The total number, N, in the global system (2) is the sum of all the customers/requests enqueued
at each device:
                                    N = Q1 + Q2 + · · · + Qk                                  (5)
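
Equations (3) to (5) can be sketched in a few lines of Python. The two devices, their waiting and service times, and the arrival rate are hypothetical, and the sketch assumes every request visits each device exactly once, so that each device sees the same rate λ.

```python
def queue_length(arrival_rate: float, waiting_time: float, service_time: float) -> float:
    """Q_k = lambda * R_k, with R_k = W_k + S_k (equations 3 and 4)."""
    residence_time = waiting_time + service_time
    return arrival_rate * residence_time


# Hypothetical (W_k, S_k) pairs in seconds for two checkout lanes.
devices = [(0.03, 0.02), (0.10, 0.05)]
arrival_rate = 20.0  # requests per second, same at every device (assumption)

# Equation (5): the system total is the sum of the per-device queue lengths.
total = sum(queue_length(arrival_rate, w, s) for w, s in devices)
print(f"N = {total:.2f}")
```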



Background     The Utilization Law


Little’s Law and Device Utilization
The utilization of the device comes from (3) by ignoring the waiting time contribution. Logically,
this is equivalent to letting W → 0:

                                         Qk = λRk
                                              = λ(Wk + Sk )
                                              → λSk                                               (6)

We changed the right side of (6), so the left side must also be changed. But to what? It has to be
a number (like N), and Qk can be unbounded: Qk < ∞ (arbitrarily large, but not infinite).
Call the “new” number ρk (to agree with queueing literature) so that (6) becomes:

                                              ρk = λSk                                            (7)

Since the cashier cannot service more than one customer at a time:

                                                ρk < 1                                            (8)

or ρk < 100%, on average.

Little’s utilization law
The utilization ρk is the mean number of customers/requests in service at device k .
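
The utilization law (7) and its bound (8) can be checked with a short sketch. The arrival rate and service time below are made-up numbers for a single cashier.

```python
def utilization(arrival_rate: float, service_time: float) -> float:
    """Little's utilization law: rho_k = lambda * S_k."""
    return arrival_rate * service_time


def within_bound(rho: float) -> bool:
    """A single server cannot be more than 100% busy on average: rho_k < 1."""
    return rho < 1.0


# Hypothetical cashier: 40 customers/s arriving, 20 ms of service each.
rho = utilization(arrival_rate=40.0, service_time=0.020)
print(f"rho = {rho:.0%}, within bound: {within_bound(rho)}")
```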

Throughput-Delay Plots


Throughput-Delay Plots   Need for Speed


Speed, Distance and Time

Example
Driving on the freeway at 60 mph. At that speed, you travel a mile a minute. How far will you
travel in 15 minutes?


Answer
In a quarter of an hour you will travel one quarter the distance you would have covered in an
hour. Therefore, in 15 minutes you will travel 15 miles.

Congratulations! You just used LL without realizing it.
Let X be the speed, R the elapsed time and N the miles covered:

                                            N = XR
                                    15 miles = 60 mph × (15/60) hours




Throughput-Delay Plots   Need for Speed


Speed and Delay are Inversely Related
Example
Now, suppose it’s an emergency and you need to cover the same distance in 10 minutes. How
fast do you need to go?

The answer may not be so obvious, but not to worry. We can still use LL.

Answer

                                                N = XR
                                        15 miles = X × (10/60) hours

Solving for X:
                                          X = 15 × 6 = 90 mph


Theorem (Inverse Proportion of LL)
To reduce the delay R (elapsed time), the speed X must be increased.
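
The two freeway examples are the same rearrangement of N = XR. A small sketch (the function name is an assumption introduced here) makes the inverse proportion explicit:

```python
def required_speed(distance_miles: float, minutes: float) -> float:
    """Solve N = X * R for X, with R converted from minutes to hours."""
    hours = minutes / 60.0
    return distance_miles / hours


# Same distance (N = 15 miles), progressively shorter delay R.
for minutes in (15, 10, 5):
    print(f"{minutes:2d} min -> {required_speed(15.0, minutes):.0f} mph")
```

With N held fixed, shrinking R by some factor raises X by the same factor.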

Throughput-Delay Plots     Need for Speed


XR Plots of LL
[Figure: two panels of LL curves for N = 1, N = 15 and N = 50. Left panel: X versus R. Right panel: R versus X. Both axes run from 0 to 15.]




     Example was for the N = 15 miles curve
     Time for N = 15 miles is reduced by going from green to red dot
     Different distance means a different curve
     Curves are symmetric about the diagonal
     Can flip X and R axes w/o changing the curves
     Independent variable goes on x-axis
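
The curves in these plots are hyperbolas of constant N. As a rough sketch (sample X values chosen arbitrarily), each can be tabulated from R = N / X, and the symmetry about the diagonal checked directly:

```python
def response_time(n: float, x: float) -> float:
    """Point on the LL curve of constant N: R = N / X."""
    return n / x


# Tabulate the three curves shown in the plots at a few sample X values.
for n in (1, 15, 50):
    row = [round(response_time(n, x), 2) for x in (1, 5, 10, 15)]
    print(f"N = {n:2d}: R = {row}")

# Symmetry about the diagonal: if (X, R) is on a curve, so is (R, X).
x = 5.0
r = response_time(15, x)
print(response_time(15, r) == x)
```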

Throughput-Delay Plots                 Benchmarking Paradox


Benchmark XR Plots




     SPEC SFS: Performance Plots

[Figure: benchmark plots of response time versus throughput. Panels: NSPLab Dec 2007; Hitachi Jan 2012; SPEC SFS97 (Response Time (mSec) versus NFSops/Second, series SC2000 and NS6000); Fusion-io SQLServer 2010.]
Throughput-Delay Plots   Benchmarking Paradox


LL is 3-Dimensional




[Figure: 3D surface of LL with axes X (QPS) from 0 to 40, R (s) from 0.0 to 2.0, and N from 0 to 100.]




     Three variables (like PVT in chemistry)
     3D surface
     Like a cone but not rotationally symmetric about apex
     Square edges cause hyperbolic contours
Throughput-Delay Plots          Benchmarking Paradox


     Fusion-io Benchmark

[Figure: two panels. Left: Actual data (with and without FIO). Right: Extracted data, R versus X.]

     SQL Server RDBMS: Measure X in QPS and R in s at each load (N)
     Two curves: before (red) and after (blue) application of FIO device
     Manually extracted pertinent data points
Throughput-Delay Plots     Paradox Resolved


Back to the Paradox
The XR Paradox
  1     LL says R decreases with increasing X (3D contour lines)
  2     Benchmarks show R increases with increasing throughput X

[Figure: two panels of R versus X. Left: Extracted data. Right: Data moves on LL contours.]



The Resolution
Superimpose LL 3D contours onto 2D benchmark data.

Throughput-Delay Plots        Paradox Resolved


2D Projection of 3D Surface
[Figure: 2D projection of the LL surface, R (s) from 0.6 to 1.4 versus X (QPS) from 22 to 30.]



Theorem (Gunther 2012)
All benchmark data “moves” along LL contours.
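
One way to see the resolution numerically: each measured (X, R) pair sits on the contour N = XR, and successive load points sit on different contours. The data below are hypothetical values chosen for illustration, not the points extracted from the Fusion-io plot.

```python
# Hypothetical (X in QPS, R in s) pairs at increasing load. NOT the talk's data.
points = [(22.0, 0.55), (26.0, 0.80), (29.0, 1.10), (30.0, 1.40)]


def contour_of(x: float, r: float) -> float:
    """The LL contour (constant N) that a measured (X, R) point lies on."""
    return x * r


loads = [contour_of(x, r) for x, r in points]
for (x, r), n in zip(points, loads):
    print(f"X = {x:4.1f} QPS, R = {r:4.2f} s -> contour N = {n:4.1f}")
```

Along any single contour R falls as X rises. Here R rises with X only because each point belongs to a larger N.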

Storage Performance


Storage Performance   Throughput


System Level Query Rate
Example
Suppose processing a query requires the execution of 100 K instructions on the CPU. The CPU can
execute 10 GIPS.


   1   IPQ: 100 K = 100 × 10^3 instructions per application query
   2   IPS: 10 GIPS = 10 × 10^9 CPU instructions per second
The throughput (or request rate) for queries is:

              λQPS = IPS / IPQ = (10 × 10^9) / (100 × 10^3) = 10^10 / 10^5 = 100,000

The steady state assumption (1) tells us:

                                    λQPS = 100 KQPS = XQPS                                     (9)

A maximum of 100 KQPS can be processed
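
The arithmetic behind (9) in a minimal sketch, using the example's figures. The constant and function names are introduced here for illustration only.

```python
INSTRUCTIONS_PER_QUERY = 100e3   # IPQ: 100 K instructions per query
INSTRUCTIONS_PER_SECOND = 10e9   # IPS: 10 GIPS


def max_query_rate(ips: float, ipq: float) -> float:
    """Upper bound on query throughput: lambda_QPS = IPS / IPQ."""
    return ips / ipq


qps = max_query_rate(INSTRUCTIONS_PER_SECOND, INSTRUCTIONS_PER_QUERY)
print(f"{qps / 1e3:.0f} KQPS")
```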
Storage Performance   Throughput


Storage Device IO Rate

Example (cont’d)
Assume further that within the query instructions a single IO is issued.     The CPU thread must
wait before the rest of the query instructions can be completed.

This creates a nice convenience since λIOPS ≡ λQPS .


                              λIOPS = QPS / IOPQ = 10^5 / 1 = 100,000

                                   λIOPS = 100 KIOPS = XIOPS                                    (10)


Device IOPS
But this is aggregate IOPS. How many IOPS can a single disk do?

Storage Performance   Throughput


Device IOPS Rating

Example (Seagate IOPS)
A Seagate Barracuda 7200 RPM disk is capable of about 100 IOPS. This follows from the combined
seek time and RPS time being on the order of 10 ms. Hence:

                                        IOPS = 1 / 0.010 = 100                                (11)

Simple arithmetic suggests that 1000 Seagate Barracudas would be needed to accommodate the
100 KIOPS aggregate throughput being considered here.


Caveat emptor
Note that (11) is a rearrangement of the LL utilization law (7):

                                            λIOPS = ρ / Sdisk                                 (12)

with ρ = 1. Hence, it is the theoretical maximum possible IOPS that this disk can support. In
practice, the sustainable IOPS rate will be considerably lower.
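
A sketch of (11) and (12) together with the spindle count. The `rho` parameter is an assumption added here so that a lower sustainable utilization can be tried. The talk itself only evaluates ρ = 1.

```python
import math


def max_iops(service_time: float, rho: float = 1.0) -> float:
    """Equation (12): lambda_IOPS = rho / S_disk. rho = 1 is the theoretical ceiling."""
    return rho / service_time


def disks_needed(target_iops: float, service_time: float, rho: float = 1.0) -> int:
    """Number of disks required to carry target_iops at per-disk utilization rho."""
    return math.ceil(target_iops / max_iops(service_time, rho))


print(max_iops(0.010))                        # per-disk ceiling for a 10 ms disk
print(disks_needed(100_000, 0.010))           # at rho = 1
print(disks_needed(100_000, 0.010, rho=0.5))  # hypothetical 50% sustainable utilization
```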


Storage Performance   Latency


Storage Latency
Example (cont’d)
If the storage device is capable of responding to an IO request in 1 ms (10x faster than the Seagate
Barracuda), the processor needs to issue 100 concurrent IO requests to the storage system so
that it can complete 100 KQPS. If the storage device were 10 times faster (e.g., SSD), then the
processor would only need to be handling a 10th as many IO requests, or just 10 concurrent
requests.

                                              Sdisk = 10^-3 s
                                              Sssd = 10^-4 s
Applying the LL utilization law (7):
                                ρdisk = λIOPS Sdisk = 10^5 × 10^-3 = 100                          (13)
This suggests we need more than 100 spindles. Similarly, for faster SSD devices:
                                ρssd = λIOPS Sssd = 10^5 × 10^-4 = 10                             (14)
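
Equations (13) and (14) are the same product evaluated for two service times. A minimal sketch, with names chosen here for illustration:

```python
IO_RATE = 1e5  # lambda_IOPS from equation (10)


def offered_load(io_rate: float, service_time: float) -> float:
    """lambda * S: the mean number of IOs in service, i.e. the concurrency needed."""
    return io_rate * service_time


for name, service_time in (("disk", 1e-3), ("ssd", 1e-4)):
    print(f"{name}: {offered_load(IO_RATE, service_time):.0f} concurrent IOs")
```

Any value above 1 cannot be carried by a single device, which is what motivates spreading the load over many devices.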

Latency
Latency is an ill-defined word that means different things to different technical people. Need the
more exacting language of queueing theory to see where different latencies arise.

Storage Performance   Latency


Tandem Queue Model

Since computer systems are not deterministic, we represent CPU and storage as a queueing
network with two stages:


[Diagram: Src → Scpu → Sdev → Snk]




Queries are sourced by the application at an aggregate request rate of λ = 100 KQPS and the
CPU issues IO requests at the rate of 100 KIOPS.
However, from (13) we know ρdisk = 100 or 10,000% !!

Trouble
This violates the utilization bound ρdisk < 1 given by (8).

We already suspected we would need at least 100 spindles from (13).

But how should the disks be arranged to give the correct latencies?




Storage Performance    Latency


Parallel Disk Queues
The message from LL (8) is that we need many (q) disks operating in parallel.

[Diagram: Source → Scpu → q parallel disk queues, each receiving rate λ/q → Sink]




Parallel disks divide the total throughput (λ) into q substreams, each load-balanced with equal
rate λ/q. Moreover, considering (13), we can write:

                                               ρdisk = 100 / q < 1                                   (15)

LL tells us we actually need more than q = 100 disks to satisfy the utilization bound.
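
The bound (15) can be turned into a minimum disk count. This sketch uses exact rational arithmetic so the strict inequality at q = 100 is not blurred by floating point. Expressing the service time in microseconds is a choice made here, not something from the talk.

```python
import math
from fractions import Fraction


def per_disk_utilization(total_iops: int, service_time_us: int, q: int) -> Fraction:
    """rho_disk = (lambda / q) * S, with S given in microseconds."""
    return Fraction(total_iops, q) * Fraction(service_time_us, 1_000_000)


def min_disks(total_iops: int, service_time_us: int) -> int:
    """Smallest q that satisfies the strict bound rho_disk < 1."""
    offered = Fraction(total_iops) * Fraction(service_time_us, 1_000_000)
    return math.floor(offered) + 1


q = min_disks(100_000, 1_000)  # 100 KIOPS, 1 ms service time
print(q, float(per_disk_utilization(100_000, 1_000, q)))
```

With exactly 100 disks each one would be 100% busy, so the first count that satisfies the bound is one more than that.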

Disk Arrays
This is why typical storage subsystems are configured as arrays.

Storage Performance    Latency


CPU Latency
CPU service time (i.e., execution time) for a query:

                                   SCPU = IPQ / IPS = 10^-5 seconds                               (16)

i.e., 10 µs per query. The mean CPU utilization is:

                                ρCPU = λQPS SCPU = 10^5 × 10^-5 = 1

which is right on the edge of the utilization bound.

[Diagram: Src feeding parallel Scpu servers, then Snk]




So, we need more than one core or execution unit.

Duo-core
LL tells us we need a duo-core, at least, to meet the utilization bound.

Storage Performance   Concurrency


Multicore with Infinite IOPS

Example (cont’d)
If the storage system is capable of responding to an IO request in 1 ms
······
If the storage were 10 times faster in responding with I/O requests...

These numbers become Sdev in the following diagram.


[Diagram: Src → parallel Scpu servers → parallel Sdev queues, each receiving rate λ/q → Snk]




We use this queueing model to examine both latency and concurrency effects.
“Infinite IOPS” is represented by 1000 parallel storage devices.




Queueing Model Results

Example (cont’d)
If the storage system is capable of responding to I/O requests in 1/1,000th of a second, then
the CPU will need to issue N = 100 concurrent requests
······
If the storage were 10 times faster, then the processor would only need to be handling 1/10th as
many concurrent requests, or just N = 10 concurrent requests.



                                      Latency (s)                     Concurrency
              Device (#)        Service         Residence           Qk          N
              CPU (2)           0.00001         0.0000133333        1.33333     1.33333
              Disk (1000)       0.001000        0.001111            0.1111      111.1
              SSD (1000)        0.000100        0.0001010           0.01010     10.10
              FIOa (1000)       1.000 × 10^-6   1.000 × 10^-6       0.0001      0.1000
              FIOb (1)          1.000 × 10^-6   1.111 × 10^-6       0.1111      0.1111

The overall time in the system, per LL in eqn. (2), is the sum of the CPU residence time (1st row,
3rd column) and the residence time of an IO at the respective storage device.
With 1000 disks, N = 111.1 concurrent IOs.
With 1000 SSDs, N = 10.1 concurrent IOs.
With 1000 FIOs, N = 0.1 concurrent IOs. But wait! It gets even better...
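The tabulated values can be reproduced with elementary queueing formulas. The sketch below rests on an assumption of ours, since the slides do not say which model generated the table: each storage device is treated as an M/M/1 queue receiving an equal share λ/q, and the CPU as a single M/M/2 queue, using R = S / (1 − ρ^m). The function and variable names are illustrative.

```python
LAMBDA = 100_000.0  # aggregate request rate (QPS = IOPS), from the example

def residence_time(service: float, rate: float, servers: int = 1) -> float:
    """Residence time R = S / (1 - rho**m) for an m-server queue.

    Exact for M/M/1 and M/M/2; an approximation for larger m.
    """
    rho = rate * service / servers      # per-server utilization
    if rho >= 1:
        raise ValueError("utilization bound rho < 1 violated")
    return service / (1 - rho ** servers)

# CPU: one shared queue served by 2 cores (assumed M/M/2).
r_cpu = residence_time(1e-5, LAMBDA, servers=2)
print(f"CPU (2): R = {r_cpu:.6e} s, N = {LAMBDA * r_cpu:.5f}")

# Storage: q independent devices (assumed M/M/1), each receiving LAMBDA / q.
storage = [("Disk", 1000, 1e-3), ("SSD", 1000, 1e-4),
           ("FIOa", 1000, 1e-6), ("FIOb", 1, 1e-6)]
for name, q, s in storage:
    r = residence_time(s, LAMBDA / q)   # residence time at one device
    qk = (LAMBDA / q) * r               # Little's law at the device: Qk = lambda_k * Rk
    n = q * qk                          # total outstanding IOs summed over devices
    print(f"{name} ({q}): R = {r:.4e} s, Qk = {qk:.5f}, N = {n:.4f}")
```

Under these assumptions the script agrees with the table to the precision shown, which suggests, but does not prove, that a model of this kind was used.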

Conclusion




Latency Trumps IOPS
 CPU     The residence time R_CPU is 33% longer than the query execution time, S_CPU.
         In general, this time can be reduced further with more cores.
 Disk    All 1000 disks have S = 1 ms service time.
         Residence time (1.111 ms) is about 11% longer than the service time.
         Concurrent IO threads N_io = 111.
         These threads also have to be managed by the OS (not shown).
         Thread management also uses up CPU cycles (not shown).
         Response time = 0.000013 s + 0.001111 s is dominated by disk latency.
 SSD     Faster “SSD” (10x) with nominal S = 0.1 ms service time.
         Residence time is now close to service time.
         Concurrency is also reduced by 10x to N = 10 threads.
         Response time = 0.000013 s + 0.0001010 s is still dominated by storage latency.
 FIOa    Fusion flash service time S = 1 microsecond.
         Residence time is essentially equal to the device service time.
         Concurrent IO threads N = 0.1 are negligible.
         Response time = 0.000013 s + 0.000001 s is now CPU-bound.
 FIOb    Bigger message: Don’t need 1000 Fusion flash devices.
         Small N_FIOa = 0.1 means a single FIO device has about the same IO concurrency.
         A single Fusion card can replace 1000 standard devices!
          SAN in your hand
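The shift from storage-bound to CPU-bound can be made explicit by computing the storage share of each response time. This is a small illustrative sketch using only the residence times from the results table; the 50% threshold used to label a configuration is our own convention, not one from the slides.

```python
R_CPU = 0.0000133333  # CPU residence time from the results table, in seconds

# Storage residence times from the results table, in seconds.
storage_residence = {
    "Disk": 0.001111,
    "SSD": 0.0001010,
    "FIOa": 1.000e-6,
}

def storage_share(r_stor: float, r_cpu: float = R_CPU) -> float:
    """Fraction of the total response time spent at the storage device."""
    return r_stor / (r_cpu + r_stor)

for name, r_stor in storage_residence.items():
    total = R_CPU + r_stor
    share = storage_share(r_stor)
    # Illustrative convention: more than half the time in storage = storage-bound.
    bound = "storage-bound" if share > 0.5 else "CPU-bound"
    print(f"{name}: R = {total * 1e6:8.2f} us, storage share = {share:.1%} ({bound})")
```

Only the flash case drops the storage share below the CPU share, which is the sense in which the query becomes CPU-bound.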




ioDrive2 Duo 2.4TB
From the Fusion-io web site




      Read bandwidth 3.0 GB/s                                   Random read 285,000 IOPS
      Write bandwidth 2.5 GB/s                                  Random write 725,000 IOPS
      Sequential read 892,000 IOPS                              Read access latency 68 µs
      Sequential write 935,000 IOPS                             Write access latency 15 µs
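Little’s law can also be applied to these quoted figures to estimate how many IOs must be in flight to reach the rated throughput. This is a rough sketch under a strong assumption: that the quoted access latency and the quoted random IOPS describe the same operating point, which the data sheet excerpt does not state. Pairing read latency with random read IOPS (and likewise for writes) is our choice.

```python
# Quoted figures from the slide: (IOPS, access latency in seconds).
# Pairing each latency with the random IOPS rating is an assumption.
specs = {
    "random read": (285_000, 68e-6),
    "random write": (725_000, 15e-6),
}

def implied_concurrency(iops: float, latency_s: float) -> float:
    """Little's law N = X * R: mean number of IOs in flight."""
    return iops * latency_s

for op, (iops, latency) in specs.items():
    n = implied_concurrency(iops, latency)
    print(f"{op}: N = {n:.2f} outstanding IOs")
```

If the assumption holds, the rated throughput needs on the order of ten to twenty outstanding IOs, rather than the hundred or more that a disk array requires.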



Summary


     LL is really 3D (3 variables: N, X, R).
     LL has 2 versions: N = XR (with waiting) and ρ = XS (no waiting).
     Assume no bandwidth limit and choose a throughput target (here, 100 KQPS).
     With current tech, LL tells us we need parallel devices (disk array, multicore).
     Storage “latency” (service times) is orders of magnitude longer than CPU execution times.
     The number of outstanding IOs determines the total (response) time in the system to
     complete each application query: R = W + S.
     R_stor ≫ R_cpu, so storage latency dominates system response time.
     If we can make R_stor ≪ R_cpu, then outstanding IOs become negligible.
     Application query times are then determined solely by the CPU execution time.
     A CPU-bound application is always the optimal goal.
     Fusion-io also eliminates IO controller latency: all data gets closer to CPU.








                    Table: Comparative storage device attributes
                Technology    Persistent    Relative controller latency    Relative device latency    Relative cost
                Disk          Yes           High                           High                       Low
                SSD           Yes           High                           Low                        High
                Fusion-IO     Yes           Low                            Low                        High
                RAM           No            Low                            Low                        Highest








Guerrilla Training
            Wanna learn about more stuff like this? Come to class




                          Thank you for your participation




Performance Dynamics Company
Castro Valley, California
www.perfdynamics.com
perfdynamics.blogspot.com
twitter.com/DrQz
facebook.com/Performance-Dynamics-Company
info@perfdynamics.com
+1-510-537-5758





Little's Law in 3D and Storage Performance

  • 1. Little’s Law in 3D and Storage Performance NorCal CMG Meeting Dr. Neil Gunther Performance Dynamics August 7, 2012 SM c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 1 / 34
  • 2. Background Outline 1 Background Review Little’s Law The Utilization Law 2 Throughput-Delay Plots Need for Speed Benchmarking Paradox Paradox Resolved 3 Storage Performance Throughput Latency Concurrency 4 Conclusion c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 2 / 34
  • 3. Background Little’s Law 1 What is it? 1 If your data don’t fit LL, change your data! c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 3 / 34
  • 4. Background Little’s Law 1 What is it? N = XR 1 If your data don’t fit LL, change your data! c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 3 / 34
  • 5. Background Little’s Law 1 What is it? N = XR An immutable law of performance 1 1 If your data don’t fit LL, change your data! c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 3 / 34
  • 6. Background Little’s Law 1 What is it? N = XR An immutable law of performance 1 2 Why is it important? 1 If your data don’t fit LL, change your data! c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 3 / 34
  • 7. Background Little’s Law 1 What is it? N = XR An immutable law of performance 1 2 Why is it important? L = λW proven 1961 1 If your data don’t fit LL, change your data! c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 3 / 34
  • 8. Background Little’s Law 1 What is it? N = XR An immutable law of performance 1 2 Why is it important? L = λW proven 1961 Algebraic simplification 1 If your data don’t fit LL, change your data! c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 3 / 34
  • 9. Background Little’s Law 1 What is it? N = XR An immutable law of performance 1 2 Why is it important? L = λW proven 1961 Algebraic simplification Cross-checking J.D.C. Little’s lore (in his own words): perfdynamics.blogspot.com/2011/07/ 1 If your data don’t fit LL, change your data! c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 3 / 34
  • 10. Background A Little Historical Perspective LL is not based on queueing theory LL relates inventory and manufacturing cycle time John Little (now 84) is not a computer performance analyst Prof. Little did not invent his own law LL was known to A. K. Erlang more than 100 years ago There are actually two versions of Little’s law A Paradox 1 LL expresses the fact that R decreases with increasing X 2 Benchmarks show R increases with increasing throughput X c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 4 / 34
  • 11. Background Purpose of This Talk 1 Review LL (both versions) 2 Resolve the XR paradox by introducing 3D version of LL 3 Apply LL to understand IOPS bottleneck c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 5 / 34
  • 12. Background Review Little’s Law Little’s Law at the System Level In steady state, the mean rate of arrival (λ) of customers into a system is equal to the mean output rate or throughput (X ) of customers departing the system. λ=X (1) The total number of customers, requests, processes, threads (N) in the system is given by: N = λR = XR (2) where R is the mean total time spent in the system. Classic Little’s law N is the mean number of customers/requests in residence. c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 6 / 34
  • 13. Background Review Little’s Law Little’s Law at the Device Level If the system is like a grocery store, the device level is like a checkout lane. At any device (labelled k = 1, 2, . . .), equation (2) yields the local number of customers/requests (Qk ) enqueued: Please link back to the page you downloaded this from, or just link to parkablogs.blogspot.com Qk = λRk (3) where Rk is the time in residence at the device. Rk is defined as the sum of the service time moc.topsgolb.sgolbakrap ot knil tsuj ro ,morf siht dedaolnwod uoy egap eht ot kcab knil esaelP (Sk ) at the cashier and the time (Wk ) spent waiting to get serviced by the cashier: Please link back to the page you downloaded this from, or just link to parkablogs.blogspot.com Rk = Wk + Sk Please link back to the page you downloaded this from, or just link to parkablogs.blogspot.com Please link back to the page you downloaded this from, or just link to parkablogs.blogspot.com (4) The total number, N, in the global system (2) is the sum of all the customers/requests enqueued at each device: N = Q1 + Q2 + · · · + Qk (5) c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 7 / 34
  • 14. Background The Utilization Law Little’s Law and Device Utilization The utilization of the device comes from (3) by ignoring the waiting time contribution. Logically, this is equivalent to letting W → 0: Qk = λRk = λ(Wk + Sk ) → λSk (6) We changed the right side of (6), so the left side must also be changed. But to what? It has to be number (like N) and Qk can be unbounded: Qk < ∞ (but not infinite). Call the “new” number ρk (to agree with queueing literature) so that (6) becomes: ρk = λSk (7) Since the cashier cannot service more than one customer at a time: ρk < 1 (8) or ρk < 100%, on average. Little’s utilization law The utilization ρk is the mean number of customers/requests in service at device k . c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 8 / 34
  • 15. Throughput-Delay Plots Outline 1 Background Review Little’s Law The Utilization Law 2 Throughput-Delay Plots Need for Speed Benchmarking Paradox Paradox Resolved 3 Storage Performance Throughput Latency Concurrency 4 Conclusion c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 9 / 34
  • 16. Throughput-Delay Plots Need for Speed Speed, Distance and Time Example Driving on the freeway at 60 mph. At that speed, you travel a mile a minute. How far will you travel in 15 minutes? c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 10 / 34
  • 17. Throughput-Delay Plots Need for Speed Speed, Distance and Time Example Driving on the freeway at 60 mph. At that speed, you travel a mile a minute. How far will you travel in 15 minutes? Answer In a quarter of an hour you will travel one quarter the distance you would have covered in an hour. Therefore, in 15 minutes you will travel 15 miles. Congratulations! You just used LL without realizing it. Let X be the speed, R the elapsed time and N the miles covered: N=XR 15 15 miles = 60 mph × hours 60 c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 10 / 34
  • 18. Throughput-Delay Plots Need for Speed Speed and Delay are Inversely Related Example Now, suppose it’s an emergency and you need to cover the same distance in 10 minutes. How fast do you need to go? c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 11 / 34
  • 19. Throughput-Delay Plots Need for Speed Speed and Delay are Inversely Related Example Now, suppose it’s an emergency and you need to cover the same distance in 10 minutes. How fast do you need to go? The answer may not be so obvious, but not to worry. We can still use LL. Answer N=XR 10 15 miles = X × hours 60 Solving for X: X = 15 × 6 = 90 mph c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 11 / 34
  • 20. Throughput-Delay Plots Need for Speed Speed and Delay are Inversely Related Example Now, suppose it’s an emergency and you need to cover the same distance in 10 minutes. How fast do you need to go? The answer may not be so obvious, but not to worry. We can still use LL. Answer N=XR 10 15 miles = X × hours 60 Solving for X: X = 15 × 6 = 90 mph Theorem (Inverse Proportion of LL) To reduce the delay R (elapsed time), the speed X must be increased. c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 11 / 34
  • 21. Throughput-Delay Plots Need for Speed XR Plots of LL X R 15 15 10 10 N 50 N 50 5 5 N 15 N 15 N 1 N 1 0 R 0 X 0 5 10 15 0 5 10 15 Example was for the N = 15 miles curve Time for N = 15 miles is reduced by going from green to red dot Different distance means a different curve Curves are symmetric about the diagonal Can flip X and R axes w/o changing the curves Independent variable goes on x-axis c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 12 / 34
  • 22. Throughput-Delay Plots Benchmarking Paradox Benchmark XR Plots SPEC SFS: Performance Plots !"#$%&#'()'*+,*%&-.'(&-/(.&)&.$#0(*'12$*'%'-#"(+,*(3456(5'&*.7(5'*8'*(9:;:(+,*(57&*'<,$-#( ( NSPLab Dec 2007 Hitachi Jan 2012 50 45 Response Time (mSec) 40 35 30 25 20 15 SC2000 10 NS6000 5 0 0 500 1000 1500 2000 2500 3000 NFSops/Second SPEC SFS97 Fusion-io SQLServer 2010 @$#7($/A'(.,-#'-#(.*&BA"(C",A$/(A$-'"DE(F?(&.7$'8'"(&*,2-/(9G(1)"E(H'+,*'(/'=*&/$-=(#,(&*,2-/(9:( I<5(2-/'*(,8'*A,&/(.,-/$#$,-"E(B$#7(JK(H'.,%$-=(#7'(H,##A'-'.LM(F;:($"(&HA'(#,(/'A$8'*(G:(I<5E( 8 &#(B7$.7(),$-#($#(H'.,%'"(A$%$#'/(H0(#7'(#7*,2=7)2#(N<OM(O"$-=(+&"#'*(,*(%,*'(N<O"(B,2A/(7&8'( $-.*'&"'/(#7$"(H'-'+$#('8'-(%,*'(#7&-(#7'(%'&"2*'/(G:P(=&$-M( c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance Q2*$-=(.,-#'-#(.*&BA$-=M(F?(7&"(-,("$=-$+$.&-#(.7&-='"($-(12'*0()'*+,*%&-.'(.,%)&*'/(#,($/A'M( 34 August 7, 2012 13 /
  • 23. Throughput-Delay Plots Benchmarking Paradox LL is 3-Dimensional 2.0 100 1.5 N 50 1.0 0 R s 0 0.5 20 X QPS 40 0.0 Three variables (like PVT in chemistry) 3D surface Like a cone but not rotationally symmetric about apex Square edges cause hyperbolic contours c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 14 / 34
  • 24. Throughput-Delay Plots Benchmarking Paradox Fusion-io Benchmark !"#$%&#'()'*+,*%&-.'(&-/(.&)&.$#0(*'12$*'%'-#"(+,*(3456(5'&*.7(5'*8'*(9:;:(+,*(57&*'<,$-#( ( R 1.5 1.0 0.5 0.0 X 5 10 15 20 25 30 Actual data (with and without FIO) Extracted data @$#7($/A'(.,-#'-#(.*&BA"(C",A$/(A$-'"DE(F?(&.7$'8'"(&*,2-/(9G(1)"E(H'+,*'(/'=*&/$-=(#,(&*,2-/(9:( I<5(2-/'*(,8'*A,&/(.,-/$#$,-"E(B$#7(JK(H'.,%$-=(#7'(H,##A'-'.LM(F;:($"(&HA'(#,(/'A$8'*(G:(I<5E( &#(B7$.7(),$-#($#(H'.,%'"(A$%$#'/(H0(#7'(#7*,2=7)2#(N<OM(O"$-=(+&"#'*(,*(%,*'(N<O"(B,2A/(7&8'( $-.*'&"'/(#7$"(H'-'+$#('8'-(%,*'(#7&-(#7'(%'&"2*'/(G:P(=&$-M( SQL Server RDBMS: Measure X in QPS and R in s at each load (N) Q2*$-=(.,-#'-#(.*&BA$-=M(F?(7&"(-,("$=-$+$.&-#(.7&-='"($-(12'*0()'*+,*%&-.'(.,%)&*'/(#,($/A'M( J#($"(&.7$'8$-=(.*&BA(&-/(12'*0(A,&/("')&*&#$,-(H0(2"$-=(&-(&//$#$,-&A("'#(,+("'*8'*"($-(&( Two curves: before (red) and after (blue) application of FIO device /'/$.&#'/("'&*.7(*,BM(F;:(='#"(",%'(/'=*&/&#$,-E(&"(.,-#'-#()*,.'""$-=(&-/(12'*$'"(.,%)'#'( +,*(#7'("&%'(N<O(*'",2*.'"M(5#$AAE(F;:(&.7$'8'"(#7'("&%'(9:(I<5(&"(F?(2-/'*(#7'(7$=7'"#(A,&/( .,-/$#$,-"M(4A",(-,#'(#7&-(#7'(.,-#'-#(.*&BA$-=(*&#'(,-(F;:($"(9:P(7$=7'*(#7&-(,-(F?(/2*$-=( Manually extracted pertinent data points #7$"(#'"#E(&"(#7'($-.*'&"'/(JK()'*+,*%&-.'(&AA,B"(+,*(H'##'*(7&-/A$-=(,+(#7'(.,-.2**'-#( ,)'*&#$,-"M( "#$%!&$'()! R( *+,)-!,#$%!&$'()! 67'(#&HA'(H'A,B("7,B"(#7'(.,%H$-'/($-.*'&"'($-(/$"L(2"&='(,-(&AA(-,/'"(&+#'*(#7'(8&*$,2"( .,-#'-#(",2*.'"(7&8'(H''-($-/'S'/M( c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 15 / 34
  • 25. Throughput-Delay Plots Paradox Resolved Back to the Paradox The XR Paradox 1 LL says R decreases with increasing X (3D contour lines) 2 Benchmarks show R increases with increasing throughput X R R 2.0 1.5 1.5 1.0 1.0 0.5 0.5 0.0 X 0.0 X 5 10 15 20 25 30 0 10 20 30 40 50 Extracted data Data moves on LL contours The Resolution Superimpose LL 3D contours onto 2D benchmark data. c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 16 / 34
  • 26. Throughput-Delay Plots Paradox Resolved 2D Projection of 3D Surface R s 1.4 1.2 1.0 0.8 0.6 X QPS 22 24 26 28 30 Theorem (Gunther 2012) All benchmark data “moves” along LL contours. c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 17 / 34
  • 27. Storage Performance Outline 1 Background Review Little’s Law The Utilization Law 2 Throughput-Delay Plots Need for Speed Benchmarking Paradox Paradox Resolved 3 Storage Performance Throughput Latency Concurrency 4 Conclusion c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 18 / 34
  • 28. Storage Performance Throughput System Level Query Rate Example Suppose processing a query requires the execution of 100 K instructions on the CPU. The CPU can execute 10 GIPS. 1 IPQ: 100 K = 100 × 103 instruction per application query 2 IPS: 10 GIPS = 10 × 109 cpu instructions per second The throughput (or request rate) for queries is: IPS λQPS = IPQ 10 × 109 = 100 × 103 1010 = 105 = 100, 000 The steady state assumption (1) tells us: λQPS = 100 KQPS = XQPS (9) A maximum of 100 KQPS can be processed c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 19 / 34
  • 29. Storage Performance Throughput Storage Device IO Rate Example (cont’d) Assume further that within the query instructions a single IO is issued. The CPU thread must wait before the rest of the query instructions can be completed. This creates a nice convenience since λIOPS ≡ λQPS . QPS λIOPS = IOPQ 105 = 1 = 100, 000 λIOPS = 100 KIOPS = XIOPS (10) Device IOPS But this is aggregate IOPS. How many IOPS can a single disk do? c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 20 / 34
  • 30. Storage Performance Throughput Device IOPS Rating Example (Seagate IOPS) A Seagate Barracuda 7200 RPM disk is capable of about 100 IOPS. Follows from combined seek time and RPS time being on the order of 10 ms. Hence: 1 IOPS = = 100 (11) 0.010 Simple arithmetic suggests that 1000 Seagate Barracudas would needed to accommodate the 100 KIOPS aggregate throughput being considered here. Caveat emptor Note that (11) is a rearrangement of the LL utilization law (7): ρ λIOPS = (12) Sdisk with ρ = 1. Hence, it is the theoretical maximum possible IOPS that this disk can support. In practice, the sustainable IOPS rate will be considerably lower. c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 21 / 34
  • 31. Storage Performance Latency Storage Latency Example (cont’d) If the storage device is capable of responding to an IO request in 1 ms (10x Seagate Barracuda), the processor needs to issue 100 concurrent IO requests to the storage system so that it can complete 100 KQPS. If the storage device were 10 times faster (e.g., SSD), then the processor would only need to be handing a 10th as many IO requests, or just 10 concurrent requests. Sdisk = 10−3 s Sssd = 10−4 s Applying the LL utilization law (7): ρdisk = λIOPS Sdisk = 105 × 10−3 = 100 (13) Suggests we need more than 100 spindles. Similary, for faster SSD devices: ρssd = λIOPS Sssd = 105 × 10−4 = 10 (14) Latency Latency is an ill-defined word that means different things to different technical people. Need the more exacting language of queueing theory to see where different latencies arise. c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 22 / 34
  • 32. Storage Performance Latency Tandem Queue Model Since computer systems are not deterministic, we represent CPU and storage as a queueing network with two stages: Src ! Scpu Sdev ! Snk Queries are sourced by the application at an aggregate request rate of λ = 100 KQPS and the CPU issues IO requests at the rate of 100 KIOPS. However, from (13) we know ρdisk = 100 or 10,000% !! Trouble This violates the utilization bound ρdisk < 1 given by (8). We already suspected we would need at least 100 spindles from (13). But how should the disks be arranged to give the correct latencies? c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 23 / 34
  • 33. Storage Performance Latency Parallel Disk Queues The message from LL (8) is that we need many (q) disks operating in parallel. !/q !/q ! ! ! Source Scpu Sink !/q !/q Parallel disks divide the total throughput (λ) into q substreams, each load-balanced with equal rate λ/q. Moreover, considering (13), we can write: 100 ρdisk = <1 (15) q LL tells us we actually need more than q = 100 disks to satisfy the utilization bound. Disk Arrays This is why typical storage subsystems are configured as arrays. c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 24 / 34
  • 34. Storage Performance Latency CPU Latency CPU service time (i.e., execution time) for a query: IPQ SCPU = = 10−5 seconds (16) IPS i.e., 10 µs per query . The mean CPU utilization is: ρCPU = λQPS SCPU = 105 × 10−5 = 1 which is right on the edge of the utilization bound. Scpu Src Scpu Snk Scpu So, we need more than one core or execution unit. Duo-core LL tells us we need a duo-core, at least, to meet the utilization bound. c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 25 / 34
  • 35. Storage Performance Concurrency Multicore with Infinite IOPS Example (cont’d) If the storage system is capable of responding to an IO request in 1 ms ······ If the storage were 10 times faster in responding with I/O requests... These numbers become Sdev in the following diagram. Sdev !/q Scpu !/q Sdev ! ! ! Snk Src !/q Scpu !/q Sdev We use this queueing model to examine both latency and concurrency effects. “Infinite IOPS” is represented by 1000 parallel storage devices. c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 26 / 34
  • 36. Storage Performance (Concurrency): Queueing Model Results. Example (cont’d): If the storage system is capable of responding to I/O requests in 1/1,000th of a second, then the CPU will need to issue N = 100 concurrent requests ······ If the storage were 10 times faster then the processor would only need to be handling 1/10th as many concurrent requests, or just N = 10 concurrent requests.

                      Latency (s)                     Concurrency
      Device (#)      Service         Residence       Qk          N
      CPU (2)         0.00001         0.0000133333    1.33333     1.33333
      Disk (1000)     0.001000        0.001111        0.1111      111.1
      SSD (1000)      0.000100        0.0001010       0.01010     10.10
      FIOa (1000)     1.000 × 10^-6   1.000 × 10^-6   0.0001      0.1000
      FIOb (1)        1.000 × 10^-6   1.111 × 10^-6   0.1111      0.1111

    The overall time in the system, per LL in eqn. (2), is the sum of the CPU residence time (1st row, 3rd column) and the residence time of an IO at the respective storage device. With 1000 disks, N = 111.1 concurrent IOs. With 1000 SSDs, N = 10.1 concurrent IOs. With 1000 FIOs, N = 0.1 concurrent IOs. But wait! It gets even better...
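The numbers in the results table can be reproduced with a few lines of Python. This is a sketch under stated assumptions, not the author's model code: the slides do not give the formulas, so the single-server residence time R = S / (1 − ρ) for each storage device and the multi-server approximation R = S / (1 − ρ^m) for the two-core CPU are inferred here because they match the tabulated values. All names are illustrative.

```python
ARRIVAL_RATE = 100_000  # lambda in requests per second (100 KQPS, from the slides)

# (label, device count, service time in seconds, shared_queue)
# shared_queue=True models one queue feeding all servers (assumed for the CPU);
# otherwise each device has its own queue fed at lambda / count.
STAGES = [
    ("CPU", 2, 1e-5, True),
    ("Disk", 1000, 1e-3, False),
    ("SSD", 1000, 1e-4, False),
    ("FIOa", 1000, 1e-6, False),
    ("FIOb", 1, 1e-6, False),
]

def table_row(count, service_time, shared_queue):
    """Return (residence time, Qk, N) for one stage under the assumed formulas."""
    rho = ARRIVAL_RATE * service_time / count  # utilization law per server
    if shared_queue:
        # Assumed multi-server approximation: R = S / (1 - rho**m)
        residence = service_time / (1 - rho ** count)
        q_k = ARRIVAL_RATE * residence  # Little's law on the shared queue
        return residence, q_k, q_k
    # Assumed single-server queue per device: R = S / (1 - rho)
    residence = service_time / (1 - rho)
    q_k = (ARRIVAL_RATE / count) * residence  # Little's law per device
    return residence, q_k, count * q_k

def build_table():
    """Map each stage label to its (residence, Qk, N) tuple."""
    return {label: table_row(count, s, shared) for label, count, s, shared in STAGES}

if __name__ == "__main__":
    for label, (residence, q_k, n) in build_table().items():
        print(f"{label:5s} R={residence:.6g} s  Qk={q_k:.4g}  N={n:.4g}")
```

In every row the concurrency follows from Little's law applied to the residence time, which is why a tenfold drop in storage service time produces roughly a tenfold drop in N.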
  • 38. Conclusion: Latency Trumps IOPS
      CPU: The residence time RCPU is 33% longer than the query execution time, SCPU. In general, this time can be reduced further with more cores.
      Disk: All 1000 disks have S = 1 ms service time. Residence time (1.111 ms) is about 11% longer than the service time. Concurrent IO threads Nio = 111. These threads also have to be managed by the OS (not shown), and thread management also uses up CPU cycles (not shown). Response time = 0.000013 + 0.001111 s is dominated by disk latency.
      SSD: Faster “SSD” (10x) with nominal S = 0.1 ms service time. Residence time is now close to the service time. Concurrency is also reduced by 10x to N = 10 threads. Response time = 0.000013 + 0.0001010 s is still dominated by storage latency.
      FIOa: Fusion flash service time S = 1 microsecond. Residence time is effectively equal to the device service time. Concurrent IO threads N = 0.1 are negligible. Response time = 0.000013 + 0.000001 s is now CPU-bound.
      FIOb: Bigger message: we don’t need 1000 Fusion flash devices. The small NFIOa = 0.1 means a single FIO device has the same IO concurrency. A single Fusion card can replace 1000 standard devices! SAN in your hand.
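The response-time sums quoted for each device can be turned into a quick check of where the time goes. The Python sketch below uses the residence times from the results table; the function names and the "storage share" metric are illustrative additions, not taken from the slides.

```python
ARRIVAL_RATE = 100_000        # lambda in requests per second (100 KQPS)
CPU_RESIDENCE = 0.0000133333  # seconds, CPU row of the results table

# Storage residence times in seconds, taken from the results table.
STORAGE_RESIDENCE = {
    "Disk": 0.001111,
    "SSD": 0.0001010,
    "FIOa": 1.000e-6,
}

def response_time(storage_residence, cpu_residence=CPU_RESIDENCE):
    """Total time in system: CPU residence plus storage residence."""
    return cpu_residence + storage_residence

def storage_share(storage_residence, cpu_residence=CPU_RESIDENCE):
    """Fraction of the response time spent at the storage stage."""
    return storage_residence / response_time(storage_residence, cpu_residence)

def concurrency(storage_residence, arrival_rate=ARRIVAL_RATE):
    """Little's law, N = X * R: outstanding IOs at the storage stage."""
    return arrival_rate * storage_residence

if __name__ == "__main__":
    for name, r in STORAGE_RESIDENCE.items():
        print(f"{name:5s} R={response_time(r):.6f} s  "
              f"storage share={storage_share(r):.1%}  N={concurrency(r):.1f}")
```

A storage share near 1 means the query is storage-bound; a share near 0 means it is CPU-bound, which is the regime reached with the flash device.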
  • 39. Conclusion: ioDrive2 Duo 2.4TB (from the Fusion-io web site)
      Read bandwidth: 3.0 GB/s
      Random read: 285,000 IOPS
      Write bandwidth: 2.5 GB/s
      Random write: 725,000 IOPS
      Sequential read: 892,000 IOPS
      Read access latency: 68 µs
      Sequential write: 935,000 IOPS
      Write access latency: 15 µs
  • 40. Conclusion: Summary
      LL is really 3D (3 variables: N, X, R).
      LL has 2 versions: N = XR (with waiting) and ρ = XS (no waiting).
      Assume no bandwidth limit and choose a throughput target (here, 100 KQPS).
      With current tech, LL tells us we need parallel devices (disk array, multicore).
      Storage “latency” (service times) is orders of magnitude longer than CPU execution times.
      The number of outstanding IOs determines the total (response) time in the system to complete each application query: R = W + S.
      Rstor ≫ Rcpu, so storage latency dominates system response time.
      If we can make Rstor ≪ Rcpu, then outstanding IOs become negligible.
      Application query times are then determined solely by the CPU execution time.
      A CPU-bound application is always the optimal goal.
      Fusion-io also eliminates IO controller latency: all data gets closer to the CPU.
  • 41. Conclusion: Table: Comparative storage device attributes

                                   Relative latency          Relative
      Storage type   Persistent    Controller    Device      technology cost
      Disk           Yes           High          High        Low
      SSD            Yes           High          Low         High
      Fusion-IO      Yes           Low           Low         High
      RAM            No            Low           Low         Highest
  • 42. Conclusion: Guerrilla Training. Wanna learn about more stuff like this? Come to class.
  • 43. Conclusion: Thank you for your participation. Performance Dynamics Company, Castro Valley, California. www.perfdynamics.com, perfdynamics.blogspot.com, twitter.com/DrQz, facebook.com/Performance-Dynamics-Company, info@perfdynamics.com, +1-510-537-5758