Distributed Systems Are a UX Problem

Distributed systems are not strictly an engineering problem. It’s far too easy to assume they are only a backend development concern, but in reality there are implications at every point in the stack. The trade-offs we make lower in the stack to buy responsiveness often bubble up to the top, so much so that they almost always affect the application in some way.

Distributed systems affect the user. We need to shift the focus from system properties and guarantees to business rules and application behavior. We need to understand the limitations and trade-offs at each level in the stack and why they exist. We need to assume failure and plan for recovery. We need to start thinking of distributed systems as a UX problem.

Tyler Treat looks at distributed systems through the lens of user experience, observing how architecture, design patterns, and business problems all coalesce into UX. Tyler also shares system design anti-patterns and alternative patterns for building reliable and scalable systems with respect to business outcomes.

Topics include:
- The “truth” can be prohibitively expensive: When does strong consistency make sense, and when does it not? How do we reconcile this with application UX?
- Failure as an inevitability: If we can’t build perfect systems, what is “good enough”?
- Dealing with partial knowledge: Systems usually operate in the real world (e.g., an inventory application for a widget warehouse). How do we design for the “disconnect” between the real world and the system?

Distributed Systems Are a UX Problem

  1. 1. @tyler_treat Distributed Systems Are a
 UX Problem Tyler Treat / O’Reilly Software Architecture Conference / October 30, 2018
  2. 2. @tyler_treat Tyler Treat
 tyler.treat@realkinetic.com
  3. 3. @tyler_treat I like distributed systems.
  4. 4. @tyler_treat
  5. 5. @tyler_treat
  6. 6. @tyler_treat Disclaimer:
 I know approximately nothing about UX…
  7. 7. @tyler_treat …other than when I’m the user, I know when my experience is good and when it’s bad.
  8. 8. @tyler_treat
  9. 9. @tyler_treat UX
  10. 10. @tyler_treat UX Systems
  11. 11. @tyler_treat UX Systems
  12. 12. @tyler_treat UX Systems Business
  13. 13. @tyler_treat UX Systems Business This
 Talk
  14. 14. @tyler_treat The Yin and Yang of UX and Architecture
  15. 15. @tyler_treat Monolith
  16. 16. @tyler_treat Monolith
  17. 17. @tyler_treat Service Service Service Service Service Service Service Service Service
  18. 18. @tyler_treat Service Service Service Service Service Service Service Service Service
  19. 19. @tyler_treat Service Service Service Service Service Service Service Service Service
  20. 20. @tyler_treat Implications
  21. 21. @tyler_treat
  22. 22. @tyler_treat book trip Trip Service Trip Database transaction Good old days
  23. 23. @tyler_treat book trip Microservices Airline Service Hotel Service Car Service Trip Service transaction transaction transaction
  24. 24. @tyler_treat book trip Microservices Airline Service Hotel Service Car Service Trip Service transaction transaction transaction ACID ACID ACID
  25. 25. @tyler_treat UX Implications of Microservices • Data consistency
  26. 26. @tyler_treat Service Service Service Service Service Service Service Service Service
  27. 27. @tyler_treat Service Service Service Service Service Service Service Service Service
  28. 28. @tyler_treat UX Implications of Microservices • Data consistency • Race conditions
  29. 29. @tyler_treat
  30. 30. @tyler_treat UX Implications of Microservices • Data consistency • Race conditions • Performance
  31. 31. @tyler_treat book trip Microservices Airline Service Hotel Service Car Service Trip Service transaction transaction transaction
  32. 32. @tyler_treat book trip Microservices Airline Service Hotel Service Car Service Trip Service transaction transaction transaction
  33. 33. @tyler_treat UX Implications of Microservices • Data consistency • Race conditions • Performance • Partial failure
  34. 34. @tyler_treat So are microservices bad?
  35. 35. @tyler_treat Microservices are about
 people scale.
  36. 36. @tyler_treat Transparency
  37. 37. @tyler_treat A Study of Transparency and Adaptability of Heterogeneous Computer Networks with TCP/IP and IPv6 Protocols
 Das, 2012 “Any change in a computing system, such as a new feature or new component, is transparent if the system after change adheres to previous external interface as much as possible while changing its internal behavior.”
  38. 38. @tyler_treat System
  39. 39. @tyler_treat System
  40. 40. @tyler_treat High Transparency Low Transparency
  41. 41. @tyler_treat NFS High Transparency Low Transparency
  42. 42. @tyler_treat NFS FTP High Transparency Low Transparency
  43. 43. @tyler_treat Types of Transparencies • Access transparency • Location transparency • Migration transparency • Relocation transparency • Replication transparency • Concurrent transparency • Failure transparency • Persistence transparency • Security transparency
  44. 44. @tyler_treat Transparency is about usability.
  45. 45. @tyler_treat Usability Control
  46. 46. @tyler_treat Usability Control
  47. 47. @tyler_treat Usability Control
  48. 48. @tyler_treat Simplicity Flexibility, Performance,
 Correctness RPC
  49. 49. @tyler_treat Simplicity Flexibility, Performance,
 Correctness Erlang Message Passing
  50. 50. @tyler_treat RPC Erlang Message Passing High Transparency Low Transparency
  51. 51. @tyler_treat Translating UX for developers: APIs
  52. 52. @tyler_treat Transparencies simplify the API of a system.
  53. 53. @tyler_treat UX is about deciding what knobs to expose.
  54. 54. @tyler_treat The Truth is Prohibitively Expensive Balancing Consistency and UX
  55. 55. @tyler_treat book trip Trip Service Trip Database transaction Good old days
  56. 56. @tyler_treat book trip Trip Service Trip Database transaction Good old days Transparency
  57. 57. @tyler_treat book trip Microservices Airline Service Hotel Service Car Service Trip Service transaction transaction transaction Transparency
  58. 58. @tyler_treat book trip Microservices Airline Service Hotel Service Car Service Trip Service transaction transaction transaction ACID ACID ACID Transparency
  59. 59. @tyler_treat
  60. 60. @tyler_treat
  61. 61. @tyler_treat
  62. 62. @tyler_treat Spreadsheet service
  63. 63. @tyler_treat Spreadsheet service Document service
  64. 64. @tyler_treat Spreadsheet service Document service Presentation service
  65. 65. @tyler_treat Spreadsheet service Document service Presentation service IAM service
  66. 66. @tyler_treat Spreadsheet service Document service Presentation service IAM service consistent
  67. 67. @tyler_treat Consistency is about ordering of events in a distributed system.
  68. 68. @tyler_treat Why is this hard?
  69. 69. @tyler_treat So what can we do?
  70. 70. @tyler_treat Coordinate
  71. 71. @tyler_treat Two-Phase Commit
  72. 72. @tyler_treat book trip 2PC Prepare Airline Service Hotel Service Car Service Trip Service propose propose propose
  73. 73. @tyler_treat book trip 2PC Prepare Airline Service Hotel Service Car Service Trip Service vote vote vote
  74. 74. @tyler_treat book trip 2PC Commit Airline Service Hotel Service Car Service Trip Service commit/abort commit/abort commit/abort
  75. 75. @tyler_treat book trip 2PC Commit Airline Service Hotel Service Car Service Trip Service done done done
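
The prepare and commit rounds on the preceding slides can be sketched in a few lines. This is a minimal in-process illustration, not a production protocol: the `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names invented here, and there is no network, timeout, or coordinator recovery.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Participant:
    """Hypothetical stand-in for a service such as airline, hotel, or car."""
    name: str
    can_commit: bool = True
    state: str = "idle"

    def prepare(self) -> bool:
        # Phase 1: vote yes only if the commit can be guaranteed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Return True if everyone committed, False if everyone aborted."""
    # Phase 1 (prepare): propose to every participant and collect the votes.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Phase 2 (commit): send the same commit/abort decision to everyone.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision
```

Even in this toy form, the coordinator has to hear from every participant twice before the booking is finished, which is where the chattiness and blocking behaviour of the protocol come from.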
  76. 76. @tyler_treat Problems with 2PC • Chatty protocol: beholden to network latency • Limited throughput • Transaction coordinator: single point of failure • Blocking protocol: susceptible to deadlock
  77. 77. @tyler_treat book trip 2PC Prepare Airline Service Hotel Service Car Service Trip Service propose propose propose
  78. 78. @tyler_treat book trip 2PC Prepare Airline Service Hotel Service Car Service Trip Service propose propose propose
  79. 79. @tyler_treat book trip 2PC Prepare Airline Service Hotel Service Car Service Trip Service propose propose propose
  80. 80. @tyler_treat Add more phases!
  81. 81. @tyler_treat Three-Phase Commit
  82. 82. @tyler_treat
  83. 83. @tyler_treat atomic clocks NTP GPS TrueTime
  84. 84. @tyler_treat Good news:
 we solved physics.
  85. 85. @tyler_treat Bad news:
 it costs all the money.
  86. 86. @tyler_treat Not exactly…
  87. 87. @tyler_treat Spanner: Google’s Globally-Distributed Database
 Corbett et al.
  88. 88. @tyler_treat TrueTime forces that uncertainty to the surface, and Spanner provides a transparency over it.
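
One way to picture "forcing uncertainty to the surface" is a clock that returns an interval instead of a single instant, plus a commit that waits until its timestamp is definitely in the past. The sketch below follows the commit-wait idea described in the Spanner paper, but it is not Google's API: `SketchTrueTime`, `TTInterval`, `commit_wait`, the fixed `epsilon`, and the polling interval are all assumptions made for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class SketchTrueTime:
    """Hypothetical clock whose error is bounded by a fixed epsilon (seconds)."""

    def __init__(self, epsilon: float, clock: Callable[[], float] = time.monotonic):
        self.epsilon = epsilon
        self.clock = clock

    def now(self) -> TTInterval:
        # The true time is assumed to lie somewhere inside this interval.
        t = self.clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t: float) -> bool:
        # True only once t has definitely passed.
        return self.now().earliest > t


def commit_wait(tt: SketchTrueTime,
                sleep: Callable[[float], None] = time.sleep,
                poll: float = 0.001) -> float:
    """Pick a commit timestamp, then wait out the uncertainty before returning."""
    s = tt.now().latest
    while not tt.after(s):
        sleep(poll)
    return s
```

In this sketch the wait is roughly twice `epsilon`, so a tighter clock bound translates directly into lower commit latency.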
  89. 89. @tyler_treat Spanner doesn’t avoid trade-offs, it just minimizes their probability.
  90. 90. @tyler_treat Spanner is expensive and proprietary.
  91. 91. @tyler_treat But it’s not the end of the story…
  92. 92. @tyler_treat Unless every service is backed by the same database, you probably still have to deal with consistency problems.
  93. 93. @tyler_treat Challenges to Adopting Stronger Consistency at Scale
 Ajoux et al., 2015 “The biggest barrier to providing stronger consistency guarantees…is that the consistency mechanism must integrate consistency across many stateful services.”
  94. 94. @tyler_treat Coordination is expensive because processes can’t make progress independently.
  95. 95. @tyler_treat
  96. 96. @tyler_treat
  97. 97. @tyler_treat Peter Bailis, 2015 https://speakerdeck.com/pbailis/silence-is-golden-coordination-avoiding-systems-design
  98. 98. @tyler_treat And what about partial failure?
  99. 99. @tyler_treat
  100. 100. @tyler_treat
  101. 101. @tyler_treat
  102. 102. @tyler_treat
  103. 103. @tyler_treat
  104. 104. @tyler_treat Memories, Guesses, and Apologies Dealing with Partial Knowledge
  105. 105. @tyler_treat The cost of knowing the “truth” can be prohibitively expensive.
  106. 106. @tyler_treat And partial failure means the “truth” is also fragile.
  107. 107. @tyler_treat Where does this leave us?
  108. 108. @tyler_treat We could go back to the monolith.
  109. 109. @tyler_treat We could build expensive data centers with fancy hardware…
  110. 110. @tyler_treat …or we could rethink our transparencies.
  111. 111. @tyler_treat
  112. 112. @tyler_treat Gregor Hohpe, 2005 https://www.enterpriseintegrationpatterns.com/docs/IEEE_Software_Design_2PC.pdf
  113. 113. @tyler_treat Gregor Hohpe, 2005 https://www.enterpriseintegrationpatterns.com/docs/IEEE_Software_Design_2PC.pdf
  114. 114. @tyler_treat Gregor Hohpe, 2005 https://www.enterpriseintegrationpatterns.com/docs/IEEE_Software_Design_2PC.pdf
  115. 115. @tyler_treat Gregor Hohpe, 2005 https://www.enterpriseintegrationpatterns.com/docs/IEEE_Software_Design_2PC.pdf
  116. 116. @tyler_treat Exception Handling in Asynchronous Systems
  117. 117. @tyler_treat
  118. 118. @tyler_treat Exception Handling in Asynchronous Systems • Write-off
  119. 119. @tyler_treat
  120. 120. @tyler_treat Exception Handling in Asynchronous Systems • Write-off • Retry
  121. 121. @tyler_treat
  122. 122. @tyler_treat Exception Handling in Asynchronous Systems • Write-off • Retry • Compensating action
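
The three strategies on this slide (write-off, retry, compensating action) can be combined into one small handler. This is only a sketch under assumptions: the `handle` function, the `Outcome` enum, and the default of two retries are invented here for illustration and are not taken from the talk.

```python
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    DONE = "done"
    COMPENSATED = "compensated"
    WRITTEN_OFF = "written off"


def handle(action: Callable[[], None],
           retries: int = 2,
           compensate: Optional[Callable[[], None]] = None) -> Outcome:
    """Try the action, retry on failure, then compensate or write off."""
    # Retry: one initial attempt plus `retries` further attempts.
    for _ in range(1 + retries):
        try:
            action()
            return Outcome.DONE
        except Exception:
            continue
    # Compensating action: undo what was already done, if we know how.
    if compensate is not None:
        compensate()
        return Outcome.COMPENSATED
    # Write-off: accept the loss and move on.
    return Outcome.WRITTEN_OFF
```

Which branch is acceptable for a given operation is a business decision; the code only makes the choice explicit.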
  123. 123. @tyler_treat Revisiting Two-Phase Commit
  124. 124. @tyler_treat Sagas
  125. 125. @tyler_treat Sagas
 Garcia-Molina & Salem, 1987 “A long-lived transaction is a saga if it can be written as a sequence of transactions that can be interleaved with other transactions…Either all the transactions in a saga are successfully completed or compensating transactions are run to amend a partial execution.”
  127. 127. @tyler_treat Sagas split long-lived transactions into individual, interleaved sub-transactions: T = T1, T2, . . . , Tn
  128. 128. @tyler_treat And each sub-transaction has a compensating transaction: C1, C2, . . . , Cn
  129. 129. @tyler_treat Sagas guarantee one of two execution sequences: T1, T2, . . . , Tn or T1, T2, . . . , Tj, Cj, . . . , C2, C1
  130. 130. @tyler_treat book trip Airline Service Hotel Service Car Service Trip Service transaction transaction transaction
  131. 131. @tyler_treat • Book flight • Book hotel • Book car • Charge money T = T1, T2, . . . , Tn
  132. 132. @tyler_treat • Cancel flight • Cancel hotel • Cancel car • Refund money C1, C2, . . . , Cn
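
The two execution sequences a saga guarantees can be shown with a small runner: either every transaction completes, or the completed ones are compensated in reverse order. This is a minimal sketch; `run_saga` and the `Step` tuple are hypothetical names, and a real saga would also persist its progress so it could resume after a crash.

```python
from typing import Callable, List, Tuple

# (name, transaction Ti, compensating transaction Ci)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> List[str]:
    """Run T1..Tn; if a step fails after Tj, run Cj..C1. Returns a log."""
    log: List[str] = []
    completed: List[Tuple[str, Callable[[], None]]] = []
    for name, transaction, compensation in steps:
        try:
            transaction()
        except Exception:
            log.append(f"{name}: failed")
            # Compensate the completed steps in reverse order.
            for done_name, done_compensation in reversed(completed):
                done_compensation()
                log.append(f"{done_name}: compensated")
            return log
        log.append(f"{name}: done")
        completed.append((name, compensation))
    return log
```

With the trip example, a failure while booking the car would cancel the hotel and then the flight, and the charge step would never run.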
  133. 133. @tyler_treat Compensating transactions must be idempotent.
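
Idempotency matters because a compensation may be delivered more than once, for example when a retry follows a lost acknowledgement. One common way to get it is to key each compensation by an identifier and ignore repeats. The `RefundService` class and its `booking_id` key below are assumptions made for this sketch.

```python
from typing import Dict


class RefundService:
    """Hypothetical refund handler keyed by a caller-supplied booking id."""

    def __init__(self) -> None:
        self.refunded: Dict[str, int] = {}  # booking_id -> amount refunded
        self.total_paid_out = 0

    def refund(self, booking_id: str, amount: int) -> bool:
        """Apply the refund once; return False for a repeated request."""
        if booking_id in self.refunded:
            # Already applied: acknowledge without paying out again.
            return False
        self.refunded[booking_id] = amount
        self.total_paid_out += amount
        return True
```

Calling `refund("trip-1", 200)` twice pays out 200 once, so the saga can safely retry the compensation until it gets an acknowledgement.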
  134. 134. @tyler_treat Sagas trade off isolation for availability.
  135. 135. @tyler_treat Event-Driven
  136. 136. @tyler_treat book trip Airline Service Hotel Service Car Service Trip Service transaction transaction transaction
  137. 137. @tyler_treat event Airline Service Hotel Service Car Service Trip Service event event event
  138. 138. @tyler_treat event Airline Service Hotel Service Car Service Trip Service event event event
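
In the event-driven variant, the trip service announces what happened and each service reacts on its own, without a coordinator calling it directly. The sketch below uses a synchronous in-memory `EventBus` as a stand-in for a real message broker; the class, the topic names, and the handlers are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Tuple

Handler = Callable[[Dict[str, str]], None]


class EventBus:
    """Toy in-process publish/subscribe bus (no persistence, no retries)."""

    def __init__(self) -> None:
        self.handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self.handlers[topic].append(handler)

    def publish(self, topic: str, payload: Dict[str, str]) -> None:
        # Deliver to a copy so handlers may subscribe while we iterate.
        for handler in list(self.handlers[topic]):
            handler(payload)


bus = EventBus()
booked: List[Tuple[str, str]] = []

for service in ("airline", "hotel", "car"):
    # Each service reacts to the trip request by emitting its own event.
    bus.subscribe("trip.requested",
                  lambda trip, s=service: bus.publish(f"{s}.booked", trip))
    # The trip service listens for the results.
    bus.subscribe(f"{service}.booked",
                  lambda trip, s=service: booked.append((s, trip["id"])))

bus.publish("trip.requested", {"id": "trip-1"})
```

A real broker would deliver these events asynchronously and possibly more than once, which is why the handlers in such a design usually need to be idempotent.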
  139. 139. @tyler_treat System Properties Business Rules
  140. 140. @tyler_treat Sean T. Allen “People don’t want distributed transactions, they just want the guarantees that distributed transactions give them.”
  141. 141. @tyler_treat CAP theorem
  142. 142. @tyler_treat CAP Theorem • Consistency, Availability, Partition Tolerance • When a partition occurs, do we: • Choose availability and give up consistency? - or - • Choose consistency and give up availability?
  143. 143. @tyler_treat CAP Theorem • Consistency, Availability, Partition Tolerance • When a partition occurs, do we: • Choose availability and give up consistency? - or - • Choose consistency and give up availability? (or YOLO it)
  144. 144. @tyler_treat The CAP theorem is a UX question…
  145. 145. @tyler_treat When a partial failure occurs, how do you want the application to behave?
  146. 146. @tyler_treat
  147. 147. @tyler_treat
  148. 148. @tyler_treat We can choose consistency and sacrifice availability…
  149. 149. @tyler_treat …or we can choose availability by making local decisions with the knowledge at hand and designing the UX accordingly.
  150. 150. @tyler_treat Managing partial failure is a matter of dealing with partial knowledge…
  151. 151. @tyler_treat …and managing risk.
  152. 152. @tyler_treat Our risk appetite can drive business rules. Check value < $10,000? • yes: Clear locally • no: Double check with all replicas before clearing
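
The threshold rule on this slide translates almost directly into code: small checks clear on local knowledge, large ones pay for coordination. In the sketch below the $10,000 limit comes from the slide, while `clear_check`, the `confirm_with_replicas` callback, and the returned status strings are hypothetical.

```python
from decimal import Decimal
from typing import Callable

LOCAL_CLEARING_LIMIT = Decimal("10000")  # threshold shown on the slide


def clear_check(amount: Decimal,
                confirm_with_replicas: Callable[[], bool]) -> str:
    """Clear low-value checks locally; coordinate only when the risk is high."""
    if amount < LOCAL_CLEARING_LIMIT:
        # Low risk: accept a possible later apology in exchange for availability.
        return "cleared locally"
    # High risk: pay the coordination cost before clearing.
    if confirm_with_replicas():
        return "cleared after replica check"
    return "held"
```

Moving the limit up or down is how the business tunes the balance between availability and the cost of an occasional bad check.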
  153. 153. @tyler_treat Memories, guesses, and apologies
  154. 154. @tyler_treat Computers operate with partial knowledge.
  155. 155. @tyler_treat Either there’s a disconnect with the “real world”…
  156. 156. @tyler_treat …or there’s a disconnect between systems.
  157. 157. @tyler_treat Systems don’t make decisions, they make guesses.
  158. 158. @tyler_treat Systems have memory.
  159. 159. @tyler_treat Memories help systems make better guesses in the future.
  160. 160. @tyler_treat Forgetfulness is a business decision.
  161. 161. @tyler_treat Sometimes the system guesses wrong.
  162. 162. @tyler_treat Systems need the capacity to apologize.
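
The memory, guess, and apology ideas can be illustrated with the widget warehouse from the talk abstract: the system accepts orders against the stock level it remembers, then apologizes for any it cannot fill once the real count is known. The `Warehouse` class and its method names are assumptions made for this sketch.

```python
from typing import List


class Warehouse:
    """Hypothetical inventory system that works from a remembered stock count."""

    def __init__(self, remembered_stock: int) -> None:
        self.remembered_stock = remembered_stock  # memory, possibly stale
        self.accepted: List[str] = []

    def order(self, order_id: str) -> bool:
        # Guess: accept the order if memory says a widget is available.
        if self.remembered_stock <= 0:
            return False
        self.remembered_stock -= 1
        self.accepted.append(order_id)
        return True

    def reconcile(self, fillable: int) -> List[str]:
        """Given how many accepted orders can really be filled, return the
        order ids that are owed an apology (latest orders first to lose)."""
        apologies = self.accepted[fillable:]
        self.accepted = self.accepted[:fillable]
        return apologies
```

What the apology looks like (a refund, a back-order, a call from a person) is outside the code and belongs to the business process around it.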
  163. 163. @tyler_treat Customers judge you not by your failures, but by how you handle your failures.
  164. 164. @tyler_treat Are you building systems that never fail or systems that fail gracefully?
  165. 165. @tyler_treat
  166. 166. @tyler_treat Businesses need both code and people to manage apologies.
  167. 167. @tyler_treat It becomes less about trying to build the perfect system and more about how we cope with an imperfect one.
  168. 168. @tyler_treat Wrapping Up Summary and Observations
  169. 169. @tyler_treat
  170. 170. @tyler_treat
  171. 171. @tyler_treat System Properties • ACID • distributed transactions • exactly-once delivery • ordered delivery • serializable isolation • linearizability
  172. 172. @tyler_treat System Properties • ACID • distributed transactions • exactly-once delivery • ordered delivery • serializable isolation • linearizability Business Rules / Application Invariants • negative account balance • two users sharing same ID • room double-booked • balance reconciles
  173. 173. @tyler_treat
  174. 174. @tyler_treat We put ourselves at the mercy of our infrastructure and hope it makes good on its promises.
  175. 175. @tyler_treat It often doesn’t. Kyle Kingsbury, 2015 http://jepsen.io
  176. 176. @tyler_treat When do we actually need consistency?
  177. 177. @tyler_treat
  178. 178. @tyler_treat We can use consistency when the stakes are high and the cost is worth it.
  179. 179. @tyler_treat And design our transparencies accordingly.
  180. 180. @tyler_treat We could try to build perfect systems.
  181. 181. @tyler_treat Should we build perfect systems or pragmatic systems?
  182. 182. @tyler_treat Systems that can compensate.
  183. 183. @tyler_treat Systems that can recover.
  184. 184. @tyler_treat Systems that can apologize.
  185. 185. @tyler_treat UX Systems Business
  186. 186. @tyler_treat Data Consistency Race Conditions Performance Partial Failure
  187. 187. @tyler_treat Data Consistency Race Conditions Performance Partial Failure Transparency Informs
  188. 188. @tyler_treat Thank You bravenewgeek.com
 realkinetic.com
  189. 189. @tyler_treat References
 • https://gotocon.com/dl/goto-chicago-2015/slides/CaitieMcCaffrey_ApplyingTheSagaPattern.pdf
 • http://ijcsits.org/papers/vol2no62012/42vol2no6.pdf
 • http://steve.vinoski.net/pdf/IEEE-Convenience_Over_Correctness.pdf
 • https://queue.acm.org/detail.cfm?id=2745385
 • https://www.enterpriseintegrationpatterns.com/docs/IEEE_Software_Design_2PC.pdf
 • http://www-db.cs.wisc.edu/cidr/cidr2009/Paper_133.pdf
 • https://bravenewgeek.com/distributed-systems-are-a-ux-problem/
 • http://www.cs.princeton.edu/~wlloyd/papers/challenges-hotos15.pdf
 • https://www.cs.cornell.edu/andru/cs711/2002fa/reading/sagas.pdf
 • https://www.youtube.com/watch?v=lsKaNDj4TrE
 • Starbucks photo - https://www.geekwire.com/2015/starbucks-mobile-ordering-now-blankets-the-u-s-with-coverage-in-san-francisco-new-york-and-more-coming-today/
 • Friction image - https://byjus.com/physics/friction-in-automobiles/
 • Carbon copy forms - http://www.rainiercopy.com/forms.html
 • Rosetta Stone photo - https://en.wikipedia.org/wiki/Rosetta_Stone#/media/File:Rosetta_Stone.JPG
