SlideShare une entreprise Scribd logo
1  sur  62
Télécharger pour lire hors ligne
FPGAs	
  as	
  Components	
  in	
  Heterogeneous	
  	
  
High-­‐Performance	
  Compu8ng	
  Systems:	
  
Raising	
  the	
  Abstrac8on	
  Level	
  	
  
Wim	
  Vanderbauwhede	
  
School	
  of	
  Compu6ng	
  Science	
  
University	
  of	
  Glasgow	
  
A	
  trip	
  down	
  memory	
  lane	
  
80	
  Years	
  ago:	
  	
  
The	
  Theory	
  
Turing,	
  Alan	
  Mathison.	
  "On	
  computable	
  numbers,	
  with	
  an	
  applica6on	
  to	
  the	
  Entscheidungsproblem."	
  J.	
  of	
  Math	
  58,	
  no.	
  345-­‐363	
  (1936):	
  5.	
  
1936:	
  Universal	
  machine	
  (Alan	
  Turing)	
  
1936:	
  Lambda	
  calculus	
  	
  (Alonzo	
  Church)	
  
1936:	
  Stored-­‐program	
  concept	
  (Konrad	
  Zuse)	
  
1937:	
  Church-­‐Turing	
  thesis	
  
1945:	
  The	
  Von	
  Neumann	
  architecture	
  
Church,	
  Alonzo.	
  "A	
  set	
  of	
  postulates	
  for	
  the	
  founda6on	
  of	
  logic."	
  Annals	
  of	
  mathema0cs	
  (1932):	
  346-­‐366.	
  
60-­‐40	
  Years	
  ago:	
  	
  
The	
  Founda8ons	
  
The	
  first	
  working	
  integrated	
  circuit,	
  1958.	
  	
  ©	
  Texas	
  Instruments.	
  
1957:	
  Fortran,	
  John	
  Backus,	
  IBM	
  
1958:	
  First	
  IC,	
  Jack	
  Kilby,	
  Texas	
  Instruments	
  
1965:	
  Moore’s	
  law	
  
1971:	
  First	
  microprocessor,	
  Texas	
  Instruments	
  
1972:	
  C,	
  	
  Dennis	
  Ritchie,	
  Bell	
  Labs	
  
1977:	
  Fortran-­‐77	
  
1977:	
  von	
  Neumann	
  bocleneck,	
  John	
  Backus	
  	
  
30	
  Years	
  ago:	
  	
  
HDLs	
  and	
  FPGAs	
  
Algotronix	
  CAL1024	
  FPGA,	
  1989.	
  ©	
  Algotronix	
  	
  
1984:	
  Verilog	
  	
  
1984:	
  First	
  reprogrammable	
  logic	
  device,	
  Altera	
  
1985:	
  First	
  FPGA,Xilinx	
  
1987:	
  VHDL	
  Standard	
  IEEE	
  1076-­‐1987	
  
1989:	
  Algotronix	
  CAL1024,	
  the	
  first	
  FPGA	
  to	
  
offer	
  random	
  access	
  to	
  its	
  control	
  memory	
  
20	
  Years	
  ago:	
  	
  
High-­‐level	
  Synthesis	
  
Page,	
  Ian.	
  "Closing	
  the	
  gap	
  between	
  hardware	
  and	
  soiware:	
  hardware-­‐soiware	
  cosynthesis	
  at	
  Oxford."	
  (1996):	
  2-­‐2.	
  
1996:	
  Handel-­‐C,	
  	
  Oxford	
  University	
  
2001:	
  Mitrion-­‐C,	
  Mitrionics	
  	
  
2003:	
  Bluespec,	
  MIT	
  
2003:	
  MaxJ,	
  Maxeler	
  Technologies	
  
2003:	
  Impulse-­‐C,	
  Impulse	
  Accelerated	
  Technologies	
  
2004:	
  Catapult	
  C,	
  Mentor	
  Graphics	
  
2005:	
  SystemVerilog,	
  BSV	
  	
  
2006:	
  AutoPilot,	
  AutoESL	
  (Vivado)	
  
2007:	
  DIME-­‐C,	
  Nallatech	
  	
  
2011:	
  LegUp,	
  University	
  of	
  Toronto	
  
2014:	
  Catapult,	
  Microsoi	
  	
  
10	
  Years	
  ago:	
  
	
  Heterogeneous	
  
Compu8ng	
  
2007:	
  CUDA,	
  NVIDIA 	
  	
  
2009:	
  OpenCL	
  ,Apple	
  Inc.,	
  Khronos	
  Group	
  
2010:	
  Intel	
  Many	
  Integrated	
  Cores	
  	
  	
  
2011:	
  Altera	
  	
  OpenCL	
  
2015:	
  Xilinx	
  	
  OpenCL	
  
FPGA	
  programming	
  evolu6on	
  
Dow	
  Jones	
  	
  index,	
  1985-­‐2015	
  
Where	
  to	
  next?	
  
FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote)
High-­‐Level	
  Synthesis	
  
• 	
  For	
  many	
  years,	
  Verilog/VHDL	
  were	
  good	
  enough	
  
• 	
  Then	
  the	
  complexity	
  gap	
  created	
  the	
  need	
  for	
  HLS.	
  	
  
• 	
  This	
  reflects	
  the	
  ra6onale	
  behind	
  VHDL:	
  
"A	
  language	
  with	
  a	
  wide	
  range	
  of	
  descrip0ve	
  capability	
  that	
  
was	
  independent	
  of	
  technology	
  or	
  design	
  methodology."	
  
• 	
  What	
  is	
  lacking	
  in	
  this	
  requirement	
  is	
  "capability	
  for	
  
scalable	
  abstrac6on".	
  
“C	
  to	
  Gates”	
  
• 	
  "C-­‐to-­‐Gates"	
  offered	
  that	
  higher	
  abstrac6on	
  level	
  
• 	
  But	
  it	
  was	
  in	
  a	
  way	
  a	
  return	
  to	
  the	
  days	
  before	
  
standardised	
  VHDL/Verilog:	
  	
  
the	
  various	
  components	
  making	
  up	
  a	
  system	
  were	
  
designed	
  and	
  verified	
  using	
  a	
  wide	
  range	
  of	
  
different	
  and	
  incompa0ble	
  languages	
  and	
  tools.	
  
The	
  Choice	
  of	
  C	
  
• 	
  C	
  was	
  designed	
  by	
  Ritchie	
  for	
  the	
  specific	
  
purpose	
  of	
  wri6ng	
  the	
  UNIX	
  opera6ng	
  system.	
  
• 	
  i.e.	
  to	
  create	
  a	
  control	
  system	
  for	
  a	
  RAM-­‐
based	
  single-­‐threaded	
  system.	
  	
  
• 	
  It	
  is	
  basically	
  a	
  syntac6c	
  layer	
  over	
  assembly	
  
language.	
  
• 	
  Very	
  different	
  seman6cs	
  from	
  HDLs	
  
• 	
  But	
  it	
  became	
  the	
  lingua	
  franca	
  for	
  engineers,	
  
and	
  hence	
  the	
  de-­‐facto	
  language	
  for	
  HLS	
  tools.	
  
Really	
  C?	
  
• 	
  None	
  of	
  them	
  was	
  ever	
  really	
  C	
  though:	
  	
  
• 	
  "C	
  with	
  restric6ons	
  and	
  pragmas”	
  (e.g.	
  DIME-­‐C)	
  
• 	
  “C	
  with	
  restric6ons	
  and	
  a	
  CSP	
  API”	
  (e.g.	
  Impulse-­‐C)	
  
• 	
  "C-­‐syntax	
  language	
  with	
  parallel	
  and	
  CSP	
  seman6cs”	
  	
  
(e.g.	
  Handel-­‐C)	
  	
  
• 	
  Typically,	
  no	
  recursion,	
  func6on	
  pointers	
  (no	
  stack)	
  
and	
  dynamic	
  alloca6on	
  (no	
  OS)	
  
The	
  Odd	
  Ones	
  Out	
  
• 	
  Mitrion-­‐C	
  is	
  	
  func6onal	
  dataflow	
  language,	
  
very	
  different	
  from	
  C	
  in	
  abstrac6on	
  level	
  and	
  
seman6cs;	
  
• 	
  Bluespec	
  too	
  was	
  a	
  radical	
  departure	
  from	
  "C-­‐
to-­‐Gates".	
  
• 	
  Both	
  were	
  inspired	
  by	
  func6onal	
  languages	
  
like	
  Haskell.	
  	
  
Mitrion-­‐C	
  Example	
  
Heterogeneous	
  	
  
Compu8ng	
  
GPUs,	
  Manycores	
  and	
  FPGAs	
  
• 	
  Accelerators	
  acached	
  to	
  host	
  systems	
  have	
  
become	
  increasingly	
  popular	
  
• 	
  Mainly	
  GPUs,	
  
• 	
  But	
  increasingly	
  manycores	
  (MIC,	
  Tilera)	
  	
  
• 	
  And	
  FPGAs	
  
FPGA	
  Size	
  Evolu6on	
  Since	
  2003	
  
Heterogeneous	
  Programming	
  
• 	
  State	
  of	
  affairs	
  today:	
  	
  
• 	
  Programmer	
  must	
  decide	
  what	
  to	
  offload	
  
• 	
  Write	
  host-­‐accelerator	
  control	
  and	
  data	
  movement	
  code	
  
using	
  dedicated	
  API	
  
• 	
  Write	
  accelerator	
  code	
  using	
  dedicated	
  language	
  	
  
• 	
  Many	
  approaches	
  (CUDA,	
  OpenCL,	
  MaxJ,	
  C++	
  AMP)	
  
Programming	
  Model	
  
• 	
  All	
  solu6ons	
  assume	
  data	
  parallelism:	
  	
  
• 	
  Each	
  kernel	
  is	
  single-­‐threaded,	
  works	
  on	
  a	
  por6on	
  of	
  the	
  
data	
  
• 	
  Programmer	
  must	
  iden6fy	
  these	
  por6ons	
  and	
  the	
  amount	
  
of	
  parallelism	
  
• 	
  So	
  not	
  ideal	
  for	
  FPGAs	
  
• 	
  Recent	
  OpenCL	
  specifica6ons	
  have	
  kernel	
  pipes	
  allowing	
  
construc6on	
  of	
  pipelines	
  	
  
• 	
  Also	
  support	
  for	
  unified	
  	
  memory	
  space	
  
Performance	
  
Nearest	
  neighbour	
   Lava	
  MD	
   Document	
  Classifica6on	
  
Kernel	
  Speed	
   5.345454545	
   4.317035156	
   1.306521951	
  
Total	
  Speed	
   1.603603604	
   4.232313328	
   1.037712032	
  
0	
  
1	
  
2	
  
3	
  
4	
  
5	
  
6	
  
Speed-­‐up	
  
OpenCL-­‐FPGA	
  Speed-­‐up	
  vs	
  OpenCL-­‐CPU	
  
Segal,	
  Oren,	
  Nasibeh	
  Nasiri,	
  Mar6n	
  Margala,	
  and	
  Wim	
  Vanderbauwhede.	
  "High	
  level	
  programming	
  of	
  FPGAs	
  for	
  HPC	
  and	
  data	
  centric	
  applica6ons.”	
  	
  Proc.	
  IEEE	
  HPEC	
  2014,	
  pp.	
  1-­‐3.	
  
Power	
  Savings	
  
Nearest	
  neighbour	
   Lava	
  MD	
   Document	
  Classifica6on	
  
Kernel	
  Power	
   5.241358852	
   4.232966577	
   1.281079155	
  
Total	
  Power	
   1.572375533	
   4.149894595	
   1.017503956	
  
0	
  
1	
  
2	
  
3	
  
4	
  
5	
  
6	
  
Power	
  Saving	
  
CPU/FPGA	
  Power	
  Consump8on	
  Ra8o	
  
Segal,	
  Oren,	
  Nasibeh	
  Nasiri,	
  Mar6n	
  Margala,	
  and	
  Wim	
  Vanderbauwhede.	
  "High	
  level	
  programming	
  of	
  FPGAs	
  for	
  HPC	
  and	
  data	
  centric	
  applica6ons.”	
  	
  Proc.	
  IEEE	
  HPEC	
  2014,	
  pp.	
  1-­‐3.	
  
FPGAs	
  as	
  
Components	
  in	
  
Heterogeneous	
  	
  
Systems	
  
Heterogeneous	
  HPC	
  Systems	
  
• 	
  Modern	
  HPC	
  cluster	
  node:	
  	
  
• 	
  Mul6core/manycore	
  host	
  
• 	
  Accelerators:	
  GPGPU,	
  MIC	
  and	
  increasingly,	
  FPGAs	
  
• 	
  HPC	
  workloads	
  
• 	
  Very	
  complex	
  codebase	
  
• 	
  Legacy	
  code	
  
FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote)
Example:	
  WRF	
  
• 	
  Weather	
  Research	
  and	
  Forecas6ng	
  Model	
  
• 	
  Fortran-­‐90,	
  support	
  for	
  MPI	
  and	
  OpenMP	
  
• 	
  1,263,320	
  lines	
  of	
  code	
  
• 	
  So	
  about	
  ten	
  thousand	
  pages	
  of	
  code	
  lis6ngs	
  
• 	
  Parts	
  of	
  it	
  have	
  been	
  accelerated	
  manually	
  on	
  GPU	
  (a	
  few	
  
thousands	
  of	
  lines)	
  
• 	
  Changing	
  the	
  code	
  for	
  a	
  GPU/FPGA	
  system	
  would	
  be	
  a	
  huge	
  
task,	
  and	
  the	
  result	
  would	
  not	
  be	
  portable.	
  
FPGAs	
  in	
  HPC	
  
• 	
  FPGAs	
  are	
  good	
  at	
  some	
  tasks,	
  e.g.:	
  
• 	
  Bit	
  level,	
  integer	
  and	
  string	
  	
  opera6ons	
  
• 	
  Pipeline	
  parallelism	
  rather	
  than	
  data	
  parallelism	
  
• 	
  Superior	
  internal	
  memory	
  bandwidth	
  
• 	
  Streaming	
  dataflow	
  computa6ons	
  
• 	
  But	
  	
  not	
  so	
  good	
  at	
  others 	
   	
  	
  
• 	
  Double-­‐precision	
  floa6ng	
  point	
  computa6ons	
  
• 	
  Random	
  memory	
  access	
  computa6ons	
  
"On the Capability and Achievable
Performance of FPGAs for HPC
Applications"
Wim Vanderbauwhede
School of Computing Science, University of Glasgow, UK
hcp://www.slideshare.net/WimVanderbauwhede	
  
Raising	
  the	
  	
  
Abstrac8on	
  
Level	
  
FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote)
One	
  Codebase,	
  Many	
  Components	
  
• 	
  For	
  complex	
  HPC	
  applica6ons,	
  FPGAs	
  will	
  never	
  be	
  
op6mal	
  for	
  the	
  whole	
  codebase	
  
• 	
  But	
  neither	
  will	
  mul6cores	
  or	
  GPUs	
  
• 	
  So	
  we	
  need	
  to	
  be	
  able	
  to	
  split	
  the	
  codebase	
  
automa6cally	
  over	
  the	
  different	
  components	
  in	
  the	
  
heterogeneous	
  system.	
  
• 	
  Therefore,	
  we	
  need	
  to	
  raise	
  the	
  abstrac6on	
  level	
  
beyond	
  “heterogeneous	
  programming”	
  and	
  “high-­‐
level	
  synthesis”	
  
Today’s	
  
high-­‐level	
  language	
  
is	
  tomorrow’s	
  
compiler	
  target	
  
• 	
  Device-­‐specific	
  high-­‐level	
  abstrac6on	
  is	
  no	
  
longer	
  good	
  enough	
  
• 	
  OpenCL	
  is	
  rela6vely	
  high-­‐level	
  and	
  device-­‐
independent,	
  but	
  it	
  is	
  s6ll	
  not	
  good	
  enough	
  
• 	
  High-­‐level	
  synthesis	
  languages	
  and	
  
heterogeneous	
  programming	
  frameworks	
  
should	
  be	
  compila6on	
  targets!	
  
• 	
  Just	
  like	
  assembly	
  /IR	
  languages	
  and	
  HDLs	
  
Automa8c	
  Program	
  
Transforma8ons	
  and	
  
Cost	
  models	
  
FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote)
• 	
  Star6ng	
  from	
  a	
  complete,	
  unop6mised	
  program	
  
• 	
  Compiler-­‐based	
  program	
  transforma6ons	
  
• 	
  Correct-­‐by-­‐construc6on	
  
• 	
  Component-­‐based,	
  hierarchical	
  cost	
  model	
  for	
  
the	
  full	
  system	
  
• 	
  Op6miza6on	
  problem:	
  	
  
find	
  the	
  op6mal	
  program	
  variant	
  given	
  the	
  
system	
  cost	
  model	
  
A	
  Func6onal-­‐Programming	
  Approach	
  
• 	
  For	
  the	
  par6cular	
  case	
  of	
  scien6fic	
  HPC	
  codes	
  
• 	
  Focus	
  on	
  array	
  computa6ons	
  
• 	
  Express	
  the	
  program	
  using	
  higher-­‐order	
  
func6ons	
  
• 	
  Type	
  Transforma6on	
  based	
  program	
  
transforma6on	
  
Func6onal	
  Programming	
  
• 	
  There	
  are	
  only	
  func6ons	
  
• 	
  Func6ons	
  can	
  operate	
  on	
  func6ons	
  
• 	
  Func6ons	
  can	
  return	
  func6ons	
  
• 	
  Syntac6c	
  sugar	
  over	
  the	
  λ-­‐calculus	
  
Types	
  in	
  Func6onal	
  Programming	
  
• 	
  Types	
  are	
  just	
  labels	
  to	
  help	
  us	
  reason	
  about	
  
the	
  values	
  in	
  a	
  computa6on	
  
• 	
  More	
  general	
  than	
  types	
  in	
  e.g	
  C	
  
• 	
  For	
  our	
  purpose,	
  we	
  focus	
  on	
  types	
  of	
  
func6ons	
  that	
  perform	
  array	
  opera6ons	
  
• 	
  Func6ons	
  are	
  values,	
  so	
  they	
  need	
  a	
  type	
  
Examples	
  of	
  Types	
  
-­‐-­‐	
  a	
  func6on	
  f	
  taking	
  a	
  vector	
  of	
  n	
  values	
  of	
  type	
  a	
  and	
  
returing	
  a	
  vector	
  of	
  m	
  values	
  of	
  type	
  b	
  
f : Vec a n -> Vec b m
-­‐-­‐	
  a	
  func6on	
  map	
  taking	
  a	
  func0on	
  from	
  a	
  to	
  b	
  and	
  a	
  
vector	
  of	
  type	
  a,	
  and	
  returning	
  a	
  vector	
  of	
  type	
  b	
  
map : (a -> b) -> Vec a n -> Vec b n
Type	
  Transforma6ons	
  
• 	
  Transform	
  the	
  type	
  of	
  a	
  func6on	
  into	
  another	
  
type	
  
• 	
  The	
  func6on	
  transforma6on	
  can	
  be	
  derived	
  
automa6cally	
  from	
  the	
  type	
  transforma6on	
  
• 	
  The	
  type	
  transforma6ons	
  are	
  provably	
  correct	
  
• 	
  Thus	
  the	
  transformed	
  program	
  is	
  correct	
  by	
  
construc6on!	
  
Array	
  Type	
  Transforma6ons	
  
• 	
  For	
  this	
  talk,	
  focus	
  on	
  	
  
• 	
  Vector	
  (array)	
  types	
  
• 	
  FPGA	
  cost	
  model	
  
• 	
  Programs	
  must	
  be	
  composed	
  using	
  par6cular	
  
higher-­‐order	
  func6ons	
  (correctness	
  condi6ons)	
  
• 	
  Transforma6ons	
  essen6ally	
  reshape	
  the	
  arrays	
  
Higher-­‐order	
  Func6ons	
  
• 	
  map:	
  perform	
  a	
  computa6on	
  on	
  all	
  elements	
  of	
  an	
  
array	
  independently,	
  e.g.	
  square	
  all	
  values.	
  
• 	
  can	
  be	
  done	
  sequen6ally,	
  in	
  parallel	
  or	
  using	
  a	
  
pipeline	
  if	
  the	
  computa6on	
  is	
  pipelined	
  
• 	
  foldl:	
  	
  reduce	
  an	
  array	
  to	
  a	
  value	
  using	
  an	
  
accumulator,	
  e.g.	
  sum	
  all	
  values.	
  
• 	
  can	
  be	
  done	
  sequen6ally	
  or,	
  if	
  the	
  computa6on	
  is	
  
associa6ve,	
  using	
  a	
  binary	
  tree	
  
Example:	
  SOR	
  
• 	
  Successive	
  Over-­‐Relaxa6on	
  (SOR)	
  kernel	
  from	
  a	
  
Large-­‐Eddy	
  simulator	
  (weather	
  simula6on)	
  in	
  Fortran:	
  
Example:	
  SOR	
  using	
  map	
  
• 	
  Fortran	
  code	
  rewricen	
  as	
  a	
  map	
  of	
  a	
  func6on	
  over	
  a	
  
1-­‐D	
  vector	
  
Example:	
  Type	
  Transforma6on	
  
• 	
  Transform	
  the	
  1-­‐D	
  vector	
  into	
  a	
  2-­‐D	
  vector	
  
• 	
  The	
  program	
  transforma6on	
  is	
  derived	
  
FPGA	
  Cost	
  Modeling	
  
• 	
  Based	
  on	
  an	
  overlay	
  architecture	
  
Cost	
  Calcula6on	
  
• 	
  Uses	
  an	
  Intermediate	
  Representa6on	
  Language,	
  the	
  
TyTra-­‐IR	
  
• 	
  TyTra-­‐IR	
  uses	
  LLVM	
  syntax	
  but	
  can	
  express	
  
sequen6al,	
  parallel	
  and	
  pipeline	
  seman6cs	
  
• 	
  Thus	
  a	
  direct	
  mapping	
  to	
  the	
  higher-­‐order	
  func6ons	
  
• 	
  But	
  the	
  cost	
  of	
  computa6ons	
  and	
  communica6on	
  
can	
  be	
  computed	
  directly	
  from	
  the	
  TyTra-­‐IR	
  program	
  
• 	
  No	
  need	
  for	
  synthesis	
  
code4paperSORc1.tirl
1 ; **** COMPUTE-IR ****
2 @main.p0 = addrSpace(12) ui18,
3 !"istream", !"CONT", !0, !"strobj_p"
4 @main.p1 = ...
5 @main.p2 = ...
6 @main.p3 = ...
7 ;...[other inputs]...
8 define void @f0(...args...) pipe {...}
9 define void @f1 (...args...) par {
10 call @f0(...args...) pipe
11 call @f0(...args...) pipe
12 call @f0(...args...) pipe
13 call @f0(...args...) pipe }
14 define void @main () {
15 call @f1(..args...) par }
16
17
TyTra-­‐IR	
  Example	
  
Cost	
  Space	
  and	
  Cost	
  Es6ma6on	
  
Logic and
Memory
Resources
Communication
Bandwidth
(local memory, global
memory, host)
Performance
(Throughput)
The Resource-Wall
(computation-bound)
The Bandwidth-Wall
(communication-bound)
Cost	
  Model	
  Accuracy	
  
Resource 1-lane(E) 1-lane(G) 4-lane(E) 4-lane(G)
ALUTs 239 164 148K 146K
REGs 725 572 76628 77260
BRAM(bits) 186K 186K 449K 682K
DSPs 9 12 36 24
Cycles/Kernel 1746 1742 436 446
EWGT 190K 222K 763K 488K
Design	
  Space	
  Explora6on	
  
Full-­‐Program	
  Transforma6on	
  
• 	
  Type	
  Transforma6ons	
  are	
  not	
  FPGA-­‐specific	
  
• 	
  Compiler	
  can	
  create	
  variants	
  for	
  full	
  program	
  
• 	
  Then	
  separate	
  out	
  parts	
  of	
  the	
  program	
  based	
  on	
  
minimal	
  cost	
  on	
  given	
  components	
  of	
  the	
  system	
  
• 	
  For	
  parallelisa6on	
  over	
  a	
  cluster,	
  use	
  Mul6-­‐Party	
  
Session	
  Types	
  to	
  transform	
  the	
  program	
  into	
  
communica6ng	
  processes	
  
Conclusions	
  
• 	
  FPGAs	
  have	
  reached	
  maturity	
  as	
  HPC	
  plauorms	
  
• 	
  High-­‐Level	
  Synthesis	
  and	
  Heterogeneous	
  
Programming	
  are	
  both	
  very	
  important	
  steps	
  forward,	
  
and	
  performance	
  is	
  already	
  impressive	
  
• 	
  But	
  we	
  need	
  to	
  raise	
  the	
  abstrac6on	
  level	
  even	
  more	
  
• 	
  Full-­‐system	
  compilers	
  for	
  heterogeneous	
  systems	
  
• 	
  FPGAs	
  are	
  merely	
  components	
  in	
  such	
  systems	
  
• 	
  Type	
  Transforma6ons	
  are	
  one	
  possible	
  way	
  
Thank you!

Contenu connexe

Tendances

The Past, Present, and Future of OpenACC
The Past, Present, and Future of OpenACCThe Past, Present, and Future of OpenACC
The Past, Present, and Future of OpenACCinside-BigData.com
 
Edge Computing: A Unified Infrastructure for all the Different Pieces
Edge Computing: A Unified Infrastructure for all the Different PiecesEdge Computing: A Unified Infrastructure for all the Different Pieces
Edge Computing: A Unified Infrastructure for all the Different PiecesCloudify Community
 
Overview of the MVAPICH Project and Future Roadmap
Overview of the MVAPICH Project and Future RoadmapOverview of the MVAPICH Project and Future Roadmap
Overview of the MVAPICH Project and Future Roadmapinside-BigData.com
 
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...Ganesan Narayanasamy
 
PhD SDN Projects
PhD SDN ProjectsPhD SDN Projects
PhD SDN ProjectsPhdtopiccom
 
Challenges and Opportunities for HPC Interconnects and MPI
Challenges and Opportunities for HPC Interconnects and MPIChallenges and Opportunities for HPC Interconnects and MPI
Challenges and Opportunities for HPC Interconnects and MPIinside-BigData.com
 
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre..."APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...Edge AI and Vision Alliance
 
ScilabTEC 2015 - Embedded Solutions
ScilabTEC 2015 - Embedded SolutionsScilabTEC 2015 - Embedded Solutions
ScilabTEC 2015 - Embedded SolutionsScilab
 
The HPE Machine and Gen-Z - BUD17-503
The HPE Machine and Gen-Z - BUD17-503The HPE Machine and Gen-Z - BUD17-503
The HPE Machine and Gen-Z - BUD17-503Linaro
 
"Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ...
"Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ..."Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ...
"Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ...Edge AI and Vision Alliance
 
Codasip application class RISC-V processor solutions
Codasip application class RISC-V processor solutionsCodasip application class RISC-V processor solutions
Codasip application class RISC-V processor solutionsRISC-V International
 
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...Akash Tandon
 
gRPC stack supporting Intel Resource Director technology (RDT)
gRPC stack supporting Intel Resource Director technology (RDT)gRPC stack supporting Intel Resource Director technology (RDT)
gRPC stack supporting Intel Resource Director technology (RDT)Michelle Holley
 
"APIs for Accelerating Vision and Inferencing: An Industry Overview of Option...
"APIs for Accelerating Vision and Inferencing: An Industry Overview of Option..."APIs for Accelerating Vision and Inferencing: An Industry Overview of Option...
"APIs for Accelerating Vision and Inferencing: An Industry Overview of Option...Edge AI and Vision Alliance
 
Linaro: High Performance Computing (HPC)
Linaro: High Performance Computing (HPC)Linaro: High Performance Computing (HPC)
Linaro: High Performance Computing (HPC)Linaro
 

Tendances (20)

The Past, Present, and Future of OpenACC
The Past, Present, and Future of OpenACCThe Past, Present, and Future of OpenACC
The Past, Present, and Future of OpenACC
 
Edge Computing: A Unified Infrastructure for all the Different Pieces
Edge Computing: A Unified Infrastructure for all the Different PiecesEdge Computing: A Unified Infrastructure for all the Different Pieces
Edge Computing: A Unified Infrastructure for all the Different Pieces
 
Overview of the MVAPICH Project and Future Roadmap
Overview of the MVAPICH Project and Future RoadmapOverview of the MVAPICH Project and Future Roadmap
Overview of the MVAPICH Project and Future Roadmap
 
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
 
RISC-V Online Tutor
RISC-V Online TutorRISC-V Online Tutor
RISC-V Online Tutor
 
PhD SDN Projects
PhD SDN ProjectsPhD SDN Projects
PhD SDN Projects
 
IBM HPC Transformation with AI
IBM HPC Transformation with AI IBM HPC Transformation with AI
IBM HPC Transformation with AI
 
Challenges and Opportunities for HPC Interconnects and MPI
Challenges and Opportunities for HPC Interconnects and MPIChallenges and Opportunities for HPC Interconnects and MPI
Challenges and Opportunities for HPC Interconnects and MPI
 
OpenPOWER Latest Updates
OpenPOWER Latest UpdatesOpenPOWER Latest Updates
OpenPOWER Latest Updates
 
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre..."APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
 
Shrilesh kathe 2017
Shrilesh kathe 2017Shrilesh kathe 2017
Shrilesh kathe 2017
 
ScilabTEC 2015 - Embedded Solutions
ScilabTEC 2015 - Embedded SolutionsScilabTEC 2015 - Embedded Solutions
ScilabTEC 2015 - Embedded Solutions
 
Open j9 jdk on RISC-V
Open j9 jdk on RISC-VOpen j9 jdk on RISC-V
Open j9 jdk on RISC-V
 
The HPE Machine and Gen-Z - BUD17-503
The HPE Machine and Gen-Z - BUD17-503The HPE Machine and Gen-Z - BUD17-503
The HPE Machine and Gen-Z - BUD17-503
 
"Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ...
"Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ..."Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ...
"Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ...
 
Codasip application class RISC-V processor solutions
Codasip application class RISC-V processor solutionsCodasip application class RISC-V processor solutions
Codasip application class RISC-V processor solutions
 
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
 
gRPC stack supporting Intel Resource Director technology (RDT)
gRPC stack supporting Intel Resource Director technology (RDT)gRPC stack supporting Intel Resource Director technology (RDT)
gRPC stack supporting Intel Resource Director technology (RDT)
 
"APIs for Accelerating Vision and Inferencing: An Industry Overview of Option...
"APIs for Accelerating Vision and Inferencing: An Industry Overview of Option..."APIs for Accelerating Vision and Inferencing: An Industry Overview of Option...
"APIs for Accelerating Vision and Inferencing: An Industry Overview of Option...
 
Linaro: High Performance Computing (HPC)
Linaro: High Performance Computing (HPC)Linaro: High Performance Computing (HPC)
Linaro: High Performance Computing (HPC)
 

Similaire à FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote)

Assisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated ArchitectureAssisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated Architectureinside-BigData.com
 
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...BigDataEverywhere
 
A Library for Emerging High-Performance Computing Clusters
A Library for Emerging High-Performance Computing ClustersA Library for Emerging High-Performance Computing Clusters
A Library for Emerging High-Performance Computing ClustersIntel® Software
 
Realizing Robust and Scalable Evolutionary Algorithms toward Exascale Era
Realizing Robust and Scalable Evolutionary Algorithms toward Exascale EraRealizing Robust and Scalable Evolutionary Algorithms toward Exascale Era
Realizing Robust and Scalable Evolutionary Algorithms toward Exascale EraMasaharu Munetomo
 
OpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsOpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsHPCC Systems
 
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...inside-BigData.com
 
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...KRamasamy2
 
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...VAISHNAVI MADHAN
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Herman Wu
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetupGanesan Narayanasamy
 
Communication Frameworks for HPC and Big Data
Communication Frameworks for HPC and Big DataCommunication Frameworks for HPC and Big Data
Communication Frameworks for HPC and Big Datainside-BigData.com
 
Designing HPC & Deep Learning Middleware for Exascale Systems
Designing HPC & Deep Learning Middleware for Exascale SystemsDesigning HPC & Deep Learning Middleware for Exascale Systems
Designing HPC & Deep Learning Middleware for Exascale Systemsinside-BigData.com
 
DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...
DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...
DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...zionsaint
 
MVAPICH2 and MVAPICH2-X Projects: Latest Developments and Future Plans
MVAPICH2 and MVAPICH2-X Projects: Latest Developments and Future PlansMVAPICH2 and MVAPICH2-X Projects: Latest Developments and Future Plans
MVAPICH2 and MVAPICH2-X Projects: Latest Developments and Future Plansinside-BigData.com
 
High performance computing for research
High performance computing for researchHigh performance computing for research
High performance computing for researchEsteban Hernandez
 
Programming Models for Exascale Systems
Programming Models for Exascale SystemsProgramming Models for Exascale Systems
Programming Models for Exascale Systemsinside-BigData.com
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsHPCC Systems
 
Accelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningAccelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningDataWorks Summit
 
Real time machine learning proposers day v3
Real time machine learning proposers day v3Real time machine learning proposers day v3
Real time machine learning proposers day v3mustafa sarac
 

Similaire à FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote) (20)

Assisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated ArchitectureAssisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated Architecture
 
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
 
A Library for Emerging High-Performance Computing Clusters
A Library for Emerging High-Performance Computing ClustersA Library for Emerging High-Performance Computing Clusters
A Library for Emerging High-Performance Computing Clusters
 
Realizing Robust and Scalable Evolutionary Algorithms toward Exascale Era
Realizing Robust and Scalable Evolutionary Algorithms toward Exascale EraRealizing Robust and Scalable Evolutionary Algorithms toward Exascale Era
Realizing Robust and Scalable Evolutionary Algorithms toward Exascale Era
 
OpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsOpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC Systems
 
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...
 
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
 
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup
 
Communication Frameworks for HPC and Big Data
Communication Frameworks for HPC and Big DataCommunication Frameworks for HPC and Big Data
Communication Frameworks for HPC and Big Data
 
Designing HPC & Deep Learning Middleware for Exascale Systems
Designing HPC & Deep Learning Middleware for Exascale SystemsDesigning HPC & Deep Learning Middleware for Exascale Systems
Designing HPC & Deep Learning Middleware for Exascale Systems
 
DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...
DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...
DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...
 
MVAPICH2 and MVAPICH2-X Projects: Latest Developments and Future Plans
MVAPICH2 and MVAPICH2-X Projects: Latest Developments and Future PlansMVAPICH2 and MVAPICH2-X Projects: Latest Developments and Future Plans
MVAPICH2 and MVAPICH2-X Projects: Latest Developments and Future Plans
 
High performance computing for research
High performance computing for researchHigh performance computing for research
High performance computing for research
 
HPC Performance tools, on the road to Exascale
HPC Performance tools, on the road to ExascaleHPC Performance tools, on the road to Exascale
HPC Performance tools, on the road to Exascale
 
Programming Models for Exascale Systems
Programming Models for Exascale SystemsProgramming Models for Exascale Systems
Programming Models for Exascale Systems
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
 
Accelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningAccelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learning
 
Real time machine learning proposers day v3
Real time machine learning proposers day v3Real time machine learning proposers day v3
Real time machine learning proposers day v3
 

Plus de Wim Vanderbauwhede

Haku, a toy functional language based on literary Japanese
Haku, a toy functional language  based on literary JapaneseHaku, a toy functional language  based on literary Japanese
Haku, a toy functional language based on literary JapaneseWim Vanderbauwhede
 
A few thoughts on work life-balance
A few thoughts on work life-balanceA few thoughts on work life-balance
A few thoughts on work life-balanceWim Vanderbauwhede
 
List-based Monadic Computations for Dynamically Typed Languages (Python version)
List-based Monadic Computations for Dynamically Typed Languages (Python version)List-based Monadic Computations for Dynamically Typed Languages (Python version)
List-based Monadic Computations for Dynamically Typed Languages (Python version)Wim Vanderbauwhede
 
It was finally Christmas: Perl 6 is here!
It was finally Christmas: Perl 6 is here!It was finally Christmas: Perl 6 is here!
It was finally Christmas: Perl 6 is here!Wim Vanderbauwhede
 
Perl and Haskell: Can the Twain Ever Meet? (tl;dr: yes)
Perl and Haskell: Can the Twain Ever Meet? (tl;dr: yes)Perl and Haskell: Can the Twain Ever Meet? (tl;dr: yes)
Perl and Haskell: Can the Twain Ever Meet? (tl;dr: yes)Wim Vanderbauwhede
 
List-based Monadic Computations for Dynamic Languages
List-based Monadic Computations for Dynamic LanguagesList-based Monadic Computations for Dynamic Languages
List-based Monadic Computations for Dynamic LanguagesWim Vanderbauwhede
 
Why I do Computing Science Research
Why I do Computing Science ResearchWhy I do Computing Science Research
Why I do Computing Science ResearchWim Vanderbauwhede
 

Plus de Wim Vanderbauwhede (9)

Haku, a toy functional language based on literary Japanese
Haku, a toy functional language  based on literary JapaneseHaku, a toy functional language  based on literary Japanese
Haku, a toy functional language based on literary Japanese
 
Frugal computing
Frugal computingFrugal computing
Frugal computing
 
A few thoughts on work life-balance
A few thoughts on work life-balanceA few thoughts on work life-balance
A few thoughts on work life-balance
 
List-based Monadic Computations for Dynamically Typed Languages (Python version)
List-based Monadic Computations for Dynamically Typed Languages (Python version)List-based Monadic Computations for Dynamically Typed Languages (Python version)
List-based Monadic Computations for Dynamically Typed Languages (Python version)
 
Greener Search
Greener SearchGreener Search
Greener Search
 
It was finally Christmas: Perl 6 is here!
It was finally Christmas: Perl 6 is here!It was finally Christmas: Perl 6 is here!
It was finally Christmas: Perl 6 is here!
 
Perl and Haskell: Can the Twain Ever Meet? (tl;dr: yes)
Perl and Haskell: Can the Twain Ever Meet? (tl;dr: yes)Perl and Haskell: Can the Twain Ever Meet? (tl;dr: yes)
Perl and Haskell: Can the Twain Ever Meet? (tl;dr: yes)
 
List-based Monadic Computations for Dynamic Languages
List-based Monadic Computations for Dynamic LanguagesList-based Monadic Computations for Dynamic Languages
List-based Monadic Computations for Dynamic Languages
 
Why I do Computing Science Research
Why I do Computing Science ResearchWhy I do Computing Science Research
Why I do Computing Science Research
 

Dernier

online pdf editor software solutions.pdf
online pdf editor software solutions.pdfonline pdf editor software solutions.pdf
online pdf editor software solutions.pdfMeon Technology
 
JS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AIJS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AIIvo Andreev
 
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.Sharon Liu
 
OpenChain Webinar: Universal CVSS Calculator
OpenChain Webinar: Universal CVSS CalculatorOpenChain Webinar: Universal CVSS Calculator
OpenChain Webinar: Universal CVSS CalculatorShane Coughlan
 
Introduction-to-Software-Development-Outsourcing.pptx
Introduction-to-Software-Development-Outsourcing.pptxIntroduction-to-Software-Development-Outsourcing.pptx
Introduction-to-Software-Development-Outsourcing.pptxIntelliSource Technologies
 
ERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptxERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptxAutus Cyber Tech
 
How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?AmeliaSmith90
 
Webinar_050417_LeClair12345666777889.ppt
Webinar_050417_LeClair12345666777889.pptWebinar_050417_LeClair12345666777889.ppt
Webinar_050417_LeClair12345666777889.pptkinjal48
 
Cybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadCybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadIvo Andreev
 
eAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspectionseAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspectionsNirav Modi
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLAlluxio, Inc.
 
Fields in Java and Kotlin and what to expect.pptx
Fields in Java and Kotlin and what to expect.pptxFields in Java and Kotlin and what to expect.pptx
Fields in Java and Kotlin and what to expect.pptxJoão Esperancinha
 
Kawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in TrivandrumKawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in TrivandrumKawika Technologies
 
Watermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security ChallengesWatermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security ChallengesShyamsundar Das
 
Enterprise Document Management System - Qualityze Inc
Enterprise Document Management System - Qualityze IncEnterprise Document Management System - Qualityze Inc
Enterprise Document Management System - Qualityze Incrobinwilliams8624
 
Generative AI for Cybersecurity - EC-Council
Generative AI for Cybersecurity - EC-CouncilGenerative AI for Cybersecurity - EC-Council
Generative AI for Cybersecurity - EC-CouncilVICTOR MAESTRE RAMIREZ
 
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...OnePlan Solutions
 
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...Jaydeep Chhasatia
 
Deep Learning for Images with PyTorch - Datacamp
Deep Learning for Images with PyTorch - DatacampDeep Learning for Images with PyTorch - Datacamp
Deep Learning for Images with PyTorch - DatacampVICTOR MAESTRE RAMIREZ
 
AI Embracing Every Shade of Human Beauty
AI Embracing Every Shade of Human BeautyAI Embracing Every Shade of Human Beauty
AI Embracing Every Shade of Human BeautyRaymond Okyere-Forson
 

Dernier (20)

online pdf editor software solutions.pdf
online pdf editor software solutions.pdfonline pdf editor software solutions.pdf
online pdf editor software solutions.pdf
 
JS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AIJS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AI
 
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.
 
OpenChain Webinar: Universal CVSS Calculator
OpenChain Webinar: Universal CVSS CalculatorOpenChain Webinar: Universal CVSS Calculator
OpenChain Webinar: Universal CVSS Calculator
 
Introduction-to-Software-Development-Outsourcing.pptx
Introduction-to-Software-Development-Outsourcing.pptxIntroduction-to-Software-Development-Outsourcing.pptx
Introduction-to-Software-Development-Outsourcing.pptx
 
ERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptxERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptx
 
How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?
 
Webinar_050417_LeClair12345666777889.ppt
Webinar_050417_LeClair12345666777889.pptWebinar_050417_LeClair12345666777889.ppt
Webinar_050417_LeClair12345666777889.ppt
 
Cybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadCybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and Bad
 
eAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspectionseAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspections
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
 
Fields in Java and Kotlin and what to expect.pptx
Fields in Java and Kotlin and what to expect.pptxFields in Java and Kotlin and what to expect.pptx
Fields in Java and Kotlin and what to expect.pptx
 
Kawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in TrivandrumKawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in Trivandrum
 
Watermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security ChallengesWatermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security Challenges
 
Enterprise Document Management System - Qualityze Inc
Enterprise Document Management System - Qualityze IncEnterprise Document Management System - Qualityze Inc
Enterprise Document Management System - Qualityze Inc
 
Generative AI for Cybersecurity - EC-Council
Generative AI for Cybersecurity - EC-CouncilGenerative AI for Cybersecurity - EC-Council
Generative AI for Cybersecurity - EC-Council
 
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...
 
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
 
Deep Learning for Images with PyTorch - Datacamp
Deep Learning for Images with PyTorch - DatacampDeep Learning for Images with PyTorch - Datacamp
Deep Learning for Images with PyTorch - Datacamp
 
AI Embracing Every Shade of Human Beauty
AI Embracing Every Shade of Human BeautyAI Embracing Every Shade of Human Beauty
AI Embracing Every Shade of Human Beauty
 

FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote)

  • 1. FPGAs  as  Components  in  Heterogeneous     High-­‐Performance  Compu8ng  Systems:   Raising  the  Abstrac8on  Level     Wim  Vanderbauwhede   School  of  Compu6ng  Science   University  of  Glasgow  
  • 2. A  trip  down  memory  lane  
  • 3. 80  Years  ago:     The  Theory   Turing,  Alan  Mathison.  "On  computable  numbers,  with  an  applica6on  to  the  Entscheidungsproblem."  J.  of  Math  58,  no.  345-­‐363  (1936):  5.  
  • 4. 1936:  Universal  machine  (Alan  Turing)   1936:  Lambda  calculus    (Alonzo  Church)   1936:  Stored-­‐program  concept  (Konrad  Zuse)   1937:  Church-­‐Turing  thesis   1945:  The  Von  Neumann  architecture   Church,  Alonzo.  "A  set  of  postulates  for  the  founda6on  of  logic."  Annals  of  mathema0cs  (1932):  346-­‐366.  
  • 5. 60-­‐40  Years  ago:     The  Founda8ons   The  first  working  integrated  circuit,  1958.    ©  Texas  Instruments.  
  • 6. 1957:  Fortran,  John  Backus,  IBM   1958:  First  IC,  Jack  Kilby,  Texas  Instruments   1965:  Moore’s  law   1971:  First  microprocessor,  Texas  Instruments   1972:  C,    Dennis  Ritchie,  Bell  Labs   1977:  Fortran-­‐77   1977:  von  Neumann  bocleneck,  John  Backus    
  • 7. 30  Years  ago:     HDLs  and  FPGAs   Algotronix  CAL1024  FPGA,  1989.  ©  Algotronix    
  • 8. 1984:  Verilog     1984:  First  reprogrammable  logic  device,  Altera   1985:  First  FPGA,Xilinx   1987:  VHDL  Standard  IEEE  1076-­‐1987   1989:  Algotronix  CAL1024,  the  first  FPGA  to   offer  random  access  to  its  control  memory  
  • 9. 20  Years  ago:     High-­‐level  Synthesis   Page,  Ian.  "Closing  the  gap  between  hardware  and  soiware:  hardware-­‐soiware  cosynthesis  at  Oxford."  (1996):  2-­‐2.  
  • 10. 1996:  Handel-­‐C,    Oxford  University   2001:  Mitrion-­‐C,  Mitrionics     2003:  Bluespec,  MIT   2003:  MaxJ,  Maxeler  Technologies   2003:  Impulse-­‐C,  Impulse  Accelerated  Technologies   2004:  Catapult  C,  Mentor  Graphics  
  • 11. 2005:  SystemVerilog,  BSV     2006:  AutoPilot,  AutoESL  (Vivado)   2007:  DIME-­‐C,  Nallatech     2011:  LegUp,  University  of  Toronto   2014:  Catapult,  Microsoi    
  • 12. 10  Years  ago:    Heterogeneous   Compu8ng  
  • 13. 2007:  CUDA,  NVIDIA     2009:  OpenCL  ,Apple  Inc.,  Khronos  Group   2010:  Intel  Many  Integrated  Cores       2011:  Altera    OpenCL   2015:  Xilinx    OpenCL  
  • 14. FPGA  programming  evolu6on   Dow  Jones    index,  1985-­‐2015  
  • 17. High-­‐Level  Synthesis   •   For  many  years,  Verilog/VHDL  were  good  enough   •   Then  the  complexity  gap  created  the  need  for  HLS.     •   This  reflects  the  ra6onale  behind  VHDL:   "A  language  with  a  wide  range  of  descrip0ve  capability  that   was  independent  of  technology  or  design  methodology."   •   What  is  lacking  in  this  requirement  is  "capability  for   scalable  abstrac6on".  
  • 18. “C  to  Gates”   •   "C-­‐to-­‐Gates"  offered  that  higher  abstrac6on  level   •   But  it  was  in  a  way  a  return  to  the  days  before   standardised  VHDL/Verilog:     the  various  components  making  up  a  system  were   designed  and  verified  using  a  wide  range  of   different  and  incompa0ble  languages  and  tools.  
  • 19. The  Choice  of  C   •   C  was  designed  by  Ritchie  for  the  specific   purpose  of  wri6ng  the  UNIX  opera6ng  system.   •   i.e.  to  create  a  control  system  for  a  RAM-­‐ based  single-­‐threaded  system.     •   It  is  basically  a  syntac6c  layer  over  assembly   language.   •   Very  different  seman6cs  from  HDLs   •   But  it  became  the  lingua  franca  for  engineers,   and  hence  the  de-­‐facto  language  for  HLS  tools.  
  • 20. Really  C?   •   None  of  them  was  ever  really  C  though:     •   "C  with  restric6ons  and  pragmas”  (e.g.  DIME-­‐C)   •   “C  with  restric6ons  and  a  CSP  API”  (e.g.  Impulse-­‐C)   •   "C-­‐syntax  language  with  parallel  and  CSP  seman6cs”     (e.g.  Handel-­‐C)     •   Typically,  no  recursion,  func6on  pointers  (no  stack)   and  dynamic  alloca6on  (no  OS)  
  • 21. The  Odd  Ones  Out   •   Mitrion-­‐C  is    func6onal  dataflow  language,   very  different  from  C  in  abstrac6on  level  and   seman6cs;   •   Bluespec  too  was  a  radical  departure  from  "C-­‐ to-­‐Gates".   •   Both  were  inspired  by  func6onal  languages   like  Haskell.    
  • 24. GPUs,  Manycores  and  FPGAs   •   Accelerators  acached  to  host  systems  have   become  increasingly  popular   •   Mainly  GPUs,   •   But  increasingly  manycores  (MIC,  Tilera)     •   And  FPGAs  
  • 25. FPGA  Size  Evolu6on  Since  2003  
  • 26. Heterogeneous  Programming   •   State  of  affairs  today:     •   Programmer  must  decide  what  to  offload   •   Write  host-­‐accelerator  control  and  data  movement  code   using  dedicated  API   •   Write  accelerator  code  using  dedicated  language     •   Many  approaches  (CUDA,  OpenCL,  MaxJ,  C++  AMP)  
  • 27. Programming  Model   •   All  solu6ons  assume  data  parallelism:     •   Each  kernel  is  single-­‐threaded,  works  on  a  por6on  of  the   data   •   Programmer  must  iden6fy  these  por6ons  and  the  amount   of  parallelism   •   So  not  ideal  for  FPGAs   •   Recent  OpenCL  specifica6ons  have  kernel  pipes  allowing   construc6on  of  pipelines     •   Also  support  for  unified    memory  space  
  • 28. Performance   Nearest  neighbour   Lava  MD   Document  Classifica6on   Kernel  Speed   5.345454545   4.317035156   1.306521951   Total  Speed   1.603603604   4.232313328   1.037712032   0   1   2   3   4   5   6   Speed-­‐up   OpenCL-­‐FPGA  Speed-­‐up  vs  OpenCL-­‐CPU   Segal,  Oren,  Nasibeh  Nasiri,  Mar6n  Margala,  and  Wim  Vanderbauwhede.  "High  level  programming  of  FPGAs  for  HPC  and  data  centric  applica6ons.”    Proc.  IEEE  HPEC  2014,  pp.  1-­‐3.  
  • 29. Power  Savings   Nearest  neighbour   Lava  MD   Document  Classifica6on   Kernel  Power   5.241358852   4.232966577   1.281079155   Total  Power   1.572375533   4.149894595   1.017503956   0   1   2   3   4   5   6   Power  Saving   CPU/FPGA  Power  Consump8on  Ra8o   Segal,  Oren,  Nasibeh  Nasiri,  Mar6n  Margala,  and  Wim  Vanderbauwhede.  "High  level  programming  of  FPGAs  for  HPC  and  data  centric  applica6ons.”    Proc.  IEEE  HPEC  2014,  pp.  1-­‐3.  
  • 30. FPGAs  as   Components  in   Heterogeneous     Systems  
  • 31. Heterogeneous  HPC  Systems   •   Modern  HPC  cluster  node:     •   Mul6core/manycore  host   •   Accelerators:  GPGPU,  MIC  and  increasingly,  FPGAs   •   HPC  workloads   •   Very  complex  codebase   •   Legacy  code  
  • 33. Example:  WRF   •   Weather  Research  and  Forecas6ng  Model   •   Fortran-­‐90,  support  for  MPI  and  OpenMP   •   1,263,320  lines  of  code   •   So  about  ten  thousand  pages  of  code  lis6ngs   •   Parts  of  it  have  been  accelerated  manually  on  GPU  (a  few   thousands  of  lines)   •   Changing  the  code  for  a  GPU/FPGA  system  would  be  a  huge   task,  and  the  result  would  not  be  portable.  
  • 34. FPGAs  in  HPC   •   FPGAs  are  good  at  some  tasks,  e.g.:   •   Bit  level,  integer  and  string    opera6ons   •   Pipeline  parallelism  rather  than  data  parallelism   •   Superior  internal  memory  bandwidth   •   Streaming  dataflow  computa6ons   •   But    not  so  good  at  others       •   Double-­‐precision  floa6ng  point  computa6ons   •   Random  memory  access  computa6ons  
  • 35. "On the Capability and Achievable Performance of FPGAs for HPC Applications" Wim Vanderbauwhede School of Computing Science, University of Glasgow, UK hcp://www.slideshare.net/WimVanderbauwhede  
  • 36. Raising  the     Abstrac8on   Level  
  • 38. One  Codebase,  Many  Components   •   For  complex  HPC  applica6ons,  FPGAs  will  never  be   op6mal  for  the  whole  codebase   •   But  neither  will  mul6cores  or  GPUs   •   So  we  need  to  be  able  to  split  the  codebase   automa6cally  over  the  different  components  in  the   heterogeneous  system.   •   Therefore,  we  need  to  raise  the  abstrac6on  level   beyond  “heterogeneous  programming”  and  “high-­‐ level  synthesis”  
  • 39. Today’s   high-­‐level  language   is  tomorrow’s   compiler  target  
  • 40. •   Device-­‐specific  high-­‐level  abstrac6on  is  no   longer  good  enough   •   OpenCL  is  rela6vely  high-­‐level  and  device-­‐ independent,  but  it  is  s6ll  not  good  enough   •   High-­‐level  synthesis  languages  and   heterogeneous  programming  frameworks   should  be  compila6on  targets!   •   Just  like  assembly  /IR  languages  and  HDLs  
  • 41. Automa8c  Program   Transforma8ons  and   Cost  models  
  • 43. •   Star6ng  from  a  complete,  unop6mised  program   •   Compiler-­‐based  program  transforma6ons   •   Correct-­‐by-­‐construc6on   •   Component-­‐based,  hierarchical  cost  model  for   the  full  system   •   Op6miza6on  problem:     find  the  op6mal  program  variant  given  the   system  cost  model  
  • 44. A  Func6onal-­‐Programming  Approach   •   For  the  par6cular  case  of  scien6fic  HPC  codes   •   Focus  on  array  computa6ons   •   Express  the  program  using  higher-­‐order   func6ons   •   Type  Transforma6on  based  program   transforma6on  
  • 45. Func6onal  Programming   •   There  are  only  func6ons   •   Func6ons  can  operate  on  func6ons   •   Func6ons  can  return  func6ons   •   Syntac6c  sugar  over  the  λ-­‐calculus  
  • 46. Types  in  Func6onal  Programming   •   Types  are  just  labels  to  help  us  reason  about   the  values  in  a  computa6on   •   More  general  than  types  in  e.g  C   •   For  our  purpose,  we  focus  on  types  of   func6ons  that  perform  array  opera6ons   •   Func6ons  are  values,  so  they  need  a  type  
  • 47. Examples  of  Types   -­‐-­‐  a  func6on  f  taking  a  vector  of  n  values  of  type  a  and   returing  a  vector  of  m  values  of  type  b   f : Vec a n -> Vec b m -­‐-­‐  a  func6on  map  taking  a  func0on  from  a  to  b  and  a   vector  of  type  a,  and  returning  a  vector  of  type  b   map : (a -> b) -> Vec a n -> Vec b n
  • 48. Type  Transforma6ons   •   Transform  the  type  of  a  func6on  into  another   type   •   The  func6on  transforma6on  can  be  derived   automa6cally  from  the  type  transforma6on   •   The  type  transforma6ons  are  provably  correct   •   Thus  the  transformed  program  is  correct  by   construc6on!  
  • 49. Array  Type  Transforma6ons   •   For  this  talk,  focus  on     •   Vector  (array)  types   •   FPGA  cost  model   •   Programs  must  be  composed  using  par6cular   higher-­‐order  func6ons  (correctness  condi6ons)   •   Transforma6ons  essen6ally  reshape  the  arrays  
  • 50. Higher-­‐order  Func6ons   •   map:  perform  a  computa6on  on  all  elements  of  an   array  independently,  e.g.  square  all  values.   •   can  be  done  sequen6ally,  in  parallel  or  using  a   pipeline  if  the  computa6on  is  pipelined   •   foldl:    reduce  an  array  to  a  value  using  an   accumulator,  e.g.  sum  all  values.   •   can  be  done  sequen6ally  or,  if  the  computa6on  is   associa6ve,  using  a  binary  tree  
  • 51. Example:  SOR   •   Successive  Over-­‐Relaxa6on  (SOR)  kernel  from  a   Large-­‐Eddy  simulator  (weather  simula6on)  in  Fortran:  
  • 52. Example:  SOR  using  map   •   Fortran  code  rewricen  as  a  map  of  a  func6on  over  a   1-­‐D  vector  
  • 53. Example:  Type  Transforma6on   •   Transform  the  1-­‐D  vector  into  a  2-­‐D  vector   •   The  program  transforma6on  is  derived  
  • 54. FPGA  Cost  Modeling   •   Based  on  an  overlay  architecture  
  • 55. Cost  Calcula6on   •   Uses  an  Intermediate  Representa6on  Language,  the   TyTra-­‐IR   •   TyTra-­‐IR  uses  LLVM  syntax  but  can  express   sequen6al,  parallel  and  pipeline  seman6cs   •   Thus  a  direct  mapping  to  the  higher-­‐order  func6ons   •   But  the  cost  of  computa6ons  and  communica6on   can  be  computed  directly  from  the  TyTra-­‐IR  program   •   No  need  for  synthesis  
  • 56. code4paperSORc1.tirl 1 ; **** COMPUTE-IR **** 2 @main.p0 = addrSpace(12) ui18, 3 !"istream", !"CONT", !0, !"strobj_p" 4 @main.p1 = ... 5 @main.p2 = ... 6 @main.p3 = ... 7 ;...[other inputs]... 8 define void @f0(...args...) pipe {...} 9 define void @f1 (...args...) par { 10 call @f0(...args...) pipe 11 call @f0(...args...) pipe 12 call @f0(...args...) pipe 13 call @f0(...args...) pipe } 14 define void @main () { 15 call @f1(..args...) par } 16 17 TyTra-­‐IR  Example  
  • 57. Cost  Space  and  Cost  Es6ma6on   Logic and Memory Resources Communication Bandwidth (local memory, global memory, host) Performance (Throughput) The Resource-Wall (computation-bound) The Bandwidth-Wall (communication-bound)
  • 58. Cost  Model  Accuracy   Resource 1-lane(E) 1-lane(G) 4-lane(E) 4-lane(G) ALUTs 239 164 148K 146K REGs 725 572 76628 77260 BRAM(bits) 186K 186K 449K 682K DSPs 9 12 36 24 Cycles/Kernel 1746 1742 436 446 EWGT 190K 222K 763K 488K
  • 60. Full-­‐Program  Transforma6on   •   Type  Transforma6ons  are  not  FPGA-­‐specific   •   Compiler  can  create  variants  for  full  program   •   Then  separate  out  parts  of  the  program  based  on   minimal  cost  on  given  components  of  the  system   •   For  parallelisa6on  over  a  cluster,  use  Mul6-­‐Party   Session  Types  to  transform  the  program  into   communica6ng  processes  
  • 61. Conclusions   •   FPGAs  have  reached  maturity  as  HPC  plauorms   •   High-­‐Level  Synthesis  and  Heterogeneous   Programming  are  both  very  important  steps  forward,   and  performance  is  already  impressive   •   But  we  need  to  raise  the  abstrac6on  level  even  more   •   Full-­‐system  compilers  for  heterogeneous  systems   •   FPGAs  are  merely  components  in  such  systems   •   Type  Transforma6ons  are  one  possible  way