GTC Japan 2014
Hitoshi Sato
Presentation slides for GTC Japan 2014 (http://www.gputechconf.jp/page/home.html).
3.
TSUBAME2 System Overview

Storage: 11 PB (7 PB HDD, 4 PB Tape, 200 TB SSD)
• Parallel File System Volumes
– "Global Work Space" #1, #2, #3
– Storage arrays: SFA10k #1 to #6
– File systems: GPFS#1 to GPFS#4
– Mount points: /data0, /data1, /work0, /work1, /gscr
• Home Volumes
– HOME, System application: "cNFS/Clustered Samba w/ GPFS"
– HOME, iSCSI: "NFS/CIFS/iSCSI by BlueARC"
• Links to the InfiniBand QDR networks: QDR IB (×4) × 20, QDR IB (×4) × 8, 10GbE × 2
• Capacity labels on the diagram: 1.2 PB, 3.6 PB, 2.4 PB HDD + 4 PB Tape

Computing Nodes: 17.1 PFlops (SFP), 5.76 PFlops (DFP), 224.69 TFlops (CPU), ~100 TB MEM, ~200 TB SSD
• Thin nodes: HP Proliant SL390s G7, 1408 nodes (32 nodes × 44 racks)
– CPU: Intel Westmere-EP 2.93 GHz, 6 cores × 2 = 12 cores/node
– GPU: NVIDIA Tesla K20X, 3 GPUs/node
– Mem: 54 GB (96 GB)
– SSD: 60 GB × 2 = 120 GB (120 GB × 2 = 240 GB)
• Medium nodes: HP Proliant DL580 G7, 24 nodes
– CPU: Intel Nehalem-EX 2.0 GHz, 8 cores × 2 = 32 cores/node
– GPU: NVIDIA Tesla S1070, NextIO vCORE Express 2070
– Mem: 128 GB
– SSD: 120 GB × 4 = 480 GB
• Fat nodes: HP Proliant DL580 G7, 10 nodes
– CPU: Intel Nehalem-EX 2.0 GHz, 8 cores × 2 = 32 cores/node
– GPU: NVIDIA Tesla S1070
– Mem: 256 GB (512 GB)
– SSD: 120 GB × 4 = 480 GB

Interconnects: Full-bisection Optical QDR InfiniBand Network
• Core Switch: Voltaire Grid Director 4700 × 12 (12 switches), IB QDR: 324 ports
• Edge Switch: Voltaire Grid Director 4036 × 179 (179 switches), IB QDR: 36 ports
• Edge Switch (w/ 10GbE ports): Voltaire Grid Director 4036E × 6 (6 switches), IB QDR: 34 ports, 10GbE: 2 ports
4.
Example
• TEPS (Traversed Edges Per Second)
• (Cybersecurity, Medical Informatics, Social Networks, Data Enrichment, Symbolic Networks)
• Kernels
– concurrent search (Breadth First Search: BFS)
– optimization (Single Source Shortest Path)
– edge-oriented (Maximal Independent Set)
• Green Graph500
– http://green.graph500.org/
• Kronecker Graph (BFS)
– 16 (= m/n), 32
– SCALE, 2^SCALE, 2^(SCALE+4)
– SCALE30: 10, 172, 344
• Input parameters
– SCALE
– edgefactor (= 16)
[Figure: benchmark flow. Input parameters → Graph Generation → Graph Construction → BFS → Validation → results; BFS and Validation are repeated for 64 iterations.]
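The problem-size parameters on this slide can be turned into concrete numbers. The following is a minimal C++ sketch, assuming the usual Graph500 convention that a Kronecker graph has 2^SCALE vertices and edgefactor × 2^SCALE edges (2^(SCALE+4) when edgefactor = 16). The function names are illustrative and do not come from the slides or from any benchmark code.

```cpp
#include <cassert>
#include <cstdint>

// Number of vertices of a Kronecker graph at the given SCALE: 2^SCALE.
inline std::uint64_t num_vertices(unsigned scale) {
    assert(scale < 64);  // keep the shift within 64 bits
    return std::uint64_t{1} << scale;
}

// Number of generated edges: edgefactor * 2^SCALE.
// With edgefactor = 16 this equals 2^(SCALE+4).
// The caller must keep the product within 64 bits.
inline std::uint64_t num_edges(unsigned scale, std::uint64_t edgefactor = 16) {
    return edgefactor * num_vertices(scale);
}

// TEPS: traversed edges divided by the BFS time in seconds.
inline double teps(std::uint64_t traversed_edges, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(traversed_edges) / seconds;
}
```

For SCALE 30 this gives about 1.07 billion vertices and about 17.2 billion edges.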
10.
Hamar Overview
[Figure: a Distributed Array is partitioned across Rank 0, Rank 1, ..., Rank n, each holding a Local Array. Map, Shuffle, and Reduce stages run on every rank; Shuffle is the data transfer between ranks. Each Local Array is a Virtualized Data Object with Device (GPU) Data and Host (CPU) Data, connected by Memcpy (H2D, D2H).]
11.
Map/Reduce code sample

    class MapImpl : public hamar::function::cuda::Map<MapContext> {
      public:
        __host__ __device__ Operate(MapContext *context) {
            KeyType key = context->input_key();
            ValueType value = context->input_value();
            context->Emit(key, value);  // pass the pair through unchanged
        }
    }

    class ReduceImpl : public hamar::function::cuda::Reduce<ReduceContext> {
      public:
        __host__ __device__ Operate(ReduceContext *context) {
            KeyType key = context->input_key();
            ValueType values = context->input_values();
            int n = context->num_input_values();
            ValueType sum = values[0] + … + values[n-1];  // pseudocode: sum of the n input values
            context->Emit(key, sum);
        }
    }
12.
Map/Reduce code sample (cont'd)

    int main() {
        MapImpl map;
        ReduceImpl reduce;

        Environment env;
        env.Init();  // MPI/CUDA Initialization

        Directory object(&env);
        object.Init(path);

        object.Map(map);
        object.Reduce(reduce);

        object.Destroy();

        env.Destroy();  // MPI/CUDA Finalization
    }
13.
Highly Accelerated MapReduce with Out-of-core support on GPUs
• Hierarchical memory management for large-scale data parallel processing using multi-GPUs
– Support out-of-core processing on GPU devices
– Overlapping computation and communication
[Figure: Map, Shuffle, and Reduce stages; data moves between CPU and GPU by Memcpy (H2D, D2H), with processing for each chunk.]
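The out-of-core idea on this slide, processing data larger than device memory one chunk at a time, can be sketched on the host alone. This is a minimal illustration under stated assumptions, not Hamar's implementation: `process_in_chunks` and `device_capacity` are hypothetical names, a `std::vector` stands in for GPU memory, and plain copies stand in for H2D and D2H transfers. It runs the steps sequentially; overlapping transfer with computation, as the slide describes, would additionally need asynchronous copies and is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Apply map_fn to every element of host_data while staging at most
// device_capacity elements at a time in a buffer that stands in for
// GPU memory. Returns the number of chunks processed.
template <typename T, typename MapFn>
std::size_t process_in_chunks(std::vector<T>& host_data,
                              std::size_t device_capacity,
                              MapFn map_fn) {
    assert(device_capacity > 0);
    std::vector<T> device_buffer;  // stand-in for device memory
    device_buffer.reserve(device_capacity);
    std::size_t chunks = 0;
    for (std::size_t begin = 0; begin < host_data.size(); begin += device_capacity) {
        const std::size_t end = std::min(begin + device_capacity, host_data.size());
        const auto first = host_data.begin() + static_cast<std::ptrdiff_t>(begin);
        const auto last = host_data.begin() + static_cast<std::ptrdiff_t>(end);
        device_buffer.assign(first, last);                 // "H2D" copy of one chunk
        for (T& x : device_buffer) x = map_fn(x);          // "kernel" on the chunk
        std::copy(device_buffer.begin(), device_buffer.end(), first);  // "D2H" copy back
        ++chunks;
    }
    return chunks;
}
```

Calling it on five elements with a capacity of two processes three chunks, the last one partially filled.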
14.
Map/Reduce Implementation
• Initialization before each operation
– Remove unnecessary keys
– Reordering data structures
• Optimizations for GPU accelerators
– Assign a warp (32 threads) per key for avoiding warp divergence in Map/Reduce
– Overlapping computation on GPU and data transfer between CPU and GPU
[Figure: pipeline of Map/Reduce, Sort, and Scan stages. Labels: "Sort key-value for Scan", "Compact keys to unique", "Overlap computation and data transfer".]
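The "sort key-value, then compact keys to unique" step on this slide can be illustrated with a sequential host-side sketch. It assumes integer keys and a sum as the reduction; `KeyValue` and `reduce_by_key` are illustrative names, and the single pass over the sorted pairs stands in for the scan-based compaction that a GPU implementation would run in parallel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using KeyValue = std::pair<int, long long>;

// Reduce values that share a key: sort the pairs by key so equal keys
// become adjacent, then compact each run of equal keys into (key, sum).
inline std::vector<KeyValue> reduce_by_key(std::vector<KeyValue> pairs) {
    std::stable_sort(pairs.begin(), pairs.end(),
                     [](const KeyValue& a, const KeyValue& b) {
                         return a.first < b.first;
                     });
    std::vector<KeyValue> out;
    for (const KeyValue& kv : pairs) {
        if (!out.empty() && out.back().first == kv.first) {
            out.back().second += kv.second;  // same run: accumulate the value
        } else {
            out.push_back(kv);               // a new unique key starts a run
        }
    }
    assert(out.size() <= pairs.size());  // compaction never adds keys
    return out;
}
```

Sorting first is what makes the compaction a local operation: each output entry depends only on one contiguous run of the sorted input.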
15.
GPU-based External Sort Implementation
• Out-of-core GPU sorting algorithm *1
– Adopted Sample-based Parallel Sorting Algorithm
– Overlapping computation on GPU and data transfer between CPU and GPU
[Figure: data flow between CPU and GPU]
1. Divide input data into chunks, then sort on GPU for each chunk
2. Swap intermediate data on CPU
3. Sort intermediate data on GPU
*1: Y. Ye et al., "GPUMemSort: A High Performance Graphics Co-processors Sorting Algorithm for Large Scale In-Memory Data", GSTF International Journal on Computing, 2011
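The three steps above can be sketched on the host to show how a sample-based sort keeps each sort within a bounded chunk. This is a simplified illustration, not the GPUMemSort algorithm or Hamar's code: `std::sort` stands in for the GPU sort, the median of each sorted chunk serves as a sample (a choice made here for brevity), and `sample_based_external_sort` and `chunk_size` are hypothetical names. With skewed input a bucket in step 3 may exceed `chunk_size`, which a real out-of-core implementation would have to handle.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

inline std::vector<int> sample_based_external_sort(std::vector<int> data,
                                                   std::size_t chunk_size) {
    assert(chunk_size > 0);
    const std::size_t n = data.size();
    if (n <= chunk_size) {  // everything fits in one chunk
        std::sort(data.begin(), data.end());
        return data;
    }
    const std::size_t num_chunks = (n + chunk_size - 1) / chunk_size;

    // Step 1: sort each chunk on its own and record its median as a sample.
    std::vector<int> splitters;
    for (std::size_t begin = 0; begin < n; begin += chunk_size) {
        const std::size_t end = std::min(begin + chunk_size, n);
        std::sort(data.begin() + static_cast<std::ptrdiff_t>(begin),
                  data.begin() + static_cast<std::ptrdiff_t>(end));
        splitters.push_back(data[begin + (end - begin) / 2]);
    }
    std::sort(splitters.begin(), splitters.end());
    splitters.pop_back();  // num_chunks - 1 splitters define num_chunks buckets

    // Step 2: redistribute ("swap") the data into buckets by splitter.
    std::vector<std::vector<int>> buckets(num_chunks);
    for (int x : data) {
        const auto it = std::upper_bound(splitters.begin(), splitters.end(), x);
        buckets[static_cast<std::size_t>(it - splitters.begin())].push_back(x);
    }

    // Step 3: sort each bucket; buckets are already ordered relative to
    // each other, so concatenating them yields the sorted result.
    std::vector<int> sorted;
    sorted.reserve(n);
    for (std::vector<int>& bucket : buckets) {
        std::sort(bucket.begin(), bucket.end());
        sorted.insert(sorted.end(), bucket.begin(), bucket.end());
    }
    return sorted;
}
```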
16.
Application Example: GIM-V
Generalized Iterative Matrix-Vector multiplication *1
• Easy description of various graph algorithms by implementing combine2, combineAll, assign functions
• PageRank, Random Walk Restart, Connected Component
– $v' = M \times_G v$, where $v'_i = \mathrm{assign}(v_i, \mathrm{combineAll}_i(\{x_j \mid j = 1..n,\ x_j = \mathrm{combine2}(m_{i,j}, v_j)\}))$ for $i = 1..n$
– Iterative 2 phases MapReduce operations
[Figure: $v' = M \times_G v$ as two stages. Stage 1: combine2 on $m_{i,j}$ and $v_j$. Stage 2: combineAll and assign produce $v'_i$.]
Straightforward implementation using Hamar
*1: Kang, U. et al., "PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations", IEEE International Conference on Data Mining 2009
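To make the three GIM-V functions concrete, here is a small sequential C++ sketch of one iteration specialized to PageRank. It follows the PageRank instantiation described in the PEGASUS paper (combine2 multiplies and damps, combineAll sums and adds the teleport term, assign takes the new value), but the code itself is an illustration written for this transcript: the `Entry` struct, the function names, and the `damping` parameter are assumptions, and nothing here reflects Hamar's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One nonzero m_{i,j} of the column-normalized matrix M
// (row = destination vertex i, col = source vertex j).
struct Entry {
    std::size_t row;
    std::size_t col;
    double value;
};

// combine2(m_{i,j}, v_j): damped contribution of vertex j to vertex i.
inline double combine2(double m_ij, double v_j, double damping) {
    return damping * m_ij * v_j;
}

// combineAll_i(x_1..x_n): sum of contributions plus the teleport term.
inline double combine_all(const std::vector<double>& xs, std::size_t n, double damping) {
    double sum = 0.0;
    for (double x : xs) sum += x;
    return (1.0 - damping) / static_cast<double>(n) + sum;
}

// assign(v_i, v_new): PageRank simply takes the new value.
inline double assign(double /*v_old*/, double v_new) {
    return v_new;
}

// One GIM-V iteration v' = M x_G v in two stages.
inline std::vector<double> gimv_step(const std::vector<Entry>& matrix,
                                     const std::vector<double>& v,
                                     double damping) {
    const std::size_t n = v.size();
    assert(n > 0);
    // Stage 1: combine2 on every nonzero, grouped by destination row i.
    std::vector<std::vector<double>> partial(n);
    for (const Entry& e : matrix) {
        assert(e.row < n && e.col < n);
        partial[e.row].push_back(combine2(e.value, v[e.col], damping));
    }
    // Stage 2: combineAll per row, then assign.
    std::vector<double> next(n);
    for (std::size_t i = 0; i < n; ++i) {
        next[i] = assign(v[i], combine_all(partial[i], n, damping));
    }
    return next;
}
```

In a MapReduce setting, stage 1 corresponds to a map over matrix entries keyed by row, and stage 2 to a reduce per key; swapping the three functions yields the other algorithms listed on the slide.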
17.
Weak Scaling Performance
[Sato, Shirahata et al. Cluster2014]
• PageRank application on TSUBAME 2.5
• Data size is larger than GPU memory capacity
[Figure: Performance [MEdges/sec] versus Number of Compute Nodes, SCALE 23 - 24 per Node. Series: 1CPU (S23 per node), 1GPU (S23 per node), 2CPUs (S24 per node), 2GPUs (S24 per node), 3GPUs (S24 per node). Annotations: 2.81 GE/s on 3072 GPUs (SCALE 34); 2.10x Speedup (3 GPU v 2CPU).]
18.
Breakdown
• Performance on 3 GPUs compared with 2 CPUs
– SCALE 33, 1024 nodes
– Map: 2.82x, Reduce: 1.11x, Sort: 5.04x speedup
• Overlapping communication effectively
[Figure: Elapsed time [ms] for 1CPU, 1GPU, 2CPUs, 2GPUs, 3GPUs, broken down into Map, Shuffle, Reduce, Sort, Others.]
19.
Towards Multilevel data management on Hamar using GPUs and NVMs [GTC2014]
How to design local storage for next-gen supercomputers?
- Designed a local I/O prototype using 16 mSATA SSDs
Capacity: 4 TB
Read bandwidth: 8 GB/s
[Figure: prototype layout. Mother board, RAID card, multiple mSATA SSDs.]
[Figure: I/O performance of multiple mSATA SSD. Bandwidth [MB/s] versus # mSATAs. Series: Raw mSATA 4KB, RAID0 1MB, RAID0 64KB. Annotation: 7.39 GB/s from 16 mSATA SSDs (Enabled RAID0).]
[Figure: I/O performance from GPU to multiple mSATA SSDs. Throughput [GB/s] versus Matrix Size [GB]. Series: Raw 8 mSATA, 8 mSATA RAID0 (1MB), 8 mSATA RAID0 (64KB). Annotation: 3.06 GB/s from 8 mSATA SSDs to GPU.]
20.
Sorting for Rapidly Increasing Datasets
[Shamoto, Sato et al]
• The need to process huge datasets is increasing due to growth of data collection in various fields
– Sensor data
– SNS network
• Fast sorting methods
– Distributed Sorting: Sorting for distributed system
• Splitter-based parallel sort
• Radix sort
• Merge sort
– Sorting on heterogeneous architectures
• Many sorting algorithms are accelerated by many cores and high memory bandwidth.
• Sorting for large-scale heterogeneous systems remains unclear
21.
Existing Sorting Algorithms
Splitter-based parallel sorting
– The flow of the algorithm
1. Local sort: Each process sorts its own array
2. Select splitters: Choose criteria for data segmentation
3. Data transfer: Transfer data segments
4. Local merge: Merge sorted arrays
– Low communication costs
→ Computation costs start dominating the overall performance

Sorting on GPU
– There are many attempts to accelerate sorting
• Thrust sort [D. Merrill et al., 2011]
– Fast sorting for one compute node
• A GPU external sort [Y. Ye et al., 2010]
– Handle GPU memory overflows
• A multi-node GPU sort [K. L. Spafford et al., 2011]
– Does not sort huge data sets

Utilize GPU accelerators for splitter-based parallel sorting
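The four-step flow of splitter-based parallel sorting can be simulated in a single process, with one array per "process". This is a hedged sketch and not HykSort or the authors' code: the regular-sampling rule used to pick splitters is one common choice made here for illustration, in-memory copies stand in for the network transfer, and `splitter_based_sort` and `Array` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Array = std::vector<int>;

// Simulate p processes, each owning one array. On return, process k holds a
// sorted array, and every key on process k is <= every key on process k + 1.
inline std::vector<Array> splitter_based_sort(std::vector<Array> local) {
    const std::size_t p = local.size();
    assert(p > 0);

    // 1. Local sort: each process sorts its own array.
    for (Array& a : local) std::sort(a.begin(), a.end());

    // 2. Select splitters: gather p - 1 regular samples per process,
    //    sort them, and pick p - 1 evenly spaced splitters.
    Array samples;
    for (const Array& a : local) {
        for (std::size_t k = 1; k < p; ++k) {
            if (!a.empty()) samples.push_back(a[k * a.size() / p]);
        }
    }
    std::sort(samples.begin(), samples.end());
    Array splitters;
    for (std::size_t k = 1; k < p && !samples.empty(); ++k) {
        splitters.push_back(samples[k * samples.size() / p]);
    }

    // 3. Data transfer: cut every sorted array at the splitters and send
    //    segment dst to process dst.
    // 4. Local merge: the receiver merges each incoming sorted segment.
    std::vector<Array> result(p);
    for (const Array& a : local) {
        auto first = a.begin();
        for (std::size_t dst = 0; dst < p; ++dst) {
            auto last = (dst < splitters.size())
                            ? std::upper_bound(first, a.end(), splitters[dst])
                            : a.end();
            Array& target = result[dst];
            const auto mid = static_cast<std::ptrdiff_t>(target.size());
            target.insert(target.end(), first, last);
            std::inplace_merge(target.begin(), target.begin() + mid, target.end());
            first = last;
        }
    }
    return result;
}
```

Because every segment is already sorted when it arrives, the receiver only merges, which is why the local sort in step 1 carries most of the computation and is the natural phase to offload to an accelerator.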
22.
GPU implementation for Splitter-based Parallel Sorting
• Offloading the most time-consuming phase to GPU accelerators
[Figure: Elapsed time [s] versus # of processes (2 processes per node), from 4 to 2048 processes. Breakdown: synchronization costs, data transfer and merge, local sort (original), merge (remaining arrays), select splitters.]
[Figure: algorithm flow. unsorted → local sort (on GPU) → select splitters → data transfer → merge → sorted.]
23.
Weak Scaling Performance
• 2 ~ 1024 nodes (4 ~ 2048 GPUs) on TSUBAME2.5
• 2 processes per node, and each node holds 2 GB of 64-bit integers
[Figure: Keys/second (millions) versus # of processes (2 processes per node). Series: HykSort 1thread (single-threaded implementation), HykSort 6threads (multi-threaded implementation), HykSort GPU + 6threads (GPU implementation based on the multi-threaded implementation). Annotations: x1.4 and x3.6 when the # of processes is 2048.]
24.
Performance Prediction
• PCIe_#: # GB/s bandwidth of interconnect between CPU and GPU
• 8.8% reduction of overall runtime when the accelerators work 4 times faster than K20x
• x2.2 speedup when the PCIe bandwidth increases to 50 GB/s
[Figure: two panels, "K20x" and "x4 faster than K20x". Keys/second (millions) versus # of processes (2 processes per node). Series: HykSort 6threads, HykSort GPU + 6threads, and predictions of our implementation for PCIe_10, PCIe_50, PCIe_100, PCIe_200.]