Extreme Big Data (EBD) 
Convergence of Extreme Computing 
and Big Data Technologies 
Global Scientific Information and Computing Center
Tokyo Institute of Technology

Hitoshi Sato
Big Data Examples 
Rates and volumes are immense
Social NW 
• Facebook 
– 1 billion users 
– Average 130 friends 
– 30 billion pieces of content 
shared per month 
• Twitter 
– 500 million active users 
– 340 million tweets per day 
• Internet 
– 300 million new websites per year 
– 48 hours of video to YouTube per minute 
– 30,000 YouTube videos played per second 
Genomics
• Sequencing data (bp/$) becomes x4000 per 5 years
• c.f., HPC x33 in 5 years
Social Simulation
• Applications 
– Target Area: Planet 
(Open Street Map) 
– 7 billion people 
• Input Data 
– Road Network for Planet: 
300GB (XML) 
– Trip data for 7 billion people 
10KB (1trip) x 7 billion = 70TB 
– Real-Time Streaming Data 
(e.g., Social sensor, physical data) 
• Simulated Output for 1 Iteration 
– 700TB 
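The input volume quoted above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units; the variable names are illustrative and not from the slides.

```python
# Decimal (SI) units are assumed here.
KB = 10**3
TB = 10**12

people = 7 * 10**9            # 7 billion people (from the slide)
trip_record_bytes = 10 * KB   # 10KB per trip (from the slide)

trip_data_tb = people * trip_record_bytes / TB

# Simulated output for one iteration (from the slide), relative to the trip input.
output_tb = 700
ratio = output_tb / trip_data_tb

print(f"trip data: {trip_data_tb:.0f} TB, output/input ratio: {ratio:.0f}x")
```

Under these assumptions a single iteration writes about ten times the trip input.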
Weather 
A-1. Quality Control
A-2. Data Processing
30-sec Ensemble Forecast Simulations: 2 PFLOP
Ensemble Data Assimilation: 2 PFLOP
Himawari: 500MB/2.5min
Ensemble Forecasts: 200GB
Phased Array Radar: 1GB/30sec/2 radars
Ensemble Analyses: 200GB
B-1. Quality Control
B-2. Data Processing
Analysis Data: 2GB
30-min Forecast Simulation: 1.2 PFLOP
30-min Forecast: 2GB
Repeat every 30 sec.
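To get a feel for the sustained rates behind the weather pipeline, the quoted volumes can be turned into bytes per second. This is only a sketch: it assumes decimal units, reads "1GB/30sec/2 radars" as 1GB per 30 seconds for both radars together, and assumes the 200GB of ensemble forecasts is produced once per 30-second cycle.

```python
MB = 10**6
GB = 10**9

# Observation inputs (figures from the slide).
himawari_rate = 500 * MB / (2.5 * 60)   # bytes per second
radar_rate = 1 * GB / 30                # bytes per second, both radars (assumed)

# Assumption: one 200GB set of ensemble forecasts per 30-second cycle.
cycle_seconds = 30
ensemble_rate = 200 * GB / cycle_seconds

print(f"Himawari: {himawari_rate / MB:.2f} MB/s")
print(f"Radar:    {radar_rate / MB:.1f} MB/s")
print(f"Ensemble: {ensemble_rate / GB:.2f} GB/s")
```

Under these assumptions the observation streams are modest, and the ensemble data produced inside the cycle dominates the I/O load.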
Future “Extreme Big Data”
• NOT mining Tbytes Silo Data
• Peta~Zettabytes of Data
• Ultra High-BW Data Stream
• Highly Unstructured, Irregular
• Complex correlations between data from multiple sources
• Extreme Capacity, Bandwidth, Compute All Required
Graph500 “Big Data” Benchmark
A: 0.57, B: 0.19 
C: 0.19, D: 0.05 
November 15, 2010
Graph 500 Takes Aim at a New Kind of HPC
Richard Murphy (Sandia NL => Micron)
“I expect that this ranking may at times look very different from the TOP500 list. Cloud architectures will almost certainly dominate a major chunk of part of the list.”
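The A, B, C, D values on this slide are the quadrant probabilities of the benchmark's Kronecker (R-MAT style) edge generator. The sketch below shows the basic recursive sampling idea in plain Python. The function names and the default edge factor of 16 are assumptions for illustration, and the vertex-label permutation that the reference generator applies is left out.

```python
import random

A, B, C, D = 0.57, 0.19, 0.19, 0.05   # quadrant probabilities from the slide

def rmat_edge(scale, rng):
    """Sample one edge of a graph with 2**scale vertices by recursive quadrant choice."""
    src = dst = 0
    for _ in range(scale):
        r = rng.random()
        src <<= 1
        dst <<= 1
        if r < A:
            pass                 # top-left quadrant: both new bits are 0
        elif r < A + B:
            dst |= 1             # top-right quadrant
        elif r < A + B + C:
            src |= 1             # bottom-left quadrant
        else:
            src |= 1             # bottom-right quadrant
            dst |= 1
    return src, dst

def generate_edges(scale, edge_factor=16, seed=0):
    """Generate edge_factor * 2**scale edges (edge_factor=16 is an assumed default)."""
    rng = random.Random(seed)
    n_edges = edge_factor * (1 << scale)
    return [rmat_edge(scale, rng) for _ in range(n_edges)]

edges = generate_edges(scale=10)
print(len(edges), "edges over", 1 << 10, "vertices")
```

Because A is much larger than D, edges concentrate on low-numbered vertices, which gives the skewed degree distribution that makes the benchmark hard on memory and network.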
The 8th Graph500 List (June 2014): K Computer #1, TSUBAME2 #12
Koji Ueno, Tokyo Institute of Technology/RIKEN AICS
Reality: Top500 Supercomputers Dominate
No Cloud IDCs at all
#1 K Computer
RIKEN Advanced Institute for Computational Science (AICS)’s K computer is ranked No.1 on the Graph500 Ranking of Supercomputers with 17977.1 GE/s on Scale 40 on the 8th Graph500 list published at the International Supercomputing Conference, June 22, 2014. Congratulations from the Graph500 Executive Committee
#12 TSUBAME2
Global Scientific Information and Computing Center, Tokyo Institute of Technology’s TSUBAME 2.5 is ranked No.12 on the Graph500 Ranking of Supercomputers with 1280.43 GE/s on Scale 36 on the 8th Graph500 list published at the International Supercomputing Conference, June 22, 2014. Congratulations from the Graph500 Executive Committee
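The two results can be compared by estimating how long one breadth-first search takes at each scale. This is a rough sketch: it assumes "GE/s" means 10^9 traversed edges per second, an edge factor of 16 edges per vertex, and that a search touches roughly all generated edges.

```python
EDGE_FACTOR = 16   # assumed edges per vertex

def bfs_seconds(scale, giga_edges_per_second):
    """Approximate time for one BFS over a graph with 2**scale vertices."""
    edges = EDGE_FACTOR * 2**scale
    return edges / (giga_edges_per_second * 1e9)

k_computer = bfs_seconds(scale=40, giga_edges_per_second=17977.1)
tsubame25 = bfs_seconds(scale=36, giga_edges_per_second=1280.43)

print(f"K computer:  {k_computer:.2f} s per BFS on a Scale 40 graph")
print(f"TSUBAME 2.5: {tsubame25:.2f} s per BFS on a Scale 36 graph")
```

Under these assumptions both machines finish a search in under a second, with the K computer doing so on a graph 16 times larger.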
Tokyo Tech
1408 nodes
CPU: Intel Xeon X5670
GPU: NVIDIA Tesla K20X
MEM: 54GB DDR3
SSD: 60GB x 2
QDR full-bisection optical InfiniBand
SSD: 200TB
HDD: 7PB (Lustre, GPFS)
Tape: StorageTek
TSUBAME2 System Overview
11PB (7PB HDD, 4PB Tape, 200TB SSD)

Computing Nodes
17.1 PFlops (SFP), 5.76 PFlops (DFP), 224.69 TFlops (CPU), ~100TB MEM, ~200TB SSD

Thin nodes: 1408 nodes (32 nodes x 44 racks)
HP ProLiant SL390s G7, 1408 nodes
CPU: Intel Westmere-EP 2.93GHz, 6 cores × 2 = 12 cores/node
GPU: NVIDIA Tesla K20X, 3 GPUs/node
Mem: 54GB (96GB)
SSD: 60GB x 2 = 120GB (120GB x 2 = 240GB)

Medium nodes
HP ProLiant DL580 G7, 24 nodes
CPU: Intel Nehalem-EX 2.0GHz, 8 cores × 4 = 32 cores/node
GPU: NVIDIA Tesla S1070, NextIO vCORE Express 2070
Mem: 128GB
SSD: 120GB x 4 = 480GB

Fat nodes
HP ProLiant DL580 G7, 10 nodes
CPU: Intel Nehalem-EX 2.0GHz, 8 cores × 4 = 32 cores/node
GPU: NVIDIA Tesla S1070
Mem: 256GB (512GB)
SSD: 120GB x 4 = 480GB

Interconnects: Full-bisection Optical QDR InfiniBand Network
Core Switch: Voltaire Grid Director 4700 × 12 (12 switches), IB QDR: 324 ports
Edge Switch: Voltaire Grid Director 4036 × 179 (179 switches), IB QDR: 36 ports
Edge Switch (w/ 10GbE ports): Voltaire Grid Director 4036E × 6 (6 switches), IB QDR: 34 ports, 10GbE: 2 ports
QDR IB (×4) × 20, QDR IB (×4) × 8, 10GbE × 2

Storage (InfiniBand QDR Networks)
Parallel File System Volumes, Home Volumes
1.2PB, 3.6PB
SFA10k #1 ~ #5: “Global Work Space” #1, #2, #3
/data0, /data1, /work0, /work1, /gscr
GPFS+Tape, Lustre
SFA10k #6: Home
GPFS #1 ~ #4
HOME, System application: “cNFS/Clustered Samba w/ GPFS”
HOME, iSCSI: “NFS/CIFS/iSCSI by BlueARC”
2.4PB HDD + 4PB Tape
Local SSDs
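The headline peak figures can be reproduced from the thin-node counts on this slide. This is a back-of-the-envelope sketch: the per-GPU peaks for the Tesla K20X are assumed values taken from vendor-published specifications (they are not on the slide), and the CPU single-precision peak is assumed to be twice the quoted double-precision figure.

```python
# Figures taken from the slide.
thin_nodes = 1408
gpus_per_node = 3
cpu_dp_tflops = 224.69     # aggregate CPU peak quoted on the slide

# Assumed per-GPU peaks for the Tesla K20X (not stated on the slide).
k20x_dp_tflops = 1.31
k20x_sp_tflops = 3.95

# Assumption: CPU single-precision peak is twice its double-precision peak.
cpu_sp_tflops = 2 * cpu_dp_tflops

gpus = thin_nodes * gpus_per_node
dp_pflops = (gpus * k20x_dp_tflops + cpu_dp_tflops) / 1000
sp_pflops = (gpus * k20x_sp_tflops + cpu_sp_tflops) / 1000

print(f"{gpus} GPUs, DP {dp_pflops:.2f} PFlops, SP {sp_pflops:.1f} PFlops")
```

With these assumptions the totals land on the quoted 5.76 PFlops (DFP) and 17.1 PFlops (SFP), so nearly all of the peak comes from the GPUs.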
Read mostly I/O (data-intensive apps, parallel workflow, parameter survey)
• Home storage for computing nodes
• Cloud-based campus storage services
Backup
Fine-grained R/W I/O (check point, temporal files)
TSUBAME2 Storage Usage Since Nov. 2010
TSUBAME2.5 as a Big Data Infrastructure
• GPU-based Many-core Accelerator
– NVIDIA Tesla K20X, [illegible] cards
• Huge Memory Volumes per Node
– 54GB per node x 1408 nodes
• Fat Tree Dual Rail QDR InfiniBand
– [illegible] Tbps of full bisection bandwidth
• Local SSD devices per node
– 200TB in total
• Large-scale Storage Systems
– Lustre, GPFS
– 7PB of HDDs, [illegible] PB of Tapes
A Major Northern Japanese 
Cloud Datacenter (2013)  
The Internet
Juniper MX480 x 2 (10GbE)
Juniper EX8208 x 2: 2 zone switches (Virtual Chassis)
Each zone (700 nodes): Juniper EX4200 x 2
Links: 10GbE, LACP
8 zones, Total 5600 nodes
Injection 1 Gbps/Node
Bisection 160 Gbps

Supercomputer Tokyo Tech. Tsubame 2.0
#4 Top500 (2010)
~1500 nodes compute & storage
Full Bisection Multi-Rail Optical Network
Advanced Silicon Photonics 40G, single CMOS Die, 1490nm DFB, 100km Fiber
Injection 80 Gbps/Node
Bisection 220 Tbps

x1000!
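The "x1000" gap can be checked from the quoted figures. This sketch reads every bandwidth on the slide as bits per second, which is an assumption about the slide's units.

```python
GBPS = 10**9
TBPS = 10**12

# Cloud datacenter figures (from the slide).
cloud_injection = 1 * GBPS
cloud_bisection = 160 * GBPS

# TSUBAME 2.0 figures (from the slide).
tsubame_injection = 80 * GBPS
tsubame_bisection = 220 * TBPS

injection_ratio = tsubame_injection // cloud_injection
bisection_ratio = tsubame_bisection // cloud_bisection

print(f"injection: x{injection_ratio}, bisection: x{bisection_ratio}")
```

The bisection bandwidth differs by a factor of 1375, which matches the slide's rounded "x1000", while the per-node injection bandwidth differs by a factor of 80.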
Towards Extreme-scale 
Supercomputers and BigData Machines 
• Computation 
– Increase in Parallelism, Heterogeneity, Density 
• Multi-core, Many-core processors 
• Heterogeneous processors 
• Hierarchical Memory/Storage Architecture
– NVM (Non-Volatile Memory),
SCM (Storage Class Memory)
• FLASH, PCM, STT-MRAM, ReRAM, HMC, etc.
– Next-gen HDDs (SMR),
Tapes (LTFS)
Problems: Algorithm, Network, Locality, Power, FT, Productivity, Storage Hierarchy, I/O, Scalability, Heterogeneity
Extreme Big Data (EBD) 
Next Generation Big Data 
Infrastructure Technologies Towards 
Yottabyte/Year  
Principal Investigator 
Satoshi Matsuoka 
Global Scientific Information and 
Computing Center 
Tokyo Institute of Technology 
2014/11/05 JST CREST Big Data Symposium
EBD Research Scheme
Future Non-Silo Extreme Big Data Apps
• Large Scale Metagenomics
• Massive Sensors and Data Assimilation in Weather Prediction
• Ultra Large Scale Graphs and Social Infrastructures
(Poster thumbnail: Aleksandr Drozd, Naoya Maruyama, Satoshi Matsuoka (TITECH), A Multi GPU Read Alignment Algorithm)
Co-Design
EBD System Software incl. EBD Object System
• EBD Bag, EBD KVS (1000km), Cartesian Plane, Graph Store
Co-Design
Exascale Big Data HPC
Convergent Architecture (Phases 1~4)
• Large Capacity NVM, High-Bisection NW
• PCB: High Powered Main CPU, Low Power CPUs, DRAM, NVM/Flash, TSV Interposer
• 2Tbps HBM, 4~6 HBM Channels
• 1.5TB/s DRAM & NVM BW
• 30PB/s I/O BW Possible
• 1 Yottabyte / Year
Co-Design
Supercomputers: Compute & Batch-Oriented
Cloud+IDC: Very low BW & Efficiency
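The "1 Yottabyte / Year" target follows from the 30PB/s aggregate I/O bandwidth quoted on this slide. This is a minimal check that assumes decimal units and a full year of sustained operation.

```python
PB = 10**15
YB = 10**24

io_bandwidth = 30 * PB                 # bytes per second (from the slide)
seconds_per_year = 365 * 24 * 60 * 60  # 31,536,000 s

bytes_per_year = io_bandwidth * seconds_per_year
yottabytes_per_year = bytes_per_year / YB

print(f"{yottabytes_per_year:.2f} YB per year")
```

Sustained for a year, 30PB/s moves a little under one yottabyte, so the two figures are consistent.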
Extreme Big Data (EBD) Team
Co-Design EHPC and EBD Apps 
• Satoshi Matsuoka (PI), Toshio 
Endo, Hitoshi Sato (Tokyo 
Tech.) (Tasks 1, 3, 4, 6) 
• Osamu Tatebe (Univ. 
Tsukuba) (Tasks 2, 3) 
• Michihiro Koibuchi (NII) 
(Tasks 1, 2) 
• Yutaka Akiyama, Ken 
Kurokawa (Tokyo Tech, 5-1) 
• Toyotaro Suzumura 
(IBM Lab, 5-2) 
• Takemasa Miyoshi (Riken 
AICS, 5-3)
100,000 Times Fold EBD “Convergent” System Overview
Tasks 5-1~5-3: EBD Application Co-Design and Validation
• Large Scale Genomic Correlation
• Data Assimilation in Large Scale Sensors and Exascale Atmospherics
• Large Scale Graphs and Social Infrastructure Apps
Task 6: EBD Performance Modeling & Evaluation
Task 4: EBD “converged” Real-Time Resource Scheduling
Task 3: EBD Programming System
Task 2: EBD Distributed Object Store on 100,000 NVM Extreme Compute and Data Nodes
• EBD Bag, EBD KVS (1000km), Cartesian Plane, Graph Store
Task 1: Ultra Parallel & Low Power I/O EBD “Convergent” Supercomputer
• ~10TB/s → ~100TB/s → ~10PB/s
• Ultra High BW & Low Latency NVM, Ultra High BW & Low Latency NW
• Processor-in-memory, 3D stacking
• TSUBAME 2.0/2.5 → TSUBAME 3.0
100,000 Times Fold EBD “Convergent” System Overview
• Large Scale Genomic Correlation; Data Assimilation in Large Scale Sensors and Exascale Atmospherics; Large Scale Graphs and Social Infrastructure Apps
• SQL for EBD, Xpregel (Graph), Workflow/Scripting Languages for EBD, MapReduce for EBD, Message Passing (MPI, X10) for EBD, PGAS/Global Array for EBD
• EBD Abstract Data Models (Distributed Array, Key Value, Sparse Data Model, Tree, etc.)
• EBD Algorithm Kernels (Search/Sort, Matching, Graph Traversals, etc.)
• EBD Bag, EBD KVS (1000km), Cartesian Plane, Graph Store
• EBD File System, EBD Data Object, EBD Burst I/O Buffer, EBD Network Topology and Routing
• NVM (FLASH, PCM, STT-MRAM, ReRAM, HMC, etc.), HPC Storage, Web Object Storage
• Interconnect (InfiniBand 100GbE), Network (SINET5)
• TSUBAME 3.0, TSUBAME-GoldenBox, Cloud Datacenter, Intercloud / Grid (HPCI)
Hamar (Highly Accelerated Map Reduce)
[Shirahata, Sato et al. Cluster2014]
• A software framework for large-scale supercomputers w/ many-core accelerators and local NVM devices
– Abstraction for deepening memory hierarchy
• Device memory on GPUs, DRAM, Flash devices, etc.
• Features
– Object-oriented
• C++-based implementation
• Easy adaptation to modern commodity many-core accelerator/Flash devices w/ SDKs
– CUDA, OpenNVM, etc.
– Weak-scaling over 1000 GPUs
• TSUBAME2
– Out-of-core GPU data management
• Optimized data streaming between device/host memory
• GPU-based external sorting
– Optimized data formats for many-core accelerators
• Similar to JDS format
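The out-of-core idea behind external sorting can be illustrated without a GPU. The sketch below is plain Python and is not Hamar's implementation: `chunk_size` stands in for the amount of data that fits in device memory, the list `sort()` stands in for an on-device sort, and `external_sort` is a hypothetical name.

```python
import heapq
import random

def external_sort(records, chunk_size):
    """Sort `records` while never sorting more than `chunk_size` items at once."""
    runs = []
    for start in range(0, len(records), chunk_size):
        chunk = records[start:start + chunk_size]   # stands in for a host-to-device copy
        chunk.sort()                                # stands in for an on-device sort
        runs.append(chunk)                          # stands in for copying the sorted run back
    # k-way merge of the sorted runs, reading each run sequentially.
    return list(heapq.merge(*runs))

rng = random.Random(0)
data = [rng.randrange(10**6) for _ in range(1000)]
result = external_sort(data, chunk_size=100)
print(result[:5])
```

The design point is that each chunk is touched in large sequential transfers, which suits both host/device streaming and Flash devices.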
Hamar Overview
Rank 0, Rank 1, ..., Rank n
Each rank: Map → Shuffle → Reduce
Distributed Array: one Local Array per rank
Shuffle: Data Transfer between ranks
Virtualized Data Object: Local Array on NVM
Device (GPU) Data ⇄ Host (CPU) Data: Memcpy (H2D, D2H)
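The Map, Shuffle, Reduce flow over per-rank local arrays can be sketched in a single process. This is an illustration of the pattern only, not Hamar's C++ API: ranks are simulated as list entries, and every function name here is hypothetical.

```python
from collections import defaultdict

def map_phase(local_array, map_fn):
    """Apply map_fn to each local element and collect the emitted (key, value) pairs."""
    pairs = []
    for item in local_array:
        pairs.extend(map_fn(item))
    return pairs

def shuffle(per_rank_pairs, n_ranks):
    """Route each pair to the rank that owns its key (data transfer between ranks)."""
    inboxes = [defaultdict(list) for _ in range(n_ranks)]
    for pairs in per_rank_pairs:
        for key, value in pairs:
            inboxes[key % n_ranks][key].append(value)   # integer keys assumed
    return inboxes

def reduce_phase(inbox, reduce_fn):
    """Reduce the values gathered for each key on one rank."""
    return {key: reduce_fn(values) for key, values in inbox.items()}

def run(distributed_array, map_fn, reduce_fn):
    n_ranks = len(distributed_array)
    mapped = [map_phase(local, map_fn) for local in distributed_array]
    inboxes = shuffle(mapped, n_ranks)
    return [reduce_phase(inbox, reduce_fn) for inbox in inboxes]

# Example: out-degree of each vertex from an edge list split over two ranks.
distributed_edges = [[(0, 1), (0, 2)], [(1, 2), (2, 0)]]
per_rank = run(distributed_edges, map_fn=lambda e: [(e[0], 1)], reduce_fn=sum)
out_degree = {k: v for part in per_rank for k, v in part.items()}
print(out_degree)
```

In a real run each list entry would live on a different rank, with the local arrays spilling to NVM and the map and reduce bodies running on the GPU.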
Application Example: GIM-V
Generalized Iterative Matrix-Vector multiplication*1
• Easy description of various graph algorithms by implementing combine2, combineAll, assign functions
• PageRank, Random Walk Restart, Connected Component
– $v' = M \times_G v$ where
$v'_i = \mathrm{assign}(v_i, \mathrm{combineAll}_i(\{x_j \mid j = 1..n,\ x_j = \mathrm{combine2}(m_{i,j}, v_j)\}))$ $(i = 1..n)$
– Iterative 2 phases MapReduce operations
Straightforward implementation using Hamar
(Figure: $v' = M \times_G v$, with combine2 as stage 1 and combineAll and assign as stage 2)
*1: Kang, U. et al, “PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations”, IEEE International Conference on Data Mining 2009
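To make the three user-supplied functions concrete, here is a small single-process sketch of one GIM-V step and PageRank expressed through it. It is written in plain Python rather than with Hamar. The sparse-dictionary matrix layout, the function names, and the damping factor `c = 0.85` are assumptions for illustration, and the PageRank operator definitions follow the usual GIM-V formulation.

```python
def gimv(matrix, v, combine2, combine_all, assign):
    """One GIM-V step. `matrix` maps (i, j) -> m_ij for the non-zero entries."""
    n = len(v)
    partial = [[] for _ in range(n)]
    # Stage 1: combine2 on every non-zero m_ij with the matching v_j.
    for (i, j), m_ij in matrix.items():
        partial[i].append(combine2(m_ij, v[j]))
    # Stage 2: combineAll per row i, then assign against the old v_i.
    return [assign(v[i], combine_all(i, partial[i])) for i in range(n)]

def pagerank(matrix, n, c=0.85, iterations=50):
    """PageRank as repeated GIM-V steps on a column-normalized matrix."""
    v = [1.0 / n] * n
    for _ in range(iterations):
        v = gimv(
            matrix, v,
            combine2=lambda m_ij, v_j: c * m_ij * v_j,
            combine_all=lambda i, xs: (1 - c) / n + sum(xs),
            assign=lambda old, new: new,
        )
    return v

# Directed 3-cycle 0 -> 1 -> 2 -> 0; m_ij = 1 / outdegree(j) for each edge j -> i.
cycle = {(1, 0): 1.0, (2, 1): 1.0, (0, 2): 1.0}
ranks = pagerank(cycle, n=3)
print(ranks)
```

Swapping the three functions changes the algorithm without touching `gimv`, which is why stage 1 and stage 2 map naturally onto two MapReduce phases.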

Contenu connexe

Tendances

20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - EnglishKohei KaiGai
 
20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQLKohei KaiGai
 
20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storageKohei KaiGai
 
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015Kohei KaiGai
 
Report on GPGPU at FCA (Lyon, France, 11-15 October, 2010)
Report on GPGPU at FCA  (Lyon, France, 11-15 October, 2010)Report on GPGPU at FCA  (Lyon, France, 11-15 October, 2010)
Report on GPGPU at FCA (Lyon, France, 11-15 October, 2010)PhtRaveller
 
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...Ural-PDC
 
SQL+GPU+SSD=∞ (English)
SQL+GPU+SSD=∞ (English)SQL+GPU+SSD=∞ (English)
SQL+GPU+SSD=∞ (English)Kohei KaiGai
 
20150318-SFPUG-Meetup-PGStrom
20150318-SFPUG-Meetup-PGStrom20150318-SFPUG-Meetup-PGStrom
20150318-SFPUG-Meetup-PGStromKohei KaiGai
 
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~Kohei KaiGai
 
20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_Index20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_IndexKohei KaiGai
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsKohei KaiGai
 
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...Kohei KaiGai
 
pgconfasia2016 plcuda en
pgconfasia2016 plcuda enpgconfasia2016 plcuda en
pgconfasia2016 plcuda enKohei KaiGai
 
20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_EN20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_ENKohei KaiGai
 
20181016_pgconfeu_ssd2gpu_multi
20181016_pgconfeu_ssd2gpu_multi20181016_pgconfeu_ssd2gpu_multi
20181016_pgconfeu_ssd2gpu_multiKohei KaiGai
 
20160407_GTC2016_PgSQL_In_Place
20160407_GTC2016_PgSQL_In_Place20160407_GTC2016_PgSQL_In_Place
20160407_GTC2016_PgSQL_In_PlaceKohei KaiGai
 
GPGPU programming with CUDA
GPGPU programming with CUDAGPGPU programming with CUDA
GPGPU programming with CUDASavith Satheesh
 

Tendances (20)

20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English
 
20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL
 
20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage
 
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
 
Report on GPGPU at FCA (Lyon, France, 11-15 October, 2010)
Report on GPGPU at FCA  (Lyon, France, 11-15 October, 2010)Report on GPGPU at FCA  (Lyon, France, 11-15 October, 2010)
Report on GPGPU at FCA (Lyon, France, 11-15 October, 2010)
 
Lrz kurs: big data analysis
Lrz kurs: big data analysisLrz kurs: big data analysis
Lrz kurs: big data analysis
 
Available HPC resources at CSUC
Available HPC resources at CSUCAvailable HPC resources at CSUC
Available HPC resources at CSUC
 
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
 
SQL+GPU+SSD=∞ (English)
SQL+GPU+SSD=∞ (English)SQL+GPU+SSD=∞ (English)
SQL+GPU+SSD=∞ (English)
 
20150318-SFPUG-Meetup-PGStrom
20150318-SFPUG-Meetup-PGStrom20150318-SFPUG-Meetup-PGStrom
20150318-SFPUG-Meetup-PGStrom
 
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
 
20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_Index20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_Index
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
 
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
 
pgconfasia2016 plcuda en
pgconfasia2016 plcuda enpgconfasia2016 plcuda en
pgconfasia2016 plcuda en
 
PG-Strom
PG-StromPG-Strom
PG-Strom
 
20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_EN20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_EN
 
20181016_pgconfeu_ssd2gpu_multi
20181016_pgconfeu_ssd2gpu_multi20181016_pgconfeu_ssd2gpu_multi
20181016_pgconfeu_ssd2gpu_multi
 
20160407_GTC2016_PgSQL_In_Place
20160407_GTC2016_PgSQL_In_Place20160407_GTC2016_PgSQL_In_Place
20160407_GTC2016_PgSQL_In_Place
 
GPGPU programming with CUDA
GPGPU programming with CUDAGPGPU programming with CUDA
GPGPU programming with CUDA
 

Similaire à Japan Lustre User Group 2014

¿Es posible construir el Airbus de la Supercomputación en Europa?
¿Es posible construir el Airbus de la Supercomputación en Europa?¿Es posible construir el Airbus de la Supercomputación en Europa?
¿Es posible construir el Airbus de la Supercomputación en Europa?AMETIC
 
Cloud Computing,雲端運算-中研院網格計畫主持人林誠謙
Cloud Computing,雲端運算-中研院網格計畫主持人林誠謙Cloud Computing,雲端運算-中研院網格計畫主持人林誠謙
Cloud Computing,雲端運算-中研院網格計畫主持人林誠謙Tracy Chen
 
[RakutenTechConf2013] [A-3] TSUBAME2.5 to 3.0 and Convergence with Extreme Bi...
[RakutenTechConf2013] [A-3] TSUBAME2.5 to 3.0 and Convergence with Extreme Bi...[RakutenTechConf2013] [A-3] TSUBAME2.5 to 3.0 and Convergence with Extreme Bi...
[RakutenTechConf2013] [A-3] TSUBAME2.5 to 3.0 and Convergence with Extreme Bi...Rakuten Group, Inc.
 
An FPGA-based acceleration methodology and performance model for iterative st...
An FPGA-based acceleration methodology and performance model for iterative st...An FPGA-based acceleration methodology and performance model for iterative st...
An FPGA-based acceleration methodology and performance model for iterative st...NECST Lab @ Politecnico di Milano
 
EODATASERVICE.ORG - Digital Earth Platform to enable Muti-disciplinary Geospa...
EODATASERVICE.ORG - Digital Earth Platform to enable Muti-disciplinary Geospa...EODATASERVICE.ORG - Digital Earth Platform to enable Muti-disciplinary Geospa...
EODATASERVICE.ORG - Digital Earth Platform to enable Muti-disciplinary Geospa...EUDAT
 
Barcelona Supercomputing Center, Generador de Riqueza
Barcelona Supercomputing Center, Generador de RiquezaBarcelona Supercomputing Center, Generador de Riqueza
Barcelona Supercomputing Center, Generador de RiquezaFacultad de Informática UCM
 
Achitecture Aware Algorithms and Software for Peta and Exascale
Achitecture Aware Algorithms and Software for Peta and ExascaleAchitecture Aware Algorithms and Software for Peta and Exascale
Achitecture Aware Algorithms and Software for Peta and Exascaleinside-BigData.com
 
Reproducible Computational Pipelines with Docker and Nextflow
Reproducible Computational Pipelines with Docker and NextflowReproducible Computational Pipelines with Docker and Nextflow
Reproducible Computational Pipelines with Docker and Nextflowinside-BigData.com
 
Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14
Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14
Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14Yuichiro Yasui
 
Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...Zbigniew Jerzak
 
cisco-n3k-c3232c-datasheet.pdf
cisco-n3k-c3232c-datasheet.pdfcisco-n3k-c3232c-datasheet.pdf
cisco-n3k-c3232c-datasheet.pdfHi-Network.com
 
Valladolid final-septiembre-2010
Valladolid final-septiembre-2010Valladolid final-septiembre-2010
Valladolid final-septiembre-2010TELECOM I+D
 
DevLove k8s nobusue 20180711
DevLove k8s nobusue 20180711DevLove k8s nobusue 20180711
DevLove k8s nobusue 20180711Nobuhiro Sue
 
Introduction to the Oakforest-PACS Supercomputer in Japan
Introduction to the Oakforest-PACS Supercomputer in JapanIntroduction to the Oakforest-PACS Supercomputer in Japan
Introduction to the Oakforest-PACS Supercomputer in Japaninside-BigData.com
 
06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...
06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...
06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...Zalando adtech lab
 
Dimensioning of IP Backbone
Dimensioning of IP BackboneDimensioning of IP Backbone
Dimensioning of IP BackboneEM Archieve
 
TEF_GigFire_0 5
TEF_GigFire_0 5TEF_GigFire_0 5
TEF_GigFire_0 5Raj Ojha
 
Blue Waters and Resource Management - Now and in the Future
 Blue Waters and Resource Management - Now and in the Future Blue Waters and Resource Management - Now and in the Future
Blue Waters and Resource Management - Now and in the Futureinside-BigData.com
 
The OptIPuter as a Prototype for CalREN-XD
The OptIPuter as a Prototype for CalREN-XDThe OptIPuter as a Prototype for CalREN-XD
The OptIPuter as a Prototype for CalREN-XDLarry Smarr
 
National Polar-orbiting Operational Environmental Satellite System (NPOESS)
National Polar-orbiting Operational Environmental Satellite System (NPOESS)National Polar-orbiting Operational Environmental Satellite System (NPOESS)
National Polar-orbiting Operational Environmental Satellite System (NPOESS)The HDF-EOS Tools and Information Center
 

Similaire à Japan Lustre User Group 2014 (20)

¿Es posible construir el Airbus de la Supercomputación en Europa?
¿Es posible construir el Airbus de la Supercomputación en Europa?¿Es posible construir el Airbus de la Supercomputación en Europa?
¿Es posible construir el Airbus de la Supercomputación en Europa?
 
Cloud Computing,雲端運算-中研院網格計畫主持人林誠謙
Cloud Computing,雲端運算-中研院網格計畫主持人林誠謙Cloud Computing,雲端運算-中研院網格計畫主持人林誠謙
Cloud Computing,雲端運算-中研院網格計畫主持人林誠謙
 
[RakutenTechConf2013] [A-3] TSUBAME2.5 to 3.0 and Convergence with Extreme Bi...
[RakutenTechConf2013] [A-3] TSUBAME2.5 to 3.0 and Convergence with Extreme Bi...[RakutenTechConf2013] [A-3] TSUBAME2.5 to 3.0 and Convergence with Extreme Bi...
[RakutenTechConf2013] [A-3] TSUBAME2.5 to 3.0 and Convergence with Extreme Bi...
 
An FPGA-based acceleration methodology and performance model for iterative st...
An FPGA-based acceleration methodology and performance model for iterative st...An FPGA-based acceleration methodology and performance model for iterative st...
An FPGA-based acceleration methodology and performance model for iterative st...
 
EODATASERVICE.ORG - Digital Earth Platform to enable Muti-disciplinary Geospa...
EODATASERVICE.ORG - Digital Earth Platform to enable Muti-disciplinary Geospa...EODATASERVICE.ORG - Digital Earth Platform to enable Muti-disciplinary Geospa...
EODATASERVICE.ORG - Digital Earth Platform to enable Muti-disciplinary Geospa...
 
Barcelona Supercomputing Center, Generador de Riqueza
Barcelona Supercomputing Center, Generador de RiquezaBarcelona Supercomputing Center, Generador de Riqueza
Barcelona Supercomputing Center, Generador de Riqueza
 
Achitecture Aware Algorithms and Software for Peta and Exascale
Achitecture Aware Algorithms and Software for Peta and ExascaleAchitecture Aware Algorithms and Software for Peta and Exascale
Achitecture Aware Algorithms and Software for Peta and Exascale
 
Japan Lustre User Group 2014

  • 1. Extreme Big Data (EBD): Convergence of Extreme Computing and Big Data Technologies. Global Scientific Information and Computing Center, Tokyo Institute of Technology. Hitoshi Sato
  • 2. Big Data Examples Rates and Volumes are extremely immense Social NW • Facebook – 1 billion users – Average 130 friends – 30 billion pieces of content shared per month • Twitter – 500 million active users – 340 million tweets per day • Internet – 300 million new websites per year – 48 hours of video to YouTube per minute – 30,000 YouTube videos played per second Genomics Social Simulation
  • 3. Sequencing data (bp)/$ becomes x4000 per 5 years; c.f., HPC x33 in 5 years • Applications – Target Area: Planet (Open Street Map) – 7 billion people • Input Data – Road Network for Planet: 300GB (XML) – Trip data for 7 billion people: 10KB (1 trip) x 7 billion = 70TB – Real-Time Streaming Data (e.g., social sensors, physical data) • Simulated Output for 1 Iteration – 700TB Weather: A-1. Quality Control; A-2. Data Processing; 30-sec Ensemble Forecast Simulations (2 PFLOP); Ensemble Data Assimilation (2 PFLOP); Himawari: 500MB/2.5min; Ensemble Forecasts
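The input-size arithmetic on slide 3 can be checked in a few lines. This is a minimal sketch that assumes decimal units (1 KB = 10^3 bytes, 1 TB = 10^12 bytes); the helper names are illustrative and do not come from the slides.

```python
# Decimal storage units (an assumption; the slide does not say KB vs. KiB).
KB, MB, TB = 10**3, 10**6, 10**12


def total_bytes(bytes_per_item, items):
    """Total volume of a data set made of equally sized items."""
    return bytes_per_item * items


def stream_rate_mb_per_s(megabytes, seconds):
    """Average rate of a periodic data stream, in MB/s."""
    return megabytes / seconds


# Trip data: 10 KB per trip, one trip for each of 7 billion people.
trip_data = total_bytes(10 * KB, 7 * 10**9)
print(trip_data / TB, "TB")

# Himawari: 500 MB every 2.5 minutes.
print(stream_rate_mb_per_s(500, 2.5 * 60), "MB/s")
```

The first figure reproduces the 70TB stated on the slide; the second expresses the satellite feed as a sustained rate.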
  • 7. B-1. Quality Control; B-2. Data Processing; Analysis Data: 2GB; 30-min Forecast Simulation (1.2 PFLOP); 30-min Forecast: 2GB; Repeat every 30 sec.
  • 8. Big Data Examples: Rates and Volumes are extremely immense (same social network, genomics, and social simulation figures as slide 2), overlaid with: Future “Extreme Big Data” • NOT mining Tbytes of silo data • Peta~Zettabytes of data • Ultra high-BW data streams • Highly unstructured, irregular • Complex correlations between data from multiple sources • Extreme capacity, bandwidth, and compute all required
  • 10. Ensemble Forecasts: 200GB
  • 12. Ensemble Analyses: 200GB
  • 15. A: 0.57, B: 0.19, C: 0.19, D: 0.05. November 15, 2010: “Graph500 Takes Aim at a New Kind of HPC”. Richard Murphy (Sandia NL → Micron): “I expect that this ranking may at times look very different from the TOP500 list. Cloud architectures will almost certainly dominate a major chunk of part of the list.” Reality: Top500 supercomputers dominate; no Cloud IDCs at all. The 8th Graph500 List (June 2014): K Computer #1, TSUBAME2 #12 (Koji Ueno, Tokyo Institute of Technology/RIKEN AICS). #1 K Computer: RIKEN Advanced Institute for Computational Science (AICS)’s K computer is ranked No.1 on the Graph500 Ranking of Supercomputers with 17977.1 GE/s on Scale 40 on the 8th Graph500 list published at the International Supercomputing Conference, June 22, 2014. #12 TSUBAME2: Global Scientific Information and Computing Center, Tokyo Institute of Technology’s TSUBAME 2.5 is ranked No.12 on the Graph500 Ranking of Supercomputers with 1280.43 GE/s on Scale 36 on the same list. Congratulations from the Graph500 Executive Committee.
  • 16. Tokyo Tech: 17PF (SFP), 5.7PF (DFP); Top500 (Jun 2014); 1408 nodes, 44 racks; CPU: Intel Xeon X5670, 6 cores; GPU: NVIDIA Tesla K20X x 3; MEM: 58GB; SSD: 60GB x 2
  • 17. IB: QDR full-bisection optical Infiniband; SSD: 200TB; HDD: 7PB (Lustre, GPFS); Tape: 4PB (StorageTek SL8500 x 2)
  • 18. TSUBAME2 System Overview: 11PB storage (7PB HDD, 4PB Tape, 200TB SSD).
      Computing nodes: 17.1 PFlops (SFP), 5.76 PFlops (DFP), 224.69 TFlops (CPU), ~100TB MEM, ~200TB SSD.
      Thin nodes: HP ProLiant SL390s G7, 1408 nodes (32 nodes x 44 racks); CPU: Intel Westmere-EP 2.93GHz, 6 cores × 2 = 12 cores/node; GPU: NVIDIA Tesla K20X, 3 GPUs/node; Mem: 54GB (96GB); SSD: 60GB x 2 = 120GB (120GB x 2 = 240GB).
      Medium nodes: HP ProLiant DL580 G7, 24 nodes; CPU: Intel Nehalem-EX 2.0GHz, 8 cores × 4 = 32 cores/node; GPU: NVIDIA Tesla S1070, NextIO vCORE Express 2070; Mem: 128GB; SSD: 120GB x 4 = 480GB.
      Fat nodes: HP ProLiant DL580 G7, 10 nodes; CPU: Intel Nehalem-EX 2.0GHz, 8 cores × 4 = 32 cores/node; GPU: NVIDIA Tesla S1070; Mem: 256GB (512GB); SSD: 120GB x 4 = 480GB.
      Interconnects: full-bisection optical QDR InfiniBand network. Core switches: Voltaire Grid Director 4700 × 12 (IB QDR: 324 ports). Edge switches: Voltaire Grid Director 4036 × 179 (IB QDR: 36 ports). Edge switches w/ 10GbE ports: Voltaire Grid Director 4036E × 6 (IB QDR: 34 ports, 10GbE: 2 ports). Storage links: QDR IB (×4) × 20, QDR IB (×4) × 8, 10GbE × 2.
      Storage arrays: SFA10k #1 to #6.
      Parallel file system volumes: GPFS + Tape (/data0, /data1; GPFS#1 to GPFS#4; 2.4PB HDD + 4PB Tape) and Lustre (/work0, /work1, /gscr; “Global Work Space” #1 to #3; 3.6PB).
      Home volumes (1.2PB): HOME, system application, iSCSI; “cNFS/Clustered Samba w/ GPFS”, “NFS/CIFS/iSCSI by BlueARC”.
      Local SSDs on the compute nodes.
  • 19. TSUBAME2 System Overview (same configuration as slide 18, annotated with I/O workloads): Fine-grained R/W I/O (checkpoints, temporary files); Read-mostly I/O (data-intensive apps, parallel workflows, parameter surveys); Home storage for computing nodes; Cloud-based campus storage services; Backup.
  • 20. TSUBAME2 Storage Usage Since Nov. 2010
  • 21. TSUBAME2.5 as a Big Data Infrastructure • GPU-based Manycore Accelerator – NVIDIA Tesla K20X, 4224 cards • Huge Memory Volumes per Node – 58GB per node x 1408 nodes • Fat Tree Dual Rail QDR InfiniBand – 200Tbps of full bisection bandwidth • Local SSD devices per node – 200TB in total • Large-scale Storage Systems – Lustre, GPFS – 7PB of HDDs, 4PB of Tapes
  • 23. A Major Northern Japanese Cloud Datacenter (2013): the Internet; Juniper MX480 x 2 (10GbE); 2 zone switches (Virtual Chassis); Juniper EX8208 x 2 (10GbE, LACP); Juniper EX4200 switches in each zone (700 nodes per zone); 8 zones, total 5600 nodes; injection 1GBps/node; bisection 160 Gigabps. Supercomputer, Tokyo Tech Tsubame 2.0, #4 Top500 (2010): ~1500 nodes, compute & storage; full-bisection multi-rail optical network; injection 80GBps/node; bisection 220 Terabps. x1000! Advanced silicon photonics: 40G single CMOS die, 1490nm DFB, 100km fiber.
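The "x1000!" on slide 23 follows from the two bisection figures. The sketch below simply divides them, using the node counts quoted on the slide; treating "~1500 nodes" as exactly 1500 is an assumption made for illustration.

```python
GIGA, TERA = 10**9, 10**12


def ratio(a, b):
    """How many times larger a is than b."""
    return a / b


def bisection_per_node(bisection_bps, nodes):
    """Bisection bandwidth divided evenly over all nodes, in bit/s."""
    return bisection_bps / nodes


cloud_bisection = 160 * GIGA      # cloud datacenter, 5600 nodes
tsubame_bisection = 220 * TERA    # Tsubame 2.0, ~1500 nodes (assumed 1500)

print(ratio(tsubame_bisection, cloud_bisection))
print(bisection_per_node(cloud_bisection, 5600) / 10**6, "Mbit/s per node")
print(bisection_per_node(tsubame_bisection, 1500) / GIGA, "Gbit/s per node")
```

The first ratio is on the order of a thousand, which is the gap the slide highlights; the per-node view makes the difference even larger because the supercomputer has fewer nodes.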
  • 24. Towards Extreme-scale Supercomputers and BigData Machines • Computation – Increase in Parallelism, Heterogeneity, Density • Multi-core, Many-core processors • Heterogeneous processors • Hierarchical Memory/Storage Architecture – NVM (Non-Volatile Memory), SCM (Storage Class Memory) • FLASH, PCM, STT-MRAM, ReRAM, HMC, etc. – Next-gen HDDs (SMR), Tapes (LTFS). Problems: Algorithm, Network, Locality, Power, FT, Productivity, Storage Hierarchy, I/O, Scalability, Heterogeneity
  • 25. Extreme Big Data (EBD) Next Generation Big Data Infrastructure Technologies Towards Yottabyte/Year Principal Investigator Satoshi Matsuoka Global Scientific Information and Computing Center Tokyo Institute of Technology 2014/11/05 JST CREST Big Data Symposium
  • 26. EBD Research Scheme. Future Non-Silo Extreme Big Data Apps: Large Scale Metagenomics; Massive Sensors and Data Assimilation in Weather Prediction; Ultra Large Scale Graphs and Social Infrastructures. Co-Design. EBD System Software incl. EBD Object System: EBD Bag, Cartesian Plane, EBD KVS, Graph Store. Exascale Big Data HPC. Convergent Architecture (Phases 1~4): Large Capacity NVM, High-Bisection NW; NVM/Flash and DRAM stacks with Low Power CPUs on a TSV Interposer, High Powered Main CPU, PCB; 2Tbps HBM, 4~6 HBM Channels, 1.5TB/s DRAM & NVM BW; 30PB/s I/O BW Possible; 1 Yottabyte / Year. Supercomputers: Compute & Batch-Oriented. Cloud+IDC: Very low BW & Efficiency. [Embedded poster thumbnail: Aleksandr Drozd, Naoya Maruyama, Satoshi Matsuoka (TITECH), “A Multi GPU Read Alignment Algorithm”]
  • 27. Extreme Big Data (EBD) Team: Co-Design EHPC and EBD Apps • Satoshi Matsuoka (PI), Toshio Endo, Hitoshi Sato (Tokyo Tech.) (Tasks 1, 3, 4, 6) • Osamu Tatebe (Univ. Tsukuba) (Tasks 2, 3) • Michihiro Koibuchi (NII) (Tasks 1, 2) • Yutaka Akiyama, Ken Kurokawa (Tokyo Tech, 5-1) • Toyotaro Suzumura (IBM Lab, 5-2) • Takemasa Miyoshi (Riken AICS, 5-3)
  • 28. 100,000 Times Fold EBD “Convergent” System Overview • Task 1: Ultra Parallel & Low Power I/O EBD “Convergent” Supercomputer (~10TB/s → ~100TB/s → ~10PB/s) – Ultra High BW & Low Latency NVM – Ultra High BW & Low Latency NW – Processor-in-memory – 3D stacking – TSUBAME 2.0/2.5, TSUBAME 3.0 • Task 2: EBD Distributed Object Store on 100,000 NVM Extreme Compute and Data Nodes – EBD Bag, EBD KVS, Graph Store, Cartesian Plane • Task 3: EBD Programming System • Task 4: EBD “converged” Real-Time Resource Scheduling • Tasks 5-1~5-3: EBD Application Co-Design and Validation – Large Scale Genomic Correlation – Data Assimilation in Large Scale Sensors and Exascale Atmospherics – Large Scale Graphs and Social Infrastructure Apps • Task 6: EBD Performance Modeling & Evaluation
  • 29. 100,000 Times Fold EBD “Convergent” System Overview (software stack) • Applications: Large Scale Genomic Correlation; Data Assimilation in Large Scale Sensors and Exascale Atmospherics; Large Scale Graphs and Social Infrastructure Apps • Programming: SQL for EBD; Xpregel (Graph); Workflow/Scripting Languages for EBD; MapReduce for EBD; Message Passing (MPI, X10) for EBD; PGAS/Global Array for EBD • EBD Abstract Data Models (Distributed Array, Key Value, Sparse Data Model, Tree, etc.) • EBD Algorithm Kernels (Search/Sort, Matching, Graph Traversals, etc.) • System software: EBD Bag, EBD KVS, Graph Store, Cartesian Plane; EBD File System; EBD Data Object; EBD Burst I/O Buffer; EBD Network Topology and Routing • Infrastructure: NVM (FLASH, PCM, STT-MRAM, ReRAM, HMC, etc.); HPC Storage; Web Object Storage; Interconnect (InfiniBand, 100GbE); Network (SINET5); Intercloud / Grid (HPCI); Cloud Datacenter; TSUBAME 3.0; TSUBAME-GoldenBox
  • 30. Hamar (Highly Accelerated Map Reduce) [Shirahata, Sato et al. Cluster2014] • A software framework for large-scale supercomputers w/ many-core accelerators and local NVM devices – Abstraction for deepening memory hierarchy • Device memory on GPUs, DRAM, Flash devices, etc. • Features – Object-oriented • C++-based implementation • Easy adaptation to modern commodity many-core accelerator/Flash devices w/ SDKs – CUDA, OpenNVM, etc. – Weak-scaling over 1000 GPUs • TSUBAME2 – Out-of-core GPU data management • Optimized data streaming between device/host memory • GPU-based external sorting – Optimized data formats for many-core accelerators • Similar to JDS format
  • 31. Hamar Overview: Rank 0, Rank 1, ..., Rank n each run Map, Shuffle, and Reduce phases; Shuffle is the data transfer between ranks. Each rank holds Local Arrays, which together form a Distributed Array; a Local Array on NVM is a Virtualized Data Object. Device (GPU) data and Host (CPU) data are exchanged via Memcpy (H2D, D2H).
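The Map, Shuffle, Reduce flow over per-rank local arrays on slide 31 can be illustrated with a single-process simulation. This is only a sketch of the dataflow: Hamar itself is a C++ framework, and the function name `map_shuffle_reduce`, the integer-key assumption, and the `key % n_ranks` ownership rule are all hypothetical choices made for this example.

```python
from collections import defaultdict


def map_shuffle_reduce(local_arrays, map_fn, reduce_fn):
    """Simulate ranks that each own one local array of a distributed array.

    map_fn(item) yields (key, value) pairs; keys are assumed to be
    non-negative integers so that key % n_ranks picks the owning rank.
    """
    n_ranks = len(local_arrays)

    # Map: every rank processes only its own local array.
    mapped = [[kv for item in local for kv in map_fn(item)]
              for local in local_arrays]

    # Shuffle: each pair is sent to the rank that owns its key
    # (this stands in for the data transfer between ranks).
    inboxes = [defaultdict(list) for _ in range(n_ranks)]
    for pairs in mapped:
        for key, value in pairs:
            inboxes[key % n_ranks][key].append(value)

    # Reduce: every rank reduces the values it received, key by key.
    return [{key: reduce_fn(key, values) for key, values in inbox.items()}
            for inbox in inboxes]


# Example: count out-degrees of a small edge list split over two ranks.
edges_per_rank = [[(0, 1), (0, 2), (1, 2)], [(2, 0), (0, 3)]]
result = map_shuffle_reduce(
    edges_per_rank,
    map_fn=lambda edge: [(edge[0], 1)],
    reduce_fn=lambda key, values: sum(values),
)
print(result)
```

In a real deployment the local arrays would live in GPU memory, DRAM, or NVM and the shuffle would be network communication; here plain Python lists stand in for all of them.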
  • 32. Application Example: GIM-V (Generalized Iterative Matrix-Vector multiplication) *1 • Easy description of various graph algorithms by implementing combine2, combineAll, assign functions • PageRank, Random Walk with Restart, Connected Component – $v' = M \times_G v$, where $v'_i = \mathrm{assign}(v_i, \mathrm{combineAll}_i(\{x_j \mid j = 1..n,\ x_j = \mathrm{combine2}(m_{i,j}, v_j)\}))$ for $i = 1..n$ – Iterative 2-phase MapReduce operations: combine2 (stage 1); combineAll and assign (stage 2) • Straightforward implementation using Hamar. *1: Kang, U. et al., “PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations”, IEEE International Conference on Data Mining 2009
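The three GIM-V functions on slide 32 can be made concrete with PageRank. The sketch below is a sequential, in-memory illustration, not the Hamar implementation; `gimv_step` and `pagerank` are names chosen here, the damping factor 0.85 is an assumed default, and `combine_all` omits the row index because PageRank does not need it.

```python
from typing import Callable, List, Sequence, Tuple

Edge = Tuple[int, int, float]  # (row i, column j, matrix entry m_ij)


def gimv_step(edges: Sequence[Edge], v: Sequence[float],
              combine2: Callable[[float, float], float],
              combine_all: Callable[[List[float]], float],
              assign: Callable[[float, float], float]) -> List[float]:
    """One GIM-V iteration, v' = M x_G v, over the non-zero entries of M."""
    partial: List[List[float]] = [[] for _ in v]
    # Stage 1: combine2 on every non-zero m_ij, grouped by destination row i.
    for i, j, m_ij in edges:
        partial[i].append(combine2(m_ij, v[j]))
    # Stage 2: combineAll per row, then assign against the old value v_i.
    return [assign(v[i], combine_all(partial[i])) for i in range(len(v))]


def pagerank(edges: Sequence[Edge], n: int,
             damping: float = 0.85, iterations: int = 50) -> List[float]:
    """PageRank expressed through the three GIM-V functions."""
    v = [1.0 / n] * n
    for _ in range(iterations):
        v = gimv_step(
            edges, v,
            combine2=lambda m_ij, v_j: damping * m_ij * v_j,
            combine_all=lambda xs: (1.0 - damping) / n + sum(xs),
            assign=lambda old, new: new,
        )
    return v


# A graph edge j -> i is stored as (i, j, 1 / out_degree(j)).
cycle = [(1, 0, 1.0), (2, 1, 1.0), (0, 2, 1.0)]
print(pagerank(cycle, 3))
```

Swapping the three lambdas is enough to obtain other algorithms: for connected components, for instance, `combine2` would pass the neighbor's label through and `combine_all` and `assign` would both take a minimum.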
  • 33. MapReduce-based Graph Processing with Out-of-core Support on GPUs • Hierarchical memory management for large-scale graph processing using multi-GPUs – Support out-of-core processing on GPU – Overlapping computation and CPU-GPU communication • PageRank application on TSUBAME 2.5. [Figure: weak scaling performance; x-axis: Number of Compute Nodes (0 to 1200); y-axis: Performance [MEdges/sec] (0 to 3000, higher is better); series: 1CPU (S23 per node), 1GPU (S23 per node), 2CPUs (S24 per node), 2GPUs (S24 per node), 3GPUs (S24 per node); annotations: 2.81 GE/s on 3072 GPUs (SCALE 34), 2.10x speedup (3 GPU vs 2 CPU)]
  • 34. EBD-IO Device: A Prototype of Local Storage Configuration [Shirahata, Sato et al. GTC2014] • High Bandwidth and IOPS, Huge Capacity, Low Cost, Power Efficient • 16 cards of mSATA SSD devices – Capacity: 256GB x 16 → 4TB – Read BW: 0.5GB/s x 16 → 8GB/s • Photos: a single mSATA SSD; 8 integrated mSATA SSDs; RAID cards; prototype/test machine.
      Embedded paper excerpt: “... reliable storage designs for resilient extreme scale computing. 3.2 Burst Buffer System. To solve the problems in a flat buffer system, we consider a burst buffer system [21]. A burst buffer is a storage space to bridge the gap in latency and bandwidth between node-local storage and the PFS, and is shared by a subset of compute nodes. Although additional nodes are required, a burst buffer can offer a system many advantages including higher reliability and efficiency over a flat buffer system. A burst buffer system is more reliable for checkpointing because burst buffers are located on a smaller number of dedicated I/O nodes, so the probability of lost checkpoints is decreased. In addition, even if a large number of compute nodes fail concurrently, an application can still access the checkpoints from the burst buffer. A burst buffer system provides more efficient utilization of storage resources for partial restart of uncoordinated checkpointing because processes involving restart can exploit higher storage bandwidth. For example, if compute node 1 and 3 are in the same cluster, and both restart from a failure, the processes can utilize all SSD bandwidth unlike a flat buffer system. This capability accelerates the partial restart of uncoordinated checkpoint/restart.”
      Table 1, Node specification: CPU: Intel Core i7-3770K CPU (3.50GHz x 4 cores); Memory: Cetus DDR3-1600 (16GB); M/B: GIGABYTE GA-Z77X-UD5H; SSD: Crucial m4 msata 256GB CT256M4SSD3 (Peak read: 500MB/s, Peak write: 260MB/s); SATA converter: KOUTECH IO-ASS110 mSATA to 2.5’ SATA Device Converter with Metal Fram; RAID Card: Adaptec RAID 7805Q ASR-7805Q Single.
  • 35. Preliminary I/O Performance Evaluation Between GPU and EBD-I/O Device Using Matrix-Vector Multiplication. [Figure, left: I/O performance using FIO; x-axis: # mSATAs (0 to 20); y-axis: Bandwidth [MB/s] (0 to 9000); series: Raw mSATA 4KB, RAID0 1MB, RAID0 64KB; I/O performance of EBD-I/O device: 7.39GB/s (RAID0)] [Figure, right: I/O performance with Matrix-Vector Multiplication on GPU; x-axis: Matrix Size [GB]; y-axis: Throughput [GB/s] (0 to 3.5); series: Raw 8 mSATA, 8 mSATA RAID0 (1MB), 8 mSATA RAID0 (64KB); I/O performance to GPU: 3.06GB/s (up to PCI-E BW)]
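The aggregate figures on slide 34 and the measured figures on slide 35 can be related with simple arithmetic. A minimal sketch, with helper names chosen for this example; it assumes the 7.39GB/s and 3.06GB/s results are compared against the 16-device peak quoted on slide 34.

```python
def aggregate(per_device, devices):
    """Ideal aggregate of a per-device quantity over identical devices."""
    return per_device * devices


def efficiency(measured, peak):
    """Fraction of a peak value that a measurement reaches."""
    return measured / peak


capacity_gb = aggregate(256, 16)       # 16 mSATA SSDs of 256GB each
peak_read_gbs = aggregate(0.5, 16)     # 0.5GB/s peak read per SSD

print(capacity_gb, "GB total capacity")
print(peak_read_gbs, "GB/s ideal read bandwidth")
print(efficiency(7.39, peak_read_gbs))   # RAID0 result measured with FIO
print(efficiency(3.06, 7.39))            # share of that reaching the GPU
```

The last ratio shows that less than half of the device bandwidth reaches the GPU, which matches the slide's note that the GPU path is bounded by PCI-E bandwidth.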
  • 36. Sorting for EBD: Plugging in GPUs for large-scale sorting [Shamoto, Sato et al. BigData 2014] • GPU implementation of splitter-based sorting (HykSort) • Weak scaling performance (Grand Challenge on TSUBAME2.5) – 1 ~ 1024 nodes (2 ~ 2048 GPUs) – 2 processes per node, and each node has 2GB of 64-bit integers • Yahoo/Hadoop Terasort: 0.02 [TB/s] – Including I/O • Performance prediction – PCIe_#: # GB/s bandwidth of the interconnect between CPU and GPU – x2.2 speedup compared to the CPU-based implementation when the PCIe bandwidth increases to 50GB/s – 8.8% reduction of overall runtime when the accelerators work 4 times faster than K20x. [Figures: Keys/second (billions) vs. # of processes (2 processes per node, 0 to 2000); series: HykSort 1 thread, HykSort 6 threads, HykSort GPU + 6 threads; prediction of our implementation: PCIe_10, PCIe_50, PCIe_100, PCIe_200, and an accelerator x4 faster than K20x; annotations: x1.4, x3.61, x389, 0.25 [TB/s]]
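Splitter-based sorting, the family HykSort belongs to, can be sketched in a few lines. The code below is a plain single-process sample sort, not HykSort itself (which selects splitters and exchanges data in a k-way hypercube pattern and, on the slide, offloads local sorting to GPUs); the function name, the oversampling factor, and the fixed seed are assumptions made for this example.

```python
import bisect
import random


def splitter_sort(partitions, oversample=4, seed=0):
    """Sort keys spread over p partitions using p - 1 sampled splitters.

    Returns p sorted buckets whose concatenation is globally sorted.
    """
    p = len(partitions)
    rng = random.Random(seed)

    # 1. Each partition contributes a few sample keys.
    samples = []
    for part in partitions:
        samples.extend(rng.sample(part, min(len(part), oversample)))
    samples.sort()

    # 2. Pick p - 1 evenly spaced splitters from the sorted samples.
    splitters = ([samples[(i * len(samples)) // p] for i in range(1, p)]
                 if samples else [])

    # 3. Route every key to the bucket delimited by the splitters
    #    (this is the all-to-all exchange in a distributed setting).
    buckets = [[] for _ in range(p)]
    for part in partitions:
        for key in part:
            buckets[bisect.bisect_right(splitters, key)].append(key)

    # 4. Sort each bucket locally (the step a GPU would accelerate).
    return [sorted(bucket) for bucket in buckets]


data = [[9, 1, 14, 7], [3, 12, 5, 0], [8, 2, 11, 6]]
buckets = splitter_sort(data)
print(buckets)
```

Because every key in bucket i is no larger than any key in bucket i + 1, concatenating the locally sorted buckets yields the global order without a final merge.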
  • 37. Graph500 Benchmark (http://www.graph500.org) • New BigData benchmark based on large-scale graph search for ranking supercomputers • BFS (Breadth First Search) from a single vertex on a static, undirected Kronecker graph with average vertex degree edgefactor (=16) • Evaluation criteria: TEPS (Traversed Edges Per Second), the problem size that can be solved on a system, and minimum execution time • Kronecker graph – 2^SCALE vertices and 2^(SCALE+4) edges – synthetic scale-free network • Benchmark flow: input parameters (SCALE, edgefactor = 16) → 1. Generation (graph generation) → 2. Construction (graph construction) → 3. BFS x 64 (each iteration runs BFS and validation and reports BFS time, traversed edges, and TEPS) → results: TEPS ratio over 64 iterations, median TEPS
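The quantities named on slide 37 (problem size from SCALE and edgefactor, TEPS per BFS run, median TEPS over the runs) can be sketched as follows. This is a toy, single-node illustration with hypothetical helper names; it does not implement the Kronecker generator or the official validation step, and it counts an input edge as traversed when both endpoints were reached.

```python
import statistics
from collections import deque

EDGEFACTOR = 16  # average vertex degree parameter from the slide


def problem_size(scale, edgefactor=EDGEFACTOR):
    """Number of vertices and edges for a given SCALE."""
    vertices = 2 ** scale
    return vertices, edgefactor * vertices


def bfs_parents(adjacency, root):
    """Plain queue-based BFS returning a parent map of reached vertices."""
    parent = {root: root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adjacency.get(u, ()):
            if w not in parent:
                parent[w] = u
                queue.append(w)
    return parent


def traversed_edges(edge_list, parent):
    """Input edges inside the component reached by the BFS."""
    return sum(1 for u, w in edge_list if u in parent and w in parent)


def teps(num_traversed_edges, bfs_seconds):
    """Traversed Edges Per Second for one BFS run."""
    return num_traversed_edges / bfs_seconds


def median_teps(samples):
    """Median over the per-run TEPS values (64 runs in the benchmark)."""
    return statistics.median(samples)


print(problem_size(40))
edge_list = [(0, 1), (1, 2), (3, 4)]
adjacency = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
reached = bfs_parents(adjacency, 0)
print(traversed_edges(edge_list, reached))
```

With edgefactor 16, the edge count is 16 x 2^SCALE, which is the 2^(SCALE+4) stated on the slide.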
  • 38. Scalable distributed memory BFS for Graph500 (Koji Ueno, Tokyo Institute of Technology, et al.) • What’s the best algorithm for distributed memory BFS? We proposed and explored many optimizations. • Continuous effort to improve performance.
      Utilization for each version (✓ marks across SC11, ISC12, SC12, ISC14): 2D decomposition ✓ ✓ ✓ ✓; vertex sorting ✓; direction optimization ✓; data compression ✓ ✓ ✓; sparse vector with pop counting ✓; adaptive data representation ✓; overlapped communication ✓ ✓ ✓ ✓; shared memory ✓; GPGPU ✓ ✓ ✓.
      Graph500 score history of TSUBAME2, Performance (GTEPS): SC'11: 100; ISC'12: 317; SC'12: 462; ISC'14: 1,280.
      Optimized for various machines (Machine, # of nodes, Performance): K computer, 65536, 5524 GTEPS; TSUBAME2.5, 1024, 1280 GTEPS; TSUBAME-KFC, 32, 104 GTEPS.
  • 39. Large Scale Graph Processing Using NVM 1. Hybrid BFS (Beamer '11) 2. Proposal [Iwabuchi, Sato et al. BigData 2014] – DRAM: holds highly accessed data – NVM: holds full size of graph • Switching two approaches
  • 41. 3. Experiment • DRAM + NVM: load highly accessed graph data before BFS • NVM: mSATA-SSD (www.crucial.com/) ×8, RAID card (www.adaptec.com) • 4 times larger graph with 6.9% of degradation • Top-down and bottom-up – # of frontiers: n_frontier, # of all vertices: n_all – parameters: α, β
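The switching between top-down and bottom-up search can be sketched as follows. The slides only name the quantities n_frontier, n_all, α, and β, so the concrete rule used here (go bottom-up when n_frontier > n_all / α, return to top-down when n_frontier < n_all / β) is an assumption for illustration, as are the function name `hybrid_bfs` and the default parameter values. It is an in-memory Python sketch and does not model the DRAM/NVM placement.

```python
def hybrid_bfs(adj, root, alpha=4.0, beta=8.0):
    """Level-synchronous BFS that switches direction per level.

    adj must list every vertex as a key. The switching thresholds are a
    hypothetical rule built from n_frontier, n_all, alpha, and beta.
    Returns the parent map and the direction chosen at each level.
    """
    n_all = len(adj)
    parent = {root: root}
    frontier = {root}
    bottom_up = False
    directions = []

    while frontier:
        n_frontier = len(frontier)
        # Assumed rule: large frontiers favor bottom-up, small ones top-down.
        if not bottom_up and n_frontier > n_all / alpha:
            bottom_up = True
        elif bottom_up and n_frontier < n_all / beta:
            bottom_up = False
        directions.append("bottom_up" if bottom_up else "top_down")

        next_frontier = set()
        if bottom_up:
            # Each unvisited vertex looks for any neighbor in the frontier.
            for v in adj:
                if v in parent:
                    continue
                for u in adj[v]:
                    if u in frontier:
                        parent[v] = u
                        next_frontier.add(v)
                        break
        else:
            # Each frontier vertex pushes to its unvisited neighbors.
            for u in frontier:
                for v in adj[u]:
                    if v not in parent:
                        parent[v] = u
                        next_frontier.add(v)
        frontier = next_frontier

    return parent, directions
```

The bottom-up step stops scanning a vertex's neighbors as soon as one parent is found, which is why it tends to pay off when the frontier covers a large share of the graph.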
  • 42. Results: BFS Performance (The Graph 500, 2014 June; DRAM + NVM model) • MEM_CREST Node#2 (Supermicro 2027GR-TRF): DRAM 128 GB; NVM ioDrive2 1.2 TB × 2; SCALE 30 (total data size 500GB); 7.98 GTEPS; 28.88 MTEPS/W • GraphCrest Node#1: DRAM 256 GB; NVM EBD-I/O 2TB × 2; SCALE 31 (total data size 1TB); 13.80 GTEPS; 35.21 MTEPS/W • EBD_RH5885v2 (Huawei Tecal RH5885 V2): DRAM 1024 GB; NVM Tecal ES3000 800GB x 2 and 1.2TB x 2, EBD-I/O 4TB × 2; SCALE 33 (total data size 4TB); 3.11 GTEPS; 3.42 MTEPS/W
  • 43. The 2nd Green Graph500 list on Nov. 2013 • Measures power efficiency using the TEPS/W ratio • Results on various systems such as Huawei's RH5885v2 w/ Tecal ES3000 PCIe SSD 800GB*2 and 1.2TB*2 • http://green.graph500.org
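Because the Green Graph500 metric is TEPS per watt, dividing a system's TEPS by its TEPS/W gives the implied average power draw during the run. The Python sketch below applies this to the GTEPS and MTEPS/W pairs from the BFS results slide, assuming the values pair up per machine in the order listed; the function name `implied_power_watts` and the dictionary layout are hypothetical.

```python
def implied_power_watts(gteps, mteps_per_watt):
    """Power implied by a (GTEPS, MTEPS/W) pair: TEPS / (TEPS per watt)."""
    return (gteps * 1e9) / (mteps_per_watt * 1e6)


# (GTEPS, MTEPS/W) pairs as reported on the BFS results slide; the
# per-machine pairing is an assumption based on the column order.
reported = {
    "MEM_CREST Node#2": (7.98, 28.88),
    "GraphCrest Node#1": (13.80, 35.21),
    "EBD_RH5885v2": (3.11, 3.42),
}

implied = {
    name: implied_power_watts(gteps, mteps_w)
    for name, (gteps, mteps_w) in reported.items()
}

if __name__ == "__main__":
    for name, watts in implied.items():
        print(f"{name}: {watts:.1f} W")
```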
  • 45. Expectations for Next-Gen Storage System (Towards TSUBAME3.0) • Achieving high IOPS – Many apps with massive small I/O ops • graph, etc. • Utilizing NVM devices – Discrete local SSDs on TSUBAME2 – How to aggregate them? • Stability/Reliability as Archival Storage • I/O resource reduction/consolidation – Can we allow a large number of OSSs for achieving ~TB/s throughput? – Many constraints • Space, Power, Budget, etc.
  • 46. Current Status • New Approaches • Tsubame 2.0 has pioneered the use of local flash storage as a high-IOPS alternative to an external PFS • Tiered and hybrid storage environments, combining (node) local flash with an external PFS • Industry Status • High-performance, high-capacity flash (and other new semiconductor devices) are becoming available at reasonable cost • New approaches/interfaces to use high-performance devices (e.g. NVM Express)