Contenu connexe Similaire à Boosting your HTML Apps – Overview of OpenCL and Hello World of WebCL (20) Boosting your HTML Apps – Overview of OpenCL and Hello World of WebCL1. Boosting HTML Apps with WebCL
Janakiram Raghumandala
December 10, 2013
DevCon 2013: Dec 9th–11th 2013,
Monte Carlo
2. What is WebCL?
• WebCL is an open API to achieve heterogeneous parallel
computing in HTML5 web browsers.
• It is a JavaScript binding to OpenCL
• Generally used with Canvas element, SIMD computing,
particle simulation
• WebCL enables
of computeintensive web applications.
Source Name Placement
2
| © 2013 Cisco and/or its affiliates. All rights reserved.
3. Contents
– Motivation
– OpenCL, The Underlying Concept
– WebCL
– Performance Comparison & Demos
– Summary, current status and References
3
| © 2013 Cisco and/or its affiliates. All rights reserved.
4. Contents
– Motivation
– OpenCL, The Underlying Concept
– WebCL
– Performance Comparison & Demos
– Summary, current status and References
4
| © 2013 Cisco and/or its affiliates. All rights reserved.
5. Motivation
• New mobile apps with high-computing demands
– Augmented Reality
– Video Processing
– 3D Games – lots of calculations and physics
– Computational photography
5
| © 2013 Cisco and/or its affiliates. All rights reserved.
7. Motivation…
• Higher FLOPS per WATT
•
•
•
•
•
RED – CPUs
ORANGE – APUs
Light BLUE – ARM
GREEN – Grid Processors
YELLO – GPUs
• Top-left is the preferred zone
7
| © 2013 Cisco and/or its affiliates. All rights reserved.
10. Contents
– Motivation
– OpenCL, The Underlying Concept
– WebCL
– Performance Comparison & Demos
– Summary, current status and References
10
| © 2013 Cisco and/or its affiliates. All rights reserved.
12. OpenCL…
• Host contains one or
more OpenCL Devices
(aka Compute Devices)
• A Computer Device
contains one or more
Compute Units ~ Cores
12
| © 2013 Cisco and/or its affiliates. All rights reserved.
13. OpenCL…
• A Compute Unit
contains one or more
processing elements
~ Intel’s hyper threads
• Processing elements
execute code as SIMD
13
| © 2013 Cisco and/or its affiliates. All rights reserved.
14. OpenCL… Work Items & Work Groups
14
| © 2013 Cisco and/or its affiliates. All rights reserved.
15. OpenCL… Execution Model
• Kernel
– Basic unit of executable code, similar to a C function
• Program
– Collection of kernels and functions called by kernels
– Analogous to a dynamic library (run-time linking)
• Command Queue
– Applications queue kernels and data transfers
– Performed in-order or out-of-order
• Work-item ~ Processing Element ~ Thread
– An execution of a kernel by a processing element (~ thread)
• Work-group ~ Compute Unit ~ Core
– A collection of related work-items that execute on a single compute unit (~ core)
15
| © 2013 Cisco and/or its affiliates. All rights reserved.
16. OpenCL… Execution Model
• Kernel
– Basic program element, similar to a C function
• Program
– Collection of kernels and functions called by kernels
– Analogous to a dynamic library (run-time linking)
• Command Queue
– Applications queue kernels and data transfers
– Performed in-order or out-of-order
• Work-item ~ Processing Element ~ Thread
– An execution of a kernel by a processing element (~ thread)
• Work-group ~ Compute Unit ~ Core
– A collection of related work-items that execute on a single compute unit (~ core)
16
| © 2013 Cisco and/or its affiliates. All rights reserved.
17. OpenCL… Memory Model
•
•
•
•
17
| © 2013 Cisco and/or its affiliates. All rights reserved.
Private Memory per PE
Local Memory per Compute Unit
Global Memory per Device
Host Memory per Host System
18. OpenCL:Kernels
OpenCL: Kernels
• OpenCL kernel
– Defined on an N-dimensional computation domain
– Execute a kernel at each point of the computation domain
Traditional Loop
void vectorMult(
const float* a,
const float* b,
float* c,
const unsigned int count)
{
for(int i=0; i<count; i++)
c[i] = a[i] * b[i];
}
18
Data Parallel OpenCL Kernel
kernel void vectorMult(
global const float* a,
global const float* b,
global float* c)
| © 2013 Cisco and/or its affiliates. All rights reserved.
{
int id = get_global_id(0);
c[id] = a[id] * b[id];
}
19. OpenCL:Execution of Programs - Steps
1. Query host for OpenCL devices.
2. Create a context to associate OpenCL devices.
3. Create programs for execution on one or more
associated devices.
4. From the programs, select kernels to execute.
5. Create memory objects accessible from the host
and/or the device.
6. Copy memory data to the device as needed.
7. Provide kernels to the command queue for
execution.
8. Copy results from the device to the host.
19
| © 2013 Cisco and/or its affiliates. All rights reserved.
20. Contents
– Motivation
– OpenCL, The Underlying Concept
– WebCL
– Performance Comparison & Demos
– Summary, current status and References
20
| © 2013 Cisco and/or its affiliates. All rights reserved.
21. WebCL
• WebCL concepts were deliberately made similar to the
OpenCL model
– Programs
– Kernels
– Linking
– Memory Transfers
21
| © 2013 Cisco and/or its affiliates. All rights reserved.
23. WebCL:Initialization
<script>
var platforms = WebCL.getPlatforms();
var devices = platforms[0].getDevices(WebCL.DEVICE_TYPE_GPU);
var context = WebCL.createContext({ WebCLDevice: devices[0] } );
var queue = context.createCommandQueue();
</script>
23
| © 2013 Cisco and/or its affiliates. All rights reserved.
24. WebCL:Creating a kernel
<script id="squareProgram" type="x-kernel">
__kernel square(
__global float* input,
__global float* output,
const unsigned int count)
{
int i = get_global_id(0);
if(i < count)
output[i] = input[i] * input[i];
}
</script>
<script>
var programSource = getProgramSource("squareProgram");
var program = context.createProgram(programSource);
program.build();
var kernel = program.createKernel("square");
</script>
24
| © 2013 Cisco and/or its affiliates. All rights reserved.
25. WebCL:Running WebCL Kernes
<script> ...
var inputBuf context.createBuffer(WebCL.MEM_READ_ONLY, Float32Array.BYTES_PER_ELEMENT * count);
var outputBuf =
context.createBuffer(WebCL.MEM_WRITE_ONLY, Float32Array.BYTES_PER_ELEMENT * count);
var data = new Float32Array(count); // populate data ...
queue.enqueueWriteBuffer(inputBuf, data, true); //Last arg indicates blocking API
kernel.setKernelArg(0, inputBuf);
kernel.setKernelArg(1, outputBuf);
kernel.setKernelArg(2, count, WebCL.KERNEL_ARG_INT);
var workGroupSize = kernel.getWorkGroupInfo(devices[0], WebCL.KERNEL_WORK_GROUP_SIZE);
queue.enqueueNDRangeKernel(kernel, [count], [workGroupSize]);
queue.finish(); //This API blocks
queue.enqueueReadBuffer(outputBuf, data, true); //Last arg indicates blocking API
</script>
25
| © 2013 Cisco and/or its affiliates. All rights reserved.
26. Contents
– Motivation
– OpenCL, The Underlying Concept
– WebCL
– Performance Comparison & Demos
– Summary, current status and References
26
| © 2013 Cisco and/or its affiliates. All rights reserved.
28. Contents
– Motivation
– OpenCL, The Underlying Concept
– WebCL
– Performance Comparison & Demos
– Summary, current status and References
28
| © 2013 Cisco and/or its affiliates. All rights reserved.
29. WebCL:Summary
JavaScript binding for OpenCL
Provides high performance parallel processing on multicore CPU & GPGPU
– Portable and efficient access to heterogeneous multicore devices
– Provides a single coherent standard across desktop and mobile devices
WebCL HW and SW requirements
– Need a modified browser with WebCL support
– Hardware, driver, and runtime support for OpenCL
WebCL stays close to the OpenCL standard
– Preserves developer familiarity and facilitates adoption
– Allows developers to translate their OpenCL knowledge to the web environment
– Easier to keep OpenCL and WebCL in sync, as the two evolve
Intended to be an interface above OpenCL
Facilitates layering of higher level abstractions on top of WebCL API
29
| © 2013 Cisco and/or its affiliates. All rights reserved.
30. WebCL:Goals
• Compute intensive web applications
– High Performance
– Platform independent – Standards-compliant
– Ease of application development
30
| © 2013 Cisco and/or its affiliates. All rights reserved.
31. WebCL Standardization – Current Status
• Work in progress
• Working-draft is available
31
| © 2013 Cisco and/or its affiliates. All rights reserved.
Notes de l'éditeur Typically there will be hundres of Processing elements Square function is anOpenCL C based on C99 (ie., ISO/IEC 9899:1999) ----- Meeting Notes (29/11/13 16:11) -----Slow - - lively interactive, explain challenges,show that you are confident,