Testing tools and AI - ideas what to try with some tool examples
AMD Embedded Solutions Guide
1. JAN U A RY 2 0 1 3
PROCESSOR
ADVANCEMENTS
FOR EMBEDDED
OPENCL™
AND PARALLEL
PROCESSING
AMD
E M B E D D ED
APU
SOLUT IONS
G UID E
ANDROID GOES
BEYOND GOOGLE
APUs SOAR IN
REAL-TIME IMAGE
PROCESSING
E XCL USIVE I N S ID E : NEW DE V E L O P M E N T A N D E V A L U A T I O N B O A R D
2. Introducing
open source, infinite possibilities
Unleash Your Inner Inventor With
GizmoSphere, An Embedded
Development Environment &
Community
GizmoSphere enables developers to create
embedded design solutions plus communicate
ideas and shared interests with a global
community of embedded innovators.
At GizmoSphere.org you can:
• Join discussion groups
• Buy the Gizmo Explorer Kit
• View diagrams and read guides
• Access open source software
• Even enter contests!
www.gizmosphere.org
Gizmo Board Features Gizmo Explorer Kit
• Multicore computing and mixed core architecture with Your development kit includes:
a performance capacity of 52.8 GFLOPS • Gizmo, a compact 4”x4” development board powered
• High-speed card edge connector enables PCIe, SATA, by the AMD Embedded G-Series APU
USB, Display Port • Explorer, an expansion board for additional I/O
• Low-speed card edge connector supports SPI, I2C, functionality including SPI, I2C, GPIO, PWM
GPIO, PWM, ADC, DAC • SmartProbe® by Sage, the automated, configurable
• Perfect for development with open source coreboot®, plug-in development tool for AMD-based embedded
including SeaBIOS and SageBIOS™ designs
• Board also supplies JTAG header, VGA video output, • Trial license for the Sage EDK graphical interface (IDE)
Audio input/output, Ethernet, USB and SmartProbe® trial time use
• Supports both high performance PC-style I/O and • On-board SageBIOS™, a distribution of open source
easily accessible embedded I/O coreboot®
• AMD Embedded G-Series APU boasts low power • Installation DVD and Quick Start Guide
consumption – only 6.4W max TDP • Power Supply with U.S.-standard power cord
• Convenient compact size – just 4” x 4” All For The Unbelievably Low Price Of $199!!
• Advanced features include 64-bit processing and
GizmoSphere partners:
hardware virtualization
• Super energy efficient at an amazing 8.25 GFLOPS/W AMD
Sage Electronic Engineering
For information on becoming a GizmoSphere partner visit: www.gizmosphere.org/partners
3. AMD EMBEDDED APU SOLUTIONS GUIDE
APUs and Processor
Advancements
for Embedded
Applications Kelly Gillilan
A
Marketing Manager,
By Kelly Gillilan, Marketing Manager, AMD Embedded Solutions AMD Embedded Solutions
t the beginning of a new tectures able to handle multi-threaded Kelly Gillilan has worked extensively
in embedded applications for most of
year, it’s interesting to applications efficiently. However, there
the past decade. He currently is the
look back at the techni- are limits to how many cores a design Product Marketing Manager for the AMD
cal advancements that can incorporate before the throughput Embedded Solution division, overseeing
have occurred in the pri- performance levels off—that’s because worldwide marketing strategy and
activities. He holds a degree in Computer
or 12 months. of the increase in “overhead” to manage
Engineering and is fluent in Mandarin
Take AMD’s Embedded Accelerated such an architecture. GPUs also have Chinese.
Processing Units (APUs) for example. undergone a significant transforma-
In the past year we launched the high- tion—from simply driving a display to
performance AMD Embedded R-Series driver-based programs to system-based lation. Some of our partners have com-
APU platform consisting of quad- and programming models with power effi- bined the AMD R-Series APU—which
dual-core models running at 2.3 GHz (3.2 ciency enhancements. supports four independent displays—
GHz boost) with AMD Radeon™ 7000 se- The next phase in processor evolu- with the AMD Radeon™ E6760 embed-
ries integrated graphics providing more tion is currently unfolding. AMD’s APUs ded discrete GPU—which supports six
than 570 GFLOPS of performance in combine multiple x86 CPU cores to han- independent displays—to create systems
just a 35 W TDP. Shortly after the launch dle serialized data with dozens or even capable of driving a total of 10 indepen-
of the AMD R-Series APUs, we released hundreds of compute units in the GPU. dent displays. These types of systems are
a new addition to our low-power AMD These cores process parallelized data to ideal solutions for applications such as
Embedded G-Series APU line—the 4.5 provide a heterogeneous system architec- casino gaming and digital signage, where
W TDP AMD T-16R APU for power-sen- ture with excellent performance potential multimedia content must be large, bold,
sitive applications requiring efficient per- in low-power bands. and eye-catching. Metrics for these sys-
formance and high-definition graphics. What does this mean for embedded tems also can be efficiently processed by
These two families of APUs service dif- applications? Over the past year I have using programming languages such as
ferent market segments, but both provide seen our partners develop hardware so- OpenCL™ that compile specifically for
the unique combination of a powerful, yet lutions in very small form factors (such heterogeneous system designs such as
power-efficient CPU with a discreet-level, as Qseven) that are able to drive two those based on AMD’s APUs.
high-performance GPU for a heteroge- full-HD independent displays from a I can’t predict the future, but one thing
neous system architecture. fanless, compact enclosure. Systems like is certain: as these technologies continue to
So how did we get here? We’ve seen these are ideal for powering industrial evolve and new innovations are introduced,
CPUs transition from single-core ar- control and factory automation systems embedded system designers and integrators
chitectures, where performance boosts as the industry phases out old arrays of will capitalize on new application and mar-
typically were accomplished by increas- buttons, knobs, and switches in favor of ket opportunities that can lead to increased
ing clock speed, to multi-core archi- touch-panel controls with 3D manipu- revenue streams.
WWW. AMD . C OM/ EMBED D ED / C ATALOG 03
4. AMD EMBEDDED APU SOLUTIONS GUIDE
AMD EMBEDDED R-SERIES PLATFORM
Delivering exceptional performance in a
power efficient platform.
The AMD Embedded R-Series platform delivers high-performance processing coupled with
a premium high-definition visual experience in a solution that is still power efficient, enabling
unprecedented integrated graphics and multi-display capabilities in embedded applications that
can be compact and low power.
The AMD R-Series APU (Accelerated Processing Unit) is designed to efficiently handle your advanced
multimedia and computational workloads. With average power below 13 watts and discrete-class AMD
Radeon™ graphics performance integrated into the AMD R-Series APU, applications that previously
required a discrete graphics card can be developed in smaller form factors with lower power and cost.
For more demanding graphics applications, AMD Radeon™ Dual Graphics technology can combine
the processing power of AMD R-Series APUs and AMD Radeon™ Embedded 6000 Series GPUs to
more than double graphics performance compared to using discrete graphics alone.
x86 Core
Clock Speed DDR3 x86 AMD-V™
Model Base/Boost L2 Cache GPU Speed Cores UVD1 3 Tech. 2 EVP3 Package Max TDP
AMD Embedded R-Series APU – FS1r2 PGA
R-464L 2.3/3.2 GHz AMD Radeon™ HD 7660G
2MB x 2 4
R-460H 1.9/2.8 GHz AMD Radeon™ HD 7640G FS1r2
DDR3-1600 Yes Yes Yes 35W
R-272F 2.7/3.2 GHz AMD Radeon™ HD 7520G (722-PGA)
1 MB 2
R-268D 2.5/3.0 GHz AMD Radeon™ HD 7420G
AMD Embedded R-Series APU – FP2 BGA
R-460L 2.0/2.8 GHz AMD Radeon™ HD 7620G 25W
2 MB x 2 4
R-452L 1.6/2.4 GHz AMD Radeon™ HD 7600G FP2 19W
DDR3-1600 Yes Yes Yes
R-260H 2.1/2.6 GHz 2 MB AMD Radeon™ HD 7500G (827-BGA)
2 17W
R-252F 1.7/2.3 GHz 1 MB AMD Radeon™ HD 7400G
1. Unified Video Decoder (UVD 3) for hardware decode of high definition video.
2. AMD Virtualization™ technology. When used as part of a DAS 1.0 implementation can improve the performance, reliability and security of
embedded applications.
3. As part of a comprehensive security program, AMD strongly recommends enabling Enhanced Virus Protection (EVP) and using up-to-date third-
party anti-virus software.
Note: Always refer to the processor/chipset data sheets for technical specifications. Feature information in this document is provided for reference only.
04 J ANU ARY 201 3
5. AMD EMBEDDED APU SOLUTIONS GUIDE
micro-ATX Digital Gaming SBC
Motherboard DPX-S430
GMB-A75 • AMD Embedded R-Series Platform
• AMD Radeon™ HD 7000G Series Graphics
• AMD Embedded R-Series Platform integrated
• AMD Radeon™ HD 7000G Series Graphics integrated • AMD A75 Controller Hub
• AMD A75 Controller Hub • Quad and Dual Core APUs
• Built-in SMI 750 GPU to support extension • Comprehensive Gaming features
2 display
• High-performance integrated or PCI Express
• Extended up to 2 PCIex16 slot graphics
• System memory up to 16 GB (DDR3) • Up to 4 independent displays from the chipset
• Comprehensive I/O: 2 x RJ45, 10 x USB (2.0/3.0), • Low power consumption
8 x COM, 4 x VGA, 2 x DVI-D, CF card, and audio
• Small format
• Gaming, Information Appliance, Digital
Signage, Point of Sale
Advantech Advantech-Innocore
PHONE (949) 789-7178 EMAIL sales@advantech-innocore.com PHONE (949) 789-7178 EMAIL sales@advantech-innocore.com
FAX (949) 789-7179 WEB www.advantech.com/embcore FAX (949) 789-7179 WEB www.advantech.com/embcore
Mini-ITX Single Board COM Express / Type 6
Computer conga-TFS
MANO111 • AMD Embedded R-Series APU
• AMD Radeon™ HD 7000G Series Graphics
• AMD Embedded R-Series APU integrated
• AMD Radeon™ HD 7000G Series Graphics integrated • AMD A70 Controller Hub
• AMD A75 Controller Hub • SODIMM, 16GB, DDR3, 2x, 1066/800
• DDR3 Dual channel SO-DIMM 1333/1600 • 7 x PCI Express™
max. up to 16 GB
• 4 x SATA
• 4 SATA-600 support RAID 0,1,5,10
• 4 x USB 3.0, 4 x USB 2.0
• 4 USB 3.0 supported
• High-performance DirectX®11 GPU supports
• 3 independent displays OpenCL™ 1.1 and OpenGL 4.2
• DisplayPort 2 supports multi-stream • Gaming, Server, Information Appliance,
• Gaming, Communications, Industrial Control- Communications, Industrial Controllers,
lers, Medical, Digital Signage, Point of Sale Medical, Digital Signage
Axiomtek congatec Inc.
PHONE (626) 581-3232 EMAIL info@axiomtek.com PHONE (858) 457-2600 EMAIL sales-us@congatec.com
FAX (626) 581-3552 WEB www.axiomtek.com/US FAX (858) 457-2602 WEB www.congatec.us
COM Express Mini-ITX
Compact R2.0, Type 6 Motherboard
CM901-B CM100-C
• AMD Embedded R-Series APU • AMD Embedded R-Series APU
• AMD Radeon™ HD 7000G Series Graphics integrated • AMD Radeon™ HD 7000G Series Graphics
• AMD A70 Controller Hub integrated
• 2 DDR3 SODIMM up to 8GB • AMD A70 Controller Hub
• VGA, LVDS, DDI (DisplayPort, LVDS, VGA) • 2 DDR3 SODIMM up to 8GB
• 1 PCIe x16, 7 PCIe x1 (first 4 PCIe support • 1 HDMI, 2 DVI (1 supports DVI-D signal), 1 LVDS
PCIe x4) • 1 PCIe x16, 2 PCIe x1 gold fingers, 1 Mini PCIe
• 4 SATA 3.0 • 4x SATA 3.0
• 8 USB 2.0 (first 4 USB ports support up to USB 3.0) • 4x USB 3.0, 6x USB 2.0
• Gaming, Information Appliance, Industrial Con- • Gaming, Industrial Controllers, Medical, Digital
trollers, Medical, Digital Signage, Point of Sale Signage, Point of Sale
DFI DFI
PHONE (916) 568-1234 EMAIL sales@dfitech.com PHONE (916) 568-1234 EMAIL sales@dfitech.com
FAX (916) 568-1233 WEB www.dfi.com FAX (916) 568-1233 WEB www.dfi.com
WWW. AMD . C OM/ EMBED D ED / C ATALOG 05
6. AMD EMBEDDED APU SOLUTIONS GUIDE
Digital Signage Mini-ITX
Player Motherboard
SI-38 MI959
• AMD R-Series Quad-Core / Dual-Core APU, • AMD Embedded R-Series APU
up to 35W • AMD Radeon™ HD 7000G Series Graphics
• Integrated AMD Radeon™ 384/240 Cores integrated
DirectX® 11 GPU in Processor • AMD A70 Controller Hub
• Winner of Computex 2012 Design
Innovation Award • 2 x DDR3-1600 Memory, up-to 16GB Dual
• Dual independent 1080p Hybrid DVI-I display outputs Channel
• Supports DDR3 memory up to 16GB • 1 x DVI-, DVI-D, Display Port LVDS
• iSMART - for EuP/ErP power saving, auto- • 2 x Mini PCI-E(x1), PCI-E(x16)
scheduler and power resume • 4 x USB 3.0 + 8 USB 2.0
• Dual Mini PCI-E(x1) slots for WiFi and TV • Server, Communications, Industrial Controllers,
tuner options Medical, Networking, Digital Signage, Point of Sale
• 2x USB 3.0 and serial port (RS232)
IBASE IBASE
PHONE +886-2-2655-7588 EMAIL sales@IBASE.com.tw PHONE +886-2-2655-7588 EMAIL sales@IBASE.com.tw
FAX +886-2-2655-7388 WEB www.IBASE-usa.com FAX +886-2-2655-7388 WEB www.IBASE-usa.com
Mini-ITX Mini-ITX
Motherboard Motherboard
NF82 KTA70M/mITX
• AMD Embedded R-Series APU • AMD Embedded R-Series APU
• AMD A75 Controller Hub • AMD Radeon™ HD 7000G Series Graphics
• 2 x SODIMM Sockets for un-buffered Dual integrated
Channel DDR3 1600 SDRAM up to 16 GB • AMD A70 Controller Hub
• 6 x Serial ATA3 6Gb/s connectors support • 2x SO-DIMM, up to 8GB each (in total 16 GB)
RAID 0, 1 10 functions • 4 independent display outputs
• 1 x PCI x 16 slot, 1 x Mini PCI-E • 10x USB 2.0, 4x USB 3.0
• Embedded 4 x USB 3.0 8 x USB 2.0/1.1 • 1x PCIe x8 1x PCIe x4
• Gaming, Digital Signage, Point Of Sale • 2x SO-DIMM, up to 8GB each (in total 16 GB)
• Medical, Industrial Automation, Gaming,
Digital Signage
JETWAY Information Co., Ltd Kontron America
PHONE +886 2 89132711 EMAIL louis.chang@jetway.com.tw PHONE (858) 677-0877 EMAIL info@us.kontron.com
FAX +886 2 89132722 WEB www jetway.com.tw FAX (858) 677-0898 WEB www.kontron.com
Digital Signage Digital Gaming
Player System
NDiS165 QX-40
• AMD Embedded R-Series APU • AMD Embedded R-Series APU
• AMD Radeon™ HD 7000G Series Graphics • AMD Radeon™ HD 6760 Graphics integrated
integrated • AMD A75 Controller Hub
• AMD A70 Controller Hub • SODIMM, 8GB, DDR3, 2x, 1600/1333/1066
• SODIMM, 16GB, DDR3 non-ECC, 2x, • OpenGL 4.1, DirectX® 11, OpenCL™ 1.1
1333/1066/800 compatible
• 3x HDMI, 1920 x 1080 • Advanced PCI Express® gaming logic SRAM
• 2x Mini-PCIe, X1 FOR WLAN and TV Tuner / MRAM
• 2x SATA, 6.0Gbps, 3.0 compliant • Support for up to 10 independent monitors
• 4x TypeA, USB 3.0, Host, 4x Header, USB • 7x DisplayPort, 2560 x 1600
2.0, Host • 4x DVI, 2560 x 1600
Nexcom Quixant UK Ltd
PHONE (510) 656-2248 EMAIL marketing@nexcom.com PHONE +44 (0) 1223 89296 EMAIL sales@quixant.com
FAX (510) 656-2158 WEB www.nexcom.com FAX +44 (0) 1223 892401 WEB www.quixant.com
06 J ANU ARY 201 3
7. AMD EMBEDDED APU SOLUTIONS GUIDE
Digital Gaming Mini-ITX Single Board
System Computer
QXi-4000 ITX-AT2X21B
• AMD Embedded R-Series APU • AMD Embedded R-Series APU
• AMD Radeon™ HD 7000G Series Graphics integrated • AMD Radeon™ HD 7000G Series Graphics integrated
• AMD A70 Controller Hub • AMD A70 Controller Hub
• Fanless all-in-one PC-based gaming controller • 1 x SO DIMM DDR3 1066/1333MHz, Max
for slot machines up to 8GB
• Supports up to four independent HD monitors • Supports DirectX11, 1080i, 1080P and H.264.
• Advanced PCI Express® gaming logic SRAM/ VC-1, MPEG-G
MRAM • 2x SATA 3Gb/s with power
• 4x, DisplayPort, 2560 x 1600 • 2x MINI PCI-E slot (1pcs for wifi, 1 pcs for SSD)
• 5x TypeA USB 2.0, Host • 8x USB, 4x USB3.0, 4x USB2.0
• 4x SATA, 6.0Gbps, 3.0 compliant, 2 x CFAST • Gaming, Industrial Controllers, Digital Signage,
sockets Thin Clients
Quixant UK Ltd SHENZHEN XINZHIXIN ENTERPRISE DEVELOPMENT CO.,LTD
PHONE +44 (0) 1223 89296 EMAIL sales@quixant.com PHONE 86-13590127552 EMAIL sale@xzx.net.cn
FAX +44 (0) 1223 892401 WEB www.quixant.com WEB www.micputer.com
Mini-ITX Motherboard PICO-ITX Fanless
EMB-A70M Board
• AMD Embedded R-Series APU (dual-core) PICO-HD01
• Supports DDR3 1333MHz memory up to 8GB
• AMD Embedded G-Series Platform
• 4 x HDMI Supporting Full HD display
• AMD A50M Controller Hub
• 1 x Audio Line-out, 1 x Mic-in
• 204-pin SODIMM DDR3 1066MHz up to
• 2 x USB3.0 (Internal) ; 1 x USB2.0 (Rear I/O);
4GB
4 x USB2.0 (Internal)
• One Reatek 8111E for 10/1000/1000Base-TX
• 2 x RJ45 with LEDs for 10/100/1000Mbps
Ethernet • CRT, 18-bit Single Channel LVDS, HDMI
• 2 x COM (COM1: RS-232/422/485; COM2: • SATA 3.0Gb/s x 1, mSATA (Half-Size) Slot x 1
RS-232) Co-lay with Mini Card
• 1 x miniPCIe (Full Size) with SIM Socket • USB 2.0 x 5, COM x 2, 4-bit Digital I/O
• 1 x mini PCIe ( Optional PCIe SATA signal, • Gaming, Communications, Digital Signage,
default for mSATA) Point Of Sale
Aaeon Aaeon
PHONE (714) 996-1800 EMAIL info@aaeon.com PHONE (714) 996-1800 EMAIL info@aaeon.com
FAX (714) 996-1811 WEB www.aaeon.com FAX (714) 996-1811 WEB www.aaeon.com
EPIC Board 3.5” SubCompact
EPIC-HD07 Board
• AMD Embedded G-Series Platform GENE-HD05
• AMD A55E Controller Hub
• AMD Embedded G-Series Platform
• SODIMM DDR3 1066/1333 MHz, Max 4G
• AMD A50M Controller Hub
• Up to 24-bit Dual Channel LVDS LCD, CRT,
DVI (Optional) • SODIMM DDR3 1066/1333(T56N) up to 4GB
• SATA 3.0Gb/s x 1, mSATA Slot x 1 • CRT, 18/24/36/48-bit LVDS, HDMI
• USB 2.0 x 8, COM x 6, 16-bit Digital I/O • SATA 3.0Gb/s x 2, CFast x 1, mSATA x 1
(Config. by BIOS)
• Expansion: Mini Card x 1, PCI-104 x 1 (Co-lay
with LPT) • USB2.0 x 8, COM x 4, Parallel x 1, 8-bit Digital I/O
• Gaming, Information Appliance, Industrial • Onboard 4/5-wire Resistive Touch Screen
Controllers, Digital Signage, Point of Sale Controller
• Gaming, Information Appliance, Digital
Signage
Aaeon Aaeon
PHONE (714) 996-1800 EMAIL info@aaeon.com PHONE (714) 996-1800 EMAIL info@aaeon.com
FAX (714) 996-1811 WEB www.aaeon.com FAX (714) 996-1811 WEB www.aaeon.com
WWW. AMD . C OM/ EMBED D ED / C ATALOG 07
8. AMD EMBEDDED APU SOLUTIONS GUIDE
be run on a wide range of systems without
having to rewrite everything for each system.
OpenCLTM OpenCLTM for Unified,
Portable Source Code
Programming for
OpenCL™, the first open and royalty-
free programming standard for general-
purpose parallel computations on heteroge-
neous systems, is quickly growing in popu-
Heterogeneous
larity as a means for developers to preserve
their expensive source code investments and
easily target multi-core CPUs and GPUs.
OpenCL is maintained by the Khronos
Computing Group, a not-for-profit industry consortium
that creates open standards for the author-
ing and acceleration of parallel computing,
Systems: Parallel
graphics, dynamic media, computer vision
and sensor processing on a wide variety of
platforms and devices. Developed in an
open standards committee with representa-
Processing Made
tives from major industry vendors, OpenCL
affords users a cross-vendor, non-proprietary
solution for accelerating their applications
Faster and Easier
across mainstream processing platforms,
and provides the means to tackle major
development challenges, such as maximiz-
ing parallel compute utilization, efficiently
Than Ever
handling data movement and minimizing
dependencies across cores.
Ultimately, OpenCL enables develop-
ers to focus on applications, not just chip
P
architectures, via a single, portable source
By Todd Roberts, Software Manager, Embedded Solutions, AMD code base. When using OpenCL, developers
can use a unified tool chain and language to
arallel processing isn’t really GPUs, which have always been very paral- target all of the parallel processors currently
new. It has been around in lel—counting hundreds of parallel execu- in use. This is done by presenting the devel-
one form or another since the tion units on a single die—have now be- oper with an abstract platform model that
early days of computing. As come increasingly programmable, to the conceptualizes all of these architectures in a
traditional CPUs have be- point that it is now often useful to think similar way, as well as an execution model
come multi-core parallel processors, with of GPUs as many-core processors instead of supporting data and task parallelism across
many cores in a socket, it has become more special purpose accelerators. heterogeneous architectures.
important for developers to embrace par- All of this diversity has been reflected
allel processing architectures as a means in a wide array of tools and programming Key Concepts and
to realize significant system performance models required for programming these Workflows
improvements by taking advantage of the architectures. This has created a dilemma OpenCL has a flexible execution model
extra cores. This move towards parallel for developers. In order to write high- that incorporates both task and data paral-
processing has been complicated by the performance code they have had to write lelism. Tasks themselves are comprised of
diversity and heterogeneity of the various their code specifically for a particular archi- data-parallel kernels, which apply a single
parallel architectures that are now avail- tecture and give up the flexibility of being function over a range of data elements in
able. A heterogeneous system is made up of able to run on different platforms. In order parallel. Data movements between the host
different processors each with specialized for programs to take advantage of increases and compute devices, as well as OpenCL
capabilities. Over the last several years, in parallel processing power, however, they tasks, are coordinated via command queues.
GPUs have been targeted as yet another must be written in a scalable fashion. Devel- An OpenCL command queue is cre-
source of computing power in the system. opers need the ability to write code that can ated by the developer through an API call,
08 J ANU ARY 201 3
9. AMD EMBEDDED APU SOLUTIONS GUIDE
and associated with a specific compute tion of the computation being performed.
Write A Write B
device. To execute a kernel, the kernel is OpenCL kernels are executed over an index
pushed onto a particular command queue. space, which can be 1, 2 or 3 dimensional.
Enqueueing a kernel can be done asynchro- Write C Kernel A Kernel C In Figure 2, we see an example of a 2 di-
nously, so that the host program may en- mensional index space, which has Gx * Gy
queue many different kernels without wait- elements. For every element of the kernel
ing for any of them to complete. When Kernel B Read A index space, a work-item will be executed.
enqueueing a kernel, the developer option- All work items execute the same program,
ally specifies a list of events that must occur Kernel D
although their execution may differ due to
before the kernel executes. If a developer branching based on data characteristics or
wishes to target multiple OpenCL com- the index assigned to each work-item.
pute devices simultaneously, the developer Read B The index space is regularly subdi-
would create multiple command queues. vided into work-groups, which are tilings
Command queues provide a general FIGURE 1 ask Parallelism within a
T of the entire index space. In Figure 2, we
way of specifying relationships between Command Queue. see a work-group of size Sx * Sy elements.
tasks, ensuring that tasks are executed in an Each work-item in the work-group receives
order that satisfies the natural dependencies try to synchronize or share data, since the a work-group id, labeled (wx, wy) in the
in the computation. The OpenCL runtime is runtime provides no guarantee that all work- figure, as well as a local id, labeled (sx, sy)
free to execute tasks in parallel if their depen- items are concurrently executing, and such in the figure. Each work-item also receives
dencies are satisfied, which provides a gen- synchronization easily introduces deadlocks. a global id, which can be derived from its
eral-purpose task parallel execution model. Developers are also free to construct work-group and local ids.
Events are generated by kernel comple- multiple command queues, either for parallel- Work-items in different work-groups
tion, as well as memory read, write, and izing an application across multiple compute may coordinate execution through the
copy commands. This allows the developer devices, or for expressing more parallelism via use of atomic memory transactions, which
to specify a dependence graph between completely independent streams of computa- are an OpenCL extension supported by
kernel executions and memory transfers in tion. OpenCL’s ability to use both data and some OpenCL runtimes. For example,
a particular command queue or between task parallelism simultaneously is a great ben- work-items may append variable num-
command queues themselves, which the efit to parallel application developers, regard- bers of results to a shared queue in global
OpenCL runtime will traverse during ex- less of their intended hardware target. memory. However, it is good practice that
ecution. Figure 1 shows a task graph illus- work-items do not, generally, attempt to
trating the power of this approach, where Kernels communicate directly, as without careful
arrows indicate dependencies between As mentioned, OpenCL kernels pro- design scalability and deadlock they can
tasks. For example, Kernel A will not ex- vide data parallelism. The kernel execution become difficult problems. The hierarchy
ecute until Write A and Write B have fin- model is based on a hierarchical abstrac- of synchronization and communication
ished, and Kernel D will not execute until
Kernel B and Kernel C have finished. Sx
The ability to construct arbitrary task
graphs is a powerful way of constructing task- work group (Wx , Wy)
parallel applications. The OpenCL runtime
has the freedom to execute the task graph in sx = 0 sx = Sx - 1
parallel, as long as it respects the dependen-
Gx sy = 0 sy = 0
cies encoded in the task graph. Task graphs work item work item
are general enough to represent the kinds (WxSx + sx , (WxSx = sx ,
of parallelism useful across the spectrum of WySy + sy) WySy + sy)
hardware architectures, from CPUs to GPUs.
Besides the task parallel constructs pro- Gy Sy
vided in OpenCL, which allow synchroni- sx = 0 s x = Sx - 1
zation and communication between kernels, s y = Sy - 1 sy = Sy - 1
OpenCL supports local barrier synchroniza- work item work item
tions within a work-group. This mechanism
(WxSx = sx , (WxSx + sx ,
allows work-items to coordinate and share
WySy + sy) WySy + sy)
data in the local memory space using only
very lightweight and efficient barriers. Work-
items in different work-groups should never FIGURE 2 EQ Executing Kernels - Work-Groups and Work-Items.
S
WWW. AMD . C OM/ EMBED D ED / C ATALOG 09
10. AMD EMBEDDED APU SOLUTIONS GUIDE
provided by OpenCL is a good fit for many that allow and sometimes require some level ages, are allocated per context; but changes
of today’s parallel architectures, while still of tuning to achieve optimal performance. made by one device are only guaranteed to
providing developers the ability to write ef- be visible by another device at well-defined
ficient code, even for parallel computations voidtrad_mul(int n, synchronization points. For this, OpenCL
with non-trivial synchronization and com- const float *a, provides events, with the ability to synchro-
const float *b,
munication patterns. float *c) nize on a given event to enforce the correct
The work-items may only commu- { order of execution.
nicate and synchronize locally, within a int i; Most OpenCL programs follow the
work-group, via a barrier mechanism. This for (i=0; in; i++) same pattern. Given a specific platform,
c[i] = a[i] * b[i]; }
provides scalability, traditionally the bane select a device or devices to create a con-
of parallel programming. Because com- text, allocate memory, create device-spe-
munication and synchronization at the FIGURE 3 xample of traditional loop
E cific command queues, and perform data
finest granularity is restricted in scope, the (scalar). transfers and computations. Generally, the
OpenCL runtime has great freedom in how platform is the gateway to accessing specific
work-items are scheduled and executed. kernel void devices. Given these devices and a corre-
dp_mul (global const float *a, sponding context, the application is inde-
A Typical OpenCL Kernel global const float *b, pendent of the platform. Given a context,
global float *c)
As already discussed, the core pro- { the application can:
gramming goal of OpenCL is to provide int id = get_global_id (0); • reate one or more command
C
programmers with a data-parallel execu- queues.
tion model. In practical terms this means c[id] = a[id] * b[id]; • reate programs to run on one or
C
that programmers can define a set of in- } // execute over “n” work-items more associated devices.
structions that will be executed on a large • reate kernels within those pro-
C
number of data items at the same time. grams.
The most obvious example is to replace FIGURE 4 ata parallel OpenCL.
D • A llocate memory buffers or images,
loops with functions (kernels) executing at either on the host or on the device(s)
each point in a problem domain. For example, OpenCL memory object types - Memory can be copied between the
Referring to Figures 3 and 4, let’s and sizes can impact performance. In most host and device.
say you wanted to process a 1024 x 1024 cases key parameters can be gathered from • Write data to the device.
image (your global problem dimension). the OpenCL runtime to tune the operation • Submit the kernel (with appropriate
You would initiate one kernel execution of the application. In addition, each ven- arguments) to the command queue
per pixel (1024 x 1024 = 1,048,576 kernel dor may choose to provide extensions that for execution.
executions). provide for more options to tune your appli- • Read data back to the host from the
Figure 3 shows sample scalar code cation. In most cases these are parameters device.
for processing an image. If you were writ- used with the OpenCL API and should not
ing very simple C code you would write a require extensive rewrite of the algorithms. The relationship between context(s),
simple for loop, and in this for loop you device(s), buffer(s), program(s), kernel(s),
would go from 1 to N and then perform Building an OpenCL and command queue(s) is best seen by
your computation. Application looking at sample code.
An alternate way to do this would be An OpenCL application is built by first
in a data parallel fashion (Figure 4), and in querying the runtime to determine which Summary
this case you’re going to logically read one platforms are present. There can be any OpenCL affords developers an elegant,
element in parallel from all of a (*a), multi- number of different OpenCL implementa- non-proprietary programming platform to
ply it from an element of b in parallel and tions installed on a single system. The de- accelerate parallel processing performance
write it to your output. You’ll notice that in sired OpenCL platform can be selected by for compute-intensive applications. With
Figure 4 there is no for loop—you get an matching the platform vendor string to the the ability to develop and maintain a sin-
id value, read a value from a, multiply by desired vendor name, such as “Advanced Mi- gle source code base that can be applied to
a value from b and then write the output. cro Devices, Inc.” The next step is to create a CPUs, GPUs and APUs with equal ease,
As stated above, a properly written context. An OpenCL context has associated developers can achieve significant program-
OpenCL application will operate correctly with it a number of compute devices (for ming efficiency gains, reduce development
on a wide range of systems. While this is example, CPU or GPU devices). Within a costs, and speed their time to market.
true it should be noted that each system and context, OpenCL guarantees a relaxed con-
compute device available to OpenCL may sistency between these devices. This means Advanced Micro Devices
have different resources and characteristics that memory objects, such as buffers or im- www.amd.com
10 J ANU ARY 201 3
11. AMD EMBEDDED APU SOLUTIONS GUIDE
AMD EMBEDDED G-SERIES PLATFORM
The world’s first combination of low-power
CPU and advanced GPU integrated into a
single embedded device.
The AMD Embedded G-Series processor is the world’s first integrated circuit to combine a low-
power CPU and a discrete-level GPU into a single embedded Accelerated Processing Unit (APU).
This unprecedented level of graphics integration builds a new foundation for high-performance
multimedia content delivery in a small form factor and power-efficient platform for a broad range of
embedded designs. Based on a new power-optimized core, the AMD Embedded G-Series platform
delivers levels of performance in a compact BGA package that is ideal for low-power designs
in embedded applications such as Digital Signage, x86 Set-Top-Box (xSTB), IP-TV, Thin Client,
Information Kiosk, Point-of-Sale, Casino Gaming, Media Servers, and Industrial Control Systems.
x86 Core Clock
Speed Base/
Model Boost L2 Cache GPU DDR3 Speed x86 Cores UVD1 3 Display Ouptuts Max TDP
AMD Embedded G-Series APU – FT1 413-pin
T56N 1.65GHz AMD Radeon™ HD 6320 2 18W
DDR3-1333
T56E 1.65GHz AMD Radeon™ HD 6250 2
Dual independent
Unbuffered 18W
display controllers
AMD Radeon™ HD 6310 1 2 active outputs 18W
T52R 1.5GHz
AMD Radeon™ HD 6250 from: 18W
DDR3-1066
2
T48E 1.4GHz 1xVGA
Unbuffered
Yes 2x single link DVI
T44R 1.2GHz AMD Radeon™ HD 6250 DDR3-10663 1 9W
T40N 1.0GHz3 AMD Radeon™ HD 6290 2 1X single link LVDS 9W
Unbuffered
512KB 2x DisplayPort 1.1a
DDR3-1066
T48n 1.4GHz AMD Radeon™ HD 6310 2 18W
Unbuffered 1x HDMI
T40E 1.0GHz AMD Radeon™ HD 6250 DDR3-10663 2 1X DVO 6.4W
T40R 1.0GHz AMD Radeon™ HD 6250 Unbuffered 1 5.5W
T16R 615mHz AMD Radeon™ HD 6250 LVDDR3-1066 1 4.5W
T48L 1.4GHz N/A DDR3-1066 2 18W
T30L 1.4GHz N/A Unbuffered 1 18W
N/A N/A
DDR3-10663
T24L 1.0GHz N/A 1 5W
Unbuffered
1. nified Video Decoder (UVD 3) for hardware decode of high-definition video.
U
2. ow voltage (1.35V) DDR3 is assumed for the 9W TDP processors. The use of 1.5V DDR3 will incur a power adder.
L
3. odels enabled by AMD Turbo CORE technology, up to 10% clock speed increase is planned. For CPU boost, only one processor core of a dual-
M
core has boost enabled.
Note: Always refer to the processor/chipset data sheets for technical specifications. Feature information in this document is provided for reference only.
WWW. AMD . C OM/ EMBED D ED / C ATALOG 11
12. AMD EMBEDDED APU SOLUTIONS GUIDE
Fanless Industrial Mini-ITX
Computer Motherboard
TKS-E21-HD07 EMB-A50M
• AMD Embedded G-Series Platform • AMD Embedded G-Series Platform
• AMD A55E Controller Hub • AMD A50M Controller Hub
• Includes the Aaeon EPIC-HD07 board • DDR3 800/1066 DIMM x 2, Unbuffered
• Realtek Gigabit Ethernet x 2, Mini Card x 1, Memory, Max. 8 GB
mSATA x 1 • DVI-I, HDMI
• Low Profile 1U Height and Weight Less Than • SATA 6.0Gb/s x 5
1.5Kg • USB3.0 x 2, USB2.0 x 10, COM x 4
• Easy Reconfiguration System • PCI Express 2.0 [x4] x 1, Mini PCIe (Half size)
• Fanless With Excellent Thermal Solution x 1, 8-bit Digital I/O
• Anti-Vibration Up to 1G (HDD, Random) • Gaming, Communications, Digital Signage,
Point of Sale
Aaeon Aaeon
PHONE (714) 996-1800 EMAIL info@aaeon.com PHONE (714) 996-1800 EMAIL info@aaeon.com
FAX (714) 996-1811 WEB www.aaeon.com FAX (714) 996-1811 WEB www.aaeon.com
Gaming System Digital Signage
ACE-S7400 Player
• AMD Embedded G-T56N APU ARK-DS306
• with AMD Radeon™ HD 6320 Graphics
• AMD Embedded G-T40N APU
• Digital inputs and digital outputs with Micro fit
3.0 connector • with AMD Radeon™ HD 6290 Graphics AMD
A50M Controller Hub
• 2 x ccTalk
• Dual display: HDMI, VGA
• 2MB Battery back up SRAM
• Built-in Mini PCIe slot
• Timer Meter pulse generator, counters
• Supports 2 GLAN, HD audio, I/O interface
• Intrusion Logger
with 2 x COM, 2 x USB
• Storage:2 x CF connectors
• 1 x 2.5” SATA HDD drive bay, 1 x CFast slot
• Supports VESA mounting (Optional)
aCrosser Technology Limited Advantech
PHONE (866) 401-9463 EMAIL juni_hung@acrosser.com.tw PHONE (949) 789-7178 EMAIL ECGInfo@advantech.com
FAX (714) 903 5629 WEB www.acrosser.com FAX (949) 789-7179 WEB www.advantech.com/embcore
MI/O Extension SBC Mini-ITX Single
MIO-5270 Board Computer
• AMD Embedded G-Series Platform AIMB-223
• AMD A50M Controller Hub
• AMD Embedded G-Series Platform
• 1 x DDR3 memory support up to 4 GB
• One 204-pin SODIMM up to
• Multiple display: 48-bit LVDS, HDMI, VGA
2 GB DDR3 1333 MHz SDRAM
• 2 GbE support, HD Audio, Rich I/O interface
• Supports VGA/LVDS/HDMI
with 4 COM, 2 SATA, 6 USB and GPIO
• Dual LANs, 6 COM, Mini PCIe, and Cfast
• Supports embedded software APIs and Utilities
• Supports embedded software APIs and Utilities
Advantech Advantech
PHONE (949) 789-7178 EMAIL ECGInfo@advantech.com PHONE (949) 789-7178 EMAIL ECGInfo@advantech.com
FAX (949) 789-7179 WEB www.advantech.com/embcore FAX (949) 789-7179 WEB www.advantech.com/embcore
12 J ANU ARY 201 3
13. AMD EMBEDDED APU SOLUTIONS GUIDE
Semi-Industrial Mini- ETX3.0 CPU Module
ITX Motherboard SOM-4466
SIMB-M22 • AMD Embedded G-Series Platform
• AMD A55E Controller Hub
• AMD Embedded G-Series Platform
• Supports DX11, OGL3.2, and H.264/AVC,
• AMD A55E Controller Hub VC-1 HW Acceleration
• Fanless design • DDR3-1066 up to 4GB
• dual display: HDMI, VGA, 18-bit single • PCI, ISA, SMBus, I2C
channel LVDS, eDP (optional)
• 10/100 LAN, SATA, mSATA Socket, IDE,
USB2.0
• Information Appliance, Communications,
Industrial Controllers, Medical, Networking,
Digital Signage
Advantech Advantech
PHONE (949) 789-7178 EMAIL ECGInfo@advantech.com PHONE (949) 789-7178 EMAIL ECGInfo@advantech.com
FAX (949) 789-7179 WEB www.advantech.com/embcore FAX (949) 789-7179 WEB www.advantech.com/embcore
Industrial Computer Mini-ITX
System Motherboard
DPX-E120 GMB-A55E
• AMD Embedded G-Series Platform • AMD Embedded G-Series Platform
• AMD A55E Controller Hub • AMD A55E Controller Hub
• DIMM, 8GB, DDR3, 2x, 1333/1066/800 • Supports HDMI 1.3a, DirectX 11, OpenGL4.0,
• 4x, TypeA, USB 2.0; 2x, Header, USB 2.0 dedicated hardware (UVD3.0) for H.264, VC-
• Comprehensive gaming features 1, MPEG 2, DivX decode
• Low power and compact design • Dual display functions: HDMI, VGA, 18-bit
single channel LVDS
• Easy integration for gaming applications
• Dual Realtek RTL8111DL Gigabit LAN
• Dual monitor support
• 1 PCIe x4 Gen.2, 1 Mini PCIe
• 5 SATA 3.0 (6Gb/s)
• 4 COM (2Powered COM), 8 USB2.0
Advantech Advantech
PHONE 886-2-2792-7818 EMAIL ECGInfo@advantech.com PHONE (949) 789-7178 EMAIL ECGInfo@advantech.com
FAX 886-2-2792-7337 WEB www.advantech.com/embcore FAX (949) 789-7179 WEB www.advantech.com/embcore
PC/104 Single Board Mini-ITX
Computer Motherboard
PCM-3356 MB-7210
• AMD Embedded G-Series Platform • AMD Embedded G-Series Platform
• AMD A55E Controller Hub • SODIMM, 8GB, DDR3, 2x
• Ultra low power • Supports Direct X 11 and Open GL 4
• Supports up to 4 GB DDR3 SODIMM or 1 GB • Supports full bitstream decoding of H.264/
DDR3 on-board memory MPEG-4, AVC, VC-1, DivX, Xvid, MPEG2, as
• 18-bit LVDS and VGA well as Blu-ray 3D
• 3 COM ports, 4 USB 2.0 ports, dual GbE and • Support one PCIe x16 and one PCI slot
audio codec
• Support 6 x COM, GbE, 1 x DVI and CFast
• Support extended temp: -40 ~ 85C, HALT applied
• Medical, Digital Signage, Point of Sale
• Expansion: PC/104 and miniPCIe
• Communications, Industrial Controllers, Point
of Sale
Advantech AEWIN Technologies Co., Ltd.
PHONE 886-2-2792-7818 #1508 EMAIL austin.lo@advantech.com.tw PHONE +886-2-8692 6677 EMAIL sales@aewin.com.tw
FAX 886-2-2792-7337 WEB www.advantech.com/embcore FAX +886-2-8692 6655 WEB www.aewin.com.tw
WWW. AMD . C OM/ EMBED D ED / C ATALOG 13
14. AMD EMBEDDED APU SOLUTIONS GUIDE
PC/104+ module Networking
PM-6101 Appliance
• AMD Embedded G-Series Platform SCB-6979
• Supports ulta-low-power AMD G-T16R
processors • AMD Embedded G-Series Platform
• Dual 10/100/1000Mbps ethernet • One SO-DIMM up to 4GB DDR3 1066MHz
SDRAM
• Support One SATAIII, Two serial ports and
four USB 2.0 ports • Max 6 GbE ports via PCI-e by1
• One DDR3 SO-DIMM socket supports up to • Robust I/O with USB 2.0; 2.5” SATA HDD
4GB memory bay, CF socket, Minicard slot and Console port
• Digital Signage • Built with long-life AMD Embedded
components
• RoHS compliant
AEWIN Technologies Co., Ltd. AEWIN Technologies Co., Ltd.
PHONE +886-2-8692 6677 EMAIL sales@aewin.com.tw PHONE +886-2-8692 6677 EMAIL sales@aewin.com.tw
FAX +886-2-8692 6655 WEB www.aewin.com.tw FAX +886-2-8692 6655 WEB www.aewin.com.tw
Custom Gaming Gaming System
Motherboard SGA-2200
GA-2200 • AMD Embedded G-T56N APU
• with AMD Radeon™ HD 6320 Graphics
• AMD Embedded G-T56N APU
• Also supports AMD G-T44R with AMD
• with AMD Radeon™ HD 6320 Graphics Radeon™ HD 6250
• Also supports AMD G-T44R with AMD • 10 x COM, 2nd RTC and NVRAM
Radeon™ HD 6250
• 10 x COM, 2nd RTC and NVRAM
AEWIN Technologies Co., Ltd. AEWIN Technologies Co., Ltd.
PHONE +886-2-8692 6677 EMAIL sales@aewin.com.tw PHONE +886-2-8692 6677 EMAIL sales@aewin.com.tw
FAX +886-2-8692 6655 WEB www.aewin.com.tw FAX +886-2-8692 6655 WEB www.aewin.com.tw
Mini-ITX Industrial Tablet
Motherboard WA-10
AMDY-7002 • AMD Embedded G-T56N APU
• with AMD Radeon™ HD 6320 Graphics
• AMD Embedded G-Series Platform
• Memory: SODIMM, 2GB, DDR3
• AMD A55E Controller Hub
• Support 1 x SODIMM DDR3 up to 4GB
• 1 x PCIe, 1 x PCI and half size Mini-PCIe
Socket
• CF Socket, CFast Socket and 5 x SATA
Connector
• Gaming, Industrial Controllers, Medical,
Networking, Digital Signage, Point of Sale
American Portwell Technology, Inc. Amtek System Company
PHONE (510) 403-3399 EMAIL info@portwell.com PHONE +886-2-26492212#133 EMAIL ch.chang@amtek.com.tw
FAX (510) 403-3184 WEB www.portwell.com FAX +886-2-26492363 WEB www.amtek.com
14 J ANU ARY 201 3
15. AMD EMBEDDED APU SOLUTIONS GUIDE
No one else provides a similar solution in
terms of performance per watt” Butrash-
AMD APUs Soar
vily said. “One additional advantage of
the AMD G-Series APU is that they are
sold as embedded solutions, meaning a
in Real-Time
good fit for defense solutions that require
long-term availability and durability in
harsh environments.”
Image Processing
Real-Time Threat Detection
“[The defense contractor] needed
semiautomatic systems that could help
W
aid pilots in making decisions” ex-
plained Butrashvily. “The way to do
that is to take images from both aerial
hen the Company quirements. The resulting solution built by and ground systems to stabilize video
for Advanced Su- CASS can serve as a new-generation DSP streams, enabling the detection of im-
percomputing Solu- for sensor and computer-vision platforms, mediate threats.” They required a sys-
tions (CASS) was leveraging a combination of parallel and
approached by an serial processing on a heterogeneous sys-
Israeli defense contractor to create a new tem architecture. Image Registration
field video image registration solution, Image registration is the process of
it was their first venture in working with The challenge transforming a set of sequential im-
an AMD Embedded accelerated processor “Lots of industries use graphics ages (video stream acquired from a
unit (APU). It won’t be the last. processing units (GPUs) for projects sensor) into a similar coordinate sys-
The defense contractor’s executives that include video,” said Mordechai tem, creating a smoother visual flow. In
had come to CASS with a problem: they “Moti” Butrashvily, CASS chief execu- real-life, physical conditions or normal
needed high-quality, smooth, stable tive officer and chief technology offi- movement affect the images a sensor
real-time computer vision images deliv- gathers and may cause vibrations.
cer. CASS has been building solutions
ered from ground and aero systems to around AMD GPUs for years and knew Viewing a continuous frame-set from
back-end systems. The defense contrac- that for applications with a high-degree an image sensor generally looks shaky
tor’s digital signal processing (DSP) and of parallelism—like image processing or unbalanced, as the sensor is often
field-programmable gate array (FPGA) —programmable GPUs offer critical mobile or not stabilized. Image regis-
solutions were not capable of develop- performance advantages. “But we knew tration fixes this problem by smooth-
ing the high-speed, higher-resolution a stand-alone GPU just couldn’t offer ing the output video stream. Applica-
images that could more accurately track a solution that would meet the power tions for image registration vary from
defense to medical imaging and more.
motion—tracking missiles as they are consumption and size constraints of the
carried on a moving vehicle or detect- defense contractor.” Typical registration process stages in-
ing a person climbing into a bunker, for Butrashvily and his team looked at clude: identifying movement vectors
example. a variety of possible solutions, and real- between two relative images, per-
CASS was asked to create a compact ized their options were rather limited. forming alignment, and applying fur-
system that could process a frame-by-frame Few manufacturers can offer the per- ther correction/enhancement filters to
720p video input stream at 120 frames per formance needed without compromis- improve image and stream quality.
second. While the defense contractor im- ing on size or power consumption. The In defense, sensor-based components
posed constraints around maximum size CASS team found their research kept use registration from ground to aerial
and maximum power consumption, CASS pointing them to the AMD Embedded systems with different applications.
was otherwise unlimited in how it could G-Series APU, which combines the par- Adding to its complexity, defense
design the solution. allel processing capabilities of a GPU applications require very high per-
So, the company got creative. By with the serial processing capabilities formance computations (high resolu-
making the right algorithmic adjustments of a CPU in a small footprint and low tions and frame rates) and have limited
and choosing an appropriate architecture, power solution. space for hardware, dictating a small
system size. This requires a solution
the resulting application runs at real-time “We evaluated several solutions, and
with good heat dissipation and ability
speeds where other competitive solutions nothing else compared to the APU for to consistently operate at low power.
(DSP and FPGA) failed to meet the re- size, power consumption and capabilities.
WWW. AMD . C OM/ EMBED D ED / C ATALOG 15
16. AMD EMBEDDED APU SOLUTIONS GUIDE
tem that was compact and low power faster-than-real-time processing, CASS livered very well for the selected appli-
enough to be used in unmanned aerial leveraged parallel processing for the in- cation and environment,” Butrashvily
and ground vehicle (UAV and UGV) tensive dense matrix operations, includ- explained. “This solution provides un-
surround-vision systems for continuous ing GEMM (matrix multiplication), matched performance when you take
monitoring of objects and threats any- GEMV (matrix-vector multiplication) into account the power consumption
where in the world. and GESV (matrix Inverse), achieving and size requirements.”
The AMD G-T56N APU met the up to 130 times the performance of run- The performance achieved was im-
power requirements of the system, and ning those basic building blocks with pressive; showing nearly 150 frames per
could deliver the high performance nec- the AMD BLAS (basic linear algebra second (FPS) peak at HD resolution of
essary to meet the image registration subprograms) libraries on the processor 1280x720 with 16-bits per pixel, mea-
goals. Since the processor had to employ alone. To verify the numeric stability, sured from input to output of corrected
further image filtering to enhance results, which is especially important in long- images. With the AMD Embedded G-
CASS needed to ensure there was enough running, mission-critical operations, Series APU, CASS was able to achieve
performance overhead to run additional the arithmetic results of the APU were the following:
algorithms while maintaining real-time compared to the x86 CPU following • eal-time performance
R
operation. CASS selected OpenCL™ to IEEE 754 standard. CASS found high • rocessing of 120 frames per second
P
implement the accelerated algorithm correspondence and accuracy, assuring sustained
building blocks. that the system achieves great numeri- • D sensor input resolution of 720p
H
In the prototype the APU served as cal stability. (1280x720)
a digital signal and image processor, and • 0 to 30 times the performance of
2
was connected to a sensor. “We tested the performing the entire algorithm on a
APU to see if we could achieve the real- The Results traditional CPU
time performance the sensors require,” Within two months, CASS com-
Butrashvily explained. “There was no op- pleted the prototype development, in- The overall algorithm processing flow
tion for delays: the signal had to be pro- cluding software optimization. The so- was complex, incorporating additional
cessed at the time it was being received lution was developed to support Linux, filters for image enhancement, therefore
with minimum latency.” Windows and their embedded variants. runtime speedup was summarized by 20
The entire algorithm was imple- The algorithmic processing engine to 30 times.
mented in OpenCL, with the APU serv- was also integrated with OpenGL, de- For its next steps, CASS is working
ing as the host manager/coordinator and livering a live display of the processed on support for hard real-time operating
frame grabber. With the goal to achieve results. “The AMD G-T56N APU de- systems, hardware commercialization and
board design to match sensor dimensional
constraints, and support for next-gener-
ation APUs for even higher performance
AMD Embedded Solutions... and resolutions.
Coming Soon to an Event Near You! Moreover, because the job was not
Check out the latest AMD Embedded products and our partner solutions at proprietary to the defense company, CASS
one of these upcoming events. If you’re interested in scheduling a meeting is researching additional applications of its
with an AMD representative at any of these shows, please contact us at new APU-based image registration tech-
embedded@amd.com. nology. Being an important core compo-
To learn more go to: www.amd.com/embedded nent in many image-processing systems,
UPCOMING SEMINAR DATES: registration has relevance for other applica-
1/13-16 NRF (New York, NY) RTECC Roadshow: tions in defense, medical imaging and ma-
1/30 Taiwan Embedded Forum (Taiwan) 1/24 Santa Clara, CA
chine vision.
2/5-7 ICE (London, UK) 3/19 Dallas, TX
2/26-28 Digital Signage Expo (Las Vegas, NV) 3/21 Austin, TX
2/26-28 Embedded World (Nuremberg, Germany) 4/16 Washington, DC
4/22-25 Design West (San Jose, CA) 4/18 Hanover, MD
5/8-10 ESEC (Tokyo, Japan) 5/7 Nashua, NH
5/9 Boston, MA
6/18 Denver, CO
6/20 Salt Lake City, UT
16 J ANU ARY 201 3