Full Description
Using the new OpenCL (Open Computing Language) standard, you can write applications that access all available programming resources: CPUs, GPUs, and other processors such as DSPs and the Cell/B.E. processor. Already implemented by Apple, AMD, Intel, IBM, NVIDIA, and other leaders, OpenCL has outstanding potential for PCs, servers, handheld/embedded devices, high performance computing, and even cloud systems. This is the first comprehensive, authoritative, and practical guide to OpenCL 1.1 specifically for working developers and software architects.

Written by five leading OpenCL authorities, OpenCL Programming Guide covers the entire specification. It reviews key use cases, shows how OpenCL can express a wide range of parallel algorithms, and offers complete reference material on both the API and OpenCL C programming language.

Through complete case studies and downloadable code examples, the authors show how to write complex parallel programs that decompose workloads across many different devices. They also present all the essentials of OpenCL software performance optimization, including probing and adapting to hardware.

Coverage includes:
- Understanding OpenCL's architecture, concepts, terminology, goals, and rationale
- Programming with OpenCL C and the runtime API
- Using buffers, sub-buffers, images, samplers, and events
- Sharing and synchronizing data with OpenGL and Microsoft's Direct3D
- Simplifying development with the C++ Wrapper API
- Using OpenCL Embedded Profiles to support devices ranging from cellphones to supercomputer nodes
- Case studies dealing with physics simulation; image and signal processing, such as image histograms, edge detection filters, Fast Fourier Transforms, and optical flow; math libraries, such as matrix multiplication and high-performance sparse matrix multiplication; and more

Source code for this book is available at https://code.google.com/p/opencl-book-samples/
Contents
Figures
Tables
Listings
Foreword
Preface
Acknowledgments
About the Authors

Part I: The OpenCL 1.1 Language and API

Chapter 1: An Introduction to OpenCL
- What Is OpenCL, or . . . Why You Need This Book
- Our Many-Core Future: Heterogeneous Platforms
- Software in a Many-Core World
- Conceptual Foundations of OpenCL
- OpenCL and Graphics
- The Contents of OpenCL
- The Embedded Profile
- Learning OpenCL

Chapter 2: HelloWorld: An OpenCL Example
- Building the Examples
- HelloWorld Example
- Checking for Errors in OpenCL

Chapter 3: Platforms, Contexts, and Devices
- OpenCL Platforms
- OpenCL Devices
- OpenCL Contexts

Chapter 4: Programming with OpenCL C
- Writing a Data-Parallel Kernel Using OpenCL C
- Scalar Data Types
- Vector Data Types
- Other Data Types
- Derived Types
- Implicit Type Conversions
- Explicit Casts
- Explicit Conversions
- Reinterpreting Data as Another Type
- Vector Operators
- Qualifiers
- Keywords
- Preprocessor Directives and Macros
- Restrictions

Chapter 5: OpenCL C Built-In Functions
- Work-Item Functions
- Math Functions
- Integer Functions
- Common Functions
- Geometric Functions
- Relational Functions
- Vector Data Load and Store Functions
- Synchronization Functions
- Async Copy and Prefetch Functions
- Atomic Functions
- Miscellaneous Vector Functions
- Image Read and Write Functions

Chapter 6: Programs and Kernels
- Program and Kernel Object Overview
- Program Objects
- Kernel Objects

Chapter 7: Buffers and Sub-Buffers
- Memory Objects, Buffers, and Sub-Buffers Overview
- Creating Buffers and Sub-Buffers
- Querying Buffers and Sub-Buffers
- Reading, Writing, and Copying Buffers and Sub-Buffers
- Mapping Buffers and Sub-Buffers

Chapter 8: Images and Samplers
- Image and Sampler Object Overview
- Creating Image Objects
- Creating Sampler Objects
- OpenCL C Functions for Working with Images
- Transferring Image Objects

Chapter 9: Events
- Commands, Queues, and Events Overview
- Events and Command-Queues
- Event Objects
- Generating Events on the Host
- Events Impacting Execution on the Host
- Using Events for Profiling
- Events Inside Kernels
- Events from Outside OpenCL

Chapter 10: Interoperability with OpenGL
- OpenCL/OpenGL Sharing Overview
- Querying for the OpenGL Sharing Extension
- Initializing an OpenCL Context for OpenGL Interoperability
- Creating OpenCL Buffers from OpenGL Buffers
- Creating OpenCL Image Objects from OpenGL Textures
- Querying Information about OpenGL Objects
- Synchronization between OpenGL and OpenCL

Chapter 11: Interoperability with Direct3D
- Direct3D/OpenCL Sharing Overview
- Initializing an OpenCL Context for Direct3D Interoperability
- Creating OpenCL Memory Objects from Direct3D Buffers and Textures
- Acquiring and Releasing Direct3D Objects in OpenCL
- Processing a Direct3D Texture in OpenCL
- Processing D3D Vertex Data in OpenCL

Chapter 12: C++ Wrapper API
- C++ Wrapper API Overview
- C++ Wrapper API Exceptions
- Vector Add Example Using the C++ Wrapper API

Chapter 13: OpenCL Embedded Profile
- OpenCL Profile Overview
- 64-Bit Integers
- Images
- Built-In Atomic Functions
- Mandated Minimum Single-Precision Floating-Point Capabilities
- Determining the Profile Supported by a Device in an OpenCL C Program

Part II: OpenCL 1.1 Case Studies

Chapter 14: Image Histogram
- Computing an Image Histogram
- Parallelizing the Image Histogram
- Additional Optimizations to the Parallel Image Histogram
- Computing Histograms with Half-Float or Float Values for Each Channel

Chapter 15: Sobel Edge Detection Filter
- What Is a Sobel Edge Detection Filter?
- Implementing the Sobel Filter as an OpenCL Kernel

Chapter 16: Parallelizing Dijkstra's Single-Source Shortest-Path Graph Algorithm
- Graph Data Structures
- Kernels
- Leveraging Multiple Compute Devices

Chapter 17: Cloth Simulation in the Bullet Physics SDK
- An Introduction to Cloth Simulation
- Simulating the Soft Body
- Executing the Simulation on the CPU
- Changes Necessary for Basic GPU Execution
- Two-Layered Batching
- Optimizing for SIMD Computation and Local Memory
- Adding OpenGL Interoperation

Chapter 18: Simulating the Ocean with Fast Fourier Transform
- An Overview of the Ocean Application
- Phillips Spectrum Generation
- An OpenCL Discrete Fourier Transform
- A Closer Look at the FFT Kernel
- A Closer Look at the Transpose Kernel

Chapter 19: Optical Flow
- Optical Flow Problem Overview
- Sub-Pixel Accuracy with Hardware Linear Interpolation
- Application of the Texture Cache
- Using Local Memory
- Early Exit and Hardware Scheduling
- Efficient Visualization with OpenGL Interop
- Performance

Chapter 20: Using OpenCL with PyOpenCL
- Introducing PyOpenCL
- Running the PyImageFilter2D Example
- PyImageFilter2D Code
- Context and Command-Queue Creation
- Loading to an Image Object
- Creating and Building a Program
- Setting Kernel Arguments and Executing a Kernel
- Reading the Results

Chapter 21: Matrix Multiplication with OpenCL
- The Basic Matrix Multiplication Algorithm
- A Direct Translation into OpenCL
- Increasing the Amount of Work per Kernel
- Optimizing Memory Movement: Local Memory
- Performance Results and Optimizing the Original CPU Code

Chapter 22: Sparse Matrix-Vector Multiplication
- Sparse Matrix-Vector Multiplication (SpMV) Algorithm
- Description of This Implementation
- Tiled and Packetized Sparse Matrix Representation
- Header Structure
- Tiled and Packetized Sparse Matrix Design Considerations
- Optional Team Information
- Tested Hardware Devices and Results
- Additional Areas of Optimization

Appendix: Summary of OpenCL 1.1
- The OpenCL Platform Layer
- The OpenCL Runtime
- Buffer Objects
- Program Objects
- Kernel and Event Objects
- Supported Data Types
- Vector Component Addressing
- Preprocessor Directives and Macros
- Specify Type Attributes
- Math Constants
- Work-Item Built-In Functions
- Integer Built-In Functions
- Common Built-In Functions
- Math Built-In Functions
- Geometric Built-In Functions
- Relational Built-In Functions
- Vector Data Load/Store Functions
- Atomic Functions
- Async Copies and Prefetch Functions
- Synchronization, Explicit Memory Fence
- Miscellaneous Vector Built-In Functions
- Image Read and Write Built-In Functions
- Image Objects
- Image Formats
- Access Qualifiers
- Sampler Objects
- Sampler Declaration Fields
- OpenCL Device Architecture Diagram
- OpenCL/OpenGL Sharing APIs
- OpenCL/Direct3D 10 Sharing APIs

Index