The program models a physical object, which is partitioned and distributed over the processors.
Each processor performs the same task on different data
e.g. matrix operations
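The data-parallel pattern above can be sketched with Python's `multiprocessing` module; the matrix, the scaling task, and the function names here are illustrative, not part of the original:

```python
from multiprocessing import Pool

def scale_row(row):
    # The same task (scaling by 2) is applied to a different piece of the data
    return [2 * x for x in row]

if __name__ == "__main__":
    matrix = [[1, 2], [3, 4], [5, 6]]
    # Each worker process handles its own row of the partitioned matrix
    with Pool(processes=3) as pool:
        result = pool.map(scale_row, matrix)
    print(result)  # [[2, 4], [6, 8], [10, 12]]
```

Here the matrix is the "physical object" that gets partitioned: each worker runs identical code, only its slice of the data differs.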
There is a list of tasks, and processors cycle through the list until it is exhausted.
Each processor performs a different task on the same data
e.g. parameter tuning when training machine learning models
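Task parallelism can be sketched the same way: a shared task list is drained by a pool of workers, each applying a different function to the same data. The data set and the choice of statistics functions are assumptions for illustration:

```python
from multiprocessing import Pool
import statistics

# One shared data set; each task is a *different* function applied to it
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
tasks = [min, max, statistics.mean, statistics.stdev]

def run_task(task):
    return task.__name__, task(data)

if __name__ == "__main__":
    # pool.map hands tasks out from the list until it is exhausted
    with Pool(processes=len(tasks)) as pool:
        for name, value in pool.map(run_task, tasks):
            print(name, value)
```

This mirrors the parameter-tuning example: the training data is fixed, while each worker evaluates a different configuration (here, a different function) against it.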
3584 CUDA cores
1417–1531 MHz (base–boost clock)
12 GB GDDR5X Memory @ 10 Gbps
384-bit memory interface, 480 GB/s bandwidth
250 W
12 billion transistors
11 TFLOPS FP32