Tuesday 19 June 2012

NVIDIA Talking about Kepler at ISC

I am at the International Supercomputing Conference (ISC) this year and have just sat in on an NVIDIA satellite event, "Inside Kepler", presented by Thomas Bradley.

The talk introduced some of the key features of Kepler and explained a few things that I hadn't picked up on from the NVIDIA publicity or from a first glance at the whitepaper.

First outlined was the SM->SMX change, down to the core level, and how halving the clock has helped performance per watt. All fairly standard stuff.

Secondly, dynamic parallelism was introduced, allowing work to be enqueued from the device with up to 32 levels of recursion. I think this is the coolest feature of Kepler, though I doubt it will make its way into OpenCL any time soon (and I say that as a clear OpenCL fanboy). In fact, there was not a single mention of OpenCL in the entire talk, which was kind of sad. Though I guess it is good that there is some competition in that space: if AMD wants to push into HPC, they will be driving the development of OpenCL (or maybe HSA?) to match that of CUDA.
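To make the idea of launching work from the device concrete, here is a minimal sketch of a kernel that re-launches itself a fixed number of times. It was not shown in the talk; the names `recurse` and `count_nested_launches` are made up for illustration, and it assumes a GPU of compute capability 3.5 or later and a build with relocatable device code (something like `nvcc -rdc=true` with a suitable `-arch`). Error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each instance counts itself, then launches the next level from the device.
__global__ void recurse(int *launches, int depth, int max_depth)
{
    atomicAdd(launches, 1);
    if (depth + 1 < max_depth) {
        // Device-side launch: this is the dynamic parallelism part.
        recurse<<<1, 1>>>(launches, depth + 1, max_depth);
    }
}

// Hypothetical host helper: returns how many kernel instances ran in total.
int count_nested_launches(int max_depth)
{
    int *d_launches = nullptr;
    int h_launches = 0;
    cudaMalloc(&d_launches, sizeof(int));
    cudaMemset(d_launches, 0, sizeof(int));

    recurse<<<1, 1>>>(d_launches, 0, max_depth);
    // A parent grid is not complete until all of its child grids are,
    // so one host-side synchronise covers the whole chain.
    cudaDeviceSynchronize();

    cudaMemcpy(&h_launches, d_launches, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_launches);
    return h_launches;
}

int main()
{
    printf("kernel instances run: %d\n", count_nested_launches(4));
    return 0;
}
```

With `max_depth` set to 4 the counter should come back as 4: one launch from the host and three from the device.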

Hyper-Q, which allows for multiple streams of commands (pretty much the same as OpenCL command queues), was pretty interesting too, since it allows multiple MPI processes to operate on a single GPU.
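On the programming side, the streams of commands are ordinary CUDA streams. The sketch below is my own illustration rather than anything from the talk: it queues an independent kernel on each of two streams, which is the kind of work the hardware can then overlap. The kernel `fill` and the helper `run_two_streams` are hypothetical names, and error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel: write the same value into every element of a buffer.
__global__ void fill(float *x, int n, float value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = value;
}

// Hypothetical helper: fills one buffer per stream (with 1.0 and 2.0)
// and returns the sum of the first element of each buffer.
float run_two_streams(int n)
{
    const int kStreams = 2;
    cudaStream_t stream[kStreams];
    float *d_buf[kStreams];
    const int blocks = (n + 255) / 256;

    // Allocate everything first so the launches are not separated by mallocs.
    for (int s = 0; s < kStreams; ++s) {
        cudaStreamCreate(&stream[s]);
        cudaMalloc(&d_buf[s], n * sizeof(float));
    }

    // Work in different streams has no ordering between the streams,
    // so the device is free to run these kernels concurrently.
    for (int s = 0; s < kStreams; ++s) {
        fill<<<blocks, 256, 0, stream[s]>>>(d_buf[s], n, float(s + 1));
    }

    float total = 0.0f;
    for (int s = 0; s < kStreams; ++s) {
        float first = 0.0f;
        cudaStreamSynchronize(stream[s]);
        cudaMemcpy(&first, d_buf[s], sizeof(float), cudaMemcpyDeviceToHost);
        total += first;
        cudaFree(d_buf[s]);
        cudaStreamDestroy(stream[s]);
    }
    return total;
}

int main()
{
    printf("sum of first elements: %.1f\n", run_two_streams(1 << 20));
    return 0;
}
```

Whether the two kernels actually overlap depends on the hardware and on how much of the GPU each one occupies; the code only expresses that they are allowed to.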

Most exciting of all were the new shuffle instructions and the performance improvements to the atomic functions. The shuffle instructions allow threads within a warp to communicate with each other directly. This is very exciting since, perhaps using the "butterfly shuffle" form, you can essentially do halo exchange between threads in a warp. Obviously, you would still have to perform global synchronisation between iterations of your lattice Boltzmann simulation (for example) if it operates across warps. However, you can achieve this pretty easily now using dynamic parallelism!
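As a rough sketch of what intra-warp exchange looks like in code (again my own example, not one from the talk), the kernel below has each lane fetch its left and right neighbours' values with shuffle up/down, and then does a butterfly (XOR) reduction so that every lane ends up with the warp-wide sum. It uses the `__shfl_*_sync` intrinsics from current CUDA toolkits; the Kepler-era toolkit spelled these `__shfl_up`, `__shfl_down` and `__shfl_xor`, without the mask argument. The names `warp_exchange` and `run_warp_exchange` are hypothetical, and a single warp of 32 threads is assumed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define FULL_MASK 0xffffffffu

__global__ void warp_exchange(const int *in, int *halo_sum, int *warp_sum)
{
    int lane = threadIdx.x;  // assumes exactly one warp of 32 threads
    int mine = in[lane];

    // Neighbour ("halo") exchange: lane i reads lane i-1 and lane i+1.
    // Lanes with no such neighbour (0 and 31) get their own value back.
    int left  = __shfl_up_sync(FULL_MASK, mine, 1);
    int right = __shfl_down_sync(FULL_MASK, mine, 1);
    halo_sum[lane] = left + mine + right;

    // Butterfly reduction: at each step a lane adds the value held by the
    // lane whose index differs in one bit, so all lanes finish with the sum.
    int total = mine;
    for (int mask = 16; mask > 0; mask >>= 1) {
        total += __shfl_xor_sync(FULL_MASK, total, mask);
    }
    warp_sum[lane] = total;
}

// Hypothetical host helper: runs the kernel on the input 0, 1, ..., 31.
void run_warp_exchange(int halo[32], int warp[32])
{
    int h_in[32];
    for (int i = 0; i < 32; ++i) h_in[i] = i;

    int *d_in, *d_halo, *d_warp;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_halo, 32 * sizeof(int));
    cudaMalloc(&d_warp, 32 * sizeof(int));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    warp_exchange<<<1, 32>>>(d_in, d_halo, d_warp);

    cudaMemcpy(halo, d_halo, 32 * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(warp, d_warp, 32 * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_halo);
    cudaFree(d_warp);
}

int main()
{
    int halo[32], warp[32];
    run_warp_exchange(halo, warp);
    printf("halo sums: lane 0 = %d, lane 1 = %d, lane 31 = %d\n",
           halo[0], halo[1], halo[31]);
    printf("warp sum seen by lane 0 = %d, by lane 31 = %d\n",
           warp[0], warp[31]);
    return 0;
}
```

No shared memory or `__syncthreads()` is involved; the values move directly between the registers of threads in the same warp.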


It will also be interesting to see whether the performance enhancements to the atomic operations (global reduction instructions), up to 5x for the slowest, have been extended to OpenCL, i.e. is it a new set of instructions or a lower-level enhancement to the same old instructions? It should become clear which is the case as soon as I get my hands on Kepler.
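For reference, the kind of global reduction I have in mind looks roughly like the sketch below, where every thread adds its element into a single global counter with `atomicAdd`. This is deliberately the naive form, since it is the contended case that faster atomics would help most; the names `sum_with_atomics` and `atomic_sum_of_ones` are made up for illustration, and error checking is omitted.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Naive global reduction: every thread hits the same address atomically.
__global__ void sum_with_atomics(const int *in, int n, int *result)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(result, in[i]);
}

// Hypothetical host helper: sums an array of n ones on the GPU.
int atomic_sum_of_ones(int n)
{
    std::vector<int> h_in(n, 1);
    int *d_in = nullptr, *d_result = nullptr;
    int h_result = 0;

    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_result, sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_result, 0, sizeof(int));

    sum_with_atomics<<<(n + 255) / 256, 256>>>(d_in, n, d_result);

    // The blocking copy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_result, d_result, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_result);
    return h_result;
}

int main()
{
    printf("sum = %d\n", atomic_sum_of_ones(1 << 20));
    return 0;
}
```

Timing this kernel on pre-Kepler and Kepler hardware, and doing the same with an equivalent OpenCL kernel, would be one way to see where the speed-up comes from.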
