Just a short one for once to say that I couldn't agree more with this and this: AMD's support on Linux really isn't ideal compared to Windows. It's a shame when the software can't keep up with the hardware. Zero copy buffers really should be in Linux! Please! While their Linux support lags behind Windows I will not be buying an APU. At least they do still support Linux, unlike some other high-profile vendors.
EDIT 31/6/12: Looks like there may be more dark clouds ahead for AMD on Linux: they're only going to be rolling out updates based on game necessity - potentially bad news if your OS doesn't have any games!
All via Phoronix.
Tuesday, 29 May 2012
Customising Emacs
The default emacs, I must admit, is pretty ugly and basic. I use a customisation script at each run - as is common - to customise it to my taste (similar in a way to your familiar .bashrc file). Customisations in emacs are done by way of a so-called '.emacs' file. This is basically a file in your home directory that gets run every time you start up emacs. I use it to set a nice colour theme, line numbering and some handy shortcuts.
When I refer to the emacs include directory I mean that I have a directory in my home folder called .emacs.d which contains all the *.el files that are elisp scripts for large customisations written by others. The first thing I would recommend doing is add the following to your .emacs file to have emacs look in that directory for extensions:
(add-to-list 'load-path "~/.emacs.d/")
Next, I quite like having line numbering for the files I'm editing so I can reference around the file quickly.
Put the following file in your emacs include directory then add the following lines to your .emacs:
(require 'linum)
(global-linum-mode 1)
Next, is to make it look pretty with some colo(u)r themes. I'm not going to rehash the excellent instructions provided for you here but it pretty much follows the standard format. FWIW I use color-theme-calm-forest but there are plenty in there for you to explore.
Showing whitespace is a useful feature to ensure you maintain tidy code. This can be enabled with a simple M-x command (for example, M-x whitespace-mode) to show the whitespace. Further instruction here.
To switch the cursor between the visible buffers (e.g. in a split screen) with the C-x <direction> command, add the following couple of lines to your .emacs file: it basically remaps the commands to some nice intuitive combinations. Very useful if you don't ever want to have to use the mouse when editing text.
(global-set-key (kbd "C-x <up>") 'windmove-up)
(global-set-key (kbd "C-x <down>") 'windmove-down)
(global-set-key (kbd "C-x <right>") 'windmove-right)
(global-set-key (kbd "C-x <left>") 'windmove-left)
In a similar vein, you can set some compile and recompile hotkeys (to save you typing M-x compile or M-x recompile each time)...
(global-set-key [(f9)] 'compile)
(global-set-key [(f10)] 'recompile)
When you first get into emacs you'll notice that the scrolling isn't as nice as it could be. You can use the following few lines to soup it up so it's much smoother and friendlier:
(defun smooth-scroll (increment)
  ;; Scroll in six small steps with short pauses in between
  (scroll-up increment) (sit-for 0.05)
  (scroll-up increment) (sit-for 0.02)
  (scroll-up increment) (sit-for 0.02)
  (scroll-up increment) (sit-for 0.05)
  (scroll-up increment) (sit-for 0.06)
  (scroll-up increment))
(global-set-key [(mouse-5)] '(lambda () (interactive) (smooth-scroll 1)))
(global-set-key [(mouse-4)] '(lambda () (interactive) (smooth-scroll -1)))
So you have line numbers from earlier; now you might want to jump to a line with the familiar Ctrl-l command like in Eclipse, for example. Simple: bind the goto-line command to C-l.
(global-set-key "\C-l" 'goto-line)
I'm quite a stickler for nice indentation even on my own code. More specifically, I prefer spaces rather than tabs (4 spaces) and I prefer newlines for my curly braces. More can be found on this style, known as Allman style. This was used extensively in BSD, so you can set that parameter just below along with the others.
(setq-default indent-tabs-mode nil)   ; indent with spaces rather than tabs
(setq-default tab-width 4)            ; default-tab-width is obsolete in newer emacs
(setq-default c-basic-offset 4)
(setq-default c-default-style "bsd")
And finally, for fun, you can set your frame title :)
(setq frame-title-format "%b - [Your name here]'s Emacs")
You might also find it useful to configure autocomplete, though I find this can be slightly irritating at times.
Hopefully that's enough to get you going with making emacs more friendly for you. As is usual, you can often find what you want via google which generally takes you to the emacswiki.
Sunday, 27 May 2012
Visualising MPI Communication With MPE
Visualising with MPE
A couple of weeks ago I had the task of accelerating some fluid dynamics code using MPI (message passing interface), which uses messages (as you might have guessed) to help parallelise code (versus OpenMP, which uses a shared memory model). This was done across four nodes each containing two dual core CPUs (so 8 CPUs and 16 cores in total) and achieved a speedup of 11x, which I was fairly pleased with.
The most interesting part of the project was visualising the communication between the nodes. To do this myself would have been nigh-on impossible: with disparate nodes across a large system, each with its own timings, it would be very hard to synchronise the communications and logging correctly to collect and visualise later.
Fortunately for me there exists the MPE project which supplies performance visualisations for MPI.
Having configured MPE successfully on the cluster, I was able to add a couple of lines to my code (see below) to tell it that I wanted a visualisation.
There are two ways of doing this: the first is to have MPE log the images to disk for you to assemble later (using xwd) whilst the second way is to have it spawn a window to show you it in real time. Since this simulation would require around 10,000 simulations and I didn't fancy stitching together so many images, I decided to use the second option: spawn a window via X-forwarding and do a screen cap. Yes, hacky and horrible I know!
Anyway, here is the outline of the code I had to add to get it up and running. As you can see it's all fairly simple and doesn't interfere with any of the pre-existing MPI code. Hats off to ANL for this, it's very useful.
#include "mpe.h"
#include "mpe_graphics.h"

//...the start of your program here and initialise your MPI env. as per usual

MPE_XGraph graph;

// Open the MPE graphics window of size 400x400 at 600,-1
int ierr = MPE_Open_graphics( &graph, MPI_COMM_WORLD, NULL, 600, -1, 400, 400, 0 );

// Alternatively you can set the capture file...
//ierr = MPE_CaptureFile(graph, "outputimage", 1);

if(ierr != MPE_SUCCESS)
{
    printf("Error Launching X world\n");
    MPI_Abort(MPI_COMM_WORLD, 1);
    exit(1);
}

//...skip out some code here and we enter the main body of our MPI code.

MPI_Isend(&cells[ topload ], NX, structTypeTop, (rank+1) % size, 1, MPI_COMM_WORLD, &request);
MPI_Isend(&cells[ bottomload ], NX, structTypeBot, (rank+size-1) % size, 2, MPI_COMM_WORLD, &request);

// This forces our graph to update after having seen the most recent comms
ierr = MPE_Update( graph );

MPI_Recv(&cells[ bottomloadR ], NX, structTypeTop, (rank+size-1) % size, 1, MPI_COMM_WORLD, &status);
MPI_Recv(&cells[ toploadR ], NX, structTypeBot, (rank+1) % size, 2, MPI_COMM_WORLD, &status);

// Again, after receiving let's update our graph
ierr = MPE_Update( graph );

// ... continue looping round. When we are finished with MPI let's sync our clocks..
MPE_Log_sync_clocks();
MPE_Finish_log(argv[0]);

// And also close our graphics
MPE_Close_graphics(&graph);
Friday, 25 May 2012
Getting Started With emacs
Emacs is the text editor that I use day to day for the majority of my tasks, including programming, note taking and scripting.
It's simple, fast and powerful. Here I'm going to attempt to give you a quickstart guide to hopefully give you a leg up over the learning wall that is emacs.
Yes, I accept it's not the most intuitive thing to begin with but bear with it - you'll get used to it and never look back. Gone are the days of moving your hands back and forth to and from the mouse whilst editing text.
Emacs has so many features and addons that an oft quoted phrase is "Emacs is a great operating system, shame about the text editor" - obviously tongue in cheek.
There are other text editors available. But let's not discuss them. Emacs does also have a good little built in tutorial but I thought I would write this one so you can browse at your leisure. It also lets me document the commands and functionality I use most.
Ok so assuming you have successfully installed and set up emacs (not going to cover that - we'll assume you're proficient with synaptic or w/e) the first thing you are greeted with is a fairly drab looking UI:
So this basically shows the main text editing area with a line along the bottom indicating all sorts of modes that you might be in, followed by the so-called 'minibuffer' - this is an important area (as you'll see imminently). Otherwise, it looks pretty much like your regular text editor, right? The main area in the middle is called the main buffer.
So you want to start typing? Easy, click file->visit new file which will then prompt you to choose a name for it - choose one and then click OK. Easy.
Ok maybe that was too easy and a bit cheaty. That has opened up a new file of your choosing...but how do you do that without the mouse? When you clicked file you might have noticed that next to the visit new file text it said C-x C-f. That's emacs speak for Ctrl-x Ctrl-f. Try it now. When you press Ctrl and x together, followed by Ctrl and f together, you'll see that the minibuffer (the empty area at the bottom) is activated, prompting you to 'find file'. You can use this almost like a regular command prompt and tab-complete your way to the location of the pre-existing or non-existent file that you wish to edit/create.
That's the complete basics - you can open your file. From there all you have to know is that M-x means Alt-x (or Esc followed by x, if Alt doesn't work on your system).
Yes it's a bit fiddly and makes your hands hurt after a while but trust me, it gets easier.
The table below lists some of my most favoured emacs commands that I use on a daily basis. They all follow the same form - either Ctrl (C-) or Alt (M-) held down with some key, often followed by another key or a command name.
The all-important one to remember is C-g. That cancels whatever you might be in the middle of, e.g. you're in the middle of opening a file and you realise you need to modify the buffer (the file that is currently open), so you C-g out of it to get back to the buffer. Phew.
Useful Commands
C-x C-f | Open a buffer |
C-x C-s | Save the file |
C-x C-w | Save the file with a new name |
C-home | Move to the start of the buffer |
C-end | Go to the end of the buffer |
C-space | Begin highlighting |
M-w | Copy |
C-w | Cut |
C-y | Paste |
M-x replace-string | Replace string (find & replace) use the minibuffer to guide you through a string replacement (yes you literally type in replace-string) |
M-x query-replace | Replace string with prompts for each find |
C-s | Search for a given string using the minibuffer |
M-x reverse-region | Reverse the lines in a region |
C-x C-c | Close emacs |
C-x 3 | Split the window side by side (a vertical divider) |
C-x 2 | Split the window one above the other (a horizontal divider) |
C-x 1 | Make the current window fill the whole frame |
C-x 0 | Close the active window |
C-x C-b | List the open buffers (C-x b switches to one by name) |
M-x c++-mode | Change highlighting to C++ mode (you can substitute your current language, e.g. python-mode). Emacs generally does this automatically depending on the file extension. |
Another handy thing to do is to left click with the ctrl modifier pushed down, this will give you a handy little menu to choose between the currently open buffers.
That's the basics...I will most likely post again with details of the customisations I use for smooth scrolling and pretty colour schemes.
Friday, 18 May 2012
Anki
A piece of software I have been using loads over the past couple of years for studying is Anki. It is absolutely incredible.
Anki is a type of electronic flashcard system that uses spaced repetition to help you learn what's on the sides of the flashcard. After each class I type up my notes into Anki in the form of question and answer (front and back of flashcard). Then, when I get to exams, I don't have to bother leafing through all my notes; all I do is fire up Anki and learn the 150 cards or so that are there, either on the computer every morning or whilst I'm travelling.
Unlike with regular flashcards, it's much easier to be strict with yourself and only pass a card when you truly know the answer. I use it for Spanish vocab too, as well as technical stuff from my course. I would seriously recommend it; my exam scores have gone up over the time I've been using Anki.
Also, when you get round to past papers, you can put all the questions and answers into Anki so that you know them too! Awesome.
You can also embed LaTeX formulae and images into Anki, which is handy. If you have exams coming up, I implore you to try it out.
There are also some awesome decks online for Anki that help you learn the capitals of the world (for example) as well as vocab, Kanji etc...
Xorg.conf x4 Monitors
As promised, below is my updated xorg.conf. As predicted, it was a bit of a nightmare getting it up and running with lots of trial and error. I followed this guide to using dual output on my integrated graphics from the i7-2600. However I had the added complication that I couldn't use xrandr whilst my nvidia drivers were loaded, so I had to unplug the nvidia card to find out the identity of the outputs for the option lines for the Intel device.
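Once xrandr is usable, the output names for those option lines can be pulled straight out of its listing. A minimal sketch, assuming the usual `xrandr --query` layout where each output line starts with the output name followed by `connected` or `disconnected`; the function name is made up for illustration.

```shell
#!/bin/bash
# Print the names of connected outputs (e.g. HDMI1, VGA1) from an xrandr
# listing read on stdin. list_connected_outputs is a hypothetical helper.
list_connected_outputs() {
    # Field 1 is the output name, field 2 its connection state.
    awk '$2 == "connected" { print $1 }'
}

# Usage (needs a running X session):
#   xrandr --query | list_connected_outputs
```

Each name printed is what goes after `monitor-` in the Intel device's option lines.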
Anyway, it's done now - that's half a day I'm not getting back. Enjoy.
Section "Module"
    Load "dri"
    Load "dbe"
    Load "extmod"
    Load "type1"
    Load "freetype"
    Load "glx"
EndSection

Section "Monitor"
    Identifier "Monitor2"
    VendorName "Unknown"
    Option "Position" "0 0"
    ModelName "Samsung SyncMaster"
    HorizSync 30.0 - 81.0
    VertRefresh 56.0 - 60.0
    Option "DPMS"
    Option "PreferredMode" "1920x1080"
EndSection

Section "Monitor"
    Identifier "Monitor0"
    VendorName "Unknown"
    Option "Position" "-1280 0"
    ModelName "Dell 197FP"
    HorizSync 30.0 - 81.0
    VertRefresh 56.0 - 60.0
    Option "DPMS"
    Option "PreferredMode" "1280x1024"
EndSection

Section "Monitor"
    Identifier "Monitor3"
    Option "Position" "3840 0"
    VendorName "Unknown"
    ModelName "Acer"
    HorizSync 30.0 - 81.0
    VertRefresh 56.0 - 60.0
    Option "DPMS"
    Option "PreferredMode" "1920x1080"
EndSection

Section "Monitor"
    Identifier "Monitor1"
    Option "Position" "1920 0"
    VendorName "Unknown"
    ModelName "Dell 196FP"
    HorizSync 30.0 - 81.0
    VertRefresh 56.0 - 60.0
    Option "DPMS"
    Option "PreferredMode" "1280x1024"
EndSection

Section "Device"
    Identifier "Device0"
    Driver "nvidia"
    VendorName "NVIDIA Corporation"
    BoardName "GeForce GTX 560"
    BusID "PCI:1:0:0"
    Screen 0
    Option "NoLogo" "True"
EndSection

Section "Device"
    Identifier "Device1"
    Driver "nvidia"
    VendorName "NVIDIA Corporation"
    BoardName "GeForce GTX 560"
    BusID "PCI:1:0:0"
    Screen 1
    Option "NoLogo" "True"
EndSection

Section "Device"
    Identifier "Device2"
    Driver "intel"
    VendorName "Intel Corporation"
    BoardName "Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller"
    BusID "PCI:00:02:0"
    Option "NoLogo" "True"
    Option "monitor-HDMI1" "Monitor3"
    Option "monitor-VGA1" "Monitor1"
EndSection

Section "Screen"
    Identifier "Screen0"
    Device "Device0"
    Monitor "Monitor0"
    DefaultDepth 24
    Option "TwinView" "0"
    Option "metamodes" "DFP: nvidia-auto-select +0+0"
    Option "DisableGLXRootClipping" "True"
    Option "AllowGLXWithComposite" "True"
    Option "RenderAccel" "True"
    Option "AddARGBGLXVisuals" "True"
    SubSection "Display"
        Depth 24
        Modes "1920x1080"
    EndSubSection
EndSection

Section "Screen"
    Identifier "Screen1"
    Device "Device1"
    Monitor "Monitor1"
    DefaultDepth 24
    Option "metamodes" "DFP: nvidia-auto-select +0+0"
    Option "DisableGLXRootClipping" "True"
    Option "AllowGLXWithComposite" "True"
    Option "RenderAccel" "True"
    Option "AddARGBGLXVisuals" "True"
    SubSection "Display"
        Depth 24
        Modes "1280x1024"
    EndSubSection
EndSection

Section "Screen"
    Identifier "Screen3"
    Device "Device2"
    Monitor "Monitor3"
    DefaultDepth 24
    Option "metamodes" "DFP: nvidia-auto-select +0+0"
    Option "DisableGLXRootClipping" "True"
    Option "AllowGLXWithComposite" "True"
    Option "RenderAccel" "True"
    Option "AddARGBGLXVisuals" "True"
    SubSection "Display"
        Depth 24
        Modes "1280x1024"
    EndSubSection
EndSection

Section "Screen"
    Identifier "Screen2"
    Device "Device2"
    Monitor "Monitor2"
    DefaultDepth 24
    Option "metamodes" "DFP: nvidia-auto-select +0+0"
    Option "DisableGLXRootClipping" "True"
    Option "AllowGLXWithComposite" "True"
    Option "RenderAccel" "True"
    Option "AddARGBGLXVisuals" "True"
    SubSection "Display"
        Depth 24
        Modes "1920x1080"
        #Virtual 3840 2160
    EndSubSection
EndSection

Section "Extensions"
    Option "Composite" "Enable"
EndSection

Section "ServerLayout"
    Identifier "Layout0"
    Screen 0 "Screen0" 0 0
    Screen 1 "Screen1" LeftOf "Screen0"
    Screen 2 "Screen2" RightOf "Screen0"
    Screen 3 "Screen3" RightOf "Screen2"
EndSection

Section "ServerFlags"
    Option "Xinerama" "1"
EndSection
Tuesday, 15 May 2012
APU clEnqueueWriteBuffers....
AMD APU Block Diagram
This test is pretty naïve since we should really be using zero-copy memory. However, since I'm a Linux fanboy we're going to have to use AMD's current implementation of the APP SDK which requires explicit clEnqueueWriteBuffers to put data into device visible system memory which then gets piped through the radeon memory bus (among other things). There's an interesting write up of this technology here.
So if we have to do explicit clEnqueueWriteBuffers to get data from OpenCL host side memory into device side global memory, how does the performance of the CPU and GPU weigh up?
Well, this chart shows a very basic test: the time taken by a clEnqueueWriteBuffer command (a blocking write, i.e. CL_TRUE, obviously) for both the CPU and the GPU in an AMD A8-3850 APU, with interesting results...
For a small amount of data
We can see that this trend becomes a lot more marked at 100x as much data - the CPU latency is much, much higher than the GPU's - presumably the GPU has higher bandwidth and the path the data takes is a lot simpler. Bypassing something? Caching?
Anyway, this seems to prove a pretty nice point (good fast speed to get data to GPU visible memory) and I look forward to zero-copy objects on Linux ;)
Upgraded to CUDA 5...302.06.03
I recently wrote about how your mileage might vary between the use of mapped and manual buffers...I just updated my kernel to use the latest pre-release NVIDIA driver and reran the tests to see the effect. Unfortunately, it's not good news...the gap appears to widen! What's going on with mapped memory?
Before...
After....
Fourth monitor
I just bought a fourth monitor...stay tuned for an update about the nightmare that is writing xorg.conf. Using my discrete GPU and dual head integrated graphics (i7-2600).
Now, if only order tracking actually gave some useful information...
Sunday, 13 May 2012
OpenCL Memory Allocation Strategies
I've been playing with OpenCL quite a lot recently - both on discrete and fused GPU devices - with varying success. OpenCL is an open standard for the programming of heterogeneous systems (e.g. CPU + GPU): it defines a host/device model whereby the host builds up data on the device and then enqueues a 'kernel' to do work on that chosen device using the data. It is very similar to NVIDIA's CUDA, however it works across many different platforms (using vendor SDKs) and thus allows you to use more than just GPUs. For example, Intel and AMD both have SDKs for their CPUs.
There's quite a lot of uncertainty or ambiguity in OpenCL about the purpose and implementation of some of the functions. For example, the clCreateBuffer function doesn't explicitly define whether the buffer is created on the host or device. I generally take it to mean that the data is created and stored on the host until a clEnqueueWriteBuffer or clEnqueueUnmapMemObject is called. However, ambiguity can arise, especially in shared memory systems such as AMD Fusion (although not under Linux yet) whereby the fused GPU and CPU share memory and thus completely remove the need for a memory copy. AMD calls this zero-copy.
Here I will be testing and benchmarking a little known feature of OpenCL, and that is memory mapping. Specifically, with a discrete GPU. Memory mapping is a feature whereby you create a host side memory buffer, for example with the following command:
clCreateBuffer (ctx, CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, sizeof(int)*10000, NULL, &errCode);
which you can then implicitly write to the GPU - letting the implementation handle the underlying working of the read and write for that buffer object.
Assuming we're using the NVIDIA SDK (version 295.40) on my GTX 560 this will create a pinned host side buffer of 40KB. Pinned here means it won't be paged out by the OS, which also allows the GPU to use direct memory access (DMA) if it so wishes/requires. To read and write this pinned memory you have to use memory mapping: you map the OpenCL buffer into a regular C pointer, then you can pass that around freely without having to worry about getting the data into OpenCL buffers.
Classically, what we would do now is a clEnqueueWriteBuffer command to send this data over to the GPU. We can see below that using this methodology, the pinned memory already gives a serious advantage over unpinned memory, and the gap widens at ever increasing sizes. (Yes, I know using 'millions of integers' is a weird metric, but just multiply by 4 for megabytes.)
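For reference, the conversion between element counts and sizes can be written down as a couple of one-liners. This is only a sketch, assuming a 4-byte int (as with the sizeof(int)*10000 buffer above) and decimal megabytes; the function names are made up for illustration.

```shell
#!/bin/bash
# Assumption: sizeof(int) == 4 bytes on this platform.
BYTES_PER_INT=4

# Bytes needed to hold a given number of ints.
ints_to_bytes() {
    echo $(( $1 * BYTES_PER_INT ))
}

# Megabytes (10^6 bytes) for a chart value given in millions of integers.
millions_of_ints_to_mb() {
    echo $(( $1 * BYTES_PER_INT ))
}

ints_to_bytes 10000          # prints 40000, i.e. the 40KB buffer
millions_of_ints_to_mb 5     # prints 20, i.e. 5 million ints is 20MB
```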
OK, so now we know that pinned memory gives a performance advantage. That's not really very exciting since NVIDIA already told us that (page 47).
So what can we do from here? Well as I mentioned earlier, you can map memory to get the GPU to handle the reads and writes across PCIe (although in the NVIDIA guides they don't really talk about this too much).
The good thing about the example above is that you, as the developer, have full control over exactly what is and isn't written/read over the PCIe. When you want to append more data to a device side buffer you just do a clEnqueueWriteBuffer using the requisite offset and data.
Using the fully mapped methodology, things aren't quite so clear and we (I) don't really know how it will perform: what happens when you do a read from global memory on the GPU to memory that is mapped over PCIe? Does it do a sequential read? Does it fetch it in chunks?
Performance of Mapped Memory Objects
Ok so first things first we do our clCreateBuffer as per usual and then map it. Fill it with your data and now, critically, unmap it. (Previously I seem to have gotten away with not unmapping it!). The unmapping ensures consistency for the memory object and enqueues a write across to the device with all the data that has just been unmapped - since as we'll see later it performs just as well as regularly written buffers. The critical point being, there are no clEnqueueWriteBuffers in this process!
I've chosen the following (fairly naïve) scenario whereby you send the data to the device and never care about reading it off; we want some code that will be pretty intensive on the memory accesses to highlight exactly what is going on.
Each kernel runs through a range of 0 -> x memory addresses in two buffers doing a pairwise addition and finally a division by some constant. This will tell us how fast the GPU is at reading memory for ever increasing values of x. It should demonstrate whether any chunking is in effect. For testing we'll just use 100 workitems in a single workgroup.
So the test is: does the kernel run faster using explicit (clEnqueueWriteBuffer) or implicit (clEnqueueUnmapMemObject)? And then, if they perform the same, how does the performance (time) of clEnqueueWriteBuffer compare to that of clEnqueueUnmapMemObject?
The first chart, here, confirms my suspicions: the clEnqueueUnmapMemObject actually simply just writes the data across to the GPU and thus kernels using a single clEnqueueUnmapMemObject or a single clEnqueueWriteBuffer at the beginning perform the same way.
The question now remains: does clEnqueueUnmapMemObject perform as well as clEnqueueWriteBuffer? How does the performance compare when we want to modify the data on the device?
Thus...
Map & Unmap or Read & Write?
We have two choices therefore, depending on our use case. Let's imagine we want to write some data, do some computation on it and then read it off and modify it. There are two ways of doing this...
- Map -> edit -> unmap -> compute (then rinse and repeat)
- Read -> edit -> write -> compute (rinse and repeat)
This gives us some pretty interesting results. It shows that, by manually doing reads and writes from the device you shave off a small yet perceptible amount of time for each iteration. Thus this performance difference must lie in the way that the underlying implementation handles the read and the remapping of the buffer (since we know the unmapping and the write perform identically from earlier on).
This might just be a quirk of the SDK that will be ironed out in the future, however it does provide some interesting discussion points even for this contrived example. For example, what's the point of mapping and unmapping when you can read and write faster? As mentioned previously, how will these interplay when you start having fused memories? Will you still be able to maintain portability for your code knowing that the mapped method is slower?
This just highlights the importance of performing these little benchmarks for the architecture and setup you have at hand.
Friday, 11 May 2012
GNUplot Snippets
I use GNUplot a lot to make graphs from the command line and, more specifically, from scripts. For example, the script below reads from a CSV and emits the data as a scatter plot. Pretty straightforward but nicely extensible. If you leave things out (e.g. xrange) GNUplot will just decide them for you based on the data set!
Do definitely try GNUplot; I've never looked back! No more point-and-click Excel nightmares.
#!/bin/bash
# Some code here that generates some pretty csv
/usr/bin/gnuplot <<\EOF
reset
set term png size 500, 270
set key top left
set output "graph.png"
set datafile separator ","
set xrange [0:24000]
set xlabel "Integers"
set ylabel "Time (s)"
plot "test.csv" using 1:2 title "Mapped", "test.csv" using 1:3 title "Unmapped"
EOF
Generates the following plot...(obviously depends on your data!)
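To illustrate the "nicely extensible" bit, here is a minimal sketch that turns the script into a function taking the CSV and PNG names as arguments. The name make_plot_script and its two parameters are made up for this example. The only real change is the here-document delimiter: the script above uses <<\EOF, which stops the shell expanding anything inside, whereas an unquoted EOF lets the shell substitute variables. I have left xrange out so that GNUplot picks it from the data.

```shell
# Sketch: print a gnuplot script for a given CSV file and output PNG.
# make_plot_script, csv and png are hypothetical names for this example.
make_plot_script() {
    csv=$1
    png=$2
    # Unquoted EOF: the shell expands $csv and $png inside the here-document.
    cat <<EOF
reset
set term png size 500, 270
set key top left
set output "$png"
set datafile separator ","
set xlabel "Integers"
set ylabel "Time (s)"
plot "$csv" using 1:2 title "Mapped", "$csv" using 1:3 title "Unmapped"
EOF
}
```

Piping the result into gnuplot, e.g. make_plot_script test.csv graph.png | gnuplot, should then behave like the original script, minus the fixed xrange.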
Merging Two Halves of an AVI
Often I want to join two AVI files together into one long one, usually because the original has been split to fit onto a CD or whatever.
The following uses a simple cat to join the files, and then mencoder in Linux to do a bit of housekeeping to fix the result up to the proper AVI standards (it needs all sorts of footer data to tell the player how long it is etc.)
The following should do the trick. It's much easier than doing the same thing in Windows (no need to download any silly applications). It's also possible with ffmpeg but I prefer mencoder :)
cat part1.avi part2.avi > full_tmp.avi
mencoder full_tmp.avi -o full.avi -forceidx -ovc copy -oac copy
Thursday, 10 May 2012
Muting from the Command Line in Linux
I often want the sound on my PC to go off after a specific amount of time (usually when watching TV in bed) to stop it going on all night. This command mutes my PC after 1800 seconds (30 minutes). The second half (the amixer call) might vary if you don't use ALSA.
sleep 1800 && amixer set PCM 0%
Tuesday, 8 May 2012
Alternating Page Margins in Latex
Recently I had to do a final year project for my Computer Science course. I did it on GPU programming (with OpenCL) which was very fun! Perhaps more on that later...
However, I had to use alternating margins with a very specific set of parameters for A4. Thus, have a look below for my header for LaTeX...
The margin at the binding side could not be less than 40mm and all the others not less than 15mm. Hopefully this should have sorted it out! Some of the numbers are negative because LaTeX adds a 1 inch offset to each value. Therefore, if you want a margin of less than 1 inch (like I do) you supply a negative number. This page contains some pretty useful guidance...see below for a quick and dirty solution.
% Tell it to use two sided a4 paper with the article style
\documentclass[twoside, a4paper]{article}
\setlength{\topmargin}{-1.04cm}
\setlength{\oddsidemargin}{1.46cm}
\setlength{\evensidemargin}{-1.04cm}
\setlength{\textheight}{26.7cm}
\setlength{\textwidth}{15.5cm}
\setlength{\headheight}{0cm}
\setlength{\headsep}{0cm}
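As a sanity check on those numbers, here is a small sketch that works out the margins the header above should produce. It assumes A4 is 21.0 x 29.7cm, that the built-in offset is exactly 1in (2.54cm), and that \hoffset and \voffset are left at zero. page_margins is just a name I made up. Since \headheight and \headsep are both 0cm, the top margin is simply 1in + \topmargin.

```shell
# Sketch: compute the page margins (in cm) implied by the LaTeX lengths.
# Assumes A4 paper, a 1in = 2.54cm built-in offset, \hoffset = \voffset = 0.
# page_margins is a hypothetical helper name.
# Arguments (cm): oddsidemargin evensidemargin textwidth topmargin textheight
page_margins() {
    awk -v odd="$1" -v even="$2" -v tw="$3" -v top="$4" -v th="$5" 'BEGIN {
        inch = 2.54; pw = 21.0; ph = 29.7
        # Left margin = 1in + side margin; right margin is whatever is left over.
        printf "odd page:  left %.2f  right %.2f\n", inch + odd, pw - (inch + odd) - tw
        printf "even page: left %.2f  right %.2f\n", inch + even, pw - (inch + even) - tw
        printf "top %.2f  bottom %.2f\n", inch + top, ph - (inch + top) - th
    }'
}

page_margins 1.46 -1.04 15.5 -1.04 26.7
```

With the values from the header this gives a 4.00cm margin on the left of odd pages and on the right of even pages (the binding side), and 1.50cm everywhere else, which matches the 40mm and 15mm limits.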
Triple Monitors in Linux
So I'm using the latest KDE environment (4.8.2), which I much prefer to GNOME and the like. However, once upon a time I had an absolute nightmare configuring X11 for my triple monitor setup.
I have an NVIDIA GTX 560 (because I like to do GPU programming) and I also use the integrated graphics on my i7. Thus, a headache for X11 configuration. However, after much configuration and installing broken NVIDIA drivers and SDKs (Xinerama had a bug which caused it to segfault instantly) I finally came to a solution. Hopefully this is useful to someone. (It also serves as a backup for when I inevitably, accidentally overwrite my /etc/X11/xorg.conf...)
If you have questions, let me know in the comments. This configuration is very specific to my setup (obviously) but it should be relatively clear what to tweak if you need to. For example, the BusID can be found from lspci. There is plenty of documentation for this stuff in the forum of frustrated users on ubuntuforums; however, this might be useful if you're looking for 3 monitors. I specifically wanted it set up so that when I full-screened an application it wouldn't span multiple monitors, hence the Xinerama.
I'm quite proud of it really :p
Section "Module"
    Load "dri"
    Load "dbe"
    Load "extmod"
    Load "type1"
    Load "freetype"
    Load "glx"
EndSection

Section "Monitor"
    Identifier "Monitor0"
    VendorName "Unknown"
    ModelName "Samsung SyncMaster"
    HorizSync 30.0 - 81.0
    VertRefresh 56.0 - 60.0
    Option "DPMS"
EndSection

Section "Monitor"
    Identifier "Monitor1"
    VendorName "Unknown"
    ModelName "Dell 197FP"
    HorizSync 30.0 - 81.0
    VertRefresh 56.0 - 60.0
    Option "DPMS"
EndSection

Section "Monitor"
    Identifier "Monitor2"
    VendorName "Unknown"
    ModelName "Dell 196FP"
    HorizSync 30.0 - 81.0
    VertRefresh 56.0 - 60.0
    Option "DPMS"
EndSection

Section "Screen"
    Identifier "Screen0"
    Device "Device0"
    Monitor "Monitor0"
    DefaultDepth 24
    Option "TwinView" "0"
    Option "metamodes" "DFP: nvidia-auto-select +0+0"
    Option "DisableGLXRootClipping" "True"
    Option "AllowGLXWithComposite" "True"
    Option "RenderAccel" "True"
    Option "AddARGBGLXVisuals" "True"
    SubSection "Display"
        Depth 24
        Modes "1920x1080"
    EndSubSection
EndSection

Section "Screen"
    Identifier "Screen1"
    Device "Device1"
    Monitor "Monitor1"
    DefaultDepth 24
    Option "metamodes" "DFP: nvidia-auto-select +0+0"
    Option "DisableGLXRootClipping" "True"
    Option "AllowGLXWithComposite" "True"
    Option "RenderAccel" "True"
    Option "AddARGBGLXVisuals" "True"
    SubSection "Display"
        Depth 24
        Modes "1280x1024"
    EndSubSection
EndSection

Section "Screen"
    Identifier "Screen2"
    Device "Device2"
    Monitor "Monitor2"
    DefaultDepth 24
    Option "metamodes" "DFP: nvidia-auto-select +0+0"
    Option "DisableGLXRootClipping" "True"
    Option "AllowGLXWithComposite" "True"
    Option "RenderAccel" "True"
    Option "AddARGBGLXVisuals" "True"
    SubSection "Display"
        Depth 24
        Modes "1280x1024"
    EndSubSection
EndSection

Section "Extensions"
    Option "Composite" "Enable"
EndSection

Section "ServerLayout"
    Identifier "Layout0"
    Screen 0 "Screen0" 0 0
    Screen 1 "Screen2" LeftOf "Screen0"
    Screen 2 "Screen1" RightOf "Screen0"
EndSection

Section "Device"
    Identifier "Device0"
    Driver "nvidia"
    VendorName "NVIDIA Corporation"
    BoardName "GeForce GTX 560"
    BusID "PCI:1:0:0"
    Screen 0
    Option "NoLogo" "True"
EndSection

Section "Device"
    Identifier "Device1"
    Driver "nvidia"
    VendorName "NVIDIA Corporation"
    BoardName "GeForce GTX 560"
    BusID "PCI:1:0:0"
    Screen 1
    Option "NoLogo" "True"
EndSection

Section "Device"
    Identifier "Device2"
    Driver "intel"
    VendorName "Intel Corporation"
    BoardName "4 Series Chipset Integrated Graphics Controller"
    BusID "PCI:00:02:0"
    Option "NoLogo" "True"
EndSection

Section "ServerFlags"
    Option "Xinerama" "1"
EndSection
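One thing to watch when taking the BusID from lspci: as far as I know, lspci prints the address in hexadecimal as bus:device.function (e.g. 01:00.0), whereas xorg.conf wants PCI:bus:device:function in decimal. For low numbers the two look the same, but a card on bus 0a needs to become PCI:10:0:0. The sketch below does the conversion. lspci_to_busid is a made-up helper name, not part of any package.

```shell
# Sketch: convert an lspci address (hex, "bus:dev.fn", optionally prefixed
# with a PCI domain such as "0000:") into an xorg.conf BusID (decimal).
# lspci_to_busid is a hypothetical helper name.
# The body runs in a subshell ( ... ) so its variables do not leak out.
lspci_to_busid() (
    addr=$1
    # Drop a leading PCI domain if present, e.g. 0000:01:00.0 -> 01:00.0
    case "$addr" in
        *:*:*) addr=${addr#*:} ;;
    esac
    bus=${addr%%:*}    # text before the colon
    rest=${addr#*:}    # "dev.fn"
    dev=${rest%%.*}
    fn=${rest#*.}
    # printf %d accepts 0x-prefixed hex and prints it as decimal.
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
)
```

For example, lspci | grep -i vga shows the addresses of the graphics devices, and lspci_to_busid 01:00.0 prints PCI:1:0:0, which matches the BusID used for the GTX 560 above.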
Syncing iPod and Transcoding on Ubuntu
So I am currently running Kubuntu Precise Pangolin and want to sync my iPod. I have 24,000 songs and I want to get them all onto my 150GB iPod!
What I want, therefore, is to be able to reduce the bitrate of nasty high-bitrate songs so I can fit them all in. I like good quality music but I would prefer to have all my music rather than select some (the whole point of me buying an iPod in the first place).
So, like any good programmer, I briefly tried the existing software but got bored far too quickly (I tried Exaile, Amarok and Banshee, all of which were rubbish and overcomplicated). I did previously try iTunes but that was taking days and I had to keep booting up into Windows, which was a pain each time (spending more time updating than actually using!)
Anyway, I wrote this little snippet below that might be useful to someone somewhere in the world. It has a couple of dependencies:
- gnupod (downloaded from sources here: http://www.gnu.org/software/gnupod/)
- lame (installed from synaptic iirc)
- exiftool (also synaptic iirc)
Warning: this will erase everything on your iPod every time you run it - I hope you're tech savvy enough to be able to comment out the offending lines. You'll have to update the IPOD_MOUNTPOINT variable yourself.
This will loop through your current working directory (to infinite depth) hunting out all the mp3s. It will then go through every single mp3 and copy it onto the iPod. If it hits an mp3 that is > 256 kbps it will convert it down to 128 kbps. Simples.
In the future I would quite like to make this a bit faster; it currently takes a while sequentially converting and transferring every single song. It would be better to parallelise the conversion across all my cores (I couldn't easily find a copy of fpMP3Enc, which is apparently multicore lame). Anyway, what might be nice would be to have the conversions done in a different thread whilst the master thread works through copying all the songs that don't need converting. Perhaps make a queue of songs to convert and then do those in the background whilst the main copying takes place? Also, I have the problem that when I want to add songs to my library I will have to do it either by hand or with another script. Obviously I won't want to be deleting my entire library each time!
EDIT: This can be parallelised using GNU parallel pretty easily; however, you get problems if you run more than one gnupod_addsong at once. So you'd probably require something a bit more involved, probably using Python to get it working well :p Unless you can do locks in shell? I might investigate some of the stuff here when I get the chance.
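#!/bin/bash
# Copy every mp3 under the current directory onto the iPod with gnupod,
# re-encoding anything above 256 kbps down to 128 kbps first.
export IPOD_MOUNTPOINT="/media/IPOD/"

# Split find's output on newlines only, so filenames containing spaces survive.
IFS=$'\n'

count=$(find . -type f -name '*.mp3' | wc -l)
counter=0
for file in $(find . -type f -name '*.mp3'); do
    echo "Doing $counter of $count"
    # Strip the leading "./" that find prepends.
    filename=${file#./}
    # exiftool prints a line like "Audio Bitrate : 320 kbps"; keep just the number.
    res=$(exiftool "$filename" | awk -F': ' '/^Audio Bitrate/ { print int($2); exit }')
    if [ "${res:-0}" -gt 256 ]; then
        # Clear out the previous conversion before making a new one.
        rm -f /tmp/*.mp3
        # convert this song
        echo "Converting $filename"
        lame --preset 128 "$filename" "/tmp/${filename##*/}"
        filename="/tmp/${filename##*/}"
    fi
    echo "adding $filename"
    gnupod_addsong "$filename"
    counter=$((counter + 1))
done

# Write the iTunesDB so the iPod sees the new songs.
mktunes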
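On the "locks in shell" question from the EDIT: flock(1), which ships with util-linux, can do this. The sketch below wraps a command so that only one copy of it runs at a time, which would let the lame conversions run in parallel while the gnupod_addsong calls queue up behind each other. LOCKFILE and with_lock are names I made up, and I haven't tested whether this is enough to keep gnupod happy, so treat it as a starting point only.

```shell
# Sketch: serialise a command with flock(1) from util-linux.
# LOCKFILE and with_lock are hypothetical names for this example.
LOCKFILE="${TMPDIR:-/tmp}/gnupod_addsong.lock"

# Run the given command while holding an exclusive lock on $LOCKFILE.
# A second caller blocks inside flock until the first command has exited.
with_lock() {
    flock "$LOCKFILE" "$@"
}

# Possible use inside a per-song worker (not run here; $src and $tmp are
# placeholders, and each worker would need its own unique $tmp):
#   lame --preset 128 "$src" "$tmp" && with_lock gnupod_addsong "$tmp"
```

Note that flock runs an external command, so it can wrap gnupod_addsong directly but not a shell function. A parallel version would also have to stop sharing /tmp/*.mp3 between workers.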