Friday, 27 July 2012
Quickly and Nicely Compress Your JPG
mogrify -quality 75 *.jpg
Wednesday, 25 July 2012
Where have you gone?
I'm sure many of you are on tenterhooks for my next blog post. To be brief, I have just graduated and am taking a (well earned?) break. I have been travelling around Eastern Europe by train and am currently in the south of Spain.
As a result, I haven't had much of a chance to keep up with the news, though someone did send me this, which is pretty big news and an interesting comparison of video transcoding on Amazon's and Google's services. Amazon wins, but I think it's pretty likely the gap will narrow shortly. Google competing with Amazon's EC2 is big! See my previous post for a look at the Amazon GPU infrastructure. When I get the time, I will compare it with Google's GPU infrastructure, if that ever becomes available.
Monday, 2 July 2012
Getting Started with A GPU Cluster Instance On AWS
...a 1064 instance (17024 cores) cluster of cc2.8xlarge instances was able to achieve 240.09 TeraFLOPS for the High Performance Linpack benchmark, placing the cluster at #42 in the November 2011 Top500 list.
Pretty impressive!
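To put the quoted figures in per-node terms, here is a minimal C sketch of the arithmetic. The helper names are hypothetical and only the three constants come from the quote above.

```c
#include <assert.h>

/* Figures quoted above for the cc2.8xlarge cluster. */
#define CLUSTER_INSTANCES 1064
#define CLUSTER_CORES     17024
#define CLUSTER_TFLOPS    240.09

/* Hypothetical helper: how many cores each instance contributes. */
static int cores_per_instance(int cores, int instances)
{
    assert(instances > 0);
    return cores / instances;
}

/* Hypothetical helper: share of the total in GFLOPS per instance or core. */
static double gflops_per_unit(double tflops, int units)
{
    assert(units > 0);
    return tflops * 1000.0 / units;
}
```

With these numbers, cores_per_instance(CLUSTER_CORES, CLUSTER_INSTANCES) gives 16, and gflops_per_unit(CLUSTER_TFLOPS, CLUSTER_INSTANCES) comes out at roughly 225.6 GFLOPS per instance, or about 14.1 GFLOPS per core.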
I decided that it would be interesting to have a play with the GPU instances available on AWS EC2. Each of these gives you 2x Tesla M2050 GPUs at $2 per hour, which is very good value for money (though relatively expensive compared to other instance types).
So, I created my instance using the wizard; most of this is click-through, common-sense stuff. The only thing to be careful with is selecting the correct AMI on the first page: choose the one with GPU in the title!
Then the rest is just a matter of clicking through the wizard and downloading your *.pem file for passwordless ssh login (you need to run chmod 400 file.pem on the newly downloaded file before you can use it):
ssh -i file.pem ec2-user@server-ip
You can get your server IP from the EC2 management console interface under "Public DNS".
OK, so at this stage I'm logged into my GPU instance. The first thing I run is nvidia-smi, and I am greeted with the following message:
NVIDIA: could not open the device file /dev/nvidiactl (No such file or directory).
Nvidia-smi has failed because it couldn't communicate with NVIDIA driver. Make sure that latest NVIDIA driver is installed and running.
Hmm, that's not very friendly!
After some digging, it turns out I had been using the wrong instance type! During my rapid clicking, I should have selected cg1.4xlarge rather than cc1.4xlarge in one of the dialogs.
Rookie error.
After that, everything seems to be running as normal. Now to run some code.
Compiling
So once we're logged in with nvidia-smi up and running, it's time to start compiling some code. This requires a couple of things, such as the headers and the libs! After some scraping around, the CUDA headers and libs can be found here and here:
Sunday, 1 July 2012
Using Embedded Webcam on Sony VAIO VGN-SZ61MN
This only works if you have a Ricoh webcam. You can find this out by running lsusb in the terminal and checking whether it prints any lines containing the word "Ricoh". I have no idea why Ricoh and Sony are using non-standard webcam APIs, but that's another story.
In short, run the following and you will have your Ricoh camera running on your VAIO:
sudo add-apt-repository ppa:r5u87x-loader/ppa
sudo apt-get update
sudo apt-get install r5u87x
sudo /usr/share/r5u87x/r5u87x-download-firmware.sh
Friday, 29 June 2012
Memory Access Ordering in OpenCL
In each work-item I read a single t_speed (see the struct below) from an array of t_speeds.
typedef struct { float speeds[9]; } t_speed;
ld.global.f32 %f190, [%r14+32];
ld.global.f32 %f8, [%r14+28];
ld.global.f32 %f7, [%r14+24];
ld.global.f32 %f187, [%r14+20];
ld.global.f32 %f5, [%r14+16];
ld.global.f32 %f4, [%r14+12];
ld.global.f32 %f194, [%r14+8];
ld.global.f32 %f195, [%r14+4];
ld.global.f32 %f200, [%r14];
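The byte offsets in the loads above line up with the layout of t_speed: assuming 4-byte floats and no padding, speeds[i] sits i * 4 bytes from the start of the struct, so the nine loads cover offsets 0 to 32, emitted here from the last element down to the first. A minimal C sketch of that arithmetic follows; the helper name speed_offset is hypothetical and not part of the original kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Same struct as in the kernel: nine single-precision values per cell. */
typedef struct { float speeds[9]; } t_speed;

/* Hypothetical helper: byte offset of speeds[i] from the start of a t_speed. */
static size_t speed_offset(size_t i)
{
    assert(i < 9);
    return offsetof(t_speed, speeds) + i * sizeof(float);
}
```

On a platform with 4-byte floats, speed_offset(8) is 32 and speed_offset(0) is 0, matching the first and last ld.global.f32 instructions. sizeof(t_speed) is 36 there, so neighbouring elements of the t_speed array start 36 bytes apart.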
Emitting and Reading OpenCL Binary
For clarity, I've omitted quite a lot of error checking here (for example after compilation) so you'll need to add that in yourself!
Write a Binary
Firstly, let's open a normal kernel source file (its path is in the variable srcFile) and load it into memory, then build it as usual. Next we query the size of the binary and use clGetProgramInfo to copy the binary into a buffer. Finally we write that buffer to a file. This assumes the program is built for a single device.
FILE *fIn = fopen(srcFile, "rb");
// Error check fIn here
// Get the size of the source file
fseek(fIn, 0L, SEEK_END);
size_t sz = (size_t)ftell(fIn);
rewind(fIn);
char *file = (char*)malloc(sz + 1);
// Use the number of bytes actually read as the source length
sz = fread(file, sizeof(char), sz, fIn);
file[sz] = '\0';
fclose(fIn);
const char *cfile = (const char*)file;
*m_cpProgram = clCreateProgramWithSource(*m_ctx, 1, &cfile, &sz, &ciErrNum);
ciErrNum = clBuildProgram(*m_cpProgram, 1, (const cl_device_id*)m_cldDevices, compilerFlags, NULL, NULL);
// Calculate how big the binary is (one size_t per device; one device here)
size_t kernel_length;
ciErrNum = clGetProgramInfo(*m_cpProgram, CL_PROGRAM_BINARY_SIZES, sizeof(size_t), &kernel_length, NULL);
unsigned char *bin = (unsigned char*)malloc(kernel_length);
// CL_PROGRAM_BINARIES fills an array of pointers, so pass the size of one pointer
ciErrNum = clGetProgramInfo(*m_cpProgram, CL_PROGRAM_BINARIES, sizeof(unsigned char*), &bin, NULL);
// Write the binary to <srcFile>.bin without modifying srcFile itself
char outFile[1024];
snprintf(outFile, sizeof(outFile), "%s.bin", srcFile);
FILE *fp = fopen(outFile, "wb");
fwrite(bin, 1, kernel_length, fp);
fclose(fp);
free(bin);
free(file);
Read a Binary
The following extract can be used to build a program from a binary file, either emitted by or modified from a previous OpenCL run.
// Based on the example given in the OpenCL Programming Guide
FILE *fp = fopen("custom_file.ptx", "rb");
if (fp == NULL) return -1;
fseek(fp, 0, SEEK_END);
size_t kernel_length = (size_t)ftell(fp);
rewind(fp);
unsigned char *binary = (unsigned char*)malloc(kernel_length);
// Read the file contents into the buffer before handing it to OpenCL
kernel_length = fread(binary, 1, kernel_length, fp);
fclose(fp);
cl_int clStat;
*m_cpProgram = clCreateProgramWithBinary(*m_ctx, 1, (const cl_device_id*)m_cldDevices, &kernel_length, (const unsigned char**)&binary, &clStat, &ciErrNum);
// Put an error check for ciErrNum (and the per-device status in clStat) here
ciErrNum = clBuildProgram(*m_cpProgram, 1, (const cl_device_id*)m_cldDevices, NULL, NULL, NULL);
free(binary);
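Both the write and read paths start by pulling a whole file into memory, which is where some of the omitted error checking belongs. A minimal sketch of a checked loader in plain C follows; load_file is a hypothetical helper name, not part of OpenCL or of the original code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a whole file into a freshly allocated buffer.
 * Returns NULL on failure. On success *size_out holds the byte count and
 * the buffer is NUL-terminated, so it can also hold kernel source text.
 * The caller owns the buffer and must free() it. */
static unsigned char *load_file(const char *path, size_t *size_out)
{
    assert(path != NULL && size_out != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    /* Find the file size by seeking to the end. */
    if (fseek(fp, 0L, SEEK_END) != 0) { fclose(fp); return NULL; }
    long len = ftell(fp);
    if (len < 0) { fclose(fp); return NULL; }
    rewind(fp);

    unsigned char *buf = (unsigned char*)malloc((size_t)len + 1);
    if (buf == NULL) { fclose(fp); return NULL; }

    /* A short read is treated as an error rather than silently accepted. */
    size_t got = fread(buf, 1, (size_t)len, fp);
    fclose(fp);
    if (got != (size_t)len) { free(buf); return NULL; }

    buf[got] = '\0';
    *size_out = got;
    return buf;
}
```

The returned pointer and size could then be passed on as the binaries and lengths arguments of clCreateProgramWithBinary, with a NULL return handled before any OpenCL call is made.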
Wednesday, 20 June 2012
NVIDIA Talking about CARMA (Tegra 3) at ISC
Yesterday at ISC I sat in on another session from NVIDIA, this time on CARMA (aka CUDA for ARM or Tegra 3) by Don Becker (of Beowulf fame). This is NVIDIA looking to target developers with a very low power SoC. The board draws a peak of 48W when running LINPACK, giving a single precision performance of 200GFLOPS. We were told that under normal loads the board requires at most 25W, which is pretty impressive. In its current incarnation it consists of an ARM A9 CPU attached via x4 PCIe (< PCIe 2) to a Quadro 1000M, a Fermi-based laptop graphics card with 96 CUDA cores, coupled with 2GB of RAM. Interestingly, the GPU is attached via an MXM connection, which means that in the future you will most likely be able to hotswap the GPU out for another, giving much more flexibility for your usage scenario. The board has an incredible array of connectors including 2x HDMI, 2x ethernet, SATA and video in, to name just a few. This gives you a lot of flexibility, though it does make the board itself relatively large. You can of course boot it from the network or from the SD card.
Initially it will be running CUDA 4.2 (downgraded from CUDA 5 recently for some reason) on an Ubuntu 11.04 based OS using the 3.1.10 Linux kernel.
It was murmured that in the future they might look at fusing the two memory regions of the CPU and GPU (related to Project Denver), which would be awesome. Along with this, they might look at supporting 64-bit ARMv8.
Basically we are looking at a much beefier Raspberry Pi. Despite the price, I think I would prefer this! Unfortunately, OpenCL support would have to be driven by the community and will not be done by NVIDIA (apparently).
It is priced at $629 from SECO, and you can sign up for one now.