With CNTK and a cluster of 96 GPUs you can expect a near-linear speedup of roughly 90x-95x. PyTorch might be the next library to support efficient parallelism across machines, but it is not there yet.
If you train two convolutional nets on separate GPUs on small datasets, you will more quickly get a feel for what is important for good performance; you will more readily detect patterns in the cross-validation error and interpret them correctly. With a good, solid GPU you can iterate quickly over deep learning networks and run experiments in days instead of months, hours instead of days, minutes instead of hours. So making the right choice when buying a GPU is critical. If you want to parallelize on one machine, your main options are CNTK, Torch, and PyTorch.
These libraries yield good speedups (3.6x-3.8x) and have predefined algorithms for parallelism across up to 4 GPUs on one machine.
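The data-parallel pattern behind those speedups can be sketched without any GPU or framework at all: each worker computes a gradient on its own shard of the data, the gradients are averaged, and a single weight update is applied. The sketch below illustrates this with a toy one-parameter linear model in plain Python; the model, learning rate, and use of threads in place of GPUs are all illustrative assumptions, not how these libraries are implemented internally.

```python
# Hedged sketch of data parallelism: workers ("GPUs") each compute a
# gradient on their data shard; gradients are averaged into one update.
from concurrent.futures import ThreadPoolExecutor

def grad_on_shard(w, shard):
    # Gradient of mean squared error for the model y = w * x on one shard.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, shards, lr=0.001):
    # Compute per-shard gradients in parallel (threads stand in for GPUs).
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        grads = list(pool.map(lambda s: grad_on_shard(w, s), shards))
    # Average gradients across workers, then apply a single update.
    return w - lr * sum(grads) / len(grads)

# Toy data from y = 3x, split round-robin across 4 "GPUs".
data = [(x, 3.0 * x) for x in range(1, 17)]
shards = [data[i::4] for i in range(4)]
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards)
print(round(w, 2))  # converges toward 3.0
```

Because the shards are equal-sized, averaging the per-shard gradients gives exactly the full-batch gradient, so the parallel run follows the same trajectory as a single-worker run, just with the gradient work split across workers.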
Even if OpenCL libraries become available in the future, I would stick with NVIDIA: the GPU computing (GPGPU) community is very large for CUDA and rather small for OpenCL.