Blog Post

CUDA with Python

By Prabindh Sundareson in May 2025 under CUDA

CUDA support is built into many Pythonic libraries through RAPIDS. In addition, many domain-specific libraries are now CUDA-enabled. Also check out Andy's PyData 2025 talk at https://www.youtube.com/watch?v=Gzp8CdOztTE

GPU L1, L2, Texture Accesses

By Prabindh Sundareson in May 2023 under GPU

…The number of L1 cache banks used depends on the number of texels that must be accessed in parallel…

https://graphics.cs.utah.edu/research/projects/high-order-interpolation/highorderinterpolation.pdf

From “Hardware Adaptive High-Order Interpolation for Real-Time Graphics”, D.Lin et al, HPG, 2021

The A100 GPU includes 40 MB of L2 cache, which is 6.7x larger than V100 L2 cache. The L2 cache is divided into two partitions to enable higher bandwidth and lower latency memory access. Each L2 partition localizes and caches data for memory accesses from SMs in the GPCs directly connected to the partition. This structure enables A100 to deliver a 2.3x L2 bandwidth increase over V100.

The larger and faster L1 cache and shared memory unit in A100 provides 1.5x the aggregate capacity per SM compared to V100 (192 KB vs. 128 KB per SM) to deliver additional acceleration for many HPC and AI workloads.

Frame Rate Amplification

By Prabindh Sundareson in Jan 2023 under Graphics

https://blurbusters.com/frame-rate-amplification-technologies-frat-more-frame-rate-with-better-graphics/

Talks about DLSS, Oculus, and some great work by Cambridge researchers: "Temporal Resolution Multiplexing: Exploiting the limitations of spatio-temporal vision for more efficient VR rendering", Gyorgy Denes et al.

What is the point of the metaverse ?

By Prabindh Sundareson in July 2022 under Metaverse

General information from companies that work in this field.

https://porch.com/advice/vr-metaverse-experts-advice

In addition, Apple patents related to (perhaps) AR/VR hardware in the pipeline, including a 3000 ppi device, an interactive virtual paper, etc.

https://www.patentlyapple.com/augmented-reality/

Nvidia is sponsoring an "Extend the Omniverse" contest, ending Aug 19 2022. Participants need to build an extension using Omniverse Kit.

Join at https://www.nvidia.com/en-us/geforce/contests/omniverse-developer-contest-terms-conditions/

Blog Post

libtorch C++ and libraries

By Prabindh Sundareson in Nov 2021 under LIBTORCH, CUDA

libtorch provides the C++ API for the Torch framework.

The libraries below are required at the link step for any C++ program using libtorch (with GPU acceleration):

"C:\Users\aaa\Downloads\opencv\build\x64\vc15\lib\opencv_world454.lib"
"C:\Users\aaa\Downloads\libtorch-win-shared-with-deps-1.8.2+cu111\libtorch\lib\caffe2_nvrtc.lib"
"C:\Users\aaa\Downloads\libtorch-win-shared-with-deps-1.8.2+cu111\libtorch\lib\c10.lib"
"C:\Users\aaa\Downloads\libtorch-win-shared-with-deps-1.8.2+cu111\libtorch\lib\c10_cuda.lib"
"C:\Users\aaa\Downloads\libtorch-win-shared-with-deps-1.8.2+cu111\libtorch\lib\torch.lib"
"C:\Users\aaa\Downloads\libtorch-win-shared-with-deps-1.8.2+cu111\libtorch\lib\torch_cpu.lib"
"C:\Users\aaa\Downloads\libtorch-win-shared-with-deps-1.8.2+cu111\libtorch\lib\torch_cuda.lib"
"C:\Users\aaa\Downloads\libtorch-win-shared-with-deps-1.8.2+cu111\libtorch\lib\torch_cuda_cpp.lib"
"C:\Users\aaa\Downloads\libtorch-win-shared-with-deps-1.8.2+cu111\libtorch\lib\torch_cuda_cu.lib"
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\lib\x64\cublas.lib"
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\lib\x64\cudart.lib"
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\lib\x64\cudnn.lib"
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\lib\x64\cufft.lib"
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\lib\x64\curand.lib"
"C:\Program Files\NVIDIA Corporation\NvToolsExt\lib\x64\nvToolsExt64_1.lib"

The corresponding DLLs from the libtorch package must also be in the current folder or available in PATH.

The libtorch LTS 1.8 package itself can be downloaded from

https://download.pytorch.org/libtorch/lts/1.8/cu111/libtorch-win-shared-with-deps-1.8.2%2Bcu111.zip
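
The library list above can be generated rather than typed by hand. The following is a minimal sketch, assuming the same two install roots as in the paths listed above; the helper name `linker_inputs` is hypothetical, and the OpenCV and NvToolsExt entries are left out to keep it short. The semicolon-joined output is the form Visual Studio accepts under Linker > Input > Additional Dependencies.

```python
from pathlib import PureWindowsPath

# Install roots copied from the paths listed above; adjust to the local setup.
LIBTORCH_LIB = PureWindowsPath(
    r"C:\Users\aaa\Downloads\libtorch-win-shared-with-deps-1.8.2+cu111\libtorch\lib"
)
CUDA_LIB = PureWindowsPath(
    r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\lib\x64"
)

# Library base names, in the same order as the list above.
TORCH_LIBS = ["caffe2_nvrtc", "c10", "c10_cuda", "torch", "torch_cpu",
              "torch_cuda", "torch_cuda_cpp", "torch_cuda_cu"]
CUDA_LIBS = ["cublas", "cudart", "cudnn", "cufft", "curand"]


def linker_inputs():
    """Return the full .lib paths for the libtorch and CUDA libraries."""
    paths = [LIBTORCH_LIB / f"{name}.lib" for name in TORCH_LIBS]
    paths += [CUDA_LIB / f"{name}.lib" for name in CUDA_LIBS]
    return [str(p) for p in paths]


if __name__ == "__main__":
    # Semicolon-separated, ready to paste into the linker input setting.
    print(";".join(linker_inputs()))
```

PureWindowsPath is used so that the script produces Windows-style paths even when it is run on another OS.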

Blog Post

FFMPEG oft used commands

By Prabindh Sundareson in Jun 2021 under FFMPEG, CUDA

FFMPEG commands for creating SBS videos, scaling, overlays, streaming and CUDA/OpenGL interop code.

https://gist.github.com/prabindh/c8048ac0f4cd6d48e9b682523e5b3c1f

Blog Post

PyTorch and CUDA on Windows

By Prabindh Sundareson in Jun 2021 under CUDA

If torch.cuda.is_available() returns False after installing the cudatoolkit and pytorch packages via conda install, check the installed pytorch build carefully again.

conda list

If this is showing pytorch-cpu, then this is not the right version to use.
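
The same check can be made from inside Python. This is a minimal sketch, not an official diagnostic; the function name `diagnose_torch_build` is hypothetical. It relies on torch.version.cuda being None for CPU-only builds.

```python
def diagnose_torch_build():
    """Return a short description of the installed PyTorch build."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    # torch.version.cuda is None when the build has no CUDA support.
    if torch.version.cuda is None:
        return f"CPU-only build of PyTorch {torch.__version__}"
    if not torch.cuda.is_available():
        return (f"CUDA {torch.version.cuda} build of PyTorch {torch.__version__}, "
                "but no usable GPU or driver was found")
    return (f"CUDA {torch.version.cuda} build of PyTorch {torch.__version__}, "
            f"GPU: {torch.cuda.get_device_name(0)}")


if __name__ == "__main__":
    print(diagnose_torch_build())
```

A "CPU-only build" result corresponds to the pytorch-cpu case described above.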

Steps

- Uninstall this CPU version of pytorch, installed from conda channels

- Follow the steps outlined in Start Locally | PyTorch (https://pytorch.org/get-started/locally/)

- for example,

conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch

This ensures that a CUDA-compatible pytorch build is installed, i.e. one whose package name looks like "pytorch-1.7.1-py3.7_cuda102_cudnn7_0".

Note - the build "pytorch 1.7.1 py3.7_cuda102_cudnn7_0 pytorch" supports the following compute capabilities: sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37. To check the compute capability of the GPU, refer to https://gpupowered.org/mygpu/
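
Once the compute capability of the GPU is known, it can be compared against that list. The sketch below is only an illustration: the function name `has_native_kernels` is hypothetical, and exact matching is a simplification that ignores the PTX entry (compute_37), which the driver may be able to JIT-compile for newer GPUs. With PyTorch installed, torch.cuda.get_device_capability() returns the (major, minor) pair to pass in.

```python
# Architectures quoted in the note above for pytorch 1.7.1 py3.7_cuda102_cudnn7_0.
SUPPORTED = "sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37"


def has_native_kernels(major, minor, arch_list=SUPPORTED):
    """True if arch_list has an sm_XY entry matching the compute capability."""
    return f"sm_{major}{minor}" in arch_list.split()


if __name__ == "__main__":
    for capability in [(6, 1), (7, 5), (8, 6)]:
        print(capability, has_native_kernels(*capability))
```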

Blog Post

Identifying GPU Arch (sm_ ) for CUDA

By Prabindh Sundareson in Apr 2021 under CUDA

A simple web browser-based mechanism to identify the sm version of the GPU used on a desktop. This information is required when compiling .cu kernels.

https://gpupowered.org/mygpu/
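
As an illustration of where the sm version ends up, the sketch below builds an nvcc command line for a given compute capability. The helper name `nvcc_command` and the file names are hypothetical; the -gencode arch=compute_XY,code=sm_XY form is the standard nvcc option for targeting one architecture.

```python
def nvcc_command(major, minor, source, output="kernel"):
    """Build an nvcc argument list targeting one compute capability."""
    cc = f"{major}{minor}"
    return [
        "nvcc",
        "-gencode", f"arch=compute_{cc},code=sm_{cc}",
        "-o", output,
        source,
    ]


if __name__ == "__main__":
    # Example for a compute capability 7.5 GPU; kernel.cu is a placeholder name.
    print(" ".join(nvcc_command(7, 5, "kernel.cu")))
```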

Blog Post

LEGO GPU - 3dfx Interactive Voodoo 3D accelerator

By Prabindh Sundareson in Nov 2020 under LEGO

User "Bhaal_spawn" has created a fan page for Voodoo, a first-of-its-kind 3D accelerator, using LEGO bricks.

https://ideas.lego.com/projects/480e824e-d651-4192-996a-937eb7b4fe98

TPOT Blog Post

TPOT Automated ML pipeline discovery

By Prabindh Sundareson in Oct 2020 under CUDA TPOT

TPOT is a partial Automated Machine Learning toolkit that can "discover" pipelines for a given dataset, including optimal feature engineering and the pipeline itself. A detailed comparison with manual tuning is necessary here.

The following code illustrates how TPOT can be employed for performing a simple classification task over the Iris dataset.

from tpot import TPOTClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import numpy as np

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data.astype(np.float64),
    iris.target.astype(np.float64), train_size=0.75, test_size=0.25, random_state=42)

tpot = TPOTClassifier(generations=5, population_size=50, verbosity=2, random_state=42)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_iris_pipeline.py')

Running this code should discover a pipeline (exported as tpot_iris_pipeline.py).

CUDA Blog Post

No kernel image available - Tensorflow 2.3.0

By Prabindh Sundareson in Sep 2020 under CUDA TENSORFLOW 2.3.0

When moving from earlier versions to Tensorflow 2.3.0 (rc0/rc2/release), the error below might be seen.

Non-OK-status: GpuLaunchKernelstatus: Internal: no kernel image is available for execution on the device

This is because the TF team decided to include PTX kernels only for compute capability 7.0, to reduce the binary size of the distribution.

This is outlined in the GPU section of the release notes at,

https://github.com/tensorflow/tensorflow/releases/tag/v2.3.0 - TF 2.3 includes PTX kernels only for compute capability 7.0 to reduce the TF pip binary size. Earlier releases included PTX for a variety of older compute capabilities.

GRAPHICS Blog Post

gl-transitions with C++ and libANGLE

By Prabindh Sundareson in July 2020 under OPENGLES2 ANGLE

gl-transitions.com provides great special effects for transitions from one surface to another using GLSL (ES) shaders. It is targeted at WebGL applications, but why not use it in native (C++) applications?

This post describes how to integrate these shaders directly into native code using nengl, a wrapper for OpenGLES2 applications. It uses OpenGL ES with an EGL context on Windows desktops via glfw3 and libANGLE.

Check out the code in github for a Windows application using libANGLE at,

https://github.com/prabindh/nengl

And a more detailed post at,

https://medium.com/@prabindh/using-gl-transitions-for-effects-9e73abfc8fd5

Note - this can be used as is on Linux and other platforms that support OpenGLES2 or OpenGLES3.

CUDA Blog Post

CUDA on WSL2 announced

By Prabindh Sundareson in Jun 2020 under CUDA WSL2

Windows Subsystem for Linux (WSL2) provides a way to use Linux functionality in Windows itself, by running a Linux kernel in Windows.

This month, Nvidia and Microsoft announced availability of the CUDA API in WSL2, as part of the Insider Preview. This enables CUDA-based applications to run in Linux on WSL2, on Windows.

Note: These are command line applications.

More info at,

https://docs.microsoft.com/en-us/windows/win32/direct3d12/gpu-cuda-in-wsl

https://ubuntu.com/blog/getting-started-with-cuda-on-ubuntu-on-wsl-2

PARABRICKS Blog Post

Parabricks installation error with Docker

By Prabindh Sundareson in Apr 2020 under Genomics Parabricks

If you are using Docker version 19.03.5 and nvidia-docker, installer.py is not set up to check the GPU installation correctly. This can result in the error below and a failed installation, even if Docker works correctly with the GPU in other applications/container use-cases.

"docker does not have nvidia runtime. Please add nvidia runtime to docker or install nvidia-docker. Exiting..."

Installer.py requires the changes below for successful installation.

https://github.com/prabindh/parabricks-changes/commit/b39f61b8512240bd8c3e7a903f09326fb029893f

Or the complete file below.

https://github.com/prabindh/parabricks-changes/blob/master/installer.py
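
To see what Docker itself reports, the registered runtimes can be queried directly. The sketch below is an assumption about what a runtime check could look like, based only on the error message above; it is not taken from installer.py, and the function names are hypothetical. Docker 19.03 can expose GPUs through its --gpus option even when no runtime named "nvidia" is registered, which may be why a check of this kind fails while GPU containers still work.

```python
import json
import subprocess


def has_nvidia_runtime(runtimes_json):
    """True if the JSON object of Docker runtimes has an entry named 'nvidia'."""
    return "nvidia" in json.loads(runtimes_json)


def query_docker_runtimes():
    """Ask the local Docker daemon for its registered runtimes as JSON text."""
    result = subprocess.run(
        ["docker", "info", "--format", "{{json .Runtimes}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    try:
        registered = has_nvidia_runtime(query_docker_runtimes())
        print("nvidia runtime registered:", registered)
    except (OSError, ValueError, subprocess.CalledProcessError) as err:
        # Docker missing, daemon not reachable, or unexpected output.
        print("could not query docker:", err)
```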

Further steps in the germline pipeline work as per the documentation.

Steps in the Pipeline:

- Alignment of Reads with Reference
- Coordinate Sorting
- Marking Duplicate BAM Entries
- Base Quality Score Calibration of the Sample
- Apply BQSR for the Sample
- Germline Variant Calling

Read more at,
https://www.parabricks.com/germline/

Blog Post

Genomics with Parabricks

By Prabindh Sundareson in Mar 2020 under Genomics Parabricks

Analysis of Genomic data with Parabricks

NVIDIA PARABRICKS

Analyzing genomic data is computationally intensive. Time and cost are significant barriers to using genomics data for precision medicine.

The NVIDIA Parabricks Genomics Analysis Toolkit breaks down those barriers, providing GPU-accelerated genomic analysis. Data that once took days to analyze can now be done in under an hour. Choose to run specific accelerated tools or full commonly used pipelines with outputs specific to your requirements.
https://www.developer.nvidia.com/nvidia-parabricks

Blog Post

Enabling COVID research with the GPU with OpenMM and Folding@Home

By Prabindh Sundareson in Mar 2020 under GPU covid

What is the objective of Folding@Home COVID-19 ?
"After initial quality control and limited testing phases, Folding@home team has released an initial wave of projects simulating potentially druggable protein targets from SARS-CoV-2 (the virus that causes COVID-19) and the related SARS-CoV virus (for which more structural data is available) into full production on Folding@home. This initial wave of projects focuses on better understanding how these coronaviruses interact with the human ACE2 receptor required for viral entry into human host cells, and how researchers might be able to interfere with them through the design of new therapeutic antibodies or small molecules that might disrupt their interaction."

How does Folding@Home for COVID-19 work ?
Step 1: Download the installer for your platform at https://foldingathome.org/start-folding/
Step 2: Install on your system with default options (including screensaver, start at boot, etc.)
Step 3: After installation, the "Folding@Home" client launches automatically, or you can start it from the Start menu as a desktop application.
Step 4: Configure it to use the GPU when idle, or while the system is in use. Configure your username and join a team.
Locally, an application called "FAHClient.exe" starts. If a firewall exception is needed for this application to communicate with the Web (for status and data updates), enable it.

What Compute Units are being used ?
Digging deeper into the running processes, it seems that the CPU, the integrated GPU, and the Nvidia GPU are all being used on the laptop, and on both GPUs, OpenCL kernels are being used.

On the Intel CPU, some of the accelerated FFT primitives are being used.
These are initiated by FahCore_22 executables, that are launched as below:
"C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\xx\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 705 -lifeline 5876 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
"C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\xx\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/avx/Core_a7.fah/FahCore_a7.exe -dir 00 -suffix 01 -version 705 -lifeline 5876 -checkpoint 15 -np 6

Thoughts and Followup
Looking at the usage of different engines, the Nvidia GPU can be optimised to use CUDA kernels that can potentially provide improved performance in this case. Raised an issue about this and the scheduler at https://github.com/FoldingAtHome/fah-issues/issues/1326

Blog Post

Accelerated Video Encode Decode with Nvidia GPU

By Prabindh Sundareson in Mar 2020 under GPU Video nvenc

Options for effectively using nvenc and nvdec

This presentation, http://on-demand.gputechconf.com/gtc/2018/presentation/s8601-nvidia-gpu-video-technologies.pdf, explains the various nvenc, nvdec, and pre/post-processing options available via the HW engine and the CUDA API. The stated target is 490 fps on a GP104 GPU for a transcode session.

In addition, various tools for debugging the usage, such as the nvidia-smi dmon command, are explained.

Blog Post

SLIDE - (Sub-LInear Deep learning Engine) Rice Univ

By Prabindh Sundareson in Mar 2020 under CPU Competition Extreme Classifier Algorithms

Using Locality-Sensitive Hashing (LSH) to reduce training time

In this Rice University paper at MLSys 2020, https://www.cs.rice.edu/~as143/Papers/SLIDE_MLSys.pdf, the authors present a method for fast training to required levels of accuracy, using LSH, for extreme classification tasks (e.g. Amazon670 etc.)
LSH sampling was introduced in the paper "LSH-Sampling Breaks the Computational Chicken-and-Egg Loop in Adaptive Stochastic Gradient Estimation".

Without Hugepages, perf drops by 30%
Without GMA,AVXx,SSE4.x perf drops by another 35%
Benefits from adaptive sampling of active neurons

The code is provided at https://github.com/keroro824/HashingDeepLearning
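
To make the idea of adaptively sampling active neurons concrete, here is a minimal sketch of an LSH lookup. It uses signed random projections, one common LSH family chosen here for brevity; it does not reproduce SLIDE's actual hash functions or table management, and all function names are hypothetical. Neuron weight vectors are hashed into buckets once, and for each input only the neurons in the matching bucket are treated as active.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def simhash(vector, hyperplanes):
    """Hash a vector to a tuple of sign bits, one per hyperplane."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def build_table(neurons, hyperplanes):
    """Group neuron ids by the hash of their weight vectors."""
    table = {}
    for neuron_id, weights in enumerate(neurons):
        table.setdefault(simhash(weights, hyperplanes), []).append(neuron_id)
    return table


def active_neurons(table, layer_input, hyperplanes):
    """Return only the neurons whose weights share a bucket with the input."""
    return table.get(simhash(layer_input, hyperplanes), [])


if __name__ == "__main__":
    planes = make_hyperplanes(num_bits=8, dim=4, seed=1)
    neurons = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [-1.0, 0.0, 0.0, 0.0]]
    table = build_table(neurons, planes)
    print(active_neurons(table, [2.0, 0.0, 0.0, 0.0], planes))
```

The lookup cost depends on the number of hash bits rather than on the number of neurons, which is the source of the sub-linear behaviour in this simplified picture.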

Blog Post

Avoid Google Colab Disconnect

By Prabindh Sundareson in Mar 2020 under Tips Colab

Tip for preventing Google Colab from disconnecting

From https://www.hackster.io/bandofpv/reading-eye-for-the-blind-with-nvidia-jetson-nano-8657ed
During long operations (e.g. training a model), to prevent Google Colab from disconnecting from the server, press Ctrl + Shift + I to open the inspector view. Select the Console tab and enter this:

// Click the Colab connect button once a minute to keep the session alive.
function ClickConnect() {
  console.log("Working");
  document.querySelector("colab-toolbar-button#connect").click();
}
setInterval(ClickConnect, 60000);

Blog Post

Performance improvement with cudf for the groupby operation

By Prabindh Sundareson in Feb 2020 under GPU cudf, rapids with GPU

Rapids cudf improvement on large CSV groupby

Test case at - https://github.com/prabindh/deepnotes/blob/master/rapids/cudf-test.py

Selected chunk size: 1000000
Running CPU, size = 1000000
Pandas Groupby Time = 0.5486793518066406
Pandas Groupby Time = 0.0010442733764648438
Pandas Groupby Time = 0.0006716251373291016
Pandas Groupby Time = 0.0006544589996337891
Pandas Groupby Time = 0.0006563663482666016
Running GPU, size = 1000000
Cudf Groupby Time = 0.015185356140136719
Cudf Groupby Time = 0.0007460117340087891
Cudf Groupby Time = 0.0007538795471191406
Cudf Groupby Time = 0.0006649494171142578
Cudf Groupby Time = 0.0006606578826904297
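
The log above can be summarised with a little arithmetic. The sketch below uses only the timings printed above and separates the first call from the later ones; the reason the first call is so much slower in both cases is not determined by this log. On these numbers, cudf is roughly 36x faster on the first call and close to parity on the later calls.

```python
# Timings copied from the log above, in seconds.
pandas_times = [0.5486793518066406, 0.0010442733764648438, 0.0006716251373291016,
                0.0006544589996337891, 0.0006563663482666016]
cudf_times = [0.015185356140136719, 0.0007460117340087891, 0.0007538795471191406,
              0.0006649494171142578, 0.0006606578826904297]

# Ratio of the first (slowest) call in each series.
first_call_speedup = pandas_times[0] / cudf_times[0]

# Mean over the remaining runs, after the first call.
steady_pandas = sum(pandas_times[1:]) / len(pandas_times[1:])
steady_cudf = sum(cudf_times[1:]) / len(cudf_times[1:])
steady_speedup = steady_pandas / steady_cudf

print(f"first call: {first_call_speedup:.1f}x")
print(f"later calls: {steady_speedup:.2f}x")
```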

Blog Post

DeepOps - Deploying GPU clusters

By Prabindh Sundareson in Feb 2020 under GPU Clusters

DeepOps framework for deploying GPU clusters

From https://github.com/NVIDIA/deepops
The DeepOps project encapsulates best practices in the deployment of GPU server clusters and sharing single powerful nodes (such as NVIDIA DGX Systems). DeepOps can also be adapted or used in a modular fashion to match site-specific cluster needs. For example:

An on-prem, air-gapped data center of NVIDIA DGX servers where DeepOps provides end-to-end capabilities to set up the entire cluster management stack
An existing cluster running Kubernetes where DeepOps scripts are used to deploy Kubeflow and connect NFS storage
An existing cluster that needs a resource manager / batch scheduler, where DeepOps is used to install Slurm, Kubernetes, or a hybrid of both
A single machine where no scheduler is desired, only NVIDIA drivers, Docker, and the NVIDIA Container Runtime

A virtual deploy guide is also provided, to test out the deployment on a single machine.

Blog Post

Nvidia-Docker with GPU for Ubuntu 18.04.3

By Prabindh Sundareson in Feb 2020 under GPU Docker with GPU

Nvidia-Docker setup on Ubuntu 18.04.3

This link contains steps for installing Nvidia GPU enabled Docker on Ubuntu 18.04.3
https://github.com/prabindh/deepnotes/blob/master/docker-18.04.3/docker.txt
Output of the nvidia-smi command running in the container should look like below

Blog Post

RAPIDS for the GPU

By Prabindh Sundareson in Feb 2020 under GPU Rapids ML

Rapids support

The Rapids framework is available on Linux, the most recent version being 0.11. Get the build matching your preferences via the configurator at,
https://rapids.ai/start.html
The Rapids framework is not available on Windows, where the install fails with the error "PackagesNotFoundError: The following packages are not available from current channels". For the reasoning behind Rapids not being available via pip, which is due to a manylinux-related issue, read more at https://medium.com/rapids-ai/rapids-0-7-release-drops-pip-packages-47fc966e9472

Blog Post

Enabling OpenCV 4.1.0 pkg-config

By Prabindh Sundareson in May 2019 under OpenCV Frameworks

By default, OpenCV4 no longer generates pkg-config (.pc) files. But in 4.1.0 at least, generation can be forced at configure time as below.

Follow the instructions in http://www.linuxfromscratch.org/blfs/view/svn/general/opencv.html to download
In the cmake configure step, add "-DOPENCV_GENERATE_PKGCONFIG=ON", then run make and make install as described
Now pkg-config can be used for OpenCV, with the package name "opencv4". Detailed output is in the post https://github.com/opencv/opencv/issues/13154#issuecomment-495978535

$ pkg-config --cflags opencv4
-I/usr/include/opencv4/opencv -I/usr/include/opencv4

$ pkg-config --libs opencv4
-lopencv_aruco -lopencv_bgsegm -lopencv_bioinspired -lopencv_ccalib -lopencv_dnn_objdetect -lopencv_dpm -lopencv_face -lopencv_freetype -lopencv_fuzzy -lopencv_gapi -lopencv_hfs -lopencv_img_hash -lopencv_line_descriptor -lopencv_quality -lopencv_reg -lopencv_rgbd -lopencv_saliency -lopencv_stereo -lopencv_stitching -lopencv_structured_light -lopencv_phase_unwrapping -lopencv_superres -lopencv_optflow -lopencv_surface_matching -lopencv_tracking -lopencv_datasets -lopencv_text -lopencv_dnn -lopencv_plot -lopencv_videostab -lopencv_video -lopencv_xfeatures2d -lopencv_shape -lopencv_ml -lopencv_ximgproc -lopencv_xobjdetect -lopencv_objdetect -lopencv_calib3d -lopencv_features2d -lopencv_highgui -lopencv_videoio -lopencv_imgcodecs -lopencv_flann -lopencv_xphoto -lopencv_photo -lopencv_imgproc -lopencv_core
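
When the flags are needed from a Python build script, the pkg-config output can be split into its parts. This is a minimal sketch; the function names `split_flags` and `opencv4_flags` are hypothetical, and only the -I, -L and -l prefixes are handled.

```python
import shlex
import subprocess


def split_flags(flags):
    """Split pkg-config output into include dirs, library dirs and library names."""
    include_dirs, library_dirs, libraries = [], [], []
    for token in shlex.split(flags):
        if token.startswith("-I"):
            include_dirs.append(token[2:])
        elif token.startswith("-L"):
            library_dirs.append(token[2:])
        elif token.startswith("-l"):
            libraries.append(token[2:])
    return include_dirs, library_dirs, libraries


def opencv4_flags():
    """Run pkg-config for the opencv4 package and return its raw output."""
    result = subprocess.run(
        ["pkg-config", "--cflags", "--libs", "opencv4"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    sample = "-I/usr/include/opencv4/opencv -I/usr/include/opencv4 -lopencv_imgproc -lopencv_core"
    print(split_flags(sample))
```

The three lists correspond to the include_dirs, library_dirs and libraries arguments of a setuptools Extension.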

Blog Post

CUDA, Keras, Tensorflow versions

By Prabindh Sundareson in May 2019 under GPU Keras

Currently working versions that support recent research work

keras - 2.2.4
CUDA 10.0 + matching CUDNN
tensorflow-gpu - 1.13.1
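
A quick way to confirm that an environment matches these versions is to compare the installed module versions against the list. This is a minimal sketch with a hypothetical helper name, `check_pins`; it assumes the tensorflow-gpu package is imported as `tensorflow`, and it does not check CUDA or cuDNN, which are not Python modules.

```python
import importlib

# Import name -> version listed above. tensorflow-gpu is imported as "tensorflow".
PINNED = {"keras": "2.2.4", "tensorflow": "1.13.1"}


def check_pins(pins=PINNED):
    """Map each module name to (installed_version_or_None, matches_pin)."""
    report = {}
    for module_name, wanted in pins.items():
        try:
            module = importlib.import_module(module_name)
            installed = getattr(module, "__version__", None)
        except ImportError:
            installed = None
        report[module_name] = (installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_pins().items():
        print(f"{name}: installed={installed} matches={ok}")
```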

Post

CenterNet based COCO data-set object detection on Windows

By Prabindh Sundareson in April 2019 under GPU Object Detection

CenterNet uses center points instead of the typical bounds of a region of interest. Since the default build is on Linux, this post updates the steps for Windows: https://github.com/prabindh/deepnotes/tree/master/CenterNet. This is derived from CenterNet by xingyizhou et al. Steps and results of the webcam demo are updated. GPU load is 70% (Quadro1000M) at approximately 30 fps using the default Python code. Refer to the GPU-Z logs in the same folder: https://github.com/prabindh/deepnotes/blob/master/CenterNet/centernet-GPU-Z%20Sensor%20Log.txt

Windows port of CenterNet https://github.com/prabindh/deepnotes/tree/master/CenterNet

Post

Labelling and Training to detect capacitors in a PCB with Yolo (and Squeezedet) Deep learning framework in 1 hour

By Prabindh Sundareson in March 2019 under GPU ML

One of the most time-consuming tasks in object detection using deep learning frameworks like Yolo or Caffe is manual labelling.
This post shows how to perform labelling automatically with euclidaug and complete the detection task using Yolo in under one hour of work (including autolabelling), for a 3-class model of electronic capacitors in a PCB (Printed Circuit Board). Methods for Squeezedet (that uses the KITTI output mode of euclidaug since squeezedet uses KITTI format) are also shown.

https://github.com/prabindh/yolo-bins/tree/master/capacito

Post

Yolo on the Tegra Jetson Nano with CUDNN

By Prabindh Sundareson under GPU ML Nvidia Jetson Embedded

Binaries for Yolov3 for the Nvidia Tegra Nano, based on the Ubuntu Linux available in the Jetson Nano Linux image, are now available at the repository

https://github.com/prabindh/yolo-bins

Post

Making (and reverse engineering music) with Tensorflow

By Prabindh Sundareson under GPU ML Music

Magenta and its applications (music transcription - https://piano-scribe.glitch.me/) seem interesting for the way onset events in the music are calculated with an LSTM, and for how the metrics seem much better than the previous state of the art.

https://magenta.tensorflow.org/

Post

C++ Port of Darknet (of YOLO fame) - CUDA and OpenCL

By Prabindh Sundareson in April 2017 under GPU ML

OpenCV3 failures when working with C-based DL frameworks like Darknet (made famous by YOLO - http://pjreddie.com/darknet/yolo/) are a common issue.
Here is the latest version of Darknet, ported to C++, fixing many coding bugs along the way. The work primarily involved encapsulating APIs with C linkage, including undefined headers, fixing bugs, typecasting various allocations to their actual types, and using the correct error detection types for CUBLAS. It also comes with a port to OpenCL by myestro.
For training with own dataset, and detection, refer to the updated README at,

Post

GFX2017 Graphics Workshop completed

By Prabindh Sundareson under GPU ML

GFX2017 Graphics workshop was completed on Apr 29th, 2017. A report is available at the IEEE site.

Post

Introducing Euclid and Euclidaug, a labeller and augment tool for image-datasets

By Prabindh Sundareson under GPU ML

Euclid is a tool for manual labelling of datasets, such as those used in deep learning systems that employ Caffe, Tensorflow, SqueezeDet, and YOLO. It is an object/class labelling tool for machine learning frameworks, with applications in road sign detection, animal detection, retail, and defense machinery. A typical usage is in the IEEE paper "On the Applicability of Deep Learning for Road Signal Recognition", by Vinicios R. Soares et al. - https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8627071
https://github.com/prabindh/euclid/
This tool runs on Linux and Windows, and is based on Python

Label format support:

Integrating Darknet/Yolo and OpenCV3, with Qt5

Submitted by prabindh on Sun, 01/08/2017 - 19:05
Just added a shared-library port of the latest Darknet/Yolo framework, which enables easy integration into other frameworks like Qt5.
An example Qt5 application, with OpenCV3 and Darknet, is built in the repository below.
https://github.com/prabindh/qt5-opencv3-darknet

Post

Impact of Qualcomm-NXP-Freescale on the GPU Ecosystem

By Prabindh Sundareson in November 2016 under GPU HW Ecosystem Qualcomm/NXP/Freescale

The proposed Qualcomm-NXP-Freescale merger brings a new dimension in terms of GPU variants in the new entity - we have (1) The Vivante GC2000, GC880 series (IMX5,6), (2) The Adreno (erstwhile) Z series, and (3) Qualcomm's Adreno 3 series, Adreno 4 series, and Adreno 5 series.

How do they compare and who is going to win? Read more at this LinkedIn post

Post

Khronos Chapter Inaugurated

By Prabindh Sundareson in October 2016 under GPU Graphics Khronos

The Khronos chapter at Bangalore was inaugurated recently with participation from key companies - Samsung, Nvidia, AMD, TI, and many more startups and established companies. Read more at this Samsung page, and in this Khronos page. A panel discussion on how the Khronos chapter can proceed in the coming years is at this Khronos Youtube link

Post

Origins

By Prabindh Sundareson in August 2011 under GPU Graphics WebGL

GPUPowered.Org was started as a WebGL experiment in 2010-11, when WebGL was still in its early stages. The tutorials setup has been used in various presentations every year. Ref http://ewh.ieee.org/r10/bangalore/ces/

GPU Powered
