Week 10b: Translating NumPy to CuPy

Apr 5, 2020

To start off this week, we timed each stage of the processing code to get a breakdown of the total processing time. Table 1 compares the old CPU-only code with the current GPU FFT code. Framing is done on the CPU in both programs, while the FFT is done on the GPU in the current code. Because framing still runs on the CPU, the current code has to convert between CPU and GPU arrays. This conversion is a temporary workaround while we research a way to port the framing method to CuPy.

Table 1: Old vs New Code

As the table shows, the GPU FFT code runs twice as fast, even with the additional conversion times. As a disclaimer, these measurements were taken on a personal desktop with an Intel i7-7700K and a GTX 1070.
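To make the round trip concrete, here is a minimal sketch of the pipeline described above: framing on the CPU, a copy to the GPU, the FFT, and a copy back, with each stage timed separately. This is not our actual processing code. The helpers frame_signal, timed and process, the frame length and the hop size are all hypothetical, and the sketch falls back to NumPy when CuPy or a CUDA device is not available.

```python
import time
import numpy as np

try:
    import cupy as cp
    cp.zeros(1)  # raises if no usable CUDA device is present
    HAVE_GPU = True
except Exception:
    cp = None
    HAVE_GPU = False


def frame_signal(signal, frame_len, hop):
    """Hypothetical framing step: split a 1-D signal into overlapping frames on the CPU."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]


def timed(label, func, timings):
    """Run func, wait for any queued GPU work, and record the elapsed time."""
    start = time.perf_counter()
    result = func()
    if HAVE_GPU:
        cp.cuda.Stream.null.synchronize()
    timings[label] = time.perf_counter() - start
    return result


def process(signal, frame_len=256, hop=128):
    timings = {}
    frames = timed("framing (CPU)", lambda: frame_signal(signal, frame_len, hop), timings)
    if HAVE_GPU:
        gpu_frames = timed("CPU -> GPU", lambda: cp.asarray(frames), timings)
        gpu_spec = timed("FFT (GPU)", lambda: cp.fft.fft(gpu_frames, axis=1), timings)
        spectra = timed("GPU -> CPU", lambda: cp.asnumpy(gpu_spec), timings)
    else:
        # CPU fallback so the sketch still runs without a GPU
        spectra = timed("FFT (CPU)", lambda: np.fft.fft(frames, axis=1), timings)
    return spectra, timings


signal = np.sin(2 * np.pi * 5 * np.arange(1024) / 1024)
spectra, timings = process(signal)
for label, seconds in timings.items():
    print(f"{label}: {seconds * 1e3:.3f} ms")
```

GPU calls can return before the work has finished, so the sketch synchronizes before reading the clock. Without that step the conversion and FFT times would not be comparable to the CPU numbers.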

One of the bigger setbacks in adopting CuPy is that not every NumPy API has been translated. Even for the functions that have been implemented, there are slight differences in how they are used. For instance, np.argmax is available as cupy.argmax, but the two take different parameters. The NumPy version has three parameters, two documented as optional, while the CuPy version has five, with no indication of which are optional. cupy.argmax also needs a GPU array, whereas the earlier code in Processing creates a CPU array. Another problem is that “np.argwhere”, which returns the indices of the nonzero elements of an array, has not yet been implemented in the latest CuPy release. A workaround we tried was a combination of “transpose” and “nonzero”, but the result came back as a tuple, which is not the form of data required by the framing code. We are still trying alternative solutions using this combination.
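One direction for the argwhere workaround is sketched below. It is an assumption on our part rather than a tested solution: nonzero returns a tuple with one index array per dimension, so stacking that tuple into a single array and transposing it gives one row of indices per nonzero element, which is the layout np.argwhere produces. The helper names argwhere_workaround and to_cpu and the sample array are hypothetical, and the sketch falls back to NumPy when CuPy or a CUDA device is not available. It also shows the CPU array being moved to the GPU before argmax is called.

```python
import numpy as np

try:
    import cupy as xp
    xp.zeros(1)  # raises if no usable CUDA device is present
except Exception:
    xp = np  # CPU fallback so the sketch still runs without a GPU


def argwhere_workaround(a):
    """Stack the tuple returned by nonzero into one (N, ndim) index array."""
    return xp.stack(xp.nonzero(a)).T


def to_cpu(a):
    """Return a NumPy array whether a lives on the GPU or the CPU."""
    return a if xp is np else xp.asnumpy(a)


cpu_mask = np.array([[0, 3, 0],
                     [4, 0, 5]])
mask = xp.asarray(cpu_mask)          # move the CPU array to the GPU first

indices = argwhere_workaround(mask)  # one row of (row, col) per nonzero element
peak = int(xp.argmax(mask))          # index of the maximum in the flattened array

print(to_cpu(indices))
print(peak)
```

For this sample array the result matches np.argwhere(cpu_mask). Whether the extra stack and transpose are cheap enough for the framing code is something we would still need to measure.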

We are currently following the official CuPy GitHub repository, where a request for an argwhere implementation has been made. The developers are running tests and the function is close to completion. Once it is finished, we will be able to pull it from the repository for our own use, since the official CuPy release will not be updated until more additions are ready. In the meantime, we are looking into alternative options, as we cannot rely on this being released.

Finally, we are also waiting for a remote login to be set up on the newer, more powerful workstation. There we can get more precise timing comparisons on the intended GPU and test our future code. Progress is also difficult under the current circumstances because Alex does not have access to a workstation with a discrete GPU on which to test GPU code. Our recent plan of attack has been separate research followed by meetups, where the code is tested on Trevor’s personal computer while the other researches any errors and exceptions in the background and offers code revisions and suggestions for alternatives. With remote access available, both of us can work simultaneously on our own copies of the code. We can then not only bounce ideas off each other but also test those ideas right away rather than putting them in a queue.
