
Week 4: Digging In

  • Oct 7, 2019
  • 6 min read

Updated: Oct 14, 2019

Weekly Recap


This week we did another system walkthrough with Dr. Asghari. There were some complications in seeing the signal produced by the Menlo Systems unit, but it was particularly useful to see the debugging process with Dr. Asghari firsthand. We checked the system in stages, first making sure all components were properly turned on, from the amplifier to the Arduino driver. We then checked that the later fiber connections in the system were properly attached. After these were verified, we looked at the coupler connections after the input stage to see if the system properly distributed 99% of the power to the sample probe and 1% to the free-space collimators. To our surprise, the power meter measured significantly lower outputs than expected at both ends, though the intended 99/1 distribution was maintained. We resolved to clean the fiber heads before the signal split, which eliminated the disparity between expected and measured wattage.


Unfortunately, given time constraints and the fact that the sample probe is undergoing modification, the run-through was rescheduled for the following week, when the system will be fully intact.


Background Research - Alex Esclamado


This week, I decided to research OCT projects that utilize GPUs in a similar fashion to what we aim to do with this capstone.


To begin, optical coherence tomography can be classified into time-domain and Fourier-domain OCT. In time-domain OCT, the interference signal is measured in time as the reference mirror is mechanically scanned, generating an A-scan. An A-scan, or amplitude scan, is the depth reflectivity profile that holds information about the sample, such as its spatial dimensions. Fourier-domain OCT removes the need for mechanically scanned A-scans by recording the interference signal as a function of wavelength and recovering the depth profile with a Fourier transform. This form of OCT offers substantially improved imaging speeds and a better signal-to-noise ratio.
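To make the Fourier-domain idea concrete, here is a minimal sketch of A-scan recovery, assuming the recorded spectrum has already been resampled to be linear in wavenumber (the resampling step itself is what the GPU work below accelerates):

```python
import numpy as np

def a_scan(spectrum):
    """Recover a depth reflectivity profile (A-scan) from an FD-OCT
    interference spectrum, assuming it is already linear in wavenumber."""
    spectrum = spectrum - spectrum.mean()        # remove the DC term
    depth_profile = np.abs(np.fft.ifft(spectrum))
    return depth_profile[: len(spectrum) // 2]   # keep positive depths only

# Toy example: a single reflector produces a sinusoidal fringe in k,
# which transforms into a peak at the corresponding depth bin.
k = np.arange(1024)
fringe = 1.0 + 0.5 * np.cos(2 * np.pi * 100 * k / 1024)
profile = a_scan(fringe)
print(int(np.argmax(profile)))  # -> 100
```

This is only an illustrative sketch; a real pipeline also handles spectral resampling, windowing, and dispersion compensation.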

In May 2010, the Journal of Biomedical Optics published the work of Sam van der Jeught at the University of Antwerp in Belgium. In his Master's thesis work, centered on real-time resampling, he assessed that central processing units are incapable of coping with the required data processing speeds, which significantly limits the sampling rate. Thus, he implemented a low-cost graphics processing unit to perform the digital signal processing required in Fourier-domain OCT. This resulted in real-time OCT imaging at a video rate of 25 fps and removed any need for additional processing hardware. Images of 1024 x 1024 pixels were processed using a line-scan CCD camera operating at 25.6 kHz.


The imaging system utilized a Superlum Broadlighter S840 with a center wavelength of 840 nm and an output power of 20 mW. Their PC included an Intel Xeon E5405 paired with a GeForce 9800 GT. The CPU has four cores with a base clock of 2.00 GHz; the graphics card has a base clock of 600 MHz, a memory clock of 900 MHz (1800 MHz effective), 512 MB of memory, and a memory bandwidth of 57.60 GB/s. The program was written in NVIDIA's CUDA software environment and compiled in Microsoft Visual Studio 2008.


Taking into consideration the specs of the system utilized in that project, there is certainly great potential in the hardware LMU possesses and in the GPU that will eventually be implemented.


System Bottlenecks and GPU Framerates - Trevor Wong


Per my calculations, assuming 1 MB per frame, the current maximum potential frame rate is 12.5 fps, given that data is transferred from the oscilloscope to the computer at 100 Mb/s. However, the Ethernet cable used is rated for 1 Gb/s, so the theoretical maximum frame rate is 125 fps. This is the biggest bottleneck of the current system. As the system currently only outputs 4 fps, adding a discrete GPU at this stage has a maximum potential of only 12.5 fps, so this issue needs to be solved before any graphics card is implemented.

The current plan is to identify the cause of this bottleneck. Possible areas of defect are the Ethernet cable and the oscilloscope. A quick option is to test another 1 Gb/s Ethernet cable to verify whether the problem is the cable itself or something to do with the oscilloscope. Another solution we are currently looking at is to change the data transfer from Ethernet to USB-C, which is rated at 5 Gb/s; this would allow a theoretical maximum frame rate of 625 fps. If the USB-C solution proves successful while the Ethernet change does not, then we can identify the likely defect as the Ethernet port. If the USB-C solution does not prove successful, then the problem is most likely the oscilloscope. In that case, if there is no way to fix the oscilloscope through its settings, we may have to replace it with the secondary oscilloscope. Finally, if all these solutions fail, then we must look at other areas, such as the computer or the Python code.
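The frame-rate ceilings above follow directly from the link rate; a quick sketch of the arithmetic, assuming 1 MB per frame and no protocol overhead:

```python
def max_fps(link_rate_mbps, frame_size_mb=1.0):
    """Theoretical frame-rate ceiling for a given link rate in Mb/s,
    assuming each frame is 1 MB (8 Mb) and ignoring protocol overhead."""
    bits_per_frame = frame_size_mb * 8e6   # 1 MB = 8 megabits
    return link_rate_mbps * 1e6 / bits_per_frame

for label, rate in [("observed 100 Mb/s", 100),
                    ("Gigabit Ethernet", 1000),
                    ("USB-C (5 Gb/s)", 5000)]:
    print(f"{label}: {max_fps(rate):.1f} fps")  # 12.5, 125.0, 625.0
```

Real transfers carry TCP/IP or USB framing overhead, so these ceilings are optimistic upper bounds.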


In addition to identifying and alleviating the system's limitations, I researched the potential framerate boost for several graphics cards. The main desktop houses an Intel i5-6500, which contains an Intel HD Graphics 530 integrated GPU, and the secondary desktop has an Intel HD Graphics 3000. The comparison between the current graphics and the graphics cards on our list is shown in Table 1. The percentage column of the table is based on the HD 530's performance, because that was the system tested, which resulted in 4 fps. The equation for calculating the potential fps is as follows:


fps = 4 + percentage * 4


All data was pulled from https://gpu.userbenchmark.com, a site that collects data from thousands of samples and calculates an effective 3D speed for each GPU. The tests consist of four rendering tests: an NBody particle system, a flocking swarm, geometry shading performance, and parallax. From these four tests, the effective 3D speed is calculated based on the frame rate per test. The site also allows you to compare graphics cards against each other to get a better grasp of their performance differences.
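The estimation formula above can be sketched in a few lines; note that the card names and gain values here are hypothetical placeholders, not the actual Table 1 entries:

```python
def potential_fps(baseline_fps, relative_gain):
    """Estimate frame rate from the baseline (4 fps on the HD 530) and a
    card's relative performance gain over the HD 530, as a fraction
    (e.g. 5.0 means +500% effective 3D speed on the benchmark site)."""
    return baseline_fps + relative_gain * baseline_fps

# Hypothetical gains for illustration only (not the real Table 1 data):
for name, gain in [("Card A", 5.0), ("Card B", 10.0)]:
    print(f"{name}: {potential_fps(4, gain):.0f} fps")  # 24 fps, 44 fps
```

The linear scaling baked into this formula is itself an assumption, which is why the next paragraph treats the results as rough estimates.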


According to Table 1, all the graphics cards would fall under the 125 fps ceiling set by the system's theoretical maximum data transfer speed, once the bottleneck is fixed. However, these numbers are rough estimates, as they assume that the code currently uses purely the integrated graphics to perform the rendering and that a new GPU would take over the entire workload. In addition, while the benchmarks used by the web service serve as a good reference point, they do not exactly match the workload our system would put on the GPU.


GPU Conclusions


Overall, any of the listed graphics cards has the potential to deliver a significant performance boost. However, the main desktop does not have the spatial capacity to mount any of these cards. We want to use the main desktop over the secondary one because it has better components across the board, including a better CPU and twice as much RAM. If possible, we would like to either switch chassis or purchase a larger chassis. A backup of all important files would need to be created as a precaution if this plan is put into action. Both Alex and I have experience building computers, so the process would not be too difficult, but it is nevertheless a risky idea.

An external graphics card is also a feasible option; however, we would like to refrain from taking this route, as it would add another connection to the system and thus introduce the potential for loss. Another option is to build a new computer from the ground up. In this case, we could control every component that goes into the system, opting for higher- or lower-end components based on the OCT system's needs and aiming for a high price-to-performance ratio. Alex and I planned out a quick Intel-based component list costing just over $1000, which includes an Intel i5-9400F, an RTX 2070 SUPER, 16 GB of DDR4 RAM, and a sufficient amount of power, storage, and space. This would provide not only a significant GPU performance boost but also a CPU boost. This is only a preliminary list, and there is potential for an AMD Ryzen alternative or refinement of the current list. If none of these options are approved, then we will have to use the secondary desktop.


It should be noted that another advantage of building a new desktop is the longevity of the system. With current-generation hardware, the PC would be built to last for years to come without needing upgrades. Future capstones that aim to build upon this work would not need to dedicate many resources to improving the PC and could instead focus on other aspects.


Looking Ahead


This coming week’s focus is on diligent preparation for the upcoming midterm review with the LMU Seaver faculty. We have scheduled meetings with Dr. Asghari for another complete laboratory run-through and an in-depth explanation of the system. By the end of this week, we should be able to confidently perform our own experiment with the system as well as capably explain it in stages.


Outside of the review, the next steps we plan to take are further familiarization with the Python code and further research on GPU architecture, including NVIDIA’s CUDA software environment. Additionally, we will prepare alternative parts lists for potential PC builds, including a Zen 2 Ryzen-based system. The Ryzen platform is desirable for its higher boost clock speeds and reduced power consumption, and its extra processor cores compared to Intel’s Kaby Lake and Coffee Lake microarchitectures are better for multitasking. Intel’s cores are individually more powerful but come at a higher price point, which is why Ryzen is considered the best option for budget-oriented systems.
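As a first step toward that GPU familiarization, here is a minimal sketch of how GPU offload could look from our Python code, assuming the CuPy library (a NumPy-compatible array library built on CUDA); the fallback keeps the sketch runnable on machines without an NVIDIA GPU:

```python
import numpy as np

try:
    import cupy as xp  # GPU-backed arrays; requires CUDA and an NVIDIA GPU
except ImportError:
    xp = np            # fall back to the CPU when CuPy is unavailable

# The same FFT-based processing runs unchanged on either backend,
# since CuPy mirrors NumPy's API.
spectrum = xp.asarray(np.random.rand(1024))
profile = xp.abs(xp.fft.ifft(spectrum - spectrum.mean()))
print(type(profile).__module__)  # 'cupy' on the GPU, 'numpy' on the CPU
```

This drop-in style of substitution is one reason CUDA-backed Python libraries are worth researching before committing to a particular card.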
