This is the second piece in our series on FPGA use cases. This article looks at approaches to mapping and the potential of combining mapping capability with autonomy.
Maps are used everywhere, from navigation to city planning. Many of us use Google Maps on a daily basis. Much research has gone into the advancement of mapping techniques, but even Google hasn’t mapped out the entire world yet.
Digging deeper, mapping is also essential for industries like construction. Diagrams and blueprints are crucial for assessing building structures, detecting flaws and mapping underground networks. Traditional 3D surveys are conducted using static lasers: the scanner is fixed on a tripod and set up at a series of positions for scanning, which makes the work slow and manual.
We’ll look deeper into the various components involved in 3D indoor scanning, and how the industry stands to benefit from technological improvements.
Typical mapping sensors
The most widely researched laser sensor for mapping is light detection and ranging (LiDAR). In a LiDAR system, light is emitted from a rapidly firing laser. This light reflects off surrounding objects and is recorded when it returns to the LiDAR sensor. The process is repeated tens of thousands of times per second to build a digital "3D point cloud" of the environment, which is then converted into a 3D map.
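The geometry behind a point cloud is simple: each laser return carries a range and the beam's firing angles, which convert to a 3D point. The sketch below assumes an idealised spherical sensor model; the function name and arguments are illustrative, not any particular sensor's API.

```python
import math

def lidar_return_to_point(range_m, azimuth_rad, elevation_rad):
    """Convert one LiDAR return (range plus beam angles) to an (x, y, z) point.

    A real sensor produces tens of thousands of these returns per second;
    stacking them yields the 3D point cloud described above. This is an
    illustrative spherical model, not a specific device's output format.
    """
    x = range_m * math.cos(elevation_rad) * math.cos(azimuth_rad)
    y = range_m * math.cos(elevation_rad) * math.sin(azimuth_rad)
    z = range_m * math.sin(elevation_rad)
    return (x, y, z)

# A straight-ahead return at 10 m lands on the sensor's forward (x) axis.
print(lidar_return_to_point(10.0, 0.0, 0.0))  # (10.0, 0.0, 0.0)
```

Because every return is converted independently, this per-point arithmetic is exactly the kind of embarrassingly parallel workload that maps well onto hardware pipelines.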
Vision-based sensors have gained popularity due to their distinct ability to obtain large quantities of information about the surroundings at a lower cost than laser sensors. Additionally, the availability of images enables higher-level tasks such as people recognition and identification, an essential benefit for the smart city movement around the world.
Some of the most widely researched vision sensors for the purposes of mapping are stereo vision systems and RGB-D cameras.
Stereo vision system
A stereo vision system is a passive range-sensing approach that uses two or more cameras to capture images of an object from different viewpoints. This lets the system perceive depth through stereo disparity, much as humans perceive depth from the differing viewpoints of the left and right eyes.
These images are then combined and reconstructed to generate an accurate 3D map, which involves computing stereo disparities and matching pixels across views. Although the individual calculations are relatively straightforward, a practical system needs to evaluate at least 10^9 disparities every second, which is beyond the capability of a single CPU.
Stereo algorithms have high degrees of inherent parallelism and can thus be practically implemented on FPGAs.
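To see why the workload parallelizes so well, the sketch below shows sum-of-absolute-differences (SAD) block matching on a single scanline, plus the standard disparity-to-depth triangulation Z = f·B/d. Function names, the window size, and the search range are illustrative choices, not any particular library's API.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulate depth from stereo disparity: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px

def best_disparity(left_row, right_row, x, block=3, max_d=16):
    """Disparity at column x of a rectified scanline pair, found by
    sum-of-absolute-differences (SAD) block matching: slide a small
    window over candidate shifts and keep the cheapest match."""
    half = block // 2
    best_d, best_cost = 0, float("inf")
    for d in range(0, max_d + 1):
        if x - half - d < 0 or x + half >= len(left_row):
            continue  # window would fall off the image
        cost = sum(abs(left_row[x + i] - right_row[x + i - d])
                   for i in range(-half, half + 1))
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d

# Demo: a bright feature that appears 4 pixels further left in the right image.
left = [0] * 8 + [100] + [0] * 11
right = left[4:] + [0] * 4          # right[j] == left[j + 4]
d = best_disparity(left, right, x=8)
print(d)                                              # 4
print(depth_from_disparity(d, focal_px=400, baseline_m=0.1))  # 10.0
```

Every pixel's disparity search is independent of every other pixel's, so an FPGA can evaluate thousands of these windows simultaneously, which is precisely the parallelism the paragraph above refers to.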
RGB-D camera
RGB-D cameras have gained popularity as mapping sensors due to their lower cost and their ability to capture per-pixel color and depth images at adequate resolutions. RGB-D cameras are an active range-sensing system: a projector casts infrared light patterns onto the scene, and depth is estimated from the amount of distortion in the observed pattern.
The process of RGB-D cameras can be summarized into 5 steps:
Image capture, where the RGB camera records frames while the depth sensor projects infrared light patterns onto the surroundings
Depth-map preprocessing to reduce noise in the data collected by the depth sensor
Camera pose estimation with respect to the depth data
Fusion and reconstruction of a 3D map from the collected data
Coloration of the 3D map using the RGB images collected
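The geometric core of the later steps, lifting depth pixels into 3D and attaching color, can be sketched with a standard pinhole camera model. The intrinsics (fx, fy, cx, cy) and the function names are illustrative assumptions, not a specific camera's calibration or SDK.

```python
def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Pinhole back-projection: lift depth pixel (u, v) to a 3D point
    in the camera frame, given focal lengths and principal point."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

def depth_to_point_cloud(depth, rgb, fx, fy, cx, cy):
    """Fuse a depth map with its RGB image into a list of colored 3D
    points (a miniature of the fusion and coloration steps above).
    `depth` and `rgb` are row-major 2D lists of equal shape."""
    cloud = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d > 0:  # zero depth means no valid return at this pixel
                cloud.append((*backproject(u, v, d, fx, fy, cx, cy), rgb[v][u]))
    return cloud

# One valid 2 m depth pixel with unit focal length and origin principal point.
print(depth_to_point_cloud([[0, 2.0]], [["red", "blue"]], 1, 1, 0, 0))
# [(2.0, 0.0, 2.0, 'blue')]
```

A full pipeline would run pose estimation between frames and fuse clouds into a single model; this sketch covers only the per-pixel geometry, which is the part that dominates the arithmetic workload.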
It is important to note that RGB-D cameras often suffer from very specific noise characteristics and challenging data distortions, so preprocessing and noise reduction are an important part of the pipeline. These are also roles to which FPGAs are well suited.
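A common first pass at this preprocessing is a small median filter over the depth map, which knocks out speckle noise while preserving depth edges. The sketch below is a minimal pure-Python version under the assumption that invalid pixels are encoded as 0; real pipelines use vectorised or hardware implementations of the same stencil.

```python
def median_filter_depth(depth, ksize=3):
    """Median-filter a row-major 2D depth map to suppress speckle noise.

    Pixels without a valid return (value 0) are excluded from each
    neighborhood, so isolated holes get filled from their surroundings
    rather than dragging the median toward zero.
    """
    h, w = len(depth), len(depth[0])
    half = ksize // 2
    out = [row[:] for row in depth]
    for v in range(h):
        for u in range(w):
            window = [depth[j][i]
                      for j in range(max(0, v - half), min(h, v + half + 1))
                      for i in range(max(0, u - half), min(w, u + half + 1))
                      if depth[j][i] > 0]
            if window:
                window.sort()
                out[v][u] = window[len(window) // 2]
    return out

# A single speckle spike in a flat 1 m plane is replaced by its neighbors.
print(median_filter_depth([[1, 1, 1], [1, 9, 1], [1, 1, 1]])[1][1])  # 1
```

Fixed-size window operations like this are classic streaming stencils, which is one reason depth preprocessing is singled out above as a natural FPGA role.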
Radar
Radar works on the principle of reflection: radio waves are emitted in specific directions, and the echoes are monitored. A 3D map can be constructed from the varying angles at which the radio waves return. These characteristics make radar a complementary add-on to LiDAR: it has a shorter range, but it is cost-efficient and can penetrate media that give LiDAR trouble, such as dusty or foggy environments.
Adding navigation to the mix
The sensors just described fulfill one half of a SLAM (simultaneous localization and mapping) system. As mentioned in a previous post, SLAM localizes a sensor with respect to its surroundings while simultaneously mapping the structure of the environment. It is a key system for both 3D mapping and autonomous navigation.
Because 3D mapping systems already utilize SLAM, there are obvious synergies if the system has enough capacity to implement autonomous navigation as well. The main constraint is processing power.
Processing power is required for several aspects:
Hosting the AI on edge for autonomous navigation
Fusing data from various sensors
Immediately processing the fused data for SLAM
Utilizing SLAM output for navigation and generating 3D maps with localization
FPGAs bringing it all together
The most streamlined approach is to have a single FPGA handle all components of mapping, localization, and navigation. The flexibility of FPGAs lets one act as the brain of an autonomous mapping device, fulfilling the four processing requirements listed above. Multiple studies have shown FPGAs to be the best hardware option due to their efficiency in both computational power and energy consumption.
Beyond acting as the brain, other significant improvements can be attempted:
The presence of FPGAs enables the incorporation of higher-precision, higher-accuracy sensors into the system (e.g. a 4K camera)
Improved system robustness, as more complementary sensors can be added to compensate for individual sensor weaknesses without compromising on the time required for mapping
FPGAs can also drastically improve processing time for 3D map generation.
A successful end result would be a highly efficient autonomous device outfitted with a variety of high-precision sensors, capable of mapping interior spaces in 3D on the go.