Embedded vision has reached its tipping point

2021-10-15


A technology reaches its tipping point when it achieves three milestones: first, it is technically capable of performing important tasks; second, it can perform those tasks at low enough cost; and third, most importantly, it is easy enough for non-specialists to build products with it. Together, these milestones indicate that a technology is ready to spread from a single spark into a prairie fire. At this year's Embedded Vision Summit, we saw clear evidence that embedded vision has reached that tipping point.


Not long ago, embedded vision had achieved only the first two milestones. The emergence of deep neural networks delivered the milestone of technical capability and completely changed what vision can accomplish: it is now possible to classify images or detect objects in cluttered real-world scenes, in some cases with accuracy exceeding that of humans. To be sure, this is not easy, but it is feasible.


Moore's law, market economics, and domain-specific architectural innovation delivered the second milestone. Today you can buy a tiny ESP32-CAM development board for just $4.99, equipped with a 240 MHz dual-core processor and a 2 MP camera module with an on-board image signal processor and JPEG encoder. That is a very tight squeeze for computer vision, but it is workable, and it is hard to imagine a lower price. With a bigger budget, the range of choices widens considerably: for $99, for example, you can buy an NVIDIA Jetson Nano developer kit with a quad-core 1.4 GHz CPU, a 128-core Maxwell GPU, and 4 GB of memory, which is more than enough for fairly demanding embedded vision processing.


Most importantly, new processors appear every month at different price, power, and performance points, usually with specialized architectures that accelerate computer vision and neural network inference, such as recent products from Xilinx, Cadence, and Synaptics.


The third milestone, ease of use, is the troublesome one, and achieving it is hard. Deep learning has fundamentally changed what vision systems can do, but it seemed to require developers to be superhuman: able to design neural networks, collect the data needed to train them, and then squeeze the result onto resource-constrained embedded systems. Over the past few years, however, this situation has changed, for two main reasons.


First, the wide availability of high-quality, well-supported vision tools and libraries means developers no longer have to build embedded vision systems from scratch. The best known are frameworks such as TensorFlow and PyTorch and libraries such as OpenCV. But widely used off-the-shelf neural networks, such as YOLOv4 or Google's perception models, have changed the game even more. Most developers no longer design neural networks; instead, they pick a free, ready-made network and train it for their specific task. (Training a neural network still requires data, of course. Although open-source datasets keep growing and techniques for augmenting data or reducing the amount of data required keep improving, data collection remains a real challenge for some applications.)
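
To make that workflow concrete, here is a minimal sketch of picking a ready-made pretrained network and retraining it for a specific task. It assumes PyTorch and torchvision are installed; the dataset path, class count, and training schedule are illustrative placeholders, not details from the article:

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Load a ready-made, pretrained network instead of designing one from scratch.
model = models.mobilenet_v2(pretrained=True)

# Replace the final classifier layer for our own task (e.g. mask / no-mask).
num_classes = 2  # placeholder: set to the number of classes in your dataset
model.classifier[1] = nn.Linear(model.last_channel, num_classes)

# Standard ImageNet-style preprocessing for the pretrained backbone.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# Hypothetical folder of labeled images, one subdirectory per class.
train_data = datasets.ImageFolder("data/train", transform=preprocess)
loader = torch.utils.data.DataLoader(train_data, batch_size=32, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```

The point is that the network architecture itself is a commodity; the developer's effort goes into the data and the task-specific fine-tuning.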


These libraries and tools are sometimes tied to a chip supplier, as with NVIDIA's DeepStream SDK, which simplifies the creation of video analytics pipelines. Although DeepStream must be used with NVIDIA's Jetson processors, it is about as close to a complete solution as any supplier offers (as opposed to providing "just a chip"). BDTI and Tryolabs recently built a mask-detection smart camera product using DeepStream and YOLOv4.
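
DeepStream itself is tied to NVIDIA hardware, but the kind of pipeline it automates (grab a frame, run a detector such as YOLOv4, overlay the results) can be sketched with plain OpenCV. The snippet below is an illustrative approximation, not the BDTI/Tryolabs product or a DeepStream program, and the YOLOv4 config and weight filenames are placeholders:

```python
import cv2

# Load the off-the-shelf YOLOv4 network (config and weight files are
# placeholders; they come from the public darknet release of YOLOv4).
net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1.0 / 255, swapRB=True)

cap = cv2.VideoCapture(0)  # camera index 0: the on-board or USB camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Detect objects and draw boxes: the core loop that DeepStream builds
    # and hardware-accelerates for you on Jetson devices.
    class_ids, scores, boxes = model.detect(frame, confThreshold=0.5,
                                            nmsThreshold=0.4)
    for box in boxes:
        x, y, w, h = box
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("detections", frame)
    if cv2.waitKey(1) == 27:  # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```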


Second, there are more and more tools dedicated to simplifying the design of embedded vision and edge AI systems. Edge Impulse, for example, streamlines the development of embedded machine learning and vision systems; its platform can train an image-recognition neural network and deploy it to the $4.99 ESP32-CAM board mentioned above. Likewise, for more powerful processors, Intel's DevCloud for the Edge and the OpenVINO toolkit make it easier to run embedded vision applications at the edge.
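
As an illustration of this second kind of tool, here is a minimal sketch of running a converted model with the OpenVINO Inference Engine Python API as it existed around 2021 (the model.xml/model.bin and test.jpg filenames are placeholders; newer OpenVINO releases use a different runtime API):

```python
import cv2
import numpy as np
from openvino.inference_engine import IECore  # OpenVINO 2021.x API

# Model in OpenVINO IR format (placeholders; produced by the Model
# Optimizer from a trained TensorFlow/PyTorch/ONNX model).
ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")
exec_net = ie.load_network(network=net, device_name="CPU")

input_name = next(iter(net.input_info))
_, _, h, w = net.input_info[input_name].input_data.shape

# Preprocess one image into the network's expected NCHW layout.
image = cv2.imread("test.jpg")
blob = cv2.resize(image, (w, h)).transpose(2, 0, 1)[np.newaxis, ...]

# Run inference and read the output tensor.
result = exec_net.infer(inputs={input_name: blob})
output_name = next(iter(net.outputs))
print(result[output_name].shape)
```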


Back in the 1990s, wireless communication was still a novelty. At first it was high-end magic that only a handful of RF wizards could pull off. Once it crossed its tipping point, though, anyone could buy an RF module for a few dollars and add wireless communication to an embedded product. It is no exaggeration to say that billions of wireless units have been sold along the way, with an enormous economic impact.


Today, embedded vision stands at a similar tipping point. Let's look forward to seeing that moment arrive.