Video intelligence, once considered the exclusive domain of humans, has taken a new turn with advances in artificial intelligence algorithms and the increasing processing power of AI gateways.
Intelligence and data-driven decisions based on video and camera feeds are now of prime importance, finding their way into applications such as smart parking, retail footfall analytics, traffic management, and security surveillance. Insights from video and images can provide vast amounts of data for both predictive and historical analytics.
Complementing the camera infrastructure already deployed across buildings, airports, retail stores, and other sites, there is now a need for an intelligent gateway that collects high-resolution images, offers connectivity options, and can also make decisions at the edge. With Corazon-AI, we present an efficient multi-channel AI video analytics gateway. Built on an 8-channel design that combines the Xilinx Video Codec Unit (VCU) with CNN inference, Corazon-AI serves as a low-power heterogeneous compute platform for edge computing.
The video above, which integrates our demo applications, demonstrates the capability and performance of the Video Codec Unit (VCU), available as a hard IP block in the device. The AI inference engine, the Deep Learning Processing Unit (DPU), is implemented in the programmable logic (PL) side of the device.
Video data from eight RTSP streams, one per camera, is processed alongside high-speed deep learning analytics performed on each video stream at the edge on Corazon-AI. The architecture below shows the video streaming and analytics pipeline on Corazon-AI, with different models running on different cameras.
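One way to picture "different models on different cameras" is a dispatcher that routes each decoded frame to a queue belonging to its camera's network. The sketch below is a hypothetical illustration, not Corazon-AI's actual scheduler: the camera-to-model mapping follows Table 1.0, while the model keys, the queue depth of 16, and the drop-when-full policy are assumptions made for the example.

```python
import queue
from typing import Dict

# Camera -> CNN assignment following Table 1.0 (two cameras share each network).
# The model key names are hypothetical labels, not Vitis AI model identifiers.
CAMERA_MODEL: Dict[str, str] = {
    "CAM1": "ssd_person", "CAM2": "ssd_person",
    "CAM3": "ssd_mobilenet_v1", "CAM4": "ssd_mobilenet_v1",
    "CAM5": "densebox", "CAM6": "densebox",
    "CAM7": "yolov3_adas", "CAM8": "yolov3_adas",
}

# One bounded queue per model; in a real system a worker per queue would feed
# the DPU. The depth of 16 frames is an arbitrary assumption.
MODEL_QUEUES: Dict[str, queue.Queue] = {
    name: queue.Queue(maxsize=16) for name in set(CAMERA_MODEL.values())
}


def dispatch(camera: str, frame_id: int, frame: object) -> bool:
    """Enqueue a decoded frame for its camera's model.

    Returns False (frame dropped) when that model's queue is full, so one slow
    network cannot stall the decoders feeding the other networks.
    """
    q = MODEL_QUEUES[CAMERA_MODEL[camera]]
    try:
        q.put_nowait((camera, frame_id, frame))
        return True
    except queue.Full:
        return False
```

Dropping frames rather than blocking is one common choice for live surveillance feeds, where a stale frame has little value; a system that must analyze every frame would block or buffer instead.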
The demo video shows four different algorithms running: person detection, object detection, face detection, and vehicle detection (ADAS). All AI operations are performed simultaneously on eight independent 1080p@30 RTSP video streams, with four different convolutional neural networks running at different input resolutions. The overall performance of the 8-channel AI video analytics application, aggregated across all channels, is up to ~80 fps on Corazon-AI.
Full HD (1080p) IP cameras capture high-resolution, wide frames for the surveillance streams. Each camera encodes its video with the H.264 Advanced Video Coding (AVC) standard and transmits it over the network using RTSP. The eight IP cameras are connected via Ethernet cables to a 10-port 1 GbE switch, and the encoded RTSP video streams reach Corazon-AI through a 1 GbE PS Ethernet port (RJ45) connected to that switch.
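A quick link budget shows why a single 1 GbE port is enough for eight compressed camera feeds. The article does not state the camera bitrate, so the per-stream value of 8 Mbit/s and the ~5% packetization overhead below are assumptions chosen for illustration; substitute the bitrate your cameras are actually configured for.

```python
# Rough 1 GbE link budget for eight H.264 RTSP streams.
NUM_STREAMS = 8
BITRATE_MBPS = 8.0   # ASSUMPTION: hypothetical per-camera H.264 bitrate (Mbit/s)
LINK_MBPS = 1000.0   # 1 GbE PS Ethernet port
OVERHEAD = 1.05      # ASSUMPTION: ~5% RTP/UDP/IP packetization overhead


def link_utilization(streams: int, bitrate_mbps: float,
                     link_mbps: float = LINK_MBPS,
                     overhead: float = OVERHEAD) -> float:
    """Fraction of the link consumed by `streams` streams at `bitrate_mbps`."""
    return streams * bitrate_mbps * overhead / link_mbps


util = link_utilization(NUM_STREAMS, BITRATE_MBPS)
print(f"aggregate: {NUM_STREAMS * BITRATE_MBPS * OVERHEAD:.1f} Mbit/s, "
      f"link utilization: {util:.1%}")
# prints "aggregate: 67.2 Mbit/s, link utilization: 6.7%"
```

Under these assumptions the network uses well under a tenth of the link, so decode and inference, not Ethernet bandwidth, set the throughput limit.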
The input video streams are decoded by the Xilinx Video Codec Unit (VCU) IP. Video scaling and pre/post-processing are performed in software on the Corazon-AI processor cores.
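On Zynq UltraScale+ devices the VCU is commonly driven from GStreamer through the Xilinx OpenMAX plugin (`omxh264dec`). The helper below only builds gst-launch style pipeline descriptions, one per camera; it is a sketch under the assumption that Corazon-AI's software stack uses this plugin, and the camera URLs, latency value, and appsink settings are hypothetical.

```python
from typing import Dict

# Hypothetical RTSP endpoints for the eight cameras; replace with real URLs.
CAMERA_URLS: Dict[str, str] = {
    f"CAM{i}": f"rtsp://192.168.1.{100 + i}:554/stream1" for i in range(1, 9)
}


def decode_pipeline(name: str, url: str, latency_ms: int = 200) -> str:
    """Return a gst-launch style description for one VCU-decoded stream.

    rtspsrc receives RTP over RTSP, rtph264depay/h264parse recover the H.264
    elementary stream, omxh264dec decodes it on the VCU hard block, and the
    appsink hands raw frames to the application for software pre-processing.
    """
    return (
        f"rtspsrc location={url} latency={latency_ms} ! "
        "rtph264depay ! h264parse ! "
        "omxh264dec ! "
        f"appsink name={name.lower()} sync=false drop=true max-buffers=2"
    )


pipelines = {cam: decode_pipeline(cam, url) for cam, url in CAMERA_URLS.items()}
print(pipelines["CAM1"])
```

Each string could then be handed to `Gst.parse_launch()` from PyGObject, or tested on the command line with `gst-launch-1.0`. Setting `drop=true` with a small `max-buffers` keeps the application working on the newest frames when inference falls behind.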
AI inference engine on Corazon-AI: the Deep Learning Processing Unit (DPU) is a configurable computation engine optimized for convolutional neural networks such as SSD, ResNet, YOLO, VGG, and FPN. The single-core DPU in its high-end B4096 configuration, implemented in the PL side of the FPGA SoC, delivers around 1.2 TOPS of compute performance while running at 300 MHz.
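The ~1.2 TOPS figure follows directly from the DPU configuration and clock. In Xilinx naming, B4096 denotes 4,096 peak operations per clock cycle (a multiply and an add each count as one operation), so peak throughput is ops per cycle times clock frequency times core count. The function name below is ours, for illustration only.

```python
def dpu_peak_tops(ops_per_cycle: int, clock_mhz: float, cores: int = 1) -> float:
    """Theoretical peak DPU throughput in TOPS (10^12 operations per second)."""
    return ops_per_cycle * clock_mhz * 1e6 * cores / 1e12


# Single-core B4096 at 300 MHz: 4096 * 300e6 = 1.2288e12 ops/s
print(f"{dpu_peak_tops(4096, 300):.4f} TOPS")  # prints "1.2288 TOPS"
```

This is a theoretical ceiling; sustained throughput on a real network is lower because not every layer keeps all multiply-accumulate units busy.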
The four convolutional neural networks detailed in Table 1.0 below (SSD, SSD_MOBILENET_V1, DenseBox, and YOLO-v3), taken from the mainstream frameworks Caffe, TensorFlow, and Darknet, are applied to the received video streams using the Xilinx DPU IP, and the final output streams are displayed on a DisplayPort monitor.
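Because each network runs at its own input resolution (Table 1.0), post-processing has to map detections from model coordinates back onto the 1080p frame before they are drawn for display. A minimal sketch, assuming the frame is stretched directly to the model input size with no letterboxing or cropping; the article does not say which resize strategy Corazon-AI uses, and the function name is hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def rescale_box(box: Box, model_wh: Tuple[int, int],
                frame_wh: Tuple[int, int] = (1920, 1080)) -> Box:
    """Map a box predicted on the resized model input back to frame pixels.

    ASSUMES a direct (aspect-ratio-changing) resize, so x and y scale
    independently. Letterboxed inputs would also need the padding removed.
    """
    sx = frame_wh[0] / model_wh[0]
    sy = frame_wh[1] / model_wh[1]
    x0, y0, x1, y1 = box
    return (x0 * sx, y0 * sy, x1 * sx, y1 * sy)


# A detection from YOLO-v3 (512x256 input) on a 1920x1080 camera frame.
print(rescale_box((128, 64, 256, 128), (512, 256)))
# prints "(480.0, 270.0, 960.0, 540.0)"
```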
RTSP Streams | CNN | Framework | Input Size (W×H) | CNN Workload (GOPs per frame) | Application |
CAM1, CAM2 | SSD | Caffe | 360×360 | 5.9 | Person detection |
CAM3, CAM4 | SSD_MOBILENET_V1 | TensorFlow | 300×300 | 2.5 | Object detection |
CAM5, CAM6 | DenseBox | Caffe | 320×320 | 0.49 | Face detection |
CAM7, CAM8 | YOLO-v3 | Darknet | 512×256 | 5.5 | ADAS (vehicle detection) |
Table 1.0: Supported neural networks
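The workload column lets us estimate how much of the DPU's compute the ~80 fps result actually uses. The sketch assumes that the workload figures are per frame and that the ~80 fps aggregate is split evenly, giving 10 fps per stream. The article states neither, so treat the result as a rough order-of-magnitude check.

```python
# GOPs per frame for each network, from Table 1.0 (per-frame reading assumed).
WORKLOAD_GOPS = {"ssd_person": 5.9, "ssd_mobilenet_v1": 2.5,
                 "densebox": 0.49, "yolov3_adas": 5.5}
STREAMS_PER_MODEL = 2            # each network serves two cameras (Table 1.0)
AGGREGATE_FPS = 80.0             # "up to ~80 fps" overall
PEAK_GOPS = 4096 * 300e6 / 1e9   # B4096 at 300 MHz = 1228.8 GOPs/s peak


def required_gops(aggregate_fps: float) -> float:
    """GOPs/s needed if aggregate_fps is split EVENLY across all streams (assumption)."""
    n_streams = STREAMS_PER_MODEL * len(WORKLOAD_GOPS)
    per_stream_fps = aggregate_fps / n_streams
    return sum(g * STREAMS_PER_MODEL * per_stream_fps
               for g in WORKLOAD_GOPS.values())


need = required_gops(AGGREGATE_FPS)
print(f"{need:.1f} GOPs/s, {need / PEAK_GOPS:.1%} of peak")
# prints "287.8 GOPs/s, 23.4% of peak"
```

Under the even-split assumption the networks consume roughly a quarter of the theoretical peak. That suggests raw DPU TOPS is not the only limit: per-layer DPU efficiency and the software scaling and pre/post-processing on the processor cores plausibly account for the rest.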
Corazon-AI's integration with the Xilinx Vitis AI stack enables faster time to market while reducing complexity. The Vitis AI stack includes advanced pre-optimized deep learning models from mainstream frameworks such as TensorFlow, Caffe, Darknet, and PyTorch.
The Xilinx Vitis AI stack lets developers accelerate the development of AI applications even without in-depth knowledge of FPGAs or deep learning. The stack supports C++ and Python APIs, which give developers programming flexibility.
Here, we demonstrated how the Corazon-AI platform connects eight IP cameras and performs AI operations on their video streams simultaneously, with four different convolutional neural networks running at different input resolutions.
More information on Corazon-AI can be found here.