Deep Learning Strategies for Multi-Camera Drone Vision Systems


Modern drones are becoming increasingly intelligent through the integration of deep learning and computer vision. A recent study published in the IPSJ Transactions on Computer Vision and Applications presents methods for detecting and tracking drones with a multi-camera system and deep learning algorithms.

The researchers propose a system consisting of a static wide-angle camera and a rotating turret carrying a high-resolution narrow-angle camera. The system can detect small objects at large distances and then analyze them in detail with the zoomed-in turret camera. At the core of this technology is a modified YOLOv3 architecture optimized for fast and accurate detection.

The system presented in the study consists of three key components: a wide-angle camera, a rotating turret with a narrow-angle camera, and a main computational unit.

Wide-angle camera

The wide-angle camera is mounted on a stationary platform and fitted with a 16mm focal length lens, providing a field of view of approximately 110°. This allows the camera to cover a large area and monitor it over long distances. The camera delivers images at a resolution of 2000 x 1700 pixels at approximately 25 frames per second. The wide field of view is critical for the initial detection of small objects such as drones on the horizon, so the system can respond quickly when a new object enters the scene.
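To get a feel for why distant drones appear so small in the wide-angle image, the figures above can be turned into a rough pixels-on-target estimate. This is only a sketch: it assumes the 110° figure is the horizontal field of view, that the 2000-pixel dimension is the image width, and that pixels are spread evenly across the field of view. The function name, the 0.5 m drone width and the distances are illustrative assumptions, not values from the study.

```python
import math

def pixels_on_target(object_size_m, distance_m, fov_deg, image_width_px):
    """Approximate number of horizontal pixels an object spans.

    Assumes pixels are distributed evenly over the field of view,
    which ignores lens distortion (significant for wide-angle lenses).
    """
    # Angle subtended by the object as seen from the camera.
    angular_size_deg = math.degrees(2 * math.atan(object_size_m / (2 * distance_m)))
    # Average angular resolution of the camera.
    px_per_degree = image_width_px / fov_deg
    return angular_size_deg * px_per_degree

# Wide-angle camera from the text: about 110 degrees across 2000 pixels.
# Drone width (0.5 m) and distances are illustrative assumptions.
for distance in (100, 500, 1000):
    span = pixels_on_target(0.5, distance, 110.0, 2000)
    print(f"{distance} m: {span:.2f} px")
```

Under these assumptions a 0.5 m drone covers only about five pixels at 100 m and less than one pixel at 1 km, which illustrates why a second, narrow-angle camera is useful for closer inspection.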

Rotating turret with narrow-angle camera

The second camera in the system is mounted on a rotating turret, allowing it to change its field of view and track objects detected by the wide-angle camera. The narrow-angle camera is equipped with a 300mm focal length lens, providing a field of view of approximately 8.2° and the ability to zoom in more than 35 times. This camera is designed for detailed analysis of objects at long distances, allowing the system to accurately identify and track drones. The turret can quickly rotate and adjust the camera angle to capture high-quality images of target objects.
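Pointing the turret at an object found by the wide-angle camera requires converting a pixel position into pan and tilt angles. The sketch below shows one simple way to do this with an ideal pinhole model. It is not the method from the study: the function name is hypothetical, and it assumes square pixels, no lens distortion, a principal point at the image centre, and a turret whose rotation axes coincide with the wide-angle camera's optical centre. A real 110° lens would need calibration and distortion correction first.

```python
import math

def pixel_to_pan_tilt(u, v, width, height, hfov_deg):
    """Convert pixel (u, v) to (pan, tilt) in degrees under a pinhole model.

    Pan is positive to the right of the image centre,
    tilt is positive above it.
    """
    # Focal length expressed in pixels, derived from the horizontal FOV.
    f_px = (width / 2) / math.tan(math.radians(hfov_deg) / 2)
    dx = u - width / 2
    dy = height / 2 - v  # image rows grow downward, so flip the sign
    pan = math.degrees(math.atan2(dx, f_px))
    tilt = math.degrees(math.atan2(dy, math.hypot(dx, f_px)))
    return pan, tilt

# Image size and FOV taken from the wide-angle camera description;
# the pixel position is an illustrative assumption.
pan, tilt = pixel_to_pan_tilt(1500, 600, 2000, 1700, 110.0)
print(f"pan={pan:.1f} deg, tilt={tilt:.1f} deg")
```

A detection at the image centre maps to zero pan and tilt, and a detection on the right edge maps to half the horizontal field of view, which is a quick sanity check for any such mapping.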

Main computational unit based on Linux

The central element of the system is the main computational unit, a Linux-based computer equipped with an NVIDIA GeForce K620 graphics processor with 2GB of memory. This unit processes the images captured by both cameras and runs the deep learning algorithms on the GPU. Using a GPU allows the system to process the incoming video in real time. The detection network, a version of YOLOv3, has been modified and optimized to run on this hardware while keeping detection accurate and fast.

These components work in close integration. The wide-angle camera finds candidate objects, the turret directs the narrow-angle camera at them, and the computational unit runs detection on both image streams, so the system can respond quickly and accurately when a drone appears in its field of view.
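The hand-off between the two cameras can be pictured as a small selection step: from all candidates found in the wide-angle frame, pick one for the turret to inspect. The sketch below is purely illustrative. The `Detection` class, the `select_target` function and the 0.5 confidence threshold are hypothetical and are not taken from the study, which may choose targets differently.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    u: float           # horizontal pixel position in the wide-angle image
    v: float           # vertical pixel position in the wide-angle image
    confidence: float  # detector score between 0 and 1

def select_target(detections, min_confidence=0.5):
    """Return the most confident detection above the threshold, or None."""
    candidates = [d for d in detections if d.confidence >= min_confidence]
    return max(candidates, key=lambda d: d.confidence, default=None)

# Illustrative detections from one wide-angle frame.
frame_detections = [
    Detection(u=420, v=310, confidence=0.35),
    Detection(u=1510, v=640, confidence=0.82),
    Detection(u=980, v=200, confidence=0.61),
]
target = select_target(frame_detections)
if target is not None:
    print(f"Steer turret towards pixel ({target.u}, {target.v})")
```

In a full system the selected pixel position would then be converted into turret angles and the narrow-angle image passed to the detector for confirmation.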

For drone detection, the system uses a modified version of YOLOv3, which processes images efficiently and can detect small objects. Unlike two-stage detectors, which first propose candidate regions and then classify them, YOLO treats detection as a single regression problem: it predicts bounding boxes and class scores directly from the image in one pass, which makes localization fast. The modification reduced the number of filters while keeping the number of layers the same, so that the network fits within the limited GPU resources.
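The effect of reducing filters while keeping the layer count can be illustrated with a simple weight count. The sketch below is not the architecture from the study: the channel widths are illustrative, the helper names are hypothetical, and only plain 3x3 convolution weights are counted (no biases, batch normalization or residual connections).

```python
def conv_params(in_ch, out_ch, kernel=3):
    """Number of weights in one convolution layer (biases ignored)."""
    return kernel * kernel * in_ch * out_ch

def stack_params(channels, kernel=3):
    """Total weights for a chain of conv layers with the given channel widths."""
    return sum(conv_params(a, b, kernel) for a, b in zip(channels, channels[1:]))

def scale_width(channels, factor):
    """Scale every filter count except the 3-channel RGB input."""
    return [channels[0]] + [max(1, int(c * factor)) for c in channels[1:]]

# Illustrative channel widths, not the ones used in the study.
base = [3, 32, 64, 128, 256, 512, 1024]
slim = scale_width(base, 0.5)

print("layers:", len(base) - 1, "vs", len(slim) - 1)
print("weights:", stack_params(base), "vs", stack_params(slim))
```

Because each layer's weight count depends on the product of its input and output channels, halving the filters cuts the total to roughly a quarter while the depth of the network stays the same.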

The system is designed for fast and accurate detection, which makes it well suited to security and surveillance tasks. It can detect drones on the horizon, track their movements, and, when necessary, examine them more closely with the narrow-angle camera.

The research demonstrates the potential of deep learning in enhancing the capabilities of drones to perform complex real-time tasks, including surveillance, security, and rescue operations.