1. Introduction
The agricultural economy in Japan is facing a shrinking workforce due to decreasing population and fewer farm successors. Hence, the efficiency of farm operations needs to be improved for sustaining the agricultural economy. One potential key solution to this issue is to automate farm operations by using robotic farm machines [1]. The latest commercial robotic farm machines in Japan are able to automatically perform operations such as tillage and fertilization without a human driver, but only when they are operated in sight of a farm operator [2]. To further improve farm operation efficiency, complete unmanned driving of farm machines needs to be achieved by remote monitoring and emergency control, even out of the operator’s sight, not only inside farm fields but also on roads [3]. In this article, we present major issues to be solved to enable complete autonomous driving of farm machines by remote monitoring and control, technologies we have developed to solve those issues, including an elemental technology of IOWN (Innovative Optical and Wireless Network) [4], and results of experiments conducted in actual field environments.
IOWN is an initiative for future networks and information processing infrastructure that can provide high-speed, high-capacity communication utilizing innovative technologies focused on optics, together with tremendous computational resources. IOWN consists of three major technical fields: APN (All-Photonics Network), DTC (Digital Twin Computing), and CF (Cognitive Foundation). An elemental technology of CF for optimizing wireless access is integrated in our experiment.
2. Related Works and Issues in Remote Monitoring and Control of Unmanned Farm Machines
Complete autonomous driving of unmanned farm machines requires wireless communication networks for monitoring and controlling the machines from remote sites. The machines should also be able to drive autonomously over a wide area and in various environments, including public roads and places near windbreaks or buildings. With those requirements in mind, related works and the major remaining issues are identified in the following sections.
2.1 Seamless Network Switching
Since complete autonomous driving of unmanned farm machines is intended to be done out of the sight of human operators, the surrounding situations of operating farm machines need to be monitored at a remote site through live videos streamed from them, and emergency operations need to be taken when necessary, e.g., to stop a machine when it is in danger of colliding with an obstacle. All of that data is transmitted via a wireless communication network such as 4G, 5G, and Local 5G. However, a single wireless network cannot always cover the entire area of autonomous driving with enough quality for remote monitoring and control. The radio signal might be weakened by buildings or windbreaks, or an area might simply be outside of the coverage of a specific wireless network.
One solution for those issues is a multipath approach such as MPTCP (Multipath Transmission Control Protocol) [5] and SCTP (Stream Control Transmission Protocol) [6]. However, those technologies have a trade-off between availability of communication and efficiency of network resource usage. High availability can be ensured by using multiple network paths simultaneously for data transmission, but it consumes network resources excessively. On the other hand, although network resource usage can be mitigated by switching the network path when the current path is detected to be down, such reactive redundancy is not necessarily enough to prevent a significant communication error for mission critical use cases such as complete autonomous driving of unmanned vehicles. Therefore, a mechanism is needed to ensure the availability of a communication network for smooth and stable remote monitoring and control by proactively and seamlessly switching the network to use.
2.2 Multi-Layer Safety System
Unmanned farm machines are supposed to run automatically both inside and outside of farm fields for complete autonomous driving. Areas outside of farm fields present more risks of collisions, as there are pedestrians, other vehicles, buildings, and other kinds of obstacles. Therefore, unmanned farm machines require a safety system that ensures a higher level of safety for complete autonomous driving.
The latest autonomous farm machines are equipped with a tape sensor for contact detection, as well as a 2D scanning sensor for contactless obstacle detection [7]. Additionally, instead of operators directly monitoring the machines in their sight, remote operators can observe the situations of the machines in remote sites by watching the video streamed from the machines via communication networks and take emergency actions when necessary.
However, those safety mechanisms alone are not enough for complete autonomous driving. Vehicle-side sensors are not always able to detect all obstacles due to their range limitations or malfunctions. Remote operators may overlook approaches to obstacles especially when one operator is responsible for monitoring multiple autonomous farm machines at the same time. Therefore, an extra mechanism of obstacle detection is desired for ensuring a higher level of safety. Moreover, the communication network may not always be able to provide enough quality for remote monitoring due to congestion or other situations. In that case, the vehicle-side safety mechanism becomes the last resort for ensuring safety. However, it is not adequate for the reason mentioned above. Therefore, an additional safety mechanism is required that can take emergency actions in the case of quality degradation of communication networks for remote monitoring.
2.3 Redundant Positioning System
Autonomous driving of farm machines requires a precise positioning system so that they can run straight along the available path without trampling crops. This path can be traversed by using navigation sensors such as GNSS (Global Navigation Satellite System) and IMU (Inertial Measurement Unit) together with a navigation map [7].
However, the main problem associated with a navigation sensor like GNSS is that the RTK (Real-Time Kinematic) correction signal might not be available at all geographical locations. In addition, several sources of errors (such as ionospheric delays and signal noise) might affect the accuracy and cause measurement time delays as well. Due to these errors, the autonomous tractor can get lost on its path or go in the wrong direction. To avoid these problems, a system is required that can detect the road ahead and navigate the autonomous tractor in the absence of RTK-GNSS signals.
3. System Design
To address those social issues, NTT has been studying the Cooperative Infrastructure Platform technology, which coordinates various elemental technologies of devices, networks, and computing to deliver the added values through the communication infrastructure based on the IOWN. In this experiment, some component technologies of the Cooperative Infrastructure Platform technology have been integrated with advanced technologies of vehicle robotics and machine vision for vehicle control by Hokkaido University to form a unified architecture of complete autonomous driving platform of farm machines. Figure 1 shows a diagram of the system we have developed. Gray functions are the technologies we have introduced as the major component technologies in that architecture.
The Wireless Network Quality Prediction Function (described in Sect. 4.1) and the E2E (End-to-End) Overlay Network Function (Sect. 4.2) are essential components of the Cooperative Infrastructure Platform technology, which realizes cooperation between different access networks. They are combined to seamlessly switch networks so that remote monitoring and control remain stable across multiple wireless networks. The Wireless Network Quality Prediction Function predicts the qualities of multiple networks using machine learning. The predicted qualities are used to determine the optimal network as the farm machine drives itself. When a change of the optimal network is decided, the decision is notified to the E2E Overlay Network Function, which establishes an overlay network across different communication networks such as 5G/4G, local 5G, and regional BWA (Broadband Wireless Access) so that applications for remote monitoring and control do not need to be aware of which network is actually in use. On receipt of that notification, the E2E Overlay Network Function seamlessly switches the path of data transmission on the overlay network. This combination enables stable remote monitoring and control even when the vehicle passes through an area where a transition between different networks is necessary.
The Human Detection Function (Sect. 4.3) and the Network Cooperated Vehicle Control Function (Sect. 4.5) work with the Autonomous Tractor Control Function (Sect. 4.6) as safety mechanisms in addition to the existing in-vehicle obstacle sensors and monitoring by remote operators. The Human Detection Function receives the video stream from the vehicle to detect a human in front of the vehicle, calculates the distance between the vehicle and the detected human, and tells the vehicle what action it needs to take on the basis of the estimated distance. In addition, the Network Cooperated Vehicle Control Function, which is a component of the Cooperative Infrastructure Platform technology, is introduced to realize cooperation between device and network. It continuously monitors the current network quality, and when it detects that remote monitoring is not working properly due to quality degradation, it notifies the vehicle of an emergency stop, so that safety can be ensured even when remote monitoring is temporarily unavailable because of network quality deterioration. Together, those functions constitute the multi-layer safety system of the complete autonomous farm machine.
The Road Detection Function (Sect. 4.4) provides precise positioning information to the Autonomous Tractor Control Function even when RTK-GNSS is not available. It detects the edges and the surface of the road by analyzing the video from the camera of the vehicle, and estimates the lateral error of the vehicle against the center of the road detected in the images. That estimated lateral error is input to the vehicle and used in the autonomous navigation system.
The Autonomous Tractor Control System navigates the vehicle automatically based on pre-configured way points, input from the sensors such as IMU and GNSS, notifications from safety systems described above and operations from the remote site. It plays the key role in the complete autonomous driving of the farm machines, and it also realizes the devices-computing cooperation with both the Human Detection and the Road Detection Functions.
4. Technologies
4.1 Wireless Network Quality Prediction
The quality of wireless communication changes fluidly due to the effect of the radio wave environment and network congestion. Especially in wireless accesses that use high frequency bands, the NLOS (Non-Line-of-Sight) condition affects the communication quality significantly, so the communication quality may suddenly deteriorate due to shadowing. To address these issues, one workaround is to prepare multiple wireless accesses and switch to a more stable wireless access before the quality deteriorates. Other workarounds in the autonomous driving tractor use case are to reduce the video transmission rate before the quality deteriorates and to reduce the driving speed to ensure safety. All of these require acting before the degradation occurs, so it is considered effective to predict the wireless communication quality and apply proactive control.
Cradio\(^{\text{®}}\)[8], a multi-wireless access proactive control technology, is an element of IOWN and the Cooperative Infrastructure Platform technology, developed by NTT access service systems laboratories, that grasps and visualizes the quality of wireless access, predicts the quality, and dynamically controls wireless networks. It is a group of technologies that helps keep wireless access stable in various use cases, such as remote monitoring and control of an autonomous vehicle. In this experiment, we applied a wireless quality prediction technology [9], [10] of Cradio\(^{\text{®}}\) to the autonomous driving of an unmanned tractor.
Figure 2 shows the overview of the wireless quality prediction technology. In the technology, past communication quality information is accumulated in a wireless quality prediction engine arranged on the network, and the predicted value of the wireless communication quality at the position of the terminal is calculated by machine learning. The predicted value is notified to the terminal when the terminal inquires. By using the predicted value, the terminal is able to detect the position of deterioration of the wireless communication quality and deal with the deterioration in advance, such as by selecting optimal wireless access.
Figure 3 shows the architecture of the wireless quality prediction technology used in the experiment. The technology mainly consists of a quality prediction engine and terminal software. The quality prediction engine is software that runs on a general server device and consists of a DB (Database) that stores actual measurement data and an ML (Machine Learning) unit that trains a model and calculates estimated values of wireless quality and communication quality. The terminal software runs on the Linux OS (Operating System) of a PC and has a function to measure communication quality, a function to communicate with the quality prediction engine, and a function to select the optimal access in accordance with the predicted value. Additionally, the terminal software has functions to communicate with other devices (GNSS, Network Cooperated Vehicle Control, and E2E Overlay Network). In this experiment, the PC running the terminal software has no built-in wireless communication I/F; instead, it connects via Ethernet to a media converter for each wireless access standard to communicate with wireless base stations.
To predict the wireless quality, the estimated position of the autonomous vehicle is needed. Figure 4 depicts how Cradio\(^{\text{®}}\) predicts possible areas to which the vehicle may go. On the basis of the current and previous location data \(\mathrm{p}_{\mathrm{t}}\) and \(\mathrm{p}_{\mathrm{t}-1}\), which are obtained from GNSS output, it derives velocity \(\mathrm{v}_{\mathrm{t}}\). Direction is calculated by least squares approximation of location data during past \(\mathrm{T}_{\text{dir}}\) seconds. In this experiment, we set 5 seconds as \(\mathrm{T}_{\text{dir}}\). \(\theta\) is the pre-configured angle, which is 90 degrees in this experiment. Radius \(\mathrm{d}_{\mathrm{t}}\) is defined as \(\mathrm{v}_{\mathrm{t}}\cdot \mathrm{T}_{\text{pred}}\), where \(\mathrm{T}_{\text{pred}}\) is the period of prediction.
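The geometry above can be illustrated with a minimal Python sketch. It is not the Cradio implementation: the function names are hypothetical, positions are assumed to be already projected onto a local plane in meters, the heading is obtained by fitting x and y linearly against time by least squares, and \(\theta\) is assumed to be the full opening angle of the fan centered on the heading.

```python
import math

def speed_mps(p_prev, p_curr, dt_s):
    """Velocity v_t from two consecutive positions (metres, local plane)."""
    return math.hypot(p_curr[0] - p_prev[0], p_curr[1] - p_prev[1]) / dt_s

def heading_rad(track):
    """Least-squares heading from (t, x, y) samples covering the last T_dir seconds."""
    n = len(track)
    t_mean = sum(s[0] for s in track) / n
    x_mean = sum(s[1] for s in track) / n
    y_mean = sum(s[2] for s in track) / n
    var_t = sum((s[0] - t_mean) ** 2 for s in track)
    # Slopes of the least-squares lines x(t) and y(t) give the velocity direction.
    slope_x = sum((s[0] - t_mean) * (s[1] - x_mean) for s in track) / var_t
    slope_y = sum((s[0] - t_mean) * (s[2] - y_mean) for s in track) / var_t
    return math.atan2(slope_y, slope_x)

def in_fan(point, origin, heading, radius_m, theta_deg=90.0):
    """True if point lies in the fan of radius d_t and opening angle theta (assumed full angle)."""
    dx, dy = point[0] - origin[0], point[1] - origin[1]
    dist = math.hypot(dx, dy)
    if dist > radius_m:
        return False
    if dist == 0.0:
        return True
    # Wrap the angular difference into [-pi, pi) before comparing.
    diff = (math.atan2(dy, dx) - heading + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= math.radians(theta_deg) / 2
```

For example, with a track moving along the x axis at 1 m/s and a hypothetical \(\mathrm{T}_{\text{pred}}\) of 10 s, the radius is 10 m, and only points ahead of the vehicle within 45 degrees of the heading are kept.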
Prior to the operation, wireless and network quality data are measured as the training dataset along the self-driving area of the autonomous vehicle. Within that area, \(\mathrm{N}_{\text{dat}}\) locations are randomly selected as the sampling points. Then, during the operation, those sampling points are filtered by the fan-shaped area described above, and the filtered points are used to obtain predicted quality data. In this experiment, the predicted throughput of each point is compared with a pre-determined threshold, and the proportion of the points whose predicted throughput is higher than the threshold is derived. The same process is simultaneously performed for multiple wireless accesses. Cradio\(^{\text{®}}\) then chooses the network to be used, in accordance with pre-determined priorities, from among the networks for which that proportion exceeds a pre-determined value. In this experiment, we set higher priority on private networks such as regional BWA or Wireless LAN (Local Area Network) than on public networks such as carrier 5G/4G.
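The selection rule described above can be sketched as follows. This is a minimal illustration, not the Cradio implementation: the function name, the network labels, and the behavior when no network qualifies (returning `None`) are assumptions.

```python
def select_network(predictions, priorities, throughput_threshold, min_proportion):
    """Pick the highest-priority network whose share of good points is large enough.

    predictions: dict mapping a network name to the predicted throughputs at the
                 sampling points that remain after the fan-shaped filter.
    priorities:  network names, highest priority first.
    """
    for name in priorities:
        values = predictions.get(name, [])
        if not values:
            continue
        good = sum(1 for v in values if v > throughput_threshold)
        if good / len(values) > min_proportion:
            return name
    # Assumption: the source does not say what happens when nothing qualifies.
    return None
```

With private networks listed first in `priorities`, a public network is chosen only when the private ones fall short of the required proportion.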
4.2 End-to-End Overlay Network
E2E Overlay Network is a component of the Cooperative Infrastructure Platform technology to realize the cooperation between multiple networks. It consists of the user plane based on Segment Routing IPv6 (SRv6) and the control plane similar to LISP (Locator/ID Separation Protocol). In SRv6, paths can be controlled by source routing using path information stored and handled in the device/terminal side. That can reduce the amount of state information that needs to be stored in the network, and it consequently brings higher scalability. In addition, a LISP-based control plane that can handle locations of devices/terminals at the network side enables them to transit among multiple access networks.
Figure 5 shows the architecture of the E2E Overlay Network. CPE (Customer Premises Equipment) is an endpoint that terminates an overlay communication, and an SRv6-based VPN (Virtual Private Network) is established between CPEs. E2E Overlay Network treats the LAN address spaces of CPEs as Endpoint ID (EID) space and the WAN (Wide Area Network) address of a CPE as an RLoC (Routing Locator). EID space is a unique IP (Internet Protocol) address space allocated to each CPE. A CPE encapsulates outgoing packets from its EID space and transfers them to the target CPE, while it decapsulates incoming ones destined for its own EID space. A CPE registers its RLoC and its own EID with the controller when it connects to the network. When a CPE needs to communicate with a new target CPE, it obtains the RLoC of the target CPE from the controller by looking up the target EID. When an RLoC changes, the new binding of EID and RLoC is updated in the controller, and each CPE is notified of the new RLoC. This feature enables CPEs to move among different networks.
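The register, resolve, and notify interactions can be pictured with a small in-memory sketch. It only illustrates the EID-to-RLoC binding idea; the class and method names are hypothetical, the addresses are taken from the IPv6 documentation prefix, and no real LISP or SRv6 signalling is modeled.

```python
class MappingController:
    """Toy controller holding EID-to-RLoC bindings (illustrative only)."""

    def __init__(self):
        self._rloc_by_eid = {}
        self._listeners = []

    def subscribe(self, callback):
        # callback(eid, rloc) is called whenever a binding is added or changed.
        self._listeners.append(callback)

    def register(self, eid, rloc):
        changed = self._rloc_by_eid.get(eid) != rloc
        self._rloc_by_eid[eid] = rloc
        if changed:
            for callback in self._listeners:
                callback(eid, rloc)

    def resolve(self, eid):
        # Returns None when the EID is unknown.
        return self._rloc_by_eid.get(eid)


controller = MappingController()
remote_cache = {}                      # binding cache of a facing CPE
controller.subscribe(remote_cache.__setitem__)
controller.register("2001:db8:a::/48", "2001:db8:ffff::1")  # first attachment
controller.register("2001:db8:a::/48", "2001:db8:eeee::1")  # RLoC changed
```

After the second registration, both the controller and the facing CPE's cache hold the new RLoC, which is what allows the overlay session to follow the CPE to another access network.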
CPE can have multiple WAN interfaces and connect to those access networks simultaneously. In this experiment, CPE is designed to have a SRv6 configuration to each WAN interface and use one as the primary interface for the overlay data communication but not use the others (which are regarded as backup interfaces). When the primary interface is disconnected from the network, CPE continues the communication by using one of the backup interfaces as the new primary one. Priorities of those interfaces are determined in accordance with the pre-configured policy in CPE. This feature enables CPEs to establish an overlay network with active/standby network redundancy. Figure 6 shows an overview of this multi-interface feature from both underlay and overlay network perspectives.
In addition, CPE can alter the priorities of WAN interfaces in accordance with the instruction from external functions. After receiving an interface switch instruction between the primary interface and a backup one, CPE starts to use the latter as the new primary interface, and it also notifies facing CPEs of the new primary one so that both directions of communications between CPE pairs can be changed. In this experiment, the E2E Overlay Network is designed to be instructed to switch the network by Cradio\(^{\text{®}}\) (see Sect. 4.1) on the basis of its quality prediction to seamlessly switch networks before communication quality deteriorates.
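The active/standby behavior and the externally instructed switch described in this section can be sketched as below. This is a simplified model under stated assumptions: the class, method, and interface names are hypothetical, and notifying a facing CPE is reduced to updating a dictionary on the peer object.

```python
class Cpe:
    """Toy CPE with prioritised WAN interfaces (illustrative only)."""

    def __init__(self, name, wan_interfaces):
        self.name = name
        self.priority = list(wan_interfaces)   # highest priority first
        self.link_up = {i: True for i in self.priority}
        self.peer_primary = {}                 # peer name -> peer's primary interface

    @property
    def primary(self):
        # The primary is the highest-priority interface that is still connected.
        for interface in self.priority:
            if self.link_up[interface]:
                return interface
        return None

    def set_link(self, interface, up):
        # Reactive path: a disconnected primary is replaced by the next backup.
        self.link_up[interface] = up

    def switch_primary(self, interface, peers=()):
        # Proactive path: an external function (e.g. quality prediction)
        # instructs the switch, and facing CPEs are told about the new primary.
        self.priority.remove(interface)
        self.priority.insert(0, interface)
        for peer in peers:
            peer.peer_primary[self.name] = interface
```

A proactive `switch_primary` call happens while the old primary is still up, which is what distinguishes it from the reactive failover triggered by `set_link`.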
4.3 Machine Vision for Safety (Human Detection)
We introduce machine vision technology as an extra obstacle detection mechanism. A camera attached to the tractor streams live video of the tractor's front view to the remote site via the wireless network. The Human Detection Function receives the video stream and analyzes the image frames it contains. When a human is detected in a frame, the distance from the tractor to that person is calculated, and the Human Detection Function notifies the Autonomous Tractor Control Function whether or not it needs to take action to ensure safety on the basis of that distance.
The safety system defines three ranges of safety: danger, warning, and safe. The closer to the tractor, the higher the danger of the operation status. Figure 7 shows an overview of the three ranges of the redundant safety system.
For each of the three ranges of safety, there is a specific operation status. For danger, the tractor stops until operation is resumed from the remote monitoring system. For warning, the traveling speed should be reduced to 0.5 m/s (1.8 km/h). For safe, the tractor operates normally in accordance with the traveling speed setpoint. During automatic operation, the three ranges of safety are displayed in the three-state patlite by a specific color: the pink light is danger, the green light is warning, and the blue light is safe. When the operation status is danger, in addition to the patlite pink light, the tractor also sounds its horn as an alarm. The distances for both the laser scanner and the machine vision systems are obtained by a calibration process. The distances of these three safety ranges are selected in accordance with the tractor’s typical traveling speeds during agricultural labor, which range between 4 and 6 km/h. The distances of these three ranges are not fixed and can be changed if necessary. The three ranges of safety are defined independently for each of the two sensing components. For example, in Fig. 7, the laser’s maximum detection range is 5 m, but its danger range is set to 4 m, which is smaller than the 8 m danger range of the machine vision.
Figure 8 shows the interactions between related components of our system. Video data is transmitted using WebRTC from the tractor to the Human Detection Function on the remote site, and the safety index determined in accordance with the result of human detection is returned also using WebRTC to the Autonomous Tractor Control Function. Values of the safety index are determined by the estimated distance:
- distance \(\geq\) 12 m: 1 (safe)
- 12 m \(>\) distance \(\geq\) 8 m: 2 (warning)
- 8 m \(>\) distance: 3 (stop)
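The mapping from estimated distance to safety index, together with the resulting speed command, can be written as a short Python sketch. The function names are hypothetical, and capping the speed with `min` in the warning range is an assumption; the source only states that the speed is reduced to 0.5 m/s.

```python
def safety_index(distance_m):
    """Safety index from the estimated distance to the detected human."""
    if distance_m >= 12.0:
        return 1  # safe
    if distance_m >= 8.0:
        return 2  # warning
    return 3      # stop

def commanded_speed_mps(index, setpoint_mps):
    """Speed command for a given safety index (sketch)."""
    if index == 1:
        return setpoint_mps            # normal operation at the setpoint
    if index == 2:
        return min(setpoint_mps, 0.5)  # assumption: never exceed the setpoint
    return 0.0                         # stop until resumed from the remote site
```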
For the human detection process, we have used YOLOv3 [11]. The training data was created from open-source datasets such as the Common Objects in Context (COCO) dataset [12], the KITTI dataset [13], the PASCAL Visual Object Classes (PASCAL VOC) dataset [14], and nuScenes dataset [15]. In addition, camera images taken during experiments at the Field Science Center for Northern Biosphere of Hokkaido University with a tractor used in the experiment were also used to create additional training data.
Once a human is detected by YOLOv3, the next step is to calculate the distance from the tractor to the human. Since positions in the image correspond geometrically to positions in the actual space, the distance between the camera and the target can be obtained by relating the distance in the image in pixels to the actual distance in meters from the tractor. This relationship can be obtained by using a printed chessboard and OpenCV functions. The resulting pixel-distance relation enables coordinates to be transformed between the image plane and the ground plane. The calibration was verified by standing at an arbitrary point in front of the tractor with the camera fixed, measuring the horizontal distance to the tractor with a tape measure, and checking the measured value against the distance derived from the image position. This measurement was performed 10 times for each of the three ranges of safety during calibration, and the average value for each range was adopted.
4.4 Machine Vision for Positioning (Road Detection)
As a redundant positioning mechanism for autonomous driving of farm machines in addition to RTK-GNSS, we have introduced machine vision technology to detect the road where the machine is running. Figure 9 shows how it works. Video captured by the camera attached to the tractor is received by the Road Detection Function running on the computer, and it analyzes the image frames to detect edges of the road, estimates the center line of the road, and calculates the lateral error of the tractor against the estimated center line. Calculated lateral errors are then fed to the Autonomous Tractor Control Function, so it can use the positioning information even when RTK-GNSS is not able to provide it.
For the road detection process, we use several OpenCV functions. Since the camera we use has a fisheye lens, camera intrinsic parameters acquired through calibration using the chessboard are used to obtain undistorted images from the original distorted ones. Then, filtering processes for smoothing, color conversion, and morphological transformation are applied to get the binary images showing the right and left edges of the road, respectively. Those images are combined using the truth table to obtain the final filtered image. The same process is applied with different threshold values to obtain the binary image showing the road surface. Then, the filtered images are warped to obtain a bird’s eye view of the region of the road, shown in the left of Fig. 10, using a related OpenCV function.
After that, a sliding window method is applied to the warped image, which returns two result images. In the sliding window method, a window of a specified length Len moves over the image frame data, sample by sample. These windows are represented by the light green rectangles shown in Fig. 10 (center). A statistic is computed over the data in each window; the output for each input sample is the statistic over the window formed by the current sample and the previous Len–1 samples. For this method, the OpenCV, NumPy, and Pandas libraries are used. There are nine sliding windows along the height (720 pixels) of one image frame. Each window searches for the non-zero pixels inside it. Note that the binary image assigns zero to black pixels and one to white pixels. The positions of these non-zero pixels are stored in an array, and a second-degree polynomial is then fitted to them. This second-degree polynomial gives the left and right edges of the road and further helps in determining the center of the road. Figure 10 (center) shows the detection results for both the road edges and the road surface. Stabilization of this method is shown in Fig. 10 (right). The sliding windows create a region of interest inside the frame. If non-zero pixels are continuously found inside the region of interest, the sliding window method’s result will be stabilized.
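The window search and the second-degree fit can be sketched in plain Python as follows. This is an illustrative reconstruction, not the code used in the experiment: the function names, the re-centering of each window on the mean column of the previous one, and the `margin` parameter are assumptions. With NumPy, `np.polyfit(ys, xs, 2)` would perform the same fit.

```python
def collect_window_pixels(binary, n_windows, start_x, margin):
    """Walk windows from the bottom of a binary image upward and collect the
    (y, x) positions of the non-zero pixels found inside each window."""
    height, width = len(binary), len(binary[0])
    win_h = height // n_windows
    centre_x = start_x
    pixels = []
    for w in range(n_windows):
        y_hi = height - w * win_h
        y_lo = y_hi - win_h
        x_lo = max(0, centre_x - margin)
        x_hi = min(width, centre_x + margin + 1)
        found = [(y, x) for y in range(y_lo, y_hi)
                 for x in range(x_lo, x_hi) if binary[y][x]]
        if found:
            # Assumption: re-centre the next window on this window's mean column.
            centre_x = round(sum(x for _, x in found) / len(found))
            pixels.extend(found)
    return pixels

def fit_quadratic(ys, xs):
    """Least-squares fit of x = a*y**2 + b*y + c; returns (a, b, c)."""
    s = [sum(y ** k for y in ys) for k in range(5)]
    t = [sum(x * y ** k for y, x in zip(ys, xs)) for k in range(3)]
    # Augmented normal equations, solved by Gauss-Jordan elimination.
    m = [[s[4], s[3], s[2], t[2]],
         [s[3], s[2], s[1], t[1]],
         [s[2], s[1], s[0], t[0]]]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [p - f * q for p, q in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))
```

The polynomial is expressed as x in terms of y because road edges are close to vertical in the bird’s eye view, where a fit of y against x would be ill-conditioned.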
After the second-degree polynomials are obtained from the warped images, the coordinates in the polynomial are converted into real-world coordinates by inverting the warping. Then the final result is overlapped with the image shown in Fig. 11 (right); this is the image without any camera distortion. In Fig. 11 (left), the yellow polygon shows the road surface detected in the region of interest with respect to the left and right edges. The red polyline shows the center of the road. In other words, the left and right edges of the road and the detected road area results are combined to calculate the lateral error. Figure 11 (right) represents the calculation of lateral error, which is done by the correlation between the pixel and real-world distances obtained through the calibration. The left edge is represented by the blue area, the right edge is represented by the red area, and the detected center of the road is represented by the red line. The dashed green line represents the actual center of the image frame. The distance between the dashed green line and the red line represents the lateral error in pixel values. It is then converted into real-world distances and can be transmitted to the Autonomous Tractor Control Function.
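As a rough illustration of the last step, the lateral error can be computed from the two edge polynomials as below. The sketch makes simplifying assumptions that are not in the source: a single constant meters-per-pixel scale stands in for the calibrated pixel-to-distance relation, the road center is taken as the midpoint of the two edges on one image row, and a positive value means that the road center lies to the right of the image center. All names and numeric values are hypothetical.

```python
def eval_poly(coeffs, y):
    """Evaluate x = a*y**2 + b*y + c for an edge polynomial (a, b, c)."""
    a, b, c = coeffs
    return a * y * y + b * y + c

def lateral_error_m(left, right, y_row, image_width_px, metres_per_pixel):
    """Offset of the detected road centre from the image centre, in metres."""
    road_centre_px = (eval_poly(left, y_row) + eval_poly(right, y_row)) / 2
    image_centre_px = image_width_px / 2
    return (road_centre_px - image_centre_px) * metres_per_pixel

# Hypothetical example: straight edges at columns 400 and 1000 in a
# 1280-pixel-wide frame, with 5 mm per pixel on the evaluated row.
error = lateral_error_m((0, 0, 400), (0, 0, 1000), 700, 1280, 0.005)
```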
4.5 Network Cooperated Vehicle Control
As mentioned in Sect. 2.2, a safety mechanism is required that can prevent autonomous farm machines from continuing self-driving when remote monitoring is not working. In this section, Network Cooperated Vehicle Control technology, developed by NTT Network Technology Laboratories, is presented as a solution to address that issue. It is a component of the Cooperative Infrastructure Platform for network-device cooperation.
It serves as a coordinator between communication networks and autonomous vehicles such as unmanned farm machines. It works with external network quality measurement functions to continuously observe the communication quality and coordinates the behavior of the vehicle to be optimal to the communication quality [16]. In this experiment, it was designed to interact with the E2E Overlay Network Function to obtain packet loss measurement information, and to determine whether the autonomous tractor can continue self-driving or not on the basis of observed packet losses. Once it considers the communication quality is not good enough for remote monitoring, it then tells the Autonomous Tractor Control Function that emergency stop is necessary since the remote monitoring may not be working. When it detects the recovery of the communication quality, it notifies the Autonomous Tractor Control Function that it can resume self-driving.
Figure 12 shows the relationship between obtained packet loss information and stop/resume instructions to the vehicle. Thresholds of packet loss to determine stop/resume of the vehicle are configured in advance. While measured packet losses are continuously reported from the E2E Overlay Network Function to the Network Cooperated Vehicle Control Function, it counts the number of reports whose losses exceeded the upper threshold (for stop) during the determination period, which is configured in advance too. If the number of reports exceeding the upper threshold matches the pre-configured condition, then Network Cooperated Vehicle Control Function regards that packet losses might affect remote monitoring and that emergency stop is necessary. Similarly, when the number of reports whose packet loss falls below the lower threshold matches the condition in the determination period, then it considers that communication quality recovered and is ready to resume the self-driving. The determination period is introduced to restrain flapping behavior due to instantaneous occurrence and dissipation of packet losses. In this experiment, we designed the emergency stop to be determined when all reports of packet loss exceed the upper threshold and resume to be determined when at least half of the reports of packet loss are below the lower threshold.
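The stop/resume determination with two thresholds can be sketched as follows. It is a simplified model rather than the actual implementation: the class name is hypothetical, the determination period is approximated by a sliding window over the most recent reports, and the threshold values in the usage example are placeholders.

```python
from collections import deque

class StopResumeJudge:
    """Hysteresis on packet-loss reports (illustrative only)."""

    def __init__(self, upper, lower, reports_per_period):
        self.upper = upper
        self.lower = lower
        self.reports = deque(maxlen=reports_per_period)
        self.stopped = False

    def report(self, loss):
        """Feed one packet-loss report; returns "stop", "resume", or None."""
        self.reports.append(loss)
        if len(self.reports) < self.reports.maxlen:
            return None  # not enough reports for a determination yet
        if not self.stopped and all(r > self.upper for r in self.reports):
            self.stopped = True
            return "stop"
        below = sum(1 for r in self.reports if r < self.lower)
        if self.stopped and 2 * below >= len(self.reports):
            self.stopped = False
            return "resume"
        return None

judge = StopResumeJudge(upper=0.1, lower=0.02, reports_per_period=4)
events = [judge.report(x) for x in [0.2, 0.0, 0.2, 0.2, 0.2, 0.2, 0.01, 0.01]]
```

In this run, the isolated loss-free report delays the stop until four consecutive reports exceed the upper threshold, and the resume is issued once half of the reports in the window are below the lower threshold.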
Table 1 shows the relationships between the distances to the obstacle (i.e., a human in our experiment) and the multi-layered safety mechanisms we have introduced in this experiment. When communication network quality degrades severely, the remote operator and human detection may not work properly because they rely on the video streamed from the vehicle via the network. The situation is most dangerous when the obstacle is between 8 and 4 meters in front of the vehicle, as there is no safety mechanism to cover that range except for the Network Cooperated Vehicle Control Function. Therefore, the Network Cooperated Vehicle Control Function is required to make its emergency stop determination quickly enough that the vehicle can stop within that range, i.e., 4 meters. Considering braking distance, the safe-side emergency stop range is set to 3 meters.
Therefore, as the maximum velocity of the vehicle is 1.75 m/s (6.3 km/h) in this experiment, the upper limit on the duration of the emergency stop operation triggered by the Network Cooperated Vehicle Control Function is 3 m / 1.75 m/s = 1.71 s.
Since the Network Cooperated Vehicle Control Function has a determination period and processing time, a free run distance exists between communication quality degrading and the vehicle starting to brake. From the viewpoint of the Network Cooperated Vehicle Control Function, the following periods determine the free run distance:
- (1) receipt of packet loss measurement report from the E2E Overlay Network Function
- (2) determination of emergency stop on the basis of the reports received in the determination period
- (3) internal process for sending stop signal to Autonomous Tractor Control Function
- (4) stop signal transmission to Autonomous Tractor Control Function
- (5) interval of Autonomous Tractor Control Function to send stop command via CAN-BUS to the tractor ECU (Electronic Control Unit)
In this experiment, the Network Cooperated Vehicle Control Function and the Autonomous Tractor Control Function interact at 5 Hz, so the maximum period of item 4 above is 200 msec. Similarly, the maximum period of item 5 above is 100 msec, as the Autonomous Tractor Control Function is implemented to work in 10-Hz cycles. Therefore, the Network Cooperated Vehicle Control Function is required to complete items 1 to 3 above within \(1.71 - 0.2 - 0.1 = 1.41\) seconds.
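The time budget derived above can be reproduced with a few lines of arithmetic. The sketch below assumes, as the text does, that each downstream stage may wait up to one full cycle; the function names are introduced here for illustration only.

```python
from typing import Sequence


def processing_budget_s(stop_range_m: float,
                        max_speed_mps: float,
                        downstream_rates_hz: Sequence[float]) -> float:
    """Time left for items 1 to 3 after reserving one full cycle per downstream stage."""
    total_s = stop_range_m / max_speed_mps                 # 3 m / 1.75 m/s
    worst_case_wait_s = sum(1.0 / rate for rate in downstream_rates_hz)
    return total_s - worst_case_wait_s


def free_run_distance_m(elapsed_s: float, speed_mps: float) -> float:
    """Distance travelled at constant speed before braking starts."""
    return elapsed_s * speed_mps


# Values from the text: 3 m stop range, 1.75 m/s, 5-Hz and 10-Hz cycles.
budget_s = processing_budget_s(3.0, 1.75, [5.0, 10.0])
print(f"{budget_s:.2f} s")  # prints "1.41 s"
```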
4.6 Autonomous Tractor Control System
We have incorporated the autonomous tractor navigation system of Hokkaido University [7] as the fundamental autonomous navigation system. In this experiment, we also introduced two major extensions to the system.
First, we developed a vehicle speed control mechanism on the basis of the engine speed in combination with the engaged gear, in accordance with the CAN interface of the tractor we used for our experiment. During manual operation, the ECU of the tractor we have used takes the engine speed (rpm) in combination with the engaged gear (ranging from 1 to 8) to produce the tractor’s traveling speed. The reason for this specification is that, for tractors performing agricultural labor, the engine speed is more important than the traveling speed because the engine speed and power directly affect the agricultural machinery used. However, the performance of the tractor can be assessed by developing an automatic speed control system. To be compatible with the automatic navigation system developed by Hokkaido University, this automatic speed control system takes the engine speed in combination with the desired speed set point in km/h to estimate the gear required to produce the tractor running speed. The gear is estimated on the basis of look-up tables provided by the manufacturer. These look-up tables contain the relationship between the engine speed, the gear, and the resulting tractor running speed.
Figure 13 shows a schematic comparison between the manual speed control and the automatic speed control system developed for the tractor we used. The top block in the figure represents the manual speed control, which is very simple and similar to most of those in conventional passenger cars. The bottom block in the figure represents the automatic speed control, which calculates the required gear to achieve a tractor running speed as close as possible to the speed setpoint for a given engine speed.
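The gear estimation of the automatic speed control can be illustrated with a minimal look-up sketch. The table values below are invented for illustration (the manufacturer's tables are not reproduced in this article), and `select_gear` is a hypothetical helper, not part of the actual control software.

```python
from typing import Dict, Tuple

# HYPOTHETICAL look-up table: (engine speed in rpm, gear) -> running speed in km/h.
# The real tables are provided by the manufacturer; these numbers are made up.
SPEED_TABLE: Dict[Tuple[int, int], float] = {
    (2000, gear): speed
    for gear, speed in enumerate([1.0, 1.6, 2.4, 3.4, 4.6, 6.2, 8.5, 12.0], start=1)
}


def select_gear(engine_rpm: int, setpoint_kmh: float,
                table: Dict[Tuple[int, int], float]) -> int:
    """Return the gear whose tabulated speed is closest to the setpoint."""
    candidates = {gear: speed for (rpm, gear), speed in table.items()
                  if rpm == engine_rpm}
    if not candidates:
        raise ValueError(f"no table entry for engine speed {engine_rpm} rpm")
    return min(candidates, key=lambda g: abs(candidates[g] - setpoint_kmh))


gear_for_road = select_gear(2000, 6.3, SPEED_TABLE)   # gear 6 with this made-up table
gear_for_field = select_gear(2000, 4.2, SPEED_TABLE)  # gear 5 with this made-up table
```

A real table would cover several engine speeds; interpolation between tabulated engine speeds is left out here because the article does not describe it.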
Second, we have modularized the mechanism by which the autonomous navigation system interacts with external systems such as remote-control applications, the Human Detection Function, and the Network Cooperated Vehicle Control Function integrated in this experiment. A previous version of the autonomous navigation system was designed to work with only one specific external system: either a remote desktop or GeoMation, a remote monitoring and control application provided by Hitachi Solutions [17]. Therefore, we have introduced proxy and abstraction mechanisms internally, as shown in Fig. 14. AutoRun in the bottom left is the autonomous navigation program that actually controls the vehicle. AutoRunHub.exe is an implementation of the newly introduced external interaction framework, which mainly consists of Tractor Controller, Channel Interface, and the core AutoRunHub module. In this experiment, we implemented it as a program that runs on Microsoft Windows. Tractor Controller serves as a proxy of the actual autonomous vehicle, which enables other modules to easily access the vehicle. The core AutoRunHub module is the central component of the framework; it is responsible for transmitting the vehicle information to the external systems via Channel Interface and for sending appropriate signals to the vehicle in accordance with the commands received from the external systems. Channel Interface is the abstraction layer that hides from AutoRunHub the differences in actual interfaces, protocols, and data formats used for interactions with the external systems.
AutoRunHub is also responsible for determining the appropriate signal to be sent to the vehicle considering the commands from multiple external systems. It has three types of operation signal: Neutral, Stop, and Go. Normally, external systems send Neutral signals to AutoRunHub, and AutoRunHub does not intervene in the autonomous navigation performed by AutoRun. When an external system detects danger, it notifies AutoRunHub with a Stop signal, and AutoRunHub then stops the vehicle regardless of the signals from other external systems. When all external systems return to sending Neutral signals to AutoRunHub, the vehicle is considered ready to resume autonomous driving, but it does not resume until a Go signal is delivered from at least one specific external system operated by human operators.
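The Neutral/Stop/Go arbitration can be sketched as a small state machine. This is a simplified illustration under assumptions, not the AutoRunHub code: the class `SignalArbiter`, the system names, and the rule that Go is treated as a one-shot event are introduced here.

```python
from enum import Enum
from typing import Dict, Iterable


class Signal(Enum):
    NEUTRAL = "neutral"
    STOP = "stop"
    GO = "go"


class SignalArbiter:
    """Combines signals from several external systems (illustrative sketch)."""

    def __init__(self, go_authorized: Iterable[str]) -> None:
        # Systems operated by humans that are allowed to release a stop (assumption).
        self.go_authorized = set(go_authorized)
        self.latest: Dict[str, Signal] = {}  # last Neutral/Stop per system
        self.halted = False

    def receive(self, system: str, signal: Signal) -> bool:
        """Record a signal; return True if autonomous driving may proceed."""
        if signal is Signal.GO:
            # Go only takes effect when every system currently reports Neutral.
            ready = all(s is Signal.NEUTRAL for s in self.latest.values())
            if self.halted and ready and system in self.go_authorized:
                self.halted = False
            return not self.halted
        self.latest[system] = signal
        if signal is Signal.STOP:
            self.halted = True  # any single Stop halts the vehicle
        return not self.halted


hub = SignalArbiter(go_authorized={"GeoMation"})
hub.receive("network_control", Signal.STOP)          # vehicle halts
hub.receive("network_control", Signal.NEUTRAL)       # ready, but still halted
may_drive = hub.receive("GeoMation", Signal.GO)      # operator releases the stop
```

Keeping the halted state latched until an authorized Go arrives means that recovery of network quality alone never restarts the vehicle, which matches the behavior described above.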
5. Experiment
We conducted experiments on the technologies presented above, using an actual tractor and commercial communication networks in two field environments. Experiment setups and results are presented in the following sections.
5.1 Experiment Setups
For our experiments, we used the following fields, equipment, services, and sites. Table 2 shows the combinations of those experiment setups and the evaluated technologies.
- Field:
  - Field Science Center for Northern Biosphere of Hokkaido University (see Fig. 15)
  - Farm road near Nishiyauchi Farm in Iwamizawa-shi (see Fig. 16)
- Tractor:
  - Kubota MR1000A-PC-A [18]
- Camera:
  - Kodak PixPro 4KVR360 [19]
  - Axis P1378-LE [20]
- Network:
  - A: ipsim prepaid 4G SIM (MVNO of NTT Docomo mobile network) [21] with NEC Aterm MR05LN [22] as a mobile router
  - B: WLAN network connecting to network A
  - C: NTT Docomo 5G mobile network service with SH-52A [23] as a mobile router
  - D: Iwamizawa-shi regional BWA network service [24] with Cathay Tri-Tech CTL-201JC [25] as a mobile router
- Remote site:
  - A: NTT Musashino R&D Center (3-9-11 Midori-cho, Musashino-shi, Tokyo, 180-8585 Japan)
  - B: Iwamizawa-shi New Industry Support Center (1-29 Ariake-cho Minami, Iwamizawa-shi, Hokkaido, 068-0034 Japan)
  - C: Research Faculty of Agriculture, Hokkaido University (Nishi 9-chome, Kita 9-jou, Kita-ku, Sapporo-shi, Hokkaido, 060-8589 Japan)
Remote sites A and C were connected to the tractor-side systems via the Internet. On the other hand, closed network connections were used for remote site B: NTT Docomo access premium service [26] for network C (5G/4G), and direct connection from the radio stations to the building of remote site B for network D. This is because the core network equipment is located in the same building. Combinations of networks A & B and C & D are used mainly for examining seamless network switching by the Wireless Network Quality Prediction and E2E Overlay Network.
5.2 Results
5.2.1 Wireless Network Quality Prediction
The experiment on optimal network selection using wireless quality prediction technology was conducted at two locations: Hokkaido University and Iwamizawa City. As described in Sect. 4.1, the wireless quality prediction technology estimates communication quality such as throughput from past measured data using machine learning. In this demonstration experiment, the technology estimated the throughput of each wireless access with a neural network that uses latitude and longitude information as feature values. The neural network was trained with throughput measured and collected in advance. In addition, when the autonomous driving tractor ran in the demonstration experiment, the terminal software estimated the future running position of the tractor from its location information and speed, and the estimated wireless communication quality at that location was obtained from the wireless quality prediction engine for each base station.
In the experiment in the Hokkaido University environment, a network selection test was conducted between the wireless LAN installed on the farm and the commercial LTE (Long-Term Evolution) wireless access. Measurements and predictions were made several times at each point along the route shown in Fig. 15. Figure 17 shows a plot of the measured and estimated uplink throughput of LTE and wireless LAN. As shown in the figure, the estimated values lie close to the median of the measured values, which fluctuate widely. The estimation accuracy of the machine learning was RMSE (Root Mean Square Error) = 0.42 Mbps for wireless LAN and RMSE = 0.39 Mbps for LTE. In the optimal access selection operation using these estimated values, it was possible to estimate the quality deterioration of the wireless LAN and determine that the network switching instruction could be issued at a position before the uplink throughput fell below 1 Mbps, which was the necessary bandwidth for video transmission.
In the experiment in the Iwamizawa City environment, regional BWA and commercial 5G/LTE wireless access were used for a network selection test. Figure 18 shows the measured and estimated uplink throughputs of BWA and 5G/LTE. At the start point of the autonomous driving course, the BWA was selected because the use of local communication was prioritized and its quality was estimated to satisfy the standard value of 1.5 Mbps in this test. The BWA was used up to the point in the middle of the course where the signal is blocked by a windbreak. Before that point, the deterioration of the BWA uplink throughput was predicted from the estimated values, and the network was changed to 5G/LTE. For this quality estimation, the throughput was measured in advance at the experimental site, and the neural network was trained with those measurements. As shown in Fig. 18, the estimation accuracy was RMSE = 7.4 Mbps for 5G/LTE and RMSE = 0.93 Mbps for regional BWA. The error in the estimated value for 5G/LTE was particularly large because the measured 5G/LTE values fluctuated widely. Making the estimation keep up with such changes is an issue for the future. In this autonomous driving run using estimated throughput values, the transition between BWA and 5G/LTE was smooth, without large disturbance in the transmitted video quality.
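The selection rule used in the two tests (prefer a prioritized network as long as its predicted uplink throughput meets the required value) can be sketched as follows. The function `select_network`, the fallback to the best predicted network, and the predicted values in the example are assumptions made for illustration; they are not taken from the wireless quality prediction engine.

```python
from typing import Dict, Sequence


def select_network(predicted_mbps: Dict[str, float],
                   priority: Sequence[str],
                   required_mbps: float) -> str:
    """Pick the first network in priority order whose prediction meets the requirement."""
    for name in priority:
        if predicted_mbps.get(name, 0.0) >= required_mbps:
            return name
    # ASSUMPTION: if no network meets the requirement, fall back to the best prediction.
    return max(priority, key=lambda n: predicted_mbps.get(n, 0.0))


priority = ["BWA", "5G/LTE"]  # local communication is prioritized
# Hypothetical predictions at the start of the course and near the windbreak.
at_start = select_network({"BWA": 3.0, "5G/LTE": 20.0}, priority, required_mbps=1.5)
near_windbreak = select_network({"BWA": 0.8, "5G/LTE": 20.0}, priority, required_mbps=1.5)
```

Evaluating this rule on the predicted throughput at the tractor's estimated future position, rather than its current position, is what allows the switch to be issued before the quality actually degrades.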
5.2.2 End-to-End Overlay Network
For the E2E Overlay Network, we verified that the end-to-end data transmission for remote monitoring could be seamlessly switched between two different networks. Figure 19 shows how packet loss and RTT changed after the network was switched from regional BWA to 5G/LTE by instruction from Cradio\(^{\text{®}}\); both were measured by fping between the CPE installed in the tractor and the remote monitoring viewer at the remote site. No significant change in packet loss or RTT was found after the network was switched. This means that undesired additional packet loss or RTT was avoided by both the proactive network switching decision based on quality prediction by Cradio\(^{\text{®}}\) and the smooth network switching by the E2E Overlay Network. In addition to this packet-level data, we also confirmed that the video observed at the remote site continued smoothly after the network was switched.
5.2.3 Machine Vision for Safety (Human Detection)
The following evaluations of the object detection accuracy were made to verify the practicality and safety of the system. Twenty still images were arbitrarily captured from the video at times when the developed system reported a detection. Then, the True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN) were counted and evaluated from those images, as shown in Table 3.
As evaluation metrics, the precision, defined as \(\rm TP / (TP + FP)\), and the recall, defined as \(\rm TP / (TP + FN)\), were calculated. The results are summarized in Table 4. As mentioned above, detection must be performed to avoid collision between a person and the tractor. Therefore, it is important to focus on the recall, which indicates how few persons are missed.
Recall was 91%, and since the detection was performed about 20 times per second, it was judged that humans could almost certainly be detected during practical operation. In addition, an examination of the detection errors showed that the most common FN occurred when a group of many people was in the picture and the number of people detected was less than the actual number of persons.
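The two metrics can be computed directly from the counts. The sketch below uses made-up counts purely to show the calculation; they are not the values in Table 3.

```python
def precision(tp: int, fp: int) -> float:
    """TP / (TP + FP): fraction of detections that are correct."""
    if tp + fp == 0:
        raise ValueError("precision is undefined when there are no detections")
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """TP / (TP + FN): fraction of actual persons that were detected."""
    if tp + fn == 0:
        raise ValueError("recall is undefined when there are no actual positives")
    return tp / (tp + fn)


# HYPOTHETICAL counts for illustration only (not taken from Table 3).
tp, fp, fn = 8, 2, 2
example_precision = precision(tp, fp)  # 8 / 10
example_recall = recall(tp, fn)        # 8 / 10
```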
5.2.4 Machine Vision for Positioning (Road Detection)
Experimental runs were performed to assess the accuracy of the machine vision system for road detection. The experimental runs consisted of traveling a straight path using automatic navigation on a dirt road at the experimental farm of Hokkaido University. The road is shown in Fig. 20. The same pre-determined navigation map was used for the two experimental runs. One was performed using RTK-GNSS as the input of the automatic navigation system, and the other was performed using the machine vision system as the input of the automatic navigation system. The lateral error obtained from the automatic navigation run using the machine vision system for road detection was compared with the lateral error obtained from the automatic navigation run using RTK-GNSS.
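For a straight path, the lateral error can be understood as the signed perpendicular distance of the vehicle position from the reference line. The sketch below is one common way to compute it in a local planar coordinate system in meters; the article does not state the exact formula used, so the function and the sign convention are assumptions.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in meters in a local planar frame (assumption)


def lateral_error(position: Point, path_start: Point, path_end: Point) -> float:
    """Signed perpendicular distance from the line path_start -> path_end.

    Positive values mean the position is to the left of the travel direction.
    """
    dx = path_end[0] - path_start[0]
    dy = path_end[1] - path_start[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        raise ValueError("path_start and path_end must differ")
    # 2-D cross product of the path direction and the offset vector, normalized.
    cross = dx * (position[1] - path_start[1]) - dy * (position[0] - path_start[0])
    return cross / length


# A vehicle 0.4 m to the left of a path running along the x axis.
offset = lateral_error((5.0, 0.4), (0.0, 0.0), (10.0, 0.0))
```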
Figure 21 shows the lateral error for one traveled path. The lateral error from the RTK-GNSS is less than 0.05 meters, and this data serves as a reference or real data. The lateral error from the machine vision system is less than 0.10 meters. An initial deviation of around 0.40 meters caused by an offset in the initial position of the tractor is also shown in this figure. Such offset is also sometimes present when using RTK-GNSS.
In the case of a paved road, automatic experimental runs are yet to be performed. At this stage of development, lateral errors have been obtained from video samples recorded during manual driving. Figure 22 shows the lateral error results from machine vision only. In this figure, a maximum lateral error of 1.2 meters can be seen, but on average, the lateral error is within 0.20 meters. Possible reasons for such a large lateral error include illumination and reflection from the road surface.
5.2.5 Network Cooperated Vehicle Control
Parameters for the experiment regarding the Network Cooperated Vehicle Control are shown below:
- interval of ping in fping (option p): 10 msec
- number of ping in fping (option c): 5
- timeout in fping (option t): 150 msec
- determination period: 1000 msec
- upper threshold of packet loss: 20%
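The fping parameters above correspond to the `-p`, `-c`, and `-t` options. As a rough sketch, the command line and the extraction of the loss percentage from an fping per-target summary line could look like the following; the target address is a placeholder, the helper names are hypothetical, and the summary format may differ between fping versions.

```python
import re
from typing import List, Optional


def build_fping_command(target: str, count: int = 5,
                        period_ms: int = 10, timeout_ms: int = 150) -> List[str]:
    """Build an fping invocation with the parameters listed above."""
    return ["fping", "-c", str(count), "-p", str(period_ms),
            "-t", str(timeout_ms), target]


# Matches the "xmt/rcv/%loss = 5/3/40%" part of an fping summary line.
_LOSS_PATTERN = re.compile(r"xmt/rcv/%loss = (\d+)/(\d+)/(\d+(?:\.\d+)?)%")


def parse_loss_percent(summary_line: str) -> Optional[float]:
    """Return the packet loss percentage, or None if the line does not match."""
    match = _LOSS_PATTERN.search(summary_line)
    return float(match.group(3)) if match else None


command = build_fping_command("192.0.2.10")  # placeholder address
loss = parse_loss_percent(
    "192.0.2.10 : xmt/rcv/%loss = 5/3/40%, min/avg/max = 1.0/1.2/1.5")
exceeds_upper_threshold = loss is not None and loss > 20.0
```

With five probes per report, the loss percentage can only take multiples of 20%, so a report exceeds the 20% upper threshold when at least two of the five probes are lost.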
To simulate network congestion, we injected 30% packet loss at a network node on the video transmission path between the vehicle and the remote site. We used a Raspberry Pi 4 (8 GB of RAM) to run the Network Cooperated Vehicle Control Function.
Figure 23 shows the velocity of the autonomous tractor and the observed packet losses in the experiment. The autonomous tractor automatically stopped when network congestion was simulated and did not stop unexpectedly before the simulated congestion. We also found that the video shown on the remote monitor stopped when the network congestion was inserted, which is a case in which an emergency stop is necessary since remote monitoring is not working correctly.
We also found that the Network Cooperated Vehicle Control Function took 1.27 seconds from the receipt of the first packet loss report that exceeded the threshold to the completion of stop signal transmission to the Autonomous Tractor Control Function. This is shorter than the target period of 1.41 seconds mentioned in Sect. 4.5, so the Network Cooperated Vehicle Control Function performs fast enough for the case we envisioned this time. Note that a delay after the rise of the packet loss can be seen; this is because the determination period mechanism described in Sect. 4.5 did not detect continuous loss, as some reports showing no packet loss were observed right after the packet loss was inserted.
5.2.6 Autonomous Tractor Control System
We evaluated the performance of a speed control system by performing autonomous driving on four paths in a field of the experimental farm of Hokkaido University. Figure 24 shows the experimental location. The position points of the experimental path were provided by the RTK-GNSS, and this data log is stored in the Hokkaido University System PC. In addition, the traveling speed is recorded by both the RTK-GNSS and the tractor’s speedometer, the data of which is obtained through the CAN-BUS. One experimental run was performed at 1.2 m/s (around 4.2 km/h), and another was performed at 1.7 m/s (around 6.3 km/h). These two speeds were selected because they were close to the typical minimum and maximum speeds of a tractor in working conditions in a field and traveling conditions on a road.
Figure 25 shows a summary of the experimental results for the traveling speed of 1.7 m/s. The left figure shows the traveled path in latitude and longitude coordinates recorded by the GNSS receiver. It can be observed that the four paths are straight and that there are no deviation maneuvers or oscillations even after the turning maneuver, which indicates overall stability. The right figure compares the traveling speeds. The data recorded by the GNSS receiver is shown in blue, and the velocity recorded by the tractor’s speedometer is shown in orange. For all four paths, the traveling speeds recorded by the GNSS receiver and the tractor’s speedometer are very similar, with an RMSE of only 0.06 m/s, which shows that the automatic speed control system is accurate enough to perform automatic navigation. Results were similar for the test at 1.2 m/s.
We also found that the AutoRunHub framework worked correctly with multiple external systems simultaneously: remote desktop control, obstacle sensor, GeoMation, and the Network Cooperated Vehicle Control Function. In particular, when the Network Cooperated Vehicle Control Function told AutoRunHub to stop due to the network congestion and then detected that the congestion was eliminated, AutoRunHub properly waited for the Go signal issued from GeoMation by a human operator, as it was designed to do.
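The RMSE between two speed logs can be computed as in the following sketch. The sample values are invented for illustration and are not the recorded data; the sketch also assumes the two logs are already aligned sample by sample.

```python
import math
from typing import Sequence


def rmse(a: Sequence[float], b: Sequence[float]) -> float:
    """Root mean square error between two equally long, time-aligned series."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("series must be non-empty and of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# HYPOTHETICAL speed samples in m/s (GNSS receiver vs. speedometer via CAN-BUS).
gnss_speed = [1.70, 1.72, 1.68, 1.71]
can_speed = [1.69, 1.75, 1.70, 1.66]
speed_rmse = rmse(gnss_speed, can_speed)
```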
6. Conclusion
We identified major issues to be solved for achieving complete autonomous driving of farm machines with remote monitoring and control, designed a unified architecture based on the concept of the Cooperative Infrastructure Platform on the IOWN, and developed and integrated the component technologies. We then confirmed through field experiments that the architecture with the developed component technologies can be applied to the complete autonomous driving of farm machines. The combination of the Wireless Network Quality Prediction and the E2E Overlay Network realized seamless network switching for stable remote monitoring. A multi-layer safety system with the Human Detection and the Network Cooperated Vehicle Control was shown to ensure safety even when existing safety mechanisms or remote monitoring are not working properly. We also showed that machine vision is applicable, with sufficient accuracy, as a positioning mechanism that complements the existing RTK-GNSS positioning system. The autonomous tractor control system was also shown to work properly and accurately with a specific type of tractor in an actual field, together with various types of external systems for ensuring safe operation.
Acknowledgments
Our experiments have been largely supported by Iwamizawa City and Nishiyauchi Farm in the context of using the regional BWA network and farm roads for field experiments.
References
[1] Ministry of Agriculture, Forestry and Fisheries, “Promotion of smart agriculture,” Ministry of Agriculture, Forestry and Fisheries, https://www.maff.go.jp/e/policies/tech_res/smaagri/attach/pdf/robot-1.pdf, accessed April 23. 2021.
[2] Ministry of Agriculture, Forestry and Fisheries, “Guidelines for ensuring safety regarding automatic driving of agricultural machinery,” Ministry of Agriculture, Forestry and Fisheries, https://www.maff.go.jp/j/kanbo/smart/attach/pdf/gl210326.pdf, accessed April 23. 2021 (in Japanese).
[3] Y. Kikuchi, “Risk analysis of highly automated agricultural machinery and development of safety requirements for machine and use,” Journal of the Japanese Society of Agricultural Machinery and Food Engineers, vol.81, no.5, pp.284–288, 2019 (in Japanese).
[4] J. Sawada, M. Ii, and K. Kawazoe, IOWN: Beyond the Internet (English Edition), NTT Publishing, Tokyo, 2020.
[5] A. Ford, C. Raiciu, M. Handley, O. Bonaventure, and C. Paasch, “TCP extensions for multipath operation with multiple addresses,” Internet Engineering Task Force, RFC 8684, accessed April 26. 2021.
[6] R.R. Stewart, “Stream control transmission protocol,” Internet Engineering Task Force, RFC 4960, accessed April 26. 2021.
[7] N. Noguchi, “Agricultural vehicle robot,” J. Robot. Mechatron., vol.30, no.2, pp.165–172, 2018.
[8] NTT Corporation, “Multi-wireless proactive control technology Cradio,” https://www.ansl.ntt.co.jp/ja/research/movie10_cradio.html, accessed April 26. 2021 (in Japanese).
[9] K. Wakao, K. Kawamura, and T. Moriyama, “Quality-prediction technology for optimal use of multiple wireless access networks,” NTT Technical Review, vol.18, no.6, pp.17–20, June 2020.
[10] K. Wakao, S. Nakayama, K. Kawamura, T. Moriyama, and Y. Takatori, “Wireless communication prediction for base station selection in multiple wireless network system,” Proc. IEICE Gen. Conf. ’2020, online, B-5-120, March 2020 (in Japanese).
[11] J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” arXiv:1804.02767, 2018.
[12] T.-Y. Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick, J. Hays, P. Perona, D. Ramanan, C.L. Zitnick, and P. Dollár, “Microsoft COCO: Common objects in context,” arXiv:1405.0312v3, 2015.
[13] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: The KITTI dataset,” International Journal of Robotics Research, vol.32, no.11, Sept. 2013. DOI: 10.1177/0278364913491297
[14] M. Everingham, L. Van Gool, C.K.I. Williams, J. Winn, and A. Zisserman, “The pascal visual object classes (VOC) challenge,” Int. J. Comput. Vision, vol.88, no.2, pp.303–338, June 2010. DOI:10.1007/s11263-009-0275-4
[15] H. Caesar, V. Bankiti, A.H. Lang, S. Vora, V.E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, “nuScenes: A multimodal dataset for autonomous driving,” arXiv:1903.11027v5, May. 2020.
[16] T. Tsubaki, R. Ishibashi, and T. Kuwahara, “A study of remote monitoring support for self-driving car on the basis of network jitter,” Proc. IEICE Gen. Conf. ’2021, online, B-5-20, March 2021 (in Japanese).
[17] Hitachi Solutions, Ltd., “Enterprise geographical information system GeoMation,” https://www.hitachi-solutions.com/pdf/geomation.pdf, accessed April 24. 2021.
[18] Kubota Corporation, “Agri Robo MR1000A,” https://agriculture.kubota.co.jp/product/tractor/agrirobo_mr1000a/lineup/index-pc.html, accessed April 24. 2021.
[19] JK Imaging Ltd., “KODAK PIXPRO 4KVR360 360° VR CAMERA,” https://kodakpixpro.com/docs/manuals/vrcamera/4kvr360/4KVR360-specs-web.pdf, accessed April 24. 2021.
[20] Axis Communications AB., “AXIS P1378-LE network camera,” https://www.axis.com/products/axis-p1378-le, accessed April 24. 2021.
[21] WITH Networks Ltd., “Ipsim prepaid,” https://ipsim.net/biz/plans/prepaid.html, accessed April 24. 2021 (in Japanese).
[22] NEC Platforms, Ltd., “LTE mobile router aterm MR05LN,” https://www.aterm.jp/product/atermstation/product/mobile/mr05ln/, accessed April 24. 2021 (in Japanese).
[23] NTT Docomo Inc., “Wi-Fi STATION SH-52A,” https://www.nttdocomo.co.jp/product/sh52a/, accessed April 24. 2021 (in Japanese).
[24] Hamanasu Information Co., Ltd., “Hamanasu BWA service,” https://hamanasu-bwa.com/, accessed April 24. 2021 (in Japanese).
[25] Cathay Tri-Tech., Inc., “LTE/3G module equipped programmable cellular router CTL-201JC,” http://www.cathay.jp/product/m2m/ctl-201jc_ctl-201je.html, accessed April 24. 2021 (in Japanese).
[26] NTT DOCOMO Inc., “Access premium,” https://www.nttdocomo.co.jp/biz/service/premium_lte/, accessed April 24. 2021 (in Japanese).