Jinhui Zhu, Yinong Chen, Mei Zhang, Qiang Chen, Ying Guo, Huaqing Min, and Zhilie Chen, "An Edge Computing Platform of
Guide-dog Robot for Visually Impaired", in Proceedings of IEEE 14th International Symposium on Autonomous Decentralized
Systems, Utrecht, the Netherlands, April 2019, pp. 23 - 30.
An Edge Computing Platform of Guide-dog Robot for
Visually Impaired
Jinhui Zhu
School of Software Engineering
South China University of
Technology
Guangzhou, China
Yinong Chen
School of Computing, Informatics
and Decision Systems
Engineering
Arizona State University
Tempe, USA
School of Automation Science
and Control Engineering
South China University of
Technology
Guangzhou, China
Qiang Chen
College of Computer &
Information Science
Southwest University
Chongqing, China
Ying Guo
School of Information Science
and Technology
Qingdao University of Science
and Technology
Qingdao, China
Huaqing Min
School of Software Engineering
South China University of
Technology
Guangzhou, China
Zhilie Chen
EVOC Intelligent Technology
Company Limited
Shenzhen, China
Abstract—Information processing for a guide dog robot requires expensive computing resources to meet real-time performance requirements. We propose an edge computing framework based on the Intel UP Squared board and a neural compute stick. Image processing and real-time control are performed on this framework. In addition, voice recognition and commanding are implemented and processed on Amazon cloud services. Vision, speech, and control services are integrated by ASU VIPLE (Visual IoT/Robotics Programming Language Environment). A prototype is developed that implements the guide dog's obstacle avoidance, traffic sign recognition and following, and voice interaction with humans.
Keywords— Guide dog robot, Visually Impaired, Edge Computing, Neural Compute Stick, Internet of Things, ASU VIPLE, Embedded system
I. INTRODUCTION
Guide dogs, or service dogs, are assistance dogs trained to help blind and visually impaired people move through complex road situations involving obstacles and traffic signs. Training a qualified guide dog takes a long time, the demand is higher than the number of available dogs, and the cost is high, which makes guide dogs unaffordable for many visually impaired people. Moreover, dogs are red-green color blind and cannot interpret traffic signals, nor can they interact with people through natural language. With the development of robotics in recent decades, guide dog robots with vision and speech can overcome these disadvantages. However, the complex information processing and real-time control tasks of guide dog robots, especially vision processing, require a large amount of computing resources to meet real-time performance, and the computing power of embedded systems alone is insufficient. Processing images on a cloud service, on the other hand, causes large delays and occupies network bandwidth. Edge computing has the potential to address the concerns of response time requirements, battery life constraints, and bandwidth cost savings, as well as data safety and privacy [1].
This paper presents an edge computing framework for the guide dog robot system. We use service-oriented architecture [2] and Robot as a Service (RaaS) concepts [3][4], where both software and hardware modules communicate with each other through service interfaces and communication standards. The orchestration among the services is implemented through workflows in a visual programming language.
The two main subsystems, the dog robot and the wheelchair, are shown in Fig. 1. The dog robot has 17 motors for complex movement, and a high-definition camera is mounted on its head. The wheelchair carries an Intel UP Squared IoT board with a high-performance CPU and a neural compute stick (VPU, Vision Processing Unit), which form the core of the framework. The board processes images to detect cars, traffic lights, and other objects, and is also responsible for real-time control of the wheelchair.
Human voice commands from the visually impaired person are processed using Amazon Alexa and Lambda functions in the cloud. The voice instructions are then sent back to the framework through the AWS IoT platform. The integration of vision, voice, control, and other modules is done through ASU VIPLE (Visual IoT/Robotics Programming Language Environment) [5]. The prototype system realizes the basic functions of a guide dog: avoiding obstacles and following traffic signals.
The contributions of this research are as follows. (1) Using a high-performance embedded board and a neural compute stick to build an edge computing hardware framework for deep learning acceleration; (2) Applying this framework to the guide dog robot's data collection, real-time vision processing, and wheelchair control; (3) Using the visual programming software ASU VIPLE to integrate multiple decentralized functional modules with edge computing and cloud computing.
Fig. 1. Guide dog robot and wheelchair robot.
The remainder of this paper is organized as follows. Section II gives a brief review of the related work. Section III discusses the requirements of the guide dog robot system. Section IV presents the proposed edge computing framework. Section V shows the experimental results. Finally, Section VI concludes the paper.
II. RELATED WORK
The earliest research on guide dog robots was the MELDOG project in 1985 [6]. The studies in [7][8] developed a guide dog robot that leads and recognizes a visually handicapped person using a laser range finder. The research in [9] built an outdoor navigation system using GPS and tactile-foot feedback devices worn by blind pedestrians.
More guide dog projects [10][11][12][13][14][15] used cameras to perceive the world. In [10][11], a smartphone streams video to a laptop on the robot for processing. In contrast, in [12] the images from the smartphone are sent to a cloud service for processing. In [15], the system used an embedded platform with two cameras, among other sensors; however, image processing runs on a remote server, which takes longer than an edge computing server. The study in [13] used a GPU to handle video processing, mapping, and localization. In [14], a convolutional neural network was trained to recognize the activities taking place around a camera mounted on a real guide dog and to provide feedback to the visually impaired user.
III. SYSTEM REQUIREMENT
We designed our guide dog robot system to provide the following basic functions.
(1) Visual function: The dog robot can visually observe the surrounding environment, identify obstacles, and recognize traffic signals and vehicles when crossing the road.
(2) Language function (understanding and speaking): The dog robot can hear the natural language of the blind person, understand the user's intentions, and respond with voice.
(3) Decision function: The robot can integrate various environmental information and the blind person's intentions to make correct decisions. For example, if the blind person commands it to advance while the robot sees that a vehicle is passing, the robot decides not to execute the forward command. This is the intelligent disobedience that a well-trained guide dog should have.
(4) Mobile function: The robot can move autonomously. A blind person with limited mobility can ride a self-propelled wheelchair.
IV. IMPLEMENTATION OF PROPOSED SYSTEM
A. Hardware Architecture
The hardware architecture is organized in three levels, as shown in Fig. 2. The lower level consists of the sensors and actuators. The upper level is the Amazon cloud service. The middle level is the edge computing resource, which consists of the CPU and the Neural Compute Stick VPU.
Fig. 2. Hardware architecture of the system.
The main components of the system are elaborated as follows.
• UP Squared: Intel's UP Squared is a high-performance, low-power maker board. The board used in this project has an Intel Celeron N3350 CPU @ 1.10 GHz, 2 GB LPDDR4, and 32 GB eMMC. A 40-pin GP-bus and three USB 3.0 ports give makers the freedom to connect other devices.
• Intel Neural Compute Stick: The stick is designed for deep neural network (DNN) inference and is powered
by the high-performance Intel Movidius Vision Processing Unit (VPU). The stick can be thought of, roughly, as a GPU on a USB stick.
• Dog Robot: The dog robot is constructed from a Robotis robot kit. Its actions are driven by 17 AX-12 servo motors. The dog detects obstacles with an IR sensor. The CM-530 controller runs the motion code and task code. The dog communicates with the other subsystems via a Bluetooth 4.0 module.
• Camera: The GoPro HERO4 Session is smaller and lighter than any previous GoPro camera, yet delivers the same professional image quality. It has a built-in battery and Wi-Fi, so it can be mounted on the dog robot's head as the dog's eye.
• Wheelchair Robot: The prototype uses a wheeled mobile robot in place of an actual wheelchair. Two motors are driven by an L298N dual H-bridge motor driver. Two ultrasonic distance sensors are mounted at the front. The UP Squared board is its main controller.
• Amazon Echo Dot: The Echo Dot is a hands-free, voice-controlled device with a small built-in speaker. It uses Alexa to play music, control smart home devices, make calls, answer questions, set timers and alarms, and more. We use the Echo Dot for voice input and output, with Alexa behind it for voice processing.
B. Software Framework and Algorithms
The software framework used in the guide dog system is shown in Fig. 3.
Fig. 3. Software Framework.
1) Vision: The image capture module receives the live stream from the camera, encoded in H.264 over UDP. The images are then dispatched to different image processing modules for detecting different kinds of objects. We use a deep neural network for complex object detection. In addition, we use recognition methods based on manually selected features to detect traffic lights and cones. Finally, the fused vision information is provided as a vision service.
a) Object Detection
In this paper, we use a single deep neural network named SSD (Single Shot MultiBox Detector) [16] to detect objects such as cars and bottles. SSD is a fast single-shot object detector for multiple categories. A key feature of the model is the use of multi-scale convolutional bounding box outputs attached to multiple feature maps at the top of the network. The neural network model is trained on a powerful PC with a GPU and then exported to the neural compute stick. Inference on camera images runs on the stick, which is faster than on the embedded CPU. Fig. 4 (a) demonstrates object detection.
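After inference, the raw detections are decoded on the CPU. The sketch below assumes the common SSD deployment output layout of shape (1, 1, N, 7), each row being [image_id, class_id, confidence, x_min, y_min, x_max, y_max] in normalized coordinates; the actual layout depends on the exported model.

```python
import numpy as np

def decode_ssd_output(raw, conf_threshold=0.5, img_w=640, img_h=480):
    """Filter raw SSD detections and scale boxes to pixel coordinates.

    `raw` is assumed to have shape (1, 1, N, 7); rows with a negative
    image_id are padding. This layout is an assumption based on common
    SSD deployments, not the paper's exact format.
    """
    detections = []
    for det in raw.reshape(-1, 7):
        image_id, class_id, conf, x1, y1, x2, y2 = det
        if image_id < 0 or conf < conf_threshold:
            continue  # skip padding and low-confidence detections
        detections.append({
            "class_id": int(class_id),
            "confidence": float(conf),
            "box": (int(x1 * img_w), int(y1 * img_h),
                    int(x2 * img_w), int(y2 * img_h)),
        })
    return detections
```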
b) Traffic Cone Detection
First, the red regions are extracted by color segmentation in the same way as in traffic light detection. Second, the edges are detected by the Canny algorithm and approximated by a few polygonal contours. Third, the contours that are convex hulls are selected. Finally, according to the shape of a traffic cone on the ground, a contour is recognized as a cone when the upper points on the contour lie within the width of the lower points. A detected cone is shown in Fig. 4 (b).
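The final geometric check can be sketched as a small function over contour points; the split ratio and the strict-containment test are illustrative assumptions, not the paper's exact parameters.

```python
def looks_like_cone(contour, split_ratio=0.5):
    """Heuristic check that a convex contour is cone-shaped: the upper
    part must lie strictly within the horizontal span of the lower part,
    since a cone on the ground narrows toward the top.

    `contour` is a list of (x, y) points; image y grows downward, so a
    smaller y means higher in the image. The split ratio is illustrative.
    """
    ys = [p[1] for p in contour]
    y_split = min(ys) + split_ratio * (max(ys) - min(ys))
    upper = [p[0] for p in contour if p[1] <= y_split]
    lower = [p[0] for p in contour if p[1] > y_split]
    if not upper or not lower:
        return False
    return min(lower) < min(upper) and max(upper) < max(lower)
```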
c) Traffic Light detection
The image is converted to the HSI color space. After color segmentation with a user-defined threshold and smoothing by a median filter, the Circle Hough Transform is used to detect circular objects. A circle with an appropriate radius is recognized as a traffic light. Fig. 4 (c) and (d) show the recognition of red and green traffic lights.
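The color classification step can be sketched as follows, computing the hue of a detected circle's mean color; the hue thresholds here are illustrative, not the paper's calibrated values.

```python
def rgb_to_hue(r, g, b):
    """Hue in degrees [0, 360) from 8-bit RGB, as in HSI/HSV conversion."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    mx, mn = max(r, g, b), min(r, g, b)
    if mx == mn:
        return 0.0  # achromatic: hue undefined, return 0 by convention
    if mx == r:
        return 60 * ((g - b) / (mx - mn)) % 360
    if mx == g:
        return 60 * ((b - r) / (mx - mn)) + 120
    return 60 * ((r - g) / (mx - mn)) + 240

def classify_light(r, g, b):
    """Classify a detected circle's mean color as a traffic light state.
    The hue thresholds are illustrative assumptions."""
    h = rgb_to_hue(r, g, b)
    if h < 20 or h > 340:
        return "red"
    if 90 < h < 160:
        return "green"
    return "unknown"
```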
Fig. 4. Image processing demo: (a) object detection; (b) cone detection; (c) red traffic light detection; (d) green traffic light detection.
2) Speech: To support vocal commands, we created a custom skill in the Alexa Skills Kit for the Amazon Echo Dot. The invocation opens the program that contains the robot
commands, and the user may speak in a natural manner to the guide dog robot for it to perform the commands the user wants executed. AWS Lambda, running Node.js 6.10, executes the commands defined in the Alexa skill and sends them to AWS IoT, to which the other components of the robot are connected.
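The Lambda handler can be sketched as below (in Python rather than the Node.js 6.10 used in the project). The intent names, MQTT topic, and payload format are hypothetical, and the publish function is injected so the sketch stands in for the AWS IoT data-plane publish call (e.g. boto3's iot-data `publish`) without requiring AWS credentials.

```python
import json

# Hypothetical mapping from Alexa intent names to robot commands;
# the actual skill's intent schema is not given in the paper.
INTENT_TO_COMMAND = {
    "GoForwardIntent": "forward",
    "StopIntent": "stop",
    "TurnLeftIntent": "left",
    "TurnRightIntent": "right",
}

def handle_intent(event, publish):
    """Map an Alexa intent to a robot command and publish it over MQTT.

    `publish(topic, payload)` abstracts the AWS IoT publish call so the
    handler can be exercised without a live connection.
    """
    intent = event["request"]["intent"]["name"]
    command = INTENT_TO_COMMAND.get(intent)
    if command is None:
        return {"speech": "Sorry, I did not understand that."}
    publish("guidedog/commands", json.dumps({"command": command}))
    return {"speech": f"Okay, {command}."}
```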
3) Dog robot: The program running on the CM-530 controller of the dog includes motion code and task code. The dog's actions are edited and saved through RoboPlus Motion, shown in Fig. 5. The saved motion file is executed by task code: a task code sequences a set of motions to perform certain actions, and the robot moves according to the task codes. RoboPlus Task, shown in Fig. 6, is a tool that makes writing these task codes easier.
Fig. 5. Robot Motion Programming.
Fig. 6. Robot Task Programming
4) Wheelchair robot: The real-time control program running on the UP Squared board has three threads. The main thread is in charge of the network connection. The sensor thread reads the sonar values and sends them to the decision center. The motion thread receives commands from the decision center and controls the two motors. All data and commands are formatted in JSON. The I/O interface on the UP Squared board is based on the Intel mraa library.
5) Decision Center: The decision center is the brain of the system. It collects information from the vision and speech modules and sends commands to the dog robot and the wheelchair. VIPLE code makes the integration easy. VIPLE is based on Microsoft's Robotics Developer Studio (MRDS) and Visual Programming Language (VPL) [2][5], and it was used to integrate the modules of our project. The Robot/IoT controller is the proxy that connects to the other modules. Complex processing functions, such as VisualParse, are wrapped as VIPLE Code activities written in C#. Finally, the separate programs are combined into a state diagram and workflow for the entire project. Fig. 7 demonstrates VIPLE code that implements crossing the road while avoiding cars.
V. EXPERIMENTS
Based on the developed framework, we conducted various experiments. We show two experiments here that verify the functionality of the system. In addition, two experiments were designed and conducted to test time performance.
A. Avoid Obstacle
When the vision module recognizes an obstacle in front, the guide dog robot and the wheelchair maneuver around it. The motion sequence of the robots is shown in Fig. 8.
B. Cross road
The blind person says "go forward" to cross the road. If the dog robot detects a vehicle in front, it prompts the user by voice and waits, as shown in Fig. 9. Similarly, the dog does not follow the user's instruction when it sees a red traffic light, as shown in Fig. 10.
Fig. 7. VIPLE sample code for Cross Road
Fig. 8. Avoid Obstacle.
Fig. 9. Cross road meeting a car.
Fig. 10. Cross road meeting red traffic light.
C. Time Performance and Resource Usage
The first experiment compared the time performance of the system processing images under different hardware configurations. The laptop is a Lenovo ThinkPad X220, which has an Intel Core i5-2410M CPU @ 2.30 GHz and 4 GB of memory. The video stream from the GoPro camera is 25 fps (frames per second). The experiment set the sample rate to 1/25, which means that the vision module processes one out of every 25 frames.
The measured data is shown in TABLE I. Fig. 11(a) and (b) show that using the VPU is faster than using the CPU, and the CPU load is lower. When using the VPU for deep network inference, the laptop is slower than the embedded UP Squared board; the reason is the USB port speed: the laptop has USB 2.0, whereas the embedded board has USB 3.0. When the image is processed on the CPU without the VPU, the speed is slower and the CPU occupancy rate is higher. As shown in Fig. 11(c), the memory usage rate is basically stable in all cases, and using the VPU consumes less memory.
TABLE I. PERFORMANCE AND USAGE WITH DIFFERENT CONFIGURATIONS
Case  Computer   DNN inference device  Time (ms)  CPU (%)  Mem (%)
1     UPSquared  VPU                     120.0      13.6     8.6
2     Laptop     VPU                     141.1      15.5     4.4
3     Laptop     CPU                     371.7      44.7     9.2
4     UPSquared  CPU                     894.5      99.2    18.6
The second experiment tested the effect of the sample rate parameter on time performance. This experiment ran on the embedded board with the neural compute stick only. The average processing time for each frame is about 120 ms; thus, the fastest processing speed is about 8 fps and the corresponding sample rate should be 1/3. Fig. 12 and Fig. 13 show the performance and usage with sample rates of 1/3, 1/10, 1/20, and 1/25. When the system runs at the highest speed, the occupancy rates of CPU and memory are acceptable.
TABLE II. PERFORMANCE AND USAGE WITH DIFFERENT SAMPLE RATE
Case  Sample rate  FPS  Time (ms)  CPU (%)  Mem (%)
1     1/3          8.0    114.8      41.2     8.7
2     1/10         2.7    119.6      21.4     8.7
3     1/20         1.4    120.2      18.3     8.8
4     1/25         1.3    120.3      15.5     8.8
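The choice of 1/3 as the fastest feasible sample rate follows directly from the per-frame processing time, as the short sketch below shows (a worked version of the arithmetic, not code from the system).

```python
import math

def max_sample_rate(frame_time_ms, camera_fps):
    """Given the per-frame processing time and the camera frame rate,
    return (n, fps): process every n-th frame, yielding fps frames/s."""
    capacity_fps = 1000.0 / frame_time_ms            # frames/s the edge can handle
    n = max(1, math.ceil(camera_fps / capacity_fps)) # skip enough frames to keep up
    return n, camera_fps / n
```

For the measured 120 ms per frame and the 25 fps GoPro stream, this gives n = 3 (sample rate 1/3) and a processing rate of about 8.3 fps, matching Case 1 in TABLE II.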
Fig. 11. Performance and Usage with different hardware configurations: (a) time; (b) CPU usage; (c) memory usage.
Fig. 12. Performance and Usage with different sample rates: (a) time performance; (b) usage.
Fig. 13. Histogram of Performance and Usage with different sample rates.
VI. CONCLUSIONS
In this paper, we presented a guide dog robot prototype and developed an edge computing framework for the guide dog robot system. The Intel high-performance embedded board plus a neural compute stick met the needs of video and audio processing with real-time control. The VIPLE visual programming environment effectively integrated multiple decentralized services and devices. In our future work, the computing resources of smartphones will be considered, and the vision processing algorithms will be improved.
ACKNOWLEDGMENT
This work is partly supported by the China Scholarship Council Program (Grant no. 201706155094), the Science and Technology Planning Project of Guangzhou (Grant no. 201707010437), the Science and Technology Planning Project of Guangdong Province, China (Grant no. 2017B090910005), and the Fundamental Research Funds for the Central Universities (Grant no. 2015zz100).
REFERENCES
[1] Shi W, Cao J, Zhang Q, et al. Edge computing: Vision and challenges[J]. IEEE Internet of Things Journal, 2016, 3(5): 637-646.
[2] Yinong Chen, Service-Oriented Computing and System Integration: Software, IoT, Big Data, and AI as Services, 6th edition, Kendall Hunt Publishing, 2018.
[3] Yinong Chen, Hualiang Hu, "Internet of Intelligent Things and Robot as a Service", Simulation Modelling Practice and Theory, Volume 34, May 2013, Pages 159–171.
[4] Yinong Chen, Zhihui Du, and Marcos Garcia-Acosta, M., "Robot as a Service in Cloud Computing", In Proceedings of the Fifth IEEE International Symposium on Service Oriented System Engineering (SOSE), Nanjing, June 4-5, 2010, pp.151-158.
[5] VIPLE, http://neptune.fulton.ad.asu.edu/VIPLE/.
[6] Tachi, K. Tanie, K. Komoriya and M. Abe, "Electrocutaneous Communication in a Guide Dog Robot (MELDOG)," in IEEE Transactions on Biomedical Engineering, vol. BME-32, no. 7, pp. 461-469, July 1985.
[7] Saegusa, Shozo, et al. "Development of a guide-dog robot: Leading and recognizing a visually-handicapped person using a LRF." Journal of Advanced Mechanical Design, Systems, and Manufacturing 4.1 (2010): 194-205.
[8] Saegusa, Shozo, et al. "Development of a guide-dog robot: human–robot interface considering walking conditions for a visually handicapped person." Microsystem Technologies 17.5-7 (2011): 1169-1174.
[9] Velázquez, Ramiro, et al. "An Outdoor Navigation System for Blind Pedestrians Using GPS and Tactile-Foot Feedback." Applied Sciences 8.4 (2018): 578.
[10] Wei, Yuanlong, Xiangxin Kou, and Mincheol Lee. "Development of a guide-dog robot system for the visually impaired by using fuzzy logic based human-robot interaction approach." Control, Automation and Systems (ICCAS), 2013 13th International Conference on. IEEE, 2013.
[11] Y. Wei and M. Lee, "A guide-dog robot system research for the visually impaired," 2014 IEEE International Conference on Industrial Technology (ICIT), Busan, 2014, pp. 800-805.
[12] J. Jakob and J. Tick, "Concept for transfer of driver assistance algorithms for blind and visually impaired people," 2017 IEEE 15th International Symposium on Applied Machine Intelligence and Informatics (SAMI), Herl'any, 2017, pp. 000241-000246.
[13] Chaudhari, Gaurao, and Asmita Deshpande. "Robotic assistant for visually impaired using sensor fusion." 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI). IEEE, 2017.
[14] J. Monteiro, J. P. Aires, R. Granada, R. C. Barros and F. Meneguzzi, "Virtual guide dog: An application to support visually-impaired people through deep convolutional neural networks," 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, 2017, pp. 2267-2274.
[15] W. M. Elmannai and K. M. Elleithy, "A Highly Accurate and Reliable Data Fusion Framework for Guiding the Visually Impaired," in IEEE Access, vol. 6, pp. 33029-33054, 2018.
[16] Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector[C]//European Conference on Computer Vision. Springer, Cham, 2016: 21-37.