
Jinhui Zhu, Yinong Chen, Mei Zhang, Qiang Chen, Ying Guo, Huaqing Min, and Zhilie Chen, "An Edge Computing Platform of Guide-dog Robot for Visually Impaired," in Proceedings of the IEEE 14th International Symposium on Autonomous Decentralized Systems, Utrecht, the Netherlands, April 2019, pp. 23-30.

An Edge Computing Platform of Guide-dog Robot for Visually Impaired

Jinhui Zhu, School of Software Engineering, South China University of Technology, Guangzhou, China
Yinong Chen, School of Computing, Informatics and Decision Systems Engineering, Arizona State University, Tempe, USA
Mei Zhang, School of Automation Science and Control Engineering, South China University of Technology, Guangzhou, China
Qiang Chen, College of Computer and Information Science, Southwest University, Chongqing, China
Ying Guo, School of Information Science and Technology, Qingdao University of Science and Technology, Qingdao, China
Huaqing Min, School of Software Engineering, South China University of Technology, Guangzhou, China
Zhilie Chen, EVOC Intelligent Technology Company Limited, Shenzhen, China

Abstract—Information processing on a guide dog robot requires expensive computing resources to meet real-time performance requirements. We propose an edge computing framework based on the Intel Up-Squared board and a neural compute stick. Image processing and real-time control are performed on this framework. In addition, voice recognition and commanding are implemented and processed on Amazon cloud services. Vision, speech, and control services are integrated by ASU VIPLE (Visual IoT/Robotics Programming Language Environment). A prototype is developed, which implements the guide dog's obstacle avoidance, traffic sign recognition and following, and voice interaction with humans.

Keywords—Guide dog robot, Visually Impaired, Edge Computing, Neural Compute Stick, Internet of Things, ASU VIPLE, Embedded system

I. INTRODUCTION

Guide dogs, or service dogs, are assistance dogs trained to help blind and visually impaired people move through complex road situations involving obstacles and traffic signs. Training a qualified guide dog takes a long time; the demand is higher than the number of available dogs, and the cost is high, which makes guide dogs unaffordable for many visually impaired people. Moreover, dogs are red-green color blind and cannot interpret traffic signs, nor can they interact with people through natural language. With the development of robotics over the last decades, guide dog robots with vision and speech can overcome these disadvantages. The complex information processing and real-time control tasks of guide dog robots, especially vision processing, require a large amount of computing resources to meet real-time performance requirements. The computing power of embedded systems is insufficient, while image processing on a cloud service causes large delays and occupies network bandwidth. Edge computing has the potential to address the concerns of response time requirements, battery life constraints, bandwidth cost savings, as well as data safety and privacy [1].

This paper presents an edge computing framework for the guide dog robot system. We use service-oriented architecture [2] and the Robot as a Service (RaaS) concept [3][4], where both software and hardware modules communicate with each other through service interfaces and communication standards. The orchestration among the services is implemented through workflows using a visual programming language.

The two main subsystems, the dog robot and the wheelchair, are shown in Fig. 1. The dog robot has 17 motors for complex movement, and a high-definition camera is mounted on its head. The wheelchair carries an Intel Up-Squared IoT board with a high-performance CPU and a neural compute stick (VPU, Vision Processing Unit), which together form the core of the framework. It processes images to detect cars, traffic lights, and other objects. The board is also responsible for real-time control of the wheelchair.

Human voice commands from the visually impaired person are processed using Amazon Alexa and Lambda functions in the cloud. The voice instructions are then sent back to the framework through the Amazon IoT platform. The integration of vision, voice, control, and other modules is done through ASU VIPLE (Visual IoT/Robotics Programming Language Environment) [5]. The prototype system realizes the basic functions of a guide dog: avoiding obstacles and following traffic signals.


The contributions of this research are as follows: (1) using a high-performance embedded board and a neural compute stick to build an edge computing hardware framework for deep learning acceleration; (2) applying this framework to the guide dog robot's data collection, real-time vision processing, and wheelchair control; (3) using the visual programming software ASU VIPLE to integrate multiple decentralized functional modules across edge computing and cloud computing.

Fig. 1. Guide dog robot and wheelchair robot.

The remainder of this paper is organized as follows. Section II gives a brief review of the related work. Section III discusses the requirements of the guide dog robot system. Section IV presents the proposed edge computing framework. Section V shows the experimental results. Finally, Section VI concludes the paper.

II. RELATED WORK

The earliest research on guide dog robots was the MELDOG project in 1985 [6]. The studies in [7][8] developed a guide dog robot that leads and recognizes a visually handicapped person using a laser range finder (LRF) sensor. The research in [9] built an outdoor navigation system for blind pedestrians using GPS and wearable tactile-foot feedback.

More guide dog projects [10][11][12][13][14][15] used cameras to perceive the world. In [10][11], a smartphone streams video to a laptop on the robot for processing. In contrast, in [12] the images from the smartphone are sent to a cloud service for processing. In [15], the system used an embedded platform with two cameras, among other sensors; however, image processing is performed on a remote server, which takes longer than processing on an edge computing server. The study in [13] used a GPU to handle video processing, mapping, and localization. In [14], a convolutional neural network was trained to recognize the activities taking place around a camera mounted on a real guide dog and to provide feedback to the visually impaired user.

III. SYSTEM REQUIREMENT

We set up our robot guide dog system to have the following basic functions.

(1) Visual function: The dog robot can visually observe the surrounding environment, identify obstacles, and identify traffic signals and vehicles when crossing the road.

(2) Language function (understanding and speaking): The dog robot can hear the natural language of the blind user, understand the user's intentions, and respond with voice.

(3) Decision function: The robot can integrate various environmental information and the blind person's intentions to make correct decisions. For example, if the blind person commands the robot to advance while the robot sees that a vehicle is passing, the robot's decision is not to execute the forward command. This is the intelligent disobedience that a well-trained guide dog should have.

(4) Mobile function: The robot has the ability to move autonomously. A blind person with limited mobility can take a self-propelled wheelchair.

IV. IMPLEMENTATION OF PROPOSED SYSTEM

A. Hardware Architecture

The whole hardware architecture is organized in three levels, as shown in Fig. 2. The lower level consists of the sensors and actuators. The upper level is the Amazon cloud services. The middle level is the edge computing resource, which consists of the CPU and the Neural Compute Stick VPU.

Fig. 2. Hardware architecture of the guide dog robot system.

The main components of the system are elaborated as follows.

• UP Squared: Intel's UP Squared is marketed as the world's fastest maker board, with high performance and low power consumption. The board used in the project has an Intel(R) Celeron(R) CPU N3350 @ 1.10 GHz, 2 GB LPDDR4, and 32 GB eMMC. A 40-pin GP-bus and three USB 3.0 ports give makers the freedom to connect other devices.

• Intel Neural Compute Stick: The stick is designed for deep neural network (DNN) inference and is powered


by a high-performance Intel® Movidius Vision Processing Unit (VPU). We can roughly think of the stick as a GPU-like accelerator on a USB stick.

• Dog Robot: The dog robot is constructed from a Robotis robot kit. The dog's actions are driven by 17 AX-12 servo motors. The dog can detect obstacles with an IR sensor. The CM-530 controller runs the motion code and task code. The dog communicates with the rest of the system through a Bluetooth 4.0 module.

• Camera: The GoPro HERO4 Session is smaller and lighter than previous GoPro cameras, yet delivers the same professional image quality. It has a built-in battery and Wi-Fi, so it can be mounted on the dog robot's head as the dog's eye.

• Wheelchair Robot: In the prototype, a wheeled mobile robot stands in for the wheelchair. Two motors are driven by an L298N dual H-bridge motor driver. Two ultrasonic distance sensors are mounted at the front. The UP Squared board is its main controller.

• Amazon Echo Dot: The Echo Dot is a hands-free, voice-controlled device with a small built-in speaker. It uses Alexa to play music, control smart home devices, make calls, answer questions, set timers and alarms, and more. We use the Echo Dot for voice input and output, and the Alexa service behind it for voice processing.

B. Software Framework and Algorithms

The software framework used in the guide dog system is shown in Fig. 3.

Fig. 3. Software Framework.

1) Vision

The image capture module receives the live stream from the camera over UDP using the H.264 codec. The frames are then dispatched to different image processing modules for detecting different kinds of objects. We use a deep neural network for complex object detection. In addition, we use recognition methods based on manually selected features to detect traffic lights and cones. Finally, the fused vision information is provided as a vision service.
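To make the dispatching step concrete, the following sketch shows how a capture loop might pull frames from the camera stream and fan them out to the detectors; the stream URL, sample rate, and detector registry are illustrative assumptions rather than the project's actual code.

```python
# A minimal sketch of the image-capture-and-dispatch step, assuming the GoPro
# stream is exposed as a UDP/H.264 endpoint that OpenCV's FFmpeg backend can open.
import cv2

STREAM_URL = "udp://10.5.5.9:8554"   # hypothetical GoPro live-stream endpoint
SAMPLE_RATE = 25                      # process 1 out of every 25 frames (1/25)

def dispatch(frame, detectors):
    """Run each registered detector on the frame and collect their outputs."""
    return {name: detect(frame) for name, detect in detectors.items()}

def capture_loop(detectors):
    cap = cv2.VideoCapture(STREAM_URL, cv2.CAP_FFMPEG)
    frame_id = 0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        frame_id += 1
        if frame_id % SAMPLE_RATE != 0:   # frame skipping implements the sample rate
            continue
        world_info = dispatch(frame, detectors)
        # world_info would be published to the vision service here
    cap.release()
```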

a) Object Detection

In this paper, we use a single deep neural network named SSD (Single Shot MultiBox Detector) [16] to detect objects such as cars, bottles, and so on. SSD is a fast single-shot object detector for multiple categories. A key feature of the model is the use of multi-scale convolutional bounding box outputs attached to multiple feature maps at the top of the network. The neural network model is trained on a powerful PC/GPU and then exported to the neural compute stick. Images from the camera are inferred on the stick, which is faster than on the embedded CPU. Fig. 4(a) demonstrates object detection.
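A hedged sketch of how SSD inference can be offloaded to the compute stick through OpenCV's DNN module with the Inference Engine backend is shown below; the model file names and the confidence threshold are assumptions, since the paper does not list them.

```python
# A minimal sketch of SSD inference on the Neural Compute Stick via OpenCV DNN.
import cv2

net = cv2.dnn.readNet("ssd_mobilenet.xml", "ssd_mobilenet.bin")  # OpenVINO IR files (hypothetical names)
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_MYRIAD)               # run inference on the VPU stick

def detect_objects(frame, conf_threshold=0.5):
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, size=(300, 300))  # SSD300 input resolution
    net.setInput(blob)
    detections = net.forward()                             # shape: [1, 1, N, 7]
    boxes = []
    for det in detections[0, 0]:
        _, class_id, confidence, x1, y1, x2, y2 = det
        if confidence > conf_threshold:
            boxes.append((int(class_id), float(confidence),
                          int(x1 * w), int(y1 * h), int(x2 * w), int(y2 * h)))
    return boxes
```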

b) Traffic Cone Detection

First, red regions are extracted by color segmentation in the same way as in traffic light detection. Second, edges are detected by the Canny algorithm and approximated as a few polygonal contours. Third, the contours that are convex hulls are selected. Finally, according to the shape of a traffic cone standing on the ground, a contour is recognized as a cone if its upper points lie within the horizontal extent of its lower points. A detected cone is shown in Fig. 4(b).
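The following sketch illustrates these four steps with OpenCV; the HSV thresholds and the polygon-approximation tolerance are assumed values, not the paper's parameters.

```python
# A minimal sketch of the cone detector: color segmentation, Canny edges,
# polygonal approximation, convexity test, and the upper/lower width check.
import cv2

def detect_cones(frame):
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    red = cv2.inRange(hsv, (0, 120, 80), (10, 255, 255))        # low-hue red band (assumed)
    red |= cv2.inRange(hsv, (170, 120, 80), (180, 255, 255))    # high-hue red band (assumed)
    edges = cv2.Canny(red, 50, 150)
    # [-2] works with both the OpenCV 3.x and 4.x return conventions
    contours = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    cones = []
    for c in contours:
        poly = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
        if not cv2.isContourConvex(poly):
            continue
        pts = poly.reshape(-1, 2)
        top = pts[pts[:, 1] < pts[:, 1].mean()]       # upper half of the contour
        bottom = pts[pts[:, 1] >= pts[:, 1].mean()]   # lower half (on the ground)
        # cone shape: the upper points must fall within the width of the lower points
        if len(top) and len(bottom) and \
           top[:, 0].min() >= bottom[:, 0].min() and top[:, 0].max() <= bottom[:, 0].max():
            cones.append(cv2.boundingRect(poly))
    return cones
```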

c) Traffic Light Detection

The image is converted to the HSI color space. After color segmentation with a user-defined threshold and smoothing with a median filter, the Circle Hough Transform is used to detect circular objects. A circle with an appropriate radius is recognized as a traffic light. Fig. 4(c) and (d) show the recognition of red and green traffic lights.
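A minimal OpenCV sketch of this detector is shown below; OpenCV provides HSV rather than HSI, and all thresholds and radius limits here are assumptions.

```python
# A minimal sketch of the traffic light detector: hue-based segmentation,
# median-filter smoothing, and a Circle Hough Transform.
import cv2

def detect_traffic_light(frame, color="red"):
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    if color == "red":
        mask = cv2.inRange(hsv, (0, 120, 120), (10, 255, 255)) | \
               cv2.inRange(hsv, (170, 120, 120), (180, 255, 255))
    else:  # green
        mask = cv2.inRange(hsv, (45, 120, 120), (90, 255, 255))
    mask = cv2.medianBlur(mask, 5)                     # smooth out segmentation noise
    circles = cv2.HoughCircles(mask, cv2.HOUGH_GRADIENT, dp=1, minDist=30,
                               param1=100, param2=15, minRadius=5, maxRadius=40)
    if circles is None:
        return []
    # keep only circles whose radius is plausible for a traffic light at crossing distance
    return [(int(x), int(y), int(r)) for x, y, r in circles[0] if 5 <= r <= 40]
```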

Fig. 4. Image processing demo: (a) object detection; (b) cone detection; (c) red traffic light detection; (d) green traffic light detection.

2) Speech

To use vocal commands, we created a custom skill in the Alexa Skills Kit for the Amazon Echo Dot.


The invocation opens the program that contains the robot commands, and the user may speak in a natural manner to the guide dog robot in order for it to perform the commands that the user wants it to execute. AWS Lambda, running Node.js 6.10, executes the commands defined in the Alexa skill and sends them to AWS IoT, where the other components of the robot are connected.
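The paper implements this handler in Node.js 6.10; purely for consistency with the other sketches in this paper, a hypothetical Python equivalent of the Lambda-to-IoT path is shown below, with the MQTT topic name and intent names as assumptions.

```python
# A hedged sketch of the cloud side of the speech path: the Alexa custom skill
# triggers a Lambda handler, which forwards the recognized intent to AWS IoT.
import json
import boto3

iot = boto3.client("iot-data")                 # AWS IoT Data Plane client
COMMAND_TOPIC = "guidedog/commands"            # hypothetical MQTT topic

def lambda_handler(event, context):
    """Handle an Alexa IntentRequest and publish the intent to AWS IoT."""
    intent = event["request"]["intent"]["name"]          # e.g. "GoForwardIntent" (assumed)
    iot.publish(topic=COMMAND_TOPIC, qos=1,
                payload=json.dumps({"intent": intent}))
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": "OK, " + intent},
            "shouldEndSession": True,
        },
    }
```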

3) Dog Robot

The program running on the CM-530 controller of the dog includes motion code and task code. The dog's actions are edited and saved through RoboPlus Motion, shown in Fig. 5. The saved motion file is executed by task code, which sequences the motions needed to perform a given action; the robot moves according to the task codes. RoboPlus Task, shown in Fig. 6, is a tool that makes writing these task codes easier.

Fig. 5. Robot Motion Programming.

Fig. 6. Robot Task Programming

4) Wheelchair Robot

The real-time control program running on the UP Squared board has three threads. The main thread is in charge of the network connection. The sensor thread reads the sonar sensor values and sends them to the decision center. The motion thread receives commands from the decision center and controls the two motors. All data and commands are formatted in JSON. The interface to the I/O on the UP Squared board is based on the Intel mraa library.
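A hedged sketch of this three-thread structure is given below, assuming the mraa Python bindings, hypothetical pin numbers, and an assumed JSON message format and decision-center endpoint.

```python
# A minimal sketch of the wheelchair control program: main thread owns the network
# link, a sensor thread streams sonar readings, and a motion thread drives the motors.
import json
import socket
import threading
import time
import mraa

LEFT_PWM, RIGHT_PWM = mraa.Pwm(32), mraa.Pwm(33)   # hypothetical PWM pins for the L298N inputs
TRIG, ECHO = mraa.Gpio(11), mraa.Gpio(13)          # hypothetical sonar trigger/echo pins
DECISION_CENTER = ("127.0.0.1", 1350)              # hypothetical decision-center endpoint

def read_sonar_cm():
    """Trigger the ultrasonic sensor and convert the echo time to centimeters."""
    TRIG.write(1); time.sleep(1e-5); TRIG.write(0)
    start = time.time()
    while ECHO.read() == 0 and time.time() - start < 0.02:
        pass
    t0 = time.time()
    while ECHO.read() == 1 and time.time() - t0 < 0.02:
        pass
    return (time.time() - t0) * 34300 / 2.0

def sensor_thread(sock):
    while True:
        msg = {"sensor": "sonar", "distance_cm": read_sonar_cm()}
        sock.sendall((json.dumps(msg) + "\n").encode())
        time.sleep(0.1)

def motion_thread(sock):
    for line in sock.makefile():
        cmd = json.loads(line)                      # e.g. {"left": 0.6, "right": 0.6} (assumed)
        LEFT_PWM.write(cmd.get("left", 0.0))        # duty cycle 0.0..1.0
        RIGHT_PWM.write(cmd.get("right", 0.0))

def main():
    TRIG.dir(mraa.DIR_OUT); ECHO.dir(mraa.DIR_IN)
    for pwm in (LEFT_PWM, RIGHT_PWM):
        pwm.period_us(20000); pwm.enable(True)
    sock = socket.create_connection(DECISION_CENTER)   # main thread owns the network link
    threading.Thread(target=sensor_thread, args=(sock,), daemon=True).start()
    motion_thread(sock)
```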

5) Decision Center

The decision center is the brain of the system. It collects information from the vision and speech modules and sends commands to the dog robot and the wheelchair. VIPLE code makes the integration easy. VIPLE is based on Microsoft's Robotics Developer Studio (MRDS) and Visual Programming Language (VPL) [2][5]. It was used to integrate the modules of our project. The Robot/IoT controller is the proxy that connects to the other modules. Some complex processing functions, such as VisualParse, are wrapped as VIPLE code activities written in C#. Finally, the separate programs are combined into a state diagram and workflow for the entire project. Fig. 7 shows the VIPLE code that implements the function of crossing the road while avoiding cars.
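The decision logic itself is implemented as a VIPLE workflow with C# code activities; the hypothetical Python sketch below only illustrates the intelligent-disobedience rule that fuses the voice intent with the fused vision information.

```python
# A hypothetical sketch of the intelligent-disobedience rule: a "go forward" intent
# is executed only when the vision service reports no passing car and no red light.
# The field names are assumptions, not the VIPLE/C# implementation.
def decide(intent, world_info):
    """Fuse the user's voice intent with the fused vision information."""
    if intent == "go_forward":
        if world_info.get("car_ahead"):
            return {"action": "stop", "speech": "A car is passing, please wait."}
        if world_info.get("traffic_light") == "red":
            return {"action": "stop", "speech": "The light is red, please wait."}
        return {"action": "forward"}
    if intent == "stop":
        return {"action": "stop"}
    return {"action": "idle"}

# Example: the user says "go forward" while a car is detected.
print(decide("go_forward", {"car_ahead": True, "traffic_light": "green"}))
# -> {'action': 'stop', 'speech': 'A car is passing, please wait.'}
```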

V. EXPERIMENTS

Based on the developed framework, we conducted various experiments. We show two experiments here that verify the functionality of the system. In addition, two experiments were designed and conducted to test the time performance.

A. Avoid Obstacle

The guide dog robot and the wheelchair turn around the obstacle when the vision module recognizes an obstacle in front. The motion sequence of the robots is shown in Fig. 8.

B. Cross road

The blind person says "go forward" to cross the road. If the dog robot detects that there is a vehicle in front, the dog prompts the user by voice and waits, as shown in Fig. 9. Similarly, the dog will not follow the user's instruction when it sees a red traffic light, as shown in Fig. 10.

Fig. 7. VIPLE sample code for crossing the road.


Fig. 8. Avoid Obstacle.

Fig. 9. Cross road meeting a car.

Fig. 10. Cross road meeting red traffic light.

C. Time Performance and Resource Usage

The first experiment compares the time performance of the system when processing images with different hardware configurations. The laptop is a Lenovo ThinkPad X220 with an Intel(R) Core(TM) i5-2410M CPU @ 2.30 GHz and 4 GB of memory. The video stream from the GoPro camera is 25 fps (frames per second). The experiment set the sample rate to 1/25, which means that the vision module processes only 1 out of every 25 frames.
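The paper does not describe its measurement tooling; the sketch below shows one plausible way the per-frame time and resource usage could be collected, assuming psutil for CPU and memory sampling.

```python
# A hedged sketch of measuring per-frame processing time (ms), CPU (%), and memory (%),
# in the spirit of TABLE I and TABLE II. psutil and the averaging scheme are assumptions.
import time
import psutil

def benchmark(process_frame, frames):
    """Return average per-frame time (ms), CPU (%), and memory (%) over the run."""
    proc_times, cpu_samples, mem_samples = [], [], []
    psutil.cpu_percent(interval=None)          # prime the CPU counter
    for frame in frames:
        t0 = time.time()
        process_frame(frame)                   # e.g. detect_objects(frame)
        proc_times.append((time.time() - t0) * 1000.0)
        cpu_samples.append(psutil.cpu_percent(interval=None))
        mem_samples.append(psutil.virtual_memory().percent)
    n = len(frames)
    return (sum(proc_times) / n, sum(cpu_samples) / n, sum(mem_samples) / n)
```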

The measured data are shown in TABLE I. Fig. 11(a) and (b) show that using the Compute Stick VPU is faster than using the CPU, and that the CPU load is lower when the VPU is used. When using the VPU for deep network inference, the laptop is slower than the embedded UP Squared board. The reason is the USB port speed: the laptop has USB 2.0 ports, whereas the embedded board has USB 3.0. When the image is processed on the CPU without the VPU, the speed is slower and the CPU occupancy rate is higher. As shown in Fig. 11(c), the memory usage rate is basically stable in all cases, and using the VPU consumes less memory.

TABLE I. PERFORMANCE AND USAGE WITH DIFFERENT CONFIGURATIONS

Case | Computer  | DNN inference device | Time (ms) | CPU (%) | Mem (%)
  1  | UPSquared | VPU                  |    120.0  |   13.6  |   8.6
  2  | Laptop    | VPU                  |    141.1  |   15.5  |   4.4
  3  | Laptop    | CPU                  |    371.7  |   44.7  |   9.2
  4  | UPSquared | CPU                  |    894.5  |   99.2  |  18.6

The second experiment tests the effect of the sampling rate parameter on time performance. This experiment was run only on the embedded board with the neural compute stick. The average processing time for each frame is about 120 ms, so the fastest processing speed is about 8 fps; with a 25 fps input stream, the corresponding sample rate should therefore be 1/3 (25 fps / 3 ≈ 8.3 fps). Fig. 12 and Fig. 13 show the performance and usage with sample rates of 1/3, 1/10, 1/20, and 1/25. When the system runs at the highest speed, the occupancy rates of the CPU and memory are acceptable.

TABLE II. PERFORMANCE AND USAGE WITH DIFFERENT SAMPLE RATES

Case | Sample rate | FPS | Time (ms) | CPU (%) | Mem (%)
  1  | 1/3         | 8.0 |    114.8  |   41.2  |   8.7
  2  | 1/10        | 2.7 |    119.6  |   21.4  |   8.7
  3  | 1/20        | 1.4 |    120.2  |   18.3  |   8.8
  4  | 1/25        | 1.3 |    120.3  |   15.5  |   8.8

Fig. 11. Performance and usage with different hardware configurations: (a) time; (b) CPU usage; (c) memory usage.

Fig. 12. Performance and usage with different sample rates: (a) time performance; (b) usage.

Fig. 13. Histogram of performance and usage with different sample rates.

VI. CONCLUSIONS

In this paper, we presented a guide dog robot prototype and developed an edge computing framework for the guide dog robot system. The Intel high-performance embedded board plus a neural compute stick met the needs of video and audio processing with real-time control. The VIPLE visual programming environment effectively integrated multiple decentralized services and devices. In our future work, the computing resources of smartphones will be considered, and the vision processing algorithms will be improved.

ACKNOWLEDGMENT

This work is partly supported by the China Scholarship Council (Grant no. 201706155094), the Science and Technology Planning Project of Guangzhou (Grant no. 201707010437), the Science and Technology Planning Project of Guangdong Province, China (Grant no. 2017B090910005), and the Fundamental Research Funds for the Central Universities (Grant no. 2015zz100).

REFERENCES

[1] W. Shi, J. Cao, Q. Zhang, et al., "Edge computing: Vision and challenges," IEEE Internet of Things Journal, vol. 3, no. 5, pp. 637-646, 2016.

[2] Yinong Chen, Service-Oriented Computing and System Integration: Software, IoT, Big Data, and AI as Services, 6th edition, Kendall Hunt Publishing, 2018.

[3] Yinong Chen, Hualiang Hu, "Internet of Intelligent Things and Robot as a Service", Simulation Modelling Practice and Theory, Volume 34, May 2013, Pages 159–171.

[4] Yinong Chen, Zhihui Du, and Marcos Garcia-Acosta, M., "Robot as a Service in Cloud Computing", In Proceedings of the Fifth IEEE International Symposium on Service Oriented System Engineering (SOSE), Nanjing, June 4-5, 2010, pp.151-158.

[5] VIPLE, http://neptune.fulton.ad.asu.edu/VIPLE/.

[6] S. Tachi, K. Tanie, K. Komoriya, and M. Abe, "Electrocutaneous communication in a guide dog robot (MELDOG)," IEEE Transactions on Biomedical Engineering, vol. BME-32, no. 7, pp. 461-469, July 1985.

[7] S. Saegusa et al., "Development of a guide-dog robot: Leading and recognizing a visually-handicapped person using a LRF," Journal of Advanced Mechanical Design, Systems, and Manufacturing, vol. 4, no. 1, pp. 194-205, 2010.

[8] S. Saegusa et al., "Development of a guide-dog robot: Human-robot interface considering walking conditions for a visually handicapped person," Microsystem Technologies, vol. 17, no. 5-7, pp. 1169-1174, 2011.

[9] R. Velázquez et al., "An outdoor navigation system for blind pedestrians using GPS and tactile-foot feedback," Applied Sciences, vol. 8, no. 4, p. 578, 2018.

[10] Y. Wei, X. Kou, and M. Lee, "Development of a guide-dog robot system for the visually impaired by using fuzzy logic based human-robot interaction approach," in 2013 13th International Conference on Control, Automation and Systems (ICCAS), IEEE, 2013.

[11] Y. Wei and M. Lee, "A guide-dog robot system research for the visually impaired," 2014 IEEE International Conference on Industrial Technology (ICIT), Busan, 2014, pp. 800-805.

[12] J. Jakob and J. Tick, "Concept for transfer of driver assistance algorithms for blind and visually impaired people," 2017 IEEE 15th International Symposium on Applied Machine Intelligence and Informatics (SAMI), Herl'any, 2017, pp. 000241-000246.

[13] G. Chaudhari and A. Deshpande, "Robotic assistant for visually impaired using sensor fusion," in 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), IEEE, 2017.

[14] J. Monteiro, J. P. Aires, R. Granada, R. C. Barros and F. Meneguzzi, "Virtual guide dog: An application to support visually-impaired people through deep convolutional neural networks," 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, 2017, pp. 2267-2274.

[15] W. M. Elmannai and K. M. Elleithy, "A Highly Accurate and Reliable Data Fusion Framework for Guiding the Visually Impaired," in IEEE Access, vol. 6, pp. 33029-33054, 2018.

[16] W. Liu, D. Anguelov, D. Erhan, et al., "SSD: Single shot multibox detector," in European Conference on Computer Vision (ECCV), Springer, 2016, pp. 21-37.
