Post on 13-Sep-2020
Outdoor Augmented Reality with Geo-Object Identification based on Deep Learning Network
Prof Dr Du Qingyun; Rao Jinmeng; Qiao Yanjun
qydu@whu.edu.cn
目 录contents PART 01
PART 02
PART 03
PART 04
Background
Roadmap
Methodology & Results
Outlook
01
PART ONE
Background
01 Background
Trend• Indoor to Outdoor• Wearable computing• LBS & Augmented Reality• Timeliness and Acurracy
Technologies
Image Registration• Non-visual Perception• Visual Identification
Domains
game、social activities、life、education…
01 1.Algorithms of Visual Identification
输入
卷积
C1特征映射S1特征映射
C2特征映射S2特征映射
输出
全连接层子采样 卷积 子采样 卷积
分类器
Feature points - based Deep Learning - based
• Complicated computing
• Error-prone
• Timeliness not guaranteed
• Human involved
• GPU acceleration needed
• High accuracy
• fast
• End-to end black box computing
01 2.Deep Learning
B
D
A
C1
2
3
4
Research in Deep Learning
Development:• CNN was firstly used in IMageNet
competition in 2012• Errors reduced by 10% to 16.4%
compared to feature points based approach
• Continue to reduce till to 3.57%
Applications:• Image Classification• Object Identification• Gesture Evaluation• Image Segmentation• Face Recognition
Mainstream Models:(Convolutional Neural network )
R-CNN、Fast R-CNN、Faster R-CNN、SSD、YOLO
Advantages:• Highly reliable• Timeliness• End-to-end• Training once, using
everywhere
02
PART TWO
Roadmap
02 Roadmap
• 2D attributes• 3D Geometry• HCI
LBS
• Mapping from 3D object to 2D screen
coordinate system
• Reversed coordinate system mapping
02 0401 03
Sensor based non-visual context perception
Visual Identification Model• Simplified CNN
Mixed Tracing and Registration
• surrounding retrieval• set limit to scope • Attribute association
AugumentedInformation
3D Image registration
03
PART THREE
Methodology and Results
03 Methodology
A B C
E D
Model DesignPrototype system
design
System Implementation
Model Training
Execution and Test
03 1.Model Design
Choosing a mode
• Based on Google research testing on mainstream CNN models, SSD (Single Shot Multibox Detector) has better performance on detection and identification of larger object
• SSD was simplified in model structure to facilitate using on mobile platform with reasonable performance
SSD
Simplified SSD
3682个检测
框
非极大值抑
制
SSD
224
224
3
ResNet-18
经过Conv3_2、Conv4_2、Conv5_2
Conv3_2 Conv4_2 Conv5_2Conv8_2 Conv9_2
Pool10
全局池化
28
2864
14
14128
7
7 256
4
4
2
2512 256 256
11
3*3*64 3*3*128 3*3*256Conv:1*1*256Conv:3*3*512
Conv:1*1*128Conv:3*3*256
全局池化
每个像素预测3个检测框
每个像素预测5个检测框
58.2mAP
33fps
03 1.Model DesignModel parameters
Convolutional Layer Output size Previous 18-layered network Simplified network
Conv1 112*112 7*7,64,stride:2 7*7,64,stride:2
Conv2_x 56*56 3 ∗ 3,643 ∗ 3,64
∗ 23 ∗ 3,323 ∗ 3,32
∗ 2
Conv3_x 28*28 3 ∗ 3,1283 ∗ 3,128
∗ 23 ∗ 3,643 ∗ 3,64
∗ 2
Conv4_x 14*14 3 ∗ 3,2563 ∗ 3,256
∗ 23 ∗ 3,643 ∗ 3,64
∗ 2
Conv5_x 7*7 3 ∗ 3,5123 ∗ 3,512
∗ 23 ∗ 3,2563 ∗ 3,256
∗ 2
Layer Kernel
Conv8_1 1*1*256,stride:1
Conv8_2 3*3*512,stride:2
Conv8_11 1*1*128、stride:1
Conv8_22 3*3*256、stride:2
Pool10 1*1,aver. Pooling layer
Layer default box No Default box Ratio Remark
Conv3_2 3 [1,2,0.5] In [1,2,0.5],
each value
represents th e
ratio of length
and width of
default box
Conv4_2 5 [1,2,0.5,3,1/3]
Conv5_2 5 [1,2,0.5,3,1/3]
Conv8_2 5 [1,2,0.5,3,1/3]
Conv9_2 5 [1,2,0.5,3,1/3]
Pool10 5 [1,2,0.5,3,1/3]
Pre network
Connectednetwork
Parameter of Default Box
03 2.Model Training
Data capture Data HandlingTraining
Environment• Mark the bounding
box for objects and their category
• Generate XML configuration file dynamically
• 3 Categories of object• 300 images for each
Category
• Platform:MXNET• Processor:GPU• OS:Linux Ubuntu14.04• Language:python
03 2.Model TrainingResult
Model training lasted for 12 hours with 224 times of iteration to reach convergence. mAP to 58.2%.0.03 second to finish the detection for one imageModel can be packaged into .so library for mobile usage
03 3. Prototype System Design
General Framework
图像处理模块
位置姿态计算模块 空间数据查询模块
增强信息模块
图像的输入
与输出
信息查询与
关联
二维和三维
信息显示
相机获取图像 图像转换技术 SSD引擎 检测结果解析
磁力传感器
加速度传感器
GPS传感器SQLite属性查
询周边查询
文字信
息设计
三维纹
理设计三维图
形设计
坐标转
换算法
相对位置
摄像机方向
摄像机姿态
视野内POI
Image input and output
Info query and assoc.
2D & 3D Display
Image Handling
Image Capture
Image Transfor-mation
SSD Engine
Detection result
interpretation
Position and Gesture Computing
GPS
Magnetic Sensor
Acceleration Sensor
Relative Position
Camera Orientation
Camera Gesture
POI
Spatial Query
SQLite Attribute Query
Surrounding Query
Augmented Information
Text Design
Texture Design
Graphics Design
Coordin-ate Trans.
03 3. Prototype System Design
Image Processing Model
安卓相机 捕获场景图片 YUV图像转化为RGB图像
图像输出
检测信息
手机显示SSD
引擎
卷积运算
边框线性回归
Softmax分类
特征层提取
Mobile Display
Image Output
Detection Information
Extraction of Feature Layer
Convolutional Computing
Linear Regression of Bounding Box
SoftMax Classification
SSD Engine
Android Camera Capture On-site Image YUV to RGB
03 3. Prototype System Design
1 根据视景体来计算X,Y轴的长度(刻度),
top=Z*arctan(θ
2),根据屏幕的长宽比width/height来计算出
right的值,right=top*width/height。
2. 计算出(u,v)坐标的占比,u/width,v/height。
3. 计算(x,y)坐标系下的坐标,x=(u/width)*2*right-right,y=top-(v/height)*2top。
4. 计算三维图形的四个坐标:定义宽度b_width为1和高度b_height为1
U
VX
Y(u,v)
(x,y)坐标转换 top
right
width
heightX
Y
(x,y,0) (x+1,y,0)
11
(x+1,y+1,0)(x,y+1,0)
3D Image Registration Algorithm
03 3. Prototype System Design
Augmented Info
id cn_name ob-name longitude latitude attribute Picture
1 SRES build. SRES 30.522716 114.35506 text label Binary
2 3S Statue 3SM 30.52798 114.355703 text label Binary
3 Teaching build. TEB 30.527728 114.356254 text label Binary
2D Attribute Info
Label Info
03 3. Prototype System Design
Interface Design
SurfaceView
真实物体
检测框类别与概率
三维图形及纹理
开始
暂停
GLSurfaceView
图片预览框
目标
目标
类别-概率
类别-概率
相机屏
幕
位置描述
周边POI
位置描述
周边POI
点击
点击
Detection Box
Real Object
Category and Probability
3D Graphics and Texure
Image Preview
Start
Pause
Camera Screen
Click
Category and Probability
Place DescriptionSurrounding POI
Place DescriptionSurrounding POI
Category and Probability
Object
Object
Click
03 4.System Implementation Development EnvironmentAndroid,Version 4.4.2,SDK: 25,JDK:java7Tool: Android Studio2.2.3,Language: java,Database:SQLite3
WhatActivity
onCreate() onPause() onDestroy() initCamera() startService() stopService() StreamIt(){}
setCallback() setImageByte() onSurfaceChanged() onSurfaceCreated() onDrawFrame() loadShader() loadProgram()
TeleRender
Forward() getOutput() nativeGetOutput() nativeForward()
Predictor
openDatabase() getARInfo()
SqliteTools
draw() setRotates() setTextureCoordinates() loadBitmap() loadGLTexture() setVertices() setIndices() setColor()
Mesh onSurfaceCreated() onSurfaceChanged() onDrawFrame() setBitmap() reDraw()
OpenGLRenderer
setBox() reSetObject()
ArObject
Class
03 4.System Implementation
Testing Statistics
Time Category Average confidence Ave time lapse(ms)
8:30 -9:00 SRES Build. 0.89 1486
3S Statue 0.87 1438
Teaching Build. 0.91 1509
11:30-12:00 SRES Build. 0.92 1502
3S Statue 0.91 1549
Teaching Build. 0.93 1527
17:30-18:00 SRES Build. 0.91 1456
3S Statue 0.86 1523
Teaching Build. 0.92 1501
03 4.System Implementation System execution
Testing with two objects HCI
03 4.System Implementation
04
PART SIX
Outlook
04 What was achieved
01
02
03
Introducing spatial information into mixed image tracing and registration
Identify the category of object using both visual info and non-visual spatial info, by which searching scope is much smaller by surrounding query
Using deep learning in outdoor augmented reality
Based on SSD model, by modifying pre-network, network structure was simplified. Geographic object identification can be 90% accurate and less than 1.5s is needed for one image
Implementing Geo-AR system on mobile phone
Localize SSD engine on Android platform. Integrating with sensor system of mobile phone, using local SQLite database, 3D registration was realized with algorithms of 2D-3D transformation
04 Outlook
Structure optimization of CNN
Tracing Accuracy Improvement
Augmented Information Enrichment
Thank you for attention!
AAG 2018, 10-14 April, New Orlean