Self-Adaptive FPGA-Based Image Processing Filters … FPGA-Based Image Processing Filters Using...
Transcript of Self-Adaptive FPGA-Based Image Processing Filters … FPGA-Based Image Processing Filters Using...
Self-Adaptive FPGA-Based Image ProcessingFilters Using Approximate Arithmetics
Jutta Pirkl, Andreas Becher, Jorge Echavarria, Jürgen Teich, and Stefan WildermannHardware/Software Co-Design, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)SCOPES’17, St. Goar, Germany, June 12, 2017
Approximate Computing – A New Design Paradigm
Portable battery-powered devices Rapid workload increase
Underlying Idea
Trading accuracy of computations against disproportionate improvementswith respect to power consumption and/or performance and/or circuitarea.
Sources: https://www.lsr.com/embedded-wireless-modules/tiwiconnect, http://www.closdurocnoir.com/category/battery,http://blog.apterainc.com/business-intelligence/the-data-tsunami-is-coming.-is-your-company-ready
Jutta Pirkl | Hardware/Software Co-Design (FAU) | Self-Adaptive FPGA-Based Image Processing SCOPES’17 1
MotivationProblem:• Error-tolerance depends on both input data and application context• Quality-configurability is the key principle of prospective AC platforms1
Dynamic PartialReconfiguration
ApproximateComputing
Adaptation ofthe approx-
imation level
Approach: Dynamic autonomous swapping of filters with different degreesof approximation utilizing reconfigurable hardware
1S. Venkataramani et al. “Approximate computing and the quest for computing efficiency”. In: 2015 52nd ACM/EDAC/IEEE Design Automation Conference(DAC). June 2015, pp. 1–6.
Jutta Pirkl | Hardware/Software Co-Design (FAU) | Self-Adaptive FPGA-Based Image Processing SCOPES’17 2
Outline
Concepts of Self-Adaptive Image ProcessingApproximate 2D-Convolution FiltersQuality EvaluationReconfiguration Management
Experimental ResultsQuality-Configurable Control MechanismPartial Reconfiguration Overhead
Summary
Jutta Pirkl | Hardware/Software Co-Design (FAU) | Self-Adaptive FPGA-Based Image Processing SCOPES’17 3
Concepts of Self-Adaptive Image Processing
Approximate 2D-Convolution Filters
• Basic filter building block: 2D-convolution filter wrapperwith a kernel size of 3×3
n
m
Filter kernel
Y [m,n] = ∑i
∑j
H[i, j] ·X [m− i,n− j]
Output Filter kernel Input
• Parallel Multiply-Accumulate (MAC) operation in a pipelined adder treestructure
• Replacement of all adders by the same approximate version
Jutta Pirkl | Hardware/Software Co-Design (FAU) | Self-Adaptive FPGA-Based Image Processing SCOPES’17 4
Approximate Adder Structures on FPGAs2
i0 i1 i2 i3 i4
o0 o1
i0 i1 i2 i3 i4
o0 o1
i0 i1 i2 i3 i4
o0 o1
i0 i1 i2 i3 i4
o1
i0 i1 i2 i3 i4
o0 o1
i0 i1 i2 i3 i4
o0 o1
i0 i1 i2 i3 i4
o0 o1
LUT6_2. . .LUT6_2LUT5LUT6_2. . .LUT6_2
a0 b0a... b...am−2 bm−2am−1 bm−1am bma... b...an−1 bn−1
s0s...sm−2sm−1sms...sn−1 m
all1 c0c...cmc...cout
Most Significant Part Least Significant Part
Case 1: Carry suppressionMSP LSP
0 1 1 1 1 1 311 1 1 1
+ 0 0 1 0 1 1 111 0 0 0 1 0 3934 42
2A. Becher et al. “A LUT-based approximate adder”. In: Proceedings of the 24th Annual IEEE International Symposium on Field-Programmable CustomComputing Machines. FCCM ’16. Washington DC, USA, May 2016.
Jutta Pirkl | Hardware/Software Co-Design (FAU) | Self-Adaptive FPGA-Based Image Processing SCOPES’17 5
Approximate Adder Structures on FPGAs2
i0 i1 i2 i3 i4
o0 o1
i0 i1 i2 i3 i4
o0 o1
i0 i1 i2 i3 i4
o0 o1
i0 i1 i2 i3 i4
o1
i0 i1 i2 i3 i4
o0 o1
i0 i1 i2 i3 i4
o0 o1
i0 i1 i2 i3 i4
o0 o1
LUT6_2. . .LUT6_2LUT5LUT6_2. . .LUT6_2
a0 b0a... b...am−2 bm−2am−1 bm−1am bma... b...an−1 bn−1
s0s...sm−2sm−1sms...sn−1 m
all1 c0c...cmc...cout
Most Significant Part Least Significant Part
Case 1: Carry suppressionMSP LSP
0 1 1 1 1 1 311 1 1 1
+ 0 0 1 0 1 1 111 0 0 0 1 01 1 1 39 42
⇒ error reduction mechanism2A. Becher et al. “A LUT-based approximate adder”. In: Proceedings of the 24th Annual IEEE International Symposium on Field-Programmable Custom
Computing Machines. FCCM ’16. Washington DC, USA, May 2016
Jutta Pirkl | Hardware/Software Co-Design (FAU) | Self-Adaptive FPGA-Based Image Processing SCOPES’17 5
Approximate Adder Structures on FPGAs2
i0 i1 i2 i3 i4
o0 o1
i0 i1 i2 i3 i4
o0 o1
i0 i1 i2 i3 i4
o0 o1
i0 i1 i2 i3 i4
o1
i0 i1 i2 i3 i4
o0 o1
i0 i1 i2 i3 i4
o0 o1
i0 i1 i2 i3 i4
o0 o1
LUT6_2. . .LUT6_2LUT5LUT6_2. . .LUT6_2
a0 b0a... b...am−2 bm−2am−1 bm−1am bma... b...an−1 bn−1
s0s...sm−2sm−1sms...sn−1 m
all1 c0c...cmc...cout
Most Significant Part Least Significant Part
Case 1: Carry suppressionMSP LSP
0 1 1 1 1 1 311 1 1 1
+ 0 0 1 0 1 1 111 0 0 0 1 01 1 1 39 42
⇒ error reduction mechanism
Case 2: Carry predictionMSP LSP
0 1 1 1 1 1 3111 1 1 1
+ 0 0 1 1 1 1 151 0 1 1 1 0 46
⇒ no approximation error2A. Becher et al. “A LUT-based approximate adder”. In: Proceedings of the 24th Annual IEEE International Symposium on Field-Programmable Custom
Computing Machines. FCCM ’16. Washington DC, USA, May 2016
Jutta Pirkl | Hardware/Software Co-Design (FAU) | Self-Adaptive FPGA-Based Image Processing SCOPES’17 5
Case Study – Approximate Gaussian Lowpass Filter
m = 1 m = 2
m = 4 m = 6
m = 8 m = 10
Artifacts:• Brightness decrease→ underestimating adder
• “Cartoon effect”
m = 9
Jutta Pirkl | Hardware/Software Co-Design (FAU) | Self-Adaptive FPGA-Based Image Processing SCOPES’17 6
Impact of the Carry Chain Splitting Point on the Output Quality
Dependency of the average Peak Signal-to-Noise Ratio (PSNR) on mamong the Kodak Lossless True Color Image Suite3
2 3 4 5 6 7 8 9 10 1120
25
30
35
40
45
50
55
60
Splitting Position of the Carry Chain (m)
Ave
rage
PS
NR
[dB
]
3R. Franzen. True Color Kodak Images. http://r0k.us/graphics/kodak/.
Jutta Pirkl | Hardware/Software Co-Design (FAU) | Self-Adaptive FPGA-Based Image Processing SCOPES’17 7
Quality Evaluation
• Problem: Requirement of a no-reference metric to assess the quality atruntime
• Approach: Feature extraction from the histograms of in- and output images
0
2,000
4,000
6,000
Freq
uenc
y
Input Image
0 50 100 150 200 2500
2,000
4,000
6,000
m = 1
0 50 100 150 200 2500
2,000
4,000
6,000
m = 3
0 50 100 150 200 250
0
2,000
4,000
6,000
m = 5
0 50 100 150 200 2500
2
4
6
·104
Gray level
m = 7
0 50 100 150 200 2500
2
4
6
·104 m = 9
0 50 100 150 200 250
• More and more pixels are mapped onto exactly the same brightness values→ “cartoon”-effect
Jutta Pirkl | Hardware/Software Co-Design (FAU) | Self-Adaptive FPGA-Based Image Processing SCOPES’17 8
Quality Evaluation
0
2
4
6·104
Gray level
Freq
uenc
y m = 1
0 50 100 150 200 2500
2
4
6·104
Gray level
m = 7
0 50 100 150 200 2500
2
4
6·104
Gray level
m = 9
0 50 100 150 200 250
Gauss kernel:
1
16
(1 2 12 4 21 2 1
)• Distinctive peaks created by the all1-signal→ erroneous sums are mapped onto 2n−m values
• Example: m = 9, output bit width afternormalization n = 8
before normalization: x20 · · ·x9 | 111111111after normalization by 16: x11x10x9 | 11111
→ smallest collection bin at 00011111b (31d )→ further peaks at a distance of 25 = 32
Jutta Pirkl | Hardware/Software Co-Design (FAU) | Self-Adaptive FPGA-Based Image Processing SCOPES’17 9
Quality Evaluation
0
2
4
6·104
Gray level
Freq
uenc
y m = 1
0 50 100 150 200 2500
2
4
6·104
Gray level
m = 7
0 50 100 150 200 2500
2
4
6·104
Gray level
m = 9
0 50 100 150 200 250
Gauss kernel:
1
16
(1 2 12 4 21 2 1
)• Distinctive peaks created by the all1-signal→ erroneous sums are mapped onto 2n−m values
• Example: m = 9, output bit width afternormalization n = 8
before normalization: x20 · · ·x9 | 111111111after normalization by 16: x11x10x9 | 11111
→ smallest collection bin at 00011111b (31d )→ further peaks at a distance of 25 = 32
Jutta Pirkl | Hardware/Software Co-Design (FAU) | Self-Adaptive FPGA-Based Image Processing SCOPES’17 9
Quality Evaluation
• Amount of counters to sample thehistograms constitutes a trade-offbetween overhead and fidelity
• Counting of the pixels with graylevels 31, 63, 127 and 191 in bothin- and output image
Definition QM
Ratio of the maximum peakheight of the four bins andthe corresponding amountin the input image
• Large metric value indicates badquality
Progression of QMwith increasing m
1 2 3 4 5 6 7 8 9 10 110
5
10
15
20
25
30
Splitting Position of the Carry Chain (m)
QM
Jutta Pirkl | Hardware/Software Co-Design (FAU) | Self-Adaptive FPGA-Based Image Processing SCOPES’17 10
Case Study – Approximate Gaussian Lowpass Filter
m = 1 m = 2
m = 4 m = 6
m = 8 m = 10
Artifacts:• Brightness decrease→ underestimating adder
• “Cartoon effect”
m = 9
Jutta Pirkl | Hardware/Software Co-Design (FAU) | Self-Adaptive FPGA-Based Image Processing SCOPES’17 10
Reconfiguration Management
• Objective:Minimization of the critical path while maintaining a given quality boundary→ successive approximation of m to the fastest configuration mopt
• Based on a bang-bang controller with integrated hysteresis
Decision logic for setting the degree of approximation
• Case 1: Quality is still acceptable→ in- or decrement the splitting position m in the direction of mopt
• Case 2: Quality boundary is exceeded→ in- or decrement m in the opposite direction of mopt
• Case 3: Quality metric is within dead zone→ keep configuration
Jutta Pirkl | Hardware/Software Co-Design (FAU) | Self-Adaptive FPGA-Based Image Processing SCOPES’17 11
Experimental Results
Results – Input-Based Adaptivity
System behavior at runtime for the approximate Gaussian filter
100 200 300 400 500 600 700 800 900 1,0000
1
2
3
4
5
6
7
τQM
Frames
QM
100 200 300 400 500 600 700 800 900 1,0002
4
6
8
10
Frames
m
Jutta Pirkl | Hardware/Software Co-Design (FAU) | Self-Adaptive FPGA-Based Image Processing SCOPES’17 12
Results – Input-Based Adaptivity
System behavior at runtime for the approximate Gaussian filter
100 200 300 400 500 600 700 800 900 1,0000
1
2
3
4
5
6
7
τQM
Frames
QM
static m = 6
100 200 300 400 500 600 700 800 900 1,0002
4
6
8
10
Frames
m
Jutta Pirkl | Hardware/Software Co-Design (FAU) | Self-Adaptive FPGA-Based Image Processing SCOPES’17 12
Results – Input-Based Adaptivity
System behavior at runtime for the approximate Gaussian filter
100 200 300 400 500 600 700 800 900 1,0000
1
2
3
4
5
6
7
τQM
Frames
QM
static m = 5static m = 6
100 200 300 400 500 600 700 800 900 1,0002
4
6
8
10
Frames
m
Jutta Pirkl | Hardware/Software Co-Design (FAU) | Self-Adaptive FPGA-Based Image Processing SCOPES’17 12
Results – Input-Based Adaptivity
System behavior at runtime for the approximate Gaussian filter
100 200 300 400 500 600 700 800 900 1,0000
1
2
3
4
5
6
7
τQM
Frames
QM
dynamicstatic m = 5static m = 6
100 200 300 400 500 600 700 800 900 1,0002
4
6
8
10
Frames
m
Jutta Pirkl | Hardware/Software Co-Design (FAU) | Self-Adaptive FPGA-Based Image Processing SCOPES’17 12
Results – Input-Based Adaptivity
System behavior at runtime for the approximate Gaussian filter
100 200 300 400 500 600 700 800 900 1,0000
1
2
3
4
5
6
7
τQM
Frames
QM
dynamicstatic m = 5static m = 6
100 200 300 400 500 600 700 800 900 1,0002
4
6
8
10
Frames
m
Jutta Pirkl | Hardware/Software Co-Design (FAU) | Self-Adaptive FPGA-Based Image Processing SCOPES’17 12
Results – Requirement-Based Adaptivity
System evaluation results4,5
low medium high
123456789
Quality Requirement
mav
g
DoA
aspen redkayak snowmnt touchdownpass pedestrian area demo video
low medium high0
10
20
30
40
50
60
Quality RequirementP
SN
Rav
g[d
B]
Output Quality
4Test videos: Derf’s collection + self-shot demo video, resolution of 640×480, grayscale 8 bits/pixel
5Evaluation Parameters: Adaptation rate of 2sec at a frame rate of 30 fps
Jutta Pirkl | Hardware/Software Co-Design (FAU) | Self-Adaptive FPGA-Based Image Processing SCOPES’17 13
Analysis of the Partial Reconfiguration Overhead
Reconfiguration time for the partial bitstreams6
XC7Z020 Partial Bitstream
Bitstream Size [KB]7 637.44Reconfiguration Time [ms] 1.62Download Rate [MB/s] 374.48
• Approximately linear correlation between configuration time andbitstream size
• Remaining time slot for the filtering process at 30 fps:• 33.33 ms - 1.62 ms = 31.71 ms• Partial reconfiguration requires 4.86 % of the time frame
6This table contains only the largest bitstream among the approximate variants for the Gaussian filter which determines the slowest transfer
7Full binary bitstream size for the xc7z020 device: 4,045,564 Bytes
Jutta Pirkl | Hardware/Software Co-Design (FAU) | Self-Adaptive FPGA-Based Image Processing SCOPES’17 14
Summary
Summary
• Challenge: Input-dependent approximation error behavior requiresself-adaptive methods
• Proposition of a no-reference metric for online output qualitymonitoring based on histogram information
• Our concept offers• better exploitation of a given error tolerance than static approximation• a user control knob to select the desired output quality at runtime
Thank you for listening!
Any questions?
Jutta Pirkl | Hardware/Software Co-Design (FAU) | Self-Adaptive FPGA-Based Image Processing SCOPES’17 15
Summary
• Challenge: Input-dependent approximation error behavior requiresself-adaptive methods
• Proposition of a no-reference metric for online output qualitymonitoring based on histogram information
• Our concept offers• better exploitation of a given error tolerance than static approximation• a user control knob to select the desired output quality at runtime
Thank you for listening!
Any questions?
Jutta Pirkl | Hardware/Software Co-Design (FAU) | Self-Adaptive FPGA-Based Image Processing SCOPES’17 15
Summary
• Challenge: Input-dependent approximation error behavior requiresself-adaptive methods
• Proposition of a no-reference metric for online output qualitymonitoring based on histogram information
• Our concept offers• better exploitation of a given error tolerance than static approximation• a user control knob to select the desired output quality at runtime
Thank you for listening!
Any questions?
Jutta Pirkl | Hardware/Software Co-Design (FAU) | Self-Adaptive FPGA-Based Image Processing SCOPES’17 15
Summary
• Challenge: Input-dependent approximation error behavior requiresself-adaptive methods
• Proposition of a no-reference metric for online output qualitymonitoring based on histogram information
• Our concept offers• better exploitation of a given error tolerance than static approximation• a user control knob to select the desired output quality at runtime
Thank you for listening!
Any questions?
Jutta Pirkl | Hardware/Software Co-Design (FAU) | Self-Adaptive FPGA-Based Image Processing SCOPES’17 15
Backup Slides
System Level Overview
User application+ Reconfiguration Manager
Linux Kernel
Driver Modules
/dev/image_filter
/dev/xdevcfg HW
/SW
Inte
rface
FilterController
QualityEvaluation
Reconfigurable Partition
FilterWrapper
SoC
Processing System Programmable Logic Main Memory
PR m = 1m = 2
m = 3· · ·
Software
• Reconfiguration Manager :Quality-control loop
• Linux device drivers as hardwareinterfaces
Hardware
• Approximate Filter Operators:Partial bitstreams for variousdegrees of approximation
• Quality Evaluation:Online quality monitoring
Jutta Pirkl | Hardware/Software Co-Design (FAU) | Self-Adaptive FPGA-Based Image Processing SCOPES’17 16
Correlation Between the Proposed Quality Metric and PSNR
30 40 50 600
5
10
15
20
PSNR [dB]
QM
Inverse relation: Increasing tendency of the metric with decreasing PSNR
Jutta Pirkl | Hardware/Software Co-Design (FAU) | Self-Adaptive FPGA-Based Image Processing SCOPES’17 17