IEEE 2013-2014 Project titles

VLSI 2013-2014 IEEE TITLES

Zuara Technologies

Battle with bugs

Zuara Technologies,

82, Station Road, Radha nagar, Chrompet,

Chennai – 44,

Contact No: 9677465689/9790891931

Mail : [email protected]

Web: www.zuaratech.com

LOW POWER VLSI

1. Area-Delay-Power Efficient Fixed-Point LMS Adaptive Filter With Low Adaptation-Delay

In this paper, we present an efficient architecture for the implementation of a delayed least mean square (LMS) adaptive filter. To achieve lower adaptation delay and an area-delay-power efficient implementation, we use a novel partial product generator and propose a strategy for optimized balanced pipelining across the time-consuming combinational blocks of the structure. Synthesis results show that the proposed design offers nearly 17% less area-delay product (ADP) and nearly 14% less energy-delay product (EDP) than the best of the existing systolic structures, on average, for filter lengths N = 8, 16, and 32. We propose an efficient fixed-point implementation scheme for the proposed architecture and derive the expression for its steady-state error. We show that the steady-state mean squared error obtained from the analytical result matches the simulation result. Moreover, we propose a bit-level pruning of the proposed architecture, which provides nearly 20% saving in ADP and 9% saving in EDP over the unpruned structure without noticeable degradation of steady-state error performance.

2. Critical-Path Analysis and Low-Complexity Implementation of the LMS Adaptive Algorithm

This paper presents a precise analysis of the critical path of the least-mean-square (LMS) adaptive filter for deriving architectures for high-speed and low-complexity implementation. It is shown that the direct-form LMS adaptive filter has nearly the same critical path as its transpose-form counterpart, but provides much faster convergence and lower register complexity. From the critical-path evaluation, it is further shown that, in most practical cases, no pipelining is required to implement a direct-form LMS adaptive filter, and that where a very high sampling rate is required, the filter can be realized with a very small adaptation delay. Based on these findings, this paper proposes three structures of the LMS adaptive filter: (i) Design 1, with no adaptation delay; (ii) Design 2, with only one adaptation delay; and (iii) Design 3, with two adaptation delays. Design 1 involves the minimum area and the minimum energy per sample (EPS). The best of the existing direct-form structures requires 80.4% more area and 41.9% more EPS than Design 1. Designs 2 and 3 involve slightly more EPS than Design 1 but offer nearly two and three times the maximum usable frequency (MUF), at a cost of 55.0% and 60.6% more area, respectively.
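
Items 1 and 2 both turn on how many samples of adaptation delay the LMS weight update can tolerate. The Python sketch below is only a behavioral reference model, assuming a real-valued direct-form filter; the name `delayed_lms` and its `delay` parameter are illustrative and not taken from either paper. With `delay=0` it is the plain LMS update w(n+1) = w(n) + mu e(n) u(n); with `delay=D` the update uses e(n-D) and u(n-D), as a pipelined datapath would.

```python
from collections import deque
import random

def delayed_lms(x, d, n_taps, mu, delay=0):
    """Behavioral model of a direct-form (delayed) LMS adaptive filter.

    With delay=0 this is plain LMS; with delay=D the weight update at time n
    uses the error and input vector from time n-D, as in a pipelined design.
    Returns the final weights and the error sequence.
    """
    w = [0.0] * n_taps
    pending = deque()  # (error, input vector) pairs waiting to be applied
    errors = []
    for n in range(len(x)):
        # Tapped-delay-line contents u(n) = [x(n), x(n-1), ..., x(n-N+1)]
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(n_taps)]
        y = sum(wi * ui for wi, ui in zip(w, u))  # filter output
        e = d[n] - y                               # estimation error
        errors.append(e)
        pending.append((e, u))
        if len(pending) > delay:                   # apply the update that is `delay` samples old
            e_old, u_old = pending.popleft()
            w = [wi + mu * e_old * ui for wi, ui in zip(w, u_old)]
    return w, errors

# System identification: the desired signal is the input filtered by an "unknown" FIR.
rng = random.Random(0)
h_true = [0.5, -0.3, 0.2, 0.1]  # illustrative unknown system
x = [rng.gauss(0.0, 1.0) for _ in range(3000)]
d = [sum(h * x[n - k] for k, h in enumerate(h_true) if n - k >= 0) for n in range(len(x))]
w0, _ = delayed_lms(x, d, n_taps=4, mu=0.05, delay=0)
w2, _ = delayed_lms(x, d, n_taps=4, mu=0.05, delay=2)
```

In this noise-free identification run both variants settle on `h_true`; the delayed version simply applies each correction two samples late, which is what lets hardware insert pipeline registers in the update path.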

3. Efficient Integer DCT Architectures for HEVC

In this paper, we present area- and power-efficient architectures for the implementation of integer discrete cosine transforms (DCTs) of different lengths for use in High Efficiency Video Coding (HEVC). We show that an efficient constant matrix-multiplication scheme can be used to derive parallel architectures for 1-D integer DCTs of different lengths. We also show that the proposed structure is reusable for DCTs of lengths 4, 8, 16, and 32, with a throughput of 32 DCT coefficients per cycle irrespective of the transform size. Moreover, the proposed architecture can be pruned to reduce implementation complexity substantially with only a marginal effect on coding performance. We propose power-efficient structures for folded and full-parallel implementations of the 2-D DCT. Synthesis results show that the proposed architecture involves nearly 14% less area-delay product (ADP) and 19% less energy per sample (EPS) than the direct implementation of the reference algorithm, on average, for integer DCTs of lengths 4, 8, 16, and 32. An additional 19% saving in ADP and 20% saving in EPS can be achieved by the proposed pruning algorithm at nearly the same throughput rate. The proposed architecture is found to support ultrahigh-definition 7680 × 4320 video at 60 frames/s, which is one of the applications of HEVC.
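
The constant-matrix idea can be seen on the smallest HEVC transform. The Python sketch below uses the HEVC 4-point forward core-transform matrix and compares a plain matrix-vector product with the well-known even-odd ("partial butterfly") decomposition, where the constants 83 and 36 become shift-and-add networks. It omits the intermediate scaling and rounding shifts of the real transform stages, and it does not reproduce the paper's reusable 4/8/16/32-point architecture; the function names are illustrative.

```python
# HEVC 4-point forward core transform matrix (integer approximation of the DCT-II basis).
C4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]

def dct4_direct(x):
    """Plain matrix-vector product: 16 general multiplications."""
    return [sum(c * xi for c, xi in zip(row, x)) for row in C4]

def times83(v):
    return (v << 6) + (v << 4) + (v << 1) + v   # 64 + 16 + 2 + 1 = 83

def times36(v):
    return (v << 5) + (v << 2)                  # 32 + 4 = 36

def dct4_butterfly(x):
    """Even-odd decomposition with shift-and-add constant multipliers (no general multipliers)."""
    e0, e1 = x[0] + x[3], x[1] + x[2]  # even part
    o0, o1 = x[0] - x[3], x[1] - x[2]  # odd part
    return [
        (e0 + e1) << 6,                # 64 * (e0 + e1)
        times83(o0) + times36(o1),
        (e0 - e1) << 6,                # 64 * (e0 - e1)
        times36(o0) - times83(o1),
    ]

print(dct4_butterfly([1, 2, 3, 4]))   # same result as dct4_direct([1, 2, 3, 4])
```

Python's `<<` on negative integers multiplies by a power of two, so the shift-add constants remain valid for signed residuals.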

4. An Optimized Modified Booth Recoder for Efficient Design of the Add-Multiply Operator

Complex arithmetic operations are widely used in Digital Signal Processing (DSP) applications. In this work, we focus on optimizing the design of the fused Add-Multiply (FAM) operator to increase performance. We investigate techniques to implement the direct recoding of the sum of two numbers in its Modified Booth (MB) form. We introduce a structured and efficient recoding technique and explore three different schemes by incorporating them in FAM designs. Compared with FAM designs that use existing recoding schemes, the proposed technique yields considerable reductions in the critical delay, hardware complexity, and power consumption of the FAM unit.
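
For reference, the Python sketch below shows standard radix-4 Modified Booth recoding and the conventional FAM baseline that the paper improves on: first compute S = A + B with an adder, then MB-recode S and sum the selected partial products. The paper's direct sum-to-MB recoding schemes, which remove that explicit adder, are not reproduced; the function names are illustrative.

```python
def mb_digits(value, n_bits):
    """Radix-4 Modified Booth digits of an n_bits two's-complement value, least significant first.
    Digit i is -2*b[2i+1] + b[2i] + b[2i-1] (with b[-1] = 0), so it lies in {-2, ..., 2}."""
    if n_bits % 2:
        raise ValueError("n_bits must be even")
    if not -(1 << (n_bits - 1)) <= value < (1 << (n_bits - 1)):
        raise ValueError("value does not fit in n_bits")
    pattern = value & ((1 << n_bits) - 1)  # two's-complement bit pattern

    def bit(i):
        return (pattern >> i) & 1 if i >= 0 else 0

    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1) for i in range(n_bits // 2)]

def fam_conventional(a, b, x, n_bits=8):
    """(a + b) * x the conventional way: add first, then MB-recode the sum.
    The sum of two n_bits operands needs n_bits + 1 bits; n_bits + 2 keeps the width even."""
    digits = mb_digits(a + b, n_bits + 2)
    # Each MB digit selects a partial product from {0, +-x, +-2x}, weighted by 4**i.
    return sum(d * x * 4 ** i for i, d in enumerate(digits))

print(mb_digits(-7, 8))            # four digits, each in {-2, ..., 2}
print(fam_conventional(100, -37, 55))
```

Radix-4 recoding halves the number of partial products (four for an 8-bit multiplier instead of eight), which is why the recoder sits on the FAM critical path and is worth optimizing.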

5. Improved design of high-frequency sequential decimal multipliers

Hardware implementation of decimal arithmetic operations has become a hot topic for research during the last decade. Among the various operations, decimal multiplication is considered one of the most complicated dyadic operations and requires a high-cost hardware implementation. Therefore, the processor industry has opted to use sequential decimal multipliers to avoid the high cost of parallel architectures. However, the main drawback of iterative multipliers is their high latency. This work focuses on reducing the latency of sequential decimal multipliers while keeping the area cost low. Consequently, a high-frequency sequential decimal multiplier is proposed whose cycle time is reduced to the latency of a binary half-adder plus that of a decimal multiply-by-two operation, which overall is less than that of a decimal carry-save adder. The synthesis results reveal that the proposed sequential multiplier works at a higher clock frequency than the fastest previous decimal multiplier, which in turn leads to an overall latency advantage.
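
A reference model helps make the "decimal multiply-by-two" step concrete. In the Python sketch below (an illustrative behavioral model, not the paper's datapath or cycle structure), numbers are lists of decimal digits, least significant first. Doubling is carry-free beyond one position: (2d mod 10) is even and at most 8, so adding a carry of 1 from the next-lower digit never exceeds 9. The multiplier consumes one multiplier digit per iteration and adds precomputed multiples X, 2X, 4X, and 8X selected by that digit's BCD (8421) bits. All function names are hypothetical.

```python
def to_digits(n):
    """Non-negative integer -> list of decimal digits, least significant first."""
    return [int(ch) for ch in reversed(str(n))]

def from_digits(digits):
    return int("".join(str(d) for d in reversed(digits)))

def dec_double(digits):
    """Decimal multiply-by-two. Each output digit is (2*d mod 10) + carry, where the carry
    (1 if the next-lower digit is >= 5) never ripples further: 2*d mod 10 is even and <= 8."""
    out, carry = [], 0
    for d in digits:
        out.append((2 * d) % 10 + carry)
        carry = 1 if d >= 5 else 0
    if carry:
        out.append(carry)
    return out

def dec_add(a, b):
    """Ordinary decimal addition with carry propagation."""
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        out.append(s % 10)
        carry = s // 10
    if carry:
        out.append(carry)
    return out

def seq_dec_multiply(x, y):
    """Digit-serial decimal multiplication: one multiplier digit per iteration (MSD first)."""
    multiples = [to_digits(x)]
    for _ in range(3):                        # X, 2X, 4X, 8X by repeated doubling
        multiples.append(dec_double(multiples[-1]))
    acc = [0]
    for digit in reversed(to_digits(y)):
        acc = [0] + acc                       # shift the accumulator one decimal place (x10)
        for weight in range(4):               # add X, 2X, 4X, 8X according to the digit's BCD bits
            if (digit >> weight) & 1:
                acc = dec_add(acc, multiples[weight])
    return from_digits(acc)

print(seq_dec_multiply(9876, 5432))
```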

6. On-Chip Codeword Generation to Cope With Crosstalk

Capacitive and inductive coupling between bus lines results in crosstalk-induced delays. Many bus encoding techniques have been proposed to improve performance. Existing implementation techniques and mapping algorithms in the literature apply only to a specific encoding. This paper presents the first generalized framework for a stall-free on-chip codeword generation strategy that is scalable and easy to automate. It is applicable to coupling-aware encoding techniques that allow recursive codeword generation. The proposed implementation strategy iteratively generates codewords without explicitly enumerating them. Codeword mapping relies on a graph-based representation that is unique to the given encoding technique. The codewords are calculated on-chip using basic function blocks, such as adders and multiplexers. Three encoding techniques were implemented using the proposed strategy. Experimental results show a significant reduction in area overhead and power dissipation over the existing method that uses random logic to implement the codec.
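
The abstract does not name its three encodings, so the Python sketch below assumes the forbidden-pattern-free (FPF) constraint (no 010 or 101 inside a codeword) purely as an example. It shows how the k-th codeword can be generated bit by bit from a small constraint graph (states are the last two bits) plus precomputed path counts, without storing or listing the codebook. This is a software analogy under those assumptions, not the paper's mapping; in hardware, the counts could be constants and each bit decision a compare, a subtract, and a multiplexer.

```python
from functools import lru_cache
from itertools import product

FORBIDDEN = ("010", "101")   # assumed example constraint: forbidden-pattern-free (FPF) codes

def allowed(prefix: str, bit: str) -> bool:
    """Appending `bit` is legal if it does not complete a forbidden 3-bit pattern."""
    return (prefix + bit)[-3:] not in FORBIDDEN

@lru_cache(maxsize=None)
def completions(state: str, remaining: int) -> int:
    """Number of legal ways to append `remaining` bits after a word ending in `state`
    (its last two bits, i.e. a node of the constraint graph)."""
    if remaining == 0:
        return 1
    return sum(completions((state + b)[-2:], remaining - 1)
               for b in "01" if allowed(state, b))

def unrank(index: int, n: int) -> str:
    """Map a data word (0 <= index < completions('', n)) to the index-th n-bit codeword
    in lexicographic order, one bit per step, without listing the codebook."""
    word = ""
    for pos in range(n):
        for b in "01":
            if not allowed(word, b):
                continue
            count = completions((word + b)[-2:], n - pos - 1)
            if index < count:
                word += b
                break
            index -= count
        else:
            raise ValueError("index out of range")
    return word

def brute_force(n: int) -> list:
    """Reference only: enumerate all n-bit words and keep the legal ones."""
    words = ("".join(p) for p in product("01", repeat=n))
    return [w for w in words if not any(f in w for f in FORBIDDEN)]

print([unrank(i, 3) for i in range(completions("", 3))])
```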

7. Effects of Random Delay Errors in Continuous-Time Semi-Digital Transversal Filters

The implementation of transversal filters requires basic circuit elements such as adders, multipliers, and (unit) delay elements. Filters designed assuming infinite precision of these elements may behave differently when implemented with components of limited accuracy. The effects of coefficient inaccuracies in analog and digital transversal filters have been investigated extensively in the literature [1], [2]. On the other hand, the effects of unit delays with limited precision have not received similar attention. In this paper, we find that such effects, especially in very-high-frequency continuous-time semi-digital transversal filters, should not be ignored. As an example, we analyze the impact of delay errors in the implementation of the direct modulation transmitter. Specifically, we provide analytical statistical performance bounds and confirm the results with simulations.
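
For taps h_k spaced T apart, a delay error tau_k on tap k turns the ideal response H(f) = sum_k h_k exp(-j2 pi f kT) into sum_k h_k exp(-j2 pi f (kT + tau_k)). The Python sketch below evaluates both and prints the elementary bound |dH(f)| <= 2 pi |f| sum_k |h_k| |tau_k|, which follows from |exp(j theta) - 1| <= |theta|. The tap values, the 1 ns unit delay, and the 2% RMS Gaussian error model are made-up illustrations, not the paper's statistical analysis; the bound's growth with f is consistent with the abstract's point about very high frequencies.

```python
import cmath
import math
import random

def response(h, f, T, delay_errors=None):
    """Continuous-time transversal filter: H(f) = sum_k h_k * exp(-j*2*pi*f*(k*T + tau_k))."""
    taus = [0.0] * len(h) if delay_errors is None else delay_errors
    return sum(hk * cmath.exp(-2j * math.pi * f * (k * T + tau))
               for k, (hk, tau) in enumerate(zip(h, taus)))

h = [0.1, 0.25, 0.3, 0.25, 0.1]                # illustrative tap weights
T = 1e-9                                        # illustrative 1 ns unit delay
rng = random.Random(1)
taus = [rng.gauss(0.0, 0.02 * T) for _ in h]    # assumed 2% RMS Gaussian delay errors

for f in (10e6, 100e6, 400e6):
    ideal = response(h, f, T)
    actual = response(h, f, T, taus)
    bound = 2 * math.pi * f * sum(abs(hk) * abs(tau) for hk, tau in zip(h, taus))
    print(f"{f / 1e6:6.0f} MHz  |dH| = {abs(actual - ideal):.2e}  bound = {bound:.2e}")
```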

8. Digitally Synthesized Stochastic Flash ADC Using Only Standard Digital Cells

It is demonstrated in this paper that a stochastic flash ADC can be synthesized entirely from Verilog code and a standard digital library. An analog comparator is introduced that is constructed from two cross-coupled 3-input digital NAND gates and can be described in Verilog. The synthesized comparators have random, Gaussian offsets that are used as virtual voltage references to make a flash ADC. A piecewise-linear inverse Gaussian CDF function is used to correct the nonlinearity introduced by the Gaussian offset distribution. The prototype IC is fabricated in 90 nm CMOS and implements a 2047-comparator version of the proposed architecture. All components, including the comparators, the ones adder, and the piecewise inverse Gaussian function, are implemented in Verilog. Conventional digital synthesis and place-and-route are then used to generate the physical layout, making this the first fully synthesized ADC. An SNDR of 35.9 dB (without calibration) is achieved at 210 MSPS from the Verilog-synthesized design.
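
The conversion principle can be modeled in a few lines of Python: comparator thresholds are Gaussian offsets, the "ones adder" counts how many comparators trip, and an inverse Gaussian CDF maps that count back to voltage. The sketch assumes voltages normalized to the offset standard deviation (sigma = 1), 2047 comparators as in the prototype, and an arbitrary seed; it uses the exact `statistics.NormalDist.inv_cdf`, whereas the paper implements a piecewise-linear approximation in Verilog.

```python
import random
from statistics import NormalDist

def make_comparator_offsets(n=2047, sigma=1.0, seed=0):
    """Random Gaussian offsets of n synthesized comparators (they act as virtual references)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n)]

def raw_code(v, offsets):
    """'Ones adder': count how many comparators trip, i.e. how many offsets lie below v."""
    return sum(1 for off in offsets if v > off)

def linearize(code, n, sigma=1.0):
    """Undo the Gaussian-CDF nonlinearity of the raw code with the inverse CDF."""
    p = min(max(code, 0.5), n - 0.5) / n   # keep p strictly inside (0, 1)
    return NormalDist(0.0, sigma).inv_cdf(p)

offsets = make_comparator_offsets()
for v in (-1.0, -0.5, 0.0, 0.5, 1.0):      # input voltages in units of the offset sigma
    code = raw_code(v, offsets)
    print(f"v = {v:+.2f}  code = {code:4d}  estimate = {linearize(code, len(offsets)):+.3f}")
```

Resolution is best near zero, where the offset density is highest, and degrades toward the tails, which is why the usable input range is tied to the offset spread.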

9. Memory Footprint Reduction for Power-Efficient Realization of 2-D Finite Impulse Response Filters

We have analyzed memory footprint and combinational complexity to arrive at a systematic design strategy for deriving area-delay-power-efficient architectures for two-dimensional (2-D) finite impulse response (FIR) filters. We present novel block-based structures for separable and non-separable filters with a smaller memory footprint, obtained by memory sharing and memory reuse along with appropriate scheduling of computations and design of the storage architecture. The proposed structures involve L times less storage per output (SPO) and nearly L times less energy consumption per output (EPO) than the existing structures, where L is the input block size. They involve L times more arithmetic resources than the best of the corresponding existing structures, and produce L times more throughput with less memory bandwidth (MBW) than the others. We also propose separate generic structures for separable and non-separable filter banks, and a unified filter-bank structure comprising symmetric and general filters. The proposed unified structure for 6 parallel filters involves nearly 3.6L times more multipliers, 3L times more adders, and (N² − N + 2) fewer registers than a similar existing unified structure, and computes 6L times more filter outputs per cycle with 6L times less MBW than the existing design, where N is the FIR filter size in each dimension. ASIC synthesis results show that, for filter size 4 × 4, input block size L = 4, and image size 512 × 512, the proposed block-based non-separable and generic non-separable structures involve 5.95 times and 11.25 times less area-delay product (ADP), respectively, and 5.81 times and 15.63 times less EPO than the corresponding existing structures. The proposed unified structure involves 4.64 times less ADP and 9.78 times less EPO than the corresponding existing structure.
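
The Python sketch below illustrates the two ideas the abstract relies on, using "valid" outputs in correlation form (no kernel flip): a separable kernel k[i][j] = v[i]·h[j] can be computed as a row pass followed by a column pass, and a block-based loop can produce L adjacent outputs from one shared N × (N + L − 1) input window, so overlapping inputs are fetched once per block. The function names, image, and kernel are illustrative; the paper's scheduling and storage architecture are not modeled.

```python
def fir2d(image, kernel):
    """Direct non-separable 2-D FIR ('valid' outputs only): N*N MACs per output."""
    n = len(kernel)
    rows, cols = len(image), len(image[0])
    return [[sum(kernel[i][j] * image[r + i][c + j] for i in range(n) for j in range(n))
             for c in range(cols - n + 1)]
            for r in range(rows - n + 1)]

def fir2d_separable(image, v, h):
    """Separable case kernel[i][j] = v[i] * h[j]: a 1-D row pass, then a 1-D column pass."""
    n = len(h)
    rows_filtered = [[sum(h[j] * row[c + j] for j in range(n)) for c in range(len(row) - n + 1)]
                     for row in image]
    return [[sum(v[i] * rows_filtered[r + i][c] for i in range(n))
             for c in range(len(rows_filtered[0]))]
            for r in range(len(rows_filtered) - n + 1)]

def fir2d_block(image, kernel, L):
    """Block-based variant: each step produces L horizontally adjacent outputs from one
    shared window of N rows x (N + L - 1) columns."""
    n = len(kernel)
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - n + 1):
        row_out = []
        for c0 in range(0, cols - n + 1, L):
            width = min(L, cols - n + 1 - c0)                              # last block may be shorter
            window = [image[r + i][c0:c0 + n + width - 1] for i in range(n)]  # shared input block
            row_out.extend(sum(kernel[i][j] * window[i][k + j] for i in range(n) for j in range(n))
                           for k in range(width))
        out.append(row_out)
    return out

image = [[(3 * r + 5 * c) % 11 for c in range(8)] for r in range(7)]  # small test image
v, h = [1, 2, 1], [1, 0, -1]                                           # illustrative separable kernel
kernel = [[vi * hj for hj in h] for vi in v]
```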

10. Improved matrix multiplier design for high-speed digital signal processing applications

A transistor-level implementation of an improved matrix multiplier for high-speed digital signal processing applications, based on matrix element transformation and multiplication, is reported in this study. The improvement in speed was achieved by rearranging the matrix elements into a two-dimensional array of processing elements interconnected as a mesh. The edges of each row and column were interconnected in a torus structure, allowing several multiplications to be performed simultaneously. The functionality of the circuitry was verified, and performance parameters such as propagation delay and dynamic switching power consumption were calculated with Spectre SPICE simulation in 90 nm CMOS technology. The proposed methodology provides a substantial reduction in propagation delay compared with the conventional algorithm, the systolic array, and pseudo number theoretic transformation (PNTT)-based implementations, which are the most commonly used techniques for matrix multiplication. The propagation delay of the implemented 4 × 4 matrix multiplier was only ~2 μs, and its power consumption was only ~3.12 mW. The improvement in speed compared with earlier reported matrix multipliers, namely the conventional algorithm, the systolic array, and the PNTT-based implementation, was found to be ~67%, ~56%, and ~65%, respectively.
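
A mesh of processing elements with torus wrap-around on every row and column is the interconnect used by Cannon's algorithm, a standard parallel matrix-multiplication schedule; whether the paper uses exactly this schedule is an assumption. The Python sketch simulates an n × n grid: after an initial skew, in each step every element multiplies and accumulates its local operands, then A shifts one place left and B one place up, wrapping around the torus. After n steps each element holds one entry of the product.

```python
def cannon_matmul(a, b):
    """Simulate Cannon's algorithm on an n x n torus-connected grid of processing elements."""
    n = len(a)
    # Initial skew: row i of A rotated left by i, column j of B rotated up by j.
    A = [[a[i][(j + i) % n] for j in range(n)] for i in range(n)]
    B = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]
    C = [[0] * n for _ in range(n)]
    for _ in range(n):
        for i in range(n):
            for j in range(n):
                C[i][j] += A[i][j] * B[i][j]      # all PEs multiply-accumulate "in parallel"
        A = [[A[i][(j + 1) % n] for j in range(n)] for i in range(n)]  # shift A left, wrap around
        B = [[B[(i + 1) % n][j] for j in range(n)] for i in range(n)]  # shift B up, wrap around
    return C

def naive_matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

m1 = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
m2 = [[2, 0, -1, 3], [1, 4, 0, -2], [0, 1, 5, 1], [-3, 2, 1, 0]]
print(cannon_matmul(m1, m2))
```

At step s, element (i, j) holds a[i][k] and b[k][j] with k = (i + j + s) mod n, so over n steps it sees every k exactly once.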

11. High Step-Up High-Efficiency Interleaved Converter With Voltage Multiplier Module for Renewable Energy System

A novel high step-up converter suitable for renewable energy systems is proposed in this paper. Through a voltage multiplier module composed of switched capacitors and coupled inductors, a conventional interleaved boost converter obtains a high step-up gain without operating at an extreme duty ratio. The configuration of the proposed converter not only reduces the current stress but also constrains the input current ripple, which decreases conduction losses and lengthens the lifetime of the input source. In addition, owing to the lossless passive clamp, leakage energy is recycled to the output terminal. Hence, large voltage spikes across the main switches are alleviated, and efficiency is improved. Moreover, the low voltage stress allows low-voltage-rated MOSFETs to be adopted, reducing conduction losses and cost. Finally, a prototype circuit with 40-V input voltage, 380-V output voltage, and 1000-W output power is operated to verify its performance. The highest efficiency is 97.1%.
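
For comparison, an ideal conventional boost stage in continuous conduction mode has voltage gain 1/(1 − D). Applying that textbook relation to the prototype's 40 V to 380 V conversion shows the nearly 90% duty ratio that the voltage multiplier module is meant to avoid. The proposed converter's own gain expression is not given in the abstract and is not shown here.

```latex
M = \frac{V_{\mathrm{out}}}{V_{\mathrm{in}}} = \frac{1}{1 - D}
\quad\Longrightarrow\quad
D = 1 - \frac{V_{\mathrm{in}}}{V_{\mathrm{out}}}
  = 1 - \frac{40\ \mathrm{V}}{380\ \mathrm{V}} \approx 0.895
```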

12. Ultra-High Throughput Low-Power Packet Classification

Packet classification is used by networking equipment to sort packets into flows by comparing their headers to a list of rules; each packet is placed in the flow determined by the matched rule. A flow is used to decide a packet's priority and the manner in which it is processed. Packet classification is a difficult task because all packets must be processed at wire speed and rulesets can contain tens of thousands of rules. The contribution of this paper is a hardware accelerator that can classify up to 433 million packets per second using rulesets containing tens of thousands of rules, with a peak power consumption of only 9.03 W on a Stratix III field-programmable gate array (FPGA). The hardware accelerator uses a modified version of the HyperCuts packet classification algorithm, with a new pre-cutting process that reduces the memory needed to store the search structure for large rulesets so that it is small enough to fit in the on-chip memory of an FPGA. The modified algorithm also removes the need to perform floating-point division when classifying a packet, allowing higher clock speeds and thus higher throughput.
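
The Python sketch below is a much-simplified, one-dimensional cousin of a HyperCuts decision tree, included only to show why power-of-two cuts avoid division: with 2^k equal regions over a 16-bit field, a packet's region index is a right shift. Each region keeps the rules overlapping it in priority order, so a short linear search there gives the same answer as searching the whole ruleset. The two 16-bit port fields, rule format, and random ruleset are assumptions; multi-dimensional cuts and the paper's pre-cutting step are not modeled.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

FIELD_BITS = 16  # both header fields are 16-bit port numbers in this example

@dataclass(frozen=True)
class Rule:
    src: Tuple[int, int]  # inclusive source-port range
    dst: Tuple[int, int]  # inclusive destination-port range
    action: str

def matches(rule: Rule, pkt: Tuple[int, int]) -> bool:
    s, d = pkt
    return rule.src[0] <= s <= rule.src[1] and rule.dst[0] <= d <= rule.dst[1]

def linear_classify(rules: List[Rule], pkt: Tuple[int, int]) -> str:
    """Reference classifier: rules are in priority order and the first match wins."""
    for rule in rules:
        if matches(rule, pkt):
            return rule.action
    return "default"

def build_cuts(rules: List[Rule], cut_bits: int = 4):
    """One level of equal-width cuts on the source field. With 2**cut_bits regions of width
    2**(FIELD_BITS - cut_bits), a packet's region index is a right shift, not a division."""
    shift = FIELD_BITS - cut_bits
    buckets: List[List[Rule]] = [[] for _ in range(1 << cut_bits)]
    for rule in rules:  # priority order is preserved inside every bucket
        for i in range(rule.src[0] >> shift, (rule.src[1] >> shift) + 1):
            buckets[i].append(rule)
    return buckets, shift

def cut_classify(buckets, shift: int, pkt: Tuple[int, int]) -> str:
    return linear_classify(buckets[pkt[0] >> shift], pkt)

def random_range(rng: random.Random) -> Tuple[int, int]:
    a, b = rng.randrange(1 << FIELD_BITS), rng.randrange(1 << FIELD_BITS)
    return (min(a, b), max(a, b))

rng = random.Random(7)
rules = [Rule(random_range(rng), random_range(rng), f"rule{k}") for k in range(200)]
packets = [(rng.randrange(1 << FIELD_BITS), rng.randrange(1 << FIELD_BITS)) for _ in range(2000)]
buckets, shift = build_cuts(rules)
```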

13. Low-Cost Low-Power ASIC Solution for Both DAB+ and DAB Audio Decoding

DAB+ is the upgraded version of digital audio broadcasting (DAB). DAB and DAB+ coexist in many countries, so receivers are required to be compatible with both standards. In this paper, a solution integrating an MPEG-1 Layer II (MP2) decoder and an advanced audio coding (AAC) low-complexity (AAC LC) decoder is proposed to provide basic audio decoding for both DAB and DAB+. It also uses simple methods to improve high frequencies and stereo quality instead of the complicated spectral band replication and parametric stereo. A highly integrated low-power audio decoder design compatible with DAB/DAB+ and using a purely ASIC approach is presented. As a result of system structure optimization and hardware sharing, the audio decoder is fabricated in 1P4M 0.18-μm CMOS technology using only 3.2 mm² of silicon area (including 147 456 bits of RAM and 170 496 bits of ROM). The power consumption of the audio decoder is 10.4 mW for DAB audio decoding and 8.5 mW for DAB+ audio decoding. Laboratory and field tests show that the decoder functions correctly and that the audio quality is good when receiving both DAB and DAB+. The audio decoder is thus shown to be a low-cost, low-power solution for the two existing DAB standards.

14. Low-Power Digital Signal Processor Architecture for Wireless Sensor Nodes

Radio communication accounts for the highest energy consumption in wireless sensor nodes. Given their limited energy supply from batteries or energy scavenging, these nodes must trade data communication for on-the-node computation. Currently, they are designed around off-the-shelf low-power microcontrollers. However, by employing a more appropriate processing element, the energy consumption can be significantly reduced. This paper describes the design and implementation of the newly proposed folded-tree architecture for on-the-node data processing in wireless sensor networks, using parallel prefix operations and data locality in hardware. Measurements of the silicon implementation show a 10-20× improvement in energy compared with the traditional modern microcontrollers found in sensor nodes.
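
Parallel prefix operations on a binary tree are usually computed with an upward (reduce) pass and a downward (distribute) pass over the same tree nodes. The Python sketch below is the textbook work-efficient exclusive scan (Blelloch's up-sweep/down-sweep), shown with addition; mapping both passes onto one physical tree of processing elements is the hardware "folding" the paper describes and is not modeled here. The power-of-two length is a simplifying assumption.

```python
def blelloch_exclusive_scan(xs):
    """Work-efficient exclusive prefix sum (up-sweep then down-sweep) on a power-of-two length."""
    n = len(xs)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    t = list(xs)
    d = 1
    while d < n:                                  # up-sweep: build partial sums up the tree
        for i in range(2 * d - 1, n, 2 * d):
            t[i] += t[i - d]
        d *= 2
    t[n - 1] = 0                                  # identity element at the root
    d = n // 2
    while d >= 1:                                 # down-sweep: push prefixes back down the tree
        for i in range(2 * d - 1, n, 2 * d):
            t[i - d], t[i] = t[i], t[i] + t[i - d]
        d //= 2
    return t

print(blelloch_exclusive_scan([1, 2, 3, 4]))  # [0, 1, 3, 6]
```

Both passes take log2(n) levels and about n additions each, which is what makes the scheme attractive for energy-constrained hardware.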

15. Area–Delay–Power Efficient Carry-Select Adder

In this brief, the logic operations involved in the conventional carry select adder (CSLA) and the binary to excess-1 converter (BEC)-based CSLA are analyzed to study the data dependence and to identify redundant logic operations. We have eliminated all the redundant logic operations present in the conventional CSLA and proposed a new logic formulation for CSLA. In the proposed scheme, the carry select (CS) operation is scheduled before the calculation of the final sum, which is different from the conventional approach. Bit patterns of two anticipating carry words (corresponding to c_in = 0 and 1) and fixed c_in bits are used for logic optimization of the CS and generation units. An efficient CSLA design is obtained using optimized logic units. The proposed CSLA design involves significantly less area and delay than the recently proposed BEC-based CSLA. Due to its small carry-output delay, the proposed CSLA design is a good candidate for the square-root (SQRT) CSLA. A theoretical estimate shows that the proposed SQRT-CSLA involves nearly 35% less area–delay product (ADP) than the BEC-based SQRT-CSLA, which is the best among the existing SQRT-CSLA designs, on average, for different bit-widths. The application-specific integrated circuit (ASIC) synthesis result shows that the BEC-based SQRT-CSLA design involves 48% more ADP and consumes 50% more energy than the proposed SQRT-CSLA, on average, for different bit-widths.
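
The sketch below is a bit-level Python model of a conventional carry-select adder, not the optimized logic formulation proposed in the brief. Each block computes its sum twice, for carry-in 0 and 1, and the incoming carry only drives a multiplexer. Passing increasing block widths, here an assumed split of (2, 2, 3, 4, 5) for 16 bits, gives the square-root (SQRT) CSLA arrangement, in which later blocks have more time to finish before their select carry arrives. A BEC-based variant would derive the carry-in-1 result from the carry-in-0 result by adding one instead of using a second ripple adder.

```python
import random

def ripple_add(a, b, cin, width):
    """Bit-level ripple-carry addition of two width-bit blocks; returns (sum, carry_out)."""
    s, c = 0, cin
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        s |= (x ^ y ^ c) << i
        c = (x & y) | (c & (x ^ y))
    return s, c

def carry_select_add(a, b, blocks=(2, 2, 3, 4, 5)):
    """Carry-select adder over sum(blocks) bits; increasing widths give a SQRT-CSLA layout."""
    result, carry, pos = 0, 0, 0
    for w in blocks:
        mask = (1 << w) - 1
        x, y = (a >> pos) & mask, (b >> pos) & mask
        s0, c0 = ripple_add(x, y, 0, w)              # precomputed assuming carry-in = 0
        s1, c1 = ripple_add(x, y, 1, w)              # precomputed assuming carry-in = 1
        s, carry = (s1, c1) if carry else (s0, c0)   # multiplexer driven by the incoming carry
        result |= s << pos
        pos += w
    return result, carry

rng = random.Random(3)
pairs = [(rng.randrange(1 << 16), rng.randrange(1 << 16)) for _ in range(1000)]
print(carry_select_add(0xFFFF, 1))   # (0, 1): all-ones plus one wraps with a carry out
```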

18. Bit-Level Optimization of Adder-Trees for Multiple Constant Multiplications for Efficient FIR Filter Implementation

The multiple constant multiplication (MCM) scheme is widely used for implementing transposed direct-form FIR filters. While MCM research has focused on more effective common subexpression elimination, the optimization of the adder-trees that sum up the computed subexpressions for each coefficient has been largely overlooked. In this paper, we identify the resource minimization problem in the scheduling of adder-tree operations for the MCM block and present a mixed integer programming (MIP) based algorithm for a more efficient MCM-based implementation of FIR filters. Experimental results show that up to 15% reduction in area and 11.6% reduction in power (with averages of 8.46% and 5.96%, respectively) can be achieved on top of an already optimized adder/subtractor network of the MCM block.
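
MCM blocks realize each constant coefficient with shifts and adders. As a minimal illustration, and not the paper's MIP-based adder-tree scheduling, the Python sketch below recodes a constant into canonical signed-digit (CSD, non-adjacent form) digits and multiplies by shifting and adding or subtracting. The number of nonzero digits minus one is the adder count for that coefficient before any subexpression sharing; for example, 7 = 8 − 1 needs one subtractor instead of the two adders of binary 111.

```python
def csd_digits(c):
    """Canonical signed-digit (non-adjacent form) digits of integer c, least significant first."""
    digits = []
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)   # +1 when c % 4 == 1, -1 when c % 4 == 3
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits

def shift_add_multiply(x, c):
    """x * c using only shifts of x and additions/subtractions (d is +1 or -1 here)."""
    return sum(d * (x << i) for i, d in enumerate(csd_digits(c)) if d)

def adder_count(c):
    """Adders/subtractors for one coefficient, before sharing subexpressions across coefficients."""
    return max(sum(1 for d in csd_digits(c) if d) - 1, 0)

print(csd_digits(7), adder_count(7))   # [-1, 0, 0, 1] 1
```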

20. A Novel Distortion Model and Lagrangian Multiplier for Depth Maps Coding

In three-dimensional video (3-DV) coding systems, depth maps are used not for viewing but for rendering virtual views. Therefore, the traditional rate-distortion criterion (including the distortion criterion and the Lagrangian multiplier) is not suitable for depth map coding. To design an effective rate-distortion criterion for depth maps, the relationship between the distortion of the synthesized virtual view and the coding error of the depth maps is analyzed in detail. Through this analysis, a polynomial model of the relationship between the coding error of depth maps and the distortion of the synthesized virtual view is derived. Model parameters are estimated using camera parameters and features of the texture video corresponding to the depth map. Based on the model, a virtual view-based Lagrangian multiplier for depth map coding is also proposed. Experimental results demonstrate the accuracy of the model: the squared correlation coefficients between the actual and estimated distortion of the virtual view are larger than 0.98 for all tested sequences. When the proposed model and Lagrangian multiplier are incorporated into the mode decision procedure of joint model version 18.5 (JM18.5) of H.264/AVC, a maximum BD-PSNR gain of 0.470 dB and an average gain of 0.251 dB can be achieved.
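
In rate-distortion optimized mode decision, the encoder picks the mode that minimizes J = D + λR. The change described above is to measure D as the distortion of the synthesized virtual view, predicted from the depth coding error, and to use a matching virtual-view-based λ. The Python sketch shows only that selection step; the paper's polynomial model and λ are not given in the abstract, so `toy_model`, the candidate modes, and the λ values are placeholders.

```python
from typing import Callable, Dict, Tuple

def choose_mode(candidates: Dict[str, Tuple[float, float]],
                distortion_model: Callable[[float], float],
                lam: float) -> str:
    """Pick the coding mode minimizing J = D_view + lam * R.

    candidates maps a mode name to (depth coding error, rate in bits); distortion_model
    converts depth coding error into estimated virtual-view distortion.
    """
    def rd_cost(mode: str) -> float:
        depth_error, rate = candidates[mode]
        return distortion_model(depth_error) + lam * rate
    return min(candidates, key=rd_cost)

def toy_model(e: float) -> float:
    """Placeholder polynomial mapping from depth error to view distortion (illustrative only)."""
    return 0.5 * e + 0.01 * e ** 2

modes = {"skip": (40.0, 1.0), "inter": (10.0, 30.0), "intra": (2.0, 120.0)}
print([choose_mode(modes, toy_model, lam) for lam in (0.0, 1.0, 10.0)])
```

Larger λ shifts the choice toward cheaper modes, which is why a λ matched to the virtual-view distortion scale matters as much as the distortion model itself.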

21. Dual-Basis Superserial Multipliers for Secure Applications and Lightweight Cryptographic Architectures

Cryptographic algorithms use finite-field arithmetic operations in their computations. Because of the constraints of the nodes that benefit from the security and privacy advantages of these algorithms in sensitive applications, the algorithms need to be lightweight. One of the well-known bases used in sensitive computations is the dual basis (DB). In this brief, we present low-complexity superserial architectures for DB multiplication over GF(2^m). To the best of our knowledge, this is the first time such a multiplier has been proposed in the open literature. We have performed complexity analysis for the proposed lightweight architectures, and the results show that the hardware complexity of the proposed superserial multiplier is reduced compared with that of regular serial multipliers. This has also been confirmed through our application-specific integrated circuit hardware- and time-equivalent estimations. The proposed superserial architecture is a step toward efficient and lightweight cryptographic algorithms and is suitable for constrained implementations of cryptographic primitives in applications such as smart cards, handheld devices, life-critical wearable and implantable medical devices, and constrained nodes in the emerging Internet of nano-Things.
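
To make GF(2^m) multiplication concrete, the Python sketch below shows a bit-serial multiplier in the polynomial basis, consuming one bit of the multiplier per iteration as a serial hardware multiplier does per clock cycle. Note the swap: the brief's multipliers work in the dual basis and process data in a "superserial" fashion, neither of which is reproduced here. The default field polynomial 0x11B (x^8 + x^4 + x^3 + x + 1) is the AES field, chosen so the results can be checked against the worked example in the AES specification.

```python
def gf2m_multiply(a, b, poly=0x11B, m=8):
    """Bit-serial, MSB-first polynomial-basis multiplication in GF(2^m).

    `poly` is the irreducible field polynomial including the x^m term.
    One bit of b is consumed per iteration.
    """
    if not (0 <= a < (1 << m) and 0 <= b < (1 << m)):
        raise ValueError("operands must be elements of GF(2^m)")
    result = 0
    for i in reversed(range(m)):
        result <<= 1                  # multiply the partial result by x
        if result >> m:               # reduce modulo poly when the degree reaches m
            result ^= poly
        if (b >> i) & 1:              # add (XOR) a when the current bit of b is 1
            result ^= a
    return result

print(hex(gf2m_multiply(0x57, 0x83)))   # 0xc1
```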

22. Multifunction Residue Architectures for Cryptography

A design methodology for incorporating the Residue Number System (RNS) and the Polynomial Residue Number System (PRNS) in Montgomery modular multiplication in GF(p) and GF(2^n), respectively, as well as a VLSI architecture of a dual-field residue arithmetic Montgomery multiplier, are presented in this paper. An analysis of input/output conversions to/from residue representation, along with the proposed residue Montgomery multiplication algorithm, reveals common multiply-accumulate data paths both between the converters and between the two residue representations. A versatile architecture is derived that supports all operations of Montgomery multiplication in GF(p) and GF(2^n), input/output conversions, Mixed Radix Conversion (MRC) for integers and polynomials, and dual-field modular exponentiation and inversion in the same hardware. Detailed comparisons with state-of-the-art implementations demonstrate the potential of exploiting residue arithmetic in dual-field modular multiplication.
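
The Python sketch below shows the integer side of the residue machinery: forward conversion to residues, carry-free channel-wise multiplication, and Mixed Radix Conversion back to an integer. It covers integer RNS only; the polynomial RNS for GF(2^n) and the Montgomery reduction steps are not shown, and the moduli (7, 11, 13, 15), chosen only because they are pairwise coprime, are illustrative.

```python
from math import prod

MODULI = (7, 11, 13, 15)   # illustrative pairwise-coprime moduli; dynamic range M = 15015

def to_rns(x, moduli=MODULI):
    return [x % m for m in moduli]

def rns_multiply(xr, yr, moduli=MODULI):
    """Channel-wise multiplication: no carries propagate between residue channels."""
    return [(a * b) % m for a, b, m in zip(xr, yr, moduli)]

def mrc_to_int(residues, moduli=MODULI):
    """Mixed Radix Conversion: X = a0 + a1*m0 + a2*m0*m1 + ..., one digit per modulus."""
    digits = []
    for i, (r, m) in enumerate(zip(residues, moduli)):
        t = r
        for j in range(i):
            t = ((t - digits[j]) * pow(moduli[j], -1, m)) % m   # pow(., -1, m) = modular inverse
        digits.append(t)
    x, weight = 0, 1
    for d, m in zip(digits, moduli):
        x += d * weight
        weight *= m
    return x

M = prod(MODULI)
print(mrc_to_int(rns_multiply(to_rns(123), to_rns(97))))   # 123 * 97 mod M
```

Each MRC step is a subtract-and-multiply by a constant inverse, the kind of multiply-accumulate operation the abstract says is shared with the residue multiplier datapath. `pow` with exponent -1 requires Python 3.8 or later.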

Signal processing & Communications

1. Physical Layer Encryption in OFDM-PON Employing Time-Variable Keys From ONUs

We propose and experimentally demonstrate a dynamic encryption method to realize physical layer security for orthogonal frequency division multiplexing passive optical networks (OFDM-PON). In our scheme, the downstream signal is encrypted by applying an exclusive-OR (XOR) operation between the optical network units' (ONUs') downstream signals and the received upstream signals at the optical line terminal side. The upstream signals serve as secure keys for the corresponding ONUs. The encrypted downstream signals are then sent to the ONU side, where each downstream signal can be retrieved by applying the XOR operation again between the encrypted downstream signal and the stored upstream signal. Since each ONU cannot obtain the upstream signals of other ONUs, only the ONU itself can recover its downstream signal from the encrypted downstream signal. Moreover, the secure key changes dynamically with the upstream signal, significantly improving the security of the downstream signal in the OFDM-PON system. A 5-Gb/s 16-quadrature amplitude modulation (16-QAM) OFDM signal with XOR-based encryption has been successfully implemented over 25 km of standard single-mode fiber. Experimental results verify that the encryption scheme can effectively prevent eavesdropping by malicious users.

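The key-reuse idea can be shown at the bit level in Python: the OLT XORs an ONU's downstream data with the upstream data it just received from that ONU, and the ONU, which kept a copy of what it sent, XORs again to recover the payload. Bytes stand in for the OFDM signal here; the paper applies the operation to the actual OFDM signals, which this sketch does not model, and the frame contents are illustrative.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("data and key must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))

# OLT side: the upstream frame just received from an ONU doubles as that ONU's key.
upstream_from_onu = secrets.token_bytes(16)   # stands in for the ONU's upstream data
downstream_for_onu = b"downstream frame"      # 16 bytes of payload
encrypted = xor_bytes(downstream_for_onu, upstream_from_onu)

# ONU side: it kept a copy of what it sent upstream, so XOR-ing again recovers the data.
recovered = xor_bytes(encrypted, upstream_from_onu)
```

Because x XOR k XOR k = x, decryption is the same operation as encryption, and the key changes whenever the upstream traffic does.
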
2. Channel Quantization Using Constellation Based Codebooks for Multiuser MIMO-OFDM

In this paper, we propose clustered quantization techniques for multiuser multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) using constellation-based codebooks. Constellation-based codebooks provide scalability and an efficient codeword search capability, which are key features for practical multiuser MIMO-OFDM systems with a large number of antennas. The proposed clustered quantization scheme quantizes consecutive subcarriers into a single codeword that minimizes the aggregated quantization error. We base our new clustering techniques on two constellation-based quantization methods, namely equal-magnitude angular quantization (EMAQ) and squared-lattice angular quantization. New efficient codebook search algorithms are proposed for the clustered quantization. In addition, we propose new constellations that guarantee that different users quantize their channels into distinct codewords. One is a rotated M-PSK constellation suitable for randomly distributed user scenarios, and the other is a random phase equal-magnitude (RPEM) constellation suitable for ill-conditioned user scenarios. Thus, full spatial multiplexing gain can be achieved even with a small number of users. Finally, a near-sphere codeword search algorithm is proposed for the RPEM constellation. In simulations, the proposed clustered quantization shows up to 50% higher throughput than conventional fixed-pilot channel quantization. We also show that our new constellations for EMAQ improve throughput by almost 35% compared with the standard EMAQ.
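
Clustered quantization can be stated compactly: pick the one codeword that maximizes the summed alignment |h^H c|^2 / ||h||^2 over all subcarriers in the cluster, which is the same as minimizing the aggregated squared chordal distance. The Python sketch assumes a simple equal-magnitude codebook whose entries carry M-PSK phases (first entry's phase fixed) and uses exhaustive search; it is not EMAQ's efficient search, the squared-lattice method, or the RPEM constellation.

```python
import cmath
import itertools
import math

def psk_codebook(n_tx, m_psk):
    """Equal-magnitude codewords (1, e^{j*phi_2}, ..., e^{j*phi_n}) / sqrt(n_tx), with each phase
    drawn from M-PSK; the first entry's phase is fixed because a common phase is irrelevant."""
    phases = [cmath.exp(2j * math.pi * k / m_psk) for k in range(m_psk)]
    scale = 1 / math.sqrt(n_tx)
    return [tuple(scale * z for z in (1,) + rest)
            for rest in itertools.product(phases, repeat=n_tx - 1)]

def alignment(h, c):
    """|h^H c|^2 / ||h||^2 for unit-norm c: one minus the squared chordal distance."""
    inner = sum(hi.conjugate() * ci for hi, ci in zip(h, c))
    return abs(inner) ** 2 / sum(abs(hi) ** 2 for hi in h)

def clustered_quantize(cluster, codebook):
    """Index of the single codeword that best represents all subcarriers in the cluster."""
    return max(range(len(codebook)),
               key=lambda i: sum(alignment(h, codebook[i]) for h in cluster))

codebook = psk_codebook(n_tx=3, m_psk=4)   # 16 codewords for 3 transmit antennas
# A cluster of subcarriers whose channels share a direction up to complex scaling.
cluster = [tuple(s * z for z in codebook[5]) for s in (2 - 1j, 0.5j, -1.3)]
print(clustered_quantize(cluster, codebook))
```

Only the index is fed back for the whole cluster, which is where the feedback saving of clustering comes from.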

3. Impulse Noise Estimation and Removal for OFDM Systems

Orthogonal Frequency Division Multiplexing (OFDM) is a modulation scheme that is widely

used in wired and wireless communication systems. While OFDM is ideally suited to deal with

frequency selective channels and AWGN, its performance may be dramatically impacted by the





Mob.No: 09677465689,


presence of impulse noise. In fact, very strong noise impulses in the time domain might result in

the erasure of whole OFDM blocks of symbols at the receiver. Impulse noise can be mitigated by

treating it as a sparse signal in time and using recently developed algorithms for sparse

signal reconstruction. We propose an algorithm that utilizes the guard-band null subcarriers for

impulse noise estimation and cancellation. Instead of relying on ℓ1 minimization, as done

in some popular general-purpose compressive sensing schemes, the proposed method jointly

exploits the specific structure of this problem and the available a priori information for sparse

signal recovery. The computational complexity of the proposed algorithm is very competitive

with that of sparse signal reconstruction schemes based on ℓ1 minimization. The proposed

method is compared with other state-of-the-art methods in terms of achievable rates

for an OFDM system with impulse noise and AWGN.
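
To make the null-subcarrier idea concrete: subcarriers that carry no data observe only the DFT of the impulse noise at those bins, so a sparse time-domain impulse leaves a recognizable phase ramp there. The sketch below assumes a single, noise-free impulse and locates it by brute-force matched correlation over all time positions. This is a swapped-in illustration, not the authors' structured recovery algorithm and not ℓ1 minimization; the function names and the 16-subcarrier example are hypothetical.

```python
import cmath


def dft_at_bins(x, bins):
    """DFT of the time-domain vector x, evaluated only at the listed bin indices."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in bins]


def estimate_single_impulse(observed, bins, n):
    """Estimate (position, complex amplitude) of ONE impulse from null-bin DFT samples.

    For each candidate position p, the phase ramp exp(-j*2*pi*k*p/n) is undone
    and the samples are summed; only the true position adds up coherently.
    """
    best_pos, best_corr = 0, 0j
    for p in range(n):
        corr = sum(e * cmath.exp(2j * cmath.pi * k * p / n)
                   for e, k in zip(observed, bins))
        if abs(corr) > abs(best_corr):
            best_pos, best_corr = p, corr
    return best_pos, best_corr / len(bins)


# Hypothetical example: 16 subcarriers, bins 6..10 are null (guard band),
# one impulse of amplitude 5 at time index 3, no data or AWGN on the null bins.
noise = [0.0] * 16
noise[3] = 5.0
null_bins = [6, 7, 8, 9, 10]
position, amplitude = estimate_single_impulse(dft_at_bins(noise, null_bins), null_bins, 16)
print(position, abs(amplitude))
```

With several impulses or AWGN on the null bins, this single-peak search is no longer reliable, which is the setting the sparse recovery approach above targets.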

4. A Low Complexity PAPR Reduction Scheme for OFDM Systems via Neural

Networks

Peak-to-average power ratio (PAPR) reduction is one of the key components in orthogonal

frequency division multiplexing (OFDM) systems. Among various PAPR reduction techniques,

the artificial neural network (NN) has been one of the most powerful techniques for reducing the PAPR

due to its good generalization properties with flexible modeling and learning capabilities. In this

letter, we propose a new method that uses NNs trained on the active constellation extension

(ACE) signals to reduce the PAPR of OFDM signals. Unlike other NN-based techniques, the

proposed method employs a receiver NN unit, at the OFDM receiver side, achieving significant

bit error rate (BER) improvement with low computational complexity.
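
For reference, the quantity being reduced can be computed directly: PAPR is the peak instantaneous power of the time-domain OFDM symbol divided by its mean power. A minimal sketch, assuming an N-point IDFT without oversampling (practical PAPR measurements usually oversample, e.g., by a factor of 4); it does not implement ACE or the neural-network scheme.

```python
import cmath
import math


def ofdm_time_signal(freq_symbols):
    """N-point IDFT: x[t] = (1/N) * sum_k X[k] * exp(j*2*pi*k*t/N), no oversampling."""
    n = len(freq_symbols)
    return [sum(s * cmath.exp(2j * math.pi * k * t / n)
                for k, s in enumerate(freq_symbols)) / n
            for t in range(n)]


def papr_db(freq_symbols):
    """Peak-to-average power ratio of one OFDM symbol, in dB."""
    power = [abs(v) ** 2 for v in ofdm_time_signal(freq_symbols)]
    return 10 * math.log10(max(power) / (sum(power) / len(power)))


# All subcarriers carrying the same symbol add coherently at t = 0 (worst case, PAPR = N).
print(papr_db([1] * 8))
# A single active subcarrier has a constant envelope (PAPR = 0 dB).
print(papr_db([0, 1, 0, 0]))
```

The first case is the upper bound: for any N-subcarrier symbol the (non-oversampled) PAPR cannot exceed N, i.e., 10·log10(N) dB.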

5. An Adaptive Allocation Scheme in Multiuser OFDM Systems with Time-Varying

Channels

Previously, a scheme was proposed in [1] for the subcarrier, bit, and power allocation problem to

minimize the total transmit power for multiuser orthogonal frequency division multiplexing

systems in downlink transmission. However, it operates in batch mode, which may not be efficient in

terms of computational complexity for slowly time-varying channel environments. The solution

of the current frame can be obtained with slight modification from that of the previous frame in

an adaptive fashion. By utilizing this property, we propose a scheme to obtain the solution in the

adaptive fashion, which offers comparable performance with a reduced complexity compared to

the previously proposed method and other existing suboptimal methods. Based on a derived

expression composed of the channel gains, the numbers of assigned subcarriers, and the data

rates, along with newly proposed processing procedures, the proposed adaptive scheme is able

to track channel variations and adjust the solution faster than the

original batch-mode method. Simulation results reveal that the proposed adaptive scheme achieves

performance competitive with that of the optimal and existing schemes, while the

computational complexity and the number of iterations are both reduced.

6. Channel estimation and symbol detection for OFDM systems using data-nulling

superimposed pilots

A novel data-nulling superimposed pilot scheme for orthogonal frequency division multiplexing

(OFDM) systems is proposed, where the input data vector is spread over all the subcarriers by a

precoding matrix and then nulled at certain subcarriers for the insertion of training pilots. This

method avoids the data-rate loss of frequency-division multiplexed pilots, but results in

the distortion of input data. To mitigate the distortion introduced by the nulling operation, a

simple iterative reconstruction scheme is used to improve the detection performance.

7. Designing Hardware-Efficient Fixed-Point FIR Filters in an Expanding

Subexpression Space

This paper presents a practical method for designing fixed-point FIR filters. The proposed

method takes both the filter's magnitude response and its hardware cost into consideration in the

design process. The method constructs a basis set based on the fixed-point coefficients that have

been synthesized already. The elements in the basis set are used to synthesize the undetermined

fixed-point coefficients later. Thus, this basis set expands gradually along with the progress of

the coefficient design. The method employs several strategies to speed up the design process. For

example, a complexity estimation strategy lets the search stop digging deeper into some branches of the

search tree, and a solution prediction strategy for high-order FIR filters makes it possible to design

fixed-point FIR filters with lengths of a few hundred taps. Applying the proposed method to

twenty benchmark cases, we obtain hardware-efficient results in a reasonable design time. In

two long-filter design cases, our design results are better than those obtained by other

methods.

8. Bit-Level Optimization of Adder-Trees for Multiple Constant Multiplications for

Efficient FIR Filter Implementation

The multiple constant multiplication (MCM) scheme is widely used for implementing transposed

direct-form FIR filters. While MCM research has focused on more effective common

subexpression elimination, the optimization of adder-trees, which sum up the computed

subexpressions for each coefficient, has been largely neglected. In this paper, we have identified the resource

minimization problem in the scheduling of adder-tree operations for the MCM block, and

presented a mixed integer programming (MIP) based algorithm for more efficient MCM-based

implementation of FIR filters. Experimental results show that up to 15% reduction in area and

11.6% reduction in power (8.46% and 5.96% on average, respectively) can be achieved

on top of an already optimized adder/subtractor network of the MCM block.
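
In an MCM block, each constant multiplication is realized with shifts and adders/subtractors, and the resulting terms are then summed by adder-trees. As a baseline illustration (not the paper's MIP-based adder-tree scheduling), the sketch below converts a constant to canonical signed-digit (CSD) form, multiplies by it using only shifts and additions/subtractions, and counts the adders needed when no subexpressions are shared. The function names are illustrative.

```python
def csd_digits(n):
    """Canonical signed-digit digits of a non-negative integer, LSB first (values -1, 0, 1)."""
    digits = []
    while n != 0:
        if n % 2:
            d = 2 - (n % 4)   # +1 if n = 1 (mod 4), -1 if n = 3 (mod 4)
            n -= d            # now n = 0 (mod 4), so the next digit is 0 (no adjacent nonzeros)
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits


def shift_add_multiply(x, constant):
    """Multiply integer x by a non-negative constant using only shifts and adds/subtracts."""
    acc = 0
    for shift, d in enumerate(csd_digits(constant)):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc


def adder_count(constant):
    """Adders/subtractors for one constant when no subexpressions are shared."""
    nonzero = sum(1 for d in csd_digits(constant) if d)
    return max(nonzero - 1, 0)


# Example: 23 = 32 - 8 - 1, so 23*x needs two adder/subtractors.
print(csd_digits(23), adder_count(23), shift_add_multiply(5, 23))
```

Sharing common subexpressions across coefficients, and then scheduling the adder-trees that combine them, is what reduces this per-constant count further.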

9. Optimal Memory for Discrete-Time FIR Filters in State-Space

In this correspondence, we propose an efficient estimator of optimal memory (averaging

interval) for discrete-time finite impulse response (FIR) filters in state-space. Its crucial property

is that only real measurements and the filter output are involved, with no need for a reference or noise

statistics. Testing with the two-state polynomial model has shown very good agreement

with the predicted values. Even in the worst case of the harmonic model, the estimator demonstrates

practical applicability.

10. On Efficient Design of High-Order Filters With Applications to Filter Banks and

Transmultiplexers With Large Number of Channels

This paper proposes a method for designing high-order linear-phase finite-length impulse

response (FIR) filters, which are required, e.g., as the prototype filters in filter banks (FBs) and

transmultiplexers (TMUXs) with a large number of channels. The proposed method uses the

Farrow structure to express the polyphase components of the desired filter. Thereby, the only

unknown parameters in the filter design are the coefficients of the Farrow subfilters. The

number of these unknown parameters is considerably smaller than that of the direct filter design

methods. Besides these unknown parameters, the proposed method needs some predefined

multipliers. Although the number of these multipliers is larger than the number of unknown

parameters, they are known a priori. The proposed method is generally applicable to any linear-

phase FIR filter irrespective of its order being high, low, even, or odd as well as the impulse

response being symmetric or antisymmetric. However, it is more efficient for filters with high

orders, as the conventional design of such filters is more challenging. For example, to design a

linear-phase FIR lowpass filter of order 131071 with a stopband attenuation of about 55 dB,

which is used as the prototype filter of a cosine modulated filter bank (CMFB) with 8192

channels, our proposed method requires only 16 unknown parameters. The paper gives design

examples for individual lowpass filters as well as the prototype filters for fixed and flexible

modulated FBs.
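
The parameter saving comes from generating every polyphase branch from the same few Farrow subfilters: the coefficients of branch k are a polynomial in a per-branch parameter μk, with the subfilter coefficients as the polynomial weights, so the powers of μk act as known (predefined) multipliers. A minimal sketch, assuming μk = k/L for L branches (the abstract does not state the paper's exact mapping); the function names are hypothetical. Note that the unknowns are only the subfilter coefficients, whatever the number of branches.

```python
def farrow_polyphase(subfilters, mu):
    """Coefficients of one polyphase branch from Farrow subfilters.

    subfilters[m][n] is coefficient n of Farrow subfilter m; the branch
    coefficient is sum_m subfilters[m][n] * mu**m, evaluated by Horner's rule.
    """
    length = len(subfilters[0])
    out = []
    for n in range(length):
        acc = 0.0
        for m in reversed(range(len(subfilters))):
            acc = acc * mu + subfilters[m][n]
        out.append(acc)
    return out


def prototype_from_farrow(subfilters, num_branches):
    """Interleave branches k = 0..L-1 (assumed mu_k = k / L) so that h[n*L + k] = branch_k[n]."""
    branches = [farrow_polyphase(subfilters, k / num_branches) for k in range(num_branches)]
    return [branches[k][n] for n in range(len(subfilters[0])) for k in range(num_branches)]


# Two subfilters of length 2 (4 unknowns) generate a prototype with 2 * L taps.
print(prototype_from_farrow([[1.0, 2.0], [0.0, 1.0]], 2))
```

Increasing the number of branches L lengthens the prototype without adding unknowns, which is the property the abstract exploits for filter banks with thousands of channels.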

11. Frequency Estimation of Distorted and Noisy Signals in Power Systems by FFT-

Based Approach

This paper focuses on the accurate frequency estimation of power signals corrupted by

stationary white noise. The noneven-item interpolation FFT based on the triangular

self-convolution window is described. A simple analytical expression for the variance of the noise

contribution to the frequency estimate is derived, which shows that the variance of the frequency

estimate is proportional to the energy of the adopted window. Based on the proposed method,

the noise level of the measurement channel can be estimated, and optimal parameters (e.g.,

sampling frequency and window length) of the interpolation FFT algorithm that minimize the

variance of the frequency estimate can thus be determined. An application in a power quality

analyzer verified the usefulness of the proposed method.
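
A triangular window is itself the convolution of two rectangular windows, and convolving several triangular windows together gives windows with faster sidelobe decay. The sketch below builds such a window and its energy (sum of squared samples), the quantity the abstract ties to the variance of the frequency estimate. It assumes the order-p window is p triangular windows convolved together, normalized to unit peak; the paper's normalization and its interpolation formulas are not reproduced.

```python
def convolve(a, b):
    """Full linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def triangular_self_convolution_window(m, order):
    """Order-p window: p triangular windows (each = rect(m) * rect(m)) convolved together."""
    if order < 1:
        raise ValueError("order must be >= 1")
    rect = [1.0] * m
    tri = convolve(rect, rect)            # triangular window of length 2m - 1
    window = tri
    for _ in range(order - 1):
        window = convolve(window, tri)    # length grows by 2m - 2 per extra factor
    peak = max(window)
    return [w / peak for w in window]     # unit-peak normalization (an assumption)


def window_energy(window):
    """Sum of squared window samples."""
    return sum(w * w for w in window)


w2 = triangular_self_convolution_window(2, 2)
print(w2, window_energy(w2))
```

Which window minimizes the variance in practice depends on the normalization and the interpolation formula, so the energy values here are only comparable under the same convention.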

12. Accurate and Efficient On-Chip Spectral Analysis for Built-In Testing and

Calibration Approaches

The fast Fourier transform (FFT) algorithm is widely used as a standard tool to carry out spectral

analysis because of its computational efficiency. However, the presence of multiple tones

frequently requires a fine frequency resolution to achieve sufficient accuracy, which demands the

use of a large number of FFT points and results in large area and power overheads. In this paper,

an FFT method is proposed for on-chip spectral analysis of multi-tone signals with particular

harmonic and intermodulation components. This accurate FFT analysis approach is based on

coherent sampling, but it requires a significantly smaller number of points to make

the FFT realization more suitable for on-chip built-in testing and calibration applications that

require area and power efficiency. The technique was assessed by comparing the simulation

results of the proposed method for single and multiple tones with the simulation results

obtained from the FFT of coherently sampled tones. The results indicate that the proper selection

of test tone frequencies can avoid spectral leakage even with multiple narrowly spaced tones.

When low-frequency signals are captured with an analog-to-digital converter (ADC) for on-chip

analysis, the overall accuracy is limited by the ADC's resolution, linearity, noise, and bandwidth

limitations. Post-layout simulations of a 16-point FFT showed that third-order intermodulation

(IM3) testing with two tones can be performed with 1.5-dB accuracy for IM3 levels of up to 50

dB below the fundamental tones that are quantized with a 10-bit resolution. In a 45-nm CMOS

technology, the layout area of the 16-point FFT for on-chip built-in testing is 0.073 mm², and its

estimated power consumption is 6.47 mW.
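
Coherent sampling places an integer number M of tone cycles in the N-point record, f_in = M·fs/N, so each tone falls exactly on an FFT bin and no window is needed; choosing M coprime with N additionally makes every sample land on a distinct phase of the tone. The sketch below applies this standard rule to pick a test-tone frequency and checks that the resulting spectrum shows no leakage. It is not the paper's reduced-point multi-tone method, and the frequencies are hypothetical.

```python
import cmath
import math


def coherent_tone_frequency(f_desired, fs, n_points):
    """Closest frequency M*fs/N with integer M, 0 < M < N/2 and gcd(M, N) == 1."""
    target = f_desired * n_points / fs
    candidates = [m for m in range(1, n_points // 2) if math.gcd(m, n_points) == 1]
    best = min(candidates, key=lambda m: (abs(m - target), m))
    return best * fs / n_points, best


def dft_magnitudes(x):
    """Magnitude of the N-point DFT at every bin (direct O(N^2) evaluation)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]


fs, n = 1000.0, 16
f, cycles = coherent_tone_frequency(130.0, fs, n)   # hypothetical 130 Hz request
tone = [math.cos(2 * math.pi * f * t / fs) for t in range(n)]
print(f, cycles, [round(m, 6) for m in dft_magnitudes(tone)])
```

A coherently sampled cosine puts N/2 in bins M and N−M and (numerically) zero elsewhere, whereas the requested 130 Hz tone, which completes a non-integer number of cycles, leaks into neighbouring bins.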

13. A 16-Core Processor With Shared-Memory and Message-Passing Communications

A 16-core processor with both message-passing and shared-memory inter-core communication

mechanisms is implemented in 65 nm CMOS. Message-passing communication is enabled in a

3 × 6 mesh packet-switched network-on-chip, and shared-memory communication is supported

using the shared memory within each cluster. The processor occupies 9.1 mm² and is fully

functional at a clock rate of 750 MHz at 1.2 V, with a maximum of 800 MHz at 1.3 V. Each core

dissipates 34 mW under typical conditions at 750 MHz and 1.2 V while executing embedded

applications such as an LDPC decoder, a 3780-point FFT module, an H.264 decoder and an LTE

channel estimator.

Audio, Image and Video Processing

1. Subjective evaluation of HEVC and AVC/H.264 in mobile environments

This paper compares the quality of AVC/H.264- and HEVC-encoded video in low-bandwidth

mobile environments. In this study, the mobile environment considered is the smartphone.

The key characteristics of a smartphone are a small screen, usually 3.5 inches diagonal to

5.0 inches diagonal for high-end smartphones, and typical cellular network bandwidth of

3G or faster. Subjective evaluations were conducted to assess the user

experience on a mobile device with a small screen and video coded at 200 and 400 kbps.

The studies showed compelling evidence that a user's experience in low-bandwidth mobile

environments is very similar between HEVC and AVC/H.264. The results suggest that the benefits of

HEVC over AVC/H.264 in a mobile environment with lower video bitrates and resolutions are

not as clear.

2. Efficient Integer DCT Architectures for HEVC

In this paper, we present area- and power-efficient architectures for the implementation of

integer discrete cosine transform (DCT) of different lengths to be used in High Efficiency Video

Coding (HEVC). We show that an efficient constant matrix-multiplication scheme can be used to

derive parallel architectures for 1-D integer DCT of different lengths. We also show that the

proposed structure can be reused for DCT of lengths 4, 8, 16, and 32 with a throughput of 32

DCT coefficients per cycle irrespective of the transform size. Moreover, the proposed

architecture can be pruned to reduce the implementation complexity substantially with only

a marginal effect on the coding performance. We also propose power-efficient structures for folded

and full-parallel implementations of 2-D DCT. From the synthesis result, it is found that the

proposed architecture involves nearly 14% less area-delay product (ADP) and 19% less energy

per sample (EPS) compared to the direct implementation of the reference algorithm, on average,

for integer DCT of lengths 4, 8, 16, and 32. Also, an additional 19% saving in ADP and 20%

saving in EPS can be achieved by the proposed pruning algorithm with nearly the same

throughput rate. The proposed architecture is found to support ultrahigh-definition 7680 × 4320

video at 60 frames/s, which is one of the applications of HEVC.
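
For the smallest HEVC transform, the 4×4 integer DCT matrix has rows (64, 64, 64, 64), (83, 36, −36, −83), (64, −64, −64, 64) and (36, −83, 83, −36). The sketch below compares a direct matrix product with the even-odd (partial butterfly) decomposition used in the HEVC reference software, writing the constant multiplications as shift-and-add so that no general multiplier is needed. It is an illustration only: the paper's constant matrix-multiplication architecture, the intermediate scaling shifts of a real HEVC transform, and the larger transform sizes are not shown.

```python
# HEVC 4-point integer DCT basis (rows are basis vectors).
HEVC_DCT4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def dct4_direct(x):
    """Direct matrix-vector product: 16 multiplications."""
    return [sum(c * v for c, v in zip(row, x)) for row in HEVC_DCT4]


def mul83(v):
    """83*v = (64 + 16 + 2 + 1)*v using shifts and adds only."""
    return (v << 6) + (v << 4) + (v << 1) + v


def mul36(v):
    """36*v = (32 + 4)*v using shifts and adds only."""
    return (v << 5) + (v << 2)


def dct4_butterfly(x):
    """Even-odd decomposition: even outputs use sums, odd outputs use differences."""
    e0, e1 = x[0] + x[3], x[1] + x[2]
    o0, o1 = x[0] - x[3], x[1] - x[2]
    return [
        (e0 + e1) << 6,             # 64 * (e0 + e1)
        mul83(o0) + mul36(o1),      # 83*o0 + 36*o1
        (e0 - e1) << 6,             # 64 * (e0 - e1)
        mul36(o0) - mul83(o1),      # 36*o0 - 83*o1
    ]


print(dct4_direct([1, 2, 3, 4]), dct4_butterfly([1, 2, 3, 4]))
```

The even-odd split is what lets the larger HEVC transforms reuse the smaller ones: the even half of an N-point transform is an N/2-point transform, which underlies reusing one structure for several transform lengths.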

3. Improved Method to Select the Lagrange Multiplier for Rate-Distortion Based

Motion Estimation in Video Coding

The motion estimation (ME) process used in the H.264/AVC reference software is based on

minimizing a cost function that involves two terms (distortion and rate) that are properly

balanced through a Lagrangian parameter, usually denoted as λmotion. In this paper we propose

an algorithm to improve the conventional way of estimating λmotion and, consequently, the ME

process. First, we show that the conventional estimation of λmotion turns out to be significantly

less accurate when ME-compromising events, which make the ME process perform poorly,

happen. Second, with the aim of improving the coding efficiency in these cases, an efficient

algorithm is proposed that allows the encoder to choose among three different values of

λmotion for the Inter 16x16 partition size. To be more precise, for this partition size, the

proposed algorithm allows the encoder to additionally test λmotion = 0 and λmotion arbitrarily

large, which correspond to the minimum-distortion and minimum-rate solutions, respectively. By

testing these two extreme values, the algorithm avoids making large ME errors. The

experimental results on video segments exhibiting this type of ME-compromising events reveal

an average rate reduction of 2.20% for the same coding quality with respect to the JM15.1

reference software of H.264/AVC. The algorithm has also been compared with a

state-of-the-art algorithm called context-adaptive Lagrange multiplier. Additionally, two

illustrative examples of the subjective performance improvement are provided.
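
The cost being balanced is J = D + λmotion·R, where D is a block distortion and R the number of bits spent on the motion vector. The sketch below, assuming SAD for D and the length of the H.264 signed Exp-Golomb code se(v) of the motion vector difference for R, shows why λmotion = 0 yields the minimum-distortion vector and a very large λmotion yields the minimum-rate vector (a large finite value stands in for "arbitrarily large"). The candidate set and values are hypothetical, and the paper's rule for choosing among the three λmotion values is not modeled.

```python
def se_bits(v):
    """Length in bits of the H.264 signed Exp-Golomb code se(v)."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return 2 * ((code_num + 1).bit_length() - 1) + 1


def mv_rate(mv, pred):
    """Bits for the motion vector difference (mvd_x, mvd_y) relative to the predictor."""
    return se_bits(mv[0] - pred[0]) + se_bits(mv[1] - pred[1])


def best_motion_vector(candidates, pred, lam):
    """candidates maps mv -> SAD; returns the mv minimizing J = SAD + lam * R."""
    return min(candidates, key=lambda mv: candidates[mv] + lam * mv_rate(mv, pred))


# Hypothetical candidates: the predictor (0, 0) is cheap to code but distorted,
# (3, 2) matches better but costs 10 bits of motion vector difference.
cands = {(0, 0): 100, (3, 2): 60}
for lam in (0, 4, 6, 1e6):
    print(lam, best_motion_vector(cands, (0, 0), lam))
```

Between the two extremes, the chosen vector flips once λmotion exceeds the ratio of the distortion gap to the rate gap (here 40 / 8 = 5), which is why a poorly estimated λmotion can lead to large ME errors.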

4. Low Power Motion Estimation Based on Probabilistic Computing

As CMOS technology driven by Moore's law has approached device sizes in the range of 5-20

nm, noise immunity of such future technology nodes is predicted to decrease considerably,

eventually affecting the reliability of computations performed with them. A shift in the design paradigm

is expected from 100% accurate computations to probabilistic computing with accuracy

dependent on the target application or circuit specifications. One model developed for CMOS

technology that emulates the predicted erroneous behavior is termed probabilistic CMOS

(PCMOS). In this paper, we propose a PCMOS-based architecture implementation for

traditional motion estimation algorithms and show that up to 57% energy savings are possible for

different existing motion estimation algorithms. Furthermore, algorithmic modifications are

proposed that can enhance the energy savings to 70% with a PCMOS architectural

implementation. About 1.8-5 dB improvement in peak signal-to-noise ratio under energy savings

of 57% to 70% for two different motion estimation algorithms is shown, establishing the

resilience of the proposed algorithm to probabilistic computing over the comparable

conventional algorithm.

5. Two-layer motion estimation algorithm for video coding

A novel two-layer motion estimation algorithm, which searches motion vectors on two layers with partial

distortion measures in order to reduce the overwhelming computational complexity

of motion estimation (ME) in video coding, is proposed. A layer is an image derived

from the reference frame such that each point of the layer is the sum of a block of pixels in

the reference frame. It has been observed on different video sequences that

many motion vectors found on the layers are the same as those found by searching the reference frame.

Experimental results on a wide variety of video sequences show that the proposed algorithm

achieves both fast speed and good motion prediction quality when compared with the state-of-

the-art fast block matching algorithms.
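
A layer can be pictured as a reduced image in which each point sums a block of reference-frame pixels. The sketch below assumes non-overlapping block × block sums (the abstract does not say whether the summed blocks overlap) and pairs the layer construction with a plain full-search SAD block matcher that works on either a frame or a layer. The paper's partial distortion measures and its two-layer search order are not reproduced; the function names are hypothetical.

```python
def build_layer(frame, block):
    """Each layer point is the sum of one non-overlapping block x block region of the frame."""
    h, w = len(frame) // block, len(frame[0]) // block
    return [[sum(frame[y * block + dy][x * block + dx]
                 for dy in range(block) for dx in range(block))
             for x in range(w)]
            for y in range(h)]


def full_search_sad(cur, ref, top, left, size, radius):
    """Exhaustive SAD matching of cur's size x size block at (top, left) within +/-radius in ref.

    Returns (best SAD, (dy, dx)); candidate positions outside ref are skipped.
    """
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y0, x0 = top + dy, left + dx
            if y0 < 0 or x0 < 0 or y0 + size > len(ref) or x0 + size > len(ref[0]):
                continue
            sad = sum(abs(cur[top + i][left + j] - ref[y0 + i][x0 + j])
                      for i in range(size) for j in range(size))
            if best is None or sad < best[0]:
                best = (sad, (dy, dx))
    return best


frame = [[4 * y + x for x in range(4)] for y in range(4)]
print(build_layer(frame, 2))
```

Because a layer has 1/block² as many points as the frame, a search on the layers of the current and reference frames costs far less than the same search on the frames themselves; the vectors it finds would then be checked or refined at full resolution.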

6. H.264-based hierarchical two-layer lossless video coding method

An efficient lossless coding technique is very important for storage and transmission applications

of error-sensitive information such as medical, seismic and digital artistic data. In this study, the

authors propose an H.264/AVC (advanced video coding)-based hierarchical

lossless coding method, in which the input video is first encoded by an H.264/AVC coder with

a quantisation parameter (QP) selector in the base layer, and the coded error is encoded by a

QP-adaptive Rice coder in the enhancement layer. To reduce encoding time, the QP selector can be

simplified to select a nearly optimal QP. Simulation results show that the proposed hierarchical

lossless coding architecture achieves a better compression ratio than traditional

H.264/AVC-based lossless coding systems. Since the proposed system can provide both lossy and lossless

coding services at the same time, the proposed lossless video coding system offers

efficiency and flexibility for practical applications. Experimental results show that the proposed

coding system requires fewer coding bits and has lower coding complexity compared with

H.264-differential pulse code modulation.
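
In the enhancement layer, the residual left by the lossy base layer is entropy-coded with a Rice code. The sketch below shows a generic Golomb-Rice encoder/decoder with parameter k, plus a zigzag mapping from signed residuals to non-negative integers. Both the mapping and the bit conventions (unary quotient as ones terminated by a zero) are assumptions, and the paper's rule for adapting k to the QP is not shown.

```python
def zigzag(v):
    """Map signed residuals to non-negative integers: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * v if v >= 0 else -2 * v - 1


def unzigzag(u):
    """Inverse of zigzag()."""
    return u // 2 if u % 2 == 0 else -(u + 1) // 2


def rice_encode(u, k):
    """Golomb-Rice codeword: quotient u >> k in unary (ones, then a zero), then k remainder bits."""
    q, r = u >> k, u & ((1 << k) - 1)
    return "1" * q + "0" + (format(r, f"0{k}b") if k else "")


def rice_decode(bits, k):
    """Decode one codeword from the front of a bit string; returns (value, bits consumed)."""
    q = bits.index("0")
    r = int(bits[q + 1:q + 1 + k], 2) if k else 0
    return (q << k) | r, q + 1 + k


residuals = [0, -3, 2, 5, -1]
stream = "".join(rice_encode(zigzag(v), 1) for v in residuals)
print(stream)
```

A larger k shortens the unary part for large residuals but costs k bits on every small one, which is why tying k to the base-layer QP (coarser quantization, larger residuals) makes sense.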

7. A cache-aware motion estimation organization for a hardware-based H.264 encoder

The video resolution required for many types of video content has increased as technology has

advanced. For real-time encoding of high resolutions such as full high definition (FHD),

quad-FHD (QFHD) and beyond, various fast motion estimation (ME) algorithms have been

researched. Caches are used for many fast MEs in hardware-based encoders in order to increase

local memory utilization and thereby reduce external memory access. However, most previous

works overlook the amount of cache access from multiple MEs. In a multi-core

environment for high resolution videos, access conflicts directly affect the computation time. In

this paper, various types of caches are compared in terms of the size, hit ratio, cache port

conflicts and hardware overhead. To reduce the amount of cache access associated with the basic

shared cache, zigzag snake scan and selective data-storage schemes are proposed for integer and

fractional MEs, respectively. Additionally, the cache access arbitration hides the computation

delay which arises due to a cache port conflict in a pipeline system. The proposed schemes are

applicable to existing cache designs and achieve good scalability in a multi-core environment.

Simulation results show that, with the proposed schemes, the ME computation time is

comparable to that of the dual-port shared cache, which has the fewest port conflicts.
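
The benefit of a snake (boustrophedon) scan over search positions is that consecutive positions always differ by one pixel, so most of the reference window can be reused. The sketch below compares it with a raster scan under a deliberately crude cache model that keeps only the previous position's window. It is a generic illustration, not the paper's zigzag snake scan or its cache organization, and the function names are hypothetical.

```python
def raster_scan(rows, cols):
    """Left-to-right on every row."""
    return [(r, c) for r in range(rows) for c in range(cols)]


def snake_scan(rows, cols):
    """Boustrophedon order: left-to-right on even rows, right-to-left on odd rows."""
    order = []
    for r in range(rows):
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in cs)
    return order


def fetched_pixels(order, size):
    """Reference pixels newly loaded over a scan, assuming only the previous
    size x size window stays cached (a deliberately crude model)."""
    total, prev = 0, None
    for pos in order:
        if prev is None:
            total += size * size
        else:
            dy, dx = abs(pos[0] - prev[0]), abs(pos[1] - prev[1])
            overlap = max(0, size - dy) * max(0, size - dx)
            total += size * size - overlap
        prev = pos
    return total


print(fetched_pixels(raster_scan(3, 3), 4), fetched_pixels(snake_scan(3, 3), 4))
```

In this toy model the snake scan never jumps back to the start of a row, so every step loads only one new column or row of the window; real savings depend on the cache size and on how many ME units share it.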

8. An Overview of Information Hiding in H.264/AVC Compressed Video

Information hiding refers to the process of inserting information into a host to serve specific

purpose(s). In this paper, information hiding methods in the H.264/AVC compressed video

domain are surveyed. First, the general framework of information hiding is conceptualized by

relating the state of an entity to a meaning (i.e., sequences of bits). This concept is illustrated by

using various data representation schemes such as bit plane replacement, spread spectrum,

histogram manipulation, divisibility, mapping rules, and matrix encoding. Venues at which

information hiding takes place are then identified, including prediction process, transformation,

quantization, and entropy coding. Related information hiding methods at each venue are briefly

reviewed, along with the presentation of the targeted applications, appropriate diagrams, and

references. A timeline diagram is constructed to chronologically summarize the invention of

information hiding methods in the compressed still image and video domains since 1992. A

comparison among the considered information hiding methods is also conducted in terms of

venue, payload, bitstream size overhead, video quality, computational complexity, and video

criteria. Further perspectives and recommendations are presented to provide a better

understanding of the current trend of information hiding and to identify new opportunities for

information hiding in compressed video.
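
Of the data representation schemes listed, bit plane replacement is the simplest: each payload bit overwrites the least significant bit of a host value. The sketch below applies it to a list of integers standing in for, e.g., quantized coefficients. Real H.264 embedding must also deal with bitstream size overhead and drift, which this ignores; the function names are illustrative.

```python
def embed_lsb(values, bits):
    """Replace the least significant bit of the first len(bits) integers with the payload bits."""
    if len(bits) > len(values):
        raise ValueError("payload longer than cover")
    out = list(values)
    for i, b in enumerate(bits):
        out[i] = (out[i] & ~1) | b   # also valid for negative ints (two's complement semantics)
    return out


def extract_lsb(values, count):
    """Read back the least significant bit of the first count integers."""
    return [v & 1 for v in values[:count]]


cover = [12, -3, 7, 0, 5]
stego = embed_lsb(cover, [1, 0, 0, 1])
print(stego, extract_lsb(stego, 4))
```

Each host value changes by at most one, which bounds the distortion per carrier but says nothing about how many extra bits the entropy coder then spends.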

9. Estimation of Acoustic Reflection Coefficients Through Pseudospectrum Matching

Estimating the geometric and reflective properties of the environment is important for a wide

range of applications of space-time audio processing, from acoustic scene analysis to room

equalization and spatial audio rendering. In this manuscript, we propose a methodology for

frequency-subband in-situ estimation of the reflection coefficients of planar surfaces. This is a

rather challenging task, as the reflection coefficients depend on the frequency and the angle of

incidence and their estimate is highly sensitive to background noise and interfering sources. Our

method is based on the assumption that we know the geometry of the reflectors, the position and

the radiation pattern of the source, and the position and spatial response of the array. Applying

beamforming algorithms on a single set of measured sensor data, we estimate the angular

distribution of the acoustic energy (angular pseudospectrum) that impinges on a microphone

array. We then apply a two-step iterative estimation technique based on an Expectation-

Maximization (EM) algorithm. The first step estimates the scaling factors. The second one infers

the reflection coefficients from the scaling factors. Under the assumption of additive white

Gaussian noise, we finally determine the reflection coefficients with a Maximum Likelihood

(ML) estimation method. The effectiveness and the accuracy of the proposed technique are

assessed through experiments based on measured data.

10. Speech Processing on a Reconfigurable Analog Platform

We describe architectures for audio classification front ends on a reconfigurable analog platform.

Real-time implementation of audio processing algorithms involving discrete-time signals tends to

be power-intensive. We present an alternative continuous-time system implementation of a

noise-suppression algorithm on our reconfigurable chip, while detailing the design considerations. We

also describe a framework that enables future implementations of other

speech processing algorithms, classifier front ends, and hearing aids.

11. Nonlinear Audio Systems Identification Through Audio Input Gaussianization

Nonlinear audio system identification generally relies on Gaussianity, whiteness and stationarity

hypotheses on the input signal, although audio signals are non-Gaussian, highly correlated, and

non-stationary. However, since the physical behavior of nonlinear audio systems is input-

dependent, they should be identified using natural audio signals (speech or music) as input,

instead of artificial signals (sweeps or noise), as is usually done. We propose an identification

scheme that conditions audio signals to fit the desired properties for efficient identification.

The identification system consists of (1) a Gaussianization step that makes the signal

near-Gaussian under a perceptual constraint; (2) a predictor filterbank that whitens the signal; (3) an

orthonormalization step that enhances the statistical properties of the input vector of the last step,

under a Gaussianity hypothesis; (4) an adaptive nonlinear model. The proposed scheme enhances

the convergence rate of the identification and reduces the steady state identification error,

compared to other schemes, such as classical adaptive nonlinear identification.
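
A common way to make a signal's marginal distribution Gaussian is a rank-based transform: map each sample through the empirical CDF, then through the inverse standard normal CDF. The sketch below does exactly that with Python's `statistics.NormalDist`. It is a swapped-in generic technique without the perceptual constraint of the proposed scheme, and it does not whiten the signal, which is the predictor filterbank's job.

```python
from statistics import NormalDist


def gaussianize(samples):
    """Rank-based marginal Gaussianization.

    The sample of rank r (0-based, ascending) among n is mapped to the
    standard normal quantile of (r + 0.5) / n, so the output order matches
    the input order and the marginal histogram becomes approximately Gaussian.
    """
    n = len(samples)
    order = sorted(range(n), key=lambda i: samples[i])
    std_normal = NormalDist()
    out = [0.0] * n
    for rank, i in enumerate(order):
        out[i] = std_normal.inv_cdf((rank + 0.5) / n)
    return out


print(gaussianize([10.0, 1.0, 5.0]))
```

Because the mapping is monotonic, it preserves the temporal ordering of amplitudes while discarding the original marginal shape, which is the property an identification scheme relying on Gaussianity needs.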

12. Low Distortion Switching Amplifier With Discrete-Time Click Modulation

An all-digital Class-D amplifier based on a discrete-time implementation of the click modulator

is presented. The algorithm is able to generate binary signals with separated baseband, displacing

the harmonic content produced by the modulation process above a certain frequency chosen by the

designer. Perfect demodulation can be achieved by a simple low-pass filter. Previous

implementations of the discrete-time click modulator reported in the literature suffer from

aliasing in the frequency domain. The approach proposed here avoids aliasing, without the

need to increase (interpolate) the sampling frequency of the signals. Following a brief

theoretical introduction, the performance of the proposed architecture is demonstrated by

experimental measurements performed on an H-bridge amplifier. An 88 dB signal-to-noise ratio

(SNR) and a total harmonic distortion plus noise (THD+N) below 0.04% are attainable over the

entire audio band, from 20 Hz up to 20 kHz; moreover, no traces of intermodulation distortion (IMD)

appear above the predicted noise floor. These performance indices are obtained for switching

rates as low as 40 kHz. The reduction of the switching frequency provides more flexibility for

the design of the demodulation stage, allowing a trade-off between the complexity of the

demodulation filter and the achievable efficiency of the switching stage.

13. ASIC and FPGA Implementation of the Gaussian Mixture Model Algorithm for

Real-Time Segmentation of High Definition Video

Background identification is a common feature in many video processing systems. This paper

proposes two hardware implementations of the OpenCV version of the Gaussian mixture model

(GMM), a background identification algorithm. The implemented version of the algorithm

allows a fast initialization of the background model while an innovative, hardware-oriented,

formulation of the GMM equations makes the proposed circuits able to perform real-time

background identification on high definition (HD) video sequences with frame size 1920 × 1080.

The first of the two circuits targets commercial field-programmable gate-array (FPGA)

devices. When implemented on a Virtex6 vlx75t, the proposed circuit processes 91 HD fps

(frames per second) and uses 3% of FPGA logic resources. The second circuit is oriented to the

implementation in UMC-90 nm CMOS standard cell technology, and is proposed in two

versions. Both versions can process at a frame rate higher than 60 HD fps. The first version uses

the constant voltage scaling technique to provide a low-power implementation. It occupies a silicon

area of 28847 μm² and dissipates 15.3 pJ/pixel. The second

version is designed to reduce silicon area utilization and occupies 21847 μm² with an energy

dissipation of 49.4 pJ/pixel.
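
For orientation, the per-pixel model behind GMM background identification keeps a few weighted Gaussians per pixel, tests each new value against them, and updates weights, means and variances with a learning rate. The sketch below is a simplified grayscale version in the Stauffer-Grimson style. The thresholds, the learning-rate rule and the initial variance are illustrative assumptions, and it reproduces neither the OpenCV variant nor the paper's hardware-oriented reformulation.

```python
from dataclasses import dataclass


@dataclass
class Component:
    weight: float
    mean: float
    var: float


def update_pixel(model, x, alpha=0.01, match_sigmas=2.5, init_var=900.0, bg_ratio=0.7):
    """Classify grayscale value x against one pixel's mixture, then update the mixture.

    Returns True if x matches a background component. Note: re-sorts `model` in place.
    """
    # Rank components by weight / sigma: stable, frequently seen ones come first.
    model.sort(key=lambda g: g.weight / g.var ** 0.5, reverse=True)
    matched = next((g for g in model
                    if (x - g.mean) ** 2 < match_sigmas ** 2 * g.var), None)

    # Background = top-ranked components whose cumulative weight first exceeds bg_ratio.
    background, cumulative = [], 0.0
    for g in model:
        background.append(g)
        cumulative += g.weight
        if cumulative > bg_ratio:
            break
    is_background = matched is not None and any(g is matched for g in background)

    for g in model:
        g.weight *= 1.0 - alpha
    if matched is not None:
        matched.weight += alpha
        rho = alpha / matched.weight          # simplified learning rate (assumption)
        matched.mean += rho * (x - matched.mean)
        matched.var += rho * ((x - matched.mean) ** 2 - matched.var)
    else:
        # No match: replace the least probable component with a new, wide one.
        weakest = min(model, key=lambda g: g.weight)
        weakest.weight, weakest.mean, weakest.var = alpha, float(x), init_var
    total = sum(g.weight for g in model)
    for g in model:
        g.weight /= total
    return is_background


pixel = [Component(1.0, 100.0, 100.0), Component(0.0, 0.0, 900.0), Component(0.0, 0.0, 900.0)]
print([update_pixel(pixel, v) for v in (102.0, 200.0, 101.0)])
```

A hardware version has to evaluate this for every pixel of every 1920 × 1080 frame, which is why reformulating the divisions and square roots in a hardware-friendly way matters for real-time operation.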

14. VLSI Architecture Design of Guided Filter for 30 Frames/s Full-HD Video

Filtering is widely used in image and video processing for various applications. Recently, the

guided filter has been proposed and has become one of the popular filtering methods. In this paper, to

meet the computational demand of guided filtering in full-HD video, a double integral image

architecture for guided filter ASIC design is proposed. In addition, a reformulation of the guided

filter formula is proposed, which prevents the error resulting from truncation of the fractional

part and allows the regularization parameter ε to be modified on the user's demand. The hardware architecture of

the guided image filter is then proposed and can be embedded in mobile devices to achieve real-

time HD applications. To the best of our knowledge, this paper is also the first ASIC design for

guided image filter. With a TSMC 90-nm cell library, the design can operate at 100 MHz and

support Full-HD (1920 × 1080) at 30 frames/s with a gate count of 92.9K and 3.2 KB of on-chip

memory. Moreover, in terms of hardware efficiency, our architecture also outperforms

previous works based on the bilateral filter.
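
The guided filter's output at each pixel is q = mean(a)·I + mean(b), where a = cov(I, p)/(var(I) + ε) and b = mean(p) − a·mean(I) are computed over local windows, so every step reduces to box means, which integral images deliver at constant cost per pixel. The sketch below is a floating-point, grayscale software reference of that standard formulation with border-clipped windows (an assumption). The paper's double integral image architecture and its fixed-point reformulation are not reproduced.

```python
def integral_image(img):
    """s[y][x] = sum of img[0..y-1][0..x-1] (one row and column of zero padding)."""
    h, w = len(img), len(img[0])
    s = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            s[y + 1][x + 1] = s[y][x + 1] + row_sum
    return s


def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window clipped at the borders, O(1) per pixel."""
    h, w = len(img), len(img[0])
    s = integral_image(img)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - r), min(h - 1, y + r)
        for x in range(w):
            x0, x1 = max(0, x - r), min(w - 1, x + r)
            total = s[y1 + 1][x1 + 1] - s[y0][x1 + 1] - s[y1 + 1][x0] + s[y0][x0]
            out[y][x] = total / ((y1 - y0 + 1) * (x1 - x0 + 1))
    return out


def guided_filter(guide, src, r, eps):
    """Grayscale guided filter built entirely from box means."""
    h, w = len(guide), len(guide[0])
    mean_i, mean_p = box_mean(guide, r), box_mean(src, r)
    corr_ip = box_mean([[guide[y][x] * src[y][x] for x in range(w)] for y in range(h)], r)
    corr_ii = box_mean([[guide[y][x] ** 2 for x in range(w)] for y in range(h)], r)
    a = [[0.0] * w for _ in range(h)]
    b = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            var_i = corr_ii[y][x] - mean_i[y][x] ** 2
            cov_ip = corr_ip[y][x] - mean_i[y][x] * mean_p[y][x]
            a[y][x] = cov_ip / (var_i + eps)
            b[y][x] = mean_p[y][x] - a[y][x] * mean_i[y][x]
    mean_a, mean_b = box_mean(a, r), box_mean(b, r)
    return [[mean_a[y][x] * guide[y][x] + mean_b[y][x] for x in range(w)] for y in range(h)]


step = [[0.0, 0.0, 10.0, 10.0] for _ in range(4)]
print([round(v, 3) for v in guided_filter(step, step, 1, 0.01)[0]])
```

With the image as its own guide and a small ε, the step edge survives nearly intact, whereas a plain 3-wide box filter would blur the two middle columns to about 3.3 and 6.7.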

15. Video Colorization Using Parallel Optimization in Feature Space

We present a new scheme for video colorization using optimization in rotation-aware Gabor

feature space. Most current methods of video colorization incur temporal artifacts and

prohibitive processing costs, while this approach is designed in a spatiotemporal manner to

preserve temporal coherence. A parallel implementation on graphics hardware is also

provided to achieve real-time performance of color optimization. By adaptively

clustering video frames and extending Gabor filtering to optical flow computation, we can

achieve real-time color propagation within and between frames. Temporal coherence is further

refined through user scribbles in video frames. The experimental results demonstrate that our

proposed approach is efficient in producing high-quality colorized videos.

16. Joint Non-Gaussian Denoising and Superresolving of Raw High Frame Rate Videos

High frame rate cameras capture sharp videos of highly dynamic scenes by trading off

signal-to-noise ratio and image resolution, so combined super-resolution and denoising is crucial for

enhancing high speed videos and extending their applications. The solution is nontrivial because

two degradations co-occur during capture and the noise depends nonlinearly on

signal strength. To handle this problem, we propose conducting noise separation and

super-resolution under a unified optimization framework, which models both the spatiotemporal priors of

high quality videos and signal-dependent noise. Mathematically, we align the frames along the

temporal axis and pursue the solution under the following three criteria: 1) the sharp noise-free

image stack is low rank with some missing pixels denoting occlusions; 2) the noise follows a

given nonlinear noise model; and 3) the recovered sharp image can be reconstructed well with

sparse coefficients and an overcomplete dictionary learned from high quality natural images.

Computationally, we obtain the final result by solving a convex optimization

using modern local linearization techniques. In the experiments, we validate the proposed

approach on both synthetic and real captured data.

17. Intuitive real-time platform for audio signal processing and musical instrument

response emulation

In recent years, the DSP group at the University of Manchester has developed a range of DSP

platforms for real-time filtering and processing of acoustic signals. These include Signal Wizard

2.5, Signal Wizard 3 and Vsound. These incorporate processors operating at 100 million

multiply-accumulate operations per second (MMACS) for SW 2.5 and 600 MMACS for SW 3 and

Vsound. SW 3 features six input and eight output analogue channels, digital input/output in the

form of S/PDIF and a USB interface. For all devices, the software allows the user, with no

knowledge of filter theory or programming, to design and run standard or completely arbitrary

FIR, IIR and adaptive filters. Processing tasks are specified using the graphical icon-based

interface. In addition, the system can emulate, in real time, linear system behavior

such as sensors, instrument bodies, string vibrations, resonant spaces and electrical networks.

Tests have confirmed a high degree of fidelity between the behavior of the physical system and

its digitally emulated counterpart. In addition to the supplied software, the user may also

program the system using a variety of commercial packages via the JTAG interface.
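
As a rough illustration of what such a platform computes, the sketch below runs a direct-form FIR filter; feeding a measured impulse response in as the coefficients is also the simplest way to emulate a linear system such as an instrument body. The function name and coefficients are hypothetical, and nothing here reflects the actual Signal Wizard or Vsound software.

```python
def fir_filter(coeffs, samples):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n - k], starting from zero state."""
    delay_line = [0.0] * len(coeffs)  # most recent input sample first
    output = []
    for x in samples:
        # Shift the new sample into the delay line.
        delay_line = [x] + delay_line[:-1]
        acc = 0.0
        for h, d in zip(coeffs, delay_line):
            acc += h * d  # one multiply-accumulate per tap
        output.append(acc)
    return output


# Emulating a linear system: a unit impulse returns the (hypothetical) response itself.
impulse_response = [0.5, 0.3, 0.2]
print(fir_filter(impulse_response, [1.0, 0.0, 0.0, 0.0]))  # → [0.5, 0.3, 0.2, 0.0]
```

Each output sample costs one multiply-accumulate per tap, which is the operation counted by the MMACS figures above.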

Cryptography and Steganography

1. On the Relation of Random Grid and Deterministic Visual Cryptography

Visual cryptography is a special type of secret sharing. Two models of visual cryptography have

been independently studied: 1) deterministic visual cryptography, introduced by Naor and

Shamir, and 2) random grid visual cryptography, introduced by Kafri and Keren. In this paper,

we show that there is a strict relation between these two models. In particular, we show that

every random grid scheme corresponds to a deterministic scheme and vice versa. This allows

results known in one model to be used in the other. By exploiting the (many) results known in

the deterministic model, we are able to improve several schemes and to provide many upper

bounds for the random grid model; by exploiting some results known for the random grid

model, we are also able to provide new schemes for the deterministic model. A side effect of this

paper is that future results for either of the two models should not ignore, and should in fact be

compared with, the results known in the other model.

2. Efficient Algorithm and Architecture for Elliptic Curve Cryptography for

Extremely Constrained Secure Applications

Recently, considerable research has been performed in cryptography and security to optimize the

area, power, timing, and energy needed for point multiplication over binary

elliptic curves. In this paper, we propose an efficient implementation of point multiplication on

Koblitz curves targeting extremely-constrained, secure applications. We utilize the Gaussian

normal basis (GNB) representation of field elements over GF(2^m) and employ an efficient

bit-level GNB multiplier. One advantage of this GNB multiplier is that we are able to reduce the

hardware complexity by sharing the addition/accumulation with other field additions. Owing to a

special property of the normal basis representation, squarings are implemented very efficiently

by rewiring alone. We introduce a new technique for point addition in affine coordinates that

requires fewer registers. Based on this technique, we propose an

extremely small processor architecture for point multiplication. Through application-specific

integrated circuit (ASIC) implementations, we evaluate the area, performance, and energy

consumption of the proposed crypto-processor. At two different working frequencies, the

proposed architecture is shown to achieve better results than previous works,

making it suitable for extremely-constrained, secure environments.
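
The "squaring by rewiring" property can be seen directly from the normal-basis representation. Assuming coefficient i multiplies β^(2^i) in the basis {β, β^2, β^4, ..., β^(2^(m-1))}, squaring sends each basis element to the next one and β^(2^m) wraps back to β, so squaring is a cyclic shift of the coefficient vector. This sketch shows only that property; it does not implement the paper's GNB multiplier, and the function names are illustrative.

```python
def nb_square(coeffs):
    """Square an element of GF(2^m) given in normal-basis coordinates.

    coeffs[i] is the coefficient of beta^(2^i). In characteristic 2,
    (sum a_i beta^(2^i))^2 = sum a_i beta^(2^(i+1)), and beta^(2^m) = beta,
    so squaring is a cyclic shift: in hardware, just rewiring.
    """
    return [coeffs[-1]] + coeffs[:-1]


def nb_power_of_two(coeffs, k):
    """Compute a^(2^k) with k successive squarings (k cyclic shifts)."""
    for _ in range(k):
        coeffs = nb_square(coeffs)
    return coeffs


a = [1, 0, 1, 1, 0]  # an element of GF(2^5) in normal-basis coordinates
print(nb_square(a))  # → [0, 1, 0, 1, 1]
# Frobenius: a^(2^m) = a for every element of GF(2^m).
assert nb_power_of_two(a, len(a)) == a
```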

3. Property Analysis of XOR-Based Visual Cryptography

A (k,n) visual cryptographic scheme (VCS) encodes a secret image into n shadow images

(printed on transparencies) distributed among n participants. When any k participants

superimpose their transparencies on an overhead projector (OR operation), the secret image can

be revealed by the human visual system without computation. However, the monotone

property of the OR operation degrades the visual quality of the reconstructed image in OR-based VCS

(OVCS). Accordingly, XOR-based VCS (XVCS), which uses the XOR operation for decoding, was

proposed to enhance the contrast. In this paper, we investigate the relation between OVCS and

XVCS. Our main contribution is to prove theoretically that the basis matrices of (k,n)-OVCS can

be used in (k,n)-XVCS, in which case the contrast is enhanced 2^(k-1)

times.
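
To make the contrast claim concrete for k = n = 2, the sketch below decodes the rows of the textbook (2,2) basis matrices once with OR and once with XOR, and computes the relative difference α = (h - l)/m, where h and l are the numbers of dark subpixels in reconstructed black and white pixels and m is the pixel expansion. The choice of matrices is an assumption made for illustration; the result matches the factor 2^(k-1) = 2 stated above for this case.

```python
from fractions import Fraction
from functools import reduce

# Textbook (2,2) basis matrices: rows are the two shares, columns are subpixels.
S_WHITE = [[1, 0], [1, 0]]
S_BLACK = [[1, 0], [0, 1]]


def stack(matrix, op):
    """Combine all shares subpixel by subpixel with the decoding operation."""
    return [reduce(op, column) for column in zip(*matrix)]


def contrast(op):
    """Relative difference alpha = (h - l) / m for a decoding operation."""
    m = len(S_WHITE[0])                # pixel expansion
    h_black = sum(stack(S_BLACK, op))  # dark subpixels in a reconstructed black pixel
    h_white = sum(stack(S_WHITE, op))  # dark subpixels in a reconstructed white pixel
    return Fraction(h_black - h_white, m)


alpha_or = contrast(lambda a, b: a | b)   # OVCS: stacking transparencies
alpha_xor = contrast(lambda a, b: a ^ b)  # XVCS: XOR decoding
print(alpha_or, alpha_xor, alpha_xor / alpha_or)  # 1/2 1 2
```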

4. Multifunction Residue Architectures for Cryptography

A design methodology for incorporating the Residue Number System (RNS) and the Polynomial

Residue Number System (PRNS) into Montgomery modular multiplication in GF(p) and GF(2^n),

respectively, as well as a VLSI architecture of a dual-field residue arithmetic Montgomery

multiplier are presented in this paper. An analysis of input/output conversions to/from residue

representation, along with the proposed residue Montgomery multiplication algorithm, reveals

common multiply-accumulate data paths both between the converters and between the two

residue representations. A versatile architecture is derived that supports all operations of

Montgomery multiplication in GF(p) and GF(2^n), input/output conversions, Mixed Radix

Conversion (MRC) for integers and polynomials, dual-field modular exponentiation and

inversion in the same hardware. Detailed comparisons with state-of-the-art implementations

demonstrate the potential of exploiting residue arithmetic in dual-field modular multiplication.

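For readers new to residue arithmetic, the sketch below shows two of the ingredients named above in their simplest integer (GF(p)-side) form: carry-free multiplication channel by channel, and Mixed Radix Conversion (MRC) back to an ordinary integer. The moduli are arbitrary small pairwise-coprime values chosen for illustration; the residue Montgomery algorithm and the polynomial (PRNS) side are not shown.

```python
MODULI = (7, 11, 13, 15)  # pairwise coprime; dynamic range M = 15015


def to_rns(x):
    return tuple(x % m for m in MODULI)


def rns_mul(a, b):
    """Multiply channel by channel; no carries propagate between moduli."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def mrc_to_int(residues):
    """Mixed Radix Conversion: x = v0 + v1*m0 + v2*m0*m1 + ..."""
    digits = []
    for i, (r, m) in enumerate(zip(residues, MODULI)):
        t = r
        for j in range(i):
            # Peel off the already-known mixed-radix digits, modulo the current modulus.
            t = ((t - digits[j]) * pow(MODULI[j], -1, m)) % m
        digits.append(t)
    x, weight = 0, 1
    for d, m in zip(digits, MODULI):
        x += d * weight
        weight *= m
    return x


product = rns_mul(to_rns(97), to_rns(123))
print(mrc_to_int(product))  # → 11931, valid because 97 * 123 < 15015
```
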
5. Error Detection and Recovery for ECC: A New Approach Against Side-Channel

Attacks

Side-channel attacks allow an attacker to retrieve secret keys with far less effort than other

attacks. Countermeasures against these attacks should be considered during cryptosystem design.

This paper presents a novel low-cost error detection and recovery scheme (LOEDAR) to counter

fault attacks. The proposed architecture retains the efficiency of the Montgomery ladder

algorithm and shows strong resistance to both environmentally induced faults and

attacker-introduced faults. Moreover, the proposed LOEDAR scheme is compatible with most existing

countermeasures against various power analysis attacks including differential power analysis and

its variants, which makes it extendable to a comprehensive countermeasure against both fault

attacks and power analysis attacks.
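
LOEDAR builds on the Montgomery ladder, whose two registers always satisfy R1 - R0 = P. The toy below illustrates only the general idea of checking such an invariant to catch a fault, using modular exponentiation (where the invariant becomes R1 = R0 · g mod p) as a stand-in for elliptic-curve point multiplication. It is not the LOEDAR scheme itself, and the `fault_at` hook is a hypothetical test device.

```python
def ladder_pow(g, k, p, fault_at=None):
    """Montgomery-ladder exponentiation g^k mod p with an invariant check.

    After every step r1 == r0 * g (mod p), the multiplicative analogue of
    R1 - R0 = P on an elliptic curve. `fault_at` (hypothetical test hook)
    flips the lowest bit of r0 at that step to mimic an injected fault.
    Detection relies on g % p != 0.
    """
    r0, r1 = 1, g % p
    for step, bit in enumerate(bin(k)[2:]):
        if bit == "1":
            r0, r1 = (r0 * r1) % p, (r1 * r1) % p
        else:
            r0, r1 = (r0 * r0) % p, (r0 * r1) % p
        if step == fault_at:
            r0 ^= 1  # simulated single-bit fault
        if r1 != (r0 * g) % p:
            raise RuntimeError("fault detected: ladder invariant violated")
    return r0


p, g, k = 1_000_003, 5, 123_456
assert ladder_pow(g, k, p) == pow(g, k, p)
try:
    ladder_pow(g, k, p, fault_at=3)
except RuntimeError as err:
    print(err)  # fault detected: ladder invariant violated
```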

6. Effectiveness of Leakage Power Analysis Attacks on DPA-Resistant Logic Styles

Under Process Variations

This paper extends the analysis of the effectiveness of Leakage Power Analysis (LPA) attacks to

cryptographic VLSI circuits on which circuit level countermeasures against Differential Power

Analysis (DPA) are adopted. Security metrics used for assessing the DPA resistance of crypto

core implementations, such as the minimum number of measurements to disclosure (MTD) and the asymptotic

correlation coefficient, have been extended to the case of LPA. The LPA resistance has been

evaluated in terms of MTD as a function of the on-chip noise. Noise variances up to 10000 times

greater than the signal variance have been taken into account, and LPA attacks have been

successfully executed for all the logic styles under analysis using fewer than 100000

measurements. Moreover, the role of process variations has been investigated through extensive

Monte Carlo simulations in order to evaluate their impact on the leakage model for the logic

styles under analysis. Results show that LPA attacks can be successfully carried out on the

different anti-DPA logic styles even in the presence of process variations. To the best of our

knowledge, this work proves for the first time the effectiveness of LPA attacks in a real scenario

where on-chip noise and process variations are taken into account.

7. New and Improved Methods to Analyze and Compute Double-Scalar

Multiplications

We address several algorithms to perform a double-scalar multiplication on an elliptic curve. All

the methods investigated are related to the double-base number system (DBNS) and extend

previous work of Doche et al. [25]. We refine and rigorously prove the complexity analysis of

the joint binary-ternary (JBT) algorithm. Experiments are in line with the theory and show that

the JBT requires approximately 6 percent fewer field multiplications than the standard joint sparse

form (JSF) method to compute [n]P + [m]Q. We also introduce a randomized version of the JBT,

called JBT-Rand, that gives total control of the number of triplings in the expansion that is

produced. JBT-Rand therefore makes it possible to adapt and tune the number of triplings to

the coordinate system and bit length that are used, to further decrease the cost of a double-scalar

multiplication. Then, we focus on Koblitz curves. For extension degrees enjoying an optimal

normal basis of type II, we discuss a Joint τ-DBNS approach that reduces the number of field

multiplications by at least 35 percent over the traditional τ-JSF. For other extension degrees

represented in polynomial basis, the Joint τ-DBNS is still relevant provided that appropriate

basis conversion methods are used. In this situation, tests show a speedup over the τ-JSF

of more than 20 percent. Finally, when the use of the τ-DBNS becomes unrealistic, for

instance because of the lack of an efficient normal basis or the lack of memory to allow an

efficient conversion, we adapt the joint binary-ternary algorithm to Koblitz curves giving rise to

the Joint τ-τ method whose complexity is analyzed and proved. The Joint τ-τ induces a speedup

of about 10 percent over the τ-JSF.
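
As a reference point for the JSF baseline above, the sketch below computes the joint sparse form of a pair of scalars using Solinas' well-known algorithm and checks that the signed digits reconstruct n and m. The number of nonzero columns (the joint Hamming weight) drives the number of point additions when [n]P + [m]Q is evaluated with Shamir's trick. The JBT, JBT-Rand and τ-adic methods of the paper are not shown.

```python
def joint_sparse_form(k0, k1):
    """Joint sparse form of nonnegative k0, k1 (digits in {-1, 0, 1}, LSB first)."""
    d0 = d1 = 0
    u0s, u1s = [], []
    while k0 + d0 > 0 or k1 + d1 > 0:
        l0, l1 = d0 + k0, d1 + k1
        digits = []
        for li, lj in ((l0, l1), (l1, l0)):
            if li % 2 == 0:
                u = 0
            else:
                u = 2 - (li % 4)  # li "mods" 4: +1 or -1
                if li % 8 in (3, 5) and lj % 4 == 2:
                    u = -u
            digits.append(u)
        u0, u1 = digits
        # Update the carries, then move to the next bit position.
        if 2 * d0 == 1 + u0:
            d0 = 1 - d0
        if 2 * d1 == 1 + u1:
            d1 = 1 - d1
        k0, k1 = k0 // 2, k1 // 2
        u0s.append(u0)
        u1s.append(u1)
    return u0s, u1s


def value(digits):
    """Recombine signed binary digits (least significant first)."""
    return sum(d << i for i, d in enumerate(digits))


u, v = joint_sparse_form(53, 102)
assert value(u) == 53 and value(v) == 102
joint_weight = sum(1 for a, b in zip(u, v) if a or b)
print(joint_weight)  # 5
```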

8. A Hybrid Scheme for Authenticating Scalable Video Codestreams

A scalable video coding (SVC) codestream consists of one base layer and possibly several

enhancement layers. The base layer, which contains the lowest quality and resolution images, is

the foundation of the SVC codestream and must be delivered to recipients, whereas enhancement

layers contain richer contour and texture detail to supplement the base layer in

resolution, quality, and temporal scalability. This paper presents a novel hybrid authentication

(HAU) scheme. The HAU employs both cryptographic authentication and content-based

authentication techniques to ensure integrity and authenticity of the SVC codestreams. Our

analysis and experimental results indicate that the HAU is able to detect malicious manipulations

and locate the tampered image regions while remaining robust to content-preserving manipulations for

enhancement layers. Although our focus in this paper is on authenticating H.264/SVC

codestreams, the proposed technique is also applicable to authenticate other scalable multimedia

contents such as MPEG-4 fine grain scalability and JPEG2000 codestreams.

9. Authenticated Encryption: Toward Next-Generation Algorithms

Wondering whether researchers have a cryptographic tool able to provide both confidentiality

(privacy) and integrity (authenticity) of a message? They do: authenticated encryption (AE), a

symmetric-key mechanism that transforms a message into a ciphertext. This article discusses

standard AE algorithms, the shortcomings of classic security models for AE algorithms, and related

attacks. Motivated by these attacks, the crypto community started CAESAR (Competition for

Authenticated Encryption: Security, Applicability, and Robustness) to promote the development

of next-generation AE algorithms.
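
To show what an AE interface looks like (encrypt and authenticate in one call, reject any modified ciphertext), the sketch below composes a keystream cipher with HMAC-SHA-256 in encrypt-then-MAC order. The keystream, built by hashing key, nonce and a counter with SHA-256, is an assumption made only to stay within Python's standard library. It is a teaching toy, not one of the standard or CAESAR AE algorithms the article discusses.

```python
import hashlib
import hmac
import os


def _keystream(key, nonce, length):
    """Toy keystream: SHA-256(key || nonce || counter) blocks. Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def _mac_input(nonce, associated_data, ct):
    # Length-prefix the associated data so the AD/ciphertext boundary is unambiguous.
    return nonce + len(associated_data).to_bytes(8, "big") + associated_data + ct


def ae_encrypt(enc_key, mac_key, plaintext, associated_data=b""):
    nonce = os.urandom(16)
    stream = _keystream(enc_key, nonce, len(plaintext))
    ct = bytes(p ^ s for p, s in zip(plaintext, stream))
    # Encrypt-then-MAC: the tag covers nonce, associated data and ciphertext.
    tag = hmac.new(mac_key, _mac_input(nonce, associated_data, ct), hashlib.sha256).digest()
    return nonce, ct, tag


def ae_decrypt(enc_key, mac_key, nonce, ct, tag, associated_data=b""):
    expected = hmac.new(mac_key, _mac_input(nonce, associated_data, ct), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("authentication failed")  # reject before decrypting anything
    stream = _keystream(enc_key, nonce, len(ct))
    return bytes(c ^ s for c, s in zip(ct, stream))


ek, mk = os.urandom(32), os.urandom(32)
nonce, ct, tag = ae_encrypt(ek, mk, b"attack at dawn", b"header")
assert ae_decrypt(ek, mk, nonce, ct, tag, b"header") == b"attack at dawn"
```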

10. ε-MACs: Toward More Secure and More Efficient Constructions of Secure

Channels

In cryptography, secure channels enable the confidential and authenticated message exchange

between authorized users. A generic approach to constructing such channels is to combine an

encryption primitive with an authentication primitive (MAC). In this work, we introduce the

design of a new cryptographic primitive to be used in the construction of secure channels.

Instead of using general purpose MACs, we propose the deployment of special purpose MACs,

named ε-MACs. The main motivation behind this work is the observation that, since the message

must be both encrypted and authenticated, there might be some redundancy in the computations

performed by the two primitives. Therefore, removing such redundancy can improve the

efficiency of the overall composition. Moreover, computations performed by the encryption

algorithm can be further utilized to improve the security of the authentication algorithm. In

particular, we show how ε-MACs can be designed to reduce the amount of computation

required by standard MACs based on universal hash functions, and show how ε-MACs can be

secured against key-recovery attacks.
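
The "standard MACs based on universal hash functions" mentioned above typically follow the Carter-Wegman pattern: evaluate a polynomial hash of the message blocks at a secret point r, then mask the result with a one-time value s. The sketch below shows that baseline with the prime 2^130 - 5 and the block encoding used by Poly1305, but it skips Poly1305's clamping of r, so it should not be treated as standard-conformant. The ε-MAC construction itself is not shown.

```python
import os

P = (1 << 130) - 5  # the prime used by Poly1305


def poly_hash(r, message, block_size=16):
    """Horner evaluation of sum_i c_i * r^(n-i+1) mod P over 16-byte blocks."""
    acc = 0
    for i in range(0, len(message), block_size):
        block = message[i:i + block_size]
        # Appending 0x01 keeps blocks that differ only in trailing zero bytes distinct.
        c = int.from_bytes(block + b"\x01", "little")
        acc = ((acc + c) * r) % P
    return acc


def cw_mac(r, s, message):
    """Carter-Wegman tag: universal hash masked by a one-time value s (never reuse s)."""
    return (poly_hash(r, message) + s) % (1 << 128)


r = int.from_bytes(os.urandom(16), "little")  # NOTE: Poly1305 would clamp r here
s = int.from_bytes(os.urandom(16), "little")
tag = cw_mac(r, s, b"an encrypted channel message")
```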

11. Robust lightweight fingerprint encryption using random block feedback

Fingerprint encryption in embedded environments should be both lightweight and

secure. Normally, the encryption scheme divides the 8-bit pixel images into bit planes and

then performs full encryption for one bit plane, e.g., the least significant bit plane, and simple

operations for the remaining bit planes. Such a scheme performs better than full 8-bit encryption,

but its security is reduced because only one bit plane is fully encrypted. A new

fingerprint encryption scheme is proposed that provides better security while

maintaining the overall performance. The proposed scheme uses bit-plane encryption and

random block feedback. The encryption schemes are implemented and tested with 320 sample

fingerprint images. The results show that the scheme compares favorably with the

existing bit plane encryption and even with the naive full encryption.
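
The bit-plane starting point can be sketched as follows: split 8-bit pixels into planes, XOR one plane with a keystream, and reassemble. Encrypting only the LSB plane, as below, mirrors the conventional scheme described above and shows why it is weak (each pixel changes by at most 1). The random block feedback step of the proposed scheme is not reproduced, and `random.Random` stands in for a real cipher purely for illustration.

```python
import random


def to_bit_planes(pixels):
    """planes[b][i] is bit b (0 = least significant) of pixel i."""
    return [[(p >> b) & 1 for p in pixels] for b in range(8)]


def from_bit_planes(planes):
    return [sum(planes[b][i] << b for b in range(8)) for i in range(len(planes[0]))]


def xor_plane(plane, seed):
    """XOR one plane with a seeded pseudo-random bit stream (NOT cryptographically secure)."""
    rng = random.Random(seed)
    return [bit ^ rng.getrandbits(1) for bit in plane]


pixels = [12, 200, 37, 255, 0, 128]
planes = to_bit_planes(pixels)
planes[0] = xor_plane(planes[0], seed=2024)  # "encrypt" the LSB plane only
cipher_pixels = from_bit_planes(planes)
planes[0] = xor_plane(planes[0], seed=2024)  # same stream again decrypts
assert from_bit_planes(planes) == pixels
```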

12. Optimising the SHA-512 cryptographic hash function on FPGAs

In this study, novel pipelined architectures, optimised in terms of throughput and throughput/area

factors, for the SHA-512 cryptographic hash function, are proposed. To achieve this,

algorithmic- and circuit-level optimisation techniques such as loop unrolling, re-timing, temporal

pre-computation, resource re-ordering and pipelining are applied. All the techniques except

pipelining are applied in the function's transformation round. Pipelining was applied through the

development of alternative pipelined architectures, which were implemented in several Xilinx

FPGA families and evaluated in terms of frequency, area, throughput and

throughput/area factors. Compared to the initial un-optimised implementation of the SHA-512

function, the introduced five-stage pipelined architecture improves the throughput and

throughput/area factors by 123% and 61.5%, respectively. Furthermore, the proposed five-stage

pipelined architecture outperforms the existing ones both in throughput (3.4× up to 16.9×) and

throughput/area (19.5% up to 6.9×) factors.

13. Constructions of Resilient S-Boxes With Strictly Almost Optimal Nonlinearity

Through Disjoint Linear Codes

In this paper, a novel approach to finding disjoint linear codes is presented. The cardinality of the

resulting set of [u, m, t+1] disjoint linear codes largely exceeds that obtained by all previously best known methods used

for the same purpose. Using such sets of disjoint linear codes, not necessarily of the same length,

we have been able to provide a construction technique of t-resilient S-boxes F: F_2^n → F_2^m (n even)

with strictly almost optimal nonlinearity, i.e., nonlinearity greater than 2^(n-1) - 2^(n/2). This is the first time that the bound

2^(n-1) - 2^(n/2) has been

exceeded by multiple-output resilient functions. Actually, the nonlinearity of our functions is in

many cases equal to the best known nonlinearity of balanced Boolean functions. A large class of

previously unknown cryptographic resilient S-boxes is obtained, and several improvements of

the original approach are proposed. Some other relevant cryptographic properties are also briefly

discussed. It is shown that these functions may reach Siegenthaler's bound on the algebraic degree, n-t-1, and can be

either of optimal algebraic immunity or of slightly suboptimal algebraic immunity, which was

confirmed by simulations.
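
Both the nonlinearity and the resiliency figures above are read off the Walsh-Hadamard spectrum W_f(ω) = Σ_x (-1)^(f(x) ⊕ ω·x): the nonlinearity is 2^(n-1) - max|W_f|/2, and f is t-resilient exactly when W_f(ω) = 0 for every ω of Hamming weight at most t (the Xiao-Massey characterization). The sketch computes both for a single-output function given as a truth table; for an S-box the same measures are taken over all nonzero linear combinations of its output bits. It does not reproduce the disjoint-code construction.

```python
def walsh_spectrum(truth_table):
    """Fast Walsh-Hadamard transform of (-1)^f over all 2^n inputs."""
    w = [1 - 2 * bit for bit in truth_table]  # 0 -> +1, 1 -> -1
    h = 1
    while h < len(w):
        for i in range(0, len(w), 2 * h):
            for j in range(i, i + h):
                w[j], w[j + h] = w[j] + w[j + h], w[j] - w[j + h]
        h *= 2
    return w


def nonlinearity(truth_table):
    return len(truth_table) // 2 - max(abs(v) for v in walsh_spectrum(truth_table)) // 2


def resiliency_order(truth_table):
    """Largest t with W(omega) = 0 for all wt(omega) <= t; -1 if f is unbalanced."""
    spectrum = walsh_spectrum(truth_table)
    n = len(truth_table).bit_length() - 1
    t = -1
    for weight in range(n + 1):
        if any(spectrum[w] != 0 for w in range(len(spectrum)) if bin(w).count("1") == weight):
            break
        t = weight
    return t


bent = [((x & 1) & (x >> 1 & 1)) ^ ((x >> 2 & 1) & (x >> 3 & 1)) for x in range(16)]
print(nonlinearity(bent))  # 6 = 2^(n-1) - 2^(n/2 - 1) for n = 4
parity3 = [bin(x).count("1") % 2 for x in range(8)]
print(resiliency_order(parity3), nonlinearity(parity3))  # 2 0
```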

14. Data Hiding in Encrypted H.264/AVC Video Streams by Codeword Substitution

Digital video sometimes needs to be stored and processed in an encrypted format to maintain

security and privacy. For the purpose of content annotation and/or tampering detection, it is

necessary to perform data hiding in these encrypted videos. Data hiding directly in the encrypted

domain, without decryption, preserves the confidentiality of the content. In addition, it is more

efficient than decrypting, hiding the data, and re-encrypting. In this paper, a novel

scheme of data hiding directly in the encrypted version of the H.264/AVC video stream is proposed,

which consists of three parts: H.264/AVC video encryption, data embedding, and

data extraction. By analyzing the properties of the H.264/AVC codec, the codewords of

intraprediction modes, the codewords of motion vector differences, and the codewords of

residual coefficients are encrypted with stream ciphers. Then, a data hider may embed additional

data in the encrypted domain by using codeword substitution technique, without knowing the

original video content. In order to adapt to different application scenarios, data extraction can be

done either in the encrypted domain or in the decrypted domain. Furthermore, video file size is

strictly preserved even after encryption and data embedding. Experimental results have

demonstrated the feasibility and efficiency of the proposed scheme.
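
Why the file size can stay strictly unchanged is easiest to see on Exp-Golomb codewords, which H.264/AVC uses for motion vector differences, among other syntax elements, when CAVLC entropy coding is used: a codeword's length depends only on its prefix, so XOR-encrypting or substituting the info bits after the prefix never changes the bitstream length. The sketch below applies this to unsigned ue(v) codewords with a keystream XOR for encryption and last-info-bit substitution for embedding. The keystream, the choice of data-carrying bit and all names are illustrative assumptions, not the paper's exact mapping for IPM, MVD and residual codewords.

```python
def ue_encode(value):
    """Unsigned Exp-Golomb ue(v): M zeros followed by the (M+1)-bit binary of value+1."""
    code = bin(value + 1)[2:]
    return "0" * (len(code) - 1) + code


def ue_decode(codeword):
    m = codeword.index("1")
    return int(codeword[m:], 2) - 1


def split(codeword):
    """Return (prefix up to and including the first '1', info bits)."""
    m = codeword.index("1")
    return codeword[: m + 1], codeword[m + 1 :]


def xor_info(codeword, key_bits):
    """Encrypt or decrypt by XOR-ing only the info bits, so the length never changes."""
    prefix, info = split(codeword)
    if len(key_bits) < len(info):
        raise ValueError("need one key bit per info bit")
    return prefix + "".join(str(int(b) ^ k) for b, k in zip(info, key_bits))


def embed_bit(codeword, bit):
    """Codeword substitution: set the last info bit to the secret bit (same length)."""
    prefix, info = split(codeword)
    if not info:
        return codeword  # "1" (value 0) has no info bits and carries no data here
    return prefix + info[:-1] + str(bit)


def extract_bit(codeword):
    """Read the embedded bit (meaningful only for codewords longer than one bit)."""
    return int(codeword[-1])


cw = ue_encode(5)           # '00110'
enc = xor_info(cw, [1, 0])  # hypothetical keystream bits -> '00100'
marked = embed_bit(enc, 1)  # '00101'
assert len(marked) == len(cw) and extract_bit(marked) == 1
```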

15. A Novel Joint Data-Hiding and Compression Scheme Based on SMVQ and Image

Inpainting

In this paper, we propose a novel joint data-hiding and compression scheme for digital images

using side match vector quantization (SMVQ) and image inpainting. The two functions

of data hiding and image compression can be integrated seamlessly into a single module. On

the sender side, except for the blocks in the leftmost column and topmost row of the image, each of the

remaining blocks in raster-scanning order can be embedded with secret data and compressed

simultaneously by SMVQ or image inpainting, chosen adaptively according to the current embedding bit.

Vector quantization is also utilized for some complex blocks to control the visual distortion and

error diffusion caused by the progressive compression. After segmenting the compressed

codes into a series of sections using the indicator bits, the receiver can extract the

secret bits and decompress the image according to the index values in the segmented

sections. Experimental results demonstrate the effectiveness of the proposed scheme.

16. A New Secure Image Transmission Technique via Secret-Fragment-Visible Mosaic

Images by Nearly Reversible Color Transformations

A new secure image transmission technique is proposed, which automatically transforms a given

large-volume secret image into a so-called secret-fragment-visible mosaic image of the same

size. The mosaic image, which looks similar to an arbitrarily selected target image and may be

used as a camouflage of the secret image, is yielded by dividing the secret image into fragments

and transforming their color characteristics to be those of the corresponding blocks of the target

image. Techniques are designed to conduct the color transformation process so that the

secret image can be recovered nearly losslessly. A scheme for handling the

overflows/underflows in the converted pixels' color values by recording the color differences in

the untransformed color space is also proposed. The information required for recovering the

secret image is embedded into the created mosaic image by a lossless data hiding scheme using a

key. Experimental results show the feasibility of the proposed method.

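One plausible form of the color transformation described above, assumed here rather than taken from the paper, is a per-channel mean and standard-deviation match: each secret-tile value c becomes c' = (σ_t/σ_s)(c - μ_s) + μ_t, and the inverse c = (σ_s/σ_t)(c' - μ_t) + μ_s recovers it as long as μ_s and σ_s are recorded. The sketch works on one channel of one tile in floating point; rounding to integers and clipping to [0, 255], which is where the overflow/underflow handling comes in and why recovery is only nearly lossless, are deliberately left out.

```python
from statistics import mean, pstdev


def transform(tile, target_mean, target_std):
    """Give one channel of a secret tile the target block's mean and std.

    Returns the converted values and the (mean, std) needed for recovery.
    """
    if target_std <= 0:
        raise ValueError("target_std must be positive for the mapping to be invertible")
    mu, sigma = mean(tile), pstdev(tile)
    scale = target_std / sigma if sigma else 1.0  # a flat tile is only shifted
    return [scale * (c - mu) + target_mean for c in tile], (mu, sigma)


def recover(converted, params, target_mean, target_std):
    """Invert the mapping with the recorded source mean and std."""
    mu, sigma = params
    scale = sigma / target_std if sigma else 1.0
    return [scale * (c - target_mean) + mu for c in converted]


tile = [10, 20, 30, 40]
converted, params = transform(tile, target_mean=100.0, target_std=5.0)
restored = recover(converted, params, 100.0, 5.0)
assert all(abs(a - b) < 1e-9 for a, b in zip(restored, tile))
```
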
17. An Overview of Information Hiding in H.264/AVC Compressed Video

Information hiding refers to the process of inserting information into a host to serve specific

purpose(s). In this paper, information hiding methods in the H.264/AVC compressed video

domain are surveyed. First, the general framework of information hiding is conceptualized by

relating the state of an entity to a meaning (i.e., sequences of bits). This concept is illustrated by

using various data representation schemes such as bit plane replacement, spread spectrum,

histogram manipulation, divisibility, mapping rules, and matrix encoding. Venues at which

information hiding takes place are then identified, including prediction process, transformation,

quantization, and entropy coding. Related information hiding methods at each venue are briefly

reviewed, along with the presentation of the targeted applications, appropriate diagrams, and

references. A timeline diagram is constructed to chronologically summarize the invention of

information hiding methods in the compressed still image and video domains since 1992. A

comparison among the considered information hiding methods is also conducted in terms of

venue, payload, bitstream size overhead, video quality, computational complexity, and video

criteria. Further perspectives and recommendations are presented to provide a better

understanding of the current trend of information hiding and to identify new opportunities for

information hiding in compressed video.
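
Matrix encoding, one of the data representation schemes listed above, can be sketched with the Hamming-code construction popularized by the F5 steganographic algorithm: k message bits are hidden in the LSBs of 2^k - 1 cover elements (for example, quantized DCT coefficients) while changing at most one of them. The function names are illustrative.

```python
def syndrome(lsbs):
    """XOR of the 1-based positions whose LSB is 1 (a k-bit value)."""
    s = 0
    for position, bit in enumerate(lsbs, start=1):
        if bit:
            s ^= position
    return s


def embed(lsbs, message):
    """Change at most one LSB so that syndrome(result) == message."""
    flip = syndrome(lsbs) ^ message  # flipping position p XORs p into the syndrome
    out = list(lsbs)
    if flip:
        out[flip - 1] ^= 1
    return out


cover = [1, 0, 1]              # LSBs of three cover coefficients
stego = embed(cover, 0b01)     # hide the two bits "01"
print(stego, syndrome(stego))  # [1, 0, 0] 1
```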

18. Optimal Transport for Secure Spread-Spectrum Watermarking of Still Images

This paper studies the impact of secure watermark embedding in digital images by proposing a

practical implementation of secure spread-spectrum watermarking using distortion optimization.

Because strong security properties (key-security and subspace-security) can be achieved using

natural watermarking (NW), since this particular embedding leaves the distributions of the host and

watermarked signals unchanged, we use elements of transportation theory to minimize the global

distortion. Next, we apply this new modulation, called transportation NW (TNW), to design a

secure watermarking scheme for grayscale images. The TNW uses a multiresolution image

decomposition combined with a multiplicative embedding which is taken into account at the

distribution level. We show that the distortion solely relies on the variance of the wavelet

subbands used during the embedding. In order to maximize a target robustness after JPEG

compression, we select different combinations of subbands offering the lowest Bit Error Rates

for a target PSNR ranging from 35 to 55 dB and we propose an algorithm to select them. The use

of transportation theory also provides an average PSNR gain of 3.6 dB with respect to

the previous embedding for a set of 2000 images.
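
As a baseline for the modulation discussed above, the sketch below shows classic additive spread-spectrum embedding: one bit b ∈ {-1, +1} is spread over a key-dependent ±1 carrier w as y = x + α·b·w and detected from the sign of the correlation between y and w. This is plain spread spectrum, not the natural or transportation NW the paper proposes; the keys, the strength α and the uniform host model are illustrative assumptions.

```python
import random


def carrier(key, n):
    """Key-dependent pseudo-random +/-1 spreading sequence."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(n)]


def ss_embed(host, bit, key, alpha):
    """Additive spread spectrum: y = x + alpha * b * w, with b in {-1, +1}."""
    w = carrier(key, len(host))
    return [x + alpha * bit * wi for x, wi in zip(host, w)]


def ss_detect(signal, key):
    """Blind detection from the sign of the correlation <y, w>."""
    w = carrier(key, len(signal))
    corr = sum(s * wi for s, wi in zip(signal, w))
    return 1 if corr >= 0 else -1


rng = random.Random(7)
host = [rng.uniform(-1.0, 1.0) for _ in range(1000)]  # stand-in for wavelet coefficients
marked = ss_embed(host, -1, key=42, alpha=1.0)
assert ss_detect(marked, key=42) == -1
```

Here α is deliberately as large as the host amplitude so that detection is guaranteed; practical schemes use a much smaller α, which is exactly where the distortion/robustness trade-off studied in the paper arises.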

19. A Phase-Based Audio Watermarking System Robust to Acoustic Path Propagation

Today, comparing audio watermarking systems remains a challenge due to the lack of publicly

available reference algorithms. In addition, robustness against acoustic path transmission is only

occasionally evaluated. This jeopardizes the chances of digital watermarking being adopted in the

context of applications where such a feature is vital, e.g., second screen, audience measurement,

and so on. In this paper, we introduce a rather simple audio watermarking algorithm, whose

source code has been made publicly available for potential reuse by the watermarking community. We then

complement this baseline system with three additional components, namely a psychoacoustic

model, a resynchronization framework, and an improved correlation-based detector. Reported

experimental results clearly demonstrate that the resulting high-fidelity

audio watermarking system manages to survive the acoustic path.

20. Tolerance Evaluation for Defocused Images to Optical Watermarking Technique

In this paper, we describe a new approach to evaluating the robustness of the

optical watermarking technique, a unique technology that can add watermarked

information to object image data taken with digital cameras without any specific extra hardware

architecture. However, since this technology irradiates object images with light carrying embedded watermarked information,

the conditions under which a picture is taken with a digital camera may

affect the accuracy with which embedded watermarked data can be detected. Images taken with

digital cameras are usually defocused to some degree under non-optimal conditions. We evaluated

how defocusing in images affects the accuracy with which optical watermarking can be

detected. Defocusing in images can be expressed as convolution with a line-spread function

(LSF). We used the full width at half maximum (FWHM) of a Gaussian function, which approximates

the LSF, as the measure of how strongly images were defocused. We carried out

experiments in which detection accuracy was evaluated as we varied the degree to which

images were defocused. The results revealed that

optical watermarking technology was extremely robust against defocusing in images.

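The FWHM used above as the defocus measure relates to the Gaussian's standard deviation by FWHM = 2·sqrt(2 ln 2)·σ ≈ 2.355σ. The sketch below converts an FWHM in pixels to σ, builds a normalized, truncated Gaussian line-spread function, and blurs a 1-D intensity profile with it. The truncation radius, the edge-replication policy and the test profile are illustrative assumptions, not the authors' experimental setup.

```python
import math


def fwhm_to_sigma(fwhm):
    """FWHM = 2 * sqrt(2 ln 2) * sigma for a Gaussian."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_lsf(fwhm, radius=None):
    """Sampled, normalized Gaussian line-spread function (truncated at about 3 sigma)."""
    sigma = fwhm_to_sigma(fwhm)
    if radius is None:
        radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]  # unit sum keeps flat regions unchanged


def defocus(profile, fwhm):
    """Blur a 1-D intensity profile with the LSF, replicating edge samples."""
    kernel = gaussian_lsf(fwhm)
    r = len(kernel) // 2
    n = len(profile)
    return [sum(k * profile[min(max(i + j - r, 0), n - 1)] for j, k in enumerate(kernel))
            for i in range(n)]


edge = [0.0] * 8 + [255.0] * 8
blurred = defocus(edge, fwhm=3.0)  # a larger FWHM gives a wider transition at the edge
```
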
21. Adaptive Watermarking and Tree Structure Based Image Quality Estimation

Image quality evaluation is very important. In applications involving signal transmission, the

Reduced- or No-Reference quality metrics are generally more practical than the Full-Reference

metrics. In this study, we propose a quality estimation method based on a novel semi-fragile and

adaptive watermarking scheme. The proposed scheme uses the embedded watermark to estimate

the degradation of the cover image under different distortions. The watermarking process is

implemented in the DWT domain of the cover image. The correlated DWT coefficients across the

DWT subbands are organized into Set Partitioning in Hierarchical Trees (SPIHT) trees. Those

SPIHT trees are further decomposed into a set of bitplanes. The watermark is embedded into the

selected bitplanes of the selected DWT coefficients of the selected tree without causing

significant fidelity loss to the cover image. The accuracy of the quality estimation is made to

approach that of Full-Reference metrics by referring to an "Ideal Mapping Curve" computed a

priori. The experimental results show that the proposed scheme can estimate image quality in

terms of PSNR, wPSNR, JND and SSIM with high accuracy under JPEG compression,

JPEG2000 compression, Gaussian low-pass filtering and Gaussian noise distortion. The results

also show that the proposed scheme has good computational efficiency for practical applications.

22. A Fragile Watermarking Algorithm for Hologram Authentication

A fragile watermarking algorithm for hologram authentication is presented in this paper. In the

proposed algorithm, the watermark is embedded in the discrete cosine transform (DCT) domain

of a hologram. The watermarked hologram is stored in spatial domain with finite precision level.

By enhancing the precision for storing the watermarked hologram pixels, the distortion produced

by the proposed watermarking scheme can be lowered. While providing high perceptual

transparency, the proposed algorithm also achieves high detection performance for delivery errors

and malicious tampering. Experimental results reveal that the proposed algorithm can be used as

an effective filter for blocking polluted or tampered holograms from 3D magnitude and/or phase

reconstruction.

Wireless Communication & 4G Technology

1. Time Domain Channel Estimation for OQAM-OFDM Systems:

Algorithms and Performance Bounds

In this paper, we first present a general time domain model for

the channel estimation in the orthogonal frequency division multiplexing system

with offset quadrature amplitude modulation (OQAM-OFDM), and utilize the

frequency domain pilots to estimate the time domain channel impulse responses.

Unlike conventional methods, the proposed scheme places no specific requirement on the

length of the symbol interval relative to the maximum channel delay spread.

Furthermore, with the proposed time domain model,

channel statistics can be utilized to improve the performance of

the channel estimation. Then, we propose two channel estimation schemes, i.e.,

linear minimum mean square error (LMMSE) and weighted least squares (WLS),

and we also derive their corresponding Bayesian Cramer-Rao bound (BCRB) and

Cramer-Rao bound (CRB), respectively. Simulation results demonstrate

that the BCRB and CRB can be achieved by the proposed LMMSE and

WLS methods, respectively. Moreover, simulation results show that the proposed

methods are much more robust to time synchronization errors than the

conventional frequency domain methods, and imply that pulse shaping filters

with waveforms concentrated in the time domain could be employed in OQAM-OFDM

systems to improve the channel estimation performance and spectral

efficiency.

2. Robust Training Sequence Design for Correlated MIMO Channel

Estimation

We study how to design a worst-case robust training sequence for multiple-input

multiple-output (MIMO) channel estimation. We consider the mean-squared error

of the channel estimates as the figure of merit; it is a function of the second-order

statistics of the MIMO channel, i.e., the channel covariance matrix, and we use it to

optimize training sequences under a total power constraint. In practical

applications, the channel covariance matrix is not known perfectly. Thus, the main

aspect of our design is to improve robustness of the training sequences against

possible uncertainties in the available channel covariance matrix. Using a

deterministic uncertainty model, we formulate a robust training sequence design as

a minimax optimization problem where we take such imperfections into account.

We investigate the robust design problem assuming the general case of an

arbitrarily correlated MIMO channel and a non-empty compact convex uncertainty

set. We prove that such a problem admits a globally optimal solution by exploiting

the convex-concave structure of the objective function, and propose numerical

algorithms to address the robust training design problem. We continue the analysis

by considering multiple-input single-output (MISO) channels and Kronecker

structured MIMO channels along with unitarily-invariant uncertainty sets. For

these scenarios, we show that the problem is diagonalized by the eigenvectors of

the nominal covariance matrices so that the robust design is significantly simplified

from a complex matrix-variable problem to a real vector-variable power allocation

problem. For the MISO channel, we provide closed-form solutions for the robust

training sequences with the uncertainty sets defined by the spectral norm and

nuclear norm.

3. On Forward Channel Estimation for MIMO Precoding in Cooperative

Relay Wireless Transmission Systems

Linear precoding for wireless multi-input multi-output (MIMO) transceivers has

demonstrated substantial strength in cooperative relay networks for achieving high

system throughput and performance. However, traditional precoder optimization

critically assumes knowledge of channel state information (CSI) at source nodes.

For linear MIMO source precoding design, we propose a novel method to estimate

the quadratic product of forward-link channel information between source and

relay nodes. To conserve bandwidth, our source estimates the forward-link MIMO

CSI by utilizing inherent signals transmitted by amplify-and-forward (AF) relays

without requiring the cumbersome default method of coordinated

relay channel estimation and relay feedback of its estimated CSI. From the

overheard AF relay signal, the source node simply extracts the

quadratic channel information of its forward-link for designing its precoder. In

addition to presenting a low overhead method for forward channel estimation, we

also analyze the channel estimation performance by investigating its bias and its

Cramér-Rao lower bound. Finally, we present analytical results in comparison with

simulations.

4. Impact of Channel Estimation Errors on SC-FDE Systems

Single carrier transmissions with frequency domain equalization (SC-FDE) have

gained widespread use in emerging broadband wireless systems, becoming an

attractive alternative to popular Orthogonal Frequency Division Multiplexing

(OFDM) schemes, particularly at the uplink. Since coherent receivers are usually

employed with SC-FDE, accurate channel estimates are required so as to avoid

substantial performance degradation. Several channel estimation strategies have

been proposed for SC-FDE, but a thorough evaluation of the degradation caused

by channel estimation errors and a comparison against OFDM are still lacking. In

this paper, we study the impact of imperfect channel knowledge on SC transmission

with focus on the linear frequency domain equalizer (FDE) and on the Iterative

Block Decision Feedback Equalizer (IB-DFE). We propose a modified IB-DFE

which incorporates knowledge of the channel estimation error model and show that

its performance becomes more robust against the presence of strong error

components in the channel estimates. We also evaluate, analytically and through

simulations, the degradation caused by imperfect channel estimation in SC-FDE

and compare it against OFDM schemes. It is shown that the channel estimation

requirements for SC-FDE are

higher than for OFDM unless a channel estimation error aware receiver is

employed.
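As a rough illustration of the linear FDE mentioned above, the following Python sketch applies a textbook one-tap MMSE equalizer in each frequency bin of one cyclic-prefixed block. It assumes a known (here perfect) channel estimate, BPSK symbols, a hypothetical 3-tap channel, and a naive O(N^2) DFT; it does not model the channel estimation error or the modified IB-DFE studied in the paper, and all function names are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(N^2) DFT: X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def idft(X):
    """Inverse of dft(), including the 1/N scaling."""
    n_pts = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / n_pts) for k in range(n_pts)) / n_pts
            for n in range(n_pts)]


def circular_channel(x, h):
    """With a cyclic prefix longer than the channel, the channel acts as a circular convolution."""
    n_pts = len(x)
    return [sum(h[l] * x[(n - l) % n_pts] for l in range(len(h))) for n in range(n_pts)]


def mmse_fde(y, h_est, noise_to_signal):
    """Linear MMSE-FDE: Z_k = Y_k * conj(H_k) / (|H_k|^2 + N0/Es), then back to the time domain."""
    n_pts = len(y)
    H = dft(list(h_est) + [0] * (n_pts - len(h_est)))
    Y = dft(y)
    Z = [Y[k] * H[k].conjugate() / (abs(H[k]) ** 2 + noise_to_signal) for k in range(n_pts)]
    return idft(Z)


symbols = [1, -1, -1, 1, 1, 1, -1, 1]   # BPSK block, N = 8
h = [1.0, 0.5, 0.2]                     # hypothetical 3-tap channel
z = mmse_fde(circular_channel(symbols, h), h, noise_to_signal=0.01)
decisions = [1 if v.real > 0 else -1 for v in z]
print(decisions)
```

Replacing `h` in the `mmse_fde` call with a perturbed estimate is a simple way to see the degradation that imperfect channel knowledge causes.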

5. Pilot Design for Sparse Channel Estimation in OFDM-Based Cognitive

Radio Systems

In this correspondence, sparse channel estimation is first introduced in orthogonal

frequency-division multiplexing (OFDM)-based cognitive radio systems. Based on

the results of spectrum sensing, the pilot design is studied by minimizing the

coherence of the dictionary matrix used for sparse recovery. Then, it is formulated

as an optimal column selection problem where a table is generated and the indexes

of the selected columns of the table form a pilot pattern. A novel scheme using

constrained cross-entropy optimization is proposed to obtain an optimized pilot

pattern, which is modeled as an independent Bernoulli random process. The

updating rule for the probability of each active subcarrier being selected as a pilot

subcarrier is derived. A projection method is proposed so that the number of pilots

remains fixed during the optimization. Simulation results verify the effectiveness of the

proposed scheme and show that it can achieve an 11.5% improvement in spectral

efficiency with the same channel estimation performance compared with least

squares (LS) channel estimation.
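The coherence criterion above can be illustrated with a small Python sketch. It assumes the dictionary is the partial DFT matrix whose rows are the pilot subcarriers and whose columns are candidate channel delays 0..L-1; in the cognitive-radio setting the pilots would have to be drawn from the subcarriers left unused by primary users. The sketch only evaluates a given pilot pattern; it does not implement the paper's constrained cross-entropy optimization, and the pilot sets used are hypothetical.

```python
import cmath


def pilot_coherence(pilots, n_subcarriers, n_taps):
    """Mutual coherence of the dictionary A[p, l] = exp(-j*2*pi*p*l/N), p in pilots, l < n_taps.

    Every column has norm sqrt(len(pilots)), and the inner product of columns l1 and l2
    depends only on d = l2 - l1, so it is enough to scan d = 1 .. n_taps - 1.
    """
    pilots = list(pilots)
    worst = 0.0
    for d in range(1, n_taps):
        inner = sum(cmath.exp(-2j * cmath.pi * p * d / n_subcarriers) for p in pilots)
        worst = max(worst, abs(inner) / len(pilots))
    return worst


uniform = range(0, 64, 4)                # 16 equally spaced pilots out of N = 64
print(pilot_coherence(uniform, 64, 16))  # close to 0: delays up to 15 stay distinguishable
print(pilot_coherence(uniform, 64, 20))  # about 1.0: delays 0 and 16 alias onto each other
```

The second call shows why regular pilot spacing can fail for sparse recovery when the delay grid is longer than N divided by the number of pilots, which is one motivation for optimizing irregular pilot patterns.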

6. Low complexity minimum mean square error channel estimation for

adaptive coding and modulation systems

The performance of Adaptive Coding and Modulation (ACM) depends strongly on

the retrieved Channel State Information (CSI), which can be obtained using

channel estimation techniques relying on pilot symbol transmission. Pilot-aided channel

estimation methods for ACM systems have so far received relatively little

analysis. In this paper, we investigate the performance of CSI prediction

using the Minimum Mean Square Error (MMSE) channel estimator for an ACM

system. To address two problems of MMSE estimation, namely its high computational cost and its

oversimplified assumptions, we propose two low-complexity schemes: LC-MMSE and

Recursion LC-MMSE (R-LC-MMSE). Computational complexity

and Mean Square Error (MSE) are presented to evaluate the efficiency of the

proposed algorithms. Both analysis and numerical results show that LC-MMSE

performs close to the well-known MMSE estimator with much lower complexity,

and R-LC-MMSE extends the applicability of MMSE estimation to specific

circumstances.
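To make the LS/MMSE comparison above concrete, here is a minimal Python sketch of a per-pilot (scalar) MMSE estimator next to the LS estimator. It assumes an uncorrelated Rayleigh channel tap with known variance and a unit-power pilot; it ignores the correlation across subcarriers and time that a full MMSE estimator exploits (at the cost of a matrix inversion), and it is not the paper's LC-MMSE or R-LC-MMSE. All names are hypothetical.

```python
import random


def ls_estimate(y, x):
    """Least-squares estimate from one pilot: H_ls = y / x."""
    return y / x


def scalar_mmse_estimate(y, x, sigma_h2, sigma_n2):
    """MMSE estimate of H ~ CN(0, sigma_h2) from y = H*x + n with n ~ CN(0, sigma_n2).

    The LS estimate is scaled toward the prior mean (zero) by
    sigma_h2 / (sigma_h2 + sigma_n2 / |x|^2).
    """
    shrink = sigma_h2 / (sigma_h2 + sigma_n2 / abs(x) ** 2)
    return shrink * ls_estimate(y, x)


def complex_gauss(rng, var):
    """Circularly symmetric complex Gaussian sample with total variance var."""
    std = (var / 2) ** 0.5
    return complex(rng.gauss(0, std), rng.gauss(0, std))


def compare_mse(trials=20000, sigma_h2=1.0, sigma_n2=1.0, seed=1):
    """Monte Carlo MSE of both estimators for a unit-power pilot."""
    rng = random.Random(seed)
    x = 1 + 0j
    err_ls = err_mmse = 0.0
    for _ in range(trials):
        h = complex_gauss(rng, sigma_h2)
        y = h * x + complex_gauss(rng, sigma_n2)
        err_ls += abs(ls_estimate(y, x) - h) ** 2
        err_mmse += abs(scalar_mmse_estimate(y, x, sigma_h2, sigma_n2) - h) ** 2
    return err_ls / trials, err_mmse / trials


print(compare_mse())  # theory for these settings: LS MSE = 1.0, MMSE MSE = 0.5
```

At 0 dB pilot SNR the MMSE shrinkage halves the error here; the gap narrows as the SNR grows, since the shrink factor approaches 1.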

7. Embedded Iterative Semi-Blind Channel Estimation for Three-Stage-

Concatenated MIMO-Aided QAM Turbo Transceivers

The lack of accurate and efficient channel estimation (CE) for multiple-input-

multiple-output (MIMO) channel state information (CSI) has long been the

stumbling block of near-MIMO-capacity operation. We propose a semi-blind joint

CE and three-stage iterative detection/decoding scheme for near-capacity MIMO

systems. The main novelty is that our decision-directed (DD) CE exploits the a

posteriori information produced by the MIMO soft demapper within the inner

turbo loop to select a “just sufficient number” of high-quality detected soft bit

blocks or symbols for DDCE, which significantly improves the accuracy and

efficiency of DDCE. Moreover, our DDCE is naturally embedded into the iterative

three-stage detection/decoding process, without imposing an additional external

iterative loop between the DDCE and the three-stage turbo detector/decoder.

Hence, the computational complexity of our joint CE and three-stage turbo

detector/decoder remains similar to that of the three-stage turbo detection/decoding

scheme associated with the perfect CSI. Most significantly, the mean square error

(MSE) of our DD channel estimator approaches the Cramér-Rao lower bound

(CRLB) associated with the optimal-training-based CE, whereas the bit error rate

(BER) of our semi-blind scheme is capable of achieving the optimal maximum-

likelihood (ML) performance bound associated with the perfect CSI.

8. Channel estimation relying on the minimum bit-error-ratio criterion for

BPSK and QPSK signals

The authors consider the channel estimation problem in the context of a linear

equaliser designed for a frequency selective channel, which relies on the minimum

bit-error-ratio (MBER) optimisation framework. Previous literature has shown that

the MBER-based signal detection may outperform its minimum-mean-square-error

(MMSE) counterpart in the bit-error-ratio performance sense. In this study, they

develop a framework for channel estimation by first discretising the parameter

space and then posing it as a detection problem. Explicitly, the MBER cost

function (CF) is derived and its performance studied, when transmitting binary

phase shift keying (BPSK) and quadrature phase shift keying (QPSK) signals. It is

demonstrated that the MBER-based CF-aided scheme is capable of outperforming

existing MMSE and least-squares-based solutions.

9. Low-Complexity DFT-Based Channel Estimation with Leakage Nulling

for OFDM Systems

In this letter, a low-complexity but near-optimal DFT-based channel estimator with

leakage nulling is proposed for OFDM systems using virtual subcarriers. The

proposed estimator consists of time-domain (TD) index

set estimation that accounts for the leakage effect, followed by low-complexity TD

post-processing to suppress the leakage. The performance and complexity of the

proposed channel estimator are analyzed and verified by computer simulation.

Simulation results show that the proposed estimator outperforms conventional

estimators and provides near-optimal performance while keeping its

complexity comparable to that of the simple DFT-based channel estimator.
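For reference, the "simple DFT-based channel estimator" used as the baseline above can be sketched in a few lines of Python. This sketch assumes every subcarrier carries a known pilot and that the channel is no longer than `n_taps`; it deliberately omits virtual subcarriers, whose missing LS estimates at the band edges are what produce the leakage targeted by the letter. The channel and pilot values are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(N^2) DFT."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def idft(X):
    """Inverse of dft(), including the 1/N scaling."""
    n_pts = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / n_pts) for k in range(n_pts)) / n_pts
            for n in range(n_pts)]


def dft_channel_estimate(rx, tx, n_taps):
    """Simple DFT-based estimator, assuming every subcarrier carries a known pilot.

    1) least-squares estimate per subcarrier, 2) IDFT to the time domain,
    3) keep only the first n_taps taps (assumed maximum channel length), 4) DFT back.
    """
    h_ls = [y / x for y, x in zip(rx, tx)]
    cir = idft(h_ls)
    cir = cir[:n_taps] + [0] * (len(cir) - n_taps)
    return dft(cir)


N, L = 16, 4
h = [0.9, 0.4j, -0.2, 0.1]                 # hypothetical 4-tap channel
H = dft(h + [0] * (N - L))
tx = [1 if k % 2 == 0 else -1 for k in range(N)]
rx = [Hk * xk for Hk, xk in zip(H, tx)]
rx[3] += tx[3]                             # unit-magnitude error on one subcarrier
est = dft_channel_estimate(rx, tx, L)
err = sum(abs(a - b) ** 2 for a, b in zip(est, H))
print(err)  # the unit error energy is reduced by the factor L/N
```

Truncating to L of N taps keeps only a fraction L/N of the white estimation error, which is the source of the DFT-based estimator's gain over plain LS.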

10. Improved Matching-Pursuit Implementation for LTE Channel

Estimation

An implementation of a reduced complexity matching pursuit channel estimator

for LTE is presented. The design contains an FFT/IFFT module with non-radix-2

units and a core estimator. The module is flexible enough to perform FFT and

IFFT at the different resolutions needed using the same hardware. The required

internal word lengths are determined based on prior work. Internal shifts are employed to

maximize the use of available resources. The design is implemented in a 65-nm

low-power process from STMicroelectronics. The total area of the design is 1 mm²,

including input pads and extra control logic. The algorithmic improvements reduce

the complexity by up to 56% compared to prior art. At the same time, the estimator is

considerably faster, performing over six times as many estimations in the same time.

The simulated power consumption of the estimator is about 20 mW at 70 MHz.
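The greedy principle behind a matching-pursuit channel estimator can be sketched in Python as follows. This is a generic real-valued version with an arbitrary unit-norm dictionary (in channel estimation the atoms would be pilot responses at candidate delays); it uses direct correlations rather than the FFT/IFFT module of the design described above and does not include the algorithmic reductions of the paper. Names and test vectors are hypothetical.

```python
def matching_pursuit(y, dictionary, n_iter):
    """Greedy matching pursuit (real-valued sketch).

    dictionary: list of atoms (columns), each assumed to have unit Euclidean norm.
    Returns (coefficients, residual) after n_iter greedy steps.
    """
    residual = list(y)
    coeffs = [0.0] * len(dictionary)
    for _ in range(n_iter):
        # Correlate the residual with every atom and pick the strongest one.
        corr = [sum(r * a for r, a in zip(residual, atom)) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda i: abs(corr[i]))
        coeffs[best] += corr[best]
        # Remove that atom's contribution from the residual.
        residual = [r - corr[best] * a for r, a in zip(residual, dictionary[best])]
    return coeffs, residual


s = 2 ** -0.5
atoms = [[1.0, 0.0], [s, s]]  # two non-orthogonal unit-norm atoms
coeffs, res = matching_pursuit([1.0, 1.0], atoms, n_iter=1)
print(coeffs, res)
```

Each iteration costs one correlation per atom, which is why reducing the number of correlations or computing them with FFTs dominates the complexity of practical estimators.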

ANALOG VLSI

1. Time-Based All-Digital Technique for Analog Built-in Self-Test

A scheme for the built-in self-test of analog signals that measures on-chip voltages in an all-digital

manner with minimal area overhead is presented. The method is well suited for a distributed

architecture, where the routing of analog signals over long paths is minimized. A clock is routed

serially to the sampling heads placed at the nodes of analog test voltages. The sampling head

present at each test node, which consists of a pair of delay cells and a pair of flip-flops, locally

converts the test voltage into a skew between a pair of subsampled signals, thus giving rise to as

many subsampled signal pairs as there are test nodes. To measure a particular analog voltage, the

corresponding subsampled signal pair is fed to a delay measurement unit that measures the skew

between the pair. The concept is validated by designing a test chip in a UMC 130-nm CMOS

process. Sub-millivolt accuracy for static signals is demonstrated for a measurement time of a

few seconds, and an effective number of bits of 5.29 is demonstrated for low-bandwidth signals

in the absence of sample-and-hold circuitry.
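To put the quoted effective number of bits in context, the standard relation ENOB = (SINAD - 1.76) / 6.02, valid for an ideal quantizer driven by a full-scale sine wave, can be evaluated with a short Python sketch. The function names are hypothetical; the relation itself is the usual textbook definition, not something derived in the paper.

```python
import math


def sinad_db(signal_power, noise_and_distortion_power):
    """Signal-to-noise-and-distortion ratio in dB."""
    return 10 * math.log10(signal_power / noise_and_distortion_power)


def enob_from_sinad(sinad):
    """Effective number of bits for a full-scale sine input: (SINAD - 1.76) / 6.02."""
    return (sinad - 1.76) / 6.02


def sinad_from_enob(enob):
    """Inverse relation: SINAD = 6.02 * ENOB + 1.76 dB."""
    return 6.02 * enob + 1.76


print(round(sinad_from_enob(5.29), 2))  # ENOB 5.29 corresponds to roughly 33.6 dB SINAD
```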

2. Speech Processing on a Reconfigurable Analog Platform

We describe architectures for audio classification front ends on a reconfigurable analog platform.

Real-time implementations of audio processing algorithms involving discrete-time signals tend to

be power-intensive. We present an alternative continuous-time implementation of a noise-suppression

algorithm on our reconfigurable chip, while detailing the design considerations. We

also describe a framework that enables future implementations of other speech processing

algorithms, classifier front ends, and hearing aids.

3. Analysis and Design of a Low-Voltage Low-Power Double-Tail Comparator

The need for ultra-low-power, area-efficient, and high-speed analog-to-digital converters is

pushing toward the use of dynamic regenerative comparators to maximize speed and power

efficiency. In this paper, the delay of dynamic comparators is analyzed, and analytical expressions

are derived. From these expressions, designers can gain an

intuition about the main contributors to the comparator delay and fully explore the tradeoffs in

dynamic comparator design. Based on the presented analysis, a new dynamic comparator is

proposed, where the circuit of a conventional double-tail comparator is modified for low-power

and fast operation even at low supply voltages. Without complicating the design, and by adding

only a few transistors, the positive feedback during regeneration is strengthened, which results in

remarkably reduced delay time. Post-layout simulation results in a 0.18-μm CMOS technology

confirm the analysis results. It is shown that in the proposed dynamic comparator both the power

consumption and delay time are significantly reduced. The maximum clock frequency of the

proposed comparator can be increased to 2.5 and 1.1 GHz at supply voltages of 1.2 and 0.6 V,

while consuming 1.4 mW and 153 μW, respectively. The standard deviation of the input-referred

offset is 7.8 mV at 1.2 V supply.

coherent sampling, but it requires a significantly smaller number of points to make the FFT

realization more suitable for on-chip built-in testing and calibration applications that require area

and power efficiency. The technique was assessed by comparing the simulation results from the

proposed method of single and multiple tones with the simulation results obtained from the FFT

of coherently sampled tones. The results indicate that the proper selection of test tone

frequencies can avoid spectral leakage even with multiple narrowly spaced tones. When low-

frequency signals are captured with an analog-to-digital converter (ADC) for on-chip analysis,

the overall accuracy is limited by the ADC's resolution, linearity, noise, and bandwidth

1. On-Chip Measurement of Rise/Fall Gate Delay Using Reconfigurable Ring

Oscillator

In this brief, a new technique to measure the on-chip rise/fall delay of an individual gate is

presented. In the proposed technique, the rise/fall gate delay is measured using the duty cycle of

a reconfigurable ring oscillator (RRO). A set of linear equations is formed with the different

configuration settings of the RRO, relating the rise/fall delay of all the gates in the path of the

RRO to the positive/negative duty cycle of the undivided RRO. The high-frequency undivided

RRO signal is needed for this type of measurement as it preserves the rise/fall delay of an

individual gate. However, it is difficult to bring the high-frequency undivided RRO signal

outside the chip due to the frequency limitation of the output pad. The high-frequency RRO

signal is subsampled by a clock that is generated from an on-chip phase-locked loop to make it

low frequency. The rise and fall delays of an individual gate can be calculated from the

difference of the duty cycle of the subsampled RRO signal at two different configurations of the

RRO. The proposed concept is validated in a test chip that is fabricated in an industrial 65-nm

technology node.
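The subsampling step described above works because sampling a periodic signal at a slightly longer period walks the sampling instant slowly through the waveform, so the low-frequency output keeps the duty cycle. A minimal integer-time Python sketch, assuming an ideal jitter-free square wave and a sampling period exactly one tick longer than the RRO period (tick counts are hypothetical), shows that the fraction of high samples over one beat period equals the duty cycle, and that a duty-cycle difference between two configurations maps back to a time difference.

```python
def subsampled_duty_cycle(period_ticks, high_ticks, sample_period_ticks, n_samples):
    """Sample an ideal square wave (high for the first high_ticks of every period)
    every sample_period_ticks and return the fraction of samples that read high."""
    high = 0
    for n in range(n_samples):
        phase = (n * sample_period_ticks) % period_ticks
        if phase < high_ticks:
            high += 1
    return high / n_samples


# Hypothetical numbers: RRO period of 1000 ticks, sampling period of 1001 ticks,
# so the sampling phase advances by one tick per sample (beat period = 1000 samples).
d_a = subsampled_duty_cycle(1000, 400, 1001, 1000)  # configuration A
d_b = subsampled_duty_cycle(1000, 430, 1001, 1000)  # configuration B
print(d_a, d_b, (d_b - d_a) * 1000)  # duty cycles and their difference in ticks
```

In a real measurement, jitter and the finite number of samples per beat period limit the resolution of the recovered duty cycle.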

2. Smart: Single-Cycle Multihop Traversals over a Shared Network on Chip

As the number of on-chip cores increases, scalable on-chip topologies such as meshes inevitably

add multiple hops to each network traversal. The best practice today is to design one-cycle

routers, such that the low-load network latency between a source and destination is equal to the

number of routers and links (that is, twice the hops) between them. Designers of operating

systems, compilers, and cache coherence protocols often try to limit communication to within a

few hops because on-chip latency is critical for their scalability. In this article, the authors

propose an on-chip network called Smart (Single-cycle Multihop Asynchronous Repeated

Traversal) that aims to present a single-cycle datapath all the way from the source to the

destination. They do not add any fast physical express links to the datapath; instead,

they drive the shared crossbars and links asynchronously up to multiple hops within a single

cycle. They designed a router and link microarchitecture to achieve such a traversal, and a flow-

control technique to arbitrate and set up multihop paths within a cycle. A place-and-route design

at 45 nm achieves 11 hops within a 1-GHz cycle for paths without turns (9 hops for paths with

turns). The authors observe a 5 to 8 times reduction in low-load latencies across synthetic traffic

patterns on an 8×8 chip multiprocessor, compared to a baseline one-cycle router

network. Full-system simulations with Splash-2 and Parsec benchmarks demonstrate 27 and 52

percent reductions in runtime for private and shared level-2 designs, respectively.

4. Methodology for adapting on-chip interconnect architectures

Network-on-chip (NoC) has been proposed to solve the scalability problem experienced in bus-

based system-on-chip. The main challenge is the ability to predict the quality of service that the

network infrastructure provides while meeting other system constraints, namely power and area.

Although these architectures are regular with predictable electrical parameters, they may suffer

from higher latency and lower throughput. To tackle this issue, the network structure needs to be

adaptable in response to the needs of the application. This paper presents a methodology for

augmenting an NoC with a programmable infrastructure that allows application-specific

adaptation. Based on the developed infrastructure, an algorithm is also presented for static

adaptation based on application traffic patterns. To evaluate the proposed methodology of the

adaptable NoC, the WK-recursive on-chip interconnect is used as a case study. Simulations are

conducted, and the reported results demonstrate the usefulness of the proposed approach.

5. Energy Efficiency Optimization Through Codesign of the Transmitter and Receiver

in High-Speed On-Chip Interconnects

A novel equalized global link architecture and driver-receiver codesign flow are proposed for

high-speed and low-energy on-chip communication by utilizing a continuous-time linear

equalizer (CTLE). The proposed global link is analyzed using a linear system method, and the

formula of CTLE eye opening is derived to provide high-level design guidelines and insights.

Compared with the separate driver-receiver design flow, over 50% energy reduction is observed.

The final optimal solution achieves 20-Gb/s signaling over a 10-mm, 2.6-μm-pitch on-chip

transmission line with 15.5-ps/mm latency and 0.196-pJ/b energy using 45-nm technology.

Monte Carlo simulation also shows that 3σ/μ for power and delay variation in the proposed

global link are 13.1% and 4.6%, respectively.

6. Fault-Tolerant Network Interfaces for Networks-on-Chip

As the complexity of designs increases and technology scales down into the deep-submicron

domain, the probability of malfunctions and failures in the networks-on-chip (NoCs) components

increases. In this work, we focus on the study and evaluation of techniques for increasing

reliability and resilience of network interfaces (NIs) within NoC-based multiprocessor system-

on-chip architectures. NIs act as interfaces between intellectual property cores and the

communication infrastructure; the faulty behavior of any one of them could, therefore, affect the

overall system. We propose a functional fault model for the NI components by

evaluating their susceptibility to faults. We present a two-level fault-tolerant solution that can be

employed for mitigating the effects of both permanent and temporary faults in the NI.

Experimental simulations show that, with a limited overhead, we can obtain an NI reliability

comparable to that obtained by implementing the system with standard triple modular

redundancy (TMR) techniques, while saving up to 48 percent in area and achieving a significant

energy reduction.
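Since the abstract benchmarks against triple modular redundancy, a minimal Python sketch of the bitwise 2-of-3 majority voter that TMR relies on may help. The paper's own two-level fault-tolerant NI scheme is not reproduced here, and the function names are hypothetical.

```python
def tmr_vote(a, b, c):
    """Bitwise 2-of-3 majority of three redundant copies of a word."""
    return (a & b) | (b & c) | (a & c)


def tmr_disagreement(a, b, c):
    """Bits on which at least one copy disagrees (useful for flagging a fault)."""
    return (a ^ b) | (b ^ c)


# Three copies of 0b1111, with a single-bit fault in two different copies.
print(bin(tmr_vote(0b1110, 0b1101, 0b1111)), bin(tmr_disagreement(0b1110, 0b1101, 0b1111)))
```

The voter masks any fault confined to one copy per bit position, but it triplicates the protected logic, which is the area cost the proposed NI scheme aims to avoid.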

7. On Deadlock Problem of On-Chip Buses Supporting Out-of-Order Transactions

Modern on-chip communication protocols such as the Advanced eXtensible Interface (AXI) and the Open Core

Protocol (OCP) support advanced transactions to improve communication efficiency. Out-of-order

transactions that allow responses to be returned in an order different from their request order play

an important role in this improvement. However, a deadlock situation may occur if these

transactions are not properly handled. In this paper, we address the deadlock problem in an

on-chip bus system supporting out-of-order transactions. We present a graph model that

accurately represents the status of a bus system and show that a cycle exists in the graph if and only if

the bus system is in an unsafe state that may lead to a bus deadlock. Based on this model, we

propose a novel bus design technique that can efficiently resolve the bus deadlock problem.

Experimental results show that buses with the proposed technique can be up to 3.3 times faster

than those with the currently available techniques.
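The key property above, that a cycle in the graph model signals an unsafe state, reduces deadlock checking to cycle detection. A minimal Python sketch using depth-first search is shown below; the vertex names and the "waits for" meaning of the edges are hypothetical stand-ins, since the paper's exact graph construction is not given here.

```python
def has_cycle(graph):
    """Return True if the directed graph (dict: node -> successors) contains a cycle.

    Recursive depth-first search with three colors: a GREY node is on the current
    DFS path, so reaching a GREY node again means a back edge, i.e. a cycle.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(node):
        color[node] = GREY
        for nxt in graph.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


# Hypothetical wait-for edges between masters (M) and slaves (S): X -> Y means "X waits for Y".
safe = {"M0": ["S0"], "S0": ["M1"], "M1": []}
unsafe = {"M0": ["S0"], "S0": ["M1"], "M1": ["S1"], "S1": ["M0"]}
print(has_cycle(safe), has_cycle(unsafe))
```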

8. Built-In Binary Code Inversion Technique for On-Chip Flash Memory Sense

Amplifier With Reduced Read Current Consumption

The bit-line sense amplifier (S/A) for on-chip flash memory compares cell current with reference

current to identify data that are programmed. The S/A for 0 (erased) cell data consumes a large

sink current, which is greater than off-current for 1 (programmed) cell data. This brief proposes a

built-in write/read path based on binary inversion methods to reduce the sensing current of S/A.

An original binary code is programmed into flash memory as an inverted binary code using

the proposed bit inversion techniques. The de-inversion hardware, which is implemented with

small logic gates to restore original binary data, only consumes logic current instead of analog

sink current in the S/A. The proposed techniques are evaluated for the DSPStone benchmark and

are applied to the modified S/A for an ARM Cortex-M3-based microcontroller with 128-kB on-chip

flash memory based on a 0.18-μm EEPROM technology. The circuit-level simulation result

for the DSPStone benchmark shows that a newly implemented chip with the S/A based on the

proposed technique consumes approximately less than 22% of the operating power that

conventional S/A uses.
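According to the abstract, sensing a 0 (erased) cell draws more current than sensing a 1 cell. One generic way to exploit this, shown here purely as an assumption and not necessarily the paper's encoding, is to store each word inverted whenever it contains more 0s than 1s, together with a flag bit that tells the read path to de-invert. The word width and function names are hypothetical.

```python
WORD_BITS = 8  # hypothetical word width


def count_zeros(word, width=WORD_BITS):
    """Number of 0 bits in a width-bit word."""
    return width - bin(word).count("1")


def encode_word(word, width=WORD_BITS):
    """Return (stored_word, inverted_flag); words that are mostly zeros are stored inverted."""
    mask = (1 << width) - 1
    if count_zeros(word, width) > width // 2:
        return (~word) & mask, 1
    return word, 0


def decode_word(stored, inverted, width=WORD_BITS):
    """De-inversion on read; in hardware, a row of XOR gates driven by the flag."""
    mask = (1 << width) - 1
    return (~stored) & mask if inverted else stored


print(encode_word(0b00000001))  # seven 0 cells become one 0 cell plus the flag
```

With this rule no stored data word holds more than half of its bits as high-current 0 cells, at the cost of one extra flag cell per word.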

9. Data Encoding Techniques for Reducing Energy Consumption in Network-on-Chip

As technology shrinks, the power dissipated by the links of a network-on-chip (NoC) starts to

compete with the power dissipated by the other elements of the communication subsystem,

namely, the routers and the network interfaces (NIs). In this paper, we present a set of data

encoding schemes aimed at reducing the power dissipated by the links of an NoC. The proposed

schemes are general and transparent with respect to the underlying NoC fabric (i.e., their

application does not require any modification of the routers and link architecture). Experiments

carried out on both synthetic and real traffic scenarios show the effectiveness of the proposed

schemes, which save up to 51% of power dissipation and 14% of energy consumption

without any significant performance degradation and with less than 15% area overhead in the NI.
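As an example of the kind of link encoding discussed above, the following Python sketch implements classic bus-invert coding, a well-known scheme that sends the complement of a word (plus one invert line) whenever that toggles fewer wires. It is shown for illustration only and is not the set of encoding schemes proposed in the paper; the link width and function names are hypothetical.

```python
WIDTH = 8  # hypothetical link width in bits


def hamming(a, b):
    """Number of differing bits between two words."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width=WIDTH):
    """Return (sent_word, invert_bit) pairs; invert when more than half the lines would toggle."""
    mask = (1 << width) - 1
    prev, out = 0, []
    for w in words:
        if hamming(w, prev) > width // 2:
            sent, inv = (~w) & mask, 1
        else:
            sent, inv = w, 0
        out.append((sent, inv))
        prev = sent
    return out


def bus_invert_decode(symbols, width=WIDTH):
    """Undo the inversion at the receiver."""
    mask = (1 << width) - 1
    return [(~s) & mask if inv else s for s, inv in symbols]


def toggles(words):
    """Data-line transitions starting from an all-zero bus (the invert line is not counted)."""
    prev, total = 0, 0
    for w in words:
        total += hamming(w, prev)
        prev = w
    return total


data = [0xFF, 0x00, 0xFF, 0x00]
enc = bus_invert_encode(data)
print(toggles(data), toggles([s for s, _ in enc]))
```

Because the extra invert line also toggles, the real saving is smaller than the data-line count alone suggests, which is why such schemes are evaluated on total link energy.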

10. Path-Congestion-Aware Adaptive Routing With a Contention Prediction Scheme

for Network-on-Chip Systems

Network-on-chip systems can achieve higher performance than bus systems

for chip multiprocessor systems. However, as the complexity of the network increases, the

channel and switch congestion problems become major performance bottlenecks. An effective

adaptive routing algorithm can help minimize path congestion through load balancing. However,

conventional adaptive routing schemes only use channel-based information to detect the

congestion status. Due to the lack of switch-based information, channel-based information alone can hardly

reveal the real congestion status along the routing path. Therefore, in this paper, we

remodel the path congestion information to show hidden spatial congestion information and

improve the effectiveness of routing path selection. We propose a path-congestion-aware

adaptive routing (PCAR) scheme based on the following techniques: 1) a path-congestion-aware

selection strategy that simultaneously considers switch congestion and channel congestion, and

2) a contention prediction technique that uses the rate of change in the buffer level to predict

possible switch contention. The experimental results show that the proposed PCAR scheme can

achieve a high saturation throughput with an improvement of 15.4%-48.7% compared to existing

routing schemes. The proposed PCAR method also includes a VLSI architecture, which has

higher area efficiency with an improvement of 16%-35.7% compared with the other router

designs.

11. On-Chip Codeword Generation to Cope With Crosstalk

Capacitive and inductive coupling between bus lines results in crosstalk-induced delays. Many

bus encoding techniques have been proposed to improve the performance. Existing

implementation techniques and mapping algorithms in the literature each apply only to a specific

encoding. This paper presents the first generalized framework for a stall-free on-chip codeword

generation strategy that is scalable and easy to automate. It is applicable to coupling-aware

encoding techniques that allow recursive codeword generation. The proposed implementation

strategy iteratively generates codewords without explicitly enumerating them. Codeword

mapping relies on a graph-based representation that is unique to the given encoding technique. The

codewords are calculated on-chip using basic function blocks, such as adders and multiplexers.

Three encoding techniques were implemented using the proposed strategy. Experimental results

show significant reduction in the area overhead and power dissipation over the existing method

that uses random logic to implement the codec.
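Forbidden-pattern-free (FPF) codes are one example of a coupling-aware code with a recursive structure: no codeword contains 010 or 101, so no wire can switch opposite to both of its neighbors at once (one of the two words would have to contain such a pattern). The Python sketch below, our own illustration rather than the paper's graph-based mapping, generates the index-th codeword directly from a small table of completion counts using only comparisons and subtractions, and checks it against a reference enumeration. All names are hypothetical.

```python
from functools import lru_cache

FORBIDDEN = ("010", "101")


def fpf_codewords(n_bits):
    """Reference enumeration of all n-bit FPF codewords in lexicographic order."""
    words = ["0", "1"]
    for _ in range(n_bits - 1):
        words = [w + b for w in words for b in "01" if (w + b)[-3:] not in FORBIDDEN]
    return words


@lru_cache(maxsize=None)
def completions(tail, remaining):
    """Ways to append `remaining` bits after a prefix whose last (up to) two bits are
    `tail` without creating a forbidden pattern."""
    if remaining == 0:
        return 1
    total = 0
    for b in "01":
        s = tail + b
        if s[-3:] not in FORBIDDEN:
            total += completions(s[-2:], remaining - 1)
    return total


def fpf_encode(index, n_bits):
    """Build the index-th FPF codeword bit by bit without enumerating the codebook."""
    word = ""
    for pos in range(n_bits):
        for b in "01":
            cand = word + b
            if cand[-3:] in FORBIDDEN:
                continue
            count = completions(cand[-2:], n_bits - pos - 1)
            if index < count:
                word = cand
                break
            index -= count
        else:
            raise ValueError("index out of range")
    return word


print([fpf_encode(k, 4) for k in range(10)])
```

The number of n-bit FPF codewords grows like twice a Fibonacci number (6, 10, 16 for n = 3, 4, 5), so the code needs somewhat more wires than the raw data width.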

12. Low-Overhead Network-on-Chip Support for Location-Oblivious Task Placement

Many-core processors will have many processing cores with a network-on-chip (NoC) that

provides access to shared resources such as main memory and on-chip caches. However, locally fair

arbitration in multi-stage NoCs can lead to globally unfair access to shared resources and

impact system-level performance depending on where each task is physically placed. In this

work, we propose an arbitration scheme that provides equality-of-service (EoS) in the network and provides

support for location-oblivious task placement. We propose using probabilistic arbitration

combined with distance-based weights to achieve EoS and overcome the limitation of round-

robin arbiter. However, the complexity of probabilistic arbitration results in high area and long

latency which negatively impacts performance. In order to reduce the hardware complexity, we

propose a hybrid arbiter that switches between a simple arbiter at low load and a complex

arbiter at high load. The hybrid arbiter is enabled by the observation that arbitration only impacts

the overall performance and global fairness at a high load. We evaluate our arbitration scheme

with synthetic traffic patterns and GPGPU benchmarks. Our results show that a hybrid arbiter that

combines round-robin arbiter with probabilistic distance-based arbitration reduces performance

variation as task placement is varied and also improves average IPC.

13. DPPC: Dynamic Power Partitioning and Control for Improved Chip

Multiprocessor Performance

A key challenge in chip multiprocessor (CMP) design is to optimize the performance within a

power budget limited by the CMP’s cooling, packaging, and power supply capacities. Most

existing solutions rely solely on dynamic voltage and frequency scaling (DVFS) to adapt the

power consumption of CPU cores, without coordinating with the last-level on-chip (e.g., L2)

cache. This paper proposes DPPC, a chip-level power partitioning and control strategy that can

dynamically and explicitly partition the chip-level power budget among different CPU cores and

the shared last-level cache in a CMP based on the workload characteristics measured online.

DPPC features a novel performance-power model and an online model estimator to

quantitatively estimate the performance contributed by each core and the cache with their

respective local power budgets. DPPC then re-partitions the chip-level power budget among

them for optimized CMP performance. The partitioned local power budgets for the CPU cores

and cache are precisely enforced by power control algorithms designed rigorously based on

feedback control theory. Our extensive experimental results demonstrate that DPPC achieves

better CMP performance, within a given power budget, than several state-of-the-art power

control solutions for both SPEC CPU2006 benchmarks and multi-threaded SPLASH-2

workloads.

14. Crosstalk-Aware Multiple Error Detection Scheme Based on Two-Dimensional

Parities for Energy Efficient Network on Chip

Achieving reliable operation under the influence of deep-submicrometer noise sources including

crosstalk noise at low voltage operation is a major challenge for network on chip links. In this

paper, we propose a coding scheme that simultaneously addresses crosstalk effects on signal

delay and detects up to seven random errors through wire duplication and simple parity checks

calculated over the rows and columns of the two-dimensional data. This high error detection

capability enables the operating voltage on the wires to be reduced, leading to energy savings. The

results show that the proposed scheme reduces the energy consumption up to 53% as compared

to other schemes at iso-reliability performance despite the increased number of

wires. In addition, it incurs a small penalty in network performance, measured by the average

latency and comparable codec area overhead to other schemes.
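The row/column parity part of the scheme above can be sketched in Python as follows. The sketch uses plain even parity over a hypothetical 4x4 block; it omits the wire duplication that the paper combines with the parities, and the second example shows why that extra redundancy matters: plain two-dimensional parity cannot see four errors placed at the corners of a rectangle.

```python
def parities(block):
    """Row and column even-parity bits for a 2-D block of 0/1 values."""
    rows = [sum(r) % 2 for r in block]
    cols = [sum(c) % 2 for c in zip(*block)]
    return rows, cols


def detect(block, sent_rows, sent_cols):
    """Return the indices of rows and columns whose parity no longer matches."""
    rows, cols = parities(block)
    bad_r = [i for i, (a, b) in enumerate(zip(rows, sent_rows)) if a != b]
    bad_c = [j for j, (a, b) in enumerate(zip(cols, sent_cols)) if a != b]
    return bad_r, bad_c


block = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
rows, cols = parities(block)

single = [r[:] for r in block]
single[1][2] ^= 1                       # one flipped bit: row 1 and column 2 are flagged
rect = [r[:] for r in block]
for i, j in ((0, 0), (0, 2), (2, 0), (2, 2)):
    rect[i][j] ^= 1                     # rectangular pattern: every parity still matches
print(detect(single, rows, cols), detect(rect, rows, cols))
```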
