Wiener Filter Realization using Hardware.
QR decomposition of matrices and inversion
by Givens’ Rotation
Semester Project Report
Akashdip Das
Abantika Chowdhury
Sayan Chaudhuri
Guide : Dr. Ayan Banerjee
Electronics and Telecommunication Engineering Department
December, 2016
1 Abstract
2 Introduction
3 Wiener Filtering
4 Q-R decomposition of a matrix
5 Hardware for inversion of an upper triangular matrix (R)
5.1 Storage in a RAM
5.2 Address generation Mechanism
5.3 Hardware for finding the inverse of diagonal elements
5.4 Hardware for finding the inverse of the other elements
6 Conclusions
6.1 Multi PORT RAM for faster performance
6.2 Distributed Arithmetic for computing the product of the two matrices
7 Acknowledgements
1 Abstract
Super-resolution reconstruction is a method for reconstructing higher
resolution images from a set of low resolution observations. The sub-pixel
differences among different observations of the same scene make it possible
to create higher resolution images with better quality. In the last thirty
years, many methods for creating high resolution images have been proposed.
However, hardware implementations of such methods are limited. Wiener filter
design is one of the techniques we use initially for this process. Wiener
filter design involves matrix inversion. A novel method for the matrix
inversion is proposed in this report. QR decomposition by Givens rotation is
the computational algorithm used.
2 Introduction
Super resolution first requires that the image be restored from the effects
of noise and degradation (assumed isotropic). For that purpose the Wiener
filter is used, which forms an estimate of the original image from the
degraded one. The fundamentals of Wiener filtering are discussed in
Section 3. Wiener filtering requires the inverse of a given matrix; the
method followed here is QR decomposition (discussed in Section 4). QR
decomposition produces an upper triangular matrix and an orthogonal matrix.
The inverse of an orthogonal matrix is simply its transpose, so the
remaining work is the inversion of the upper triangular matrix, which is
the subject of this report. Various techniques for decomposing the matrix
have been discussed in [3], [4]. However, the inversion proposed there does
not give a general solution to the problem; the solutions available are
illustrated only for specific 3x3 or 4x4 systems. In this report we
generalize the inversion to an n x n system. The hardware required for this
purpose is developed in Section 5, along with the reasoning behind it. The
hardware has scope for enhanced performance, which is discussed in
Section 6.
3 Wiener Filtering
In signal processing, the Wiener filter is a filter used to produce an estimate
of a desired or target random process by linear time-invariant (LTI) filtering
of an observed noisy process, assuming known stationary signal and noise
spectra, and additive noise. The Wiener filter minimizes the mean square
error between the estimated random process and the desired process. The
goal of the Wiener filter is to compute a statistical estimate of an unknown
signal using a related signal as an input and filtering that known signal to
produce the estimate as an output. For example, the known signal might
consist of an unknown signal of interest that has been corrupted by additive
noise. The Wiener filter can be used to filter out the noise from the corrupted
signal to provide an estimate of the underlying signal of interest. The
Wiener filter takes a statistical approach based on MMSE (Minimum Mean
Square Error). The causal finite impulse response (FIR) Wiener filter,
instead of using some given data matrix X and output vector Y, finds optimal
tap weights by using the statistics of the input and output signals. It
populates the input matrix X with estimates of the auto-correlation of the
input signal (T) and populates the output vector Y with estimates of the
cross-correlation between the output and input signals (V).
In order to derive the coefficients of the Wiener filter, consider the signal
w[n] being fed to a Wiener filter of order N with coefficients
$\{a_0, \dots, a_N\}$. The output of the filter is denoted x[n] and is given
by the expression
\[ x[n] = \sum_{i=0}^{N} a_i \, w[n-i]. \]
The residual error is denoted e[n] and is defined as e[n] = x[n] − s[n] (see
the corresponding block diagram). The Wiener filter is designed so as to
minimize the mean square error (MMSE criterion), which can be stated
concisely as follows:
\[ a_i = \arg\min E\left[ e^2[n] \right], \]
where $E[\cdot]$ denotes the expectation operator. In the general case, the
coefficients $a_i$ may be complex and may be derived for the case where w[n]
and s[n] are complex as well. With a complex signal, the matrix to be solved
is a Hermitian Toeplitz matrix, rather than a symmetric Toeplitz matrix. For
simplicity, the following considers only the case where all these quantities
are real. The mean square error (MSE) may be rewritten as
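\[
\begin{aligned}
E\left[e^2[n]\right] &= E\left[(x[n] - s[n])^2\right] \\
&= E\left[x^2[n]\right] + E\left[s^2[n]\right] - 2E\left[x[n]s[n]\right] \\
&= E\left[\Big(\sum_{i=0}^{N} a_i w[n-i]\Big)^2\right] + E\left[s^2[n]\right] - 2E\left[\sum_{i=0}^{N} a_i w[n-i]\, s[n]\right]
\end{aligned}
\]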
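To find the vector $[a_0, \dots, a_N]$ which minimizes the expression above,
calculate its derivative with respect to each $a_i$:
\[
\begin{aligned}
\frac{\partial}{\partial a_i} E\left[e^2[n]\right]
&= \frac{\partial}{\partial a_i} \left\{ E\left[\Big(\sum_{j=0}^{N} a_j w[n-j]\Big)^2\right] + E\left[s^2[n]\right] - 2E\left[\sum_{j=0}^{N} a_j w[n-j]\, s[n]\right] \right\} \\
&= 2E\left[\Big(\sum_{j=0}^{N} a_j w[n-j]\Big) w[n-i]\right] - 2E\left[s[n]\, w[n-i]\right] \\
&= 2\sum_{j=0}^{N} E\left[w[n-j]\, w[n-i]\right] a_j - 2E\left[w[n-i]\, s[n]\right]
\end{aligned}
\]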
Assuming that w[n] and s[n] are each stationary and jointly stationary, the
sequences $R_w[m]$ and $R_{ws}[m]$, known respectively as the autocorrelation
of w[n] and the cross-correlation between w[n] and s[n], can be defined as
follows:
\[ R_w[m] = E\{w[n]\, w[n+m]\} \]
\[ R_{ws}[m] = E\{w[n]\, s[n+m]\} \]
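The derivative of the MSE may therefore be rewritten as (notice that
$E[w[n-i]\, s[n]] = R_{ws}[i]$ by the definition of the cross-correlation)
\[ \frac{\partial}{\partial a_i} E\left[e^2[n]\right] = 2\sum_{j=0}^{N} R_w[j-i]\, a_j - 2R_{ws}[i], \qquad i = 0, \dots, N. \]
Letting the derivative be equal to zero results in
\[ \sum_{j=0}^{N} R_w[j-i]\, a_j = R_{ws}[i], \qquad i = 0, \dots, N, \]
which can be rewritten in matrix form as
\[
\underbrace{\begin{bmatrix}
R_w[0] & R_w[1] & \cdots & R_w[N] \\
R_w[1] & R_w[0] & \cdots & R_w[N-1] \\
\vdots & \vdots & \ddots & \vdots \\
R_w[N] & R_w[N-1] & \cdots & R_w[0]
\end{bmatrix}}_{T}
\underbrace{\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_N \end{bmatrix}}_{a}
=
\underbrace{\begin{bmatrix} R_{ws}[0] \\ R_{ws}[1] \\ \vdots \\ R_{ws}[N] \end{bmatrix}}_{v}
\]
These equations are known as the Wiener-Hopf equations. The matrix T
appearing in the equation is a symmetric Toeplitz matrix. Under suitable
conditions on $R_w$, these matrices are known to be positive definite and
therefore non-singular, yielding a unique solution for the Wiener filter
coefficient vector,
\[ a = T^{-1} v. \]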
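As a rough software illustration of the Wiener-Hopf equations, the following pure-Python sketch estimates the correlations from sample data, builds the symmetric Toeplitz matrix T and solves T a = v. The function names (`correlate`, `solve`, `wiener_fir`), the biased sample estimator and the use of Gaussian elimination are assumptions made for this example only; the report itself targets a hardware inversion of T by QR decomposition.

```python
def correlate(x, y, lag, n):
    """Biased sample estimate of E{x[k] * y[k + lag]} for lag >= 0."""
    return sum(x[k] * y[k + lag] for k in range(n - lag)) / n


def solve(T, v):
    """Solve T a = v by Gaussian elimination with partial pivoting."""
    m = len(v)
    M = [row[:] + [rhs] for row, rhs in zip(T, v)]  # augmented matrix
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    a = [0.0] * m
    for r in range(m - 1, -1, -1):  # back substitution
        tail = sum(M[r][k] * a[k] for k in range(r + 1, m))
        a[r] = (M[r][m] - tail) / M[r][r]
    return a


def wiener_fir(w, s, order):
    """Estimate the FIR Wiener taps a[0..order] from input w and target s."""
    n = len(w)
    # Autocorrelation estimates R_w[m], m = 0..order
    r_w = [correlate(w, w, m, n) for m in range(order + 1)]
    # Cross-correlation estimates E{w[k] s[k + i]} = E{w[n - i] s[n]}
    v = [correlate(w, s, i, n) for i in range(order + 1)]
    # Symmetric Toeplitz matrix: T[i][j] = R_w[|j - i|]
    T = [[r_w[abs(j - i)] for j in range(order + 1)]
         for i in range(order + 1)]
    return solve(T, v)
```

For instance, if s[n] = 0.5 w[n] + 0.25 w[n−1] for a white input w, `wiener_fir(w, s, 2)` should return taps close to 0.5, 0.25 and 0.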
It is this equation that makes it necessary to design matrix inversion
hardware that is faster than existing designs, so that image processing
incurs less delay, and that generalizes to the N x N case. In this report
the matrix is inverted using QR decomposition by Givens rotation.
4 Q-R decomposition of a matrix
QR decomposition is one of the most important operations in linear algebra.
It can be used to find the inverse of a matrix, to solve a set of
simultaneous equations, and in numerous applications in scientific
computing. It is one of a relatively small number of matrix operation
primitives from which a wide range of algorithms can be realized. QR
decomposition is an elementary operation that decomposes a matrix into an
orthogonal matrix and a triangular matrix. The QR decomposition of a real
square matrix A is a decomposition A = QR, where Q is an orthogonal matrix
($Q^T Q = I$) and R is an upper triangular matrix. More generally, an m x n
matrix (with m ≥ n) of full rank can be factored as the product of an m x n
matrix with orthonormal columns ($Q^T Q = I$) and an n x n upper triangular
matrix. Several methods can be used to compute the QR decomposition:
Gram-Schmidt orthonormalization, Householder reflections, and Givens
rotations. Each method has its own advantages and disadvantages because of
its specific solution process. The Givens rotation technique is discussed
here.
If there are two nonzero vectors, x and y, in a plane, the angle θ between
them can be formalized as
\[ \cos\theta = \frac{\langle x, y \rangle}{\|x\|\,\|y\|} \]
This formula can be extended to vectors in n dimensions, for which the
angle θ can be defined as
\[ \theta = \arccos \frac{\langle x, y \rangle}{\|x\|\,\|y\|} \]
The rotation will be performed using a 16-bit pipelined CORDIC.
In the decomposition A = QR, R is an upper triangular matrix and Q is an
orthogonal matrix.
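Consider a 4x4 system
\[
A = \begin{bmatrix}
a_{1,1} & a_{1,2} & a_{1,3} & a_{1,4} \\
a_{2,1} & a_{2,2} & a_{2,3} & a_{2,4} \\
a_{3,1} & a_{3,2} & a_{3,3} & a_{3,4} \\
a_{4,1} & a_{4,2} & a_{4,3} & a_{4,4}
\end{bmatrix}
\]
which is to be reduced to the upper triangular form
\[
R = \begin{bmatrix}
r_{1,1} & r_{1,2} & r_{1,3} & r_{1,4} \\
0 & r_{2,2} & r_{2,3} & r_{2,4} \\
0 & 0 & r_{3,3} & r_{3,4} \\
0 & 0 & 0 & r_{4,4}
\end{bmatrix}
\]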
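The Givens rotation matrix, shown here for a rotation in the (2, 3) plane of
a 4x4 system, is
\[
G(i, j, \theta) = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & \cos\theta & \sin\theta & 0 \\
0 & -\sin\theta & \cos\theta & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\]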
The Givens rotation process applies a sequence of rotations, each of which
nulls one sub-diagonal element of the matrix, until the upper triangular
matrix R remains. The Q matrix is obtained by concatenating (multiplying
together) all the Givens rotations.
For a 3x3 system, R is found from three rotations, each of which nulls one
sub-diagonal element. The Givens rotation matrices needed for a 3x3 system
are
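\[
G_1 = \begin{bmatrix}
\cos\theta_1 & 0 & \sin\theta_1 \\
0 & 1 & 0 \\
-\sin\theta_1 & 0 & \cos\theta_1
\end{bmatrix}
\quad
G_2 = \begin{bmatrix}
\cos\theta_2 & \sin\theta_2 & 0 \\
-\sin\theta_2 & \cos\theta_2 & 0 \\
0 & 0 & 1
\end{bmatrix}
\quad
G_3 = \begin{bmatrix}
1 & 0 & 0 \\
0 & \cos\theta_3 & \sin\theta_3 \\
0 & -\sin\theta_3 & \cos\theta_3
\end{bmatrix}
\]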
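The angles θ that null A(3,1), A(2,1) and A(3,2) in turn are obtained
through their cosines $c_k$ and sines $s_k$, each computed from the matrix
to which the rotation is applied:
\[ c_1 = \frac{A_1(1,1)}{\sqrt{A_1(1,1)^2 + A_1(3,1)^2}}, \qquad s_1 = \frac{A_1(3,1)}{\sqrt{A_1(1,1)^2 + A_1(3,1)^2}} \]
\[ c_2 = \frac{A_2(1,1)}{\sqrt{A_2(1,1)^2 + A_2(2,1)^2}}, \qquad s_2 = \frac{A_2(2,1)}{\sqrt{A_2(1,1)^2 + A_2(2,1)^2}} \]
\[ c_3 = \frac{A_3(2,2)}{\sqrt{A_3(2,2)^2 + A_3(3,2)^2}}, \qquad s_3 = \frac{A_3(3,2)}{\sqrt{A_3(2,2)^2 + A_3(3,2)^2}} \]
Here $A_1 = A$ is the original matrix and $A_2$, $A_3$ are the intermediate
matrices after the first and second rotations.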
\[ Q = G_1^T G_2^T G_3^T \]
\[ A_2 = G_1 A_1 \]
\[ A_3 = G_2 A_2 \]
\[ R = G_3 A_3 \]
\[ A = QR \]
\[ A^{-1} = (QR)^{-1} = R^{-1} Q^{-1} = R^{-1} Q^T \]
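The rotation sequence can be sketched in software as follows. This is a minimal floating-point Python model, assuming the elimination order of the 3x3 example (A(3,1), then A(2,1), then A(3,2), i.e. the bottom of each column first). The function name `givens_qr` is hypothetical, and the square root is taken with `math.hypot` rather than by CORDIC.

```python
import math


def givens_qr(A):
    """Return (Q, R) with A = Q R, using Givens rotations on a square matrix."""
    n = len(A)
    R = [[float(x) for x in row] for row in A]
    # Qt accumulates the product of all rotations, G_k ... G_2 G_1
    Qt = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for col in range(n):
        for row in range(n - 1, col, -1):
            a, b = R[col][col], R[row][col]
            r = math.hypot(a, b)
            if r == 0.0:
                continue  # element is already zero, no rotation needed
            c, s = a / r, b / r
            # Apply the same rotation to rows `col` and `row` of R and Qt
            for M in (R, Qt):
                for k in range(n):
                    top, bot = M[col][k], M[row][k]
                    M[col][k] = c * top + s * bot
                    M[row][k] = -s * top + c * bot
    # Q is the transpose of the accumulated rotations
    Q = [[Qt[j][i] for j in range(n)] for i in range(n)]
    return Q, R
```

Since R = G3 G2 G1 A, the accumulated product is the transpose of Q, which is why the function transposes it before returning.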
Computing $A^{-1} = R^{-1} Q^T$ therefore necessitates forming the inverse of
the upper triangular matrix and subsequently multiplying it by the transpose
of the orthogonal matrix.
Figure 1: Basic hardware for matrix inversion using QR decomposition. The
G matrix is formed using Givens rotation performed using CORDIC.
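To give a feel for how CORDIC can realise a Givens rotation with shift-and-add steps only, here is a small floating-point Python model. It is a sketch under stated assumptions, not the 16-bit pipelined design mentioned in the report: the names `cordic_vector` and `cordic_rotate` are hypothetical, 16 iterations are assumed, the pivot element x is assumed positive, and real hardware would use fixed-point shifts instead of multiplications by 2^-i.

```python
import math

ITER = 16  # number of micro-rotations (assumed)

# Constant gain accumulated by ITER unscaled micro-rotations
GAIN = 1.0
for _i in range(ITER):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * _i))


def cordic_vector(x, y):
    """Vectoring mode: rotate (x, y) onto the x-axis (assumes x > 0).

    Returns the magnitude and the list of micro-rotation directions.
    """
    dirs = []
    for i in range(ITER):
        d = -1 if y > 0 else 1  # always rotate towards the x-axis
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        dirs.append(d)
    return x / GAIN, dirs


def cordic_rotate(x, y, dirs):
    """Rotation mode: apply the recorded micro-rotations to another pair."""
    for i, d in enumerate(dirs):
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
    return x / GAIN, y / GAIN
```

In a Givens step, the pair formed by the pivot and the element to be nulled would go through `cordic_vector`, and the remaining column pairs of the same two rows would go through `cordic_rotate` with the recorded directions.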
5 Hardware for inversion of an upper triangular matrix (R)
We have designed the hardware for inversion of a generalised N x N upper
triangular matrix R, where
\[
R = \begin{bmatrix}
r_{1,1} & r_{1,2} & \cdots & r_{1,n} \\
0 & r_{2,2} & \cdots & r_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & r_{n,n}
\end{bmatrix}
\]
Let B be $R^{-1}$. The algorithm is as follows:
for (row = 1; row <= n; row++)
    B(row, row) = 1 / R(row, row)
next row
for (row = 1; row <= n; row++)
    for (col = row + 1; col <= n; col++)
        s = 0
        // B(row, k) = 0 for k < row, so the sum can start at k = row
        for (k = row; k <= col - 1; k++)
            s = s + B(row, k) * R(k, col)
        next k
        s = -s / R(col, col)
        B(row, col) = s
    next col
next row
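The same algorithm can be written as a short runnable Python sketch. It uses zero-based indices instead of the one-based indices of the pseudocode, the function name `invert_upper_triangular` is hypothetical, and it assumes that no diagonal element of R is zero.

```python
def invert_upper_triangular(R):
    """Return B = R^{-1} for an upper triangular matrix R (list of rows)."""
    n = len(R)
    B = [[0.0] * n for _ in range(n)]
    # Diagonal of the inverse: reciprocals of the diagonal of R
    for row in range(n):
        B[row][row] = 1.0 / R[row][row]
    # Off-diagonal elements, left to right within each row, from B R = I
    for row in range(n):
        for col in range(row + 1, n):
            s = sum(B[row][k] * R[k][col] for k in range(row, col))
            B[row][col] = -s / R[col][col]
    return B
```

For example, `invert_upper_triangular([[2.0, 1.0], [0.0, 4.0]])` gives `[[0.5, -0.125], [0.0, 0.25]]`.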
We observe that the inverse of an upper triangular matrix is also an upper
triangular matrix, whose diagonal elements are the reciprocals of the
diagonal elements of the original matrix. The other elements of the inverse
are calculated recursively using the algorithm above. An example to
illustrate how the algorithm works is shown below. Let A be an upper
triangular matrix and B be its inverse; then
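\[
A = \begin{bmatrix}
a_{1,1} & a_{1,2} & a_{1,3} & \cdots & a_{1,n} \\
0 & a_{2,2} & a_{2,3} & \cdots & a_{2,n} \\
0 & 0 & a_{3,3} & \cdots & a_{3,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & a_{n,n}
\end{bmatrix},
\qquad
B = \begin{bmatrix}
b_{1,1} & b_{1,2} & b_{1,3} & \cdots & b_{1,n} \\
0 & b_{2,2} & b_{2,3} & \cdots & b_{2,n} \\
0 & 0 & b_{3,3} & \cdots & b_{3,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & b_{n,n}
\end{bmatrix}
\]
Since AB = I,
\[
AB = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}
\]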
Multiplying the i-th row of matrix A with the i-th column of B yields
$a_{i,i} b_{i,i} = 1$. Hence we see that $b_{i,i} = 1 / a_{i,i}$.
Now we solve for the non-diagonal elements of the matrix B. Multiplying the
first row of A with the second column of B gives
$a_{1,1} b_{1,2} + a_{1,2} b_{2,2} = 0$. We already know the value of
$b_{2,2}$, so the only unknown is $b_{1,2}$. In general, to obtain the value
of $b_{i,j}$ we multiply the i-th row of A with the j-th column of B and
equate the result to 0, proceeding in a sequence of steps such that the
values of b needed for the substitution have already been obtained.
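Written out for $i < j$, the row-by-column product gives the following recurrence. This is a worked form of the step described above; the summation limits follow from the triangular shape of A and B.

```latex
\[
\sum_{k=i}^{j} a_{i,k}\, b_{k,j} = 0
\quad\Longrightarrow\quad
b_{i,j} = -\frac{1}{a_{i,i}} \sum_{k=i+1}^{j} a_{i,k}\, b_{k,j}
\]
\[
\text{For } i = 1,\ j = 2: \qquad
b_{1,2} = -\frac{a_{1,2}\, b_{2,2}}{a_{1,1}} = -\frac{a_{1,2}}{a_{1,1}\, a_{2,2}}
\]
```

Because $b_{i,j}$ depends only on entries $b_{k,j}$ with $k > i$ in the same column, one valid ordering is to fill each column of B from the diagonal upward.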
5.1 Storage in a RAM
In any n x n matrix the total number of elements is $n^2$. In the upper
triangular matrix generated here the number of non-zero elements is at most
$n(n+1)/2$, since the remaining $n(n-1)/2$ elements in the bottom-left
triangle are zero. So, to minimise hardware, we have come up with an
algorithm that omits storage of the zeros in the RAM. If the zeros were not
omitted, the position of the element $r_{i,j}$ would be j + (i-1) x n. Since
the zeros are omitted, we need an algorithm that generates the RAM address
for given i, j and n.
5.2 Address generation Mechanism
In the upper triangular matrix $r_{i,j} = 0$ for i > j, so there is no need
to store these zeros individually in the RAM. Instead we omit the zeros and
find the position in the RAM corresponding to the inputs (i, j): given
$r_{i,j}$, our mechanism produces the corresponding RAM location. With the
zeros not stored, the address in the RAM for $r_{i,j}$ is
\[ j + (i-1)\,n - \frac{i(i-1)}{2} \]
This formula follows from the fact that with full storage the address of
the element $r_{i,j}$ would be j + (i-1) x n, whereas here row k omits its
k-1 leading zeros, so the cumulative number of zeros omitted up to and
including row i is
\[ \sum_{k=1}^{i} (k-1) = \frac{i(i-1)}{2} \]
Figure 2: Block diagram of the address generation block
Figure 3: Circuit diagram of the address generation block
Hardware Required :
4 adders/subtractors
2 multipliers
1 bit right shifter
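The address computation can be modelled in a few lines of Python. This sketch assumes one-based indices and addresses, and assumes that row i drops exactly its i-1 leading zeros; the function name `packed_address` is hypothetical. The right shift by one bit stands for the division by two, matching the shifter in the hardware list.

```python
def packed_address(i, j, n):
    """One-based RAM address of r[i][j] (i <= j) with the zeros omitted."""
    if not (1 <= i <= j <= n):
        raise ValueError("only elements on or above the diagonal are stored")
    zeros_skipped = (i * (i - 1)) >> 1  # i(i-1)/2 via a one-bit right shift
    return j + (i - 1) * n - zeros_skipped
```

For a 4x4 matrix the ten stored elements, taken row by row, map to the consecutive addresses 1 to 10.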
5.3 Hardware for finding the inverse of diagonal elements
The following circuit (Figure 4) can be used for inversion of the diagonal
elements of the upper triangular matrix. The circuit consists of a loadable
up counter that counts up to the number of rows in the matrix; a comparator
indicates that the process must stop when the value n is reached. The
counter value is sent to the address generator block of RAM A, and the same
address is then sent to RAM B so that the data is modified at the same
location in both RAM A and RAM B.
Hardware Required :
1 Loadable Up Counter
1 Comparator
1 Inverter Block that computes the inverse of a 16 bit number.
Time Required :
Same as n clock pulses
Figure 4: Schematic hardware design for inversion of diagonal elements
5.4 Hardware for finding the inverse of the other elements
The following circuit (Figure 5) can be used for computing the entries of
the inverse for all elements other than the diagonal elements.
Hardware Required :
3 Loadable Up Counter
4 address generation blocks
1 divider
1 multiplier
4 adders/subtractors
1 Register
Necessary control circuits for termination of loops
No. of clock cycles needed :
Figure 5: Schematic hardware design for inversion of elements other than
those lying in the principal diagonal
6 Conclusions
6.1 Multi PORT RAM for faster performance
One of the obstacles in the way of obtaining high performance in computing
is the memory wall. If the processing elements cannot get data from the
register file (RF) at the processing rate, a bottleneck arises that
adversely affects overall performance. To supply data to the computational
units properly, such a system needs a register file that can meet the
requirements of the different computing units on the FPGA. The demand to
process more data per unit time requires multiple read and write operations
at a time, which can be achieved by using multi-port register files
(MPo-RFs) instead of conventional single-port RFs (SPo-RFs). Multi-ported
memories are challenging to implement on FPGAs since the block RAMs
included in the fabric typically have only two ports. Hence memories
requiring more than two ports must be constructed either out of logic
elements or by combining multiple block RAMs. Some conventional multi-port
register file implementations that can be used:
1. Distributed Memory
2. Replication
3. Banking
4. Multi-pumping
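Of the options listed, replication is the simplest to illustrate. The following Python model is a behavioural sketch only, with a hypothetical class name `ReplicatedRegisterFile`: it assumes one write port and several read ports, and keeps one copy of the data per read port so that every read port has its own bank.

```python
class ReplicatedRegisterFile:
    """Behavioural model of a 1-write / n-read memory built by replication."""

    def __init__(self, depth, read_ports):
        # One full copy of the memory per read port
        self.banks = [[0] * depth for _ in range(read_ports)]

    def write(self, addr, value):
        # The single write port updates every replica so they stay identical
        for bank in self.banks:
            bank[addr] = value

    def read(self, port, addr):
        # Each read port reads from its own replica
        return self.banks[port][addr]
```

The cost of this scheme is storage: the number of block RAMs grows with the number of read ports, and it does not by itself add write ports.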
6.2 Distributed Arithmetic for computing the product
of the two matrices
Distributed arithmetic is a technique developed for the real-time
computation of the inner product of a vector with constant elements and a
vector with varying elements. The inner product is computed without being
split into separate multiplication and addition operations. Instead,
partial inner products of the unchanging vector with bit-slices of the
changing vector are summed and shifted. In the classical scheme all
possible values of the partial inner products are calculated offline and
written into a Look Up Table (LUT). For matrix multiplication the content
of the LUT is computed dynamically in online mode, and it remains
invariable for the period of multiplication of the left matrix by one
column of the right matrix. Despite the need to calculate the contents of
the LUT, the total number of addition micro-operations decreases in
comparison with the classical way of calculating a matrix product.
Figure 6: 4 Read + 1 Write block RAM as an example of Multiport RAM
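The LUT-plus-shift idea can be illustrated with a small Python sketch. It assumes unsigned integer inputs of a known bit width and uses the hypothetical names `build_lut` and `da_inner_product`; signed inputs and the hardware details of the cited architecture are outside its scope.

```python
def build_lut(coeffs):
    """LUT[m] = sum of coeffs[k] over the set bits k of the index m."""
    lut = [0] * (1 << len(coeffs))
    for m in range(1, len(lut)):
        low = m & -m  # lowest set bit of m
        lut[m] = lut[m ^ low] + coeffs[low.bit_length() - 1]
    return lut


def da_inner_product(lut, x, bits):
    """Inner product of the LUT's coefficient vector with unsigned ints x."""
    acc = 0
    for b in range(bits - 1, -1, -1):  # most significant bit-slice first
        index = 0
        for k, xk in enumerate(x):
            index |= ((xk >> b) & 1) << k  # bit b of every element of x
        acc = (acc << 1) + lut[index]      # shift and add, no multiplier
    return acc
```

With coefficients `[3, -1, 4, 2]` and inputs `[5, 7, 0, 9]` at 4 bits, `da_inner_product(build_lut([3, -1, 4, 2]), [5, 7, 0, 9], 4)` returns 26, the same as the direct sum of products. In a matrix product the LUT would be built once per column of the right matrix and reused for every row of the left matrix.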
7 Acknowledgements
The authors would like to thank their Project Guide, Dr. Ayan Banerjee,
for his invaluable suggestions and proper direction throughout the course
of the project. Heartfelt gratitude is also extended to Mr. Anirban
Chakraborty, who is currently pursuing his Ph.D. under the guidance of
Prof. Ayan Banerjee.
References
[1] Gonzalez, R. C., Woods, R. E. (2002). Digital Image Processing. Upper
Saddle River, NJ: Prentice Hall.
[2] Seyid K., Blanc S., Leblebici Y. Hardware Implementation of Real-Time
Multiple Frame Super-Resolution. Very Large Scale Integration (VLSI-SoC),
2015 IFIP/IEEE International Conference on.
[3] Nafiz Ahmed Chisty. Matrix Inversion Using QR Decomposition by Parabolic
Synthesis.
[4] Brown, Robert Grover; Hwang, Patrick Y.C. (1996). Introduction to Random
Signals and Applied Kalman Filtering (3 ed.). New York: John Wiley & Sons.
ISBN 0-471-12839-2.
[5] D. Boulfelfel, R.M. Rangayyan, L.J. Hahn, and R. Kloiber,
"Three-dimensional restoration of single photon emission computed tomography
images", IEEE Transactions on Nuclear Science, 41(5): 1746-1754, October
1994.
[6] Wiener, Norbert (1949). Extrapolation, Interpolation, and Smoothing of
Stationary Time Series. New York: Wiley. ISBN 0-262-73005-7.
[7] Thomas Kailath, Ali H. Sayed, and Babak Hassibi, Linear Estimation,
Prentice-Hall, NJ, 2000, ISBN 978-0-13-022464-4.
[8] Wiener N.: 'The interpolation, extrapolation and smoothing of stationary
time series', Report of the Services 19, Research Project DIC-6037, MIT,
February 1942.
[9] Kolmogorov A.N.: 'Stationary sequences in Hilbert space' (in Russian),
Bull. Moscow Univ. 1941, vol. 2, no. 6, 1-40. English translation in Kailath
T. (ed.), Linear Least Squares Estimation, Dowden, Hutchinson & Ross, 1977.
[10] Vladislav Lesnikov, Tatiana Naumovich, Alexander Chastikov,
"Modification of the architecture of a distributed arithmetic", East-West
Design & Test Symposium (EWDTS), 2015 IEEE, pp. 1-4, 2015.
[11] Álvaro Lopes (Senior Software Engineer, Critical Software). Tips &
Tricks: Creating a 2W+4R FPGA Block RAM, Part 1.
[12] An Efficient FPGA Implementation of Scalable Matrix Inversion Core
using QR Decomposition.

Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxPoojaBan

Recently uploaded (20)

Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdf
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx

Wiener Filter Hardware Realization

  • 1. Wiener Filter Realization using Hardware: QR decomposition of matrices and inversion by Givens’ Rotation. 7th Semester Project Report. Akashdip Das, Abantika Chowdhury, Sayan Chaudhuri. Guide: Dr. Ayan Banerjee. Electronics and Telecommunication Engineering Department. December, 2016
  • 2. Contents
1 Abstract
2 Introduction
3 Wiener Filtering
4 Q-R decomposition of a matrix
5 Hardware for inversion of an upper triangular matrix (R)
5.1 Storage in a RAM
5.2 Address generation Mechanism
5.3 Hardware for finding the inverse of diagonal elements
5.4 Hardware for finding the inverse of the other elements
6 Conclusions
6.1 Multi PORT RAM for faster performance
6.2 Distributed Arithmetic for computing the product of the two matrices
7 Acknowledgements
  • 3. 1 Abstract

Super-resolution reconstruction is a method for reconstructing higher-resolution images from a set of low-resolution observations. The sub-pixel differences among different observations of the same scene make it possible to create higher-resolution images of better quality. In the last thirty years, many methods for creating high-resolution images have been proposed. However, hardware implementations of such methods are limited. Wiener filter design is one of the techniques we will use initially for this process. Wiener filter design involves matrix inversion. A novel method for the matrix inversion is proposed in this report. QR decomposition using Givens Rotation is the computational algorithm used.

2 Introduction

The process of super resolution initially requires that the image be restored from the effects of noise and degradation (assumed isotropic). For that purpose the Wiener Filter is used, which forms an estimate of the image from the degraded one. The fundamentals of Wiener Filtering are discussed in Section 3. Wiener Filtering requires the inverse of a given matrix. The method followed here is QR Decomposition (discussed in Section 4), which factors the matrix into an orthogonal matrix and an upper triangular matrix. The inverse of an orthogonal matrix is obtained simply by computing its transpose, so the remaining work is the inversion of the upper triangular matrix, which is the subject of this report. Various techniques for decomposition of the matrix are discussed in papers [3], [4]. However, the matrix inversion proposed there is not sufficient as a general solution to the problem: the solutions available are illustrated for a specific 3x3 or 4x4 system. In this report we have therefore generalized the inversion to an n x n system. The hardware required for this purpose is developed in Section 5, along with the reasoning behind it. The hardware has scope for enhanced performance, which is discussed in Section 6.
  • 4. 3 Wiener Filtering

In signal processing, the Wiener filter is a filter used to produce an estimate of a desired or target random process by linear time-invariant (LTI) filtering of an observed noisy process, assuming known stationary signal and noise spectra, and additive noise. The Wiener filter minimizes the mean square error between the estimated random process and the desired process.

The goal of the Wiener filter is to compute a statistical estimate of an unknown signal using a related signal as an input and filtering that known signal to produce the estimate as an output. For example, the known signal might consist of an unknown signal of interest that has been corrupted by additive noise. The Wiener filter can be used to filter out the noise from the corrupted signal to provide an estimate of the underlying signal of interest. The Wiener filter is based on a statistical approach, the MMSE (Minimum Mean Square Error) criterion.

The causal finite impulse response (FIR) Wiener filter, instead of using some given data matrix X and output vector Y, finds optimal tap weights by using the statistics of the input and output signals. It populates the input matrix X with estimates of the auto-correlation of the input signal (T) and populates the output vector Y with estimates of the cross-correlation between the output and input signals (V).

In order to derive the coefficients of the Wiener filter, consider the signal w[n] being fed to a Wiener filter of order N with coefficients {a_0, ..., a_N}. The output of the filter is denoted x[n] and is given by the expression

\[ x[n] = \sum_{i=0}^{N} a_i\, w[n-i]. \]

The residual error is denoted e[n] and is defined as e[n] = x[n] − s[n] (see the corresponding block diagram). The Wiener filter is designed so as to minimize the mean square error (MMSE criterion), which can be stated concisely as follows:

\[ a_i = \arg\min E\left[e^2[n]\right], \]

where E[·] denotes the expectation operator. In the general case, the coefficients a_i may be complex and may be derived for the case where w[n] and s[n] are complex as well. With a complex signal, the matrix to be solved is a Hermitian Toeplitz matrix, rather than a symmetric Toeplitz matrix. For simplicity, the following considers only the case where all these quantities are real. The mean square error (MSE) may be rewritten as:
  • 5. \[
\begin{aligned}
E\left[e^2[n]\right] &= E\left[(x[n]-s[n])^2\right] \\
&= E\left[x^2[n]\right] + E\left[s^2[n]\right] - 2E\left[x[n]s[n]\right] \\
&= E\left[\Big(\sum_{j=0}^{N} a_j w[n-j]\Big)^2\right] + E\left[s^2[n]\right] - 2E\left[\sum_{j=0}^{N} a_j w[n-j]\,s[n]\right]
\end{aligned}
\]

To find the vector [a_0, ..., a_N] which minimizes the expression above, calculate its derivative with respect to each a_i:

\[
\begin{aligned}
\frac{\partial}{\partial a_i} E\left[e^2[n]\right]
&= 2E\left[\Big(\sum_{j=0}^{N} a_j w[n-j]\Big) w[n-i]\right] - 2E\left[s[n]\,w[n-i]\right] \\
&= 2\sum_{j=0}^{N} E\left[w[n-j]\,w[n-i]\right] a_j - 2E\left[w[n-i]\,s[n]\right]
\end{aligned}
\]

Assuming that w[n] and s[n] are each stationary and jointly stationary, the sequences R_w[m] and R_ws[m], known respectively as the autocorrelation of w[n] and the cross-correlation between w[n] and s[n], can be defined as follows:

\[ R_w[m] = E\{w[n]\,w[n+m]\}, \qquad R_{ws}[m] = E\{w[n]\,s[n+m]\} \]

The derivative of the MSE may therefore be rewritten as (notice that E[w[n−i] s[n]] = R_ws[i])

\[ \frac{\partial}{\partial a_i} E\left[e^2[n]\right] = 2\sum_{j=0}^{N} R_w[j-i]\,a_j - 2R_{ws}[i], \qquad i = 0, \cdots, N. \]

Letting the derivative be equal to zero results in

\[ \sum_{j=0}^{N} R_w[j-i]\,a_j = R_{ws}[i], \qquad i = 0, \cdots, N, \]

which can be rewritten in matrix form

\[
\underbrace{\begin{bmatrix}
R_w[0] & R_w[1] & \cdots & R_w[N] \\
R_w[1] & R_w[0] & \cdots & R_w[N-1] \\
\vdots & \vdots & \ddots & \vdots \\
R_w[N] & R_w[N-1] & \cdots & R_w[0]
\end{bmatrix}}_{T}
\underbrace{\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_N \end{bmatrix}}_{a}
=
\underbrace{\begin{bmatrix} R_{ws}[0] \\ R_{ws}[1] \\ \vdots \\ R_{ws}[N] \end{bmatrix}}_{v}
\]

These equations are known as the Wiener–Hopf equations.

  • 6. The matrix T appearing in the equation is a symmetric Toeplitz matrix. Under suitable conditions on R_w, such matrices are known to be positive definite and therefore non-singular, yielding a unique solution for the Wiener filter coefficient vector,

\[ a = T^{-1} v \]

It is this equation that makes it necessary to design matrix inversion hardware that is faster than the existing designs, so that there is less delay in image processing, and that generalizes to N x N matrices. In this report the inversion of the matrix is done using QR decomposition by Givens Rotation.

4 Q-R decomposition of a matrix

QR decomposition is one of the most important operations in linear algebra. It can be used to find a matrix inverse, to solve a set of simultaneous equations, and in numerous applications in scientific computing. It is one of the relatively small number of matrix operation primitives from which a wide range of algorithms can be realized. QR decomposition is an elementary operation which decomposes a matrix into an orthogonal and a triangular matrix. The QR decomposition of a real square matrix A is a decomposition of A as A = QR, where Q is an orthogonal matrix (Q^T Q = I) and R is an upper triangular matrix. We can also factor m x n matrices (with m ≥ n) of full rank as the product of an m x n matrix with orthonormal columns (Q^T Q = I) and an n x n upper triangular matrix.

There are different methods which can be used to compute the QR decomposition: Gram-Schmidt ortho-normalization, Householder reflections, and Givens rotations. Each method has a number of advantages and disadvantages because of its specific solution process. The Givens’ Rotation technique is discussed here.

If there are two nonzero vectors, x and y, in a plane, the angle θ between them can be formalized as:

\[ \cos(\theta) = \frac{\langle x, y\rangle}{\|x\|_2\,\|y\|_2} \]

The rotation will be performed using a 16-bit pipelined CORDIC. This formula can be extended to n-dimensional vectors. The angle θ can be defined as
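  • 7. \[ \theta = \arccos\frac{\langle x, y\rangle}{\|x\|_2\,\|y\|_2} \]

The inversion relies on the relations (A^{-1})^{-1} = A, A = QR and I = QQ^T, where R is an upper triangular matrix and Q is an orthogonal matrix.

Consider a 4x4 system:

\[
A = \begin{bmatrix}
a_{1,1} & a_{1,2} & a_{1,3} & a_{1,4} \\
a_{2,1} & a_{2,2} & a_{2,3} & a_{2,4} \\
a_{3,1} & a_{3,2} & a_{3,3} & a_{3,4} \\
a_{4,1} & a_{4,2} & a_{4,3} & a_{4,4}
\end{bmatrix}
\qquad
R = \begin{bmatrix}
r_{1,1} & r_{1,2} & r_{1,3} & r_{1,4} \\
0 & r_{2,2} & r_{2,3} & r_{2,4} \\
0 & 0 & r_{3,3} & r_{3,4} \\
0 & 0 & 0 & r_{4,4}
\end{bmatrix}
\]

The matrix of a Givens Rotation G(i, j, θ), shown here for a rotation in the plane of rows 2 and 3, is

\[
G(i,j,\theta) = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & \cos\theta & \sin\theta & 0 \\
0 & -\sin\theta & \cos\theta & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\]

The Givens Rotation process uses a sequence of rotations, each of which nulls one element below the diagonal of the matrix, until the upper triangular matrix R is formed. The Q matrix is obtained by concatenating all the Givens Rotations. For a 3x3 system, R is found from three rotations, each of which nulls one element. The Givens Rotation matrices needed for a 3x3 system are

\[
G_1 = \begin{bmatrix} c_1 & 0 & s_1 \\ 0 & 1 & 0 \\ -s_1 & 0 & c_1 \end{bmatrix}
\qquad
G_2 = \begin{bmatrix} c_2 & s_2 & 0 \\ -s_2 & c_2 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\qquad
G_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & c_3 & s_3 \\ 0 & -s_3 & c_3 \end{bmatrix}
\]

with c_k = cos(θ_k) and s_k = sin(θ_k). The rotations null A(3,1), A(2,1) and A(3,2) in turn. Each cosine/sine pair is computed from the matrix the rotation is applied to (A_1 = A, A_2 = G_1 A_1, A_3 = G_2 A_2):

\[
c_1 = \frac{A_1(1,1)}{\sqrt{A_1(1,1)^2 + A_1(3,1)^2}}, \qquad
s_1 = \frac{A_1(3,1)}{\sqrt{A_1(1,1)^2 + A_1(3,1)^2}}
\]
\[
c_2 = \frac{A_2(1,1)}{\sqrt{A_2(1,1)^2 + A_2(2,1)^2}}, \qquad
s_2 = \frac{A_2(2,1)}{\sqrt{A_2(1,1)^2 + A_2(2,1)^2}}
\]
\[
c_3 = \frac{A_3(2,2)}{\sqrt{A_3(2,2)^2 + A_3(3,2)^2}}, \qquad
s_3 = \frac{A_3(3,2)}{\sqrt{A_3(2,2)^2 + A_3(3,2)^2}}
\]

  • 8. The rotations are applied in sequence:

\[ A_2 = G_1 A_1, \qquad A_3 = G_2 A_2, \qquad R = G_3 A_3, \qquad Q = G_1^T G_2^T G_3^T \]

so that A = QR and

\[ A^{-1} = (QR)^{-1} = R^{-1} Q^{-1} = R^{-1} Q^T \]

This necessitates the formation of the inverse of the upper triangular matrix and its subsequent multiplication by the transpose of the orthogonal matrix.

Figure 1: Basic hardware for matrix inversion using QR decomposition. The G matrix is formed using Givens Rotation performed using CORDIC.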
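As a software cross-check of the chain A = QR, A^{-1} = R^{-1} Q^T, the steps above can be sketched in plain Python. This is only an illustrative floating-point model, not the CORDIC datapath of Figure 1; the function names `givens_qr` and `solve_via_qr` and the small Toeplitz system in the usage note are assumptions made for this example.

```python
import math


def givens_qr(a):
    """Factor a square matrix as a = q * r using Givens rotations."""
    n = len(a)
    r = [row[:] for row in a]
    # q_t accumulates the product of all rotations, so that q_t * a = r.
    q_t = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for col in range(n - 1):
        # Null the sub-diagonal entries of this column from the bottom up,
        # always rotating against the pivot row `col`.
        for row in range(n - 1, col, -1):
            x, y = r[col][col], r[row][col]
            if y == 0.0:
                continue  # already zero, no rotation needed
            norm = math.hypot(x, y)
            c, s = x / norm, y / norm
            for m in (r, q_t):
                for k in range(n):
                    top, bot = m[col][k], m[row][k]
                    m[col][k] = c * top + s * bot
                    m[row][k] = -s * top + c * bot
    # Q is the transpose of the accumulated rotation product.
    q = [[q_t[j][i] for j in range(n)] for i in range(n)]
    return q, r


def solve_via_qr(t, v):
    """Solve t * a = v as a = R^{-1} Q^T v (back substitution on R)."""
    q, r = givens_qr(t)
    n = len(t)
    y = [sum(q[k][i] * v[k] for k in range(n)) for i in range(n)]
    a = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = y[i] - sum(r[i][k] * a[k] for k in range(i + 1, n))
        a[i] = acc / r[i][i]
    return a
```

For instance, with the made-up symmetric Toeplitz matrix `t = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]` and `v = [4.0, 8.0, 8.0]`, `solve_via_qr(t, v)` should recover coefficients close to `[1, 2, 3]`, up to rounding. The rotation order (element (3,1), then (2,1), then (3,2) for a 3x3 matrix) follows the order used in the text.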
  • 9. 5 Hardware for inversion of an upper triangular matrix (R)

We have designed the hardware for inversion of a generalised N x N upper triangular matrix R, where

\[
R = \begin{bmatrix}
r_{1,1} & r_{1,2} & \cdots & r_{1,n} \\
0 & r_{2,2} & \cdots & r_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & r_{n,n}
\end{bmatrix}
\]

Let B be R^{-1}. The algorithm is as follows (B(row, k) is zero for k < row, so the inner sum starts at k = row):

    for (row = 1; row <= n; row++)
        B(row, row) = 1 / R(row, row)
    for (row = 1; row <= n; row++)
        for (col = row + 1; col <= n; col++)
            s = 0
            for (k = row; k <= col - 1; k++)
                s = s + B(row, k) * R(k, col)
            B(row, col) = -s / R(col, col)

We observe that the inverse of the upper triangular matrix is also an upper triangular matrix, whose diagonal elements are the reciprocals of the diagonal elements of the original matrix. The other elements of the inverse are calculated recursively using the algorithm above. An example to illustrate how the algorithm works is shown below. Let A be an upper triangular matrix and B be its inverse; then

\[
A = \begin{bmatrix}
a_{1,1} & a_{1,2} & a_{1,3} & \cdots & a_{1,n} \\
0 & a_{2,2} & a_{2,3} & \cdots & a_{2,n} \\
0 & 0 & a_{3,3} & \cdots & a_{3,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & a_{n,n}
\end{bmatrix}
\qquad
B = \begin{bmatrix}
b_{1,1} & b_{1,2} & b_{1,3} & \cdots & b_{1,n} \\
0 & b_{2,2} & b_{2,3} & \cdots & b_{2,n} \\
0 & 0 & b_{3,3} & \cdots & b_{3,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & b_{n,n}
\end{bmatrix}
\]

Since AB = I,
  • 10. \[
\begin{bmatrix}
a_{1,1} & a_{1,2} & a_{1,3} & \cdots & a_{1,n} \\
0 & a_{2,2} & a_{2,3} & \cdots & a_{2,n} \\
0 & 0 & a_{3,3} & \cdots & a_{3,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & a_{n,n}
\end{bmatrix}
\begin{bmatrix}
b_{1,1} & b_{1,2} & b_{1,3} & \cdots & b_{1,n} \\
0 & b_{2,2} & b_{2,3} & \cdots & b_{2,n} \\
0 & 0 & b_{3,3} & \cdots & b_{3,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & b_{n,n}
\end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}
\]

Multiplying the i-th row of matrix A with the i-th column of B yields a_{i,i} b_{i,i} = 1. Hence we see that

\[ b_{i,i} = \frac{1}{a_{i,i}} \]

Now we solve for the non-diagonal elements of the matrix B. We first multiply the first row of A and the second column of B to get a_{1,1} b_{1,2} + a_{1,2} b_{2,2} = 0. We already know the value of b_{2,2}, so the only unknown is b_{1,2}. In general, to obtain the value of b_{i,j} we multiply the i-th row of A and the j-th column of B and equate the result to 0, proceeding in a sequence of steps chosen so that the values of b needed for the substitution have already been obtained.

5.1 Storage in a RAM

In any n x n matrix the total number of elements is n^2. In the upper triangular matrix generated here, the n(n−1)/2 elements in the bottom-left triangle are zero, so at most n(n+1)/2 elements are non-zero. To minimise hardware we have therefore come up with an algorithm that omits storage of the zeros in the RAM. If the zeros were not omitted, the position of the element r_{i,j} would be j + (i−1) x n. Since they are omitted, we need an algorithm to generate the RAM location address for given i, j and n.

5.2 Address generation Mechanism

In the upper triangular matrix r_{i,j} = 0 for i > j, so there is no need to store these zeros individually in the RAM. Instead we omit the zeros and find the position in the RAM corresponding to the inputs (i, j): given r_{i,j}, our mechanism returns the corresponding location in a RAM where zeros are not stored. The address in the RAM for r_{i,j} is

\[ n(i-1) + j - \frac{i(i-1)}{2} - 1. \]

This formula is obtained from the fact that with full storage the element r_{i,j} would be at position j + (i−1) x n, but here row k omits its k−1 leading zeros, so the cumulative number of zeros omitted up to and including row i is

\[ \sum_{k=1}^{i-1} k = \frac{i(i-1)}{2}. \]

The final −1 makes the address start from 0.

  • 11. Figure 2: Block diagram of the address generation block

Figure 3: Circuit diagram of the address generation block

Hardware Required:
  • 12. 4 adders/subtractors, 2 multipliers, 1 right shifter (by one bit)

5.3 Hardware for finding the inverse of diagonal elements

The following circuit (Figure 4) can be used for inversion of the diagonal elements of the upper triangular matrix. The circuit consists of a loadable up counter that counts up to the number of rows in the matrix, and a comparator that indicates that the process must stop when the value n is reached. The counter value is sent to the address generator block of RAM A, and the same address is then sent to RAM B, so that the data is modified at the same location in both RAM A and RAM B.

Hardware Required: 1 loadable up counter, 1 comparator, 1 inverter block that computes the reciprocal of a 16-bit number.

Time Required: n clock pulses
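The packed storage of Sections 5.1 and 5.2 and the diagonal pass of Section 5.3 can be modelled in a few lines of Python. This is a behavioural sketch only, assuming 1-based (i, j) as in the text and a 0-based RAM address; the names `packed_address`, `pack_upper` and `invert_diagonal`, and the use of Python lists as stand-ins for RAM A and RAM B, are assumptions made for illustration.

```python
def packed_address(i, j, n):
    """0-based address of r(i, j), 1 <= i <= j <= n, when zeros are not stored."""
    if not 1 <= i <= j <= n:
        raise ValueError("only elements with 1 <= i <= j <= n are stored")
    # n(i-1) + j - i(i-1)/2 - 1; i(i-1) is always even, so the
    # division by two is an exact one-bit right shift.
    return n * (i - 1) + j - ((i * (i - 1)) >> 1) - 1


def pack_upper(r):
    """Store the upper triangle of r row by row in a list of n(n+1)/2 words."""
    n = len(r)
    ram = [0.0] * (n * (n + 1) // 2)
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            ram[packed_address(i, j, n)] = r[i - 1][j - 1]
    return ram


def invert_diagonal(ram_a, n):
    """Write 1 / r(row, row) into RAM B at the same address as in RAM A."""
    ram_b = [0.0] * len(ram_a)
    for row in range(1, n + 1):  # the up counter: one diagonal entry per step
        addr = packed_address(row, row, n)
        ram_b[addr] = 1.0 / ram_a[addr]
    return ram_b
```

For n = 4 the addresses generated in row-major order are 0, 1, ..., 9 with no gaps, which is a quick way to check that the formula omits exactly the zeros below the diagonal.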
  • 13. Figure 4: Schematic hardware design for inversion of diagonal elements

5.4 Hardware for finding the inverse of the other elements

The following circuit (Figure 5) can be used for computing all elements of the inverse other than the diagonal elements.

Hardware Required: 3 loadable up counters, 4 address generation blocks, 1 divider, 1 multiplier, 4 adders/subtractors, 1 register, and the control circuits necessary for termination of the loops.

No. of clock cycles needed: O(n^2)
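The recursion of Section 5 that this hardware carries out can be checked in software with a direct transcription of the algorithm. The sketch below is a floating-point reference model using nested Python lists rather than the packed RAMs; the function name `invert_upper_triangular` is an assumption made for this example.

```python
def invert_upper_triangular(r):
    """Return B = R^{-1} for an upper triangular matrix R with non-zero diagonal."""
    n = len(r)
    b = [[0.0] * n for _ in range(n)]
    # Diagonal elements: reciprocals of the diagonal of R.
    for row in range(n):
        b[row][row] = 1.0 / r[row][row]
    # Off-diagonal elements, left to right within each row, so that every
    # b[row][k] used in the sum has already been computed.
    for row in range(n):
        for col in range(row + 1, n):
            s = 0.0
            for k in range(row, col):
                s += b[row][k] * r[k][col]
            b[row][col] = -s / r[col][col]
    return b
```

With the made-up matrix `[[2, 1, 3], [0, 4, 5], [0, 0, 8]]` the diagonal of the result is `0.5, 0.25, 0.125`, and multiplying the result by the original matrix gives the identity, which is the check used to validate the loop ordering.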
  • 14. Figure 5: Schematic hardware design for inversion of elements other than those lying in the principal diagonal
  • 15. 6 Conclusions

6.1 Multi PORT RAM for faster performance

One of the obstacles to obtaining high performance in computing is the memory wall. If the processing elements cannot get data from the register file (RF) at the processing rate, a bottleneck arises that adversely affects overall performance. To supply data properly to the computational units, such a system needs a register file that can meet the requirements of the different computing units on the FPGA. The demand to process more data per unit time requires multiple read and write operations at a time, which can be achieved by using multi-port register files (MPo-RFs) instead of conventional single-port RFs (SPo-RFs). Multi-ported memories are challenging to implement on FPGAs, since the block RAMs included in the fabric typically have only two ports. Hence memories requiring more than two ports must be constructed either out of logic elements or by combining multiple block RAMs.

Some conventional multi-port register file implementations that can be used:
1. Distributed Memory
2. Replication
3. Banking
4. Multi-pumping

6.2 Distributed Arithmetic for computing the product of the two matrices

Distributed arithmetic is a technique developed for the real-time computation of the inner product of a vector with constant elements and a vector with varying coefficients. The inner product is computed without splitting it into separate multiplication and addition operations. Instead, the calculation sums and shifts the inner products of the unchanging vector with bit-slices of the changing vector. In the classical scheme, all possible values of the partial inner products are calculated offline and written into a Look Up Table (LUT). For the matrix product considered here, the content of the LUT is instead computed dynamically in online mode, and the contents of this memory remain invariable for the period of multiplication of the left matrix by a column of the right matrix. Despite the need to calculate the contents of the LUT, the total number of addition micro-operations decreases in comparison with the classical way of calculating a matrix product.

  • 16. Figure 6: 4 Read + 1 Write block RAM as an example of Multiport RAM
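The bit-slice idea behind distributed arithmetic can be illustrated with a small Python model. It is a sketch under simplifying assumptions that are not taken from the report: the varying vector holds unsigned integers of a known bit width, the fixed vector holds arbitrary integers, and the names `build_lut` and `da_inner_product` are hypothetical.

```python
def build_lut(coeffs):
    """LUT entry `addr` holds the sum of the coefficients selected by the bits of addr."""
    k = len(coeffs)
    return [
        sum(c for bit, c in enumerate(coeffs) if (addr >> bit) & 1)
        for addr in range(1 << k)
    ]


def da_inner_product(lut, xs, width):
    """Inner product of the LUT's coefficients with unsigned `width`-bit values xs."""
    acc = 0
    for b in range(width):
        # Bit-slice b: bit b of every element forms the LUT address.
        addr = 0
        for idx, x in enumerate(xs):
            addr |= ((x >> b) & 1) << idx
        # Shift-and-add replaces the explicit multiplications.
        acc += lut[addr] << b
    return acc
```

With `lut = build_lut([3, -1, 4, 2])`, the call `da_inner_product(lut, [5, 7, 1, 6], 3)` gives the same value as the direct sum 3·5 − 1·7 + 4·1 + 2·6 = 24, using three table look-ups and three shifted additions. The table is rebuilt only when the fixed vector changes, which corresponds to the LUT contents staying invariable while one column of the right matrix is processed.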
  • 17. 7 Acknowledgements

The authors would like to thank their Project Guide, Dr. Ayan Banerjee, for his invaluable suggestions and direction throughout the course of the project. Heartfelt gratitude is also extended to Mr. Anirban Chakraborty, who is currently pursuing his Ph.D. under the guidance of Prof. Ayan Banerjee.

References

[1] Gonzalez, R. C., Woods, R. E. (2002). Digital Image Processing. Upper Saddle River, NJ: Prentice Hall.
[2] Seyid K., Blanc S., Leblebici Y. "Hardware Implementation of Real-Time Multiple Frame Super-Resolution", Very Large Scale Integration (VLSI-SoC), 2015 IFIP/IEEE International Conference on.
[3] Nafiz Ahmed Chisty. Matrix Inversion Using QR Decomposition by Parabolic Synthesis.
[4] Brown, Robert Grover; Hwang, Patrick Y.C. (1996). Introduction to Random Signals and Applied Kalman Filtering (3 ed.). New York: John Wiley & Sons. ISBN 0-471-12839-2.
[5] D. Boulfelfel, R.M. Rangayyan, L.J. Hahn, and R. Kloiber, "Three-dimensional restoration of single photon emission computed tomography images", IEEE Transactions on Nuclear Science, 41(5): 1746-1754, October 1994.
[6] Wiener, Norbert (1949). Extrapolation, Interpolation, and Smoothing of Stationary Time Series. New York: Wiley. ISBN 0-262-73005-7.
[7] Thomas Kailath, Ali H. Sayed, and Babak Hassibi, Linear Estimation, Prentice-Hall, NJ, 2000, ISBN 978-0-13-022464-4.
[8] Wiener N., "The interpolation, extrapolation and smoothing of stationary time series", Report of the Services 19, Research Project DIC-6037, MIT, February 1942.

  • 18. [9] Kolmogorov A.N., "Stationary sequences in Hilbert space" (in Russian), Bull. Moscow Univ. 1941, vol. 2, no. 6, 1-40. English translation in Kailath T. (ed.), Linear Least Squares Estimation, Dowden, Hutchinson & Ross, 1977.
[10] Vladislav Lesnikov, Tatiana Naumovich, Alexander Chastikov, "Modification of the architecture of a distributed arithmetic", East-West Design & Test Symposium (EWDTS), 2015 IEEE, pp. 1-4, 2015.
[11] Álvaro Lopes (Senior Software Engineer, Critical Software), Tips & Tricks: Creating a 2W+4R FPGA Block RAM, Part 1.
[12] An Efficient FPGA Implementation of Scalable Matrix Inversion Core using QR Decomposition.