# AN EFFICIENT MULTIPLE CLOCK CYCLE REAL-TIME IMPLEMENTATION OF THE SIGNAL DEPENDENT METHOD FOR TIME-FREQUENCY ANALYSIS

Veselin N. Ivanović Dept. of Electrical Engineering, University of Montenegro e-mail: very@cg.ac.yu web: <u>http://www.tfsa.cg.yu</u>

**Abstract** – Multiple clock cycle implementation of a system for time-frequency (TF) analysis is presented. Proposed architecture realizes a signal dependent method allowing it to take different numbers of clock cycles for achieving the desired signal concentration in different time-instants, depending on the analyzed signal shape. Also, this architecture allows signal dependent method to share functional units within a single execution. These abilities represent the major advantages of multicycle design and they help reduce both hardware complexity and the implementation cost.

## 1. INTRODUCTION AND THE PROBLEM FORMULATION

S-method (SM) for TF analysis alleviates serious drawbacks of the commonly used TF distributions (TFDs), spectrogram (SPEC) and pseudo Wigner distribution (WD): low concentration in the TF plane and generation of cross-terms in the case of multicomponent signals analysis, respectively, [4]. The SM definition is, [4,5,6]:

$$SM(n,k) = \sum_{i=-L(n,k)}^{L(n,k)} P_{(n,k)}(i)F(n,k+i)F^{*}(n,k-i), \qquad (1)$$

where F(n,k) represents the short-time Fourier transform (STFT) of the analyzed signal f(n), 2L(n,k)+1 is the width of a finite frequency domain (convolution) rectangular window  $P_{(n,k)}(i), P_{(n,k)}(i)=0$ , for |i|>L(n,k), which is limited (without loss of generality) by the assumed maximal possible width  $2L_{\max}+1$ ,  $L_{\max}\leq N/2$  and the signal's duration is  $N=2^{m}$ . Definition (1), based on STFT, makes the SM very attractive for implementation. However, all TFDs, beyond the STFT, are numerically quite complex and require significant calculation time. This fact makes them unsuitable for realtime analysis, and severely restricts their application. Hardware implementations, when they are possible, can overcome this problem and enable application of these methods in numerous additional problems in practice. Some simple implementations of the systems for TF analysis are presented in [2,3,6,7,9,10]. They give desired TFD in one clock cycle. This means that no architecture resource can be used more than once, and that any element needed more than once must be duplicated.

SM produces, as its marginal cases, the WD and the SPEC with maximal (L(n,k)=N/2), and minimal (L(n,k)=0) convolution window width, respectively. In the case of a multicomponent signal with non-overlapping components, by an appropriate window  $P_{(n,k)}(i)$  width selection the SM can produce a sum of the WDs of individual signal components,

avoiding cross-terms, [4,6,8]:  $P_{(n,k)}(i)$  should be wide enough to enable complete integration over the auto-terms, but narrower than the distance between two auto-terms. In addition, the SM produces better results than the SPEC and the WD, regarding calculation complexity and noise influence, [4,5].

Two possibilities for the SM (1) implementation are:

1) With a signal independent (constant) convolution window width L(n,k), L(n,k)=L=const, [4,6,7], when, in order to get the WD for each component, the convolution window width L(n,k) should be such that 2L+1 is equal to the width of the widest auto-term. But, for the entire TF plane, except at the central points of the widest component, this window should be too long;

2) With a signal dependent L(n,k) (so called signal dependent SM), [8]. In this case, summation in (1) is performed until zero value of F(n,k+i) or F(n,k-i) is detected. Practically it means that the absolute square value of F(n,k+i) or F(n,k-i) is smaller than an assumed reference level, which is defined as a few percents of the spectrogram's maximal value at a considered time-instant n,  $R_n^2 = \max_k S(n,k) / Q^2$ , [6,7], where S(n,k) is the SPEC of analyzed signal and  $1 \le Q < \infty$ . In the sequel, the signals that determine nonzero values of  $F(n,k\pm i)$  (i=0,1,...,L(n,k)) will be denoted by  $x_{\pm i}$ :  $x_{\pm i}=1$  if  $|F(n,k\pm i)|^2 > R_n^2$ , and  $x_{\pm i}=0$  otherwise. Note that the signal dependent form may further significantly improve the essential SM properties (auto-terms concentration, cross-terms reduction and noise influence suppression), [5,8].

In order to involve only real multiplications in (1), we modify it by using  $F(n,k)=F_{\text{Re}}(n,k)+jF_{\text{Im}}(n,k)$  ( $F_{\text{Re}}(n,k)$  and  $F_{\text{Im}}(n,k)$  are the real and imaginary part of F(n,k), respectively), as:

$$SM_R(n,k) = F_{\rm Re}^2(n,k) + 2\sum_{i=1}^{L(n,k)} F_{\rm Re}(n,k+i)F_{\rm Re}(n,k-i),$$
 (2)

$$SM_{I}(n,k) = F_{\rm Im}^{2}(n,k) + 2\sum_{i=1}^{L(n,k)} F_{\rm Im}(n,k+i)F_{\rm Im}(n,k-i), \quad (3)$$

where  $SM(n,k)=SM_R(n,k)+SM_I(n,k)$ . The *k*-th channel, one of *N* channels (obtained for k=0,1,...,N-1), is described by (2)-(3). Note that it will consist of two identical sub-channels used for processing of  $F_{\text{Re}}(n,k)$  and  $F_{\text{Im}}(n,k)$ , respectively.

## 2. MULTICYCLE HARDWARE IMPLEMENTATION

The hardware necessary for one channel multicycle implementation of the signal dependent SM is presented in Fig.1.



Fig. 1. Architecture for the multicycle implementation of the signal dependent S-method.

Proposed hardware is designed based on a two-block structure. The first block is used for the STFT implementation, whereas the second block is used to modify the outputs of the STFT block in order to obtain the improved TFD concentration based on the SM. The STFT block can be implemented by using the available FFT chips, or by using approaches based on the recursive algorithm, [1,2,6,7,10]. The second block is designed so that it realizes each summation term from the eqs.(2)-(3) in the corresponding step of the method implementation.

In order to accommodate hardware for signal dependent window width, we add two N/2-input multiplexors to generate SignDep(endent) control signal, which determines whether or not the *i*-th term enters the summation in (2)-(3). With the zero value of the SignDep control signal, adding the new term to the calculated SM value (and stored into the SMStore temporary register) is disabled, since the additional improvement of the TFD concentration is impossable. It takes different values in different steps defined as:

$$SignDep = x_i \cdot x_{-i}, \ i = 0, 1, 2, ..., L_{\max}.$$
 (4)

The circuit needed to generate signal  $x_i$  is separated with dashed lines and represented in Fig.1.

We break the SM execution into several steps, each taking one clock cycle. Our goal in breaking the execution into clock cycles should be to balance the amount of work done in each cycle, so that we minimize the clock cycle time. In the first step, the STFT is calculated and the signals  $x_i$ ,  $i=\pm 1,\pm 2,...,\pm (N/2-1)$  are set. In the second step the SPEC is calculated based on the first step execution. These steps have to be executed since SPEC value should be forwarded to the output anyway. Namely, even if  $|F(n,k)|^2 \leq R_n^2$ , for all k, i.e.  $x_0=0$  (practically, these are points (n,k) with no signal) the convolution window width takes zero value, and than the SM takes its marginal form - SPEC, [4,5]. Execution of the second step is provided by setting the unit value to the first input of the N/2-input multiplexors instead of  $x_0$ , so  $SignDep\equiv1$  in the second - SPEC completion step.

Table I. Total number of functional units per channel in an SM block and the clock cycle time in the cases of: a) single-cycle implementation (SCI) and b) the multicycle implementation (MCI).  $T_m$  is the multiplication time of a two-input 16-bit multiplier,  $T_a$  is the addition time of a two-input 16-bit adder, whereas  $T_s$  is the time for 1-bit shift.

| Implementation | Adders          | Multipliers       | Shift Left Registers | Clock cycle time                 |
|----------------|-----------------|-------------------|----------------------|----------------------------------|
| SCI            | $2L_{\max} + 1$ | $2(L_{\max} + 1)$ | $4L_{\rm max}$       | $2T_m + (L_{\max} + 3)T_a + T_s$ |
| MCI            | 3               | 2                 | 2                    | $T_m + 2T_a + T_s$               |

Table II. The function of each of the control signal generated by the Control logic.

| Control signal | Effect                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| SelSTFT        | (m-1)-bit signal which controls $N/2$ -input multiplexors (two of them per subchannel are introduced to select between the STFT values from different channels, whereas two multiplexors are introduced to generate <i>SignDep</i> control signal in the corresponding step).                                                                                                                                                                                         |
| SPECorSM       | 1-bit signal with the following functions: 1. Enables use of the Shift Left register in the corresponding steps (when we need to implement multiplication by 2), or disables this (in the second step); 2. Enables use of only one adder per sub-channel for implementing sums in eqs.(2)-(3) by controlling its second input, which can be either the constant 0 (in the second step) or a register <i>Real</i> (or <i>Imag</i> ) value (in each higher order step). |
| SignLoad       | 1-bit signal which enables sampling of the analyzed analog signal $f(t)$ , but only after execution of the desired TFD of the analyzed signal samples from the preceding time instant.                                                                                                                                                                                                                                                                                |
| SMWriteCond    | 1-bit signal which enables, together with the <i>SignDep</i> control signal, saving the computed SM value into the <i>SMStore</i> temporary register.                                                                                                                                                                                                                                                                                                                 |

In different time-instants the proposed architecture executes the same number  $(L_{\max}+1)$  of steps, since the sampling rate of the analyzed analog signal f(t) assumes  $(L_{\max}+1)T_c$ , where  $T_c$  is the clock cycle time. Generally, the SM(n,k) value is calculated in the *L*-th step, where  $L \leq L_{\max}+1$ , and held into the *SMStore* temporary register.

Each sub-channel of the second block contains exactly one adder, one multiplier, and one shift left register for implementation of eqs.(2)-(3). Because we need to use these functional units with different inputs in later steps, we must save the computed values, based on eqs.(2)-(3), into a temporary registers that we name Real and Imag, respectively. In order to share functional units from the second block for different inputs in different clock cycles, we need to add multiplexors and/or a demultiplexor at their inputs. In the first step only the STFT block of the proposed two-block architecture is used, whereas in the other steps only the second block is used. This will be regulated by the set of control signals introduced on temporary registers, and multiplexors and a demultiplexor, see Table II. By introducing the temporary registers and several multiplexors at the inputs of the functional units, we achieve the required reduction of the amount of hardware compared to a singlecycle architecture, [6,7] and Table I. Note that the achieved hardware reduction is significant. Since temporary registers and the introduced multiplexors are fairly small, this could yield a substantial reduction in the hardware cost.

The longest path in the second block is one that connects the inputs  $F_{\text{Re}}(n,k)$  (or  $F_{\text{Im}}(n,k)$ ), through one multiplier, one shift left register and 2 adders, with the output of the second block. It can be the longest path in the STFT block, as well (when the STFT block is realized based on the recursive algorithm, [6,7]). This path determines the clock cycle time and than the fastest sampling rate. This design can be implemented as an ASIC chip to meet the speed and performance demands of very fast real-time applications.

**Defining the Control.** From the defined multi-step sequence of the signal dependent SM multicycle execution we can determine what the Control logic must do at each clock cycle. It can set all but one of the control signals, based solely on the distribution code (TFDcode). The register *SMStore* write control signal is the exception. To generate it we will need to AND together a *SMWriteCond* signal from the control unit, with *SignDep* control signal.

Control for the multicycle architecture must specify both the signals to be set in any step and the next step in the sequence. Here, we use finite state Moore machine to specify the multicycle control, Fig.2. Note that the implementation of a finite state machine usually assumes that all outputs, that are not explicitly asserted, are deasserted. Finite state Control essentially corresponds to the steps of signal dependent SM execution; each state in the finite state machine will take one clock cycle.

## **3. CONCLUSION**

The system for multiple clock cycle implementation of the signal dependent TF algorithm is presented. The proposed implementation allows a functional unit to be used more than once per TFDs execution, as long as it is used on different clock cycles, and, consequently, enables a significant reduction of hardware complexity and cost. The major advantages of the proposed design are the ability to allow the method to take different number of clock cycles in order to achieve the desired TFD concentration in different time instants and the ability to share functional units within the execution.



Fig. 2. The finite state machine Control for the multicycle hardware implementation of the signal dependent SM from Fig.1.

## 4. REFERENCES

- [1] M.G. Amin, "A new approach to recursive Fourier transform," *Proc. IEEE*, vol. 75, 1987, pp. 1357-1358.
- [2] K.J.R. Liu, "Novel parallel architectures for Short-time Fourier transform," *IEEE Trans. on Circuits and Systems-II*, vol. 40, no. 12, 1993, pp. 786-789.
- [3] B. Boashash, and J.B. Black, "An efficient real time implementation of the Wigner-Ville distribution," *IEEE Trans. Acoust., Speech, Signal Processing*, vol. 35, no. 11, 1987, pp. 1611-1618.
- [4] V.N. Ivanović, and LJ. Stanković, "Multiple clock cycle real-time implementation of a system for time-frequency analysis," in *Proceedings of the 12th EUSIPCO*, Viena, Austria, Sept.2004, pp. 1633-1636.
- [5] V.N. Ivanović, and LJ. Stanković, "The multiple clock cycle implementation of the flexible special purpose hardware for time-frequency analysis," *The Journal of VLSI Signal Processing*, submitted.
- [6] D.A. Patterson, and J.L. Hennessy, Computer organization and design, the hardware/software interface, Morgan Kaufmann Publishers, San Mateo, California, 1994.
- [7] LJ. Stanković, "A method for time-frequency analysis," *IEEE Trans. on Signal Processing*, vol. 42, no. 1, 1994, pp. 225-229.
- [8] LJ. Stanković, V.N. Ivanović, and Z. Petrović, "Unified approach to the noise analysis in the Wigner distribution and spectrogram," *Annales des Telecommunications*, vol. 51, no. 11-12, 1996, pp. 585-594.
- [9] S. Stanković, and LJ. Stanković, "An architecture for the realization of a system for time-frequency analysis," *IEEE Trans. on Circuits and Systems-II*, vol. 44, no. 7, 1997, pp. 600-604.
- [10] S. Stanković, LJ. Stanković, V.N. Ivanović, and R. Stojanović, "An architecture for the VLSI design of

systems for time-frequency analysis and time-varyin filtering," *Annales des Telecommunications*, vol. 57, no. 9-10, 2002, pp. 974-995.

- [11] LJ. Stanković, and J.F. Bohme, "Time-frequency analysis of multiple resonances in combustion engine signals," *Signal Processing*, vol. 79, no. 1, Nov. 1999, pp. 15-28.
- [12] K. Maharatna, A.S. Dhar, and S. Banerjee, "A VLSI array architecture for realization of DFT, DHT, DCT and DST," *Signal Processing*, vol. 81, no. 9, 2001, pp. 1813-1822.
- [13] K.J.R. Liu, and C.T. Chiu, "Unified parallel lattice structures for time-recursive discrete cosine/sine/Hartley transforms," *IEEE Trans. on Signal Processing*, vol. 41, no. 3, 1993, pp. 1357-1377.

Sažetak: U radu je prezentirana implementacija sistema za vremensko-frekvencijsku analizu kojim se generiše željena reprezentacija signala u više taktnih intervala. Predložena arhitektura realizuje od signala zavisni metod omogućavajući mu dostizanje maksimalne koncentracije analiziranog signala u različitim trenucima, zavisno od oblika samog signala. Željena koncentracija se postiže za različit broj taktnih intervala u različitim trenucima. Istovremeno, ova arhitektura dozvoljava od signala zavisnom metodu da sharuje upotrijebljene funkcionalne jedinice za vrijeme izvršavanja. Ove mogućnosti predstavljaju glavne prednosti predloženog dizajna i pomažu u redukovanju složenosti hardwarea i njegove cijene.

## EFIKASNA IMPLEMENTACIJA OD SIGNALA ZAVISNOG METODA ZA VREMENSKO-FREKVENCIJSKU ANALIZU U VIŠE TAKTNIH INTERVALA

Veselin N. Ivanović