A Machine Learning Approach for Speech Detection in Modern Wireless Communication Environment

Modern wireless communication holds a far more prominent position than in the past, and speech communication is a major focus of research in its applications. Many developments have been made in this field. In this work, we choose an OFDM-based communication system, because OFDM is important on both licensed and unlicensed wireless platforms. The voice signal is passed through the proposed model and recovered at the receiver end. Owing to channel and environmental conditions, the signal may be partially corrupted at the user end. The authors aim to obtain a better received signal using a radial basis function network (RBFN) model. The input parameters chosen for the RBFN model are the energy, zero-crossing rate (ZCR), autocorrelation function (ACF) and fundamental frequency of the speech signal. On the one hand, these parameters can partially eliminate noise; on the other, the RBFN model using these parameters proves effective for noisy speech signals transmitted over a noisy (Gaussian) channel. The OFDM model is verified in terms of symbol error rate, and the transmitted speech signal is evaluated in terms of SNR, which shows the reduction of noise. For visual inspection, a sample of the original signal, the noisy signal and the received signal is also shown. The experiment is performed at noise levels of 5 dB, 10 dB and 15 dB. The results demonstrate the performance of the RBFN model as a filter. Performance is also assessed by listening to the received voice in each condition. The results show that, for speech in a noisy environment, the proposed technique improves intelligibility and speech quality.


Introduction
The wireless and cellular concept was developed in the 1970s and became more popular than expected at the time. Since then, interest in this area has kept growing. Clarity at the receiver end is essential. Communication techniques have developed steadily along with the types of modulation, and researchers focus on capacity as well as reception capability [1]. Technologies such as OFDM, MIMO and advanced coding are studied for higher bandwidth utilization with faster response. Congestion in the licensed spectrum is gradually increasing, which has led researchers to develop techniques for using the unlicensed spectrum [2][3].
Wireless communication involves many different factors, such as the environment, channel conditions, types of speakers and the way the signal is transmitted through the channel. Research in speech technology therefore requires adapting the way of transmission and reception to the specific type of communication [4].
Speech is a common mode of communication between human beings and an important type of data for communication networks. Accurate detection and recognition are the most important criteria. For successful wireless communication, human speech must be heard and produced clearly, and matching across the network must be maintained. In such settings, synthetic speech has also been examined in many problems [5][6][7].
Speech has the potential to become an important mode of interaction with computers. For error-free communication and noiseless signal reception, the speech signal is first processed for noise suppression and enhancement, which guides the choice of digital processing technique [8][9].
Natural speech reception through human-computer interaction (HCI) at the user end is a challenge for the next stage of technological development. Speech processing is an exciting area of research in signal processing and is a type of pattern recognition problem. The features chosen must be relevant to the purpose of detection, and they must be managed carefully during training and testing when machine learning techniques are used.
Different technologies are used for faster communication in both licensed and unlicensed spectrum. These are:
(i) OFDM, a frequency-division multiplexing (FDM) scheme that uses digital multicarrier modulation. The data are split into several parallel streams, one per subcarrier, and each subcarrier is modulated with a scheme such as QAM or a variant of PSK.
(ii) MIMO, which uses multiple antennas at both the transmitter and the receiver to improve performance. It increases data throughput and link range without additional bandwidth.
(iii) Turbo codes, a class of high-performance error-correction codes developed in 1993. Coding is one of the significant technologies in communication, since it allows the maximum amount of information to be transmitted with no errors or relatively few errors.
Successive generations of wireless communication offer advantages such as higher bandwidth and better response time; for example, systems operating at 2.6 GHz can provide service from the same towers. Each new generation offers greater flexibility than the technologies already in place [10]. New, lower-cost technology requires simpler hardware with an effective design, so that a single handset remains versatile and offers better reception across different generations.
The paper is organized as follows. Section 1 introduces the work. Section 2 presents the proposed methodology: it explains the principle of the speech communication model through which the speech is transmitted, and it describes the Radial Basis Function Network (RBFN) used for detection, together with its parameters, as the problem formulation. Section 3 presents the results and Section 4 concludes the work.

Methodology
An important mechanism for processing the received signal is source separation, which can retain the time-frequency regions where the speech signal is least distorted and remove the rest [11]. To increase the success of communication, adaptation is required at different levels, such as subject, place and vocabulary. In this context the Lombard effect [5] can be analyzed: researchers have studied Lombard speech parameters such as intensity, vowel duration, speaking rate, energy distribution, spectral tilt and formant frequencies.

OFDM System
OFDM is used because of its high transmission capacity and bandwidth efficiency in wireless communication over both licensed and unlicensed spectrum. It is a multicarrier technique that transmits data over many low-rate subcarriers, and the orthogonal subcarriers are generated efficiently using the Fast Fourier Transform (FFT) [12]. The serial data are converted into parallel streams and grouped, and each group is modulated using quadrature amplitude modulation (QAM), quadrature phase shift keying (QPSK) or binary phase shift keying (BPSK). The resulting spectrum is then converted back to a time-domain signal using the Inverse Fast Fourier Transform (IFFT), and the parallel samples are converted to serial form for transmission. At the receiver the process is reversed: serial-to-parallel conversion, FFT and demodulation. The system is designed assuming a Gaussian noise channel [13].
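The transmit and receive chain described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' simulation: QPSK with Gray mapping, 64 subcarriers, a 16-sample cyclic prefix (a standard OFDM component not mentioned in the text) and an ideal AWGN channel are all assumptions, and `ofdm_ser` is a hypothetical helper that returns the symbol error rate.

```python
import numpy as np

def qpsk_mod(bits):
    # Map bit pairs to unit-energy Gray-coded QPSK symbols (bit 0 -> +1, bit 1 -> -1)
    b = bits.reshape(-1, 2)
    return ((1 - 2 * b[:, 0]) + 1j * (1 - 2 * b[:, 1])) / np.sqrt(2)

def qpsk_demod(symbols):
    # Hard decision on the sign of the in-phase and quadrature parts
    bits = np.empty((symbols.size, 2), dtype=int)
    bits[:, 0] = symbols.real < 0
    bits[:, 1] = symbols.imag < 0
    return bits.reshape(-1)

def ofdm_ser(snr_db, n_sub=64, n_cp=16, n_sym=200, seed=0):
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, size=2 * n_sub * n_sym)
    # Serial-to-parallel: one row per OFDM symbol, one column per subcarrier
    X = qpsk_mod(bits).reshape(n_sym, n_sub)
    # IFFT to the time domain; norm="ortho" keeps the average sample power at 1
    x = np.fft.ifft(X, axis=1, norm="ortho")
    # Prepend the cyclic prefix, then parallel-to-serial
    tx = np.hstack([x[:, -n_cp:], x]).reshape(-1)
    # AWGN channel with the given SNR per time-domain sample
    noise_var = 10 ** (-snr_db / 10)
    noise = np.sqrt(noise_var / 2) * (
        rng.standard_normal(tx.size) + 1j * rng.standard_normal(tx.size)
    )
    rx = tx + noise
    # Receiver: serial-to-parallel, drop the prefix, FFT, demodulate
    r = rx.reshape(n_sym, n_sub + n_cp)[:, n_cp:]
    Y = np.fft.fft(r, axis=1, norm="ortho")
    rx_pairs = qpsk_demod(Y.reshape(-1)).reshape(-1, 2)
    tx_pairs = bits.reshape(-1, 2)
    # A symbol is in error if either of its two bits is wrong
    return np.any(rx_pairs != tx_pairs, axis=1).mean()
```

Sweeping `snr_db` over a range reproduces the usual waterfall shape of an error-rate curve. Because the FFT is unitary here, the per-subcarrier SNR equals the time-domain SNR, so at 0 dB the result lies close to the theoretical QPSK symbol error rate of about 0.29.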
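The basic model of the OFDM system is presented in Fig. 1. The input signal is taken as the transmitted symbols, which pass through the wireless Gaussian channel. The impulse response of the channel can be expressed as
$$g(t) = \sum_{m} \alpha_m\, \delta(t - \tau_m)$$
where $\alpha_m$ represents the amplitude and $\tau_m$ the delay of the $m$-th path. Its sampled version $g = (g_0, g_1, \ldots, g_{N-1})^T$ is transformed by an $N$-point DFT, giving the channel frequency response
$$h = \mathrm{DFT}_N(g) = F g$$
For $N$ independent subchannels the received signal is
$$y = X h + n$$
where $h = (h_0, h_1, \ldots, h_{N-1})^T$ can be considered as the attenuation of each subchannel and $n = (n_0, n_1, \ldots, n_{N-1})^T$ is a noise vector. The system can therefore be formulated as
$$y = X F g + n$$
where $X = \operatorname{diag}(x_0, x_1, \ldots, x_{N-1})$ holds the input data and $F$ is the DFT matrix expressed in terms of the twiddle factor as
$$F = \begin{bmatrix} W_N^{0\cdot 0} & \cdots & W_N^{0(N-1)} \\ \vdots & \ddots & \vdots \\ W_N^{(N-1)0} & \cdots & W_N^{(N-1)(N-1)} \end{bmatrix}$$
The twiddle factor is defined as
$$W_N^{nk} = \frac{1}{\sqrt{N}}\, e^{-j 2\pi nk/N}$$
The MMSE estimate of the channel impulse response is
$$\hat{g}_{\mathrm{MMSE}} = R_{gy} R_{yy}^{-1} y$$
where
$$R_{gy} = E\{g y^H\} = R_{gg} F^H X^H, \qquad R_{yy} = E\{y y^H\} = X F R_{gg} F^H X^H + \sigma_n^2 I_N$$
are the cross covariance matrix between $g$ and $y$ and the auto covariance matrix of $y$, respectively. Here $R_{gg}$ is the auto covariance matrix of $g$ and $\sigma_n^2$ denotes the noise variance. Assuming these quantities to be known, the MMSE estimate of $h$ is
$$\hat{h}_{\mathrm{MMSE}} = F \hat{g}_{\mathrm{MMSE}} = F Q_{\mathrm{MMSE}} F^H X^H y$$
with
$$Q_{\mathrm{MMSE}} = R_{gg} \left[ \left(F^H X^H X F\right)^{-1} \sigma_n^2 + R_{gg} \right]^{-1} \left(F^H X^H X F\right)^{-1}$$
The LS estimator of the channel minimizes $(y - XFg)^H (y - XFg)$, which gives
$$\hat{h}_{\mathrm{LS}} = X^{-1} y$$
Similarly, the least squares channel estimator can be written in the same form as the MMSE estimator:
$$\hat{h}_{\mathrm{LS}} = F Q_{\mathrm{LS}} F^H X^H y, \qquad Q_{\mathrm{LS}} = \left(F^H X^H X F\right)^{-1}$$
Comparing the two estimators shows that the LS estimate has a higher mean square error than the MMSE estimate, because it ignores the channel covariance $R_{gg}$ and the noise variance $\sigma_n^2$.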
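The LS/MMSE comparison above can be checked numerically under the model $y = XFg + n$. The sketch below is illustrative only: the channel length of 8 taps, the exponential power-delay profile used for $R_{gg}$, the QPSK pilots on every subcarrier and the 10 dB SNR are hypothetical choices, not values from the paper. The MMSE estimate is computed directly as $R_{gy} R_{yy}^{-1} y$ and mapped to the frequency domain with $F$.

```python
import numpy as np

def dft_matrix(n):
    # F[n, k] = W_N^{nk} = exp(-j 2 pi n k / N) / sqrt(N)
    idx = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)

def compare_estimators(n=64, n_taps=8, snr_db=10.0, trials=200, seed=1):
    rng = np.random.default_rng(seed)
    F = dft_matrix(n)
    # Hypothetical exponential power-delay profile, scaled so that E|h_k|^2 = 1
    p = np.zeros(n)
    p[:n_taps] = np.exp(-np.arange(n_taps) / 2.0)
    p *= n / p.sum()
    R_gg = np.diag(p)
    sigma2 = 10 ** (-snr_db / 10)
    mse_ls = mse_mmse = 0.0
    for _ in range(trials):
        # Random Rayleigh taps g and unit-modulus QPSK pilots x
        g = np.sqrt(p / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
        x = (rng.choice([-1, 1], n) + 1j * rng.choice([-1, 1], n)) / np.sqrt(2)
        X = np.diag(x)
        h = F @ g
        noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
        y = X @ h + noise
        # LS estimate: h_LS = X^{-1} y
        h_ls = y / x
        # MMSE estimate: g_hat = R_gy R_yy^{-1} y with A = X F, then h_hat = F g_hat
        A = X @ F
        R_yy = A @ R_gg @ A.conj().T + sigma2 * np.eye(n)
        g_mmse = R_gg @ A.conj().T @ np.linalg.solve(R_yy, y)
        h_mmse = F @ g_mmse
        mse_ls += np.mean(np.abs(h_ls - h) ** 2) / trials
        mse_mmse += np.mean(np.abs(h_mmse - h) ** 2) / trials
    return mse_ls, mse_mmse
```

With unit-modulus pilots the LS error simply equals the noise variance (about 0.1 at 10 dB), while the MMSE estimator also exploits the fact that only a few taps of $g$ carry energy, which is why its error is markedly lower.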

Energy
Energy is the sum of the squared signal samples. For speech it is analyzed frame by frame, so the short-time energy is evaluated over successive windowed segments of the signal [14]. The energy of the speech signal reflects its amplitude variations. The short-time energy can be defined as
$$E_n = \sum_{i=-\infty}^{\infty} \left[ s(i)\, w(n - i) \right]^2$$
where $s(i)$ represents the signal, $w(m)$ represents the window and $E_n$ represents the energy.
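A frame-based version of the short-time energy can be sketched as follows. The frame length of 256 samples, the hop of 128 samples and the default Hamming window are illustrative assumptions; the paper does not state its framing parameters.

```python
import numpy as np

def short_time_energy(s, frame_len=256, hop=128, window=None):
    # E_n = sum_i [s(i) w(n - i)]^2, evaluated once per frame
    s = np.asarray(s, dtype=float)
    if len(s) < frame_len:
        raise ValueError("signal shorter than one frame")
    if window is None:
        window = np.hamming(frame_len)
    n_frames = 1 + (len(s) - frame_len) // hop
    energies = np.empty(n_frames)
    for k in range(n_frames):
        frame = s[k * hop : k * hop + frame_len] * window
        energies[k] = np.sum(frame ** 2)
    return energies
```

Silent frames give values near zero and voiced frames give large values, which is what makes energy useful for separating speech from background.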

Zero-Crossing Rate
The zero-crossing rate (ZCR) is the rate at which the signal changes sign, from positive to negative or vice versa. It counts the number of times within a given time interval (frame) that the amplitude of the speech signal passes through zero. The ZCR of a frame of $N$ samples ending at sample $n$ can be found using
$$Z_n = \frac{1}{2N} \sum_{i=n-N+1}^{n} \left| \operatorname{sgn}[s(i)] - \operatorname{sgn}[s(i-1)] \right|$$
where $\operatorname{sgn}[s] = 1$ for $s \ge 0$ and $-1$ otherwise. The model of speech production suggests that the energy of voiced speech is concentrated below about 3 kHz because of the spectral fall-off of the glottal wave, whereas for unvoiced speech most of the energy is found at higher frequencies. Since high frequencies imply high zero-crossing rates, there is a strong correlation between the zero-crossing rate and the distribution of energy over frequency. Therefore, the autocorrelation coefficient is considered as a further parameter to keep the received signal relevant.
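A minimal frame-wise ZCR sketch, assuming the same hypothetical framing (256-sample frames, 128-sample hop) as for energy. For each frame it sums $|\operatorname{sgn}[s(i)] - \operatorname{sgn}[s(i-1)]|$ over the $N-1$ sample pairs inside the frame and divides by $2N$, so the result is roughly the number of sign changes per sample.

```python
import numpy as np

def zero_crossing_rate(s, frame_len=256, hop=128):
    # Z_n = (1 / 2N) * sum |sgn s(i) - sgn s(i-1)| over the sample pairs inside each frame
    s = np.asarray(s, dtype=float)
    if len(s) < frame_len:
        raise ValueError("signal shorter than one frame")
    sgn = np.where(s >= 0, 1, -1)
    n_frames = 1 + (len(s) - frame_len) // hop
    zcr = np.empty(n_frames)
    for k in range(n_frames):
        seg = sgn[k * hop : k * hop + frame_len]
        zcr[k] = np.sum(np.abs(np.diff(seg))) / (2 * frame_len)
    return zcr
```

A high-frequency (unvoiced-like) signal yields a much larger ZCR than a low-frequency (voiced-like) one, which illustrates the correlation with spectral energy distribution noted above.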

Autocorrelation
Like energy and ZCR, the autocorrelation coefficient plays a major role in obtaining clean speech and separating it from noise. It not only helps to eliminate noise but also smooths the signal. The autocorrelation is the cross-correlation of a signal with itself, expressed as
$$R(k) = \sum_{i=-\infty}^{\infty} s(i)\, s(i+k)$$
which is defined for a signal of finite energy. In addition, the fundamental frequency, which indicates whether a voice is low or high pitched, is an important speech parameter and is also considered.
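The autocorrelation $R(k)$ of a finite-length frame can be computed directly from its definition. The normalization by $R(0)$ in `normalized_acf` is an assumption about how the "autocorrelation coefficient" is scaled; it yields values in $[-1, 1]$ that are independent of signal level.

```python
import numpy as np

def autocorrelation(s, max_lag):
    # R(k) = sum_i s(i) s(i + k) for k = 0..max_lag (samples outside the frame taken as zero)
    s = np.asarray(s, dtype=float)
    n = len(s)
    return np.array([np.dot(s[: n - k], s[k:]) for k in range(max_lag + 1)])

def normalized_acf(s, max_lag):
    # Autocorrelation coefficients R(k) / R(0); returns R unchanged for an all-zero frame
    r = autocorrelation(s, max_lag)
    return r / r[0] if r[0] > 0 else r
```

For example, `autocorrelation([1, 2, 3], 2)` gives `[14, 8, 3]`, matching the non-negative-lag half of `np.correlate(s, s, mode="full")`.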

Fundamental Frequency
Because the human voice varies over a range of frequencies, the fundamental frequency cannot be treated as a single fixed value. It is an essential component of speech and speaker recognition, it has a similar use in voice communication, and it is therefore taken as an attribute.
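The paper does not say how the fundamental frequency is measured. One common option, shown here as a hedged sketch, picks the lag with the largest autocorrelation inside a plausible pitch range; the 50 to 400 Hz search range and the `estimate_f0` helper are hypothetical choices.

```python
import numpy as np

def estimate_f0(frame, fs, f_min=50.0, f_max=400.0):
    # Autocorrelation pitch estimate: choose the lag in [fs/f_max, fs/f_min] with the largest R(k)
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    n = len(frame)
    acf = np.correlate(frame, frame, mode="full")[n - 1 :]
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), n - 1)
    if acf[0] <= 0 or lag_min >= lag_max:
        return 0.0  # silent or too-short frame: treated as unvoiced
    lag = lag_min + np.argmax(acf[lag_min : lag_max + 1])
    return fs / lag
```

For a 50 ms frame of a 200 Hz tone sampled at 8 kHz, the strongest peak occurs at a lag of 40 samples, so the estimate is 8000 / 40 = 200 Hz. Real speech frames usually need extra safeguards (for example against octave errors), which this sketch omits.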

Radial Basis Function Network (RBFN) Model for Detection
The RBFN is used and tested for detection accuracy, and different combinations of features have been tried. A radial basis function network (RBFN) consists of an input layer, a hidden layer and a linear output layer. Here a Gaussian kernel is used as the activation function, evaluated on the distance between the input and each center [15]. The hidden layer depends on a non-linear RBF activation function [16][17]. The output of each hidden unit is a function of the distance between the input vector and the center vector of the Gaussian function, and can be expressed as [18][19][20]
$$R_j(x) = R\left(\lVert x - c_j \rVert\right)$$
where $R$ is the RBF, $c_j$ is the $j$-th center and $\lVert x - c_j \rVert$ is the distance between the input and the center.
Here $x_1, x_2, \ldots, x_j$ are the inputs, $y_1, y_2, \ldots, y_j$ are the outputs and $w_1, w_2, \ldots, w_j$ are the weights of the network. The target output is obtained by updating the corresponding weights. The output in terms of the weights and the input is given as
$$y = \sum_{j=1}^{N} w_j\, R\left(\lVert x - c_j \rVert\right)$$
where $w_j$ is the weight of the $j$-th center and $N$ is the length of the signal. The structure of the network is shown in Figure 1. The network uses the Gaussian activation function
$$R\left(\lVert x - c_j \rVert\right) = \exp\left(-\frac{\lVert x - c_j \rVert^2}{2\sigma^2}\right)$$
where $\sigma$ is the width of the center. The network is trained with an adaptive learning method, which constitutes the proposed method.
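A compact Gaussian RBFN can be sketched as below. It is not the authors' implementation: the text names an adaptive learning method without details, so normalized LMS (NLMS) updates of the output weights stand in for it, and the centers and width are fixed by hand. In the paper's setting each input row would hold the features (energy, ZCR, ACF, fundamental frequency) of a frame; the toy test instead fits a sine curve.

```python
import numpy as np

class RBFN:
    """Gaussian RBF network: y = sum_j w_j * exp(-||x - c_j||^2 / (2 sigma^2))."""

    def __init__(self, centers, sigma):
        self.centers = np.asarray(centers, dtype=float)
        if self.centers.ndim == 1:
            self.centers = self.centers[:, None]  # shape (J, d) with d = 1
        self.sigma = float(sigma)
        self.weights = np.zeros(len(self.centers))

    def hidden(self, x):
        # Gaussian activation of every hidden unit for one input vector x
        dist2 = np.sum((self.centers - x) ** 2, axis=1)
        return np.exp(-dist2 / (2 * self.sigma ** 2))

    def predict(self, X):
        # Linear output layer: weighted sum of the hidden activations
        return np.array([self.hidden(x) @ self.weights for x in X])

    def train_nlms(self, X, targets, eta=0.5, epochs=200, seed=0):
        # Normalized LMS on the output weights (stand-in for the unspecified adaptive rule)
        rng = np.random.default_rng(seed)
        for _ in range(epochs):
            for i in rng.permutation(len(X)):
                phi = self.hidden(X[i])
                error = targets[i] - phi @ self.weights
                self.weights += eta * error * phi / (phi @ phi + 1e-12)
        return self
```

Usage, under the same assumptions: `net = RBFN(np.linspace(0, 2 * np.pi, 10), sigma=0.7).train_nlms(x, np.sin(x[:, 0]))` with `x` of shape `(100, 1)`. Keeping the centers fixed makes the output linear in the weights, so the adaptive update is a simple LMS problem; learning the centers and widths as well would require gradient steps through the Gaussian.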

Results
The work consists of two parts: the modulation technique and noise elimination through the neural network model. The results obtained from both techniques are presented in this section. The bit error rate is computed to validate the OFDM system and is shown in Fig. 3. To further support this, the MSE is computed and shown in Fig. 4.
Once the system is found to be suitable, the chosen parameters are fed to the RBFN model. A sample of the speech signal is shown in Fig. 5 as a visual aid. The corresponding outputs for the original signal, the noisy signal and the recovered noise-free signal are shown in Fig. 6 and Fig. 7. These results show that the voice signal is communicated well, which suggests that the approach may be suitable for next-generation wireless networks.
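The noise conditions (5, 10 and 15 dB) and the SNR evaluation used in this work can be reproduced with two small helpers. This sketch assumes additive white Gaussian noise scaled to an exact target SNR and a global (not segmental) SNR measure; `add_noise` and `snr_db` are hypothetical names.

```python
import numpy as np

def add_noise(clean, snr_db, seed=0):
    # White Gaussian noise scaled so that 10*log10(P_signal / P_noise) equals snr_db exactly
    rng = np.random.default_rng(seed)
    clean = np.asarray(clean, dtype=float)
    p_signal = np.mean(clean ** 2)
    noise = rng.standard_normal(clean.shape)
    noise *= np.sqrt(p_signal / 10 ** (snr_db / 10) / np.mean(noise ** 2))
    return clean + noise

def snr_db(clean, estimate):
    # Global SNR of an estimate relative to the clean reference signal
    clean = np.asarray(clean, dtype=float)
    err = clean - np.asarray(estimate, dtype=float)
    return 10 * np.log10(np.sum(clean ** 2) / np.sum(err ** 2))
```

Comparing `snr_db(clean, noisy)` with `snr_db(clean, enhanced)` gives the SNR improvement produced by the filter.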

Conclusions
From this work, it is concluded that the detection accuracy depends largely on the type and size of the feature set fed as input. Graphical analysis shows that intensity, or energy, appears to be the best feature for detection. Recognition of emotional speech in communication is a possible direction for future work.

Fig. 3. Plot of SNR vs. MSE for the OFDM system

Fig. 7. The enhanced signal at the receiver end

Table 1. Accuracy performance of RBFN with different feature combinations