Alireza Heshmati

Research Assistant and Signal Processing Engineer at the Electronics Research Institute (ERI), Sharif University of Technology. Interested in speech and image processing, trustworthy AI, robustness of deep learning networks, NLP, and compressed sensing 🧑‍💻💻✖️➖


2022 - . . .
I am a research assistant at the Electronics Research Institute (ERI) of Sharif University of Technology. I have worked on functional Persian-language speech processing modules, including Automatic Speech Recognition (ASR), Speech Enhancement (SE), Few-Shot Keyword Spotting (FSKWS), and Voice Activity Detection (VAD), as well as on adversarial attacks and the robustness of deep neural networks (DNNs). I served as the coordinator of a research group at ERI for a period and contributed to a paper on designing fast gradual sparse attacks to DNNs, published in the Elsevier EAAI journal. I am currently focusing on Speaker Recognition and Diarization.
2020 - 2022
My most recent degree is an M.Sc. in Communication Systems from Sharif University of Technology. My M.Sc. project was about designing sparse, imperceptible adversarial attacks on DNNs. I published a paper on designing a low coherent measurement matrix in the IEEE SPL journal. I was a Teaching Assistant for the Fundamentals of Electrical Engineering course.
2016 - 2020
I ranked first among the bachelor's entrants in Electrical Engineering at Shiraz University of Technology. I was a Teaching Assistant for Electrical Circuits (I) (x1) and Differential Equations (x2), and a main member of the Central Council of the Electrical Scientific Association for one year.
projects
Designing Pixel-wise and Group-wise Attacks to Deep Neural Networks
This project focuses on designing pixel-wise and group-wise attacks to identify the key pixels and features of an image from the viewpoint of Deep Neural Networks (DNNs). These methods generate both imperceptible perturbations for non-robust DNNs and counterfactual explanations for robust DNNs. We use a general notion of sparsity, overlapping sparsity, to introduce a general regularization that covers all modes of sparsity.
Few-Shot Keyword Spotting (FSKWS)
This project involved recognizing a specific keyword with very limited training data. This approach allows us to change the target keyword with five samples or fewer. We prepared a rich Persian-language dataset for pretraining with a prototypical network. Additionally, we implemented streaming capabilities for real-time keyword spotting.
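The few-shot step of a prototypical network can be sketched as follows. This is only an illustration under assumptions, not the project's code: the 2-D vectors stand in for the outputs of a pretrained audio encoder, and the function names `class_prototypes` and `classify` are hypothetical.

```python
import numpy as np

def class_prototypes(embeddings, labels):
    """Average the support embeddings of each class into one prototype."""
    classes = np.unique(labels)
    protos = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, protos

def classify(queries, classes, protos):
    """Assign each query embedding to the class of its nearest prototype."""
    # Squared Euclidean distance between every query and every prototype.
    d2 = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[d2.argmin(axis=1)]

# Toy embeddings standing in for encoder outputs (hypothetical values).
support = np.array([[0.0, 0.0], [0.2, -0.1], [1.0, 1.0], [0.9, 1.2]])
support_labels = np.array([0, 0, 1, 1])  # 0 = other speech, 1 = target keyword
classes, protos = class_prototypes(support, support_labels)
pred = classify(np.array([[0.1, 0.1], [1.1, 0.9]]), classes, protos)
```

Changing the target keyword then only requires recomputing one prototype from a handful of new recordings; the encoder itself is left untouched.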
Designing Fast Gradual Sparse Attacks to Deep Neural Networks
In this paper, we propose a new algorithm to design fast, sparse attacks on DNNs using proximal-based optimization methods. This algorithm uses the ℓ1 norm, the ℓ0 norm, and the Smoothly Clipped Absolute Deviation (SCAD) function for sparsity regularization. In addition, it starts with a dense perturbation and gradually makes it sparse using the penalty method.
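For readers unfamiliar with proximal methods, the three regularizers have well-known closed-form proximal operators (soft, hard, and SCAD thresholding). The sketch below shows these textbook operators only; it is not the paper's algorithm, and the function names, the default `a=3.7`, and the toy perturbation are assumptions made for illustration.

```python
import numpy as np

def prox_l1(v, lam):
    """Soft thresholding: proximal operator of lam * ||x||_1."""
    v = np.asarray(v, dtype=float)
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def prox_l0(v, lam):
    """Hard thresholding: proximal operator of lam * ||x||_0."""
    v = np.asarray(v, dtype=float)
    return np.where(np.abs(v) > np.sqrt(2.0 * lam), v, 0.0)

def prox_scad(v, lam, a=3.7):
    """SCAD thresholding rule (requires a > 2)."""
    v = np.asarray(v, dtype=float)
    absv = np.abs(v)
    soft = np.sign(v) * np.maximum(absv - lam, 0.0)
    middle = ((a - 1.0) * v - np.sign(v) * a * lam) / (a - 2.0)
    # Small entries are soft-thresholded, large entries are left unchanged.
    return np.where(absv <= 2.0 * lam, soft,
                    np.where(absv <= a * lam, middle, v))

# A dense toy perturbation becomes sparser as the penalty weight grows.
dense = np.array([0.9, -0.05, 0.3, -1.2, 0.02])
nonzeros = [int(np.count_nonzero(prox_l1(dense, lam))) for lam in (0.01, 0.1, 0.5)]
```

Increasing `lam` step by step mimics, in a very simplified way, the idea of starting dense and gradually enforcing sparsity.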
Speech Enhancement (SE)
The goal was to prepare a time-efficient SE module placed before the ASR system to enhance noisy input speech, so that the resulting ASR embedding outputs become close to the corresponding clean ones. The noise was mostly environmental (both natural and artificial); a small amount of reverberation in the input speech was also considered.
Voice Activity Detection (VAD)
This project was about detecting speech and non-speech in each frame of audio. This module is needed to reduce the time complexity and errors of Automatic Speech Recognition (ASR) systems. In our project, we designed the VAD with deep learning modules such as CNN, RNN, and FNN so that it has fewer parameters, a shorter execution time on CPU, and higher accuracy than recent models.
Tokens Position Detection in Speech
This project was about detecting the positions of tokens in audio based on the encoder of an Automatic Speech Recognition (ASR) system. For this, we found the correspondence between the input and the output of an encoder that has a CTC (Connectionist Temporal Classification) layer. Specifically, the blank tokens were removed and each run of identical consecutive tokens was merged to give that token's position.
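The blank-removal and merging step can be illustrated with a short sketch. It assumes a greedy (per-frame argmax) CTC path is already available and that the blank has index 0; the function name `token_spans` is hypothetical and this is not the project's implementation.

```python
def token_spans(frame_ids, blank=0):
    """Collapse a per-frame CTC path into (token, first_frame, last_frame) spans."""
    spans = []
    prev = blank
    for t, tok in enumerate(frame_ids):
        if tok != blank:
            if tok == prev:
                # The same token continues: extend the current span.
                spans[-1] = (tok, spans[-1][1], t)
            else:
                # A new token starts at this frame.
                spans.append((tok, t, t))
        prev = tok
    return spans

# Frames 1-2 carry token 3, frame 4 carries a second token 3, frames 5-6 carry token 5.
spans = token_spans([0, 3, 3, 0, 3, 5, 5, 0])
```

Multiplying the frame indices by the encoder's frame stride (in seconds) would convert each span into a time interval in the audio.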
Designing Low Coherent Measurement Matrix
This is my first paper in IEEE SPL. It is about designing a low coherent measurement matrix with a controlled spectral norm via an efficient approximation of the ℓ∞ norm. Compressed Sensing (CS) aims to reconstruct a signal from a small set of measurements, provided the signal is sparse in some domain. In this respect, a low coherent measurement matrix plays an important role. In this letter, an efficient approximation of the ℓ∞ norm based on the soft maximum was introduced to design a low coherent measurement matrix with a controllable spectral norm. The proposed approximation, called Logarithm of Sum of Exponential Absolute values (LSEAp), is convex (like the ℓ∞ norm) and almost smooth. Accordingly, we designed a low coherent measurement matrix with a small spectral norm via minimization of the ℓ∞ norm of the Gram matrix. The resulting problem is not convex, but our simulations show that LSEAp leads to an improved design of the measurement matrix compared to current methods.
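To make the idea concrete, the sketch below compares the mutual coherence of a random matrix with a log-sum-exp soft maximum of the absolute off-diagonal Gram entries. The exact LSEAp definition in the letter may differ; the form (1/p)·log Σ exp(p·|x_i|), the parameter `p`, and the function names are assumptions used only for illustration.

```python
import numpy as np

def lsea(x, p=50.0):
    """Soft maximum of |x|: (1/p) * log(sum_i exp(p * |x_i|)), computed stably."""
    z = p * np.abs(np.ravel(x))
    m = z.max()
    return (m + np.log(np.exp(z - m).sum())) / p

def offdiag_gram(A):
    """Off-diagonal entries of the Gram matrix of the column-normalized A."""
    A = A / np.linalg.norm(A, axis=0, keepdims=True)
    G = A.T @ A
    return G[~np.eye(G.shape[0], dtype=bool)]

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 40))   # random 20 x 40 measurement matrix
off = offdiag_gram(A)
mu = np.abs(off).max()              # mutual coherence (an l-infinity norm)
approx = lsea(off, p=50.0)          # smooth upper bound on mu
```

For this form of the soft maximum, `approx` always lies between `mu` and `mu + log(n)/p`, where `n` is the number of entries, so a larger `p` gives a tighter but less smooth approximation.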
Attack to Deep Learning Networks
This, my master's degree project, evaluates the robustness of DNNs against a designed attack using pixel-wise or group-wise (structured) perturbations on images (CIFAR-10 and ImageNet). The key challenges include controlling the sparsity of the perturbed units and the intensity of the perturbations. The proposed method implements the sparsity and imperceptibility criteria using the Smoothed ℓ0 function and an approximation of the ℓ∞ norm, respectively. In this project, the proposed sparse adversarial attacks were developed such that element-wise perturbations can be converted into either pixel-wise or group-wise perturbations.
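As an illustration of the sparsity criterion, the sketch below uses the common Gaussian form of the smoothed ℓ0 function, Σ (1 − exp(−x_i² / 2σ²)), and applies it either to individual elements or to per-pixel norms. Whether the project used exactly this form is an assumption, and the function names and the toy perturbation are hypothetical.

```python
import numpy as np

def smoothed_l0(x, sigma):
    """Smooth surrogate of the l0 norm: sum_i (1 - exp(-x_i^2 / (2 sigma^2)))."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(1.0 - np.exp(-(x ** 2) / (2.0 * sigma ** 2))))

def pixelwise_smoothed_l0(delta, sigma):
    """Approximate count of perturbed pixels of an (H, W, C) perturbation."""
    # One value per pixel: the l2 norm across the color channels.
    return smoothed_l0(np.linalg.norm(delta, axis=-1), sigma)

# Toy 4 x 4 RGB perturbation that touches two pixels and four elements.
delta = np.zeros((4, 4, 3))
delta[1, 2] = [0.3, -0.2, 0.1]
delta[3, 0] = [0.0, 0.5, 0.0]
elementwise = smoothed_l0(delta, sigma=1e-3)           # close to 4
pixelwise = pixelwise_smoothed_l0(delta, sigma=1e-3)   # close to 2
```

A small `sigma` makes the surrogate approach the true count, while a larger `sigma` makes it smoother and easier to optimize with gradients.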
Keyword Spotting
This project was about recognizing a desired word (keyword spotting) in audio. The network is lightweight, which increases the speed of the application. It was inspired by residual blocks and uses 1D and 2D audio features to improve both accuracy and execution speed. The Google Speech Commands v1 dataset was used for training and evaluating the network.
Object Detection and Depth Estimation
In this project, I used the YOLO network for object detection and FastDepth, which was introduced by researchers at MIT, as the depth estimator. FastDepth is an encoder-decoder model that uses MobileNetV2 as a lightweight encoder, with skip connections between the encoder and decoder layers to reduce the number of decoder layers. For this project, I used the NYU Depth Dataset V2.
Pose Estimation using Convolutional Neural Network
In this project, I had to determine the locations of the head, torso, and the joints of the hands and feet by feeding an image to a convolutional neural network (CNN). Accordingly, I used a version of the LSP dataset and a simplified version of the network in this paper, which consists of two parts (Initial stage and Stages). Finally, I managed to train this network.
Recovery of an Image Using IMAT and OMP Methods
In this project, IMAT (Iterative Method with Adaptive Thresholding) and OMP (Orthogonal Matching Pursuit) were used to recover an image from its random, uniform, and non-uniform samples using its sparse domains.
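A minimal sketch of OMP on a generic sparse recovery problem is given below. It is not the project's code: a random Gaussian matrix stands in for the sampling operator combined with the sparsifying transform, the sparsity level `k` is assumed known (and at least 1), and the function name `omp` is hypothetical.

```python
import numpy as np

def omp(A, y, k):
    """Estimate a k-sparse x from y = A @ x by Orthogonal Matching Pursuit."""
    residual = y.copy()
    support = []
    x = np.zeros(A.shape[1])
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # Least-squares fit on the selected columns, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x[support] = coef
    return x

# Toy problem: 30 measurements of a 3-sparse vector of length 60.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 60))
A /= np.linalg.norm(A, axis=0, keepdims=True)
x_true = np.zeros(60)
x_true[[5, 17, 42]] = [1.5, -2.0, 0.8]
y = A @ x_true
x_hat = omp(A, y, k=3)
```

For an image, `x` would hold the coefficients in a sparse domain such as the DCT, and `A` would combine the inverse transform with the sampling mask.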
PPM Demodulation Using Non-uniform Sampling and Inverse System Approach
Pulse Position Modulation (PPM) signals are generated at the times where the modulating signal intersects a sawtooth wave of constant amplitude and period. In a PPM signal, the information is not carried by the amplitude; rather, it is carried by the distance between PPM pulses. Two approaches were defined for PPM demodulation. The first approach is non-uniform sampling, with methods such as:
  • Wiley/Marvasti
  • Time-varying
  • Zero-Order-Hold
  • Linear interpolation
  • Voronoi
  • Adaptive Weight Method (ADPW)
The second approach is the PPM inverse system, using an iterative method and a Chebyshev Acceleration (CA) method. In this project, I used these methods for PPM demodulation.
Compensating Distortion of Interpolation of 1D Signals
Interpolation methods such as sample-and-hold (S&H), linear, 1-spline, and c-spline introduce distortion. In this project, a modular method and an iterative method were used to compensate for the distortion of common interpolators. In the modular method, the interpolation function is multiplied by 1 + 2[cos(2πt/T) + cos(4πt/T) + ⋯ + cos(2Nπt/T)] and then passed through a low-pass filter. The interpolation of a discrete signal is modeled as the output of a linear time-invariant system whose inputs are the discrete samples. Results of simulation:
  • The iterative method achieved an interpolation SNR of about 35 dB.
  • The modular method achieved an SNR of about 15 dB.
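The weighting function of the modular method can be evaluated with a few lines of code. The sketch assumes the harmonics are cos(2πkt/T) for k = 1, …, N, where T is the sampling period; up to a factor of 1/T this is the truncated Fourier series of an impulse train with period T. The function name `modular_weight` and the values of `T` and `N` are hypothetical, and the low-pass filtering step is not shown.

```python
import numpy as np

def modular_weight(t, T, N):
    """Evaluate 1 + 2 * sum_{k=1..N} cos(2*pi*k*t/T) at the times t."""
    t = np.asarray(t, dtype=float)
    k = np.arange(1, N + 1)
    return 1.0 + 2.0 * np.cos(2.0 * np.pi * np.outer(t, k) / T).sum(axis=1)

T, N = 1.0, 4
t = np.linspace(0.0, 2.0, 9)   # time grid with step T/4
w = modular_weight(t, T, N)    # peaks of height 2N + 1 at multiples of T
```

Because the weight concentrates around the sampling instants, multiplying the interpolated signal by it emphasizes the original sample values before the low-pass filter is applied.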
Mask Detection Using Neural Networks
This was my bachelor's project and the first project in which I used deep neural networks; it was about detecting people wearing masks. For this, I used a practical Convolutional Neural Network (CNN) and prepared my own dataset to identify masks in the frames of a video. The challenges were tracking the position of a masked person across video frames and cropping the head; these were addressed by detecting the eyes in the frames.
publications
Alireza Heshmati; Saman Soleimani Roudi; Sajjad Amini; Shahrokh Ghaemmaghami; Farokh Marvasti
ELSEVIER Engineering Applications of Artificial Intelligence
Sajjad Amini; Alireza Heshmati; Shahrokh Ghaemmaghami
IEEE Signal Processing Letters
Alireza Heshmati; Sajjad Amini; Shahrokh Ghaemmaghami; Farokh Marvasti

Easy access to Google Scholar, GitHub, and LinkedIn