projects
Designing Pixel-wise and Group-wise Attacks to Deep Neural Networks
This project focuses on designing pixel-wise and group-wise attacks that identify the key pixels and
features of an image from the point of view of Deep Neural Networks (DNNs). These methods generate
imperceptible perturbations for non-robust DNNs and counterfactual explanations for robust DNNs.
We use overlapping sparsity, a general notion of sparsity, to introduce a general regularizer that
covers all modes of sparsity.
Few-Shot Keyword Spotting (FSKWS)
This project involved recognizing a specific keyword from very limited training data. The approach lets us
change the target keyword with five or fewer samples. We prepared a rich dataset in the Persian language
for pretraining with a prototypical network. Additionally, we implemented streaming capabilities
for real-time keyword spotting.
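The prototypical-network idea behind swapping the target keyword with a handful of samples can be sketched as follows. This is a minimal illustration rather than the project's code: the 2-D vectors stand in for embeddings that a pretrained encoder would produce, and `prototype` and `classify` are hypothetical helper names.

```python
import math


def prototype(embeddings):
    """Class prototype: the element-wise mean of the support embeddings."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def classify(query, prototypes):
    """Return the label whose prototype is nearest (Euclidean) to the query."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Toy support set: a few embeddings per class (assumed values, not real data).
support = {
    "keyword": [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1]],
    "other": [[-1.0, -1.0], [-0.8, -1.2]],
}
prototypes = {label: prototype(vectors) for label, vectors in support.items()}
print(classify([0.9, 1.1], prototypes))
```

Changing the target keyword then only requires recomputing one prototype from the new support samples; the encoder itself stays fixed.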
Designing Fast Gradual Sparse Attacks to Deep Neural Networks
In this paper, we propose a new algorithm for designing fast, sparse attacks on DNNs using proximal optimization methods.
The algorithm uses the ℓ1 norm, the ℓ0 norm, and the Smoothly Clipped Absolute Deviation (SCAD) function for sparsity
regularization. In addition, it starts with a dense perturbation and gradually makes it sparse using the penalty method.
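The three sparsity regularizers each have a simple element-wise thresholding rule, which is what makes proximal methods fast. The sketch below shows the standard rules (soft thresholding for ℓ1, hard thresholding for ℓ0, and the SCAD rule with the common default a = 3.7) and a toy loop that raises the penalty step by step. It is an illustration under stated assumptions, not the paper's algorithm: the attack-loss gradient step is omitted, and the function names and the penalty schedule are hypothetical.

```python
import math


def prox_l1(z, lam):
    """Soft thresholding: proximal operator of lam * |x|."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def prox_l0(z, lam):
    """Hard thresholding: proximal operator of lam * [x != 0]."""
    return z if abs(z) > math.sqrt(2.0 * lam) else 0.0


def prox_scad(z, lam, a=3.7):
    """SCAD thresholding rule (unit step size, requires a > 2)."""
    if abs(z) <= 2.0 * lam:
        return prox_l1(z, lam)
    if abs(z) <= a * lam:
        return ((a - 1.0) * z - math.copysign(a * lam, z)) / (a - 2.0)
    return z  # large entries are left unshrunk


def sparsify_gradually(perturbation, prox, lams):
    """Apply the proximal step with an increasing penalty (gradient step omitted)."""
    x = list(perturbation)
    for lam in lams:
        x = [prox(v, lam) for v in x]
    return x


dense = [0.05, -0.4, 0.9, -0.02, 0.3]
print(sparsify_gradually(dense, prox_l0, [0.001, 0.01, 0.1]))
```

With the increasing penalties, the dense starting vector loses its small entries first and ends with only its largest entry, which mirrors the dense-to-sparse progression described above.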
Speech Enhancement (SE)
The goal was to build a time-efficient SE module that runs before the ASR system and enhances noisy input speech
so that the resulting ASR embedding outputs become close to the corresponding clean ones. The noise was mostly
environmental (both natural and artificial); a small amount of reverberation in the input speech was also considered.
Voice Activity Detection (VAD)
This project was about classifying each frame of audio as speech or non-speech. This module is needed
to reduce the time complexity and errors of Automatic Speech Recognition (ASR) systems. We designed the VAD with deep learning modules
such as CNNs, RNNs, and FNNs so that it has fewer parameters, a shorter execution time on CPU, and higher accuracy than recent models.
Tokens Position Detection in Speech
This project was about detecting the positions of tokens in audio using
the encoder of an Automatic Speech Recognition (ASR) system. To do this, we found the correspondence between the input and the output of an encoder
that has a CTC (Connectionist Temporal Classification) layer. Specifically, the blank tokens were removed and adjacent repeated tokens were merged, and the frames spanned by each merged token were taken as its position.
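The blank-removal and merging step can be sketched as below. This is a minimal illustration that assumes the per-frame token ids (for example, the argmax of the CTC output) are already available; `ctc_token_positions` is a hypothetical name, and mapping frame indices to seconds would additionally depend on the encoder's frame stride, which is not specified here.

```python
def ctc_token_positions(frame_ids, blank=0):
    """Collapse a per-frame CTC path into (token, first_frame, last_frame) spans.

    Blank frames are dropped and adjacent repeats of the same token are merged.
    Repeats separated by a blank stay separate, as in standard CTC decoding.
    """
    spans = []
    previous = blank
    for index, token in enumerate(frame_ids):
        if token != blank:
            if token == previous:
                # Same token as the previous frame: extend the current span.
                tok, first, _ = spans[-1]
                spans[-1] = (tok, first, index)
            else:
                spans.append((token, index, index))
        previous = token
    return spans


path = [0, 1, 1, 0, 2, 2, 2, 0, 1]
print(ctc_token_positions(path))
```

Each span gives the range of encoder output frames covered by one token, which is the correspondence between encoder output and input audio that the project relies on.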

Designing Low Coherent Measurement Matrix
This is my first paper in IEEE SPL. It is about designing
a low-coherence measurement matrix with a controlled spectral norm via an efficient approximation of the ℓ∞ norm. Compressed
Sensing (CS) aims to reconstruct a signal from a small set of measurements, provided the signal is sparse
in some domain. In this respect, a low-coherence measurement matrix plays an important role. In this letter, an efficient approximation
of the ℓ∞ norm based on the soft maximum was introduced to design a low-coherence measurement matrix with a controllable
spectral norm. The proposed approximation, called Logarithm of Sum of Exponential Absolute values (LSEAp), is convex (like
the ℓ∞ norm) and almost smooth. Accordingly, we designed a low-coherence measurement matrix with a small spectral norm by minimizing
the ℓ∞ norm of the Gram matrix. The resulting problem is not convex, but our simulations show that LSEAp leads to
an improved design of the measurement matrix compared with current methods.
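To make the two quantities concrete, the sketch below computes the mutual coherence of a small matrix (the largest off-diagonal entry, in absolute value, of the Gram matrix of its normalized columns) and a log-sum-exp soft maximum of absolute values. The soft maximum is written in one common form, (1/p) log Σ exp(p|x_i|); the exact LSEAp definition and scaling used in the letter may differ, and the function names are hypothetical.

```python
import math


def lse_abs(values, p):
    """Soft maximum of |values|: (1/p) * log(sum(exp(p * |v|)))."""
    m = max(abs(v) for v in values)
    # Subtract the maximum before exponentiating for numerical stability.
    return m + math.log(sum(math.exp(p * (abs(v) - m)) for v in values)) / p


def mutual_coherence(matrix):
    """Largest |inner product| between distinct unit-normalized columns."""
    columns = [list(c) for c in zip(*matrix)]
    unit = [[v / math.sqrt(sum(u * u for u in c)) for v in c] for c in columns]
    return max(
        abs(sum(a * b for a, b in zip(unit[i], unit[j])))
        for i in range(len(unit))
        for j in range(i + 1, len(unit))
    )


A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
print(mutual_coherence(A))

x = [0.1, -0.9, 0.5]
for p in (1.0, 10.0, 100.0):
    print(p, lse_abs(x, p))
```

For n values, this soft maximum always lies between max|x_i| and max|x_i| + log(n)/p, so a larger p gives a tighter but less smooth approximation of the ℓ∞ norm.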
Attack to Deep Learning Networks
This project, my master's degree project, evaluates the robustness of DNNs against a designed attack that applies pixel-wise or group-wise
(structured) perturbations to images (CIFAR-10 and ImageNet). The key challenges include controlling the sparsity of the perturbed units and
the intensity of the perturbations. The proposed method enforces the sparsity and imperceptibility criteria using the Smoothed ℓ0
function and an approximation of the ℓ∞ norm, respectively. The proposed sparse adversarial attacks were developed
so that element-wise perturbations can be converted into either pixel-wise or group-wise perturbations.
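A small sketch of how a smoothed ℓ0 measure can count perturbed units at the element level or at the pixel level. It assumes the widely used Gaussian form, Σ(1 − exp(−v²/(2σ²))), which may not match the exact function used in the project; `smoothed_l0` and `group_norms` are hypothetical names, and the perturbation is a flattened toy vector with three channels per pixel.

```python
import math


def smoothed_l0(values, sigma):
    """Gaussian-smoothed count of nonzeros: sum(1 - exp(-v^2 / (2 sigma^2)))."""
    return sum(1.0 - math.exp(-(v * v) / (2.0 * sigma * sigma)) for v in values)


def group_norms(perturbation, group_size):
    """l2 norm of each consecutive group (e.g. the 3 channels of one pixel)."""
    return [
        math.sqrt(sum(v * v for v in perturbation[i:i + group_size]))
        for i in range(0, len(perturbation), group_size)
    ]


# Three RGB pixels, flattened: one untouched, one with a single perturbed
# channel, one with all three channels perturbed (toy values).
delta = [0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.1, 0.2, 0.3]

print(smoothed_l0(delta, sigma=0.01))                  # element-wise count
print(smoothed_l0(group_norms(delta, 3), sigma=0.01))  # pixel-wise count
```

Applying the same smooth function to group norms instead of individual entries is one way to turn an element-wise sparsity measure into a pixel-wise or group-wise one; as σ shrinks, the value approaches the exact count of nonzero units.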
Keyword Spotting
This project was about recognizing a desired word (keyword spotting) in audio.
The network is lightweight, which speeds up the application. It was
inspired by residual blocks and uses 1D and 2D audio features to improve both accuracy and execution speed.
The Google Speech Commands v1 dataset was used for training and evaluation.
Object Detection and Depth Estimation
In this project, I used the YOLO network for object detection and
FastDepth, which was introduced by Facebook, as the depth estimator. FastDepth
is an encoder-decoder model that uses MobileNetV2 as a lightweight encoder, with skip connections between encoder and decoder layers to reduce
the number of decoder layers.
For this project, I used the NYU Depth Dataset V2.
Pose Estimation using Convolutional Neural Network
In this project, I had to determine the locations of the head, the torso, and the joints of the hands and feet by feeding an image to a convolutional
neural network (CNN). To do this, I used a version of the LSP dataset and a simplified version of the network in
this paper, which consists of two parts (Initial stage and Stages). Finally, I managed to train this network.
Recovery of an Image Using IMAT and OMP Methods
In this project, IMAT (Iterative Method with Adaptive
Threshold) and OMP (Orthogonal Matching Pursuit) were used to recover
an image from its random, uniform, and non-uniform samples by exploiting its sparse domains.
PPM Demodulation Using Non-uniform Sampling and Inverse System Approach
Pulse Position Modulation (PPM) signals are generated at the instants where the modulating signal intersects a sawtooth wave of constant
amplitude and period. In a PPM signal, the information is carried not by the amplitude but by the spacing between consecutive pulses.
Two approaches were defined for PPM demodulation. The first approach is non-uniform sampling, with methods such as:
- Wiley/Marvasti
- Time-varying
- Zero-Order-Hold
- Linear interpolation
- Voronoi
- Adaptive Weight Method (ADPW).
The second approach is the PPM inverse system, using an iterative method and
a Chebyshev Acceleration (CA) method. In this project, I used these methods for PPM demodulation.
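The link between PPM and non-uniform sampling can be seen in a short sketch of the modulator. It assumes a rising sawtooth that ramps linearly from -1 to 1 in each period and a message that stays strictly inside (-1, 1) and varies slowly enough to cross the ramp once per period; `ppm_pulse_times` is a hypothetical name, and the crossing is located by bisection.

```python
import math


def ppm_pulse_times(message, period, n_periods, iters=60):
    """Time in each period where message(t) crosses a -1..1 rising ramp."""
    times = []
    for k in range(n_periods):
        start = k * period
        lo, hi = 0.0, 1.0  # fraction of the current period
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            ramp = 2.0 * mid - 1.0
            if message(start + mid * period) > ramp:
                lo = mid  # message still above the ramp: crossing is later
            else:
                hi = mid  # ramp has caught up: crossing is earlier
        times.append(start + 0.5 * (lo + hi) * period)
    return times


def message(t):
    return 0.5 * math.sin(2.0 * math.pi * 0.1 * t)


pulses = ppm_pulse_times(message, period=1.0, n_periods=5)
# At each pulse the ramp value equals the message value, so the pulse
# positions yield non-uniformly spaced samples of the message.
samples = [(t, 2.0 * (t % 1.0) - 1.0) for t in pulses]
print(samples)
```

Because each pulse position encodes one sample of the message taken at an irregular instant, demodulation reduces to reconstructing a signal from non-uniform samples, which is where the interpolation methods listed above come in.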
Compensating Distortion of Interpolation of 1D Signals
Interpolation methods such as sample-and-hold (S&H), linear, 1-spline, and c-spline
introduce distortion. In this project, a modular method and an iterative method were used to compensate for the distortion of
common interpolators. In the modular method, the interpolation function is multiplied by 1 + 2[cos(2πt/T) + cos(4πt/T) + ⋯ + cos(2Nπt/T)]
and then low-pass filtered. The interpolation of a discrete signal is modeled as the output of a linear time-invariant system whose inputs are
the discrete samples. Simulation results:
- The iterative method achieved an interpolation SNR of about 35 dB.
- The modular method achieved an SNR of about 15 dB.
Mask Detection Using Neural Networks
This was my bachelor's project and the first project in which I used Deep Neural Networks.
It was about detecting people wearing masks.
I used a practical Convolutional Neural Network (CNN) and prepared my own dataset to identify masks in the frames of a video.
The main challenges were tracking the position of a masked person across video frames and cropping the person's head; both were addressed by
detecting the eyes in the frames.