Face verification is the task of determining whether a pair of images shows the same person's face. It is widely used in modern applications such as the popular "Face Unlock" feature on smartphones and ID document verification. The task can be split into two steps: face classification, followed by face verification. Convolutional neural networks (CNNs) are the most popular choice for such tasks, so ResNet-18, a CNN architecture, is used here.
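The text does not say how a pair of images is compared once the network is trained. A common approach, assumed here, is to treat the trained network's penultimate-layer output (for ResNet-18, the 512-dimensional pooled features) as an embedding for each face and to compare two embeddings with cosine similarity against a threshold. The minimal sketch below shows only that comparison step. The function names and the 0.5 threshold are hypothetical, and the embeddings are passed in as plain lists rather than produced by an actual model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors (range -1 to 1)."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_u * norm_v)


def same_person(emb_a, emb_b, threshold=0.5):
    """Hypothetical verification rule: accept the pair if similarity >= threshold.

    The threshold is an assumption; in practice it would be tuned on
    labelled validation pairs.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold


# Usage with toy 3-dimensional "embeddings" (real ResNet-18 features would be 512-d).
print(same_person([1.0, 2.0, 3.0], [1.1, 2.0, 2.9]))  # near-identical direction
print(same_person([1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]))  # opposite direction
```

Cosine similarity ignores the magnitude of the embeddings and compares only their direction. This is one reason it is a frequent choice for face-embedding comparison, though other distances (e.g. Euclidean) are also used.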
Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methods and technologies enabling computers to recognize spoken language and transcribe it into text. It is also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text (STT). This project builds an end-to-end speech transcription network with an attention-based encoder-decoder architecture. Performance is measured with the Levenshtein (edit) distance between predicted and reference transcripts, where lower is better. The network achieves an average Levenshtein distance of 24.3.
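Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below computes it with the standard dynamic-programming recurrence and averages it over a set of transcripts. The text does not state whether distances are computed per character or per word, or how they are averaged. This sketch assumes character-level distances averaged over utterances, and the function names are illustrative only.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between strings a and b (insertions, deletions, substitutions)."""
    # prev[j] holds the distance between the prefix of `a` processed so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(
                prev[j] + 1,         # delete ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # substitute ca -> cb (free if they match)
            )
        prev = curr
    return prev[-1]


def average_levenshtein(predictions, references):
    """Mean character-level edit distance over paired predicted/reference transcripts."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        raise ValueError("need at least one transcript pair")
    total = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    return total / len(predictions)


print(levenshtein("kitten", "sitting"))  # 3
print(average_levenshtein(["the cat", "hello"], ["the cap", "hello"]))  # 0.5
```

Only two rows of the table are kept at a time, so memory grows with the length of the shorter comparison axis rather than with the product of both string lengths.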