saif sayed - Academia.edu

Papers by saif sayed

Deep Feature Tracker: A Novel Application for Deep Convolutional Neural Networks

arXiv (Cornell University), Jul 30, 2021

Feature tracking is the building block of many applications such as visual odometry, augmented reality, and target tracking. Unfortunately, state-of-the-art vision-based tracking algorithms fail in surgical images due to the challenges imposed by the nature of such environments. In this paper, we propose a novel, unified deep learning based approach that learns both how to track features reliably and how to detect features that are reliable enough to track. The proposed network, dubbed Deep-PT, consists of a tracker network, a convolutional neural network that emulates cross-correlation in a deep learning framework, and two fully connected networks that operate on the outputs of the tracker's intermediate layers to detect features and predict the trackability of the detected points. Detecting features based on the capabilities of the tracker distinguishes the proposed method from previous algorithms in this area and improves robustness against scene dynamics. Because no specialized feature tracking dataset exists, the network is trained on multiple datasets, and extensive comparisons are conducted to measure the accuracy of Deep-PT against recent pixel tracking algorithms. As the experiments suggest, the proposed deep architecture deliberately learns what to track and how to track, and it outperforms state-of-the-art methods.
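The abstract describes a three-part layout: a convolutional tracker that stands in for cross-correlation, plus two fully connected heads on the tracker's intermediate features for detection and trackability prediction. The PyTorch sketch below is a minimal illustration of that wiring only; all layer counts, sizes, and the displacement-regression formulation are my assumptions, not the authors' published architecture.

```python
import torch
import torch.nn as nn

class DeepPTSketch(nn.Module):
    """Hypothetical sketch of the Deep-PT layout described in the abstract:
    a convolutional tracker correlates a template patch with a search patch,
    and two fully connected heads on the intermediate features score
    feature-ness and predicted trackability. Dimensions are illustrative."""

    def __init__(self, patch=32):
        super().__init__()
        # Shared convolutional encoder applied to both patches.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        feat_dim = 64 * (patch // 2) ** 2
        # Tracker head: regresses the (dx, dy) displacement from the
        # concatenated template/search embeddings (a stand-in for the
        # learned cross-correlation the abstract mentions).
        self.tracker = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.ReLU(), nn.Linear(256, 2))
        # Detector and trackability heads share the intermediate features.
        self.detector = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))
        self.trackability = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, template, search):
        f_t = self.encoder(template).flatten(1)
        f_s = self.encoder(search).flatten(1)
        offset = self.tracker(torch.cat([f_t, f_s], dim=1))
        score = torch.sigmoid(self.detector(f_t))     # is this point worth tracking?
        conf = torch.sigmoid(self.trackability(f_t))  # will tracking likely succeed?
        return offset, score, conf
```

Coupling the detector to the tracker's own features, as sketched here, reflects the abstract's key claim: points are selected because this particular tracker can follow them, not because of a generic corner criterion.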

Using deep autoencoders to learn robust domain-invariant representations for still-to-video face recognition

2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2017

Video-based face recognition (FR) is a challenging task in real-world applications. In still-to-video FR, probe facial regions of interest (ROIs) are typically captured with lower-quality video cameras under unconstrained conditions, where facial appearance varies with pose, illumination, scale, expression, etc. These video ROIs are compared against facial models designed with a high-quality reference still ROI of each target individual enrolled in the system. In this paper, an efficient Canonical Face Representation CNN (CFR-CNN) is proposed for accurate still-to-video FR from a single sample per person, where still and video ROIs are captured under different conditions. Given a facial ROI captured under unconstrained video conditions, the CFR-CNN reconstructs it as a high-quality canonical ROI for matching that corresponds to the conditions of the reference still ROIs (e.g., well-illuminated, sharp, frontal views with a neutral expression). A deep autoencoder network is trained using a novel weighted loss function that robustly generates similar face embeddings for the same subjects. Then, during operation, face embeddings belonging to pairs of still and video ROIs from a target individual are accurately matched using a fully-connected classification network. Experimental results obtained with the COX Face and Chokepoint datasets indicate that the proposed CFR-CNN achieves a convincing level of accuracy. Its computational complexity (number of operations, network parameters, and layers) is significantly lower than that of state-of-the-art CNNs for video FR, suggesting that the CFR-CNN is a cost-effective solution for real-time applications.
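The pipeline in the abstract has two stages: an autoencoder that maps a degraded video ROI to a canonical, still-like ROI (yielding a face embedding at the bottleneck), and a small fully connected network that decides whether a still/video embedding pair belongs to the same person. The sketch below is a minimal reading of that description, assuming 64x64 grayscale ROIs and a 128-d embedding; the weighted-loss helper is one plausible interpretation, since the abstract does not specify the exact formulation.

```python
import torch
import torch.nn as nn

class CFRCNNSketch(nn.Module):
    """Hypothetical sketch of the CFR-CNN pipeline from the abstract: an
    autoencoder reconstructs a canonical still-like ROI from a video ROI,
    and a fully connected matcher classifies still/video embedding pairs.
    All dimensions are illustrative assumptions."""

    def __init__(self, embed_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, embed_dim),  # bottleneck face embedding
        )
        self.decoder = nn.Sequential(
            nn.Linear(embed_dim, 64 * 16 * 16), nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )
        # Matcher: same/not-same decision on a (video, still) embedding pair.
        self.matcher = nn.Sequential(
            nn.Linear(2 * embed_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, video_roi, still_embed):
        z = self.encoder(video_roi)
        canonical = self.decoder(z)  # reconstructed still-like canonical ROI
        logit = self.matcher(torch.cat([z, still_embed], dim=1))
        return canonical, logit


def weighted_recon_loss(recon, target, weight):
    """One plausible reading of the 'weighted loss': per-pixel weights that
    emphasize discriminative facial regions (an assumption; the paper's
    exact loss may differ)."""
    return (weight * (recon - target) ** 2).mean()
```

Splitting the work this way matches the abstract's efficiency claim: the expensive reconstruction runs once per probe ROI, while matching against each enrolled still reduces to a cheap pass through the small fully connected network.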
