Aydin Alatan - Academia.edu

Papers by Aydin Alatan

Geospatial object detection using deep networks

In the last decade, deep learning has been drawing huge interest due to developments in computational hardware and novel machine learning techniques. This progress also significantly affects satellite image analysis for various objectives, such as disaster and crisis management, forest cover, road mapping, city planning, and even military purposes. For all these applications, detection of geospatial objects is of crucial importance, and some recent object detection techniques remain unexplored for satellite imagery. In this study, aircraft, building, and ship detection in 4-band remote sensing images using convolutional neural networks based on the popular YOLO network is examined, and the accuracy of 4-band versus 3-band images is compared. Based on the simulation results, it can be concluded that state-of-the-art object detectors can be utilized for geospatial object detection purposes.
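
A minimal sketch of how a 3-band detector backbone might be extended to the 4-band input used in this study: the first convolution is rebuilt with four input channels and the extra (NIR) filter is initialized from the mean of the pretrained RGB filters. This is an illustrative PyTorch fragment, not the authors' implementation; the helper name and the weight-averaging heuristic are assumptions.

```python
# Sketch: adapting a detector's first convolution from 3-band (RGB) to 4-band
# (RGB + NIR) input by replicating the pretrained RGB weights.
import torch
import torch.nn as nn

def inflate_first_conv(conv_rgb: nn.Conv2d) -> nn.Conv2d:
    """Return a copy of `conv_rgb` that accepts 4 input channels.

    The extra (NIR) channel is initialized with the mean of the RGB filters,
    a common heuristic when extending pretrained backbones to more bands.
    """
    conv_4band = nn.Conv2d(
        in_channels=4,
        out_channels=conv_rgb.out_channels,
        kernel_size=conv_rgb.kernel_size,
        stride=conv_rgb.stride,
        padding=conv_rgb.padding,
        bias=conv_rgb.bias is not None,
    )
    with torch.no_grad():
        w = conv_rgb.weight                        # (out, 3, kH, kW)
        nir = w.mean(dim=1, keepdim=True)          # (out, 1, kH, kW)
        conv_4band.weight.copy_(torch.cat([w, nir], dim=1))
        if conv_rgb.bias is not None:
            conv_4band.bias.copy_(conv_rgb.bias)
    return conv_4band

# Usage: replace the stem of a YOLO-style backbone before fine-tuning.
stem = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1)
stem_4band = inflate_first_conv(stem)
x = torch.randn(1, 4, 416, 416)                    # a 4-band remote sensing tile
print(stem_4band(x).shape)                         # torch.Size([1, 32, 416, 416])
```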

Estimation of depth fields suitable for video compression based on 3-D structure and motion of objects

Intensity prediction along motion trajectories removes temporal redundancy considerably in video compression algorithms. In three-dimensional (3-D) object-based video coding, both 3-D motion and depth values are required for temporal prediction. The required 3-D motion parameters for each object are found by the correspondence-based E-matrix method. The estimation of the correspondences, i.e., the two-dimensional (2-D) motion field between the frames, and the segmentation of the scene into objects are achieved simultaneously by minimizing a Gibbs energy. The depth field is estimated by jointly minimizing a defined distortion and bit-rate criterion using the 3-D motion parameters. The resulting depth field is efficient in the rate-distortion sense. Bit-rate values corresponding to the lossless encoding of the resultant depth fields are obtained using predictive coding; prediction errors are encoded by a Lempel-Ziv algorithm. The results are satisfactory for real-life video scenes.
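
The 3-D motion step above rests on the essential (E) matrix estimated from 2-D correspondences. The hedged sketch below illustrates that step with OpenCV on synthetic correspondences; the intrinsics and the ground-truth motion are placeholders, and the paper's Gibbs-energy correspondence estimation is not reproduced here.

```python
# Sketch: recovering per-object 3-D motion (R, t) from 2-D correspondences via
# the essential matrix, in the spirit of the correspondence-based E-matrix step.
import numpy as np
import cv2

K = np.array([[700., 0., 320.],
              [0., 700., 240.],
              [0.,   0.,   1.]])                   # assumed camera intrinsics

# Synthetic rigid object: random 3-D points observed before/after a known motion.
rng = np.random.default_rng(0)
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(60, 3))
R_true, _ = cv2.Rodrigues(np.array([0.0, 0.05, 0.0]))      # small yaw
t_true = np.array([[0.2], [0.0], [0.0]])

def project(Xw, R, t):
    Xc = (R @ Xw.T + t).T
    uv = (K @ Xc.T).T
    return (uv[:, :2] / uv[:, 2:]).astype(np.float32)

pts_prev = project(X, np.eye(3), np.zeros((3, 1)))
pts_next = project(X, R_true, t_true)

E, mask = cv2.findEssentialMat(pts_prev, pts_next, K, method=cv2.RANSAC)
_, R, t, _ = cv2.recoverPose(E, pts_prev, pts_next, K, mask=mask)
print("estimated rotation:\n", R)
print("estimated translation direction:", t.ravel())  # scale is not observable
```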

Utilization of texture, contrast and color homogeneity for detecting and recognizing text from video frames

It is possible to index and manage large video archives more efficiently by detecting and recognizing text within video frames. Videotext has some inherent properties, such as distinctive texture, higher contrast against the background, and uniform color, that make it detectable. By exploiting these properties, it is possible to detect text regions and binarize the image for character recognition. In this paper, a complete framework for detection and recognition of videotext is presented. The results from Gabor-based texture analysis, contrast-based segmentation, and color homogeneity are merged to obtain a minimum number of candidate regions before binarization. The recognition rate of the system is tested for various combinations of these cues, and the resulting rates are reasonable for most practical purposes.
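
As an illustration of the texture cue, the sketch below computes a Gabor-based texture energy map and thresholds it into candidate text regions. The filter-bank parameters, the threshold, and the file names are assumptions for demonstration; the contrast and color-homogeneity cues are omitted.

```python
# Sketch: Gabor texture energy as one of the three cues (texture, contrast,
# color homogeneity) used to propose videotext regions.
import numpy as np
import cv2

def gabor_text_energy(gray: np.ndarray) -> np.ndarray:
    """Sum of rectified Gabor responses over a small orientation bank."""
    energy = np.zeros_like(gray, dtype=np.float32)
    for theta in np.arange(0.0, np.pi, np.pi / 4):      # 4 orientations
        kernel = cv2.getGaborKernel((15, 15), 3.0, theta, 8.0, 0.5)
        energy += np.abs(cv2.filter2D(gray.astype(np.float32), cv2.CV_32F, kernel))
    return energy

frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # hypothetical video frame
if frame is not None:
    energy = gabor_text_energy(frame)
    # High texture energy marks candidate text blocks before binarization.
    mask = (energy > energy.mean() + 2 * energy.std()).astype(np.uint8) * 255
    cv2.imwrite("text_candidates.png", mask)
```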

3DTV: 3D Time-varying Scene Capture Technologies -- A Survey

Advances in image sensors and the evolution of digital computation are a strong stimulus for the development and implementation of sophisticated methods for capturing, processing, and analyzing 3D data from dynamic scenes. Research on time-varying 3D scene capture technologies is important for the upcoming 3DTV displays. Methods such as shape-from-texture, shape-from-shading, shape-from-focus, and shape-from-motion extraction can recover 3D shape information from single-camera data. The existing techniques for 3D extraction from single-camera video sequences are especially useful for converting the already available vast mono-view content to 3DTV systems. Scene-oriented single-camera methods, such as human face reconstruction and facial motion analysis, body modeling, body motion tracking, and motion recognition, efficiently solve a variety of tasks. An intensive area of research is 3D multicamera dynamic acquisition and reconstruction, with its hardware specifics such as calibration ...

The Visual Object Tracking VOT2017 Challenge Results

2017 IEEE International Conference on Computer Vision Workshops (ICCVW)

The Visual Object Tracking challenge 2015, VOT2015, aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 62 trackers are presented. The number of tested trackers makes VOT2015 the largest benchmark on short-term tracking to date. For each participating tracker, a short description is provided in the appendix. Features of the VOT2015 challenge that go beyond its VOT2014 predecessor are: (i) a new VOT2015 dataset twice as large as in VOT2014, with full annotation of targets by rotated bounding boxes and per-frame attributes, and (ii) extensions of the VOT2014 evaluation methodology through the introduction of a new performance measure. The dataset, the evaluation kit, and the results are publicly available at the challenge website.

Change Detection for Hyperspectral Images Using Extended Mutual Information and Oversegmentation

2018 9th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS)

Analysis of Airborne LiDAR Point Clouds With Spectral Graph Filtering

IEEE Geoscience and Remote Sensing Letters

Separation of ground and nonground measurements is an essential task in the analysis of light detection and ranging (LiDAR) point clouds; however, it is challenging to implement a LiDAR filtering algorithm that integrates the mathematical definition of various landforms. In this letter, we propose a novel LiDAR filtering algorithm that adapts to the irregular structure and 3-D geometry of LiDAR point clouds. We exploit weighted graph representations to analyze the 3-D point cloud on its original domain. Then, we consider airborne LiDAR data as an irregular elevation signal residing on graph vertices. Based on a spectral graph approach, we introduce a new filtering algorithm that distinguishes ground and nonground points in terms of their spectral characteristics. Our complete filtering framework consists of outlier removal, iterative graph signal filtering, and erosion steps. Experimental results indicate that the proposed framework achieves good accuracy on scenes with data gaps and classifies nonground points on bridges and complex shapes satisfactorily, cases that are usually not handled well by state-of-the-art filtering methods.
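
The sketch below illustrates the general idea of treating elevation as a signal on a k-NN graph and smoothing it to expose nonground protrusions. It is a simplified stand-in for the paper's spectral graph filter: the neighborhood size, the minimum-biased iterative smoothing, and the height threshold are all assumptions.

```python
# Sketch: elevation as a graph signal on a k-NN graph over planimetric (x, y)
# coordinates, low-pass filtered to separate ground from nonground points.
import numpy as np
from scipy.spatial import cKDTree
from scipy import sparse

def ground_mask(points_xyz: np.ndarray, k: int = 8,
                iters: int = 20, threshold: float = 0.5) -> np.ndarray:
    xy, z = points_xyz[:, :2], points_xyz[:, 2].astype(float)
    n = len(points_xyz)

    # Symmetric k-NN adjacency over the horizontal plane.
    _, idx = cKDTree(xy).query(xy, k=k + 1)
    rows = np.repeat(np.arange(n), k)
    cols = idx[:, 1:].ravel()
    W = sparse.coo_matrix((np.ones(rows.size), (rows, cols)), shape=(n, n)).tocsr()
    W = W.maximum(W.T)

    # Iterative graph smoothing: each step moves z toward a minimum-biased
    # neighborhood average, so protrusions (buildings, vegetation) rise above
    # the smoothed "ground" signal.
    deg = np.asarray(W.sum(axis=1)).ravel()
    z_smooth = z.copy()
    for _ in range(iters):
        neighbor_avg = W @ z_smooth / np.maximum(deg, 1)
        z_smooth = np.minimum(z_smooth, neighbor_avg)    # erosion-like update

    return (z - z_smooth) < threshold                    # True = ground

pts = np.random.rand(1000, 3) * [100, 100, 5]            # synthetic point cloud
print("ground points:", ground_mask(pts).sum())
```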

The Visual Object Tracking VOT2016 Challenge Results

Lecture Notes in Computer Science, 2016

The Visual Object Tracking challenge 2015, VOT2015, aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 62 trackers are presented. The number of tested trackers makes VOT2015 the largest benchmark on short-term tracking to date. For each participating tracker, a short description is provided in the appendix. Features of the VOT2015 challenge that go beyond its VOT2014 predecessor are: (i) a new VOT2015 dataset twice as large as in VOT2014, with full annotation of targets by rotated bounding boxes and per-frame attributes, and (ii) extensions of the VOT2014 evaluation methodology through the introduction of a new performance measure. The dataset, the evaluation kit, and the results are publicly available at the challenge website.

Uncertainty Modeling for Efficient Visual Odometry via Inertial Sensors on Mobile Devices

2014 IEEE International Conference on Image Processing, Oct 29, 2014

Most mobile applications require efficient and precise computation of the device pose, and almost every mobile device is already equipped with inertial sensors together with a camera. This fact makes sensor fusion quite attractive for increasing efficiency during pose tracking. However, the state-of-the-art fusion algorithms have a major shortcoming: a lack of well-defined uncertainty introduced to the system during the prediction stage of the fusion filters. Such a drawback results in covariances being determined heuristically, and hence a requirement for data-dependent tuning to achieve high performance, or even convergence, of these filters. In this paper, we propose an inertially aided visual odometry system that requires neither heuristics nor parameter tuning; the required uncertainties on all estimated variables are computed under a minimum number of assumptions. Moreover, the proposed system simultaneously estimates the metric scale of the pose computed from a monocular image stream. The experimental results indicate that the proposed scale estimation outperforms state-of-the-art methods, whereas the pose estimation step yields quite acceptable results in real time on resource-constrained systems.
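
To illustrate how prediction-stage uncertainty can follow from sensor specifications rather than hand tuning, the fragment below propagates accelerometer noise through a constant-acceleration model in one dimension. The noise figure and sampling period are assumptions; the paper's full visual-inertial formulation is considerably richer.

```python
# Sketch: deriving the predicted state covariance of a Kalman-style filter from
# the accelerometer noise spec instead of hand-tuned values. 1-D, for brevity.
import numpy as np

dt = 0.01                     # IMU sample period [s]
sigma_a = 0.05                # accelerometer noise std dev [m/s^2] (assumed)

# State x = [position, velocity]; acceleration enters as a control input.
F = np.array([[1.0, dt],
              [0.0, 1.0]])
G = np.array([[0.5 * dt**2],
              [dt]])
Q = (sigma_a ** 2) * (G @ G.T)          # process noise implied by the accel spec

x = np.zeros((2, 1))
P = np.zeros((2, 2))
accel_meas = 0.3                         # one accelerometer reading [m/s^2]

# Prediction step: both the state and its covariance follow from the model.
x = F @ x + G * accel_meas
P = F @ P @ F.T + Q
print("predicted state:", x.ravel())
print("predicted covariance:\n", P)
```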

A complexity-utility framework for modeling decoding complexity towards optimizing subjective quality of video for mobile devices

Proceedings of the 3rd Workshop on Mobile Video Delivery, Oct 25, 2010

Only the first-page header of this paper is visible in the preview: "A Complexity-Utility Framework for Modeling Decoding Complexity Towards Optimizing Subjective Quality of Video for Mobile Devices", Özgür Deniz Önür, Mobilus Ltd., ODTÜ Teknokent, Ankara, Turkey.

Video Adaptation for Transmission Channels by Utility Modeling

2005 IEEE International Conference on Multimedia and Expo, 2005

The satisfaction a user gets from watching a video on a resource-limited device can be formulated by utility theory. The resulting video adaptation is optimal in the sense that the adapted video maximizes user satisfaction, which is modeled through subjective tests comprising three independent utility components: crispness, motion smoothness, and content visibility. These components are maximized in terms of coding parameters by obtaining a Pareto optimal set. In this manuscript, the inclusion of transmission channel capacity into the subjective utility model of user satisfaction is addressed. It is proposed that, using the maximum channel capacity as a restriction metric, certain members of the Pareto optimal solution set can be eliminated such that the remaining members are suitable for transmission through the given channel. Once the reduced solution set is obtained, an additional figure of merit can be used to pick a single solution from this set, depending on the application scenario. (Figure 1: Delivery of the multimedia content from the media server to the mobile terminal.) A novel approach to obtain the utility function for the problem above is proposed. The problem is considered as a multiple-objective utility formulation. The overall utility function is decomposed into three independent components, such that the satisfaction associated with any one of these terms can be considered independent of every other component. These terms are the three components listed above: crispness, motion smoothness, and content visibility.
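
A hedged sketch of the selection pipeline described above: compute the Pareto-optimal set over the three utility components, discard members whose bitrate exceeds the channel capacity, and pick one member by a simple figure of merit. The synthetic configurations, the capacity value, and the equal-weight figure of merit are assumptions.

```python
# Sketch: Pareto-optimal set over the three utility components, pruned by a
# channel-capacity constraint, then reduced to one operating point.
import numpy as np

rng = np.random.default_rng(1)
# Each row: (crispness, motion smoothness, content visibility, bitrate [kbps])
configs = rng.uniform([0, 0, 0, 100], [1, 1, 1, 2000], size=(200, 4))

def pareto_front(utils: np.ndarray) -> np.ndarray:
    """Boolean mask of configurations not dominated in all three utilities."""
    keep = np.ones(len(utils), dtype=bool)
    for i in range(len(utils)):
        dominated = np.all(utils >= utils[i], axis=1) & np.any(utils > utils[i], axis=1)
        if dominated.any():
            keep[i] = False
    return keep

utils = configs[:, :3]
front = configs[pareto_front(utils)]

# Channel capacity as a restriction metric: drop members the channel cannot carry.
capacity_kbps = 800.0                                    # assumed channel capacity
feasible = front[front[:, 3] <= capacity_kbps]

# A simple figure of merit (equal-weight sum) picks the final operating point.
if len(feasible):
    best = feasible[np.argmax(feasible[:, :3].sum(axis=1))]
    print("selected (crispness, smoothness, visibility, kbps):", best)
```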

Watermarking for Image Based Rendering via homography-based virtual camera location estimation

Proceedings ICIP International Conference on Image Processing, Oct 1, 2008

Recent advances in Image Based Rendering (IBR) have pioneered freely determining the viewing position and angle in a scene from multi-view video. Noting that a person could also record a personal video for such an arbitrarily selected view and misuse this content, copyright and copy protection problems clearly exist and should be solved for IBR applications. In our recent work [1], we proposed a watermarking method which embeds the watermark pattern into every frame of multi-view video and extracts this watermark from a rendered image generated by nearest-neighbor interpolation based light-field rendering (LFR). This paper presents a novel solution to the challenging problem of watermark detection under bilinear interpolation, the most attractive and promising interpolation method for LFR-based applications. Moreover, the location of the virtual camera can be completely arbitrary in this new formulation. The results show that the watermark can be extracted successfully for LFR via bilinear interpolation for any virtual camera location and rotation, as long as the visual quality of the rendered image is preserved.
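
A small sketch of the geometric core of such a scheme: estimating the homography between a reference view and the rendered virtual view so that the watermark can be realigned before detection. The correspondences and the ground-truth homography are synthetic; this is not the paper's detector.

```python
# Sketch: homography estimation between a reference view and a rendered virtual
# view, the geometric step needed before the watermark is resampled and detected.
import numpy as np
import cv2

# Known ground-truth homography standing in for an arbitrary virtual camera.
H_true = np.array([[1.05, 0.02, 12.0],
                   [-0.01, 0.98, -7.0],
                   [1e-4, 2e-5,   1.0]])

pts_ref = np.random.default_rng(2).uniform(0, 640, size=(40, 2)).astype(np.float32)
pts_h = cv2.perspectiveTransform(pts_ref.reshape(-1, 1, 2), H_true)
pts_rendered = pts_h.reshape(-1, 2) + np.random.normal(0, 0.3, (40, 2)).astype(np.float32)

H_est, inliers = cv2.findHomography(pts_ref, pts_rendered, cv2.RANSAC, 3.0)
print("estimated homography:\n", H_est / H_est[2, 2])

# With H_est, the rendered frame can be warped back to the reference grid
# (cv2.warpPerspective) so the embedded watermark is realigned for correlation.
```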

Compressed Domain MPEG-2 Video Editing with VBV Requirement

Proceedings 2000 International Conference on Image Processing, Feb 1, 2000

A novel method is proposed to achieve efficient MPEG-2 video editing in the compressed domain while preserving Video Buffer Verifier (VBV) requirements. Different cases are determined according to the VBV modes of the bitstreams to be concatenated. For each case, the minimum number of zero-stuffing bits or the shortest waiting time between two streams is determined analytically, so that the resulting bitstream is still VBV-compliant. The simulation results show that the proposed method is applicable to any MPEG-2 bitstream, independent of its encoder.
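
The toy model below illustrates the splice-point bookkeeping in a highly simplified form: given the buffer level left by the first stream and the initial occupancy the second stream was encoded for, it returns either an additional waiting time or a number of stuffing bits. The numbers and the simplified VBV model are assumptions, not the paper's analytical derivation.

```python
# Sketch: a toy VBV occupancy check across a splice point. All numbers are
# illustrative; real VBV compliance also tracks per-picture decode instants.

def splice_adjustment(level_after_A: float, target_level_B: float,
                      channel_rate_bps: float) -> tuple[float, float]:
    """Return (extra_wait_seconds, stuffing_bits) for a compliant splice."""
    if level_after_A < target_level_B:
        # Let the channel fill the buffer up to the level stream B assumes.
        extra_wait = (target_level_B - level_after_A) / channel_rate_bps
        return extra_wait, 0.0
    # Otherwise burn the surplus with zero-stuffing so B starts at its target.
    return 0.0, level_after_A - target_level_B

wait_s, stuff_bits = splice_adjustment(level_after_A=1_200_000,
                                       target_level_B=1_500_000,
                                       channel_rate_bps=6_000_000)
print(f"wait {wait_s * 1000:.1f} ms, stuff {stuff_bits:.0f} bits")
```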

Block Based Video Data Hiding Using Repeat Accumulate Codes and Forbidden Zone Data Hiding

Proceedings of the 10th Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing, 2009

Only a fragment of this paper is visible in the preview. The work is affiliated with the Department of Electrical and Electronics Engineering, Middle East Technical University, Ankara, Turkey, and TÜBİTAK UZAY, Ankara, Turkey. The visible fragment notes that 3D interleaving is performed in order to mitigate local bursts of errors.

Gender Classification via Gradientfaces

Only a fragment of this paper is visible in the preview. The visible text notes that gender classification approaches operate on different types of data (pixel-based, global feature-based, and local feature-based), and that Moghaddam [1] applies a Support Vector Machine (SVM) directly to face images to identify gender.

Efficient Bayesian Track-Before-Detect

Proceedings ICIP International Conference on Image Processing, Oct 1, 2006

This paper presents a novel Bayesian recursive track-before-detect (TBD) algorithm for the detection and tracking of dim targets in optical image sequences. The algorithm eliminates the need for storing past observations by recursively incorporating newly acquired sensor data into the existing information. It calculates the likelihood ratio for optimal detection and estimates the target state simultaneously. The technique does not require velocity-matched filtering and hence is capable of detecting a target moving in any direction. In more detail, the proposed Bayesian recursion estimates the posterior likelihood ratio and the posterior state density based on all past observations, detecting the presence of a possible target and estimating its position at the same time. The recursive nature of the solution eliminates the need to store all previous measurements; each received measurement frame is used for updates and then discarded. The technique neither assumes constant-velocity motion nor requires a filter-bank implementation structure, and it is computationally efficient and suitable for real-time operation. The effectiveness of the proposed algorithm is tested over sequences of synthetic and real data, and it is shown to be capable of detecting targets in very low signal-to-noise ratio conditions.
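
A compact grid-based sketch of the recursive idea: the posterior over target position, together with a "no target" hypothesis, is predicted with a simple motion model and updated with per-pixel likelihood ratios, so no past frames need to be stored. The target amplitude, noise level, and transition probabilities are assumptions.

```python
# Sketch: grid-based Bayesian track-before-detect recursion for a dim target.
import numpy as np

H, W = 32, 32
sigma_n, amp = 1.0, 1.5                       # noise std, assumed target amplitude
p_birth, p_death = 0.01, 0.01                 # presence transition probabilities

prob_absent = 0.99
prob_present = np.full((H, W), (1 - prob_absent) / (H * W))  # uniform prior

def tbd_update(frame, prob_present, prob_absent):
    # Prediction: presence may appear/disappear; position follows a small random
    # walk, approximated here by a 3x3 box blur of the position posterior.
    padded = np.pad(prob_present, 1, mode="constant")
    blurred = sum(padded[i:i + H, j:j + W] for i in range(3) for j in range(3)) / 9.0
    pred_present = (1 - p_death) * blurred + p_birth * prob_absent / (H * W)
    pred_absent = p_death * prob_present.sum() + (1 - p_birth) * prob_absent

    # Update: per-pixel likelihood ratio of "target at (i, j)" vs noise-only,
    # for Gaussian noise with a known target amplitude.
    lr = np.exp((amp * frame - amp**2 / 2) / sigma_n**2)
    post_present = pred_present * lr
    norm = post_present.sum() + pred_absent
    return post_present / norm, pred_absent / norm

# One synthetic frame with a dim target at (10, 20).
frame = np.random.default_rng(3).normal(0, sigma_n, (H, W))
frame[10, 20] += amp
prob_present, prob_absent = tbd_update(frame, prob_present, prob_absent)
print("P(target present) =", prob_present.sum())
print("most likely cell  =", np.unravel_index(prob_present.argmax(), (H, W)))
```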

MRF-based planar co-segmentation for depth compression

2014 IEEE International Conference on Image Processing, Oct 1, 2014

Watermark detection method for broadcasting

Recognizing Events in an Automated Surveillance System

Lecture Notes in Computer Science, 2006

Event recognition is probably the ultimate purpose of an automated surveillance system. In this paper, hidden Markov models (HMMs) are utilized to recognize the nature of an event occurring in a scene. For this purpose, object trajectories, obtained through successful tracking, are represented as sequences of flow vectors that contain instantaneous velocity and location information. These vectors are clustered by the K-means algorithm to obtain a prototype representation. HMMs are trained with sequences obtained from usual motion patterns, and abnormality is detected by measuring distances to these models. In order to specify the number of models automatically, a novel approach is proposed which utilizes the clues provided by centroid clustering. Preliminary experimental results are promising for detecting abnormal events.
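
The sketch below mirrors the pipeline at a small scale: flow vectors are quantized into prototypes with K-means, and a symbol sequence is scored against an HMM with the forward algorithm so that a low likelihood flags abnormality. The HMM parameters are placeholders standing in for models trained on usual motion patterns.

```python
# Sketch: K-means prototypes for trajectory flow vectors (x, y, vx, vy) plus a
# forward-algorithm HMM score used as an abnormality measure.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
flow_vectors = rng.normal(size=(500, 4))            # flow samples from many tracks

K = 8                                               # number of prototypes
kmeans = KMeans(n_clusters=K, n_init=10, random_state=0).fit(flow_vectors)

def forward_loglik(obs, pi, A, B):
    """Scaled forward algorithm: log-likelihood of a discrete symbol sequence."""
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        loglik += np.log(alpha.sum())
        alpha /= alpha.sum()
    return loglik

S = 3                                               # hidden states of one motion model
pi = np.full(S, 1.0 / S)
A = np.full((S, S), 0.1) + 0.7 * np.eye(S)          # sticky transitions (rows sum to 1)
B = rng.dirichlet(np.ones(K), size=S)               # emission probs over prototypes

track = rng.normal(size=(30, 4))                    # one observed trajectory
symbols = kmeans.predict(track)
score = forward_loglik(symbols, pi, A, B)
print("per-frame log-likelihood:", score / len(symbols))  # threshold for abnormality
```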

Fusing 2D and 3D clues for 3D tracking using visual and range data

Proceedings of the 16th International Conference on Information Fusion, Jul 9, 2013

3D tracking of rigid objects is required in many applications, such as robotics and augmented reality (AR). The availability of accurate pose estimates increases reliability in robotic applications and decreases jitter in AR scenarios. Purely vision-based 3D trackers require either manual initialization or offline training stages, whereas trackers relying on depth sensors alone are not suitable for AR applications. In this paper, an automated 3D tracking algorithm is proposed, based on the fusion of vision and depth sensors via an Extended Kalman Filter (EKF) that incorporates a novel observation weighting method. Moreover, novel feature selection and tracking schemes based on intensity and shape index map (SIM) data of the 3D point cloud increase 2D and 3D tracking performance significantly. The proposed method requires neither manual initialization of the pose nor offline training, while enabling highly accurate 3D tracking. The accuracy of the proposed method is tested against a number of conventional techniques, and superior performance is observed.
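
A minimal EKF fragment showing how visual and depth observations of the same quantity can be fused, with the observation weighting expressed through the measurement covariances. A single 3-D point stands in for the full pose state, and the intrinsics and noise levels are assumptions; the paper's weighting scheme and SIM-based features are not reproduced.

```python
# Sketch: one EKF cycle fusing a 2-D visual feature measurement with a depth
# measurement of the same 3-D point; weights enter via R_cam and R_depth.
import numpy as np

f, cx, cy = 525.0, 320.0, 240.0               # assumed pinhole intrinsics

def h_cam(X):
    """Project a camera-frame 3-D point to pixel coordinates."""
    x, y, z = X
    return np.array([f * x / z + cx, f * y / z + cy])

def H_cam(X):
    """Jacobian of the projection with respect to the 3-D point."""
    x, y, z = X
    return np.array([[f / z, 0.0, -f * x / z**2],
                     [0.0, f / z, -f * y / z**2]])

def ekf_update(x, P, z, h, H, R):
    y = z - h(x)                              # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)            # Kalman gain
    return x + K @ y, (np.eye(len(x)) - K @ H) @ P

# Prior belief about the point (the prediction step is omitted in this sketch).
x = np.array([0.1, -0.05, 2.0])
P = np.diag([0.05, 0.05, 0.2])

# Visual update: 2-D feature position, weighted by pixel noise.
z_cam = h_cam(np.array([0.12, -0.04, 1.95])) + np.random.default_rng(5).normal(0, 0.5, 2)
x, P = ekf_update(x, P, z_cam, h_cam, H_cam(x), np.diag([0.5**2, 0.5**2]))

# Depth update: the sensor observes z directly; its weight comes from R_depth.
H_depth = np.array([[0.0, 0.0, 1.0]])
x, P = ekf_update(x, P, np.array([1.96]), lambda s: s[2:3], H_depth, np.array([[0.01**2]]))

print("fused point:", x)
print("posterior covariance diagonal:", np.diag(P))
```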
