Fernando Pereira | Instituto Telecomunicações
Papers by Fernando Pereira
IEEE Transactions on Circuits and Systems for Video Technology - TCSV, 1997
Lecture Notes in Computer Science, 2006
Nowadays, the heterogeneity of networks, terminals, and users is growing. At the same time, the availability and usage of multimedia content is increasing, which has raised the relevance of content adaptation technologies able to fulfill the needs associated with all usage conditions. For example, mobile displays tend to be too small to allow one to see all the details of an image. This paper presents an innovative method to integrate low-level and semantic visual cues into a single visual attention map that represents the most interesting contents of an image, allowing the creation of a video sequence that browses through the image displaying its regions of interest in detail. The paper presents the architecture of the developed adaptation system, the processing solutions, and the principles and reasoning behind the algorithms that have been developed and implemented. Special emphasis is given to the integration of low-level and semantic visual cues for the maximization of the image-to-video adapted experience.
Signal Processing: Image Communication, 2002
IEEE Signal Processing Magazine, 2011
IEEE Signal Processing Letters, 2013
IEEE Transactions on Circuits and Systems for Video Technology, 1999
Proceedings 2000 International Conference on Image Processing (Cat. No.00CH37101)
When working in image and video segmentation, the major objective is to design an algorithm producing the appropriate segmentation results for the particular goals of the application addressed. Therefore, the assessment of the segmentation quality is crucial for evaluating the likelihood that the application targets are met. Since no well-established methods for objective segmentation quality evaluation are currently available, this paper's major goal is to propose objective metrics for the evaluation of relative segmentation quality for both individual objects and the overall segmentation partition. The paper presents a methodology for performing objective evaluation of relative segmentation quality, identifies the relevant features to be compared against those of the reference segmentation, and proposes appropriate objective quality metrics. These metrics build on the existing knowledge on segmentation quality evaluation and also on some relevant aspects from the video quality evaluation field.
Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269)
The analysis of video data targeting the identification of relevant objects and the extraction of associated descriptive characteristics will be the enabling factor for a number of multimedia applications. This process has intrinsic difficulties, and since semantic criteria are difficult to express, usually only a part of the desired analysis results can be automatically achieved. For many applications, the automatic tools can be complemented with user guidance to improve performance. This paper proposes an integrated framework for video analysis, addressing the video segmentation and feature extraction problems. The framework includes a set of modules that can be combined following specific application needs. It includes both automatic (more objective) and user interaction (more semantic) analysis modules. The paper also proposes a specific segmentation solution to one of the most relevant application scenarios considered: off-line applications requiring precise segmentation.
Proceedings 1999 International Conference on Image Processing (Cat. 99CH36348)
This paper proposes two description schemes (DSs) to describe the visual information of an audiovisual (AV) document. The first one is devoted to still images. It describes the image visual appearance and its structure with regions, as well as its semantic content in terms of objects. The second DS is devoted to video sequences. It describes the sequence structure as well as its semantic content in terms of events. Features such as motion, camera activity, etc. are included in this DS. Moreover, it involves static visual representations such as key-frames, background mosaics and key-regions. These elements are considered as still images and are described by the first DS.
EURASIP Journal on Advances in Signal Processing, 2002
The identification of objects in video sequences, that is, video segmentation, plays a major role in emerging interactive multimedia services, such as those enabled by the ISO MPEG-4 and MPEG-7 standards. In this context, assessing the adequacy of the identified objects to the application targets, that is, evaluating the segmentation quality, assumes a crucial importance. Video segmentation technology has received considerable attention in the literature, with algorithms being proposed to address various types of applications. However, the segmentation quality performance evaluation of those algorithms is often ad hoc, and a well-established solution is not available. In fact, the field of objective segmentation quality evaluation is still maturing; recently, some more efforts have been made, mainly following the emergence of the MPEG object-based coding and description standards. This paper discusses the problem of objective segmentation quality evaluation in its most difficult scenario: stand-alone evaluation, that is, when a reference segmentation is not available for comparative evaluation. In particular, objective metrics are proposed for the evaluation of stand-alone segmentation quality for both individual objects and overall segmentation partitions.
2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), 2015
Local descriptors represent a powerful tool, which is exploited in several applications such as visual search, object recognition and visual tracking. Real-valued visual descriptors such as SIFT and SURF achieve state-of-the-art accuracy for a large set of visual analysis tasks. However, such algorithms are demanding in terms of computational capabilities and bandwidth, making them unsuitable for scenarios where resources are constrained. In this context, binary descriptors provide an efficient alternative to real-valued descriptors, due to their low computational complexity, limited memory footprint and fast matching algorithms. In this paper, binary descriptors are used to perform visual tracking of an object over time. The proposed visual tracker performs descriptor matching between consecutive frames, applies filtering techniques to remove undesirable outliers and employs a suitable model to characterize the object appearance. In addition, techniques to code and transmit these description streams are employed, thus reducing the amount of data that must be transmitted to perform accurate object tracking. The efficiency of the proposed visual tracker is evaluated in terms of rate-accuracy, i.e. using the bitrate associated with the compressed binary descriptors and a quantitative metric to assess the accuracy of the visual tracker.
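The fast matching that makes binary descriptors attractive can be illustrated with a minimal sketch. It assumes descriptors are stored as Python integers and compared by Hamming distance, with a hypothetical `max_distance` threshold standing in for the outlier filtering mentioned in the abstract; the function names and the threshold are illustrative and are not taken from the paper.

```python
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def match_descriptors(
    prev: List[int], curr: List[int], max_distance: int = 2
) -> List[Tuple[int, int, int]]:
    """Match each descriptor of the previous frame to its nearest
    neighbour in the current frame (brute force).

    Returns (prev_index, curr_index, distance) triples. Matches whose
    distance exceeds max_distance are dropped; this threshold is an
    assumption used here as a very simple outlier filter.
    """
    matches = []
    for i, d_prev in enumerate(prev):
        best_j, best_dist = None, max_distance + 1
        for j, d_curr in enumerate(curr):
            dist = hamming(d_prev, d_curr)
            if dist < best_dist:
                best_j, best_dist = j, dist
        if best_j is not None:
            matches.append((i, best_j, best_dist))
    return matches
```

Because the distance is an XOR followed by a bit count, each comparison is far cheaper than a Euclidean distance between real-valued SIFT or SURF vectors, which is the property the abstract relies on.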
2014 IEEE International Conference on Image Processing (ICIP), 2014
In a visual sensor network, a large number of camera nodes are able to acquire and process image data locally, collaborate with other camera nodes and provide a description of the captured events. Typically, camera nodes have severe constraints in terms of energy, bandwidth resources and processing capabilities. Considering these unique characteristics, coding and transmission of the pixel-level representation of the visual scene must be avoided, due to the energy resources required. A promising approach is to extract, at the camera nodes, compact visual features that are coded to meet the bandwidth and power requirements of the underlying network and devices. Since the total number of features extracted from an image may be rather large, this paper proposes a novel method to select the most relevant features before the actual coding process. The solution relies on a score that estimates the accuracy of each local feature. Then, local features are ranked and only the most relevant features are coded and transmitted. The selected features must maximize the efficiency of the image analysis task but also minimize the required computational and transmission resources. Experimental results show that higher efficiency is achieved when compared to the previous state-of-the-art.
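The rank-and-select step described in the abstract can be sketched as follows. The abstract does not say how the relevance score is computed, so the sketch takes the scores as given; `select_top_features` and its `budget` parameter are hypothetical names introduced only for illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_top_features(
    features: Sequence[T], scores: Sequence[float], budget: int
) -> List[T]:
    """Keep the `budget` highest-scoring features.

    The scores are assumed to be precomputed relevance estimates, one
    per feature. The selected features are returned in their original
    order so that later coding stages see a stable ordering.
    """
    if len(features) != len(scores):
        raise ValueError("features and scores must have the same length")
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # Rank feature indices by decreasing score (ties keep original order).
    ranked = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:budget])
    return [features[i] for i in keep]
```

For example, with a budget of two, only the two features with the highest scores would be passed on to coding and transmission, and the rest would be discarded at the camera node.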
2011 18th IEEE International Conference on Image Processing, 2011
In distributed video coding, motion estimation is typically performed at the decoder to generate the side information, increasing the decoder complexity while providing low complexity encoding in comparison with predictive video coding. Motion estimation can be performed once to create the side information or several times to refine the side information quality along the decoding process. In this paper, motion estimation is performed at the decoder side to generate multiple side information hypotheses, which are adaptively and dynamically combined whenever additional decoded information is available. The proposed iterative side information creation algorithm is inspired by video denoising filters and requires some statistics of the virtual channel between each side information hypothesis and the original data. With the proposed denoising algorithm for side information creation, an RD performance gain of up to 1.2 dB is obtained for the same bitrate.
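One simple way to combine several hypotheses using channel statistics is inverse-variance weighting, sketched below. This is an assumed stand-in, not the paper's algorithm: the abstract only states that the hypotheses are combined adaptively and that virtual channel statistics are required. The function name and the per-hypothesis variance inputs are hypothetical.

```python
from typing import List, Sequence


def fuse_hypotheses(
    hypotheses: Sequence[Sequence[float]], variances: Sequence[float]
) -> List[float]:
    """Combine side information hypotheses sample by sample.

    Each hypothesis is a flat list of sample values for the same block.
    variances[k] is an assumed estimate of the virtual channel noise
    variance for hypothesis k; more reliable hypotheses (lower
    variance) receive a larger weight.
    """
    if len(hypotheses) != len(variances) or not hypotheses:
        raise ValueError("need one positive variance per hypothesis")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    n = len(hypotheses[0])
    return [
        sum(w * h[k] for w, h in zip(weights, hypotheses)) / total
        for k in range(n)
    ]
```

In an iterative decoder, the variances could be re-estimated each time new decoded information becomes available and the fusion repeated, which is the general refinement pattern the abstract describes.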
Lecture Notes in Computer Science, 2004
It is well known that the text that appears in a video scene or is graphically added to it is an important source of semantic information for indexing and retrieval, notably in the context of video databases. This paper proposes an improved algorithm for the automatic extraction of text in digital video; its major strengths are its robustness in terms of text skew and its improved performance in dealing with scene text. The system is based on a segmentation approach, using geometrical and spatial analyses for text detection. Afterwards, temporal redundancy is exploited to improve the detection performance by means of motion analysis. The output of the text detection step is then directly passed to a standard OCR software package in order to obtain the detected text as ASCII characters.
IEEE Transactions on Circuits and Systems for Video Technology, 1999
IEEE Multimedia, 2002
An immeasurable amount of multimedia information is available today (in digital archives, on the Web, in broadcast data streams, and in personal and professional databases), and this amount continues to grow. Yet, the value of that information depends on how easily we can manage, find, retrieve, access, and filter it. The transition between two millennia abounds with new ways to produce, offer, filter, search, and manage digitized multimedia information.
Proceedings. International Conference on Image Processing
Although there are several techniques that video encoders may use to improve error resilience, it is largely recognized that intra coding refreshment plays a major role. This technique is especially useful for video encoders that rely on predictive (inter) coding to remove temporal redundancy because, in these conditions, the decoded quality can decay very rapidly due to error propagation if errors occur in the transmission or storage of the coded streams. Therefore, in order to avoid error propagation for too long a time, the encoder can use a coding refreshment scheme to refresh the decoding process and stop (spatial and temporal) error propagation. In the context of object-based video coding, the video encoder can apply intra coding refreshment to both the shape and the texture data. In this paper, a shape refreshment need metric is proposed which can be used by object-based video encoders, notably MPEG-4 video encoders, to determine when the shape of a given video object should be refreshed in order to improve the decoded video quality.
2004 International Conference on Image Processing, 2004. ICIP '04.
In this paper, an original motion-based shape error concealment technique, especially useful for object-based video applications in error-prone environments such as mobile networks, is proposed. It is assumed that the shape of the corrupted object at hand is in the form of a binary alpha plane and some of the shape data is missing due to channel errors. To conceal the corrupted shape, the decoder starts by assuming that the alpha plane changes in consecutive time instants can be described by a global motion model. This way, based on locally estimated global motion parameters, the decoder tries to conceal the corrupted alpha plane by global motion compensating the shape data from the previous time instant. Then, since not all alpha plane changes can be perfectly described by global motion, an additional local motion refinement is applied to deal with areas of the object that have significant motion.
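The global motion compensation step can be illustrated with a deliberately simplified sketch. It assumes a purely translational global motion model given by a known displacement (`dx`, `dy`), whereas the paper estimates the global motion parameters and adds a local refinement stage that is not shown here. All names are hypothetical.

```python
from typing import List


def conceal_alpha(
    current: List[List[int]],
    missing: List[List[bool]],
    previous: List[List[int]],
    dx: int,
    dy: int,
) -> List[List[int]]:
    """Fill missing pixels of a binary alpha plane from the previous one.

    A pixel (x, y) flagged in `missing` is replaced by the previous
    alpha value at (x - dx, y - dy), i.e. the previous shape shifted by
    the assumed translational global motion. Positions that fall
    outside the previous plane are treated as transparent (0).
    """
    height, width = len(current), len(current[0])
    out = [row[:] for row in current]
    for y in range(height):
        for x in range(width):
            if missing[y][x]:
                sy, sx = y - dy, x - dx
                inside = 0 <= sy < height and 0 <= sx < width
                out[y][x] = previous[sy][sx] if inside else 0
    return out
```

Correctly received pixels are left untouched, so only the regions lost to channel errors are replaced by motion-compensated data from the previous time instant.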
2006 International Conference on Image Processing, 2006
In this paper, a novel shape and texture error concealment technique for segmented object-based video scenes is proposed. This technique is different from existing concealment techniques because it considers not only the corrupted video objects to be concealed, but also the context/scene in which they are inserted. In the proposed technique, concealment is done by using information from the current time instant as well as from the past. The obtained results suggest that the use of this technique significantly improves the subjective visual impact of scenes on the end-user, when compared to independent concealment of video objects.
Proceedings - International Conference on Image Processing, ICIP, 2006
This paper proposes new buffer and video object distortion feedback compensation mechanisms for efficiently dealing with deviations between the ideal and the actual behavior of video scene encoders when jointly encoding multiple arbitrarily shaped video objects in the context of compliant low-delay object-based MPEG-4 video coding. The proposed solution computes target buffer occupancies for each encoding time instant based on the amount and complexity of the video data to encode, and the bit allocation for each encoding time instant is feedback adjusted according to deviations relative to this ideal behavior. Additionally, each video object bit allocation is also feedback adjusted based on the relative distortion of the various video objects in the scene. The proposed solution outperforms the non-normative MPEG-4 reference rate control algorithm for a wide range of bit rates and spatio-temporal resolutions, for typical test sequences.
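The two feedback adjustments described in the abstract can be sketched with a simple proportional controller. The formulas below are assumptions chosen for illustration (a linear correction with a hypothetical `gain`, and a split proportional to each object's distortion); the abstract does not give the actual compensation rules.

```python
from typing import List, Sequence


def adjust_frame_budget(
    nominal_bits: int,
    target_occupancy: int,
    actual_occupancy: int,
    buffer_size: int,
    gain: float = 0.5,
) -> int:
    """Scale the bit budget of one encoding time instant.

    If the buffer is fuller than the target, fewer bits are allocated;
    if it is emptier, more bits are allocated. `gain` is an assumed
    tuning constant.
    """
    deviation = (actual_occupancy - target_occupancy) / buffer_size
    return max(0, round(nominal_bits * (1.0 - gain * deviation)))


def split_among_objects(
    frame_bits: float, distortions: Sequence[float]
) -> List[float]:
    """Share the budget among video objects in proportion to distortion,
    so that objects currently coded with higher distortion get more bits.
    """
    total = sum(distortions)
    if total == 0:
        return [frame_bits / len(distortions)] * len(distortions)
    return [frame_bits * d / total for d in distortions]
```

A buffer occupancy above the target therefore lowers the budget for the next time instant, while an object whose distortion is higher than that of the other objects in the scene receives a larger share of that budget.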