On the basis of the Scale Invariant Feature Transform (SIFT) feature, we research the distance measure in the process of image resizing. Through extracting SIFT features from the original image and the resized one, respectively, we match the SIFT features between two images, and calculate the distance for SIFT feature vectors to evaluate the degree of similarity between the original and the resized image. On the basis of the Euclidean distance measure, an effective image resizing algorithm combining Seam Carving with Scaling is proposed. We first resize an image using Seam Carving, and calculate the similarity distance between the original image and its resized one. Before the salient object and content are damaged obviously, we stop Seam Carving and transfer residual task to Scaling. Experiments show that our algorithm is able to avoid the damage and distortion of image content and preserve both the local structure and the global visual effect of the image graciously.
Keywords:image resizing; similarity measure; SIFT feature; Seam Carving; Scaling.
With the rapid development of multimedia techniques, different media networks, such as internet, telecom, and digital TV networks, have been interconnected tightly. Media data need transferring among these networks and displaying with different resolution or aspect ratios in various display devices. So, image and video retargeting have become a research venue in computer vision and graphics field.
Due to lack of considering the structure and the feature distribution of images, traditional content-oblivious methods have clear drawbacks, e.g., if the aspect ratio is changed greatly, an image will bring obvious distortion by Scaling, and Cropping is likely to throw away much important information distributing over the entire image via cutting pixels from the image periphery only. A sophisticated resizing algorithm should be able to maintain the salient and interesting regions intact as much as possible. Recently, content-aware methods such as Seam Carving, non-homogeneous warping, and patch transform were proposed, by considering important content, unimportant region, image construction, or texture, to protect both the global visual effect and local structures of the image. Although these methods could be adopted to resize image fairily, there are still some problems needing to be further solved, such as image damage for the large adjustment amount, inefficient iterative or traverse computation, image distance measure and retargeting effect evaluation, etc.
In existing literatures, the patch-based bidirectional similarity measure (BDS) and its improved version are widely utilized for evaluating similarity between the original image and the retargeted one. However, owing to the computation of patch matching, this kind of manner is quite inefficient. In addition, a user study by Rubinstein et al.  demonstrated that the similarity magnitude by BDS is in low agreement with human subjective perception. In this article, we investigate an approach to distance measure for image resizing based on the image Scale Invariant Feature Transform (SIFT) feature, and present an algorithm, combining Seam Carving with Scaling, which could be used to protect the prominent and important object efficiently. We extract SIFT features from a given image and its resized one, respectively, then match the SIFT features from the given image to the result, and calculate the distance for SIFT features to evaluate the degree of similarity between two images. On the basis of the SIFT feature distance measure, our image resizing algorithm first resizes the original image using Seam Carving step-by-step, and calculates the similarity distance between the original image and the resized one. While the distance value exceeds a threshold, we abandon Seam Carving and transfer residual task to Scaling, such that both the local structures and the global visual effect of the image could be preserved graciously.
In summary, our main contributions are
• Propose an approach to distance measure between an image and its resized version based on SIFT feature;
• Utilize the SIFT distance measure to assess the degree of distortion for the resized images;
• Propose an effective image resizing algorithm, combining Seam Carving with Scaling, which could preserve the salient information and the global visual effect based on the SIFT distance measure.
The rest of this article is organized as follows: Section 2 introduces the background of image resizing. Section 3 shows the distance measure algorithms used in this article. In Section 4, we present an image resizing algorithm combining Seam Carving and Scaling. Section 5 describes the analysis and setting of the threshold. We compare the effects of our method with those of other algorithms and present some discussion in Section 6.
2. Related studies
Up to now, various algorithms have been proposed for image and video retargeting, in which different aspects are taken into account for achieving desired results.
The salient and important information-based algorithm is a kind of popular retargeting method and could be used for preserving the visual consistency of important regions of an image effectively. Zhang et al.  employed shrinkability maps and a random walk model to improve the resizing efficiency with low storage requirements. Roberto et al.  utilized a reduced linear model for image resizing, in which a combination of gradient information with visual saliency maps  is used for evaluating the image energy. Due to the visual saliency map involving information of color, intensity, and orientation, its implementation is not quite well in some scenarios. Huang et al.  proposed a framework for preserving the global structure in images and vector art. Their method formulates the structure preservation as an optimization problem and the accuracy relies on robust structure detection. Wang et al.  presented a Scale-and-Stretch (SNS) warping method, via iteratively computing optimal local scaling factors for each local region and updating a warped image, to resize an image. In some case, some objects might be excessively distorted since the distortion is distributed over all the spatial directions. Based on the conformal energy, Zhang et al.  employed handles to describe original image and minimized quadratic distortion energies to obtain a resized image, but their method cannot guarantee to strictly preserve edges. Guo et al.  constructed a mesh image representation and associated an image saliency into the image mesh, then regarded image structure as constraints for mesh parameterization. Owing to the emphasis of relative scale of salient object, nearby objects may be distorted.
Avidan and Shamir  presented a greedy image resizing method called Seam Carving which pays more attention to the unimportant regions, and can retain important content via removing or duplicating monotonic pixel-wide low-energy seams. But, if low gradient pixels in the required spatial direction have been run out, or interesting objects span the entire image, some interesting objects and important regions would suffer from distortion, the local structure and global layout might be destroyed. On the basis of Seam Carving, Rubinstein et al.  improved it by using graph cut for image and video retargeting. Through utilizing a stream, a path of several pixels width, instead of a seam, Domingues et al.  presented an improved algorithm called Stream Carving to induce an increase in the quality of resized image. Mansfield et al.  proposed a scene carving method, by decomposing the image retargeting procedure into removing image content with minimal distortion and re-arrangement of known objects within the scene to maximize their visibility. Moreover, they introduced the visibility map for pixel removing, casting retargeting as a binary graph labeling problem to improve Seam Carving . Considering the distortion in both spatial and temporal dimensions, Grundmann et al.  presented a discontinuous Seam Carving for video retargeting to process the video frame sequentially and afford great flexibility. Dong et al.  presented a resizing algorithm combining Seam Carving with Scaling. But, their algorithm needs to compute all the possible combinations of resizing amount by Seam Carving and Scaling, respectively, then chooses the best ratio for resizing image. Other similar approaches, which combine Seam Carving with Scaling, resize image by using a modified energy function based on wavelet decomposition , analyzing the cost of the next seam  and the importance value for the minimal seam , etc. By combining the region of interest-based technique with an extended Seam Carving, Kopf et al  proposed a video retargeting algorithm for reducing the distortion of straight lines and avoiding jitter in the target video. Chen and Luo  proposed an approach for modeling dynamic visual attention to detect the focus of interest, by defining visual cubes to determine a proper extent of salient regions for the global optimization. Their algorithm is able to keep the video's isotropic manipulation and the continuous dynamics of visual perception. In addition, patch-based methods [21-24] are also presented for image retargeting or image summarization.
In general, associated with seam removal, some artifacts and warping will be introduced to the resized image. The more the number of removed seams is, the heavier the distortion of resulting image would be. Ideally, an image resizing algorithm should check whether farther seam removal will result in unacceptable distortion. So, a similarity or distance metric could be taken into account for image resizing to evaluate the retargeting effect. Similarity measure between images is an important portion of image analysis, broadly used for image retrieval, quality assessment, and visual tracking. For image summarization, Simakov et al.  proposed a similarity measure method which quantitatively captures the incompleteness and incoherence of the patches between the original image and the resized images. Rubinstein et al.  provided a similarity measure algorithm between images termed bi-directional warping (BDW). It measures the similarity between every row (column) and then takes the maximum alignment error as the distance. Maalouf and Larabi  defined a multi-scale bandelet-based perceptual similarity measure for image retargeting, via measuring the geometric and perceptual similarities between two images to obtain resulting image that contains as much as geometric and perceptual features of the original image.
In literatures [15,21,27], BDS is employed to calculate the distance between the retargeted image and the original, and assess resizing effect. This type of algorithm is time-consuming due to iteration computation. Such limitation becomes a bottleneck applying the technique for interactive use (e.g., for portable devices). Besides, Rubinstein et al.  compared a number of state-of-the-art retargeting methods and conducted a user study, their Subjective Analysis and Objective Analysis revealed that both BDS and BDW show low agreement with human perception, while image descriptors such as SIFT flow  and Earth Mover's Distance  are more suitable than patch-based distances for conveying local permissible changes.
3. Distance measure for resized image
SIFT feature is a type of descriptor of the key point found out from multiple scale spaces of an image. Due to the invariant to image translation, scaling, rotation and robust matching across the affine distortion, change in illumination and addition of noise, the SIFT feature is widely used for image matching and retrieval, pattern recognition, etc. The SIFT descriptor has the ability to robustly capture structural properties of the image; it is more suitable than patch-based distance for conveying local permissible changes in content. The SIFT feature points dominantly distribute across regions where color and texture change, hence, SIFT feature vectors and number would mildly change if the low-energy regions are carved out only, and obviously vary while the silent objects get damaged. Such that we can utilize the SIFT feature to show the deformation of the resized image and the distance from the original.
We extract the SIFT feature, using SIFT algorithm by Lowe , from the original image and its resized one, respectively. According to the vectors and the number of SIFT features, we calculate the distance for a resized one from the original image. There are two manners for this purpose.
3.1. The Euclidean distance between SIFT feature vectors
For the case of dimension reduction, the feature number of resized image will decrease. Calculating the Euclidean distances from source features to target ones is capable of revealing the difference between two images obviously. Along with adjustment amount getting larger, outliers across features will get more, the distance values will become lager. In particular, when the silent objects within the image get damaged, both the feature vectors and number would alter heavily. Inversely, if the distance is computed from target features to source ones, in the process of resizing, features in resized image are usually capable of finding appropriate matched-features from the original. Such that the distance would alter slightly, it is not able to represent the degree of deformation in the resized image sufficiently.
We traverse all the SIFT features in the given image. For each one, we calculate the Euclidean distances to all the SIFT features in the resized image, and record the shortest distance, then sum up all of the shortest distances, and define a mean of the sum as the distance between two images to evaluate the distortion of resized image. We formulate it as
where m is the number of SIFT features of the original image S, n is the number of SIFT features of the resized image T, Sik is the kth element of the ith feature vector of S, tjk is the kth element of the jth feature vector of T. Because D(S,T) denotes an average distance regarding all SIFT features in an original image to target, it is suitable for the case of number variation of detected feature points from various images.
3.2. The percentage of matched SIFT features
According to literature , we extract SIFT features from the original image and the resized one, respectively. For each feature in the source, we calculate and find the shortest and second shortest Euclidean distances to target features. Two features with the shortest distance are defined as matched pair only if the ratio of two distances is less than 0.6. While gaining the number of matched pairs, we can calculate the percentage of matched SIFT features relative to the total number of SIFT features in the original image as follows:
where Nm, Nt indicate the number of matched SIFT feature pairs, the total number of SIFT features in the original image, respectively.
Analogous to the Euclidean distance between SIFT features, the change of the percentage is capable of expressing the degree of distortion as well. The smaller the value of percentage becomes, the larger the distortion within the resized image will be.
Figure 1 is an example of extracting SIFT feature. We extract 186 SIFT features (marked with +) from the original image (a), 116 and 173 from the resized image (b) and (c), respectively.
Figure 1. SIFT feature extraction. (a) Original image (200 × 133). (b) Resized image by scaling (112 × 133). (c) Resized image by Seam Carving (112 × 133).
In order to observe the change of the similarity associated with the process of image resizing, using Seam Carving, we resize a given image from the size 200 × 133 to 112 × 133 step-by-step. For each step, five seams are removed and an interim image is created. Several images of these are shown in Figure 2.
Figure 2. Images resizing by Seam Carving. (a) Original image (200 × 133). (b) Interim image (155 × 133). (c) Interim image (150 × 133). (d) Target image (112 × 133).
We extract the SIFT feature from them and calculate their distance by both manners mentioned above. The change of similarity between the original and the resized images is illustrated in Figure 3. Associated with increment of the amount of adjustment, the Euclidean distance will increase (see Figure 3a) and the percentage of matched SIFT feature pairs will decrease (see Figure 3b); we know the similarity between the original and the resized image will decrease.
Figure 3. Similarity measure associated with removed seam number. (a) Euclidean distance between SIFT feature vectors. (b) Percentage of matched SIFT features.
We notice that SIFT feature points mainly locate at prominent objects or edges (see Figure 1). While prominent objects and important regions begin to be damaged, both the vector and the number of SIFT features would change obviously. However, comparing with the Euclidean distance measure, it is difficult to find a consistent threshold for different images by the percentage of matched SIFT feature pairs. The possible reason is that, for a SIFT feature point pi in the original image, we decide which SIFT feature point as its corresponding matched-point in the other image like this: If the ratio of the distance from the closest neighbor to that of the second closest is less than 0.6, the closest feature point is thought as the matched point of pi. So, the change of percentage is not straightforward with respect to the change of SIFT features. So, in this article, we utilize the Euclidean distance between SIFT feature vectors to proceed the similarity measure for image resizing.
Since the Euclidean distance is defined as a mean value for all the feature points, the magnitude of distance would exceed a certain bound at this critical state for most images. Several images and corresponding curves of the Euclidean distance associated with dimension adjustment are shown in Figure 4. We observe that there is a large slope for the segment with distance value around 0.15-0.2 for most images.
Figure 4. Curves of the Euclidean distance associated with image dimension adjustment. The upper rows are the original images, the lower rows show the corresponding curves of the distance value change.
4. An image resizing algorithm combining Seam Carving with Scaling
On the basis of observation and analysis above, we notice that when prominent objects and important regions begin suffering from damage and distortion, the Euclidean distance will exceed a threshold for most images. In order to preserve the local structure of an image, we should stop using Seam Carving, and transfer to another method for remnant task. In this article, we propose an algorithm, combining Seam Carving with Scaling, for resizing image effectively. For a single directional image resizing, the detail steps of single-directional resizing algorithm are described as follows:
(1) Extract the SIFT feature from a given image;
(2) Resize the image by Seam Carving step-by-step, remove Δn seams each step, extract the SIFT feature from the resized image, and then calculate the Euclidean distance between the original image and the resized one;
(3) Judge whether the distance value exceeds a threshold θ at the ith step. If less than θ, go to step (2) to continue; otherwise, go to step (4);
(4) Stop using Seam Carving and employ Scaling to resize the (i - 1)th step image to the ultimate size directly.
For the change in both the width and height dimensions, the distortion and artifacts are introduced in both directions simultaneously. To a different/stochastic combination of the resizing amount in both directions, no consistent threshold exists for evaluating the deformation. So, based on the idea of a single directional image resizing, we improve and present an algorithm for the case of both directional resizing. The steps of both direction resizing algorithm are described below.
(1) Utilize single-directional resizing algorithm to resize a given image S in the width and height directions, respectively, based on the Euclidean distance of the SIFT feature, we only capture the resizing amount Lw in width and Lh in height up to the (i - 1)th step, in which the magnitude of distance is less than the threshold θ;
(2) By Seam Carving, Lw vertical seams and Lh horizontal seams with minimal energy are removed according to optimal paths, we get the resized image T;
(3)Scale the image T to the ultimate size, such that we can gain a desired image.
According to above steps, we can resize the original to the preferred size.
5. Threshold setting
The threshold θ is a significant parameter. With different values, we would obtain diverse retargeting effects. To get a preferable threshold, we execute the image resizing and conduct a statistic analysis with the Benchmark, RetargetMe , involving 80 images. First, we fix the aspect ratio of the image, and adjust the horizontal dimension to 200, then adopt the L2 norm of the gradient for evaluating the image energy.
For all images in the Benchmark, image resizing is implemented by Seam Carving, each step we carve out five seams for facilitating the estimation of the distortion degree. If the number of removal seams each step is smaller than 5, the increment of distortion is not obvious in every step, and it is difficult to find the crucial step; conversely much greater than 5, it will weaken the slope of the segment corresponding to the state of salient object damaged. Then the Euclidean distance values described in Section 3 is obtained. We compute the differences of distance value between the adjacent computation points of seam removal, and find out the maximum from the differences. Such that we can get two distance values corresponding with the maximum. We focus attention on the large one of two values, since salient object maybe begin to get damaged associated with this distance value Dcritical. We get the Dcritical for each image within the Benchmark.
For different images, the magnitude of Dcritical is various. In our algorithm, we assume that Dcritical values obey the normal distribution. For a given image, in which salient objects or features distribute over the entire region, the magnitude of Dcritical would be probably under bias toward small or large. To improve the precision, we throw off the smallest and the largest 10 numbers and carry through statistic analysis for remained 60 Dcritical values. The data are shown in Table 1.
Table 1. The Dcritical values corresponding to the maximum difference
We can calculate the mean and standard deviation of the data in Table 1: the mean μ = 0.204173 and the standard deviation σ = 0.054302. If we choose 60% area under the bell curve between (μ-xσ) and (μ+xσ), namely the area from (μ-xσ) to +∞ is 80%, x = 0.84 can be obtained by looking up the standard normal distribution table. Then we calculate the lower limit and get (μ-xσ) = 0.15856. It means if the threshold θ is set to 0.15856, based on the probability distribution the 80% images would avoid being damaged. Hence, in this article, we select θ = 0.159 for a 200 × 133 image. According to this approach, we could get θ = 0.133 for a 500 × 332 image.
6. Experiment and discussion
We execute the image resizing in a single direction or in both the width and height. Examples of resizing the image in a single direction, from the size 200 × 133 to 112 × 133, are shown in Figure 5. Experiments show that, for most distributed and concentrative images, salient objects and important regions can be preserved well by our method. We can get almost the same result, comparing with the literature .
Examples of resizing the image in both directions, from the size 200 × 133 to 112 × 74, are shown in Figure 6. We compare with several potential schemes: (1) Employ the single-directional resizing algorithm for removing width directional seams first, then height directional seams, end up with scaling to the ultimate size, called W-H algorithm. (2) Inversely, remove height directional seams first, then width directional seams, called H-W algorithm. (3) Remove height or width directional seams by means of optimal parts described in Section 4.
Figure 6. Image resizing in both directions. (a) Original image. (b) Seam Carving. (c) Scaling. (d) W-H. (e) H-W. (f) Ours.
We find contents in height (width) direction suffer from more damages for the W-H (H-W) algorithm. A possible reason is that structures of image have been changed after a single-directional resizing. In this scenario, the threshold θ is not suitable for the amount of residual direction. However, the proposed algorithm is generally able to get better global visual effect as shown in Figure 6f.
According to the SIFT algorithm, for a given image, extracted vectors and the number of SIFT features will vary with its scale. Like this, there exists a difference threshold for an image with different scale. For images with various resolutions, two approaches could be taken: (1) Resize image by the threshold corresponding to the resolution. Through experiments and statistic analysis, a suitable threshold for that scale could be selected. (2) Normalize the input to a uniform scale, resizing image with the threshold for the uniform scale, then return to the original scale.
For a certain scale, the different retargeting effects could be attained with different thresholds as well (see Figure 7). In fact, it decides the amount of adjustment with Seam Carving. If θ is set to a small value, the number of carved seams may be small, the great mass of resizing work will be realized by Scaling, whereas the great mass of resizing work may be carried out by Seam Carving.
Figure 7. Image resizing with different θ. (a) θ = 0.10. (b) θ = 0.159. (c) θ = 0.25.
In contrast to iteration optimization methods such as BDS, our algorithm is efficient. We employ Seam Carving, at the beginning of resizing, to remove unimportant regions, then Scaling to preserve the salient objects and global visual effect. So, our algorithm is able to achieve the trade-off between the time-cost and retargeting effect. We implement the algorithm on a desktop PC with Core 2 Duo 2.66 GHz CPU 2 GB RAM. To speed up, we adopt the optimization to search the switch point from Seam Carving to Scaling, as well as employ the L1 norm to calculate the SIFT distance. The resizing time-cost varies with the number of SIFT features extracted from the image. In most case, it would take about 1-2 s for resizing an image in a single direction from 200 × 133 to 100 × 133, and about 10-40 s from 500 × 332 to 250 × 332. Figure 8 shows resizing results with several methods. The computation time by our method, respectively, are 63 (top row), 9 (middle row), and 16 (bottom row) s.
Figure 8. Resizing image in a single direction. (a) Original image. (b) Seam Carving. (c) Scaling. (d) Multi-operator. (e) Streaming Video. (f) Ours.
We conducted a user study to evaluate the retargeting effects with different methods. Sixty-one images are randomly chosen from the RetargetMe benchmark and resized to 75% or 50% in a horizontal direction using our algorithm. A comparison is implemented with other six operators, namely Seam Carving (SC), SNS, Shift-maps (SM) , Multi-operator (MULTIOP) , non-homogeneous warping (WARP) , and Streaming Video (SV) . For an original, seven resizing images are afforded, and a participant is able to choose one or several favorable images to casting in terms of one's preference. A total of 30 participants took part in the test, and received votes and ranking of the seven methods are shown in Figure 9. We observe that three operators, namely SV, MULTIOP, and ours, rank better than the others.
Figure 9. Received votes and ranking of seven methods.
In this article, we research an approach to distance measure between an image and its resized version based on the SIFT feature vector. Based on the distance measure, we propose a rapid image resizing algorithm combining Seam Carving with Scaling. The algorithm can avoid the disorder and distortion of image contents and preserve both the important regions and the global visual effect of the original image.
For the future work, we will further research an adaptive-combined resizing algorithm, choosing the best scheme for different scenarios, and searching an adaptive threshold for various resolutions. Moreover, we will research the evaluation method that shows a high agreement with subjective perception for assessing resizing effect by different algorithms.
The authors declare that they have no competing interests.
YF Zhang, SM Hu, RR Martin, Shrinkability maps for content-aware video resizing. Comput Graph Forum 27(7), 1797–1804 (2008). Publisher Full Text
G Roberto, E Ardizzone, R Pirrone, Real-time content-aware image resizing using reduced linear model. Proceedings of 2010 IEEE 17th International Conference on Image Processing, Hong Kong, 2813–2816 (2010) 26-29
L Itti, C Koch, E Niebur, A model of saliency-based visual attention for rapid scene analysis. IEEE Trans Pattern Anal Mach Intell 20(11), 1254–1259 (1998). Publisher Full Text
QX Huang, R Mech, N Carr, Optimizing structure preserving embedded deformation for resizing images and vector art. Comput Graph Forum 28(7), 1887–1896 (2009). Publisher Full Text
GX Zhang, MM Cheng, SM Hu, RR Martin, A shape-preserving approach to image resizing. Comput Graph Forum 28(7), 1897–1906 (2009). Publisher Full Text
S Avidan, A Shamir, Seam carving for content-aware image resizing. ACM Trans Graph 26(3), 10–18 (2007). Publisher Full Text
M Grundmann, V Kwatra, M Han, I Essa, Discontinuous seam-carving for video retargeting. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 569–576 (2010)
JW Han, KS Choi, TS Wang, SH Cheon, SJ Ko, Improved seam carving using a modified energy function based on wavelet decomposition. The 13th IEEE International Symposium on Consumer Electronics (ISCE2009), Kyoto, Japan, 38–41 (2009)
S Kopf, J Kiess, H Lemelson, W Effelsberg, FSCAV: fast seam carving for size adaptation of videos. MM'09: Proceedings of the Seventeen ACM International Conference on Multimedia, New York, NY, USA, 321–330 (2009)
DS Hwang, SY Chien, Content-aware image resizing using perceptual seam carving with human attention model. IEEE International Conference on Multimedia and Expo(ICME2008), Hannover, Germany, 1029–1032 (2008)
S Kopf, T Haenselmann, J Kiess, B Guthier, W Effelsberg, Algorithms for video retargeting. Multimed Tools Appl 51(2), 819–861 (2011). Publisher Full Text
DY Chen, YS Luo, Content-aware video resizing based on salient visual cubes. J Visual Commun Image Represent 22(3), 226–236 (2011). Publisher Full Text
D Simakov, Y Caspi, E Shechtman, M Irani, Summarizing visual data using bidirectional similarity. IEEE Conference on Computer Vision and Pattern Recognition2008 (CVPR 2008), Anchorage, AK, USA, 1–8 (2008)
TS Cho, M Butman, S Avidan, WT Freeman, The patch transform and its applications to image editing. IEEE Conference on Computer Vision and Pattern Recognition 2008 (CVPR 2008), Anchorage, AK, USA, 1–8 (2008)
A Maalouf, MC Larabi, Image retargeting using a bandelet-based similarity measure. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, TX, USA, 942–945 (2010)
SG Hua, XX Li, Q Zhong, Similarity criterion for image resizing. EURASIP J Adv Signal Process 2011, 27 (2011). BioMed Central Full Text