        PFC final - Universidad Autónoma de Madrid
         Contents
Figure 4.12: Path between ROIs involving zoom factor change

As mentioned before, the upper left corner of the sampling window follows the interpolated curve, while the lower right corner of the sampling window implicitly follows a different trajectory in order to simulate zooming. The curve for the lower right corner is not precalculated, but is computed in real time for each sampling window, taking into account the current position of the upper left corner and the fact that the window is growing or shrinking, also with parabolic ease-in and ease-out (Figure 4.11b), between two keyframes.

In these cases with zoom factor change, the calculated pixel jump (remember the base speed calculation in chapter 4.6) is assigned to the curve of the corner that scans the longest distance, reducing when necessary the pixel distance when interpolating the Catmull-Rom curve of the upper left corner. The distances to be covered by each corner are roughly approximated by the straight-line junction between each pair of points, so the actual jump differs from the ideal one. This error in the speed is not noticeable.

59

This pixel distance adjustment is essential for the smoothness of the video and is a way of controlling the zooming speed without having to set up an additional function. Otherwise, in cases of two close ROIs with noticeably different s
… 21
2. Attention model ................................................. 22
3. Optimal Path Generation ......................................... 22
4. Video Generation ................................................ 23
2.4.2 Rapid Serial Visual Presentation - Microsoft Research Asia ... 23
  1. Preprocessing the ROIs ........................................ 24
  2. Optimal Path Generation ....................................... 24
  3. Dynamic Path Adjusting ........................................ 25
2.4.3 Photo2Video - Microsoft Research Asia ........................ 25
  1. Content Analysis .............................................. 25
  2. Story Generation .............................................. 26
  … 26
  … 27
2.4.4.1 Differences between the Image2Video ........................ 27
2.4.4.2 Contributions of this master thesis ........................ 28
… 29
3.1 Definitions .................................................... 29
3.2 System overview ................................................ 30
3.3 Internal data structures ....................................... 31
  3.3.1 Image structure ............................................ 31
  3.3.2 ROI structure .............................................. 32
  3.3.3 Trajectory structure ....................................... 32
3.4 ROI specification .............................................. 32
3.5 Zooming and shrinking images ................................... 34
4 Development ...................................................... 37
  4.1 ROI initialization ........................................... 37
  4.1.1 ROI initialization from
Figure 2.11: Microsoft's Photostory initial dialog

20

2.4.1 Image2Video adaptation system - IST Lisbon

This project [1] has been carried out at the Instituto de Telecomunicações in Lisbon by Professor Pereira and two of his students, Baltazar and Pinho. The developed application appears to be an alternative implementation of the system described in Liu et al.'s article [9], discussed in the next section.

The authors divide their Image2Video application into 4 steps, as shown in Figure 2.12.

[Figure 2.12 block diagram: the input image feeds a bottom-up saliency attention model and top-down face and text attention models; the attention models are integrated and, together with the display size and user preferences, drive the adaptation and browsing path generation stages that produce the output video sequence.]

Figure 2.12: System architecture

1. Composite Image Attention Model

The objective of this step is to determine the attention objects, which will be classified into saliencies, faces and text. For this task they build upon previous work done in Pereira's group, using a face detection solution integrating automatic and user-assisted tools [14] and automatic text extraction [15]. They consider faces as one of the semantic objects most likely to captivate the attention, and text as a rich source of information that ties the human mind's focus.

21

2. Attention model integration

Step 2 is an integration stage, where the authors intend to create a unique attention map using
… 33
Figure 3.4: Comparison of the performance of the different interpolation methods. From left to right and top to bottom: the original image, the interpolated image using NN interpolation, using bilinear interpolation and bicubic interpolation. The images have been generated shrinking the original image to a resolution of 50x50 pixels and then zooming in to a resolution of 1200x1200 ... 34
Figure 4.1: Clockwise automatic video generation without specification of ROIs ... 38

iv

Figure 4.2: Examples for the aspect ratio adaptation in bad cases (a & b) and in better cases ... 40
Figure 4.3: Two simulation examples of the simulated annealing with a random set of data points ... 43
Figure 4.4: Examples of keyframes in a ROIs2Video sequence ... 44
Figure 4.5: Pixel distance between ROIs in different size pictures ... 46
Figure 4.6: Interpolation and approximation ... 47
Figure 4.9: The effect of c. Example of browsing paths with different curvature values (from straight to exaggerated interpolations), all done with Catmull-Rom interpolation ... 49
Figure 4.10: Arc length approximations with increasing … 51
Figure 4.1
7 Information about interpolation can be found under the following links:
http://arantxa.ii.uam.es/~pedro/graficos/teoria/
http://jungle.cpsc.ucalgary.ca/587/pdf/5-interpolation.pdf
http://www.cs.cmu.edu/~fp/courses/graphics/asst5/catmullRom.pdf

48

Figure 4.8: Local control. Moving one control point only changes the curve over a finite bound region.

• The curvature of the interpolation is easily modifiable through a single parameter c, which means the user can select whether he wants a strongly curved interpolation or prefers a straighter interpolation between the control points (see Figure 4.9).

[Plots omitted for Figure 4.9.]

Figure 4.9: The effect of c. Example of browsing paths with different curvature values (from straight to exaggerated interpolations), all done with Catmull-Rom interpolations.

The Catmull-Rom interpolation is easily implemented through its geometry matrix:

p(t) = \begin{pmatrix} 1 & t & t^2 & t^3 \end{pmatrix}
\begin{pmatrix}
 0 &   1 &    0 &  0 \\
-c &   0 &    c &  0 \\
2c & c-3 & 3-2c & -c \\
-c & 2-c &  c-2 &  c
\end{pmatrix}
\begin{pmatrix} p_{i-1} \\ p_i \\ p_{i+1} \\ p_{i+2} \end{pmatrix}

49

that can be translated into the following pseudocode:

% First we prepare the tangents
For each data_point i
    If the data point is the first or last of the array
        m1[i] = m2[i] = 0
    Else
        m1[i] = c * (data[i+1].y - data[i-1].y)
A more elaborate interpolation, often used in different kinds of motion simulation, is the Catmull-Rom interpolation [18][19], named after its developers Edwin Catmull and Raphael Rom. Even though it is often called the Catmull-Rom spline, it is not really a spline (a smooth, piecewise polynomial curve approximation), because it does not verify the C² property: its second derivative is not continuous. Instead, it imposes derivative restrictions at the control points. Some of the features of the Catmull-Rom interpolation are:

• C¹ continuity.

• The specified curve will pass through all of the control points, which is not true for all types of splines. This is desirable for our application, as we want to centre the camera precisely on the regions of interest, although a small error could be acceptable.

• The specified curve is local, i.e. moving one control point affects only the interpolated points in a limited bound region and not all the points. This is not really important for our tool, as we are dealing with static images converted automatically into video without user interaction, but it would be desirable if, for example, the user could stop the video and change the order of the control points or change the regions of interest.

[Plot omitted: local control of Catmull-Rom interpolations; see Figure 4.8.]
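The local-control property can also be checked numerically. The following sketch (illustrative Python, not part of the thesis code; all names are hypothetical) interpolates a 1-D sequence of control points with Catmull-Rom tangents m_i = c·(p_{i+1} − p_{i−1}), moves the last control point, and verifies that only the nearby segments change:

```python
def hermite(p1, p2, m1, m2, t):
    # Cubic Hermite basis functions evaluated at t in [0, 1)
    h00 = 2*t**3 - 3*t**2 + 1
    h10 = t**3 - 2*t**2 + t
    h01 = -2*t**3 + 3*t**2
    h11 = t**3 - t**2
    return h00*p1 + h10*m1 + h01*p2 + h11*m2

def catmull_rom_1d(pts, c=0.5, steps=8):
    # Tangents m[i] = c * (p[i+1] - p[i-1]), zero at the endpoints
    n = len(pts)
    m = [0.0] * n
    for i in range(1, n - 1):
        m[i] = c * (pts[i+1] - pts[i-1])
    curve = []
    for i in range(n - 1):
        for k in range(steps):
            curve.append(hermite(pts[i], pts[i+1], m[i], m[i+1], k / steps))
    return curve

a = catmull_rom_1d([0, 1, 0, 2, 1, 3, 0, 1])
b = catmull_rom_1d([0, 1, 0, 2, 1, 3, 0, 5])   # last control point moved
# Segment i depends only on pts[i-1..i+2], so moving pts[7]
# touches segments 5 and 6 but leaves segments 0..4 untouched.
assert a[:5*8] == b[:5*8]
assert a[5*8:] != b[5*8:]
```

Only the two segments adjacent to the moved point differ; the rest of the curve is bitwise identical, which is exactly the bounded region described above.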
Each feature can be computed efficiently, but computing the entire set is completely unfeasible. In the Viola-Jones classifier, a variant of AdaBoost [13], short for Adaptive Boosting, is used to select the features and to train the classifier.

AdaBoost combines many weak classifiers in order to create a strong classifier. Each weak classifier finds the right answer only slightly better than 50% of the time, almost a random decision. The final strong classifier is a weighted combination of the weak classifiers, where the weights are distributed uniformly at first and then re-weighted more heavily for the incorrect classifications, as shown in Figure 2.5.

[Figure 2.5 diagram: three weak classifiers separating face from non-face examples; the incorrect classifications of the previous step appear as bigger dots.]

Figure 2.5: (1) Initially, uniform weights are distributed over the training examples. (2 & 3) Incorrect classifications are reassigned more weight (shown as bigger dots). The final classifier is a weighted combination of the weak classifiers.

Viola and Jones combine weak classifiers as a filter chain (see Figure 2.6), where each weak classifier consists of a single feature. The threshold for each filter is set low enough to pass almost all the face examples. If a weak classifier filters out a subwindow, the subwindow is immediately tagged as "no face".

[Figure 2.6 diagram: image sub-windows traverse the filter chain; a "face" result proceeds to further processing, "not a face" rejects the sub-window.]

Figure 2.
Speed control is applied to determine the precise camera speed along the trajectory. Real camera motion will normally include ease-in and ease-out, that is, the camera will move slower at the beginning and at the end of the trajectory. To include the ease-in and ease-out effects, the step distance on L is not constant and has to be calculated according to some speed function, as shown in Figure 4.11.

(a) Smooth ease-in & ease-out. (b) Parabolic ease-in & ease-out with constant acceleration.

Figure 4.11: Typical speed functions for ease-in & ease-out camera movement.

In the present ROIs2Video application the broken-line speed function (see Figure 4.11b) has been chosen for simulating real camera motion. There is no special need to complicate the speed control function, as the eye will notice no difference and this alternative is sufficiently good.

If the user prefers the camera to move at constant speed, without ease-in and ease-out, he has the possibility of selecting this option when running the program. This option is useful, for example, when generating dummy ROIs automatically, as the camera won't stop at the keyframes and it makes no sense to slow down the camera speed.

4.7.5 Zoom control

An additional case to consider is movement between two ROIs that require differently sized sampling windows, and thus a zoom factor change along the trajectory, as shown in Figure 4.12.
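A broken-line speed profile of this kind is straightforward to implement. The sketch below (illustrative Python, not the thesis code; the function name and the `ramp` parameter are assumptions) maps each frame to a distance along a trajectory of total length L: the speed ramps up linearly over the first fraction of the frames (constant acceleration), stays constant, and ramps down at the end, which yields the parabolic positions of Figure 4.11b:

```python
def frame_distances(L, n_frames, ramp=0.25):
    """Distance travelled along the path at each frame.

    The speed follows a broken line: a linear ramp up over the first
    `ramp` fraction of the frames, a constant cruise, and a linear
    ramp down at the end. Distances are normalized so the last frame
    lands exactly at L.
    """
    speeds = []
    for k in range(n_frames):
        t = k / (n_frames - 1)
        if t < ramp:
            v = t / ramp              # ease-in
        elif t > 1.0 - ramp:
            v = (1.0 - t) / ramp      # ease-out
        else:
            v = 1.0                   # cruise at maximum speed
        speeds.append(v)
    total = sum(speeds)
    distances, s = [], 0.0
    for v in speeds:
        s += v
        distances.append(L * s / total)
    return distances

d = frame_distances(100.0, 50)
assert abs(d[-1] - 100.0) < 1e-9               # ends exactly at the target
assert all(x <= y for x, y in zip(d, d[1:]))   # monotonically advancing
assert d[1] - d[0] < d[25] - d[24]             # smaller steps near the ends
```

Setting `ramp=0.0` degenerates to the constant-speed option mentioned above; stepping through a reparameterized curve with these distances gives the ease-in/ease-out camera motion.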
within 15 calendar days of the official setting-out of the works, and the definitive one a year after the provisional has been carried out; if no claim exists, the deposit will then be reclaimed.

18. If the contractor, when carrying out the setting-out, observes any error in the project, he must report it to the Project Engineering Director within fifteen days; once that period has elapsed, he will be responsible for the accuracy of the project.

19. The contractor is obliged to designate a responsible person to deal with the Project Engineering Director, or with the delegate the latter designates, for everything related to the work. As the Project Engineering Director is the one who interprets the project, the contractor must consult him about any doubt that arises during its execution.

20. During the execution of the work, inspection visits will be made by technical staff of the client company, in order to carry out whatever checks are deemed appropriate. It is the contractor's obligation to preserve the work already executed until its acceptance; therefore its partial or total deterioration, even due to atmospheric agents or other causes, must be repaired or rebuilt at his own expense.

98

21. The contractor must complete the work within the period mentioned, counted from the date of the contract, incurring a fine for delay in the execution whenever this is not due to force majeure. Upon completion
        m2[i] = c * (data[i+1].x - data[i-1].x)

% Now we calculate the interpolated points between the data points
For each data_point i
    j = 0
    For t = 0; t < 1; t = t + T
        h00 = 2*t^3 - 3*t^2 + 1
        h01 = -2*t^3 + 3*t^2
        h10 = t^3 - 2*t^2 + t
        h11 = t^3 - t^2
        interpolated_data[j].y = h00*data[i].y + h10*m1[i] + h01*data[i+1].y + h11*m1[i+1]
        interpolated_data[j].x = h00*data[i].x + h10*m2[i] + h01*data[i+1].x + h11*m2[i+1]
        j = j + 1

Even if the loop increases t linearly in steps of T (see previous pseudocode), the resulting interpolated data is not equally separated. A camera travelling along this trajectory will not move at constant speed. Thus, the following step is to reparameterize the obtained interpolated data so that the camera moving through it has the desired speed function.

4.7.3 Arc length reparameterization

A simple way to reparameterize the data is the arc length reparameterization, which precomputes a table of values by running a standard Catmull-Rom interpolation with unevenly separated data. The number of entries in the table depends on the needed precision of the arc length calculations. The arc length is the distance of the path walked over the interpolated curve points. It is approximated through the distance travelled over the straight lines joining the interpolated points, and is therefore a good approximation if the interpolated points have a high density on the curve (see Figure 4.10).

50

Figure 4.10: Arc length approximations wit
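Putting the pseudocode and the arc-length table together, a minimal runnable sketch could look like this (illustrative Python, not the thesis C implementation; function names are hypothetical):

```python
import bisect

def catmull_rom(points, c=0.5, steps=100):
    """Densely sample a 2-D Catmull-Rom curve through `points`.

    Tangents are m[i] = c * (p[i+1] - p[i-1]) and zero at the
    endpoints, as in the pseudocode above.
    """
    n = len(points)
    mx, my = [0.0] * n, [0.0] * n
    for i in range(1, n - 1):
        mx[i] = c * (points[i+1][0] - points[i-1][0])
        my[i] = c * (points[i+1][1] - points[i-1][1])
    curve = []
    for i in range(n - 1):
        for k in range(steps):
            t = k / steps
            h00 = 2*t**3 - 3*t**2 + 1
            h10 = t**3 - 2*t**2 + t
            h01 = -2*t**3 + 3*t**2
            h11 = t**3 - t**2
            curve.append((h00*points[i][0] + h10*mx[i] + h01*points[i+1][0] + h11*mx[i+1],
                          h00*points[i][1] + h10*my[i] + h01*points[i+1][1] + h11*my[i+1]))
    curve.append(points[-1])
    return curve

def arc_length_table(curve):
    # Cumulative length of the straight lines joining the samples --
    # a good approximation when the sample density is high.
    table = [0.0]
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        table.append(table[-1] + ((x1 - x0)**2 + (y1 - y0)**2) ** 0.5)
    return table

def point_at_distance(curve, table, s):
    # Nearest precomputed sample at arc length s (table lookup)
    i = bisect.bisect_left(table, s)
    return curve[min(i, len(curve) - 1)]
```

Stepping s by a constant pixel jump, or by ease-in/ease-out distances, then yields camera positions with the desired speed, independently of how unevenly the raw parameter t samples the curve.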
C.2 Adaptation process

The CATs have to be delivered with a description of their adaptation capabilities, which will be used in the decision module, together with the usage preferences, terminal capabilities, network capabilities and content descriptions, to select the adequate CAT for the adaptation.

[Figure C.2 screenshot: a GUI window with panels for usage preferences, terminal capabilities and network capabilities (each with Clear/Edit/Parse buttons) on the left, and the input file, output prefix, output file and adapted media information on the right.]

Figure C.2: Appearance of the graphical user interface built internally in the GTI-UAM for demo purposes of the CAIN framework. On the left side of the window the adaptation preference files have to be selected, while on the right side the input file and output filename have to be selected.

The usage preferences, terminal and network capabilities have to be delivered in XML files to allow the work of the decision module. The following are examples of XML files corresponding to the categories enumerated above:

86

<DIA xsi:schemaLocation="urn:mpeg:mpeg21:2003:01-DIA-NS C:\www\MPEG-21\DIA\U
http://opencvlibrary.sourceforge.net

D.1 CxCore

This library implements the data structures and necessary functions to manage images and associated data, as well as the linear algebra functions.

• Structures: these go from basic definitions of a point or a rectangle up to an image structure holding its header and the pixel information.

Name                                    Description
CvPoint, CvPoint2D32f, CvPoint3D32f     Points in 2D or 3D with coordinates in integer or floating point precision
CvSize, CvSize2D32f                     Rectangular dimensions in pixels
CvRect                                  Rectangular dimensions with offset
CvMat, CvMatND, CvSparseMat             Different multidimensional and/or multichannel matrices
IplImage                                IPL (Intel Image Processing Library) image header. Contains the necessary fields for the image description and a pointer to the image data itself (see chapter 3.3.1 Image structure)

Table 6: Structures implemented in CxCore

• Dynamic structures: OpenCV also provides a complete set of data storage structures. Each dynamic structure is delivered with the complete implementation of the insertion, deletion and extraction functions.

Name            Description
CvMemStorage    Dynamically growing memory storage that expands automatically as needed
CvSeq           Growable sequence of elements
CvSet           Collection of nodes
CvGraph         Oriented or unoriented weighted graph

Table 7: Dynamic structures implemented in CxCore

90

For the development of the ROIs2Video applicati
if the ROIs are specified in files ("-a 0" or no "-a" tag was indicated), one ROI file per image: a sequence of image path / ROI path pairs has to be specified.

Some execution examples, supposing the executable file is named Image2Video, are:

• Execution specifying the bitrate, video dimensions, a codec and two images with the corresponding ROI files:

>> Image2Video -b 5000 -w 400 -h 200 -cod wmv1 image1.jpg rois1.txt image2.bmp rois2.txt

• Execution specifying video dimensions and generating the ROIs automatically:

>> Image2Video -w 400 -h 200 -a 1 image1.jpg image2.bmp image3.jpg

• Wrong execution of the last example, because the order of the tag-value pairs and the image paths is inverted:

>> Image2Video image1.jpg image2.bmp image3.jpg -w 400 -h 200 -a 1

• Wrong execution: if the option "-a 1" is activated, it is incorrect to specify ROI files:

>> Image2Video -a 1 image1.jpg rois1.txt image2.bmp rois2.txt

82

B Manual ROI annotation tool

The graphical user interface, developed in Java using Swing classes, has been built for demo purposes and allows drawing the ROIs on the image and automatically generating the file containing the ROI specifications. Loading the ROIs from a file is also possible. The GUI is a first version and can clearly be improved. Some of its limitations are:

• The execution finishes after generating a video and has
(see Figure 4.1) and generates a clockwise camera movement. This mode defines internally four dummy ROIs, positioned at (1, 1), (w/2, 1), (1, h/2) and (w/2, h/2), where w is the image width and h the image height. The dummy ROIs have dimensions w/2 x h/2.

When running the ROIs2Video application in this mode, no ROI sorting will be applied, because the order must be maintained as shown in Figure 4.1. Also, the interpolated curves between ROIs will consist of straight lines, and the curvature parameter (explained in chapter 4.7.2 Catmull-Rom interpolation), possibly selected by the user, will be ignored.

This option is a first approximation to the generation of videos from images without a previous semantic analysis. It can be applied, for example, for image preview purposes. Another possibility would be a zoom-in into the centre of the image, assuming the major part of the information is confined in the centre.

Figure 4.1: Clockwise automatic video generation without specification of ROIs

4.2 Aspect ratio adaptation of the images

The resolution of the output video (i.e. width x height pixels) is user-configurable: the user can choose the resolution of the video according to the display size of his device. It should be noticed

38

that if the selected video resolution and the original image do not have the same aspect ratio, it is necessary to adapt the image to the screen by adding black horizontal or vertical bars. W
such as the video bitrate, the video resolution and others.

87

<DIA xsi:schemaLocation="urn:mpeg:mpeg21:2003:01-DIA-NS">
  <Description xsi:type="UsageEnvironmentPropertyType">
    <UsageEnvironmentProperty xsi:type="TerminalsType">
      <Terminal>
        <TerminalCapability xsi:type="CodecCapabilitiesType">
          <Decoding xsi:type="ImageCapabilitiesType">
            <Format href="urn:mpeg:mpeg7:cs:VisualCodingFormatCS:2001:4">
              <mpeg7:Name xml:lang="en">GIF</mpeg7:Name>
            </Format>
          </Decoding>
        </TerminalCapability>
      </Terminal>
    </UsageEnvironmentProperty>
  </Description>
</DIA>

(b) Terminal capability description. In this example file, the terminal can display GIF files.

<DIA xsi:schemaLocation="urn:mpeg:mpeg21:2003:01-DIA-NS C:\www\MPEG-21\DIA\UsageEnvironment.xsd">
  <Description xsi:type="UsageEnvironmentPropertyType">
    <UsageEnvironmentProperty xsi:type="NetworksType">
      <Network>
        <NetworkCharacteristic xsi:type="NetworkCapabilityType" maxCapacity="2000000" minGuaranteed="1000000"/>
      </Network>
    </UsageEnvironmentProperty>
  </Description>
</DIA>

(c) Network capability description. The description shows a network with a high transfer rate capability.

The decision module selects the best CAT using a constraints satisfaction and optimization algorithm. The constraints are distinguished in two classes:
Table 5: Execution results running the ROIs2Video application under Linux on Lab computer 1

It can be observed that the execution time is longer:

• for higher resolution images (compare for example execution 1 with execution 3);

• for the same image with a slower speed factor (for example execution 6 against execution 7);

• not necessarily if the image contains more ROIs than another with the same resolution; it depends on the distribution of the ROIs in the image;

• if the image has been adapted to the video aspect ratio, because the black bars increase the size of the image.

A set of example output videos can be found following the URL: http://www.gti.ii.uam.es/publications/image2video

69

6 Conclusions and future work

6.1 Conclusions

The developed system has generally reached its goals, offering smooth camera simulation for most of the images. Camera motion improves significantly when reparameterizing the Catmull-Rom interpolation and adding ease-in and ease-out at the beginning and between two ROIs.

It is important to mention that the application shows better video results (smoother motion) when using higher resolution images as input. On the one hand, the system has been designed and tested mostly for pictures with decent resolution. On the other hand, it is less useful to transmode pictures with very poor resolutions, as they can be visualized directly on a small display.

Videos
Image2VideoCAT class

The Java interface has to include an Image2VideoCAT class file that extends the class CATImplementations and implements its adapt method, needed for the integration of every CAT. As an overview, the Java adapt method carries out the actions shown in Figure 5.1:

1. Before calling the native routines to generate the video, it is necessary to check whether the temporal folder exists and contains any temporary files left over from previous erroneous executions. If the folder exists, any files in it will be deleted; on the contrary, if the folder does not exist, it is created. The temporal folder can't be created just anywhere in the file tree, because it could interfere with files from other programs or CATs. It is created inside the folder where CAIN uncompresses the .jar package. The path to this folder has to be transmitted to the native program.

2. The native ROIs2Video application is called. This step results in the generation of the temporal images (the video frames), which are left in the folder created in step 1.

3. The video frames are converted to a video file using the ffmpegJNI CAT.

4. The temporal folder is cleaned and removed.

62

[Figure 5.1 diagram: the video frames flow through the four steps above.]

Figure 5.1: Image2Video CAT operation

5.2.2 Mandatory XML description file (Image2VideoCAT.xml)

The XML description file states the actions the corresponding CAT fulfils. In our case, the shown Image2VideoCAT.xml file informs CAIN's decision module that the input f
Matlab to check whether the method was really valid. Besides the distance, other cost functions have been tried out using the Matlab model, obtaining worse results. For example, the sum of the turned angles, or combinations of the sum of angles and the travelled distance, have been tried out as cost functions.

The first approximation set the pixel jump on the input image between frames as a constant value, and it had to be modified manually for each execution, trying out values for the different image resolutions. The experimental observations led to the final decision of establishing a standard velocity that adapts automatically to each resolution, leaving open the possibility for the user to select a faster or slower camera motion.

The curve interpolation passed from linear to Catmull-Rom interpolation. Catmull-Rom interpolation was chosen after comparing it with other interpolating methods and searching for a compromise between simplicity and quality of the interpolated data. The Catmull-Rom interpolation was first simulated in Matlab, due to inexperience in the field of curve interpolation and because the results were rather unknown and had to be tested on a simple interpreter before programming the algorithms in C.

The next step was to reparameterize the data. Before having found information about the arc length reparameterization, a simpler method was used for reparameterizing the data: generating much denser curves than needed and di
<ROI1>
  <x>988</x>
  <y>454</y>
  <width>347</width>
  <height>433</height>
  <importance>1</importance>
</ROI1>
<ROI2>
  <x>986</x>
  <y>961</y>
  <width>389</width>
  <height>569</height>
  <importance>1</importance>
</ROI2>
</opencv_storage>

When opening the XML file, OpenCV automatically controls the integrity of the XML file and checks that some necessary nodes are included. The XML files have to start with the declaration of the xml version, and the root node has to be tagged as <opencv_storage>.

The next step is to read each ROI and insert it into a CvSeq sequence, where all the ROIs will be stored and returned. Each ROI is tagged with <ROIx>, where x is an increasing counter. The data of each ROI is retrieved in two steps:

1. The file node containing the requested data is found using cvGetFileNodeByName, which returns the map of a particular node.

2. The ROI data (x, y, width, height & importance) is extracted from the node using the specific read method (cvReadIntByName or cvReadRealByName, depending on the case).

4.1.2 Automatic ROI initialization

In the particular case in which there are no ROIs associated to the original image, ROIs2Video can generate a simple automatic division of the image. Currently, this process divides the image in 4 parts
Serial Visual Presentation for the result output of their system. The Rapid Serial Visual Presentation can be regarded as a type of video which serially displays the different parts of the image, each for a short period of time, and scrolls between the regions, though it is not saved as a proper video file.

The Image2Video system developed at the IST, presented in the previous section, is clearly built upon the ideas presented in these articles. The similarity between both system architectures becomes evident when comparing both frameworks (see Figure 2.12 and Figure 2.13). Thus, this section will only comment briefly on the Rapid Serial Visual Presentation, omitting details.

In the articles, Xie, Ma and Zhang focus on the description of the browsing path generation and leave the image modeling stages (attention object detection) apart. The authors distinguish between the fixation status, in which a particular ROI is exploited, and the shifting status, where the presentation shifts between one ROI and the next one. The shifting between two ROIs is simulated by traveling along the straight lines that link the attention objects, never exceeding maximal panning or zooming speeds.

[Figure 2.13 block diagram: the input image goes through image analysis (bottom-up) and pre-processing before the optimal browsing path generation.]

Figure 2.13: System architecture

1. Preprocessing the ROIs

In order to find the optimal path, it is essential to preprocess the ROIs:

• splitting attention objects la
• Total budget ................ 2,305.6
• Total budget ................ 16,715.6

Madrid, September 2007

The Chief Project Engineer

Signed: Fernando Harald Barreiro Megino, Telecommunications Engineer

95

PLIEGO DE CONDICIONES (Terms and Conditions)

This document contains the legal conditions that will govern the realization, in this project, of a "Sistema de Adaptación de Imágenes a Vídeo" (Image-to-Video Adaptation System) to be viewed on low-resolution displays. In what follows, it will be assumed that the project has been commissioned by a client company to a consulting company for the purpose of building this system. The consulting company has had to develop a line of research in order to elaborate the project. This line of research, together with the subsequent development of the programs, is covered by the particular conditions of the following terms.

Assuming that the industrial use of the methods included in the present project has been decided by the client company or by others, the work to be carried out will be regulated by the following:

General conditions

1. The contracting modality will be the public tender. The award will therefore be made to the most favourable proposal, without attending exclusively to its economic value, and depending on the greater guarantees offered. The company that submits the project to tender reserves the right to declare it void.

2. The complete assembly and machining of the equipment involved will be
• Sampling window: rectangle of pixels copied from the original image. It is the part of the original image captured by the virtual camera (see Figure 3.1a).

• Frame: the sampling window that travels through the original image is resized to the video dimensions and constitutes a frame of the generated video (see Figure 3.1b). The video will show 25 frames per second.

• Keyframe: frame of special interest where the camera movement is stopped, for example the frames corresponding to the ROIs' locations.

• ROI or attention object: both terms are sometimes used interchangeably, although the definition of attention object carries more information (minimum perceptible time and size, attention value, etc.). A ROI is the spatial region occupied by the attention object. In this text, both terms are used to designate the regions where most semantic information is concentrated in the image and where the sampling window has to be centred to extract the keyframes (see Figure 3.1a).

(a) ROIs and sampling windows centred on the ROIs.

29

(b) Frames generated by resizing all the sampling windows to the video's dimensions. The frames shown are the keyframes corresponding to the ROIs.

Figure 3.1: Examples of sampling windows, ROIs and frames on a picture

3.2 System overview

In this section, a general block diagram (see Figure 3.2) of the ROIs2Video algorithm and an overall description of each point will be presented. The specific ta
detector. The file will be read out and stored into the ROI structures presented in 3.3.2. The file will be written in XML format and has to contain a numbered node <ROIx> for each ROI. Nested in the <ROIx> node, the information for the coordinates (x, y) of the upper left corner, the width, the height and the relevance of the ROI have to be found, as shown in the following example:

<opencv_storage>
  <ROI1>
    <x> 1 </x>
    <y> 2 </y>
    <width> 3 </width>
    <height> 4 </height>
    <importance> 1 </importance>
  </ROI1>
  <ROI2>
    <x> 5 </x>
    <y> 6 </y>
    <width> 7 </width>
    <height> 8 </height>
    <importance> 1 </importance>
  </ROI2>
  <ROI3>
    <x> 9 </x>
    <y> 10 </y>
    <width> 11 </width>
    <height> 12 </height>
    <importance> 1 </importance>
  </ROI3>
</opencv_storage>

The meaning of the first four tokens (x, y, width and height) is clarified in Figure 3.3.

Figure 3.3: Specification of a ROI (width, height and upper left corner).

The Importance token holds the importance of the ROI and will be explained later (see chapter 4.8).

The read-out of the XML file will be done using the available file storage functions in OpenCV. This is the reason why the root node obligatorily has to be tagged <opencv_storage>
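The thesis reads this file with OpenCV's file storage functions. Purely to illustrate the file layout, the same format can also be parsed with Python's standard library; the function name and field handling below are illustrative, not the thesis's code:

```python
import xml.etree.ElementTree as ET

def read_rois(path):
    """Parse the ROI annotation file described above into a list of dicts,
    one per <ROIx> node nested under the <opencv_storage> root."""
    root = ET.parse(path).getroot()   # expects <opencv_storage> as the root node
    rois = []
    for node in root:                 # one <ROIx> node per region of interest
        if node.tag.startswith("ROI"):
            # each child is one token: x, y, width, height, importance
            rois.append({field.tag: int(field.text) for field in node})
    return rois
```

Each returned dict then maps directly onto the ROI structure of chapter 3.3.2.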
different solutions, especially when the set of data points grows. The actual quality of the solutions is hard to predict.

This approach works much better than elementary browsing path generation (for example our first implementation of jumping to the next unvisited ROI) and generally shows good results, especially in pictures that contain a moderate number of ROIs (<15). For pictures with many attention objects (e.g. group photos), the path generated by this algorithm is acceptable in most of the cases, but sometimes it could be improved.

Figure 4.3 shows two different simulations of camera path generation with the Simulated Annealing algorithm using the same parameters. Each one shows a plot of the cost function evolution along the iterations and a plot of the obtained path through all the cities (in this case the ROIs), showing the random character of this algorithm.

(a) Simulation example with a relatively bad path found. (b) Simulation example with a good path found.
Figure 4.3: Two simulation examples of the Simulated Annealing with a random set of data points.

The advanta
libraries (see Appendix E for more information). The generated video file can use various video codecs and bitrates, which have to be specified when invoking the ROIs2Video program. The generated temporary files are deleted before leaving the program.

5 Integration and testing

5.1 Integration of the modules

The lifecycle of the application has followed a prototyped development. In the first phase of the project a very simple and rather limited ROIs2Video application was developed, and afterwards the single modules were improved and generated individually. The change history of each module of the ROIs2Video application is summed up in Table 2.

Module: ROI initialization. Changes: At the beginning, the ROIs were not read out from a file, but generated by OpenCV's face detector. As the results of the detector did not satisfy the expectations, it was changed to manual file annotation and file read-out. The possibility of autogenerating a set of ROIs was added afterwards, answering the existing demand in the aceMedia project.

Module: Image adaptation. Changes: At the very first moment, there was no image adaptation, as the video resolution was fixed and the user could not change it.

In the next step, the user was able to decide the video resolution, but there was really no image adaptation and the user had to generate videos with an aspect ratio similar to the image's, in order not to distort the full image on the video. If the selected video
movement the camera is not able to fulfil is rotation. Rotation would require additional external annotation, but could be desirable, for example in cases where an angled text should be viewed horizontally.

(d) Combined pan and zoom.
Figure 4.13: Some examples of the defined camera movements.

The whole path the virtual camera follows during the entire video is divided into trajectories. Each trajectory starts and ends with a ROI, or with the whole image in the first and last route. For its movements, the virtual camera needs the information about the starting and ending window dimensions of each trajectory, as well as the Catmull-Rom curve that joins both sampling windows.

The virtual camera will stop at each interpolation point of the curve, copy the pixels inside the sampling window, resize the copied pixels through bilinear interpolation to obtain a frame, and store the frame as an image file in a specific temporary folder. These temporary images can afterwards be converted into a video externally using ffmpeg. The image files are stored in JPEG format, although they could be written to the hard drive in all the formats allowed by OpenCV's libraries. The files are named increasingly, starting at 0000.jpg, so at most 10 000 files can be stored. This number is more than sufficient for the video generated from a single image, where normally around a thousand files are written (500 for
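For reference, one Catmull-Rom segment of the kind mentioned above can be evaluated with the standard formulation below. This is an illustrative sketch, not the thesis's implementation:

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Evaluate one Catmull-Rom segment between control points p1 and p2 at
    parameter t in [0, 1]; points are (x, y) tuples. The curve passes exactly
    through p1 (t = 0) and p2 (t = 1), which is why it suits keyframe paths."""
    def coord(a, b, c, d):
        return 0.5 * ((2 * b) + (-a + c) * t
                      + (2 * a - 5 * b + 4 * c - d) * t * t
                      + (-a + 3 * b - 3 * c + d) * t ** 3)
    return (coord(p0[0], p1[0], p2[0], p3[0]),
            coord(p0[1], p1[1], p2[1], p3[1]))

# The segment interpolates its inner control points:
assert catmull_rom((0, 0), (1, 1), (2, 0), (3, 1), 0.0) == (1.0, 1.0)
assert catmull_rom((0, 0), (1, 1), (2, 0), (3, 1), 1.0) == (2.0, 0.0)
```

Sampling t at the spacing derived from the base speed (chapter 4.6) yields the array of stopping points for the sampling window.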
non-face images and retraining the system.

2.3.2.3 Face localization

Figure 2.9: Steps in face localization. Image copied from [7].

As seen in Figure 2.9, in order to find the faces with resolution close to 32x36, a multiscaled pyramid of the image is created. For each image in the pyramid, the Convolutional Face Finder is applied, resulting in a set of face candidates in the original scaled image. Finally, a neural filter is applied to the fine pyramid centred at the candidates and, depending on the percentage of positive answers, each candidate is classified as Face or Non-face.

2.3.2.4 Detection results

(b) 6 detected faces, 0 not detected faces, 1 erroneously detected face. (c) … detected faces, 0 not detected faces, 0 erroneously detected faces.
Figure 2.10: Performance of the Convolutional Face Finder.

2.4 Existing Image2Video approaches

In this chapter, former Image2Video applications will be presented and compared. Principally there are three existing approaches, omitting other applications that basically generate video slideshows by adding special effects, soundtracks etc., such as for example Microsoft's Photostory (see Figure 2.11).

Figure 2.11: Photo Story 3 for Windows.
only the approximate relative position of the features in a face is important.

The N layers contain classical neural networks and decide the final classification based on the features extracted in the previous layers.

Figure 2.8: The Convolutional Face Finder (9x9 convolutions, 3x3 subsampling). Image copied from [7].

2.3.2.2 Training the parameters

Each layer has trainable coefficients for extracting and classifying the features; the number of trainable parameters per layer ranges from 15 to 602.

These parameters are trained with a set of 3702 different face areas, showing faces in uncontrolled natural environments. The faces are manually annotated to indicate the eye and mouth positions and cropped to the 32x36 pixel size of the retina. The faces passed to the retina deliberately include the borders of the face, because the system is thus fed with more information and will produce fewer false positives. Note that the size of the retina is bigger than the size of the images in the Viola-Jones method (24x24 pixels).

The parameters also have to be trained with non-face images, which is more difficult, as any random 32x36 image not containing a face can be used as a non-face example. Therefore the bootstrapping algorithm is used, which trains the system with false positives found in a set of 6422
speed control and the local zooming speed control.

The output of this step is the video information that is added to the music file, in order to generate the complete, composed video.

2.4.4 Conclusions

2.4.4.1 Differences between the Image2Video approaches

The presented articles have given a general insight into the existing Image2Video applications. As anticipated before, the Photo2Video application seems to be the most advanced application in image-to-video transmoding, presenting the most extensive prior processing and explaining in detail the algorithms followed to generate the simulated camera motion.

IST's approach does not include striking new features and, as far as the articles show, seems to be an alternative implementation of Microsoft's Rapid Serial Visual Presentation. Both articles present a similar previous semantic analysis of the image, the same preprocessing of the detected ROIs and, finally, a similar browsing path generation. Neither article mentions how to generate the simulated camera motion, how the curves are interpolated or how the speed of the movement is controlled. This suggests that they have not focused their work on these aspects, but have concentrated on the ROI generation and processing (grouping, splitting, ...). The Time-Based and Amount-of-Information-Based (or Perusing and Skimming mode) video generations do not appear to be very useful or optimal solutions, as a certain amount of information ca
the process: when the temperature is high, more changes will be admitted, while worse changes will be admitted more rarely when the temperature is low at the end of the process. The allowance for "uphill" moves saves the method from becoming stuck at local minima.

As the trained eye might have noticed, it is important to define an appropriate cooling ratio. If we define a high rate of temperature decrease, the algorithm will take less time, but will probably find a relatively bad solution. On the other hand, a very slow cooling schedule will find solutions near the optimum with higher probability, but will take more processing time. It is therefore necessary to find a compromise between computing time and the actual quality of the solution. In our particular case, the path has to be as short as possible, because a longer path means less comfort in the viewing of the output video, as the simulated camera will move back and forth along a strange path.

[Footnotes: http://www.tsp.gatech.edu, http://en.wikipedia.org/wiki/Simulated_annealing]

The simulated annealing algorithm has a strong random character, marked by:

- The initial random path.
- The defined exchange function, which in our particular implementation exchanges two fortuitously chosen nodes of the path.
- The acceptance function for unfavourable exchanges.

This random character has as a consequence that two executions on the same data set will most probably have
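The annealing loop described above (random two-node exchanges, acceptance of uphill moves with probability decreasing with the temperature, geometric cooling) can be sketched as follows. The temperature, cooling rate and step count are illustrative values, not the thesis's experimentally chosen settings:

```python
import math, random

def path_length(order, pts):
    """Straight-line tour length through the ROI centres in the given order."""
    return sum(math.dist(pts[order[i]], pts[order[i + 1]])
               for i in range(len(order) - 1))

def anneal(pts, temp=100.0, cooling=0.995, steps=20000, seed=0):
    """Minimal simulated-annealing sketch: exchange two fortuitously chosen
    nodes, accept worse paths with probability exp(-delta / temperature),
    multiply the temperature by `cooling` after every step."""
    rng = random.Random(seed)
    order = list(range(len(pts)))
    cost = best = path_length(order, pts)
    best_order = order[:]
    for _ in range(steps):
        i, j = rng.sample(range(len(order)), 2)      # exchange function
        order[i], order[j] = order[j], order[i]
        new = path_length(order, pts)
        if new < cost or rng.random() < math.exp((cost - new) / temp):
            cost = new                               # accept (possibly uphill) move
            if cost < best:
                best, best_order = cost, order[:]
        else:
            order[i], order[j] = order[j], order[i]  # undo rejected exchange
        temp *= cooling                              # cooling schedule
    return best_order, best
```

With a fast cooling rate the loop finishes quickly but may settle on a bad tour; a slow rate approaches the shortest tour at the price of more iterations, exactly the compromise discussed above.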
time frontal face detection system running at 15 frames per second on a conventional PC. OpenCV provides the code for testing this system with a webcam and it works fairly well, although it sometimes detects parts of the background as faces.

The Viola-Jones algorithm can be adapted to detect other objects, for example hand detection, which has been implemented at the University of Ottawa by changing the training data and Haar-like features.

2.3.2 Face detection in aceMedia: the Convolutional Face Finder

Face detection in aceMedia is based on Christophe Garcia and Manolis Delakis' Neural Architecture for Fast and Robust Face Detection [7]. Using a Convolutional Neural Network (called Convolutional Face Finder in this article), this research line aims to achieve high detection rates with a low rate of false positives, even in difficult test sets with faces that can be rotated up to 20 degrees in the image plane and turned up to 60 degrees.

2.3.2.1 Convolutional Neural Network Architecture

The Convolutional Face Finder consists of six layers plus the retina, which receives a matrix of 32x36 pixels to be classified as face or non-face (see Figure 2.8). The layers are divided into two alternated C and S layers, finally followed by two N layers.

The C layers are convolution layers, responsible for detecting face features.

The S layers follow the C layers and reduce the precision of the position in the feature map, because
to be run again for generating another video.

It generates only videos using a single image.

If a ROI has been drawn incorrectly it is impossible to correct it; the only possibility is to start drawing all the ROIs again.

Figure B.1: Appearance of the Graphical User Interface (mode selection between automatic video and simulated annealing; parameters for width, height, bitrate, speed, curvature, maximal zoom and codec; buttons for Load Image, Load ROIs, Reset ROIs and Save ROIs).

To generate a video using the GUI, the order of proceeding is:

1. Load an image clicking on "Load Image". The image will appear on the GUI.

2. Draw the ROIs on the image. If the user commits a mistake, he can delete all the ROIs clicking on "Reset ROIs". When finished defining the ROIs, the user has to click the "Save ROIs" button.

3. Alternatively, if the ROIs are already specified in a file, the user can load them clicking on "Load ROIs".

4. At any moment, the user can change the parameter settings on the right side of the window.

5. The last step, when everything is correct, is to click the "Generate video" button and the video will be generated.

C CAIN system overview [21]

C.1 Architecture

CAIN (Content Adaptation Integrator) is a mul
with dimensions close to the image resolution show particularly bad results, as the sampling window is confined to an area not much bigger than itself and is not able to move freely. In these cases, the sampling window will with high probability be centred on exactly the same position for more than one ROI and there will be no panning between ROIs, which can lead to confusion (see Figure 6.1).

Figure 6.1: The same sampling window centres the three faces situated at the right part of the image.

The profit of the transmoding is optimized for high-resolution pictures and for video dimensions smaller than the image resolution. It can be said that for these cases the Image2Video transmoding offers a really visually attractive solution.

6.2 Future work

To continue the present work and improve the performance of the ROIs2Video application, further investigation could be applied to certain points:

1. As mentioned before, due to the dependence on ffmpeg, it is necessary to write/read a high number of temporary images to/from hard disk, which slows down the performance of the application significantly. The video coding could be incorporated into the system, so all the data is kept in RAM.

2. The high number of image resizings also accounts for a considerable amount of time. Future investigation could look into how to optimize the resizing of the images.

3. Other sorting algorithms and cost functions could be tried out, al
FIGURE 4.11: TYPICAL SPEED FUNCTIONS FOR EASE-IN & EASE-OUT CAMERA MOVEMENT .......... 52
FIGURE 4.12: PATH BETWEEN ROIS INVOLVING ZOOM FACTOR CHANGE .......... 53
FIGURE 4.13: SOME EXAMPLES FOR THE DEFINED CAMERA MOVEMENT .......... 55
FIGURE 4.14: SCHEME OF THE CAMERA … .......... 56
FIGURE 5.1: IMAGE2VIDEO CAT OPERATION .......... 63
FIGURE 6.1: THE SAME SAMPLING WINDOWS CENTRES THE THREE FACES SITUATED AT THE RIGHT … .......... 71
FIGURE 6.2: SCANNING PATHS FOR SPLITTED ROIS. THE RECTANGLE WITH BOLD STROKES REPRESENTS THE SAMPLING WINDOW .......... 72
FIGURE 6.1: THE SAME SAMPLING WINDOW CENTRES THE THREE FACES OF THE PEOPLE SITUATED AT THE RIGHT PART .......... 74
FIGURE 6.2: SCANNING PATHS FOR SUBDIVIDED ROIS. THE BOLD BLACK RECTANGLE REPRESENTS THE SAMPLING WINDOW .......... 76

TABLE INDEX

TABLE 10: FUNCTION CLASSIFICATION IN CVAUX
TABLE 11: FUNCTION CLASSIFICATION IN HIGHGUI
TABLE 12: COMPONENTS OF FFMPEG
TABLE 13: MOST IMPORTANT MULTIMEDIA COMPRESSIO
[19] Michael E. Mortenson, "Geometric Modeling", John Wiley & Sons, New York, 1985.

[20] aceMedia project, "Content Adaptation Tools Development Tutorial".

[21] Javier Molina, José María Martínez, Víctor Valdés, Fernando López, "Extensibility of Adaptation Capabilities in the CAIN Content Adaptation Engine", 1st International Conference on Semantic and Digital Media Technologies, December 2006.

Glossary

JNI: Java Native Interface
ROI: Region Of Interest
CAT: Content Adaptation Tool
CAIN: Content Adaptation Integrator
CME: Cross Media Engine
PDA: Personal Digital Assistant
GUI: Graphical User Interface

Appendices

A Running the application

The invocation parameters of the application are divided in two parts. In the first place, it is possible to specify the desired execution options by adding tag-value pairs. The possible tags are:

Video parameters:

- b: Bitrate of the generated video. Reasonable bounds for the video bitrate are 100 (very low) to 99999 (very high).

- cod: Video codec to be applied in the video coding. Some example codecs can be specified using the following strings: "mpeg1video", "mpeg2video", "mpeg4", "wmv1".

- w: Video width. It has to be bounded between 1 and the image width, although values close to the limits are not practical.

- h: Video height. It has to be bounded between 1 and the image height, although values close to t
6. The classifier cascade is a chain of single-feature filters.

2.3.1.3 Scanning an image

To search for the object across the image after the training, a search window scans the image looking for the object. As the object does not have to be of the same size as the trained examples, the search window (not the image itself) has to be resized and the procedure repeated several times.

2.3.1.4 Detection results and general comments

The Viola-Jones classifier was used at the beginning of the Image2Video application's development in order to have automatic ROI annotation and not have to annotate the ROIs manually. The face detector does not detect 100% of the faces, especially not when the head is turned or a part of the face is covered by something. What is more annoying for the Image2Video application is that the Viola-Jones face detector frequently classifies parts of images as faces that really are not. When the simulated camera stops at these parts of the image, the viewer gets confused.

The following figures are examples of real executions of the Viola-Jones face detector using the trained data from the file haarcascade_frontalface_alt.xml.

(b) 2 detected faces, 4 not detected faces, 0 erroneously detected faces. (c) … detected faces, 0 not detected faces, 6 erroneously detected faces.
Figure 2.7: Performance of the Viola-Jones detector.

Just out of curiosity, the Viola-Jones detector seems to be the first real
It will be left to the user to establish his preferences and taste. Another approach could be to group close ROIs and show them together.

Large ROIs are reduced in order to fit into the sampling window. Further investigation could examine the possibility of splitting large ROIs and travelling through them without reducing the ROIs' resolution.

4.6 Base speed calculation

The motion speed of the camera is given by the distance, measured in pixels on the original image, jumped from one frame to the next. Therefore the curve defining the position of the upper left corner of the sampling window, which guides the movement of the sampling window as will be seen in the next section, will contain an array of points, each one separated by a particular pixel distance from its neighbours and thus characterizing the movement's speed pattern.

It must be realized that a displacement of i pixels is not the same in pictures of different resolution. Figure 4.5 shows a possible case, where a camera panning movement (movement in the xy plane) between two ROIs is simulated in two different size versions of the same image.

Figure 4.5: Pixel distance between ROIs in different size pictures.

In this case, the camera would take double the time to simulate the panning motion in the bigger image, because the distance between the attention objects is twice as long. In order to solve this problem, a base speed, which has been experimentally
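One way to make the per-frame pixel jump independent of the picture resolution is to scale a reference jump by the image diagonal. This is only a sketch of the idea; `base_jump` and `ref_diag` are invented illustrative constants, not the thesis's experimentally determined base speed:

```python
import math

def pixel_jump(image_w, image_h, base_jump=4.0, ref_diag=800.0):
    """Scale the per-frame pixel jump with the image diagonal, so panning the
    same scene takes the same time at any resolution (constants are illustrative)."""
    diag = math.hypot(image_w, image_h)
    return base_jump * diag / ref_diag

# An image twice as large in both dimensions gets twice the jump,
# so a pan between the same two ROIs lasts equally long in both versions:
assert pixel_jump(800, 600) == 2 * pixel_jump(400, 300)
```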
N FORMATS ACCEPTED IN FFMPEG .......... vi

1 Introduction

1.1 Motivation

Communication networks (i.e. mobile, traditional telephone and Internet) are tending towards a unique universal network, which is meant to be accessed via different client devices and with very different user preferences. Internet providers need to improve their quality of service by offering adaptive and customized information and content, in order to keep up with the population boom of mobile equipment (PDAs, smartphones, mobile phones) as Internet clients.

In the foreseeable future, the present limitations of mobile devices will become smaller and smaller, as their performance (bandwidth, computational resources, data storage etc.) will slowly converge with the performance featured on desktop computers, thus becoming more than sufficient to access the Internet and fully profit from its multimedia content. The principal remaining limitation will then be the screen size, which cannot be increased for mobility and comfort reasons.

The great deal of information on the Internet presented or shared as images has to be adapted to this limited screen size. Currently, the predominant methods are downsampling, cropping or manual browsing of pictures, which involves information loss in the first two cases and time consumption in the latter.

1.2 Objectives

A more recent approach to solving the aforementioned adaptation problem of large images on mobile displays, most
This library contains experimental and obsolete functions.

Operation classification — Description: Eigen Objects (PCA) functions; Embedded Hidden Markov Models; …
Table 10: Function classification in CvAux.

D.4 HighGUI

HighGUI is a set of functions to design quick and experimental user interfaces. However, the library is not intended for end-user applications, as it only provides simple methods to display images or allow some user interaction.

The HighGUI library also has functions to manage image files, loading them or writing them to disk. The video I/O functions allow the developer to easily use camera input, but do not include exhaustive error handling.

Operation classification — Description:
- Simple GUI: functions to open windows that present images and trackbars, and functions to listen to mouse or key events.
- Image files: BMP, JPEG, PNG, TIFF etc.
- Video I/O: video capturing from a file or a camera.
- Utility and system functions.
Table 11: Function classification in HighGUI.

E Ffmpeg

The Ffmpeg library collection was started by Fabrice Bellard and was named after the MPEG (Moving Pictures Expert Group) video standards group, with the prefix ff (for fast forward). The Ffmpeg software is a command-line tool which allows:

- converting digital audio and video between various formats
- generating videos from an array of image files
- streaming real-time video from a TV card

It consists of different com
UNIVERSIDAD AUTÓNOMA DE MADRID
ESCUELA POLITÉCNICA SUPERIOR

Transformación de imágenes a vídeo
(Image-to-video transformation)

Final degree project

Fernando Harald Barreiro Megino
September 2007

Transformación de imágenes a vídeo

Author: Fernando Harald Barreiro Megino
Tutor: José María Martínez Sánchez

Grupo de Tratamiento de Imágenes
Departamento de Ingeniería Informática
Escuela Politécnica Superior
Universidad Autónoma de Madrid
September 2007

PROYECTO FIN DE CARRERA

Title: Transformación de imágenes a vídeo
Author: Fernando Harald Barreiro Megino
Tutor: José María Martínez Sánchez

Examination board:
President: Jesús Bescós Cano
Member: Pedro Pascual Broncano
Secretary: José María Martínez Sánchez

Reading date:
Grade:

Keywords

Content Adaptation, Image browsing, Image-to-Video transmoding, Regions of Interest (ROIs), Information fidelity, Visual attention model, Browsing path, Simulated camera movement

Abstract

This master thesis proposes an image-to-video adaptation system using the human attention model, in order to view large images on mobile displays without a significant loss of information. This approach tries to automate the process of scrolling and zooming through an image with minimal user interaction by simulating a virtual camera movement through the picture. The process is automatic and the user inte
… .......... 6
2 … .......... 7
2.1 Visual attention .......... 7
2.1.1 … .......... 8
2.1.2 Composite image attention model .......... 8
2.2 Other approaches to the adaptation of large images to reduced displays .......... 9
2.2.1 Direct downsampling .......... 9
2.2.2 Cropping .......... 9
2.2.3 Manual browsing .......... 10
2.3 Attention focus detection .......... 11
2.3.1 The Viola-Jones face detection .......... 11
2.3.1.1 … .......... 12
2.3.1.2 AdaBoost machine learning method .......... 13
2.3.1.3 Scanning an image .......... 14
2.3.1.4 Detection results and general comments .......... 14
2.3.2 Face detection in aceMedia: the Convolutional Face Finder .......... 16
2.3.2.1 Convolutional Neural Network Architecture .......... 16
2.3.2.2 Training the parameters .......... 17
2.3.2.3 Face localization .......... 18
2.3.2.4 Detection results .......... 19
2.4 Existing Image2Video approaches .......... 20
2.4.1 Image2Video adaptation system (IST) .......... 21
a, where a mobile patrol (ambulance, police, fire brigade, private security) is provided with a mobile device and receives a video generated from a high-resolution image taken in the area of the incident. In this way, the mobile patrol can prepare for the situation that awaits them and, if necessary, request reinforcements or special support units. The security system would need a ROI identification module for the desired objects, which would pass this information on to the Image2Video adaptation application.

[Footnote: Regions of interest (ROIs) are used in the field of image processing to define the boundaries of an object. In medical imaging the concept is commonly known and used, for example to measure the size of a tumour. In non-medical image processing, the best-known standard is JPEG2000, which includes mechanisms for annotating ROIs in an image.]

1.3 Organization of this report

This report is divided into the following chapters:

Chapter 1 presents the introduction (motivation and objectives) of the final degree project.

Chapter 2 reviews other attempts to adapt large images to low-resolution displays, as well as the existing Image2Video programs. In addition, the ROI identification modules (particularly for faces) that were used during the development of the application will be described.

Chapter 3 presents the architecture
age attempts to generate a story line based on generating Temporal Structure Layers. It is completed in three steps:

i. Photograph selection and grouping
ii. Specify leading actor
iii. Advanced Story Generation, where the user is able to interact, undo previous automatic actions, provide scene titles and impose some other wishes.

The result of this stage is a group of XML files representing a timeline and the moments each specific action starts and ends.

3. Framing scheme

The framing scheme is divided into Key-frame Extraction, Key-frame Sequencing and Motion Generation.

The Key-frame Extraction defines the origin and destination frames of the simulated camera movement, in order to generate smooth motions. The authors define different types of frames, classifying them by the area of the picture they include: full, medium and close-up frames.

The Key-frame Sequencing establishes the order in which these extracted key frames are presented, for example Full frame -> Medium frame -> Close-up frame.

Finally, the Motion Generation step is in charge of simulating the virtual camera movement between the key frames, with the principal target of generating a smooth motion. The necessary controls for this task are:

- Panning Motion Trajectories: the trajectories are generated by cubic interpolating splines with the smallest maximal curvature.

- Speed Control: determining the average speed control, the local panning
alidad of visualization [4].

Therefore, the main objective of this project is to design and develop an Image2Video adaptation tool that generates an output video offering both high information fidelity and a pleasant presentation. The presentation should take into account some viewer preferences, which can be set when running the tool. A basic diagram of the proposed application is shown in Figure 1.1.

(Diagram inputs: user preferences, input image, display limitations; block: Image2Video Transmoding Tool.)
Figure 1.1: Basic diagram of the Image2Video adaptation.

The development of the Image2Video adaptation tool involves determining the regions of interest (ROIs) of an image, finding the optimal browsing path and generating the final video. The application will depend on external identification of the attention objects and will focus on the video generation. The determination of regions of interest can be done manually, using a graphical interface (see Annex B), or automatically, using any ROI determination module, such as the Convolutional Face Finder provided by aceMedia in its work package WP4 [5][6][7].

Although it may seem that the application is aimed at entertainment purposes, it can be used in a wide variety of environments, such as security and surveillance application
an be horizontally or vertically adjacent and have to be the same size (Figure 2.3a).

- Three-rectangle features: the value of such a feature is the sum of the pixels in the outside rectangles minus the sum of the pixels in the center rectangle (Figure 2.3b).

- Four-rectangle features: the value is computed as the difference between diagonal pairs of rectangles, as shown in Figure 2.3c.

Figure 2.3: Example rectangle features. (a) Two-rectangle feature; (b) three-rectangle feature; (c) four-rectangle feature; (d) weak classifiers.

The base resolution of the detector is 24x24 pixels, which tends to be the smallest window that can be used without losing important information.

For the calculation of the rectangle features, an intermediate representation of the image, the integral image ii, is used. The integral image at (x, y) contains the sum of the pixels above and to the left of (x, y), inclusive.

Figure 2.4: Integral image. (a) The value of the integral image at a point; (b) calculating a rectangular sum using the integral image.

Using the integral image, the rectangular sum of pixels can be calculated with four array references (see Figure 2.4b).

2.3.1.2 AdaBoost machine learning method

Using the rectangle features and a set of positive and negative examples, a classification function can be learned. There are 160,000 rectangle features associated with each image sub-window
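The integral image and the four-reference rectangular sum can be sketched as follows; the function names are ours, not Viola and Jones' code, and the image is assumed grayscale and stored row-major:

```c
/* Integral image: ii(x,y) = sum of all pixels above and to the left of
 * (x,y), inclusive. Built in a single pass using the already computed
 * neighbours: ii(x,y) = i(x,y) + ii(x-1,y) + ii(x,y-1) - ii(x-1,y-1). */
void integral_image(const unsigned char *img, long *ii, int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            long above = (y > 0) ? ii[(y - 1) * w + x] : 0;
            long left  = (x > 0) ? ii[y * w + (x - 1)] : 0;
            long diag  = (y > 0 && x > 0) ? ii[(y - 1) * w + (x - 1)] : 0;
            ii[y * w + x] = img[y * w + x] + above + left - diag;
        }
}

/* Sum of pixels in the rectangle [x1..x2] x [y1..y2], inclusive,
 * computed with only four array references regardless of size. */
long rect_sum(const long *ii, int w, int x1, int y1, int x2, int y2)
{
    long a = (x1 > 0 && y1 > 0) ? ii[(y1 - 1) * w + (x1 - 1)] : 0;
    long b = (y1 > 0) ? ii[(y1 - 1) * w + x2] : 0;
    long c = (x1 > 0) ? ii[y2 * w + (x1 - 1)] : 0;
    return ii[y2 * w + x2] - b - c + a;
}
```

A two-rectangle feature is then just the difference of two `rect_sum` calls, which is why the detector can evaluate thousands of features per sub-window in real time.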
aspect ratio was different from the image aspect ratio, the image would be distorted, changing its aspect ratio in order to fit into the video screen.

A new attempt considered that the images had basically only two aspect ratios: a vertical one of approximately w:h = 3:4 and a horizontal one of approximately w:h = 4:3, which is true for most standard images taken with digital cameras. The video also had to be generated with one of these two aspect ratios and, in case the video had the opposite aspect ratio to the image, the image would be flipped 90 degrees. This approach was wrong for two reasons: the screen displaying the video would also have to be turnable, and the adaptation did not consider the cases where the image or the video did not have the expected aspect ratios.

The final decision was to add black bars, as explained in chapter 4.2.

The changes in the remaining modules (keyframe extraction, sampling window centring, optimal path calculation, camera motion speed control and curve interpolation) were the following:

- The keyframe extraction started considering only the ROIs, without generating the ring sentence, because the camera movement did not consider zooming and was only able to pan through the image.

- No remarkable changes were made in the sampling window centring module.

- The optimal path was not calculated at the beginning of the execution; instead, when the camera arrived at a ROI, the nearest unvisited ROI was chosen.

- For implementing the Simulated Annealing, first a model was programmed in
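The black-bar solution finally adopted amounts to scaling the image so that it fits inside the video frame while preserving its aspect ratio, filling the leftover area with black. A minimal sketch, with hypothetical names (the thesis details this in chapter 4.2):

```c
/* Letterbox/pillarbox fit: scale an iw x ih image to the largest size
 * that fits inside a vw x vh video frame without changing its aspect
 * ratio. The unused frame area is to be filled with black bars. */
void fit_with_bars(int iw, int ih, int vw, int vh, int *ow, int *oh)
{
    /* Compare iw/ih with vw/vh using cross-multiplication to avoid
     * floating point: the wider medium limits the scale. */
    if ((long)iw * vh >= (long)ih * vw) {
        *ow = vw;                         /* width-limited:          */
        *oh = (int)((long)ih * vw / iw);  /* horizontal black bars   */
    } else {
        *oh = vh;                         /* height-limited:         */
        *ow = (int)((long)iw * vh / ih);  /* vertical black bars     */
    }
}
```

This handles every aspect-ratio combination, unlike the earlier 3:4 / 4:3 flipping scheme, and never distorts the picture.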
at the body below the face attracts the user's attention. In our case the specific type of each ROI is not defined, as the implementation is independent of the ROI types, and therefore this kind of case is not taken into account. Exceptionally, when the ROI is next to an image border, the window cannot centre the attention object and will be situated at the border(s) of the image.

When centring the ROI in the sampling window, it is necessary to decide:

- How to show objects of a size bigger than the display size (in pixels).

- Up to which zooming factor small attention objects will be shown in the video (see section 2.1.2 for the "minimal perceptible size" concept definition).

In the first implementation of the Image2Video CAT, the video displays the defined ROIs in their original size if the ROI's size is smaller than the output video resolution (spatial scale 1:1), or in the biggest possible size if the ROI's dimensions are greater than the output video resolution, obtaining a ROI which occupies the whole video screen.

In the current version, the user can set the maximal zooming factor to apply to small ROIs. This way, the presentation size of a ROI is limited either by the maximal zooming factor or by the video resolution. When setting the zooming factor, the user has to be aware that if the ROI is zoomed in excessively, the corresponding frames generated through bilinear interpolation will lose resolution and quality.
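The two limits just described (the user's maximal zooming factor and the video resolution) can be combined into a single presentation scale per ROI. A sketch under those rules; `roi_scale` is an illustrative helper, not the thesis code:

```c
/* Presentation scale for a ROI: a small ROI is zoomed in up to
 * max_zoom, but never beyond the scale that fills the video screen;
 * a ROI larger than the screen is shrunk to fit it. */
double roi_scale(int roi_w, int roi_h, int vid_w, int vid_h, double max_zoom)
{
    double fill   = (double)vid_w / roi_w;   /* scale filling the width  */
    double fill_h = (double)vid_h / roi_h;   /* scale filling the height */
    if (fill_h < fill)
        fill = fill_h;                       /* scale that fills the screen */
    if (fill < 1.0)
        return fill;                         /* ROI bigger than screen */
    return (fill < max_zoom) ? fill : max_zoom;  /* cap small-ROI zoom */
}
```

With `max_zoom = 1.0` this reproduces the behaviour of the first implementation (original size for small ROIs, shrink-to-fit for large ones).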
ay sizes", IEEE Transactions on Multimedia, Vol. 8, No. 4, August 2006.

[11] L.Q. Chen, X. Xie, X. Fan, W.Y. Ma, H.J. Zhang, H.Q. Zhou, "A visual attention model for adapting images on small displays", ACM Multimedia Systems Journal, 2003.

[12] P. Viola, M.J. Jones, "Robust Real-Time Face Detection", International Journal of Computer Vision, Vol. 57, No. 2, May 2004, pp. 137-154.

[13] Freund, Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting", Journal of Computer and System Sciences, No. 55, 1997.

[14] Ascenso, Correia, Pereira, "A face detection solution integrating automatic and user assisted tools", Portuguese Conf. on Pattern Recognition, Porto, Portugal, Vol. 1, pp. 109-116, May 2000.

[15] Palma, Ascenso, Pereira, "Automatic text extraction in digital video based on motion analysis", Int. Conf. on Image Analysis and Recognition (ICIAR'2004), Porto, Portugal, September 2004.

[16] Gonzalez, "Digital Image Processing", Chapter 2: Digital Image Fundamentals, 2nd Edition, Prentice Hall, 2002.

[17] V. Cerny, "A thermodynamical approach to the travelling salesman problem: an efficient simulation algorithm", Journal of Optimization Theory and Applications, 45:41-51, 1985.

[18] E. Catmull and R. Rom, "A class of local interpolating splines", in Computer Aided Geometric Design, R.E. Barnhill and R.F. Riesenfeld, Eds., Academic Press, New York, 1974, pp. 317-3
ches to the adaptation of large images to reduced displays.

2.2.1 Direct downsampling

Image downsampling clearly results in an important information loss, as the resolution is reduced excessively in many cases. Downsampled images can be compared to thumbnail images, which are used to recognize an image, but never to view the entire information, as the low resolution does not allow the viewer to distinguish details.

2.2.2 Cropping

There are two different cropping modes, blind and semantic, which differ in whether the semantic content of the image is analyzed beforehand.

The blind cropping approach always takes the central part of the image, cutting off the borders of the image, where a major part of the information could be concentrated.

Semantic-cropping-based image adaptation, as described in [11], tries to select the part of the image where most of the information is concentrated, in order to maintain the highest information fidelity possible. Nevertheless, this strategy assumes that most of the information is confined to a small part of the image, which is not true for most real images. When trying to adapt the image to a small display, this approach has either to select a very small part of the image or to downsample the selected segment. The result is far from optimal.

2.2.3 Manual browsing

Manual browsing avoids information loss, but is often annoying for the viewer, as he has to scroll and zoom through the imag
commonly known as Image2Video [1][2] or Photo2Video [3], tries to convert the input image into a video by simulating a fly-through of a virtual camera which focuses on the regions of interest present in the image. The target of this transmoding* is to maximize the information fidelity and the user's experience [4].

The main objective of this project is to design and develop an Image2Video transmoding tool with the purpose of generating an output video which maximizes the information fidelity while offering a pleasant presentation. The presentation should take into account some of the viewer's preferences, which s/he will be able to introduce before the execution. A basic diagram of the proposed application is shown in Figure 1.1.

* Transmoding is referred to in the literature as the adaptation that changes the modality (e.g. image, video, audio, text) of the original media.

Figure 1.1: Basic Image2Video diagram (inputs: input image, display limitations and user preferences; the Image2Video Transmoding Tool produces the output video).

The development of the Image2Video adaptation tool implies determining the Regions of Interest (ROIs) in an image, finding the optimal browsing path and generating the final video. The Image2Video application will rely on the external generation of attention objects and will focus on the video generation. The determination of the regions of interest can be done manually using a simple graphical user interface (see Annex B) or automatica
consequently, the number of units stated in the project or in the budget cannot serve as grounds for claims of any kind, except in cases of rescission.

8. Both in the work certifications and in the final settlement, the work carried out by the contractor will be paid at the material execution prices that appear in the budget for each unit of the work.

9. If, exceptionally, some work has been executed that does not conform to the conditions of the contract but is nevertheless admissible in the judgement of the Director Engineer of Works, the Management will be informed, proposing at the same time the price reduction that the Engineer deems fair; and if the Management resolves to accept the work, the contractor will be obliged to accept the agreed reduction.

10. When it is judged necessary to use materials or execute works that do not appear in the contract budget, their cost will be evaluated at the prices assigned to other analogous works or materials, if any; otherwise, the prices will be discussed between the Director Engineer and the contractor and submitted to the approval of the Management. The new prices agreed by either procedure will always be subject to what is established in the previous point.

11. When the contractor, with the authorization of the Director Engineer of Works, uses materials of higher quality or larger dimensions than stipulated
d images will then be converted and coded to video with the Ffmpeg* libraries. The video generation will allow certain flexibility in relation to the video characteristics, such as bitrate, codec or resolution.

3.3 Internal data structures

3.3.1 Image structure

The structure used for loading and dealing with an image is the IplImage structure, delivered in the OpenCV library, which presents the following fields:

typedef struct _IplImage
{
    int   nSize;      /* sizeof(IplImage) */
    int   nChannels;  /* most OpenCV functions support 1, 2, 3 or 4 channels */
    int   depth;      /* pixel depth in bits: IPL_DEPTH_8U, IPL_DEPTH_8S,
                         IPL_DEPTH_16U, IPL_DEPTH_16S, IPL_DEPTH_32S,
                         IPL_DEPTH_32F and IPL_DEPTH_64F are supported */
    int   dataOrder;  /* 0 - interleaved color channels,
                         1 - separate color channels;
                         cvCreateImage can only create interleaved images */
    int   origin;     /* 0 - top-left origin,
                         1 - bottom-left origin (Windows bitmaps style) */
    int   width;      /* image width in pixels */
    int   height;     /* image height in pixels */
    int   imageSize;  /* image data size in bytes
                         (== image->height * image->widthStep
                         in case of interleaved data) */
    char *imageData;  /* pointer to aligned image data */
    int   widthStep;  /* size of aligned image row in bytes */
    ...
} IplImage;

* http://ffmpeg.mplayerhq.hu

Note: Fields i
ddition, the algorithms of the external attention focus detection programs used during the development will be shortly introduced.

- Chapter 3 presents the architecture of the system and analyzes the data and external file structures.

- Chapter 4 describes the implemented algorithms and offers some insight into the decisions, problems and solutions found during the implementation.

- Chapter 5 covers the integration of the system and the integration of the application in the CAIN framework, as well as the testing results of the application.

- Chapter 6 presents some of the conclusions obtained after the development of the system and the possible improvements for future work.

Additionally there are different appendices:

- Appendix A: User manual for running the application and execution parameters.
- Appendix B: Presentation of the graphical user interface's prototype.
- Appendix C: Overview of aceMedia's CAIN system.
- Appendix D: Description of the OpenCV library.
- Appendix E: Short description of the Ffmpeg library.

1 Introduction

1.1 Motivation

Communication networks (for example mobile and conventional telephone networks, and the Internet) are tending towards a universal network, meant to be accessed from different devices and with different user preferences. Internet providers need to improve the quality of service by offering adapted and personalized information and content, in order to face the explosive growth
of mobile equipment (PDAs, smart mobile phones, etc.) acting as Internet clients.

In the foreseeable future, the limitations of mobile devices will tend to disappear rapidly, as they show performance (data transfer speed, processing and memory resources) increasingly similar to that of desktop computers, thus being more than capable of accessing the Internet and enjoying multimedia content. The main limitation, and the hardest to overcome, will be the screen size, which cannot grow for reasons of comfort and mobility.

The large amount of information shared on the Internet in the form of images must therefore be adapted to the reduced screen size. Currently, the predominant adaptation methods are reducing the image size (downsampling), cutting the image (cropping) and manual browsing of the images, which implies information loss in the first two cases and time consumption in the last one.

1.2 Objectives

A more recent solution for adapting large images to screens of smaller resolution, commonly known as Image2Video [1][2] or Photo2Video [3], tries to convert the input image into a video by simulating a virtual camera path that focuses on the regions of interest present in the image. The objective of this adaptation is to maximize the information fidelity and the qu
of the system and analyzes the internal data structures and those of the external files.

- Chapter 4 describes the implemented algorithms and offers an insight into the decisions, problems and solutions taken during the implementation.

- Chapter 5 covers the integration of the system during its independent development as well as into the CAIN content adaptation architecture. It also shows the results of the performed tests.

- Chapter 6 presents some conclusions obtained after the development and enumerates the possibilities of improvement for future work.

Additionally there are different appendices:

- Appendix A: User manual for running the application correctly.
- Appendix B: Presentation of the prototype for the graphical user interface.
- Appendix C: Overview of the CAIN system.
- Appendix D: Description of the OpenCV library.
- Appendix E: Short description of the Ffmpeg library.

2 State of the art

2.1 Visual attention

The Image2Video application is based upon visual attention models observed in humans and takes advantage of some of their limitations [8]. When watching a picture, the viewer centers his attention on some particular regions, which in many applications and papers [1][3][9][10] are said to be faces, texts and other saliencies. Nonetheless, it is important to underline that our application is independent of the semantic value of the regions of interest and is not bound to a
e (Backward sentence).

According to the Minimal Perceptible Time (MPT) concept (chapter 2.1.2), it would be desirable for the camera to stop at the end of each route, to be able to see the ROIs in detail. Liu et al. state in [9] that the MPT of a face is about 200 ms and that the MPT of a text is 250 ms per word. As our Image2Video CAT is aimed to be used for any type of object, we let the external ROI descriptor decide the relevance of each attention object. A standard stop of 8 frames / 25 fps = 320 ms is set at each object, and the external ROI detector is responsible for giving a relevance factor for the object. The time stopped at each attention object is calculated as the product of the standard stop and the relevance factor. For example, for a relevance factor set to 2, the camera movement will stop at this object for 2 x 320 ms = 640 ms. In the opposite case, for an absolutely non-relevant object, the importance factor can be set to 0 and the camera movement will pass through this object without stopping. However, if the speed control is set to ease-in and ease-out, the camera will move slower when passing through the ROI, even if it has zero relevance. If the relevance of an attention object is not defined, it will be considered as 1.

4.9 Video coding from the temporal image files

The last step is to generate a proper video file from the temporal image files that were saved on the hard drive. The video coding is done using the Ffmpeg
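The dwell-time rule above reduces to one multiplication. A sketch with an illustrative helper name (`stop_ms` is ours), following the stated constants of 8 frames at 25 fps:

```c
/* Dwell time at an attention object: the standard stop of
 * 8 frames / 25 fps = 320 ms, scaled by the relevance factor
 * supplied by the external ROI detector (1.0 if undefined,
 * 0.0 for a non-relevant object the camera passes through). */
int stop_ms(double relevance)
{
    const int standard_stop_ms = 8 * 1000 / 25;   /* 320 ms */
    return (int)(standard_stop_ms * relevance + 0.5);
}
```

At 25 fps the returned duration divided by 40 ms gives the number of identical frames to emit while the camera is stopped.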
e by himself, which makes him lose time. The Image2Video approach simulates and automates the process of manual browsing.

A result example of the three aforementioned approaches can be observed in the following pictures, which simulate these procedures. It is important to realize that the example has deliberately been chosen to confine the important information to a relatively small, non-central area.

Figure 2.2: Comparison between the different existing approaches. (a) Original image; (b) downsampled image; (c) cropped image with and without prior semantic analysis (semantic vs. blind cropping); (d) keyframes in manual browsing or in automated Image2Video.

2.3 Attention focus detection

As already mentioned, the Image2Video application relies on external image analysis and ROI generation, clearly separating the image analysis from the ROI-based content adaptation. To underline this fact, from now on we will divide the Image2Video application into ROIExtraction plus ROIs2Video (video generation out of an input image and a set of ROIs), this work being centered on the ROIs2Video development.

The transmoding tool is focused mainly on the video generation, independently of the generation of the semantic values of the ROIs. This way, any external program can use a desired object detector and pass the ROI specification file to the ROIs2Video algorithm.

In the deployment of the Image2Video CAT (Content Adaptation Tool)
e>

Alternatively, if no ROIs are to be defined, the application has to offer a mode that generates a basic video preview of the image. This mode will be detailed in chapter 4.1.2.

The XML file will not follow MPEG standards, because this would imply heavier labeling, although it could be desirable for a 100% MPEG-compliant application.

3.5 Zooming and shrinking images

During the generation of the output video, it is necessary to oversample (zoom) or undersample (shrink) images when adapting sampling windows to frames [16]. Zooming requires the creation and assignment of values to new pixels. The easiest and fastest method is the Nearest Neighbour interpolation, which replicates the nearest pixel. A special case of the Nearest Neighbour interpolation is in fact Pixel Replication, applicable when the size of an image is to be increased an integer number of times: each column is replicated n times and then each row is replicated n times. Although the method is fast, it produces pixelation (checkerboard effect) for high factors.

Figure 3.4: Comparison of the performance of the different interpolation methods. From left to right and top to bottom: the original image, the interpolated image using NN interpolation, using bilinear interpolation and using bicubic interpolation. The images have been generated by shrinking the original image to a resolution of 50x50 pixels and then zooming in to a resolution of 1200x1200.

A slig
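Nearest-neighbour scaling as described above can be sketched in a few lines; the helper name `nn_zoom` is ours, and the image is assumed single-channel and row-major:

```c
/* Nearest-neighbour resampling: each output pixel copies the closest
 * input pixel. For an integer zoom factor n this degenerates into
 * pixel replication (every source pixel repeated n x n times), which
 * is what causes the checkerboard effect at high factors. */
void nn_zoom(const unsigned char *src, int sw, int sh,
             unsigned char *dst, int dw, int dh)
{
    for (int y = 0; y < dh; y++)
        for (int x = 0; x < dw; x++) {
            int sx = x * sw / dw;   /* nearest source column */
            int sy = y * sh / dh;   /* nearest source row    */
            dst[y * dw + x] = src[sy * sw + sx];
        }
}
```

Bilinear and bicubic interpolation replace the single-pixel copy with a weighted average of the 2x2 or 4x4 neighbourhood, trading speed for the smoother results shown in Figure 3.4.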
e>
          <value>30000/1001</value>
          <value>30</value>
          <value>50</value>
          <value>60000/1001</value>
          <value>60</value>
        </rate>
      </Frame>
    </VideoCoding>
  </CommonParameters>
</ElementaryStreamFormat>

<InputMediaSystemFormats>
  <MediaSystemFormat id="JPEG-Media">
    <FileFormat>JPEG</FileFormat>
    <Extension>jpeg</Extension>
    <VisualCoding>
      <CodingFormatRef idref="JPEG"/>
    </VisualCoding>
  </MediaSystemFormat>
</InputMediaSystemFormats>

<OutputMediaSystemFormats>
  <MediaSystemFormat id="MPEG1-Media">
    <FileFormat>MPEG-1</FileFormat>
    <Extension>mp1</Extension>
    <VisualCoding>
      <CodingFormatRef idref="MPEG-1"/>
    </VisualCoding>
  </MediaSystemFormat>
  <MediaSystemFormat id="MPEG2-Media">
    <FileFormat>MPEG-2</FileFormat>
    <Extension>mp2</Extension>
    <VisualCoding>
      <CodingFormatRef idref="MPEG-2"/>
    </VisualCoding>
  </MediaSystemFormat>
</OutputMediaSystemFormats>

<AdaptationModalities>
  <AdaptationModality>
    <Mode href="acemedia:cme:cat:cs:Image2VideoCAT:Modes">
      <Name>Keyframe Replication Summarization</Name>
    </Mode>
    <MediaSystemRefInput idref="JPEG-Media"/>
    <MediaSystemRefOutput idref="MPEG1-Media"/>
    <MediaSystemRefOutp
e included next.

Figure 2.14: Flowchart of Photo2Video, taken from [3] (photos collection; content analysis with music segmentation and rhythm estimation; story generation with a P2V template; framing scheme with key-frame extraction and motion generation; output generation).

1. Content Analysis

The content analysis applies a set of image and music content analysis algorithms.

i. Image analysis. The images are first ordered by timestamps if available, or by filenames otherwise. The images are passed through a quality filter that removes images with a quality measure under a predefined threshold, and through a duplicate detection filter that removes similar photographs.

Next, face and attention detection are applied to estimate the attention objects in each specific image and thus to establish the ROIs. The face detection can be accompanied by some external annotation, in order to be able to generate a film for an individual person out of a digital photo album.

With the information gathered during the face and attention detection, each photograph can be semantically classified into different established groups, such as no people, portrait, multiple people or group photograph.

ii. Music analysis. The video presentation will be accompanied by incidental, synchronized music. The functioning of the alignment between music and video will not be described in this document.

2. Story Generation

As the name already anticipates, this st
e number of ROIs increased, this fairly simple algorithm drastically reduced the quality of the browsing paths, returning chaotic paths with back-and-forth movements. A good browsing path had to be found in some other way.

4.3.2 Simulated Annealing

The optimal browsing path is obtained using a Simulated Annealing [17] approach, which returns a sorted array of ROIs with a path distance near the optimum. The algorithm imitates the metallurgic process of slowly cooling a material to increase the size of its crystals and reduce their defects. This technique is often used to solve the Travelling Salesman Problem: finding a good path for a salesman who wishes to visit a certain set of cities travelling the shortest possible distance. If the number of cities is big, this problem cannot be solved by brute force in an affordable amount of time.

Simulated annealing usually locates a good approximation to the global optimum of the browsing path. Each step of the simulated annealing process replaces the current solution by a random nearby solution (i.e. exchanging two nodes of the browsing path). If the new solution has a lower cost, it is chosen; if the new solution turns out to have a worse cost, it can still be chosen with a probability that depends on the worsening and on the "temperature" parameter, which is gradually decreased during the process, imitating the cooling of metals. At the beginning of
in the project, or substitutes one class of fabrication for another with a higher assigned price, or executes any other part of the works with larger dimensions, or, in general, introduces into them any modification that is beneficial in the judgement of the Director Engineer of Works, he will nevertheless only be entitled to what would correspond to him if he had carried out the work in strict accordance with what was projected and contracted.

12. The amounts calculated for accessory works, even if they appear as a lump sum in the final (general) budget, will only be paid at the contract prices, according to the conditions of the contract and the particular projects drawn up for them or, failing that, according to the result of their final measurement.

13. The contractor is obliged to pay the Engineer author of the project and Director of Works, as well as the Technical Engineers, the amount of their respective professional fees for drawing up the project, technical direction and administration, where applicable, in accordance with the current tariffs and fees.

14. Once the execution of the work is concluded, it will be inspected by the Director Engineer appointed for that purpose by the company.

15. The definitive guarantee will be 4% of the budget and the provisional one 2%.

16. Payment will be made through monthly certifications of the executed work, in accordance with the budget prices, deducting the reduction if any.

17. The starting date of the works will be
the sampling window centres the three faces of the people situated on the right part of the image.

The greatest benefit of the image-to-video conversion is obtained using as input an image of decent resolution and generating a video of clearly lower resolution. For these images the Image2Video adaptation offers a really attractive solution for the viewer.

6.2 Future work

To continue the present work and improve the functioning of the ROIs2Video application, future research could centre on the following points:

- As mentioned before, due to the dependency on the Ffmpeg library, it is necessary to write and read a large number of images to and from disk, which significantly slows down the performance of the program. The video coding should be incorporated into the system, so that the data would not leave the RAM.

- The high number of image resizings also slows down the process. Future research could cover the optimization of these resizings.

- Other algorithms and cost functions could be tried, although in general the results of the Simulated Annealing algorithm with a distance cost function are satisfactory.

- The quality of the ROI presentation ordering is mostly subjective, but one could try to find an objective quality measure and repeat the S
es ... 11
Figure 2.3: Example rectangle features ... 12
Figure 2.4: Integral image ... 13
Figure 2.5: (1) Initially, uniform weights are distributed through the training examples; (2 & 3) incorrect classifications are reassigned with more weight (shown as bigger dots); the final classifier is a weighted combination of the weak classifiers ... 14
Figure 2.6: The classifier cascade is a chain of single-feature filters ... 14
Figure 2.7: Performance of the Viola-Jones detector ... 16
Figure 2.8: The Convolutional Face Finder (image copied from [7]) ... 17
Figure 2.9: Steps in face localization (image copied from [7]) ... 18
Figure 2.10: Performance of the Convolutional Face Finder ... 20
Figure 2.11: Microsoft's PhotoStory initial dialog ... 20
Figure 2.12: System ...
Figure 2.13: System architecture ... 24
Figure 2.14: Flowchart of Photo2Video taken from [3] ... 25
Figure 3.1: Examples for sampling windows, ROIs and frames on a picture ... 30
Figure 3.2: ROIs2Video algorithm steps ... 30
Figure 3.3: Specification of a
file ... 37
4.1.2 Automatic ROI initialization ... 38
4.2 Aspect ratio adaptation of the images ... 38
4.3 Finding the optimal browsing path ... 41
4.3.1 Jumping to the nearest unvisited ROI ... 41
4.3.2 Simulated Annealing ... 41
4.4 Keyframe extraction ... 44
4.5 Sampling window centering ... 44
4.6 Base speed calculation ... 45
4.7 Path interpolation for the simulated camera movement ... 46
4.7.1 Linear interpolation ... 47
4.7.2 Catmull-Rom interpolation ... 48
4.7.3 Arc length reparameterization ... 50
4.7.4 Speed control ... 52
4.7.5 ... 53
4.7.6 Overview of the interpolated Catmull-Rom curve ... 54
4.8 Camera simulation ... 54
4.9 Video coding from the temporal image files ... 57
5 Integration and testing ... 59
5.1 Integration of the modules ... 59
5.2 CAIN integration ... 61
5.2.1 Mandatory Java class file ... 62
5.2.2 Mandatory XML descr
fmpeg versions, some problems occurred, because the ffmpegJNI CAT is built on an older version of the internal libraries and only supports video generation from image files with dimensions divisible by sixteen (though the video file itself does not have this restriction). A little patch had to be introduced to generate the image files respecting the restriction, while afterwards generating the video file with the correct dimensions.

5.3 Testing

5.3.1 Testing environment specifications

The system has been tested on different computers, using Microsoft Windows and Linux operating systems (the versions for Windows and Linux are slightly different).

The used computer specifications are:

Computer                  | Processor                          | Hard disk
Home computer 1           | Intel Pentium D 2.8 GHz            | 100 GB for Windows
Home computer 2           | Intel Pentium Mobile 1.8 GHz       | 50 GB for Linux
Lab computer 1            | Intel Pentium 4 3.2 GHz            | 50 GB for Windows
Lab computer 2 (laptop)   | Intel Pentium 4 Centrino 1.86 GHz  | 30 GB for Linux

Table 3: Computer specifications

The application has been tested on Microsoft Windows XP SP2 (Home computers 1 & 2 and Lab computer 1) and on Linux Ubuntu 6.10 (Lab computer 1). Also, during the integration in the aceMedia CAIN framework, the system was tested on Linux CentOS 4.5 (Lab computer 2).

5.3.2 Library versions

The libraries used during the development of the application have been:

Library Vers
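The divisible-by-sixteen patch boils down to snapping the intermediate image dimensions to a multiple of 16 before handing them to the older encoder. Whether the thesis rounds up or down is not stated, so this sketch assumes rounding down; `floor16` is an illustrative name:

```c
/* Round a frame dimension down to the nearest multiple of 16, as
 * required by encoders built on the older Ffmpeg internal libraries
 * (macroblocks are 16x16 pixels). */
int floor16(int dim)
{
    return dim - dim % 16;
}
```

The final video can then still be produced at the requested resolution, since only the intermediate image files are subject to the restriction.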
for the aceMedia project: the workpackage WP4 (the D4.2 and D4.7 deliverables) provides an attention object detector that is scoped to person and face detection/recognition [5]. Other applications, like surveillance systems, could use the ROIs2Video algorithm by adding, for example, a detector for any specific object (cars, trains, abandoned objects, ...), generating a video using the analysis data, and sending the video over to a mobile device carried by a security guard.

Generally, for most common applications, the semantic analysis is based on faces and text, because most visual attention models (see [1], [3], [9], [10]) state that these are the objects an average viewer concentrates on in entertainment applications.

The following sections will therefore offer a brief introduction to the detection algorithms used for faces.

2.3.1 The Viola-Jones face detection method

The Viola-Jones method for face detection, available in OpenCV (see Appendix D) and proposed by Paul Viola and Michael Jones [12], is based on the training of a classifier with positive and negative examples.

2.3.1.1 Features

This classifier uses simple rectangular features evolved from Haar wavelets (pairs of dark and light rectangles), thus called Haar-like features. Three different kinds of features are used:

- Two-rectangle features: the value of a two-rectangle feature is the difference between the sums of the pixels in each rectangular region. The rectangles c
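A two-rectangle feature like the one described above can be evaluated in constant time with a summed-area (integral) image, the key trick behind the Viola-Jones detector. The following is a minimal sketch, not OpenCV's implementation; the struct and function names are illustrative:

```cpp
#include <cassert>
#include <vector>

// Summed-area (integral) image: ii(x,y) holds the sum of all pixels
// above and to the left of (x,y). Any rectangular pixel sum then needs
// only four table lookups, regardless of the rectangle size.
struct Integral {
    int w, h;
    std::vector<long> ii; // (w+1) x (h+1), padded with a zero row/column
    Integral(const std::vector<int>& img, int w_, int h_)
        : w(w_), h(h_), ii((w_ + 1) * (h_ + 1), 0) {
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x)
                ii[(y + 1) * (w + 1) + (x + 1)] = img[y * w + x]
                    + ii[y * (w + 1) + (x + 1)]
                    + ii[(y + 1) * (w + 1) + x]
                    - ii[y * (w + 1) + x];
    }
    // Sum of pixels in the rectangle [x, x+rw) x [y, y+rh)
    long rectSum(int x, int y, int rw, int rh) const {
        return ii[(y + rh) * (w + 1) + (x + rw)] - ii[y * (w + 1) + (x + rw)]
             - ii[(y + rh) * (w + 1) + x]        + ii[y * (w + 1) + x];
    }
};

// Horizontal two-rectangle feature: left half minus adjacent right half.
long twoRectFeature(const Integral& ii, int x, int y, int rw, int rh) {
    return ii.rectSum(x, y, rw, rh) - ii.rectSum(x + rw, y, rw, rh);
}
```

A dark-next-to-light region yields a large negative or positive value, which is what the trained classifier thresholds.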
found to be adequate with a certain resolution (2816x2112 pixels, corresponding to a picture taken with a standard 6-megapixel camera in full resolution) has been selected. For pictures with different resolutions, the pixel jump is modified proportionally. If the viewer prefers faster or slower camera movement, a speed factor which multiplies the predefined pixel jump has been introduced and can be modified to obtain different speed outputs. By selecting a faster or slower camera movement, the user also influences the computing time of the program: faster camera movements need fewer intermediate image files written to hard disk and fewer image resizings, and therefore result in a shorter processing duration.

The default jump is 8 pixels for images with dimensions 2816x2112. For a picture of 1408x1056 pixels, the jump would be rounded to 4 pixels. This pixel distance jumped from frame to frame on the original image is called the base speed and has floating-point precision.

4.7 Path interpolation for the simulated camera movement

To deliver the simulated camera movement, it is necessary to specify the exact path, which is defined by the interpolation of the data or control points given by the ROIs. The data points will be the upper left corners of the sampling windows centred at the ROIs, and thus the interpolation of new points will also be for the upper left corners of the sampling window (the l
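The base-speed scaling described above can be sketched as follows. The text only states that the jump is modified "proportionally"; scaling by the image diagonal is an assumption of this sketch (for an image with the reference aspect ratio it coincides with scaling by width or height), and the function name is illustrative:

```cpp
#include <cassert>
#include <cmath>

// Base speed: 8 px/frame is defined for a 2816x2112 reference image,
// scaled proportionally to the image size (here: diagonal ratio, an
// assumption) and multiplied by the user's speed factor. The result
// keeps floating-point precision, as the text requires.
double baseSpeed(int imgW, int imgH, double speedFactor) {
    const double refDiag = std::sqrt(2816.0 * 2816.0 + 2112.0 * 2112.0);
    double diag = std::sqrt((double)imgW * imgW + (double)imgH * imgH);
    return 8.0 * (diag / refDiag) * speedFactor;
}
```

For the half-size 1408x1056 example in the text this yields 4.0 pixels per frame, matching the rounded value given there.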
g Symposium, 2006.
[2] J. Baltazar, P. Pinho, F. Pereira: "Integrating low-level and semantic visual cues for improved image-to-video experiences", International Conference on Image Analysis and Recognition (ICIAR) 2006, Póvoa de Varzim, Portugal, September 2006.
[3] Xian-Sheng Hua, Lie Lu, Hong-Jiang Zhang: "Photo2Video – A system for automatically converting photographic series into video", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 16, No. 7, July 2006.
[4] F. Pereira, I. Burnett: "Universal multimedia experiences for tomorrow", IEEE Signal Processing Magazine, Special Issue on Universal Multimedia Access, vol. 20, no. 2, pp. 63-73, March 2003.
[5] aceMedia project, D4.2: "Person detection & identification algorithms", 2007.
[6] aceMedia project, D4.7: "Updated multimedia content analysis modules", 2007.
[7] C. Garcia, M. Delakis: "Convolutional face finder: a neural architecture for fast and robust face detection", IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(11):1408, Nov. 2004.
[8] J. Wolfe: "Visual attention", in K. K. De Valois (editor), Seeing, 2nd ed., San Diego, CA: Academic Press, 2000, pp. 355-386.
[9] Liu, Xie, Ma, Zhang: "Automatic browsing of large pictures on mobile devices", Proceedings of the eleventh ACM International Conference on Multimedia, Berkeley, CA, USA.
[10] Xie, Liu, Ma, Zhang: "Browsing large images under limited displ
ge: this solution results in a distorted image which, depending on the amount of change, can be unpleasant for the viewer. This is the default solution when using OpenCV, because the interpolation methods in the mentioned library automatically change the aspect ratio of the image if it is resized to a different aspect ratio.

- Image flipping: in addition to cropping or adding black bars, the image could previously be rotated in order to reduce the effect of cropping or black bars. This step supposes that the display can also be rotated, which is not true for every device (for example, if the video is to be viewed on a personal computer). The action of flipping the image was actually implemented, but finally eliminated because it seemed unpleasant when viewing the videos on fixed screens. Also, if a video is generated by the camera fly-through of various input images, it would be annoying to have to turn the display every time the image is flipped.

4.3 Finding the optimal browsing path

4.3.1 Jumping to the nearest unvisited ROI

The first attempt to establish the order of presentation of the ROIs was to start with the ROI situated furthest to the left and then continue jumping to the closest not yet displayed ROI, and so on. For a very reduced number of ROIs (<5), or for specific cases where the ROIs were placed following certain patterns (for example, all ROIs placed in a row), this method produced coherent browsing paths. However, when th
ge will be shown at the beginning and end of the video.

2.1.1 Information fidelity

The term information fidelity introduces a subjective comparison between the information contained in an original image and the information maintained after its adaptation (transmoding in our case). Chen et al. propose a numerical formula for information fidelity in [11], defining its range from 0 (all information lost) to 1 (all information maintained). Thus, the optimal solution of image adaptation will try to maximize this number under various client context constraints. According to these authors, the resulting information fidelity of an image I, consisting of several attention objects, can be calculated as the weighted sum of the information fidelity of all attention objects in I:

IF_I = Σ_{AO_i ⊂ I} AV_i · IF_{AO_i}

AV_i: attention value of the i-th attention object
IF_{AO_i}: information fidelity of the i-th attention object
AO_i: i-th attention object in the image
ROI_i: Region Of Interest, determines the spatial region covered by the i-th AO

Thus, the Image2Video application has to show all the image's attention objects to reach an information fidelity close to the maximum.

2.1.2 Composite image attention model

Another definition common to most existing papers on Image2Video transmoding is the concept of attention object ([1], [11], etc.). An attention object is an information carrier that often represents semantic classes such as faces, texts, objects or saliencies in the image. Generally,
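The weighted sum above can be computed directly. A minimal sketch, assuming the attention values are normalized so that they sum to 1 (the struct and function names are illustrative):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct AttentionObject {
    double av;       // attention value AV_i (importance weight)
    double fidelity; // information fidelity IF_{AO_i} after adaptation, in [0,1]
};

// Information fidelity of the adapted image: the weighted sum of the
// per-object fidelities. AVs are normalized here so the result stays
// in [0,1] even if the raw weights do not sum to 1.
double informationFidelity(const std::vector<AttentionObject>& aos) {
    double avSum = 0.0, ifSum = 0.0;
    for (const AttentionObject& ao : aos) avSum += ao.av;
    if (avSum == 0.0) return 0.0; // no attention objects: nothing preserved
    for (const AttentionObject& ao : aos) ifSum += (ao.av / avSum) * ao.fidelity;
    return ifSum;
}
```

Dropping a high-AV object pulls the total sharply towards 0, which is why the application tries to show every attention object.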
ges a good browsing path brings are:

- Shorter and therefore smaller-sized videos.
- Reduction of the processing time to generate the video.
- More pleasant video-watching sensations, as a short browsing path implicitly reduces the number of camera zigzag movements.

The pseudocode of the simulated annealing algorithm looks as follows:

Initialize i0, c0, L0
k = 0
i = i0
repeat
    for l = 1 to Lk
        generate j ∈ S
        if f(j) < f(i)
            i = j
        else if exp((f(i) - f(j)) / ck) > rand(0,1)
            i = j
    k = k + 1
    ck = ck * a
until final condition

i: current browsing path
j: possible next browsing path
ck: temperature variable
c0: initial temperature
Lk: iterations in each step
f(i): cost function of path i
a: cooling parameter, 0.8 < a < 0.99999

4.4 Keyframe extraction

The specified ROIs (or the image divisions in the video without semantic analysis) act as video keyframes. Additionally, the full image will be added as a keyframe at the beginning and at the end of the video in order to give an initial and final overview of the complete image and let the viewer locate every object. The detailed steps are as follows:

- The video will start off with the full image (first keyframe) and will zoom in towards the first ROI. This pattern is called a forward sentence [3].
- Following the obtained browsing path, the video will show consecutively the different defined ROIs, connecting them through Catmull-Rom interpolation curves.
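The pseudocode above can be turned into a runnable sketch for ROI ordering. The cost function (total straight-line path length between ROI centres) and the neighbourhood move (swapping two ROIs) are assumptions made for illustration; the text does not fix either choice:

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <utility>
#include <vector>

struct Pt { double x, y; }; // ROI centre (illustrative)

// f(i): cost of a browsing path = summed straight-line distances.
double cost(const std::vector<Pt>& rois, const std::vector<int>& order) {
    double d = 0.0;
    for (size_t k = 1; k < order.size(); ++k)
        d += std::hypot(rois[order[k]].x - rois[order[k - 1]].x,
                        rois[order[k]].y - rois[order[k - 1]].y);
    return d;
}

std::vector<int> anneal(const std::vector<Pt>& rois,
                        double c = 100.0,  // c0: initial temperature
                        double a = 0.95,   // cooling parameter, 0.8 < a < 0.99999
                        int Lk = 50, int steps = 200) {
    std::vector<int> i(rois.size());
    for (size_t k = 0; k < i.size(); ++k) i[k] = (int)k;  // i0: identity order
    for (int s = 0; s < steps; ++s, c *= a)               // ck+1 = ck * a
        for (int l = 0; l < Lk; ++l) {
            std::vector<int> j = i;                       // generate j: swap two ROIs
            std::swap(j[rand() % j.size()], j[rand() % j.size()]);
            double fi = cost(rois, i), fj = cost(rois, j);
            // Accept better paths always, worse paths with Boltzmann probability.
            if (fj < fi || std::exp((fi - fj) / c) > (double)rand() / RAND_MAX)
                i = j;
        }
    return i;
}
```

As the temperature ck decays, the acceptance of worse paths vanishes and the loop settles into a low-cost ordering.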
h increasing precision.

With the help of the precomputed table, it is possible to find a point at a given arc length (distance), and therefore it is feasible to find the interpolation points compliant with a particular speed function.

Index | t    | Arc length
0     | 0.00 | 0.00
1     | 0.05 | 0.08
2     | 0.10 | 0.15
3     | 0.15 | 0.39
4     | 0.20 | 0.66
...   | ...  | ...
20    | 1.00 | 3.58

Table 1: Arc length table for reparameterization

Using the table, it is necessary to increase the distance L in a loop according to the speed function and use it to compute the corresponding value of the parameter t with the following formulas:

aux = (L - ArcLength[index]) / (ArcLength[index+1] - ArcLength[index])

t = t[index] + aux · (t[index+1] - t[index])

(Footnote: http://jungle.cpsc.ucalgary.ca/587/pdf/5_interpolation.pdf)

// Reusing the m1 and m2 variables obtained in Catmull-Rom
for each data_point i
    calculate the number of points to interpolate, n
    for (j = 0; j < n; j++)
        L += step          // step can be constant or variable according to
                           // the desired speed function
        // find L's index in the precalculated table
        // calculate t according to the recently stated formulas
        h00 = 2*t^3 - 3*t^2 + 1      // exactly as in normal Catmull-Rom
        h01 = -2*t^3 + 3*t^2
        h10 = t^3 - 2*t^2 + t
        h11 = t^3 - t^2
        interpolated_data[j].y = h00*data[i].y + h10*m1[i].y + h01*data[i+1].y + h11*m1[i+1].y
        interpolated_data[j].x = h00*data[i].x + h10*m1[i].x + h01*data[i+1].x + h11*m1[i+1].x

4.7.4 Speed control
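The table lookup and the two formulas above can be sketched as follows (the function name is illustrative; the table rows come from Table 1):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Given the precomputed (t, arc length) table, return the curve parameter
// t corresponding to a travelled distance L, interpolating linearly inside
// the bracketing table entry -- exactly the two formulas in the text.
double tForArcLength(const std::vector<double>& t,
                     const std::vector<double>& arcLen, double L) {
    size_t index = 0;
    while (index + 1 < arcLen.size() - 1 && arcLen[index + 1] < L)
        ++index; // find L's index in the precalculated table
    // aux = (L - ArcLength[index]) / (ArcLength[index+1] - ArcLength[index])
    double aux = (L - arcLen[index]) / (arcLen[index + 1] - arcLen[index]);
    // t = t[index] + aux * (t[index+1] - t[index])
    return t[index] + aux * (t[index + 1] - t[index]);
}
```

Looping L by a constant (or eased) step and feeding each value through this lookup yields the evenly-spaced (or eased) interpolation points.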
he limits are not practical.

General parameters:

- c: Catmull-Rom curvature parameter. Its bounds are 0 (linear interpolation) to 1 (very curved interpolation). Explained in chapter 4.7.2.
- y: speed multiplying factor. Any positive floating-point value. Explained in chapter 4.6.
- z: maximal zoom applied on ROIs. Any positive floating-point value. The ROIs will be zoomed in with a factor limited by the minimum permitted by the video resolution and the maximal zoom. Explained in chapter 4.4.
- s: flag to specify whether simulated annealing is applied (1) or not (0). Only the Boolean values 0=FALSE and 1=TRUE are allowed.
- a: flag to specify whether the ROIs are read out from a file (0) or generated automatically (1), dividing the image into four ROIs and traveling clockwise through them. Explained in chapter 4.1. Only the Boolean values 0=FALSE and 1=TRUE are allowed.

It is not mandatory to specify all the values, because the parameters that are not manually specified will be set to default (average or most generally used) values.

In second place, and necessarily after the tag-value pairs, it is obligatory to specify the image files and, when needed, the files with the ROI information. In case the ROIs are to be generated automatically (a=1), a sequence of the image paths has to follow the tag-value pairs. The generated video will contain the camera fly-through for all the images sequentially. Contrary
hen adding the bars, the positions of the ROIs have to be adequately displaced by the same number of pixels as the black bars' size. The size in pixels of each black bar is calculated:

- in the case of upper and lower bars as:

h_bar = (h_video - (w_video × h_image) / w_image) / 2

- in the case of left and right bars as:

w_bar = (w_video - (h_video × w_image) / h_image) / 2

Adding the bars can be more or less noticeable, depending on the aspect ratios of the video and the image. The worst case is when a horizontal/vertical image is to be transmoded to a vertical/horizontal video (see Figure 4.2). Still, the decision of adding bars produces fewer side effects than the other adaptation solutions and, what is more important, avoids information loss.

Figure 4.2: Examples of the aspect ratio adaptation in bad cases (a & b) and in better cases (c & d). (a) Image dimensions: 2304x1728, video dimensions: 240x300. (b) Image dimensions: 1232x1632, video dimensions: 300x240. (c) Image dimensions: 2304x1728, video dimensions: 300x240. (d) Image dimensions: 1232x1632, video dimensions: 300x240.

Other solutions to the problem of adapting the image to the output display resolution, and their inconveniences, are:

- Image cropping: this solution involves the elimination of possibly important areas of the image, which is a weak point of this kind of adaptation, especially in the worst case mentioned before.

- Aspect ratio chan
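The two bar-size formulas can be computed directly. A minimal sketch; rounding via integer division is an assumption of this sketch (the thesis does not state how fractional pixels are handled), and the function names are illustrative:

```cpp
#include <cassert>

// Upper & lower bars: the image is scaled to fill the video width exactly,
// and the leftover video height is split into two equal black bars.
// h_bar = (h_video - (w_video * h_image) / w_image) / 2
int barHeight(int wImg, int hImg, int wVid, int hVid) {
    return (hVid - (wVid * hImg) / wImg) / 2;
}

// Left & right bars: the image fills the video height exactly instead.
// w_bar = (w_video - (h_video * w_image) / h_image) / 2
int barWidth(int wImg, int hImg, int wVid, int hVid) {
    return (wVid - (hVid * wImg) / hImg) / 2;
}
```

For the 2304x1728 image adapted to 300x240 (Figure 4.2c) this gives thin 7-pixel bars, while a portrait image forced into a landscape video (the worst case) leaves much wider bars.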
his stage, the path used to display the video will be generated in two steps:

(i) Display size adaptation: the attention objects ideally should be displayed in the video with the same spatial resolution as in the image (i.e., one pixel of the image displayed as one pixel of the video). Therefore, big attention objects (except faces) have to be split into smaller parts that fit the display size. Small attention objects can occasionally be grouped with others.

(ii) Browsing path generation: this mechanism determines the order in which the attention objects will be displayed. Attention objects are displayed following the order of their AVs, taking the travelled distance somehow into account in order to avoid travelling back and forth. However, this algorithm is not explained in detail and lacks clarity.

4. Video Generation

In this stage the video sequence is generated according to three modes:

(i) Normal mode: all the attention objects are shown.

(ii) Time-based mode: the video cuts all the attention objects that appear after a certain time limit.

(iii) Amount-of-information-based mode: the video sequence shows only the most important attention objects until a certain information percentage limit is reached.

2.4.2 Rapid Serial Visual Presentation – Microsoft Research Asia

Microsoft Research Asia has published a variety of articles, principally under the authorship of Xie, Ma and Zhang [9], [10], which use the exact term Rapid
htly more sophisticated way of zooming images is bilinear interpolation, applied in the ROIs2Video tool, which uses the average of the four nearest neighbours of a point.

Other interpolation methods, such as bicubic interpolation, use more neighbouring points to obtain the interpolated value. This generally provides better and smoother results, but is also computationally more demanding. In the ROIs2Video application it does not seem useful to apply a complex method, and it is preferable to use bilinear interpolation to reduce processing time.

The CvReference library (see Appendix D) in OpenCV includes the needed methods, so the digital image interpolation does not have to be implemented from scratch.

4 Development

4.1 ROI initialization

4.1.1 ROI initialization from file

The first attempt at defining the ROI specification files was a simple text file, which contained each ROI specified on a separate line in the form:

X_ROI1 Y_ROI1 Width_ROI1 Height_ROI1 Relevance_ROI1
X_ROI2 Y_ROI2 Width_ROI2 Height_ROI2 Relevance_ROI2

The final solution takes advantage of XML's robustness. The mandatory structure of the XML files was described in section 3.4. These files are read using OpenCV's file storage functions, which provide a complete set of access functions to XML files. The following is an example of an actual XML file with two ROIs:

<?xml version="1.0"?>
<opencv_storage>
<
ity deriving from the application or influence thereof.

9. If the client company decides to industrially develop one or several products to which the study of this project is partially or totally applicable, it must communicate this to the consulting company.

10. The consulting company is not responsible for the side effects that may occur when the tool that is the object of the present project is used for the realization of other applications.

11. The consulting company will have priority over others in the elaboration of the auxiliary projects that it may be necessary to develop for said industrial application, provided that it does not explicitly waive this right. In that case, it must expressly authorize the projects presented by others.

12. The Engineer-Director of the present project will be responsible for the direction of the industrial application whenever the consulting company considers it appropriate. Otherwise, the person designated must have his authorization, and he will delegate to that person the responsibilities that he holds.
ile has to be a JPEG image file with resolutions between 500x500 and 3000x3000 pixels and that an MPEG-1/2 video will result as an output.

<cat xsi:schemaLocation="acemedia-cme-cat-file CATCapabilities.xsd">
  <name>Image2VideoCAT</name>
  <description>Generates a video passing through an image's ROIs</description>
  <!-- JPEG visual elementary stream adaptation capabilities description -->
  <ElementaryStreamFormat type="Image" id="JPEG">
    <CommonParameters>
      <VisualCoding>
        <Frame>
          <height>
            <range>
              <from>500</from>
              <to>3000</to>
            </range>
          </height>
          <width>
            <range>
              <from>500</from>
              <to>3000</to>
            </range>
          </width>
        </Frame>
      </VisualCoding>
    </CommonParameters>
  </ElementaryStreamFormat>
  <ElementaryStreamFormat type="Video" id="MPEG-1">
    <CommonParameters>
      <VideoCoding>
        <Frame>
          <rate>
            <value>25</value>
            <value>30</value>
          </rate>
        </Frame>
      </VideoCoding>
    </CommonParameters>
  </ElementaryStreamFormat>
  <ElementaryStreamFormat type="Video" id="MPEG-2">
    <CommonParameters>
      <VideoCoding>
        <Frame>
          <rate>
            <value>24000/1001</value>
            <value>24</value>
            <value>25</valu
imulated annealing again if the quality of the ordering is not high enough. Another option would be to repeat the simulated annealing ordering several times and choose the solution with the lowest cost.

The biggest drawback of applying a distance-based cost function is the fact that the cost function is not influenced by the zoom factor of each ROI, and it is generally not pleasant if the camera is continuously zooming in and out.

Future work could improve the presentation of large ROIs that do not fit in the sampling window. One possibility would be to divide such ROIs and scan them with a 1:1 (or similar) spatial resolution. Some scanning paths are more obvious than others, as can be seen in Figure 6.2, where the path on the right is one of several possible ones and should be compared with other options.

Figure 6.2: Scanning paths of subdivided ROIs. The black rectangle with the thick outline represents the sampling window.

6. Using additional annotation and allowing the ROIs to be rotated rectangles, the simulated camera movement could be improved by adding the possibility of rotation in the xy plane. As mentioned before in chapter 4.8, this would be useful, for example, to read rotated text more easily.

References

[1] J. Baltazar, P. Pinho, F. Pereira: "Visual attention driven image to video transmoding", Proceedings of the Picture Codin
ion

Ffmpeg:
- libavutil version 49.0.0, libavcodec version 51.9.0, libavformat version 50.4.0; built on Jun 20 2006 02:00:39, gcc 4.0.3
- libavutil version 49.0.0, libavcodec version 51.11.0, libavformat version 50.5.0; built on Sep 20 2006 00:26:15, gcc 4.1.2 20060906 (prerelease) (Ubuntu 4.1.1-13ubuntu2)
- libavcodec version 47.23, libavformat version 46.16; built on Mar 31 2005 11:37:24, gcc 3.3.4 (Debian 1:3.3.4-13). This older version corresponds to the FfmpegJNI CAT.

OpenCV: RC1, released August 11, 2006

Table 4: Library versions

5.3.3 Test examples

The tests have been done with a set of different images, taken from personal images, from the aceMedia database and from the Internet, considering:

- different resolutions
- number of ROIs
- disposition of ROIs
- relative size of ROIs

In the tests, the videos have been generated using the different execution parameters (camera speed, curvature of the Catmull-Rom curves, automatic ROI generation, ...). Some execution results of the program are compared in Table 5.

File        | Image res. | Video res. | ROIs | Speed factor | Frames | Execution time
tenis.jpg   | 600x435    | 320x240    | 7    | 4            | 315    | 5285
tenis.jpg   | 600x435    | 150x100    | 7    | 1            | 1259   | 8 39
2_torre.JPG | 2112x2816  | 320x240    | 3    | 4            | 337    | 54 885
2Mome.JPG   | 2112x2816  | 240x320    | 3    | 4            | 387    | 37155
7.JPG       | 2816x2112  | 320x240    | 7    | 4            | 400    | 3366
iption file (Image2VideoCAT.xml) .... 63
5.2.3 Adaptation of the native C code .... 66
5.2.4 Modification of the Ffmpeg CAT .... 66
5.3 Testing .... 67
5.3.1 Testing environment specifications .... 67
5.3.2 Library versions .... 67
5.3.3 Test examples .... 68
6 Conclusions and future work .... 71
6.1 Conclusions .... 71
6.2 Future work .... 71
6 Conclusiones y trabajo futuro .... 74
6.1 Conclusiones .... 74
6.2 Trabajo futuro .... 77
References .... 79
Appendices .... 81
A Running the application .... 81
B Manual ROI annotation tool .... 83
C CAIN ....
C.1 CAIN system overview ....
C.2 Architecture ....

FIGURE INDEX

FIGURE 1.1: BASIC IMAGE2VIDEO DIAGRAM ....
FIGURE 1.1: DIAGRAMA BÁSICO DE LA ADAPTACIÓN IMAGE2VIDEO .... 5
FIGURE 2.1: VISUAL ABILITY TEST (IMAGE TAKEN FROM [8]) ....
FIGURE 2.2: COMPARISON BETWEEN THE DIFFERENT EXISTING APPROACHES ....
izes, the viewer would note an explosive growth of the sampling window, which is not desirable at all.

4.7.6 Overview of the interpolated Catmull-Rom curve

Summarizing, the interpolated curve is computed in two steps:

1. Standard Catmull-Rom interpolation, using the pseudocode in 4.7.2 and obtaining a curve with points with different separations.

2. Arc length reparameterization of the interpolated curve, considering:
   a. Constant speed or ease-in & ease-out speed control, as described above.
   b. The maximal separation between data points, in order to control the zooming speed, as:

d_ul = d · min(1, D_ul / D_lr)

D_ul: distance travelled by the upper left corner
D_lr: distance travelled by the lower right corner
d: desired pixel distance jumped from frame to frame
d_ul: pixel distance jumped from frame to frame by the upper left corner, so that the faster of both corners travels at d pixels/frame

4.8 Camera simulation

The virtual camera used to generate the videos from static images has been provided with panning and zooming movements:

- Panning is defined as the two-dimensional movement of the camera in the x and y axes, without allowing movement in the z axis (Figure 4.13 a).

- Zooming is defined as the action of approaching to or moving away from the image by moving in the z axis, without allowing movement in the x and y axes (Figure 4.13 b, c).

The system is able to pan and zoom simultaneously (Figure 4.13 d). The only
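The corner-speed formula in step 2b can be sketched directly (the function name is illustrative):

```cpp
#include <cassert>
#include <cmath>

// Per-frame jump of the upper-left corner: d_ul = d * min(1, D_ul / D_lr).
// The full jump d is assigned to whichever corner travels the longer
// distance; the upper-left corner's jump is reduced proportionally when
// the lower-right corner is the faster one, so the fastest corner always
// moves at d pixels per frame.
double upperLeftJump(double d, double Dul, double Dlr) {
    return d * std::fmin(1.0, Dul / Dlr);
}
```

When the window is growing strongly between two keyframes (D_lr much larger than D_ul), the upper-left corner slows down instead of letting the window explode in size.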
lly by any external ROI generation module. Such an automatic annotation tool has been provided, for example, by the aceMedia WorkPackage 4 ([5], [6]) in the form of a convolutional face detector [7].

Although it may look as if the application is designed for leisure purposes, it can also be used in many different security applications, where a mobile patrol (ambulance, police, private security, firemen) is provided with a mobile device and receives a video generated from a large image taken at the crime or accident scene. Thus, the mobile patrol is able to prepare for the situation and, given the case, call for reinforcements or a special support unit. This possible system would only need a specific module to identify the desired objects and ROIs and pass this information to the Image2Video application.

(Footnote: A Region Of Interest is used in imaging for defining the boundaries of an object. In medical imaging the concept is widespread and used, for example, for measuring the size of a tumor. In non-medical imaging, the best known standard is JPEG2000, which specifically provides mechanisms to label the ROIs in an image.)

1.3 Structure of the present document

This document is structured in the following chapters:

- Chapter 1 provides the introduction (motivation and objectives) of the master thesis.

- Chapter 2 presents other approaches to the adaptation of large images to reduced displays, as well as the existing Image2Video transmoding systems. In a
lting company, represented by the Engineer-Director of the Project.

2. The consulting company reserves the right to use, totally or partially, the results of the research carried out to develop this project, either for its publication or for its use in later works or projects, for the same client company or for another.

3. Any type of reproduction, apart from those indicated in the general conditions, whether for the private use of the client company or for any other application, will require the express written authorization of the Engineer-Director of the Project, who will act in representation of the consulting company.

4. The authorization must state the application for which the reproductions are intended, as well as their quantity.

5. All reproductions will indicate their origin, explicitly stating the name of the project, the name of the Engineer-Director and that of the consulting company.

6. If the project passes the development stage, any modification made to it must be notified to the Engineer-Director of the Project, and at his discretion the consulting company will decide whether or not to accept the proposed modification.

7. If the modification is accepted, the consulting company will be responsible for it at the same level as for the initial project to which it is added.

8. If, on the contrary, the modification is not accepted, the consulting company will decline all liabil
mandatory and desirable.

In the last step, the execution module executes the selected CAT, calling the adaptation method of the chosen CAT.

C.3 CAIN extensibility

CAIN provides a flexible extensibility mechanism in order to integrate new, or update existing, CATs without having to recode or recompile the core of CAIN. To inform the decision module about the new CAT capabilities, it is required to enclose a file with the CAT's adaptation capabilities.

Besides, all the CATs are forced to implement a common adaptation method that provides a generic interface and performs the adaptations. This method will be called by the execution module and will return a list with the paths and formats of the adapted contents.

More about the integration of a CAT, and the particular integration of the Image2VideoCAT into the CAIN architecture, can be consulted in chapter 5.2.

D OpenCV

OpenCV (Open Source Computer Vision) is a library of programming functions in C/C++ mainly aimed at real-time computer vision. Some example applications of the OpenCV library are object identification, segmentation and recognition, face recognition, motion tracking, etc. OpenCV is operating-system and hardware independent and is optimized for real-time applications. It consists of four principal function libraries (CxCore, CvReference, CvAux and HighGui), which will be detailed in the following points. The full documentation can be found at the URL:
n be cut off the video almost randomly.

Microsoft's Photo2Video, on the contrary, is described in a more complete article. The approach is an entertainment application to generate video albums with incidental music to be viewed on a personal computer, and therefore needs strong content analysis, semantic classification and story generation in order to generate meaningful video albums. This information processing is useful for leisure-time applications, but unnecessary for other particular uses, such as security and surveillance systems. A difference from the other approaches is that the Photo2Video application is not designed to generate small-sized videos for mobile devices, and it does not talk explicitly about the possibility of adapting the video to different screen sizes. The motion generation is discussed in detail and has served as a guide for some of the decisions taken in the work of this master thesis.

2.4.4.2 Contributions of this master thesis

Our approach will rely on an external information source that establishes the ROIs that have to be shown and assigns an importance or relevance factor (both terms will be used interchangeably) to each ROI, so that it is displayed proportionally to its relevance. All the applications presented above include fixed ROI extraction modules (i.e., face and saliency detectors) and differentiate the presentation according to the ROI type. Our work aims to be a more general approach for the ROIs2Video system a
n of the work, a provisional acceptance will be carried out, after inspection and examination by the technical management, the depositary of effects, the comptroller and the head of service or a representative, with the contractor stamping his agreement.

22. Once the provisional acceptance has been made, the rest of the work will be certified to the contractor, the administration withholding the amount of the costs of conservation of the work until its definitive acceptance, together with the guarantee deposit during the time indicated as the warranty period. The definitive acceptance will be made under the same conditions as the provisional one, drawing up the corresponding certificate. The Technical Director will propose to the Economic Board the return of the deposit to the contractor in accordance with the established legal economic conditions.

23. The tariffs for the determination of fees, regulated by order of the Presidency of the Government of 19 October 1961, will be applied to what is currently called the "Contract Execution Budget", formerly called the "Material Execution Budget", a term which today designates a different concept.

Particular conditions

The consulting company that has developed the present project will deliver it to the client company under the general conditions already formulated, to which the following particular conditions must be added:

1. The intellectual property of the processes described and analyzed in the present work belongs entirely to the consu
nd to concentrate on a quality, user-customizable video generation that can be produced independently of the prior semantic analysis. The planned contributions in the research field of Image2Video adaptation are:

• Video quality and motion smoothness.

• General and open implementation, independent of the prior ROI detection and semantic analysis.

• User-customizable video. The user can set his preferences in:
  • Camera motion speed.
  • Curvature of the camera motion.
  • Maximal zoom-in.
  • Video bitrate and used codec. These options offer the possibility of generating lighter or heavier videos, leaving it to the user to find a compromise between the video coding quality and its size. For example, if the video will be sent through a low-bandwidth network, the user is able to generate a video with a low bitrate.

• Possibility of using automatic or manual methods:
  • Automatic or manual ordering of the browsing path.
  • Using the manual annotation GUI, together with the manual ordering and the other available options, is a powerful and fast tool to create completely personalized videos.

• Video generation at any frame resolution, as long as the resolution is lower than the image resolution.

• New research in alternative algorithms to the ones used in the articles.

3 Design

3.1 Definitions

Before describing the architecture, it is important to establish some definitions to avoid misunderstandings and unify some concepts.
on only CvSeq (and, indirectly, the CvMemStorage) were used.

• Functions and operations. The following table contains a short classification of the functions in CxCore:

Table 8: Functions and operations in CxCore
• Operations on arrays: copying, filling and accessing sub-arrays of the image and matrix data structures; arithmetic, logic and comparison; linear algebra; math functions.
• Drawing functions: functions to draw on an image, especially used for debugging and for marking ROIs.
• File storage functions: writing and reading of XML- and YAML-formatted files.

D.2 CvReference

For most applications, CvReference is the main library of OpenCV functions. However, for the development of the ROIs2Video tool, only the pattern recognition played a major role. Therefore the pattern recognition in OpenCV, concretely the Viola-Jones face detection method, was described in detail in 2.3.1, while the other function families are only summarized in this table:

Table 9: Function classification in CvReference
• Image processing: gradients, edges and corners; sampling, interpolation and geometrical transforms; morphological operations.
• Motion analysis and object tracking: object tracking; optical flow.
• Camera calibration and 3D reconstruction: camera calibration; epipolar geometry.
• Object detection: see 2.3.1 for more information about the particular case of face detection.

D.3 CvAux
ons who have collaborated directly or indirectly in the project, and to whom I am very grateful.

First of all I would like to thank José María Martínez for the opportunity he has given me to carry out my final degree project with him, and for his continuous work, together with Jesús Bescós, to improve the Telecommunication Engineering degree at UAM.

During the development of the project, the help of Víctor Valdés has been very important; he has continuously lent me a hand throughout the project and has collaborated in the work.

I would also like to thank the occupants of the laboratory and the members of the GTI, especially Víctor Fernández, Javier Molina and Juan Carlos San Miguel, for their help and for all the laughs I have had with them.

And especially my father, my mother and my sister, who are the most important thing to me.

Fernando Barreiro, September 2007

Key words
Abstract
Resumen
1. Introduction
1.1 Motivation
1.2 Objectives
1.3 Structure of the present document
1. Introducción
1.1 Motivación
1.2 Objetivos
1.3 Organización de la memori
ower right corner of the sampling window can move "freely" to allow increasing and decreasing for simulating zooming, as discussed later in 4.7.2.

Interpolation, unlike approximation, is a specific case of curve fitting in which the function must go exactly through the data points. Through interpolation we are able to obtain new points between the control data.

Figure 4.6: Interpolation and approximation

4.7.1 Linear interpolation

The most simple and direct interpolation is linear interpolation, which was applied in the first implementation of the Image2Video CAT, being later substituted by the Catmull-Rom interpolation method [18]. Linear interpolation is given by the following equation for two data points (xa, ya) and (xb, yb):

    y = ya + (yb − ya) · (x − xa) / (xb − xa)

This is the first interpolation one would possibly think of, as it is easy to implement, but it is generally not a good solution for the camera simulation movement. Linear interpolation is not differentiable (it has no tangential continuity) at the control points and therefore the movement is abrupt and unpleasant for the viewer.

Figure 4.7: Positional, but not tangential, continuity at the central data point when using linear interpolation

5 http://en.wikipedia.org/wiki/Linear_interpolation

4.7.2 Catmull-Rom interpolation
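The linear interpolation formula above can be sketched as a small C helper (the function name is illustrative, not taken from the thesis code):

```c
#include <assert.h>

/* Linear interpolation between the data points (xa, ya) and (xb, yb):
   returns the y value of the straight line at position x. */
double linear_interp(double xa, double ya, double xb, double yb, double x)
{
    return ya + (yb - ya) * (x - xa) / (xb - xa);
}
```

Because each segment is only positionally continuous at its endpoints, chaining several such segments produces exactly the abrupt direction changes criticized above.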
ponents, summarized in the following table:

Table 12: Components of Ffmpeg
• Libavcodec: audio/video encoders and decoders. Some of the supported codecs are shown in Table 13.
• Libavformat: multiplexers and demultiplexers for audio/video.
• Libavutils: auxiliary library.
• Libpostproc: video postprocessing routine library.
• Image scaling routine library.

Table 13: Most important multimedia compression formats accepted in Ffmpeg
• Video compression: MPEG-1, MPEG-2, MPEG-4 (ISO/IEC); H.261, H.263, H.264 (ITU-T); WMV 7, RealVideo 1.0 & 2.0.
• Audio compression: MPEG-1 Layers I–III (MP3); AC3, ATRAC3, RealAudio, WMA.
• Image compression: JPEG, GIF, PNG, TIFF.

10 http://ffmpeg.mplayerhq.hu/

BUDGET

1) Material execution
• Purchase of a personal computer (software included)
• Office supplies
2) General expenses
• 16% of the material execution
3) Industrial profit
• 6% of the material execution
4) Project fees
• 800 hours at 15 €/hour
5) Consumable material
• Printing expenses
6) Budget subtotal
• Subtotal
7) Applicable VAT
• 16% of the budget subtotal
r rotating on the xy plane. As mentioned before, this would be useful, for example, for reading rotated texts more easily.

6 Conclusions and future work

6.1 Conclusions

It can be said that the developed application has reached its main objectives of offering a pleasant and smooth camera-motion simulation for most images. The camera motion has been improved significantly with the introduction of the Catmull-Rom interpolation, reparameterized with acceleration at the beginning and braking at the end of each junction between ROIs.

It is important to mention that the application shows better results (more pleasant camera movements) when the input images are of high resolution. On the one hand, the system has been designed and tested mostly with images of medium and high resolutions. On the other hand, it makes little sense to convert low-resolution images to video, since these have little fineness of detail and can be viewed directly on a small screen.

Videos with a resolution close to the resolution of the image show particularly bad results, since the sampling window barely has any margin to move freely. In these cases the sampling window will very likely coincide exactly for more than one ROI and there will be no panning between those ROIs, which can confuse the user (see Figure 6.1).

Figure 6.1: The same v
raction will be limited to establishing some preferences on the video generation.

The application depends on an external module, which is committed to defining the regions of interest and which will vary depending on the domain where this application is used.

The results of the project have been incorporated into the content adaptation framework named CAIN, developed within the Sixth Framework Program European project IST-FP6-001765 aceMedia (http://www.acemedia.org/).

Resumen

The objective of this final degree project is to investigate an image-to-video adaptation system that takes advantage of the human visual attention model to view high-resolution images on mobile devices without significant loss of information. The application will try to automate the process of scrolling and zooming through an image with minimal user interaction, which will be limited to establishing preferences on the video generation.

The adaptation system depends on an external module, in charge of detecting the regions of interest, whose type will vary according to the environment that makes use of the adaptation system.

The results of the project have been incorporated into the content adaptation architecture CAIN, developed within the Sixth Framework Program European project IST-FP6-001765 aceMedia (http://www.acemedia.org/).

Acknowledgements

This work would not have been possible without the valuable help of all the pers
carried out entirely by the bidding company.

3. The offer will state the total price for which the company undertakes to carry out the work, and the percentage discount that this price represents with respect to a limit amount, if one has been fixed.

4. The work will be carried out under the technical direction of a Senior Telecommunication Engineer, assisted by the number of Technical Engineers and Programmers deemed necessary for its development.

5. Apart from the Director Engineer, the contractor will have the right to hire the rest of the personnel, and may cede this prerogative in favour of the Director Engineer, who will not be obliged to accept it.

6. The contractor has the right to make copies, at his own expense, of the plans, the terms and conditions and the budgets. The Engineer who authored the project will authorize with his signature the copies requested by the contractor after checking them.

7. The contractor will be paid for the work actually executed in accordance with the project that served as the basis for the contract, with the modifications authorized by the superior authority, or with the orders communicated in writing to the Director Engineer of the works within his faculties, provided that said work conforms to the precepts of the terms and conditions according to which the modifications and the valuation of the different units will be made, without the total amount exceeding the approved budgets. For
rger than the screen size;

• grouping together nearby attention objects to reduce the computational complexity of the browsing-path generation algorithms.

2. Optimal Path Generation

Similar to the time-based and information-based modes in IST's Image2Video application, Xie and his colleagues define the Skimming and the Perusing modes, which obtain the order of the ROIs using a backtracking algorithm to enumerate the possible paths and find the best among them. In case the user wants to view all the information, the problem of ordering the ROIs can be seen as the Traveling Salesman Problem, and an approximation algorithm can be applied to find a fast but suboptimal solution.

3. Dynamic Path Adjusting

The system also allows the user to stop the browsing process, look at the image independently and resume the automatic presentation afterwards.

2.4.3 Photo2Video — Microsoft Research Asia

The Photo2Video method [3] appears to be Microsoft Research Asia's evolution of the Rapid Serial Visual Presentation, including many new features and options. Of the presented systems, it appears to be by far the leading one, with the most evolved characteristics. It aims to be more than just a simple transmoding tool, and targets the capacity of generating musical stories out of image series. The general system flowchart, designed to achieve such features, is presented in Figure 2.14, and the detailed description of the stages will b
rrelevant for this work have been omitted for space reasons.

3.3.2 ROI structure

The structure to manipulate ROI information presents the following fields:

typedef struct {
    CvRect rectangle;   /* The rectangle representing the spatial location of the ROI */
    CvPoint ul_point;   /* The upper-left point of the sampling window that centres the ROI */
    CvPoint lr_point;   /* The lower-right point of the sampling window that centres the ROI */
    int importance;     /* Displaying time factor */
} Roi;

3.3.3 Trajectory structure

A variable of the type Trayectory stores the interpolated points that link one keyframe to the following one:

typedef struct Trayectory {
    int n;               /* Number of points in the array curve */
    double ul_distance;  /* Distance the upper-left corner will travel in this trajectory */
    double lr_distance;  /* Distance the lower-right corner will travel in this trajectory */
    CvPoint *curve;      /* Array of interpolated points that conform a trajectory */
} Trayectory;

3.4 ROI specification files

As already mentioned in chapter 2.3, the ROIs2Video application relies on external image analysis and attention-object model generation. This work is focused mainly on the video generation, independently of the semantic values of the regions of interest, and therefore defines a structure for the ROI specification file, which has to be respected by any possible external
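As an illustration of how the ul_point of a centring sampling window might be derived, the following sketch centres a window on a ROI and clamps it to the image borders (the helper name and the clamping details are assumptions, not the thesis code):

```c
#include <assert.h>

/* Compute the upper-left corner of a w x h sampling window centred on the
   ROI rectangle (rx, ry, rw, rh), clamped so that the window stays inside
   an image of size img_w x img_h. Illustrative helper. */
void centre_window(int rx, int ry, int rw, int rh,
                   int w, int h, int img_w, int img_h,
                   int *ul_x, int *ul_y)
{
    int x = rx + rw / 2 - w / 2;   /* centre the ROI horizontally */
    int y = ry + rh / 2 - h / 2;   /* centre the ROI vertically   */
    if (x < 0) x = 0;              /* clamp to the image borders  */
    if (y < 0) y = 0;
    if (x + w > img_w) x = img_w - w;
    if (y + h > img_h) y = img_h - h;
    *ul_x = x;
    *ul_y = y;
}
```

Near the image borders the ROI can no longer be perfectly centred, which is why the clamping step is needed.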
sageEnvironment.xsd">
  <Description xsi:type="UsageEnvironmentPropertyType">
    <UsageEnvironmentProperty xsi:type="UsersType">
      <User xsi:type="UserType">
        <UserCharacteristic xsi:type="UsagePreferencesType">
          <UsagePreferences>
            <mpeg7:FilteringAndSearchPreferences>
              <mpeg7:SourcePreferences>
                <mpeg7:MediaFormat preferenceValue="10">
                  <mpeg7:Content href="SoyUnTokenFeliz"/>
                  <mpeg7:BitRate variable="true" minimum="54000" average="64000" maximum="54000"/>
                  <mpeg7:VisualCoding>
                    <mpeg7:Format href="urn:mpeg:mpeg7:cs:VisualCodingFormatCS:2001:1">
                      <mpeg7:Name xml:lang="en">MPEG-1 Video</mpeg7:Name>
                    </mpeg7:Format>
                    <mpeg7:Frame height="400" width="400" aspectRatio="1" rate="25"/>
                  </mpeg7:VisualCoding>
                  <mpeg7:AudioCoding>
                    <mpeg7:Format href="urn:mpeg:mpeg7:cs:AudioCodingFormatCS:2001:2">
                      <mpeg7:Name>MP3</mpeg7:Name>
                    </mpeg7:Format>
                    <mpeg7:AudioChannels>2</mpeg7:AudioChannels>
                    <mpeg7:Sample rate="44000" bitsPer="16"/>
                  </mpeg7:AudioCoding>
                </mpeg7:MediaFormat>
              </mpeg7:SourcePreferences>
            </mpeg7:FilteringAndSearchPreferences>
          </UsagePreferences>
        </UserCharacteristic>
      </User>
    </UsageEnvironmentProperty>
  </Description>
</DIA>

(a) Usage preferences description. The file establishes some of the output preferences
scarding all the useless points. This was inefficient, and therefore the final solution with arc-length reparameterization was implemented.

First, the simple panning of a camera was programmed. The panning function initially only needed the origin and destination points, without having to specify the linear trajectory. Based on this function, the zooming-and-panning camera-simulation function was programmed.

When the curve interpolation was done through Catmull-Rom, the zooming-and-panning function had to be changed to receive the curve travelled by the upper-left corner of the sampling window.

Table 2: Development of the modules

As stated in the table, some of the final modules were initially developed in Matlab to try out the results, due to inexperience in those fields. When these modules were found to be correct and showed the expected behaviour, they were reprogrammed in C and finally integrated with the rest of the code. These modules were principally the Simulated Annealing sorting module and the Catmull-Rom interpolation.

5.2 CAIN Integration

This section explains the integration of the Image2Video in the CAIN framework as a new CAT [20]. In order to integrate the Image2Video in CAIN, it is necessary to change the ROIs2Video application and convert it to the obligatory structure of a CAT so it can be added to the existing architecture. The result of a CAT creation is a jar file which incl
shifting mode). The dimensions of the sampling window on the picture will vary and adapt to the size of the displayed ROI. The captured picture will be converted into the video-sized frame using bilinear interpolation. Bilinear interpolation is used instead of other, more elaborate interpolations (for example bicubic interpolation) because it requires less processing time and delivers good enough results for a video where each frame is displayed for 40 ms and where the human eye's limitations won't allow such fine details to be distinguished.

• After having travelled through all the ROIs, the virtual camera will zoom out and show the complete image again (last keyframe). This pattern is called the backward sentence [3].

The forward and backward sentence patterns together form the ring sentence [3]. Using the ring sentence, the viewer of the video perceives a general overview of where every element is situated in the image, before and after the virtual camera zooms in to offer a detailed view. Figure 4.4 shows the keyframes associated to a particular image.

Figure 4.4: Examples of keyframes in a ROIs2Video sequence

4.5 Sampling window centering

As explained before, the keyframes of the video will be constituted by the complete image followed by the set of regions of interest. The ROIs will be centred perfectly in the sampling window. Other similar Image2Video applications [1] centre faces on the upper third portion of the window, stating th
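A minimal sketch of bilinear interpolation for one output sample (illustrative helper, not the thesis implementation):

```c
#include <assert.h>

/* Bilinear interpolation of a grayscale value at fractional position
   (fx, fy) inside a 2x2 pixel neighbourhood. p00..p11 are the four
   surrounding pixel values; fx and fy lie in [0, 1]. */
double bilinear(double p00, double p10, double p01, double p11,
                double fx, double fy)
{
    double top    = p00 + (p10 - p00) * fx;  /* interpolate along x, upper row */
    double bottom = p01 + (p11 - p01) * fx;  /* interpolate along x, lower row */
    return top + (bottom - top) * fy;        /* interpolate along y */
}
```

Only four multiplications per sample are needed, which is what makes it cheaper than bicubic interpolation (which reads a 4x4 neighbourhood).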
sks to complete at each point will be detailed individually in future chapters.

Figure 3.2: ROIs2Video algorithm steps

1. ROI initialization: read out the ROI descriptions from the specified file, or create an automatic set of ROIs in case of generating a video preview of a photo.

2. Image adaptation: read the input image and adapt it to the aspect ratio of the video dimensions. The ROIs may have to be relocated.

3. Keyframe extraction: selection of the key positions for the sampling window.

4. Sampling window centring: place the sampling windows trying to centre the ROIs.

5. Optimal path calculation: apply sorting criteria to find a pleasant and coherent order for flying through the keyframes.

6. Camera motion speed control: camera-motion speed calculation based on the original image size, experimental observations and the user's preferences.

7. Curve interpolation: calculate an interpolated curve that joins the data points given by the keyframes and apply speed control to the curve.

8. Camera simulation: travel the Catmull-Rom curve, saving the sampling windows as equally sized image files, which will constitute the frames of the video. The save
smaller images, up to about 1500 for high-resolution images). The process of writing the temporary files to disk is a bottleneck to the application's performance and significantly slows down the video generation. The solution to this problem, which is out of the scope of the project, would be to code the video file inside the application and dissociate the ROIs2Video program from the Ffmpeg libraries. This way all the data would be processed in RAM memory and no temporary files would have to be written to hard disk.

The size of the sampling window that the virtual camera is capturing grows or decreases linearly between the two ends of the route.

Figure 4.14: Scheme of the camera simulation

Figure 4.14 shows a scheme of the camera simulation with the size variations of the selections on the original image. The red sampling windows determine the keyframes (i.e. ROI positions), while the orange dashed line shows an example of two intermediate frames in which the size of the sampling window increases towards the value of the last ROI sampling window. The complete camera path is composed of the following routes:

• 1→2: panning/zooming from the whole image to the first ROI (forward sentence).

• 2→3: panning between two ROIs that are captured by equally sized sampling windows.

• 3→4: panning/zooming between two ROIs that require sampling windows with different sizes.

• 4→5: panning/zooming from the last ROI to the whole imag
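The linear growth of the sampling-window size between two keyframes can be sketched as follows (illustrative helper; the rounding choice is an assumption):

```c
#include <assert.h>

/* Sampling-window size at fraction t of the route between two keyframes.
   size_start and size_end are the window sizes at the two keyframes;
   t in [0, 1] is the fraction of the route already travelled. */
int window_size_at(int size_start, int size_end, double t)
{
    /* linear interpolation, rounded to the nearest integer pixel size */
    return (int)(size_start + (size_end - size_start) * t + 0.5);
}
```

The same helper covers both zoom-in (shrinking window) and zoom-out (growing window), since the sign of the size difference carries the direction.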
specific type of visual attention or object detector. The Image2Video tool could be used for any possible type of object, for example cars, trains, etc., in particular video surveillance systems.

Furthermore, what allows the trading between space and time is the fact that a viewer is only capable of centering his attention on just one of these regions of interest, because the human being is dramatically limited in his visual perception faculties. This can be proven by following a simple two-step exercise [8]:

1. Look at the centre of Figure 2.1 and find a big black circle surrounding a small white square.

2. Look at the centre of Figure 2.1 and find a black triangle surrounding a white square.

Figure 2.1: Visual ability test. Image taken from [8].

Although you can see all the patterns in the image, your ability to process visual stimuli is limited and you do not know immediately that the first requested item is present at the lower left location and that the second requested item is not present at all. In order to perform the requested task, you have to restrict your visual processing to one item at a time. This way, if you obeyed the instructions and kept your eyes on the central fixation point, you changed your processing of the visual input over time without changing the actual input.

The Image2Video uses this fact and shows each attention object individually, one after another. To allow a general overview of the image, the whole ima
the viewer focuses mainly on these attention objects, where most of the information to be transmitted in an image is concentrated. The most complete set to determine an attention object is:

    AOi = {ROIi, AVi, MPSi, MPTi},  1 ≤ i ≤ N

where:

• AOi: the ith attention object in the image
• ROIi: Region Of Interest, which mainly determines the spatial region occupied by the ith AO
• AVi: Attention Value of AOi
• MPSi: Minimal Perceptible Size of AOi
• MPTi: Minimal Perceptible Time of AOi
• N: number of attention objects

As stated in the definition, an attention object needs a minimal spatial resolution and a minimal displaying time in order to be correctly recognized. When displaying the attention objects of an image in the generated video, these values have to be taken into account.

Generally, if possible, the regions of interest will be displayed at their full original resolution. If the region's size compared to that of the display is small, the attention object can be interpolated and displayed at a greater size. The maximal interpolation will be left to the user, who can decide and establish his preferences; if he desires to zoom in too much, the attention object may appear pixelated.

In the opposite case, when the attention object is greater than the display, it has to be downsampled or split into more segments. Faces will not be split, as it is more pleasant for the viewer when they are presented entirely.

2.2 Other approa
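The attention-object tuple above maps naturally onto a plain record type; the following struct is illustrative only, not the structure used in the thesis code:

```c
#include <assert.h>

/* Illustrative representation of AOi = {ROIi, AVi, MPSi, MPTi}. */
typedef struct {
    int x, y, width, height;  /* ROI: spatial region occupied by the object */
    double av;                /* AV: attention value (importance)           */
    int mps;                  /* MPS: minimal perceptible size, in pixels   */
    double mpt;               /* MPT: minimal perceptible time, in seconds  */
} AttentionObject;
```

An image with N attention objects would then simply carry an array of N such records, ordered later by the path-generation stage.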
the previously identified attention objects and solving the possible spatial overlappings. The criteria are:

• Face–Text integration: faces and text have completely different semantic values and should therefore not be integrated together. The authors state that the cases where text and faces overlap are due to inexact definitions of the bounding boxes of the detected ROIs.

• Face–Saliency integration: a detected face and a detected saliency are most likely to represent the same ROI (a face) if the face contains a significant part of the saliency. This condition can be expressed as:

    area(ROI_face ∩ ROI_saliency) / area(ROI_saliency) > 0.25

• Text–Saliency integration: equivalently, a detected text and a detected saliency are most likely to represent the same ROI if the ratio

    area(ROI_text ∩ ROI_saliency) / area(ROI_saliency)

  exceeds the analogous threshold.

Besides the integration of the different detected attention objects, in this stage the authors also assign the order of importance of the attention objects: the attention values. The type of attention object implies a certain weight:

    W_saliency = 0.2,  W_text = 0.35,  W_face = 0.45

According to their experiments, the attention value AVi of each object is modified according to the weight of its type:

    AVi ← AVi · W_type

Attention objects with a final AV that falls under a certain threshold will be eliminated, while AOs with higher AVs will enjoy higher priorities in the next stages.

3. Optimal Path Generation

In t
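The overlap ratio used in the integration criteria above can be computed with a small rectangle-intersection helper (the function name and the rectangle convention are illustrative):

```c
#include <assert.h>

/* Overlap ratio area(face ∩ saliency) / area(saliency) for two axis-aligned
   rectangles given as (x, y, width, height). */
double overlap_ratio(int fx, int fy, int fw, int fh,
                     int sx, int sy, int sw, int sh)
{
    int ix0 = fx > sx ? fx : sx;                             /* intersection left   */
    int iy0 = fy > sy ? fy : sy;                             /* intersection top    */
    int ix1 = (fx + fw) < (sx + sw) ? (fx + fw) : (sx + sw); /* intersection right  */
    int iy1 = (fy + fh) < (sy + sh) ? (fy + fh) : (sy + sh); /* intersection bottom */
    int iw = ix1 - ix0, ih = iy1 - iy0;
    if (iw <= 0 || ih <= 0) return 0.0;                      /* rectangles disjoint */
    return (double)(iw * ih) / (double)(sw * sh);            /* intersection over saliency area */
}
```

Comparing the returned ratio against the 0.25 threshold then decides whether the two detections are merged into one ROI.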
though the results reached with Simulated Annealing and the distance cost function are in most cases very acceptable.

The quality of the browsing path is mostly subjective, but one could try to find an objective measure of the quality of a path, for example the number of crossings in the browsing path, and repeat the Simulated Annealing process if the quality is not high enough. Another option would be to repeat the Simulated Annealing process several times and pick the solution with the best cost.

4. The major drawback of applying the distance cost function lies in the fact that the cost function is not influenced by the zoom factor of the ROIs, and it is generally not pleasant if the virtual camera is continuously zooming strongly in and out.

5. Future work could include some improvement in displaying large ROIs that don't fit in the sampling window. A possibility would be to split those ROIs and scan each one with spatial resolution 1:1 or similar. Some scanning paths are more evident than others, as can be seen in Figure 6.2, where the scanning path on the right is questionable and should be compared with other options.

Figure 6.2: Scanning paths for split ROIs. The rectangle with bold strokes represents the sampling window.

Using additional annotation and allowing the ROIs to be rotated rectangles, the simulated camera movement could be improved by adding the capability o
timedia adaptation engine integrating complementary adaptation approaches into a single module. Its main target is to adapt content in the most efficient way, searching for a compromise between the computational cost, the quality of the final adapted media and the constraints imposed by the media formats.

As can be seen in Figure C.1, CAIN is divided into three main modules: the Decision Module (DM), the Execution Module (EM) and the battery of available CATs. Additionally, a set of support modules is necessary (e.g. MPEG-7/MPEG-21 XML parsers).

The battery of CATs consists of four categories:

• Transcoder CATs
• Scalable Content CATs
• Real-time content-driven CATs
• Transmoding CATs (e.g. the Image2Video application)

Figure C.1: CAIN Architecture. Image taken from
udes a group of files needed for the execution of the adaptation tool. The needed files are:

• A mandatory Java class file with the code to perform the adaptation (Image2VideoCAT.class).

• A mandatory XML file with the description of the adaptation capabilities of the CAT (Image2VideoCAT.xml, the CAT Capabilities Descriptor file).

• Optional files included in the jar file, which can be Java libraries, native libraries or any other resource file needed for the CAT's execution. In the case of the Image2Video application, it is necessary to include:

All the mandatory files must have the name of the CAT with varying file extensions; for example, in the present case the files Image2VideoCAT.class and Image2VideoCAT.xml have to be packed in a file named Image2VideoCAT.jar.

• The OpenCV (Open Computer Vision) library (for a detailed description please read Appendix D), because it is not desirable to depend on any externally installed version of the library and to avoid incompatibilities due to OpenCV version changes in the CAIN.

• A shared library generated from the native C code of the ROIs2Video application, with some slight changes to the interface and the necessary adaptations to work with JNI.

Additionally, the ffmpeg program, used initially as an external program invoked as a system command, now has to be used through the ffmpegJNI CAT already included in the CAIN framework.

5.2.1 Mandatory Java class file
ut idref="MPEG2 Media"/>
    </AdaptationModality>
  </AdaptationModalities>
</cat>

5.2.3 Adaptation of the native C code

The native C code has to be modified so that it no longer works as a standalone program and can be called as a function from a Java program. Therefore, the prior main routine is converted to a function receiving the indispensable parameters from the Java Image2VideoCAT class using JNI (Java Native Interface). The main function is renamed to the generateVideo function with the following header:

The jobjectArray jArray is the variable through which the arguments are passed to the native function. It contains an array of Java Strings, which are converted in the generateVideo routine into the int argc and char* argv variables that were used in the prior main function. This way no other changes have to be made in the original code. In addition, the jstring path contains the path to the temporal folder.

5.2.4 Modification of the Ffmpeg library

During the standalone development of the ROIs2Video tool, the Ffmpeg library collection is run through a system command, assuming the Ffmpeg software is installed on the machine with its latest subversion revision. On the contrary, for the integration of the Image2Video tool in CAIN, the ffmpegJNI CAT is used to reduce the risk of incompatibilities and external dependencies on programs which may not be installed. During the change between both F
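The function header referred to above is not reproduced in this copy of the document. A typical JNI signature matching the description (the fully qualified Java package prefix is omitted here and the exact mangled name is an assumption, not the thesis's actual declaration) would look like:

```c
#include <jni.h>

/* Hypothetical JNI entry point for the Image2VideoCAT class:
   jArray carries the Java String[] later mapped to argc/argv,
   path carries the temporal folder. Names are illustrative. */
JNIEXPORT jint JNICALL
Java_Image2VideoCAT_generateVideo(JNIEnv *env, jobject obj,
                                  jobjectArray jArray,
                                  jstring path);
```

The real mangled name depends on the Java package of Image2VideoCAT, following the Java_<package>_<class>_<method> convention.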
    