        Deliverable 18.3 Notification of Delivery of the
         Contents
1. 3.1 The Documentation Platform

The Documentation Platform is responsible for integrating and bringing together all the components provided within the MAD area; hence it is the core module for building up a MAD System. It provides the essential features for exchanging data and metadata and for running the tools provided by the partners involved in the Area (pull logic).

The Documentation Platform is made up of a core component, called Core Platform, and a set of pluggable software processors called GAMPs, where GAMP stands for Generic Activity MAD Processor. A GAMP is a software component that extracts the metadata from the digitised material.

As mentioned in the Introduction, the Core Platform offers the following main services:

• It implements a Workflow management service, which is responsible for starting processes in the right order and for resolving dependencies between GAMPs.
• It interacts with the component called Essence and Metadata Storage (EMS) system, which stores the audiovisual material sources and the associated metadata.
• It interacts with the component called Concurrent Versioning System, which tracks every change to the metadata operated by the GAMPs and is built on a standard CVS engine.

As mentioned in the above section dedicated to the MAD architecture, the enriched metadata and related materials created within the Documentation Platform are then delivered by the Publication Platform.

The main features of the Documentatio
2. • the date of creation
• the associated EDOB

Author: EURIX   08/01/2008   Page 24 of 51
FP6-IST-507336 PrestoSpace   Deliverable D18.3   Public

[Screenshot: WorkFlow Monitoring, list of active instances]

Figure 14: a list of active instances

The administrator can then request more information about the selected instance (Figure 15) or the associated EDOB (Figure 16).

[Screenshot: WorkFlow Monitoring, INSTANCE HISTORY of an SA_Italian instance, showing the associated EDOB (itrai 0044 nk0610) and the timestamps of its creation, arrival from Split, active and fallout states]

Figure 15: information about an active instance

[Screenshot: EDOB itrai 0044 nk0610, listing its identifiers, publications, titles, contributions and work info]

Figure 16: information abo
3. [Screenshot: News Items list, 61 news found in 30 programmes]

Figure 27: Results of the query (news items)

In the displayed list of news items, it is possible to see, by clicking on the icon, a highlight of the news in which the word is found.

[Screenshot: PrestoSpace Search Interface, full text search for "Bush" among News Items, with the highlighted transcription of a news item from BBC 10 O'Clock News (publication date 2005-07-06, duration 0h 2m 36s)]

Figure 28: Results of the query (news ite
4. VARCHAR 64   INT 10

Figure 17: Tables of the MySQL database used by the Publication Platform

The above relational database describes the data used for characterizing the EDOBs, such as role types, topic types, categories, and the programmes and segmentations related to the EDOB itself.

PART B: Utilization of the Turnkey System

This part of the Deliverable provides some details on how to use the Turnkey System and its components.

5. How to use the Turnkey System

This section gives details on how to use the components of the Turnkey System. It can be seen as a sort of user manual for the system.

5.1 How to use the ADMIN component

This section describes how to use the web application called ADMIN, which is used by the administrator to monitor the work flow management system.

After logging in to the factory, the above operations can be performed from the main web page of the Documentation Platform section.

[Screenshot: menu of the private area, with the Documentation and On line Services links]

Figure 18: the link to the ADMIN component in the Documentation Platform

The main features of the ADMIN component can be selected through the third link of the main page, i.e. the link called "Work Flow Monitoring". The page below corresponds to this section.
5. [Screenshot: transcription excerpt and player controls for the selected programme]

Figure 29: the web page for browsing the selected programme news

In the left part of the page, there is the video section (upper) and the tree structure showing the segmentation of the programme into news. This segmentation is also shown as a timeline, and it is based upon a video analysis performed during the Documentation. Each of the segments describes a single highlight and is related to the shots presented at the bottom of the right side of the web page.

Notice that the segment in which the keyword has been found (plain text or named entity) is highlighted, and the related shots and transcription are displayed.

The remaining (main) part of the page provides several tabs showing:

• Info: titles, public
6. only the documented editorial parts, or both.

[Screenshot: Profile type selector]

Search & Retrieve

Simple Search

The simplest task is to find a keyword by a full text search. This is equivalent to finding a word within the EDOBs submitted by the Archive(s).

[Screenshot: PrestoSpace Search Interface, full text search for "bush" with the Programmes / News Items choice, the "Use CLIR Service" option and the Search, Reset, Advanced Search and Show Query controls]

Figure 25: The Search Interface

It is possible to search among Programmes or News Items. By clicking on the "Search" button the user starts the search. Following the preferences, the page displays the results of the query.

[Screenshot: Programmes list, 30 programmes found, with TITLE, SUBTITLE, DATE and DURATION columns]

Figure 26: Results of the query (programmes)

When a query on news items is selected, the page displayed looks like this one:
7. Bianca

The ontology represents this entity by a single id, i.e. a Uniform Resource Identifier (URI), which is by its nature language independent. This realizes a systematic and consistent approach to multilingual indexing and searching.

4.2.3 The MySQL database

It provides a data set of the published EDOBs and the related metadata.

categories
    id_category                INT
    description                VARCHAR(45)
    code                       VARCHAR(80)

classifications
    id_code                    VARCHAR(4)
    class_description          VARCHAR(64)
    english_description        VARCHAR(50)

identifiers
    id_identifier              VARCHAR(4)
    description                VARCHAR(45)
    broadcast_cpy              VARCHAR(50)

role_types
    id_role                    VARCHAR(4)
    description                VARCHAR(64)

roles_programmes
    edob                       VARCHAR(255)
    role                       VARCHAR(4)
    value                      VARCHAR(255)

topic_types
    id_topic                   INT(10)
    label                      VARCHAR(45)

topics_segments
    segment                    VARCHAR(255)
    topictype                  INT(10)

segmentation
    edob                       VARCHAR(255)
    mediatimepoint             VARCHAR(18)
    mediaduration              VARCHAR(18)
    id_segment                 INT(10)
    main_category              INT(10)
    secondary_category         INT(10)
    header_id                  VARCHAR(255)
    header_duration            INT(10)
    first_kf_umid              VARCHAR(255)

identifiers_programmes
    edob                       VARCHAR(255)
    identifier                 VARCHAR(4)
    value                      VARCHAR(45)

programmes
    id_edob                    VARCHAR(255)
    title                      VARCHAR(255)
    subtitle                   VARCHAR(255)
    language                   CHAR(2)
    service_name               VARCHAR(255)
    event_date                 DATE
    edit_classifications_code  VARCHAR(4)
    pev_classifications_code   VARCHAR(4)
    filename  url  duration    VARCHAR 64
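As a rough illustration of how these tables can be queried, the sketch below recreates three of them (programmes, roles_programmes and role_types) and joins them to list the contributors of the programmes broadcast by one service. It is only a sketch under stated assumptions: SQLite from the Python standard library stands in for MySQL, the join keys (roles_programmes.edob to programmes.id_edob, roles_programmes.role to role_types.id_role) are inferred from the column names, and the EDOB id, role codes and contributor names are invented sample data.

```python
import sqlite3

# SQLite (in memory) is used here only as a stand-in for the MySQL database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE role_types (
        id_role VARCHAR(4) PRIMARY KEY,
        description VARCHAR(64)
    );
    CREATE TABLE programmes (
        id_edob VARCHAR(255) PRIMARY KEY,
        title VARCHAR(255),
        subtitle VARCHAR(255),
        language CHAR(2),
        service_name VARCHAR(255),
        event_date DATE
    );
    CREATE TABLE roles_programmes (
        edob VARCHAR(255),
        role VARCHAR(4),
        value VARCHAR(255)
    );
""")

# Hypothetical sample rows (ids, role codes and names are made up).
conn.executemany("INSERT INTO role_types VALUES (?, ?)",
                 [("EDC", "Editor in chief"), ("ANC", "Anchor person")])
conn.execute("INSERT INTO programmes VALUES (?, ?, ?, ?, ?, ?)",
             ("edob-0001", "TG3", "Primo Piano", "IT", "Rai Tre", "2005-07-07"))
conn.executemany("INSERT INTO roles_programmes VALUES (?, ?, ?)",
                 [("edob-0001", "EDC", "Contributor A"),
                  ("edob-0001", "ANC", "Contributor B")])

# Assumed join keys: roles_programmes.edob -> programmes.id_edob,
# roles_programmes.role -> role_types.id_role.
rows = conn.execute("""
    SELECT p.title, t.description, r.value
    FROM programmes p
    JOIN roles_programmes r ON r.edob = p.id_edob
    JOIN role_types t ON t.id_role = r.role
    WHERE p.service_name = ?
    ORDER BY t.description
""", ("Rai Tre",)).fetchall()

for title, role, contributor in rows:
    print(f"{title}: {role} = {contributor}")
```

Because the web application reaches the database through a generic connection layer, a query of this shape should carry over to another DBMS with little change, although the exact SQL used by the platform is not shown in this document.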
8. Content Analysis formats are the input for the Publication Platform. The Publication Platform provides a web interface for searching and retrieving information produced by the Documentation Platform. This web interface is the output of the Turnkey system.

3.5 The ADMIN Component

The Turnkey System also comprises an additional component, called ADMIN, that can be used to manage the operations of the Platform. The ADMIN component consists of a web application which allows the administrator to manage and control the work flow activities.

As an example, Figure 13 presents the web page summarizing the status of the work flow system.

[Screenshot: WorkFlow Monitoring page with the SUMMARY STATUS REPORT]

Figure 13: the web page of the Work Flow Monitoring System

The administrator can then obtain further information on a specific work item. For instance, to take a look at the active items, the administrator can click on the number of active items in the table called GAMPManager. The ADMIN interface will show the list of active instances being processed by the work flow manager. For each instance, three pieces of information are given:

• the identifier of the active instance
9. [Diagram: root element wrapper containing ad hoc structures, P_META sets and MPEG-7 profile nodes]

Figure 4: a schema of the MAD document format

Concerning the PrestoSpace project, given the audiovisual items produced by the preservation and restoration units, we need to develop a software component able to document and deliver them. This component is called MAD, standing for Metadata Access and Delivery. The MAD component provides the software modules for documenting and delivering audiovisual information, and it is made up of pluggable GAMPs (Generic Activity MAD Processors) connected to a core Platform for automatic feature extraction.

2.1 Architecture of MAD

As mentioned in the Introduction of this document, the MAD Platform is the component of the PrestoSpace project having the following objectives:

1. extracting metadata from audiovisual items;
2. offering suitable mechanisms for retrieving and accessing audiovisual contents based on metadata.

In order to achieve the above goals, the MAD platform adopts a modular, extensible architecture. In detail, it receives as input the digitised media (video and audio files) produced by the Preservation and Restoration units; then it produces several materials, like key frames, camera motions, and metadata, as output. The
10. The Publication Platform architecture

The platform architecture is based on three main components: a web application to allow user interaction; a database (MySQL) to store data about available programmes and so make searching and selection easy; and the KIM platform (provided by Ontotext) to perform semantic functionality through semantic analysis of speech and full text indexing.

The Publication Platform is delivered as a web archive. Deployment is performed by posting the web archive into the servlet container of the web server in use. After the deployment phase is completed, it is possible to set up the platform by launching an ant build file released within the web archive.

3.2.1 The Web interface

The Publication Platform provides a web interface for searching and retrieving information produced by the Documentation Platform. The entry point for queries is the form shown in Figure 10.

[Screenshot: PrestoSpace Search Interface, with the full text search field, the Programmes / News Items choice, the "Use CLIR Service" option and the Search, Reset, Advanced Search and Show Query controls]

Figure 10: the Search Interface

Basically, the user can submit a keyword and start the search among programmes or news, searching by contribution, title, publication date, publication service, topic and named entities for semantic queries (i.e. programmes/news which contain Persons, Places, and so on).

T
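The keyword search among programmes or news items described above can be pictured with the following minimal sketch. It is not the platform's implementation: the real system relies on full text indexing through the KIM platform, whereas this example does a plain case-insensitive substring scan, and the `NewsItem` type, the `search` function and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NewsItem:
    """One news segment of a programme (hypothetical structure)."""
    programme: str
    start: str
    transcription: str


def search(items, keyword, scope="news"):
    """Case-insensitive keyword search over news transcriptions.

    scope="news" returns the matching news items;
    scope="programmes" returns the distinct titles of the programmes
    that contain at least one matching news item.
    """
    needle = keyword.lower()
    hits = [item for item in items if needle in item.transcription.lower()]
    if scope == "news":
        return hits
    if scope == "programmes":
        titles = []
        for item in hits:
            if item.programme not in titles:
                titles.append(item.programme)
        return titles
    raise ValueError(f"unknown scope: {scope}")


# Invented sample data, only for illustration.
items = [
    NewsItem("Evening News", "00:01:10", "the president met Mr Bush at the summit"),
    NewsItem("Evening News", "00:05:30", "weather forecast for the weekend"),
    NewsItem("Late Bulletin", "00:02:00", "Bush arrived at the G8 meeting"),
]

news_hits = search(items, "Bush")
programme_hits = search(items, "Bush", scope="programmes")
```

With the sample data above, `news_hits` contains the two news items that mention the keyword and `programme_hits` lists each programme once, which mirrors the difference between the "News Items" and "Programmes" choices of the search form.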
11. e-learning lessons. As another example, a soccer fan could be interested in digitising and retrieving a set of a hundred matches stored on VHS tapes.

This kind of user essentially has the following requirements:

• they aim to digitise and access small size archives;
• they do not have their own Publication Platform, i.e. typically they need complete software performing the functionalities of the MAD Platform.

In order to satisfy the requirements of this kind of user, a software component called Turnkey System has been developed. The Turnkey System is a lightweight system specifically tailored for small size archives. It is made up of both the Documentation and the Publication Platform with customized features; that is to say, it is a fully automatic system for content enrichment and web publishing/searching. Big size archives should use subparts of the Turnkey System because they have their own content management systems and web search and publication features.

The Turnkey System is represented in Figure 1.

[Diagram: the Turnkey System, made up of the Documentation Platform (automatic analysis, work flow control, transient storage) and the Publication Platform (permanent storage, public access, search)]

Figure 1: the Turnkey System

The Documentation Platform is made up of a core component, called Core Platform, and a set of pluggable software processors named GAMPs, where GAMP st
12. the GAMP can be implemented by means of a Generic GAMP.

The Documentation Platform provides a Generic GAMP, a Java component that can be used to build a GAMP component in an easy way. In order to add a new GAMP to the Documentation Platform, the following steps have to be performed:

• the new metadata extracted by the GAMP must be MPEG7 compliant;
• in order to deliver the new kind of information, the Publication Platform has to be modified, taking this new metadata into account;
• a new queue related to the new GAMP must be added to the Documentation Platform.

Obviously, one can think of replacing an existing GAMP with another one producing the same kind of metadata. This could be the case when a more efficient implementation of the GAMP is provided. In this case, one can replace the existing GAMP with the new generic GAMP, making use of the same queue and producing a specific metadata, rather than assigning a new queue to the GAMP within the Platform.

A broader discussion on how to use the generic GAMP is presented in section 6.2 of this deliverable.

3.2 The Publication Platform

The Publication Platform provides retrieval and browsing functionalities regarding the essence elaborated within the MAD Platform.

[Diagram: TOMCAT hosting the Publication Platform webapp, together with Lucene and Sesame]

Figure 9
13. the camera motions, zooms, and changes at editorial level (i.e. a different editorial part).

The technical information related to the camera is displayed as coloured rectangles with an area that extends from the starting point until the end of the camera motion/zoom. The displayed information is: camera pan (left/right), camera tilt (up/down), and zoom in and zoom out events.

Related Sources Tab

[Screenshot: Related Sources tab showing the BBC NEWS Question Time web page in a browser window, with its abstract and subject]

Figure 36: Related Sources Tab

In this section the user can browse the related news found by the Documentation Platform and open them in a new window by clicking on the link in the upper left part of the panel showing the news itself.

Advanced Search

Clicking on the Advanced Sear
14. the turnkey system can be seen as a unique component, containing a Documentation Platform and a Publication Platform, together with a "small" orchestrator (PSO), whose role is to coordinate the two components. Even if the functionalities of the Documentation Platform and the Publication Platform are also described in deliverables D18.1 and D18.2, this document is self-contained.

This document is made up of three main parts. The first part (part A) recalls the architecture of MAD, which is implemented by the unique component called turnkey system. The second part (part B) gives detailed information on how to use the turnkey system. The third part (part C) concludes the deliverable by presenting information about legal aspects and licensing issues concerning the usage of the software.

1.2 Executive Summary

Recently, broadcasters have rediscovered the value of their audiovisual archives. Moreover, recent research has shown that approaches aimed at the recovery and availability of archived materials may produce consistent cost savings in the overall programme production processes. In order to achieve this goal, it is essential to adopt metadata.

Metadata can be defined as "Data about data", that is to say the information that describes (or supplements) the main (or central) data. Concerning the broadcast archives scenario, this entails finding which information schemes are needed in order to make archive users able to retrieve audiov
15. [Screenshot: WorkFlow Monitoring page, SUMMARY STATUS REPORT with one row per activity (among them CA_speech, CA_mediaAnalyse, CA_LexicalSegmenter, Publication and EditorialPartSegmenter) and a Refresh button]

Figure 19: the Work Flow Monitoring page

Two tables are presented:

• the GAMP manager table, summarizing the status of all the items involved in the system, namely the items running, active, completed, terminated and suspended. By clicking on each number, the list of items of the selected category is presented. As an example, Figure 20 shows the list of active items.

• the Workitems table, whose rows contain the different activities that can be performed on a work item (Annotation, Welcomer, CA_shots, and so on), whereas the columns contain the status of the items, namely active, inactive, completed, suspended, blocked, fallout. By selecting each item of the table, the ADMIN component also offers the opportunity to check all the information about the specific work item, monitoring its
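The layout of the Workitems table (one row per activity, one column per status) can be reproduced with a small counting routine. The sketch below is only illustrative: the `summarize` function and the sample work items are hypothetical, and only the status names and the activity names are taken from the description above.

```python
from collections import Counter

# Status columns of the Workitems table, as listed in the text.
STATUSES = ["active", "inactive", "completed", "suspended", "blocked", "fallout"]


def summarize(work_items):
    """Build {activity: {status: count}} from (activity, status) pairs.

    Every status column is present for every activity, with 0 when no
    work item of that activity is in that status.
    """
    counts = Counter(work_items)
    activities = sorted({activity for activity, _ in counts})
    return {
        activity: {status: counts[(activity, status)] for status in STATUSES}
        for activity in activities
    }


# Invented sample work items.
items = [
    ("CA_shots", "active"),
    ("CA_shots", "active"),
    ("CA_shots", "fallout"),
    ("Welcomer", "completed"),
]

table = summarize(items)
for activity, row in table.items():
    print(activity, [row[status] for status in STATUSES])
```

Keeping every status column for every activity, including the zero entries, matches the fixed grid shown by the monitoring page.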
16. D Platform.

In order to satisfy the requirements of this kind of user, a software component called Turnkey System has been developed. The Turnkey System is a lightweight system specifically tailored for small size archives. It is made up of both the Documentation and the Publication Platform with customized features; that is to say, it is a fully automatic system for content enrichment and web publishing/searching. Big size archives should use subparts of the Turnkey System because they have their own content management systems and web search and publication features.

The Turnkey System can be represented as shown in Figure 7.

[Diagram: the Turnkey System, made up of the Documentation Platform (automatic analysis, work flow control, transient storage) and the Publication Platform (permanent storage, public access, search)]

Fig. 7: the Turnkey System

In the rest of this section we focus on the two main components of the turnkey system, namely the Documentation Platform and the Publication Platform. These components are also described in specific deliverables (D18.1 and D18.2). Furthermore, in section 3.3 we discuss the main differences between the turnkey system's architecture and that of a "standard system", comprising the Documentation Platform and the Publication Platform coordinated by an essential (restricted) implementation of a PSO
17. FP6-IST-507336 PrestoSpace   Deliverable D18.3   Public

Deliverable D18.3

Notification of Delivery of the Turnkey System

DOCUMENT IDENTIFIER: PS_WP18_EURIX_D18.3_TurnkeySys_v2.0
DATE: 08/01/2008
ABSTRACT: This document is a notification of delivery for the Turnkey system. The turnkey system is a lightweight system specifically tailored for small size archives, and implements a complete MAD unit comprising the functionalities of both the Documentation Platform and the Publication Platform.
KEYWORDS: metadata, web services, modular components, digitisation, metadata extraction, multimedia data access, multimedia data delivery, software integration, work flow management system
WORKPACKAGE / TASK: WP18
AUTHOR, COMPANY: W. Allasia, A. Damiani, S. Ridolfi, F. Toscano, M. Vigilante, euriX Group
NATURE: Prototype
DISSEMINATION: Public

DOCUMENT HISTORY

Release  Date        Reason of change      Status  Distribution
0.1      2004-02-27  First Draft           Living  Confidential
1.0      2004-12-20  Working Draft         Living  Confidential
1.1      2005-01-24  Release Candidate     Living  Confidential
1.2      2007-06-24  Release Candidate     Living  Confidential
1.3      2007-11-16  Release Candidate     Living  Confidential
1.4      2008-01-08  Final release         Closed  Confidential
2.0      2008-02-25  Document made public  Closed  Public

Contents Table
18. FTP, SMB, file, and so on).

It could be the case that a GAMP operates in a wrong way, thus wrongly updating files and/or metadata stored in the EMS. In order to avoid this situation, thereby ensuring that a consistent, sound version of the set of information is always available to the factory, the Core Platform also interacts with the component called Concurrent Versioning System.

The Concurrent Versioning System (CVS) tracks every change to the metadata that takes place during the execution of the GAMPs. It is built on a standard CVS engine. In this way, if an unacceptable update has been performed by a GAMP, the Core Platform asks the CVS to perform a rollback to a consistent version of the system.

It is worth noting that the EMS and the CVS are components directly managed by a specific, limited implementation of a PrestoSpace Orchestrator (PSO; see D19.0.2 for details), which can be intuitively seen as a coordinator of the activities of the Turnkey system. The Core Platform of the Documentation Platform only interacts (via web services) with EMS and CVS, as mentioned above.

Let us conclude this section with a brief remark. The interaction between GAMPs and the Core Platform is based on web services. This implies that GAMPs can be developed in different programming languages supporting SOAP and deployed on totally different platforms and operating systems.
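The rollback behaviour described in this section can be pictured with a small in-memory model. This is a sketch under stated assumptions, not the actual CVS integration: the `VersionedMetadata` class and the `is_consistent` check are hypothetical, and dropping the later versions on rollback is a simplification (a real versioning engine would normally keep the full history).

```python
import copy


class VersionedMetadata:
    """Keeps every committed version so that a bad update can be undone."""

    def __init__(self, initial):
        self._versions = [copy.deepcopy(initial)]

    @property
    def current(self):
        # Return a copy so callers cannot alter the stored version.
        return copy.deepcopy(self._versions[-1])

    def commit(self, metadata):
        """Store a new version and return its version number."""
        self._versions.append(copy.deepcopy(metadata))
        return len(self._versions) - 1

    def rollback(self, version):
        """Make `version` the current one, discarding later versions."""
        if not 0 <= version < len(self._versions):
            raise ValueError(f"unknown version {version}")
        del self._versions[version + 1:]
        return self.current


def is_consistent(metadata):
    """Hypothetical validity check: a non-empty title must be present."""
    return bool(metadata.get("title"))


store = VersionedMetadata({"title": "TG3"})
good = store.commit({"title": "TG3", "shots": 12})
bad = store.commit({"title": "", "shots": 12})  # a GAMP wrote a broken update

if not is_consistent(store.current):
    restored = store.rollback(good)
```

After the rollback, `store.current` is again the last version that passed the check, which is the guarantee the Core Platform expects from the versioning component.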
19. MP polls its own queue in the Core Platform. If it finds some work to do (i.e. an active job belongs to the GAMP's queue), then the GAMP gets its job and starts its process. As a second step (2), the GAMP asks the Core Platform to check out the EDOBs related to the job under analysis. Before starting its own metadata extraction, the GAMP also needs to retrieve all the files linked to the EDOBs; to this aim, another invocation of the Core Platform is performed (3). In order to physically retrieve the requested file, the Core Platform asks the EMS.

At this point of the process, the GAMP has all the information needed to perform its own metadata extraction (4). This process could require storing additional files and/or metadata in the factory; in this case, the GAMP asks to insert new material (5) by means of a request to the Core Platform. The Core Platform forwards the GAMP's request to the EMS component. The metadata (EDOB) built by the GAMP are then registered on the Core Platform (6). Finally, the GAMP notifies the Core Platform that the elaboration of the current job is over (7).

The communication between a GAMP and the Core Platform is based on web services, and it is performed by exchanging XML documents. As an example, suppose that a GAMP needs a specific file in order to perform its process: in phase 3, it asks the Core Platform to retrieve this file (by forwarding this request to the EMS component). The result of this request consists of an XML d
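The seven steps described above can be summarised in code. The following is a minimal sketch, assuming an in-memory stand-in for the Core Platform instead of the real web services and XML documents; the class, the method signatures, the queue name used in the example and the fake "extraction" are all hypothetical. Only the order of the operations follows the text.

```python
from collections import deque


class InMemoryCorePlatform:
    """Toy stand-in for the Core Platform (no web services, no XML)."""

    def __init__(self):
        self.queues = {}     # queue name -> deque of job ids
        self.edobs = {}      # job id -> metadata dict (the EDOB)
        self.materials = {}  # job id -> file names (held by the EMS)
        self.finished = []   # job ids notified as done

    def add_job(self, queue, job_id, files):
        self.queues.setdefault(queue, deque()).append(job_id)
        self.edobs[job_id] = {}
        self.materials[job_id] = list(files)

    def get_job(self, queue):                  # step 1: poll the queue
        jobs = self.queues.get(queue)
        return jobs.popleft() if jobs else None

    def checkout_edob(self, job_id):           # step 2: check out the EDOB
        return dict(self.edobs[job_id])

    def get_material(self, job_id):            # step 3: fetch linked files
        return list(self.materials[job_id])

    def insert_material(self, job_id, name):   # step 5: store new material
        self.materials[job_id].append(name)

    def checkin_edob(self, job_id, edob):      # step 6: register the EDOB
        self.edobs[job_id] = edob

    def notify(self, job_id):                  # step 7: job is over
        self.finished.append(job_id)


def run_gamp_once(core, queue):
    """Run one polling cycle; return the handled job id, or None if idle."""
    job_id = core.get_job(queue)
    if job_id is None:
        return None
    edob = core.checkout_edob(job_id)
    files = core.get_material(job_id)
    # Step 4: the metadata extraction is faked by counting the input files.
    edob["file_count"] = len(files)
    core.insert_material(job_id, f"{job_id}-keyframe.jpg")
    core.checkin_edob(job_id, edob)
    core.notify(job_id)
    return job_id


core = InMemoryCorePlatform()
core.add_job("CA_shots", "job-1", ["video.mxf"])
handled = run_gamp_once(core, "CA_shots")
idle = run_gamp_once(core, "CA_shots")  # the queue is now empty
```

The pull logic is visible in `run_gamp_once`: the Core Platform never calls the GAMP, it only answers the GAMP's requests, so a GAMP with an empty queue simply finds nothing to do and polls again later.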
20. STOSPACE_HOME  shared prestospace GenericGAMP linux
LIB_DIR  shared prestospace GenericGAMP linux lib

USAGE:
  ClGamp.sh  XMLConfigFile  -h  -h CHECKOUT_EDOB  -h CHECKIN_EDOB  -h NOTIFY  -h GET_JOB  -h GET_MATERIAL  -h INSERT_MATERIAL

where:
  XMLConfigFile is an XML file defining a GAMP Operation.

To see all xml templates run: ClGamp.sh -h

  -h                  option to read all xml templates
  -h CHECKOUT_EDOB    option to read xml template for CHECKOUT_EDOB
  -h CHECKIN_EDOB     option to read xml template for CHECKIN_EDOB
  -h NOTIFY           option to read xml template for NOTIFY
  -h GET_JOB          option to read xml template for GET_JOB
  -h GET_MATERIAL     option to read xml template for GET_MATERIAL
  -h INSERT_MATERIAL  option to read xml template for INSERT_MATERIAL

QUEUES for GET_JOB command:
  Welcomer (Demux and so on)
  CA_shots (ContentAnalysis jobap)
  CA_speech (ContentAnalysis jobap)
  CA_mediaAnalyse (ContentAnalysis jobap)
  CA_other (ContentAnalysis jobap)
  Restoration
  SA_generic (SemanticAnalysis jobap)
  SA_other (SemanticAnalysis jobap)
  Annotation
  Delivery

5.3 How to use the Publication Platform's Interface

This section describes how to use the interface of the Turnkey System, namely the interface provided by the Publication Platform.

This part can be seen as a sort of user manual of the Turnkey System. It is quite obvious that this interface is a restr
21. Servlet 2.4 Specifications and JSP 2.0 Specifications. The design takes advantage of the MVC pattern to separate presentation logic and business logic. The Jakarta Struts Framework has been adopted to implement the controller layer, which handles the business control flow, mapping user requests to business operations of the model layer.

In order to perform searches and selections on programmes and news items, the platform is supported by a database storing information from metadata (e.g. titles, roles, descriptions, publishing dates, services etc.). The connection between the web application and the database management system is provided by JDBC, so it is quite easy to change the DBMS.

The KIM Platform, provided by Ontotext, is integrated into the Publication Platform in order to provide semantic analysis capability. In more detail, the KIM Platform is a system based on three components: Lucene, Sesame and Gate. Together, they allow searching the semantic content of the programmes through simple queries formulated as sentences with subject, action and target.

The Publication Platform is delivered as a web archive. Deployment is performed by posting the web archive into the servlet container of the web server in use. After the deployment phase is completed, it is possible to set up the platform, laun
22. 3.1.2 Generic Activity MAD Processors (GAMPs)

The Documentation Platform is able to connect the components provided by the partners involved in the MAD area, namely the so-called GAMPs (Generic Activity MAD Processors). The GAMPs are software units that extract metadata from the digitised materials. The Core Platform maintains a queue in the workflow for every GAMP, and each GAMP polls its own queue in order to become aware of any activity to be done. In order to achieve their goals, the GAMPs ask the Core Platform for the materials and the related (associated) metadata produced up to the request time.

The Documentation Platform makes use of three different kinds of GAMPs, namely:

• Content Analysis tools
• Semantic Analysis tools
• Manual Annotation tools

The basic idea of the turnkey system is that it can be executed even on a small machine; therefore, the turnkey system only makes use of some GAMPs. Obviously, if the turnkey system is installed on a small machine, only the GAMPs performing a "lightweight" metadata extraction are used. Here below is the list of the GAMPs actually implemented:

• Welcomer, demux MXF (RAI)
• Semantic Analysis (University of Sheffield, University of TorVergata)
• Annotation GAMP (JRS)
• Shot boundary detection tools (RAI, content analysis)
• Key frame detection and extraction tools (JRS, content analysis)
• Stripe Images extraction tools (JRS, content analysis)
• Camera mot
23. ands for Generic Activity MAD Processor. A GAMP is a software component that extracts the metadata from the digitised material.

The Core Platform offers the following main services:

• Workflow management service, responsible for starting processes in the right order and for resolving dependencies between GAMPs;
• Interaction with the Essence and Metadata Storage (EMS) system, which stores the audiovisual material sources and the associated metadata;
• Interaction with the Concurrent Versioning System, built on a standard CVS engine, which tracks every change made to the metadata by the GAMPs;
• Delivery of enriched metadata and related material created by the GAMPs within the Documentation Platform.

EMS and CVS are two components of the "small" PSO of the Turnkey system. They are used to manage the storage of materials within the factory and to track different versions of these materials, respectively.

The main features of the Documentation Platform can be represented as shown in Figure 2.

Fig. 2: the Documentation Platform (diagram labels: EDOB, Rich Content, MPEG7, PMETA, DC, GAMPs)

The overall services offered by the Core Platform are available through web se
24. ations, contributions and identifiers (legacy data);
• Transcription: the entire text converted from speech (the user can perform a textual search);
• Semantic analysis: using the KIM facility (see section 4.2.2);
• Content analysis: stripes and camera motion (if extracted during the documentation);
• Related sources: correlated news from external web sites.

Info tab (legacy data)

This tab shows legacy data: Titles (title, subtitle, title language), Publications data (duration, organisation, channel, date of first publication), Contributions (such as production company, news reader, editor in chief, etc.) and Identifiers related to the source archive (programme number and archive number).

Figure 30: legacy data. The Info tab

Transcription Tab
25. ch button the user can submit queries more complex than a simple full-text search.

Figure 37: Advanced Search

The user can apply filters, searching by Category (only for news items), Contributions, Named entities, Publication Date, Publication Service (only for programmes) and Title.

Figure 38: Filters of the queries in the Advanced Search

Clicking on the "Add" button adds the filter to the query. It is also possible to combine filters of the same type with logical operations (AND/OR).

Figure 39: Filter of the query

In the following Figure, the search is for any BBC News item that contains the word "bush", was published in January 2005, and contains the Person "Bush" OR the Place "New York".
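The way such filters combine can be sketched in a few lines of Python. This is only an illustration of the query logic described above, not the code of the Publication Platform: the item fields (`text`, `published`, `entities`) and the helper `matches` are hypothetical names chosen for the example, and the sample news items are invented.

```python
from datetime import date


def matches(item, text=None, date_from=None, date_to=None,
            entities=(), entity_op="OR"):
    """Return True if a news item satisfies every active filter."""
    # Full-text filter: case-insensitive substring match.
    if text is not None and text.lower() not in item["text"].lower():
        return False
    # Publication date filter: inclusive range.
    if date_from is not None and item["published"] < date_from:
        return False
    if date_to is not None and item["published"] > date_to:
        return False
    # Named-entity filters are combined with AND or OR, as chosen by the user.
    if entities:
        hits = [value in item["entities"].get(kind, ())
                for kind, value in entities]
        combined = all(hits) if entity_op == "AND" else any(hits)
        if not combined:
            return False
    return True


# Invented sample data, for illustration only.
news = [
    {"text": "Bush meets Blair at the summit", "published": date(2005, 1, 12),
     "entities": {"Person": ["Bush"], "Place": []}},
    {"text": "Bush visit announced", "published": date(2005, 7, 6),
     "entities": {"Person": ["Bush"], "Place": ["New York"]}},
    {"text": "Olympic vote", "published": date(2005, 1, 20),
     "entities": {"Person": [], "Place": ["New York"]}},
]

# Word "bush", published in January 2005, Person "Bush" OR Place "New York".
query = dict(text="bush", date_from=date(2005, 1, 1), date_to=date(2005, 1, 31),
             entities=[("Person", "Bush"), ("Place", "New York")],
             entity_op="OR")
selected = [n["text"] for n in news if matches(n, **query)]
```

Only the first sample item passes all three filters; switching `entity_op` to "AND" would require both the Person and the Place to be present in the same item.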
26. change to the metadata that takes place during the execution of a GAMP. It is a component of the PSO.

Enhanced Metadata: Metadata and structuring information that are generated within the MAD factory, with a view to improving the accessibility of digitised contents.

GAMP: This label stands for Generic Activity MAD Processor, which represents the Generic Client communicating with the MAD Core Platform using the SOAP (Web Services) protocol.

Generic GAMP: Software component that simplifies the creation of a new GAMP.

MAD Factory: Facilities where massive documentation (metadata enhancement) and preparation of publication for audiovisual contents are performed.

Mass Storage: Storage solution in which all the assets (programs, recordings, etc.) are kept on common media (disks, tapes, etc.) and access is managed through a file management system.

Publication Platform: The component of the MAD Factory which is responsible for delivering enriched audiovisual contents.

Preservation Factory: Facilities where massive A-to-D migration of audiovisual contents is performed.

PSO: This is the PrestoSpace Orchestrator, which is the administrator of the PrestoSpace factory, coordinating all its components (PRE, RES and MAD).

Queue Filler: Software component used to test the Documentation Platform. It gives the opportunity of inserting jobs in the GAMPs' queues.

Turnkey system: Complete name: Turnke
27. ching an Ant build file released within the web archive.

4.2.2 The KIM Platform

The KIM Platform provides a novel Knowledge and Information Management (KIM) infrastructure and services for automatic semantic annotation, indexing, and retrieval of unstructured and semi-structured content.

As a baseline, KIM analyzes texts and recognizes references to entities (such as persons, organizations, locations and dates). Then it tries to match each reference with a known entity, which has a unique URI and description. If no match is found, a new URI and description are automatically generated. Finally, the reference in the document is annotated with the URI of the entity. Both this process and its result are called semantic annotation. This sort of metadata can be used for indexing, retrieval, visualization and automatic hyperlinking of documents.

For the purposes of semantic annotation, indexing, and retrieval of documents, KIM also uses a seed knowledge base (KB). The KB, in this context, is a body of formal knowledge about entities, representing non-ontological formal knowledge. It consists of instance data, that is, descriptions of entities and their interrelations: for each entity, the KB contains information about the entity's type, aliases (including a main alias, i.e. the official or well-known name), attributes, and relations. The KIM KB provides coverage of popular real-world entities of common interest, which are considered well known and thus not
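The annotation steps described in section 4.2.2 (recognise a reference, match it against a known entity, otherwise generate a new URI, then annotate the text span) can be illustrated with a toy sketch in Python. It is not KIM's implementation and does not use the KIM API: the capitalised-word recogniser, the dictionary used as knowledge base, the `annotate` function and the `http://example.org/entity/` namespace are all assumptions made for the example.

```python
import re


def annotate(text, knowledge_base, namespace="http://example.org/entity/"):
    """Link each recognised mention to a URI, minting a new one if unknown."""
    annotations = []
    # Toy recogniser: any run of capitalised words counts as an entity mention.
    for match in re.finditer(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", text):
        mention = match.group(0)
        uri = knowledge_base.get(mention)
        if uri is None:
            # Unknown entity: generate a new URI and remember it.
            uri = namespace + mention.replace(" ", "_")
            knowledge_base[mention] = uri
        # The annotation ties the character span to the entity URI.
        annotations.append((match.start(), match.end(), uri))
    return annotations


# Hypothetical seed knowledge base with one known entity.
kb = {"Tony Blair": "http://example.org/entity/Person.1"}
result = annotate("Tony Blair arrived in Singapore", kb)
```

After the call, `result` holds one annotation for the known entity and one for "Singapore", whose URI was generated on the fly and added to `kb`.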
28. dressed. The principal aim is to prepare the way for preservation factories providing affordable services to all kinds of collection custodians in order to manage and to allow access to their assets.

Figure 23: The Publication Platform Web Interface

The user can access the documented programmes/news or manage the user accounts.

5.3.2 Platform Administration

By means of the web application, a user can perform the usual administration activities, namely:

• change their own password
• access the administration of users
• log off the system

The web page offers the corresponding menu entries: Change password, Users administration, Log off.

5.3.3 Material Publication

Preferences

Figure 24: Preferences (panels: Navigation, Language settings, Profile type)

The user can set the number of results displayed in a page (default: 5) and the type of information. In fact, it is possible to choose among technical (only key frames, camera motions and other technical information), journalistic
29. e summarized as follows: first, we recall the architecture of MAD, introducing the turnkey system, which is a full implementation of the MAD functionalities (part A); second, we describe how to use the turnkey system (part B). A third part (part C), containing a glossary and some information about licences, concludes this deliverable.

PART A: Architecture and Implementation of the Turnkey System

2. The MAD Platform

In recent years, broadcasters have rediscovered the value of their audiovisual archives. Moreover, recent research has shown that approaches aimed at the recovery and availability of archived materials may produce consistent cost savings in the overall programme production processes. In order to achieve this goal, it is essential to adopt metadata.

Metadata can be defined as "data about data", that is to say the information that describes (or supplements) the main (or central) data. In the broadcast archive scenario, this entails finding which information schemes are needed in order to enable archive users to retrieve audiovisual items with effective levels of accuracy.

Research within the PrestoSpace project has determined that the information required for typical audiovisual archive exploitation processes can be partitioned into the following fundamental classes:

• Identification information, such as
30. ed as an argument.

4. elaboration: the GAMP performs its own metadata extraction.

5. if the GAMP has generated new files during the elaboration of phase 4, it needs to store them in the factory. In this case, the GAMP invokes insertMaterial(XMLDocument) on the Core Platform, where XMLDocument contains all the information on the new files. The Core Platform forwards this file to the EMS in order to make it available to the factory.

6. the GAMP invokes checkinJob(XMLDocument) on the Core Platform in order to register the EDOB (the argument XMLDocument) produced by the GAMP.

7. the GAMP notifies the Core Platform that it has successfully concluded its work. This is done by invoking notifyJob(XMLDocument).

4.1.3 Man Machine, Control, Software, Files and database interface

4.1.3.1 Man Machine interfaces

The interaction between human beings and the Documentation Platform is performed by means of the ADMIN component. In section 3.6 we mentioned that the Documentation Platform comprises an ADMIN interface, which is a standard web application. The ADMIN component offers a GUI for adding annotation and representative information to the Metadata. The web interface offers these main functionalities:

1. submitting essences and providing the metadata;
2. managing the work flow and controlling the entire work cycle;
3. metadata browsing: a web interface for browsing the essences and the metadata is provided. It will be useful for g
31. 5.2 How to use the generic GAMP
5.3 How to use the Publication Platform's Interface
PART C: Conclusions
6. Licensing
7. Bibliography
8. Glossary

1. Introduction

1.1 Scope of this Document

This deliverable is about the MAD component of the PrestoSpace factory; in particular, it focuses on a software component called the turnkey system. The turnkey system is a lightweight system specifically tailored for small archives, and implements a complete MAD unit comprising the functionalities of both the Documentation Platform and the Publication Platform.

The tools described here are part of the MAD Unit. After a brief recall of the architecture of the Metadata Access and Delivery component (MAD), the turnkey system is described in detail.

This deliverable is part of a three-piece product, which also includes the deliverables [D18.1] and [D18.2], describing the other MAD components: the Documentation Platform and the Publication Platform, respectively. As mentioned here above
32. • a relational DBMS that stores information related to the available programmes;
• a text search and indexing engine (Lucene, KIM), comprising a semantic engine for processing natural language queries.

The searching interface of the Publication Platform offers several search approaches: the user can choose to search for a programme or a news item, and the results can be filtered by programme title, broadcast date, authors, topics, and so on.

The user interface presents a video preview, currently making use of Windows Media Player. This is the only feature written specifically for Internet Explorer.

A schema of the Publication Platform is shown in Figure 3.

Fig. 3: the Publication Platform

The Publication Platform provides a web interface for searching and retrieving information produced by the Documentation Platform.

As mentioned above, this document presents the architecture of the turnkey system, describing its components in detail. The Documentation Platform and the Publication Platform are also described in deliverables [D18.1] and [D18.2], respectively; however, this document is self-contained, so users interested in the turnkey system can restrict their attention to this deliverable only.

The structure of the document can b
33. etting a quick view of the work done.

4.1.3.2 Control Interfaces

The Core Platform provides a control interface based on the SOAP protocol. We are currently analyzing SNMP protocols for controlling the machines involved.

4.1.3.3 Software interfaces

The Core Platform provides Web Services interfaces in WSDL format. The main interfaces are:

1. PSO
2. EMS
3. Work Flow
4. Admin
5. Browse

Some features could be provided as Java Web Start applications published over HTTP (JNLP MIME type).

4.1.3.4 Files and Databases interfaces

Files can be accessed through the following protocols provided by the Core Platform, by means of an interaction with the PSO component called EMS:

1. file
2. smb (Samba)
3. ftp
4. http(s)
5. soap

4.2 Implementation of the Publication Platform

In previous sections we introduced the Publication Platform, which is a web application allowing users to retrieve and use audiovisual information within the Factory. In this section we describe in detail the software components used by the Publication Platform to perform its functions.

4.2.1 Physical environment

The web application is developed on JDK 1.4.2, using Java web technologies. It needs a web server with a servlet container to run, such as Jakarta Tomcat 5.5, but it is possible to use any web server compliant with Java
34. Figure 12: RSS export feature

3.3 Communication between the Docu
35. evolution within the work flow process. An example is presented in Figure 21.

Figure 21: information about a work item

5.2 How to use the generic GAMP

In this section we describe in detail how to use the Generic GAMP in order to add a new GAMP component to the Documentation Platform.

The instructions for using the Generic GAMP from the command line are given below.

JAVA_HOME=/usr/lib/java
JAVA=/usr/lib/java/bin/java
PRE
36. explicitly introduced in the documents. The most important and most used entities in the KIM KB are geographic names and organizations. The entities that represent geographical features are imported from GNS (GEOnet Names Server) and other sources. They are organized so as to represent instances of Location and its subclasses, with the property subRegionOf applied between Continents, GlobalRegions, Countries, and other subclasses of Location. Some of the subtypes of Location contained in the KIM KB are Country, Province, County, CountryCapital, City, Ocean, Sea, etc. The locations are given together with several of their aliases (including English and French ones), as well as with their geographic coordinates (Long/Lat), the designator (DSG) and the Unique Feature Index (UFI), according to GNS. All this provides a useful basis for cross-linguistic querying and retrieval. The entities in the KB are derived or collected from various sources, such as geographical and business intelligence gazetteers.

As a part of the Publication Platform, the KIM engine supplies an index of the EDOB's metadata.

The role of KIM in MAD is to provide a language-independent representation of Named Entities as a specific metadata common to the two languages. As an example, consider that the "White House" is translated in other languages (e.g. in Italian the correct translation is "Casa
37. f the Documentation Platform in more detail.

3.1.1 The Core Platform

As mentioned above, the Documentation Platform comprises a main component called Core Platform. The Core Platform essentially offers a Workflow management service.

The Workflow management service is the component used to manage the activities of the different GAMPs. In detail, the Workflow management service comprises a queue for each GAMP within the platform. Every GAMP polls the Core Platform (i.e. its own queue), asking for a job and the related resources; when something is available, the GAMP starts its process. When it concludes its work, the GAMP notifies the Core Platform of the completion. The Workflow management service is built using the open source component OpenFlow (Zope).

When a GAMP is scheduled by the Core Platform to perform its process, it usually needs to retrieve the EDOBs and/or the files for the elaboration. To this end, the GAMP contacts the Core Platform, which is responsible for contacting the Essence and Metadata Storage component and retrieving that information.

The Essence and Metadata Storage (EMS) system stores the materials on the file system and tracks their location by means of a relational database. It is possible to maintain several copies of the same material, even located on different machines, accessible via suitable protocols (HTTP,
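The pull logic of section 3.1.1 (one queue per GAMP, polling for a job, notification on completion) can be sketched as follows. This is an in-memory Python illustration under stated assumptions, not the Core Platform itself: in the real system the calls go through SOAP web services and the workflow engine is OpenFlow, whereas the class `CorePlatform`, its methods `submit`, `poll` and `notify`, the helper `run_gamp` and the job names are hypothetical.

```python
from collections import deque


class CorePlatform:
    """Toy stand-in for the Workflow management service: one queue per GAMP."""

    def __init__(self, gamp_names):
        self.queues = {name: deque() for name in gamp_names}
        self.completed = []

    def submit(self, gamp_name, job):
        """Place a job in the queue of the given GAMP."""
        self.queues[gamp_name].append(job)

    def poll(self, gamp_name):
        """Return the next job for this GAMP, or None if its queue is empty."""
        queue = self.queues[gamp_name]
        return queue.popleft() if queue else None

    def notify(self, gamp_name, job):
        """Record that the GAMP has concluded its work on the job."""
        self.completed.append((gamp_name, job))


def run_gamp(platform, gamp_name, process):
    """Poll until the queue is empty, process each job, then notify completion."""
    while True:
        job = platform.poll(gamp_name)
        if job is None:
            break
        process(job)
        platform.notify(gamp_name, job)


platform = CorePlatform(["shot-detection", "annotation"])
platform.submit("shot-detection", "itrai-0044")
platform.submit("shot-detection", "ukbbc-0004")
processed = []
run_gamp(platform, "shot-detection", processed.append)
```

A real GAMP would keep polling at intervals instead of stopping when the queue is empty; the loop ends here only so that the example terminates.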
38. for the KIM engine. Moreover, it creates the HTML documents that will be exposed by the Publication Platform. Intuitively, the export GAMP is the component used to implement the communication between the Documentation and the Publication Platforms.

3. The Turnkey System

In the Introduction we observed that broadcasters often need to digitise very large audiovisual archives, requiring sophisticated information retrieval mechanisms. These broadcasters usually have their own Publication Platform; therefore they are interested in enriching the multimedia contents by adding metadata, i.e. they are interested in the Documentation Platform only. The modular structure of the MAD Platform makes it possible to satisfy the requirements of this kind of user, in the sense that one can use the Documentation Platform module alone, together with one's own system for retrieving enriched information.

It is worth noticing that many other users need to digitise very small archives. Private audiovisual archives and the archives of an academic institute typically have a small size, not comparable to the information available to a broadcaster (even a local one). Users whose goal is to manage small archives generally do not have their own Publication Platform, i.e. they typically need complete software performing the functionalities of the MA
39. he results of the query are then shown in a list (Figure 11), from which the user can select a document in order to browse it.

Figure 11: the list containing the results of a query

3.2.2 The RSS system

The Platform supplies a feature for exporting the programme in the RSS (Really Simple Syndication) format, so that it can then be read with the aid of a feed reader (Figure 12).
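To give an idea of what an RSS export involves, the following Python sketch builds a minimal RSS 2.0 document with the standard library. It is not the export code of the platform: the function `build_rss`, the channel description text and the `http://example.org/...` links are placeholders invented for the example.

```python
import xml.etree.ElementTree as ET


def build_rss(channel_title, channel_link, items):
    """Build a minimal RSS 2.0 document from (title, link, description) tuples."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Exported programme items"
    for title, link, description in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
        ET.SubElement(item, "description").text = description
    return ET.tostring(rss, encoding="unicode")


# Placeholder programme and news item.
feed = build_rss(
    "BBC News Ten O'clock news",
    "http://example.org/programme/1",
    [("News item 1", "http://example.org/item/1", "Transcript excerpt")],
)
```

Any feed reader that understands RSS 2.0 can consume the resulting string once it is served over HTTP; each news item of a programme maps naturally to one `item` element.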
40. his section the user can browse the named entities (and their categorization) found by the semantic section of the Documentation Platform.

By clicking on the white rectangles under the categories (see Figure 33), it is possible to highlight the named entities in the transcription tab and then, by clicking on them, to open a pop-up window ("Entity explorer", issued by KIM) with an ontological description of the entity based on the Knowledge Base of the KIM system (Figure 34).

Figure 33: named entities
41. 4. Metadata model (defined in WP15): as the default standard in the MAD platform, we can assume that EDOB (EDitorial Object) is the reference one; Dublin Core or PMeta can be used as well.

5. Delivery format definition. We can assume that the exchange format used for delivering data will be the same as the one used internally. Actually, WP15/16 will define it. As a starting reference we can use the EDOB schema.

Physical connections constraints

6. The Core Platform will publish its services on SOAP. More precisely, it will publish them as web services with a WSDL interface. To use them, a system will need an HTTP connection and an API for XML SOAP message marshalling/unmarshalling. For submitting/uploading essences, a system needs a file/Samba/FTP/HTTP(S) server for publishing every document it is planning to send to the platform.

3.4.2 Processing

Documentation platforms will process the EDOB schema internally and will deal with some well-defined standards such as MXF, MPEG7, PMETA and DC. These are the default document formats the platform is expected to manage. Furthermore, the platform will handle the following protocols: file, samba, ftp, http, soap.

3.4.3 Outputs

The documentation platform will provide the enriched metadata and digitized material. As final outputs we have a complete export in some format WP16 has to identify (MXF and further attachments, as MPEG7 and other
42. icted version of the one of the Publication Platform for the MAD factory described in [D18.2]; indeed, the interface gives access only to the specific information offered by the GAMPs supported by the Turnkey System.

5.3.1 How to access the Publication Platform

The Publication Platform is accessible via any browser at the following URL: http://prestospace.eurix.it/PublicationPlatform

The user has to submit a valid Username and Password.

Figure 22: The Welcome Page of the Publication Platform

After that, it is possible to access the contents of the platform.

Welcome to PrestoSpace Publication Platform

About PrestoSpace

The project's objective is to provide technical solutions and integrated systems for digital preservation of all types of audiovisual collections. The project intends to provide tangible results in the domain of preservation, restoration, storage and archive management, content description, delivery and access. Economic factors supporting preservation services will be ad
43. ion detection tools
• Visual activity extraction tools
• Speech to text transcription tools (RAI, content analysis)
• Audio structuring and segmentation tools (RAI)
• Multimedia structure detection tools
• Editorial parts segmentation tools (University of Sheffield)
• Reference video clips detection tools
• Low level visual features extraction tools

1 It is worth noticing that any kind of new GAMP can easily be added in the future. In order to insert a new GAMP, it is sufficient to add a new process queue to the Workflow engine of the Core Platform, as discussed in section 3.3.

Among the others, the Annotation GAMP allows the user to enrich an audiovisual item by manually adding metadata. For further details on all the GAMPs developed within the PrestoSpace project, we refer the reader to deliverables [D15.4], [D15.5] and [D15.6].

3.1.3 Generic GAMP

In order to add a new GAMP to the Documentation Platform architecture (e.g. one performing the extraction of further metadata), two different alternatives are available:

1. the GAMP can be implemented following the guidelines about its functioning. The new GAMP must meet all these specifications: namely, it has to publish all the needed web services and it has to implement the operations required of any GAMP (for a detailed description, see section 4.2 of this Deliverable).

2.
44. Figure 31: The Transcription tab

The Transcription tab shows the results of the speech-to-text analysis of the Documentation Platform. Every segment (corresponding to a silence or a change of news reader) is labelled with the time elapsed from the start of the programme/news. The actual segment of the EDOB in which the keyword submitted in the query has been found is also shown in the timeline section.

As in all the tabs, it is possible to perform a simple plain-text search using the input form in the upper right part of the page and clicking on the search icon.

Semantic Analysis Tab

Figure 32: Semantic Analysis Tab

Within t
45. isual items with effective levels of accuracy.

The MAD Platform is the component of the PrestoSpace project having the following objectives:

1. extracting metadata from audiovisual items;
2. offering suitable mechanisms for retrieving and accessing audiovisual contents based on metadata.

MAD stands for Metadata Access and Delivery.

The MAD Platform adopts a modular and extensible architecture. It consists of two different components:

a. The Documentation Platform
b. The Publication Platform

The MAD Platform receives digitised media (audio and video files) as input. These data are processed by the Documentation Platform, which returns different materials such as key frames, camera motions, and metadata. These materials are then indexed and published on a web server by the Publication Platform.

Broadcasters often need to digitise very large audiovisual archives, which require sophisticated information retrieval mechanisms. These broadcasters usually have their own Publication Platform, so they are interested in using the Documentation Platform only. The modular structure of the MAD Platform naturally satisfies the requirements of this kind of broadcaster.

In contrast, many other potential users need to digitise their own, much smaller audiovisual archives. As an example, a Department of a University could need to digitise some
46. Figure 40: Example of an Advanced Search (screenshot of the search form: full-text field, CLIR service option, date range, Named Entities and Titles criteria)

Clicking on the Search button, the user can select the entities to be inserted in the query.

Figure 41: Named Entities of the query (screenshot of the entity selection list)

PART C: Conclusions

6. Licensing

The software product described in this deliverable is a prototype in an advanced state of implementation. In order to engineer this software, some further implementatio
47. mentation Platform and the Publication Platform: the Export GAMP

The Documentation Platform also contains an Export GAMP, which is responsible for the communication between the Documentation Platform and the Publication Platform. In detail, the Documentation Platform uses the Export GAMP to forward the results it produces to the Publication Platform.

The Export GAMP essentially performs the following activities:

• it populates the database used by the Publication Platform;
• it generates suitable directories containing JavaScript files used by the Publication Platform;
• it creates suitable indexes for retrieving information in the Publication Platform.

All the above activities are performed in order to transfer information (i.e. the enriched metadata produced by the Documentation Platform) to the Publication Platform.

3.4 Inputs, Processing, Outputs

This section presents a schematization of the inputs, processing, and outputs of the Turnkey System.

3.4.1 Inputs

Input data

1. Editorial Object Identification
2. Preservation and Legacy metadata (data provided during the Preservation phase)
3. Digitised Material: Essence submission from the preservation factory (PRE). The essences are expected to be published by the preservation system over file, Samba, FTP, HTTP, HTTPS (implemented so far), or other protocols. They can be in any of the planned formats (MXF by default).

Input standard
48. ms expanded

Clicking the button on the right of the chosen programme or news item will display a pop-up window with the streaming video of the retrieved EDOB.

Clicking on one of the retrieved items will open a new window showing the contents produced by the Documentation Platform (Fig. 29).

[Screenshot: item detail window with a classification list, the Info, Transcription, Semantic Analysis, Content Analysis, and Related Sources tabs, and a time-stamped transcription above a timeline]
49. n Platform can be represented as shown in Figure 8.

Figure 8: the Documentation Platform

All the services offered by the Core Platform are available through web service interfaces based on SOAP. Using web services, every GAMP polls the Core Platform asking for a job, then submits the produced metadata and notifies the Workflow Manager that its work is complete. Because the interfaces are web services, GAMPs can be implemented in any programming language supporting SOAP and web service protocols.

The architecture of the Documentation Platform has the following characteristics:

• it is modular, since GAMPs can interact with the Core Platform even when they differ completely in implementation details and functionalities;
• it is extensible, in the sense that it is easy and natural to insert a new GAMP;
• it is platform independent, since the Core Platform itself is implemented in Java and is therefore portable to several architectures and operating systems;
• it is characterized by a multi-tier distribution, in the sense that every GAMP can be installed on a different physical system, provided that a network link to the Core Platform is available.

In the rest of this section we analyze each component o
50. n steps have to be performed. As an example, an improvement of the feedback machinery is needed.

7. Bibliography

[D15.4] Content Analysis Tools
[D15.5] Cross Linguistic IE Tools Analysis
[D15.6] Semantic Interpretation Tools
[D16.2] Conceptual Search
[D16.4] Delivery Models
[D18.1] The Documentation Platform for the MAD Factory
[D18.2] Publication Platform for the Results of Digitization and Documentation
[D19.0.1] External and Internal Models and Protocols for the PrestoSpace Factory
[D19.0.2] The PrestoSpace Orchestrator (PSO)

8. Glossary

Term: Description

ADMIN: The component of the Documentation Platform which allows the user to manage and monitor the activities of the Documentation Platform. It represents the interface between human users and the Documentation Platform.

Core Platform: The component of the Documentation Platform offering a workflow management service and interacting with the PSO's components EMS and CVS. This software component represents the middleware which publishes web service interfaces. It has a built-in work flow engine for managing all the activities done within the MAD platform (content analysis, semantic analysis, annotation, delivery, etc.).

CVS: Concurrent Versioning System, the system which is responsible for tracking every
51. ocument containing all the information necessary to recover the file, that is to say the protocol to use, the port to adopt, and so on.

As another example, consider phase 5, and suppose that a GAMP needs to store some files in the factory. In this case, the GAMP returns an XML document to the Core Platform; this document contains all the parameters needed to access the produced data. The Core Platform will then send this XML document to the "restricted" PSO managing the activities of the Turnkey System, which will download these new materials according to the directives of the EMS.

The interaction between a GAMP and the Core Platform can be summarized as follows:

1. The GAMP invokes getJob(queueName): XML on the Core Platform, where queueName is the name of the queue associated with that GAMP. This service returns an XML document with all the information about the job to be processed by the GAMP.

2. The GAMP invokes checkoutEDOB(idEDOB): XML on the Core Platform. By means of this operation, the GAMP asks the Core Platform to return the EDOB (an XML document) whose identifier (idEDOB) is passed as an argument.

3. The GAMP invokes getMaterial(UMIDs): XML, aiming at recovering some files needed for its metadata extraction. As usual, the Core Platform answers with an XML document. The identifiers of all needed files are stored in the list UMIDs, which is pass
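The three calls listed above can be sketched from the GAMP side as follows. This is an illustration only, not the project's client code: the real services are SOAP endpoints, whereas `StubCorePlatform` is an in-memory stand-in, and every XML element and attribute name below (`job`, `edob`, `umid`, `material`, `protocol`, `port`) is an assumption, as is the choice of reading the UMID list from the job document.

```python
import xml.etree.ElementTree as ET


class StubCorePlatform:
    """In-memory stand-in for the Core Platform services (hypothetical XML)."""

    def getJob(self, queue_name):
        # Assumed job document: the EDOB to process and the UMIDs it needs.
        return (f'<job queue="{queue_name}"><edob id="edob-001"/>'
                '<umid>umid-A</umid><umid>umid-B</umid></job>')

    def checkoutEDOB(self, id_edob):
        # The EDOB itself is an XML document; the content here is a placeholder.
        return f'<edob id="{id_edob}"><title>placeholder</title></edob>'

    def getMaterial(self, umids):
        # One entry per UMID, telling the GAMP how to fetch the file.
        items = "".join(
            f'<material umid="{u}" protocol="http" port="80"/>' for u in umids
        )
        return f"<materials>{items}</materials>"


def run_once(core, queue_name):
    """Perform getJob, checkoutEDOB and getMaterial once, in that order."""
    job = ET.fromstring(core.getJob(queue_name))
    edob_id = job.find("edob").get("id")
    umids = [element.text for element in job.findall("umid")]

    edob = ET.fromstring(core.checkoutEDOB(edob_id))
    materials = ET.fromstring(core.getMaterial(umids))

    # Map each UMID to the (protocol, port) pair needed to recover the file.
    locations = {
        m.get("umid"): (m.get("protocol"), int(m.get("port")))
        for m in materials
    }
    return edob.get("id"), locations


edob_id, locations = run_once(StubCorePlatform(), "speech_to_text")
```

A real GAMP would replace the stub with a SOAP client and would go on to download the files, run its analysis, and submit the resulting metadata.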
52. pt another work flow system (thus replacing OpenFlow), one does not need to change the overall architecture: it is only necessary to implement all the work flow interfaces by providing suitable classes. These classes will replace the ones provided for the OpenFlow engine.

Let us conclude this section with a brief remark on the physical environment of GAMPs. The GAMPs (the clients) involved in the process can run either on the same machine or on different ones. It is worth noting that the whole MAD System could be made up of a rack system in which every single machine handles some specific task. Furthermore, as discussed in section 3.7 above, one can think of having a limited number of GAMPs running on a "lightweight" Documentation Platform, requesting results from other GAMPs running on other machines. A GRID-like architecture can be provided to ensure the connections among these components.

4.1.2 Interaction between GAMPs and Core Platform

In section 3.2 we recalled the main features of GAMPs. GAMP stands for Generic Activity Metadata Processor. A GAMP implements a specific process of metadata extraction within the Documentation Platform.

In this section, we describe how a single GAMP interacts with the Core Platform (of the Documentation Platform) in order to offer its functionalities.

As a first step (1), a GA
53. Figure 34: The Entity Explorer (screenshot listing entities with subRegionOf relations, such as Mount Kilimanjaro and Mount Kenya, beside the transcription)

Content Analysis Tab

Figure 35: Content Analysis tab

Within this section, the user can see the technical information related to the video part of the EDOB. This page shows the stripe images, which combine the central column of each key frame of the shots representing the video and are useful to see changes in
54. rvices interfaces based on SOAP. Using web services, every GAMP polls the Core Platform asking for a job, then submits the produced metadata and notifies the Workflow Manager that its work is complete. Because the interfaces are web services, GAMPs can be implemented in any programming language supporting SOAP and web service protocols.

The architecture of the Documentation Platform has the following characteristics:

• it is modular, since GAMPs can interact with the Core Platform even when they differ completely in implementation details and functionalities;
• it is extensible, in the sense that it is easy and natural to insert a new GAMP;
• it is platform independent, since the Core Platform itself is implemented in Java and is therefore portable to several operating systems;
• it is characterized by a multi-tier distribution, in the sense that every GAMP can be installed on a different physical system, provided that a network link to the Core Platform is available.

The Publication Platform is the component of the MAD Platform providing retrieval and browsing functionalities. In detail, it deals with instances of documents in MAD metadata format, making them available in a web representation, and it gives access to the material sources exported from the Core Platform.

The Publication Platform comprises three main subcomponents:

• a web application, namely the user interface
55. se materials are then available for information retrieval.

The architecture of the MAD Platform is schematized in Figure 5 below.

Figure 5: the architecture of MAD (diagram labels: Digitised media, MAD, Rich Content, MPEG7, PMETA, DC, Web interface, Extracting, Retrieving)

As mentioned above, the MAD Platform adopts a modular and extensible architecture. It consists of two different components:

• The Documentation Platform
• The Publication Platform

which take care of its two fundamental goals (the red and underlined text in the white boxes of Figure 5). The resulting architecture is presented in Figure 6.

Figure 6: the architecture of MAD (2) (diagram labels: Digitised media, Documentation Platform, Rich Content, PMETA, DC, MXF, Publication Platform, Web interface)

The MAD Platform receives digitised media (audio and video files) as input. These data are processed by the Documentation Platform, which returns different materials such as key frames, camera motions, and metadata. These materials are then indexed and published on a web server by the Publication Platform. The communication between the Documentation Platform and the Publication Platform is implemented by means of an Export GAMP. This Export GAMP is responsible for inserting suitable information in the database and for creating an index
56. 1.1 Scope of this Document ... 3
1.2 Executive Summary ... 3
PART A: Architecture and Implementation of the Turnkey System ... 9
2. The MAD Platform ... 9
2.1 Architecture of MAD ... 10
3. The Turnkey System ... 13
3.1 The Documentation Platform ... 14
3.2 The Publication Platform ... 18
3.3 Communication between the Documentation Platform and the Publication Platform: the Export GAMP ... 22
3.4 Inputs, Processing, Outputs ... 22
3.5 The ADMIN Component ... 24
4. Implementation of the Turnkey System ... 27
4.1 Implementation of the Documentation Platform ... 27
4.2 Implementation of the Publication Platform ... 30
PART B: Utilization of the Turnkey System ... 32
5. How to use the Turnkey System ... 32
5.1 How to use the ADMIN component
57. titles, credits, and programme publication information;
• Editorial parts of information, i.e. information about the relevant editorial sub-items of a programme, such as news items;
• Content-related information, such as text of speech, descriptions, and low-level visual descriptive features;
• Enrichment information, coming from external sources related to the programme content.

The data model adopted to represent the above classes, together with a data format carrying all its entities and relations, consists of a single XML-based document format resulting from the combination of MPEG-7 and P_META. In more detail, MPEG-7 has been used for its powerful temporal segmentation tools and its comprehensive set of standard audiovisual descriptors, whereas P_META has been adopted in order to capture information structures for the identification, classification, and publication-related features of a programme.

Figure 4 presents a schematization of the adopted document format. Ad hoc data structures, introduced to represent information supported by neither MPEG-7 nor P_META, are emphasized. In addition, it is worth noting that the SMPTE UMID standard has been adopted in order to uniquely identify the instances of audiovisual material throughout the platform, namely original media, digitally re-mastered media, and all the material generated by the documentation process (e.g. key frames).
58. ut an EDOB

In this section we have only presented an introduction to the ADMIN component, with the aim of giving an overview of its functionalities. We do not present details on the implementation of this component, since it is a standard web application. However, in section 6.1 we will give a detailed discussion on how to use the ADMIN component by means of the web application.

4. Implementation of the Turnkey System

In this section we give further technical details about the implementation of the Turnkey System. We present detailed information about the implementation of its components, namely the Documentation Platform and the Publication Platform.

4.1 Implementation of the Documentation Platform

4.1.1 Physical Environment

In order to manage the activities of different GAMPs, the Documentation Platform makes use of a Work Flow system.

The Work Flow system can be either a commercial or an open source system. Our current implementation adopts the OpenFlow engine, running on the Zope platform. However, in order to allow possible future changes to the adopted work flow system, the Java components interacting with the work flow management system have been developed in an abstract way. More precisely, suitable interfaces are provided. Moreover, classes implementing those interfaces and referring to OpenFlow are also provided.

In order to ado
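The abstraction described above (interfaces for the work flow system, plus one set of classes per concrete engine) can be illustrated with a small sketch. The deliverable states that the actual components are written in Java; the Python below is only an illustration of the pattern, and the interface `WorkflowSystem`, its methods, and the `InMemoryWorkflow` class are hypothetical names, not the project's API.

```python
from abc import ABC, abstractmethod


class WorkflowSystem(ABC):
    """Hypothetical interface that the rest of the platform would code against."""

    @abstractmethod
    def start_instance(self, edob_id):
        """Create a workflow instance for an EDOB and return its identifier."""

    @abstractmethod
    def complete_activity(self, instance_id, activity):
        """Record that an activity of the instance has finished."""

    @abstractmethod
    def history(self, instance_id):
        """Return the recorded events of the instance, oldest first."""


class InMemoryWorkflow(WorkflowSystem):
    """Stand-in engine; an OpenFlow-backed class would implement the same methods."""

    def __init__(self):
        self._history = {}
        self._next_id = 1

    def start_instance(self, edob_id):
        instance_id = self._next_id
        self._next_id += 1
        self._history[instance_id] = [("creation", edob_id)]
        return instance_id

    def complete_activity(self, instance_id, activity):
        self._history[instance_id].append(("completed", activity))

    def history(self, instance_id):
        return list(self._history[instance_id])


def process(workflow, edob_id, activities):
    """Caller code depends only on the interface, never on a concrete engine."""
    instance_id = workflow.start_instance(edob_id)
    for activity in activities:
        workflow.complete_activity(instance_id, activity)
    return instance_id


wf = InMemoryWorkflow()
iid = process(wf, "edob-001", ["transcription"])
```

Swapping the engine then means passing a different `WorkflowSystem` implementation to `process`, without touching the caller.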
59. y system for delivering to small archives. A small-scale, stand-alone, production-quality system suitable for small archives and already configured for the publication of the preserved material. Given the intrinsic modularity of the MAD Platform, the functionalities deployed in a Turnkey System installation can be adjusted according to the user's needs. The Turnkey System will be derived from the test bed that the project is setting up to run all the experiments needed to define the specifications of the final platform.

EMS: Essence and Metadata Storage System, the system which is responsible for storing the essence within the PSO.

Documentation Platform: The component of the MAD Factory which is responsible for extracting metadata from audiovisual content by means of different GAMPs.
    