D5.3.1 – Europeana OAI-PMH Infrastructure – Documentation and final prototype
Contents
1. As Stripes will no longer be used, all the URLs will change too; e.g. the HTTP request /aggregator/CreateEditAggreator.action?preEdit will be changed to aggregatorForm.html. (NB: to go from the class name to a URL, Stripes removes "ActionBean" if it is the last part of the class name, converts the name to a path and appends ".action".)
   CRUD (Create, Read, Update, Delete) operations in REPOX have the following syntaxes, taking aggregator as an example:
   • creating a new aggregator: http://localhost:8080/repox/createAggregator.html
   • reading an aggregator: http://localhost:8080/repox/viewAggregator.html?aggregatorId=aggtestr0
   • updating an aggregator: http://localhost:8080/repox/editAggreator.html?aggregatorId=aggtestr0
   • deleting an aggregator: http://localhost:8080/repox/deleteAggregator.html?aggregatorId=aggtestr0
   For each CRUD operation, a method annotated with @RequestMapping within the Spring Controller and a JSP view have to be created.
   4.1.2 Limitations of the intended approach
   • No flash scope in Spring MVC: flash scope lets you pass a message from one page to the next, and only to the next page. Alternatively, a message could be added as a GET parameter, which is not very clean. Spring offers flash scope only at the Spring Web Flow level (Figure 6).
   • Separation of
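The naming rule in the NB above can be illustrated with a minimal sketch. It covers only the simplified rule as stated in the text (strip a trailing "ActionBean", form a path, append ".action"); it is not the Stripes implementation, it ignores the package-derived part of the path, and the class and method names are hypothetical.

```java
/** Simplified sketch of the class-name-to-URL rule quoted in the text; not Stripes code. */
final class ActionBeanUrlSketch {
    private static final String SUFFIX = "ActionBean";

    /** Drops a trailing "ActionBean", turns the remainder into a path and appends ".action". */
    static String toUrl(String simpleClassName) {
        String base = simpleClassName.endsWith(SUFFIX)
                ? simpleClassName.substring(0, simpleClassName.length() - SUFFIX.length())
                : simpleClassName;
        // Assumption: package segments are ignored here; only the simple class name is mapped.
        return "/" + base + ".action";
    }

    public static void main(String[] args) {
        System.out.println(toUrl("ViewAggregatorActionBean"));
    }
}
```

With Spring MVC no such derivation takes place: each URL is declared explicitly on the handler method, which is why every URL has to be redefined during the migration.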
2. 4.1.2 Limitations of the intended approach
   4.1.3 Status of the refactoring
   5 REPOX source code
   6 Conclusions and open issues
   1 Introduction
   The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH, http://www.openarchives.org) has become a cornerstone of the content integration strategy of The European Library (TEL, http://search.theeuropeanlibrary.org/portal/en/index.html). Millions of catalogue records from dozens of providers, primarily national libraries, have been harvested for the TEL portal. Europeana aims to provide access to the content behind the catalogue data, and not only from national libraries but from any provider of digital cultural objects in Europe, from universities to archives to museums. This will lead to a substantial increase in harvesting targets, from dozens to hundreds and eventually thousands. In order to meet the demands of this leap in harvesting scale, this task will provide the implementation of a solution for the administration of available European OAI-PMH data providers and respective dat
3. EuropeanaConnect, ECP-2008-DILI-528001
   Europeana OAI-PMH Infrastructure – Documentation and final prototype. Appendix: REPOX User Manual
   Deliverable number/name: D 5.3.1
   Dissemination level: PU
   Delivery date: 11.10.2010
   Status: Final
   Author(s): Gilberto Pedrosa (IST), Petz Georg (ONB), Cesare Concordia, Nicola Aloia (ISTI)
   The project is co-funded by the European Union through the eContentplus programme (http://ec.europa.eu/econtentplus), a multiannual Community programme to make digital content in Europe more accessible, usable and exploitable. EuropeanaConnect is coordinated by the Austrian National Library (Österreichische Nationalbibliothek).
   Distribution: 01.10.2010, Cesare Concordia, Nicola Aloia
   Approval: 17.12.2010, Jan Molendijk, EF Technical Lead
   Revisions: 0.2, Draft, Cesare Concordia, 09.10.
4. [Figure 1: REPOX context and architecture of components. Labels legible in the diagram: Administration, OAI-PMH server, REPOX Manager, Data Manager, Access Points Manager, Metadata Transformation Manager, Task Manager, FileSystem, Repox2Sip, REPOX DB storage; interfaces IHavestSource, IRegisterSchema, RegisterTask, AccessPoints; protocols OAI-PMH, Z39.50, HTTP.]
   The main component of the REPOX infrastructure is the REPOX Manager. The REPOX Manager glues together the other components by managing all the repository processes. It also provides an administration user interface to view the Service status and perform Service operations, and an interface to manage Data Providers and Data Sources. The Data Manager harvests the records from the data sources via the data source interfaces, which may be OAI-PMH, HTTP GET, Z39.50, or a folder in the file system with files in the format ISO2709, MarcXML, MarcXchange, ESE or any XML format. The method of choice for harvesting will be an XML Folder or OAI-PMH Client interface. For each Record harvested, the Access Point Manager creates and stores the indexes of the access points in a database. The Data Manager also provides a set in the REPOX OAI-PMH server for external access to the Records. The Acces
5. class CreateEditDataProviderActionBean extends RepoxActionBean
   • CreateEditDataSourceZ3950ActionBean.java: public class CreateEditDataSourceZ3950ActionBean extends CreateEditDataSourceActionBean
   • CreateEditDataSourceOaiActionBean.java: public class CreateEditDataSourceOaiActionBean extends CreateEditDataSourceActionBean
   • StatisticsActionBean.java: public class StatisticsActionBean extends RepoxActionBean
   • ImportFromFileActionBean.java: public class ImportFromFileActionBean extends RepoxActionBean
   • CreateEditAggregatorActionBean.java: public class CreateEditAggregatorActionBean extends RepoxActionBean
   • DeleteDataProviderActionBean.java: public class DeleteDataProviderActionBean extends RepoxActionBean
   • DeleteDataSourceActionBean.java: public class DeleteDataSourceActionBean extends RepoxActionBean
   • CreateEditDataSourceActionBean.java: public abstract class CreateEditDataSourceActionBean extends RepoxActionBean
   • HomepageActionBean.java: public class HomepageActionBean extends RepoxActionBean
   • ViewDataProviderActionBean.java: public class ViewDataProviderActionBean extends SchedulingActionHelper
   • ViewAggregatorActionBean.java: public class ViewAggregatorActionBean extends RepoxActionBean
   • SchedulingActionHelper.java: public class SchedulingActionHelper extends RepoxActionBean
   Figure 5: REPOX Action Beans
   4.1.1 URLs for CRUD operations
6. or indirectly by any other means.
   • Aggregator: an entity that aggregates Data Sets (Descriptive Metadata) from Data Providers, ideally through the OAI-PMH protocol, with the purpose of making them available to Europeana, also through the OAI-PMH protocol.
   • Descriptive Metadata: data about information objects relevant for Europeana that Data Providers make available for Harvesting.
   • Metadata Set: by default the same as a Data Set of Descriptive Metadata.
   • Descriptive Metadata Set: by default the same as a Data Set of Descriptive Metadata.
   • Metadata: by default the same as Descriptive Metadata.
   3 Design of REPOX
   This section describes the software architecture and design of REPOX. The design is a consequence of the requirements, both functional and non-functional, which were identified in the documents "M5.3.2 The Europeana OAI-PMH Infrastructure – Updated Specification and Design" and "M5.3.3a The Europeana OAI-PMH Infrastructure – Second Prototype". REPOX is an infrastructure to store, preserve and manage metadata sets in XML. It can play the role of a broker or other specific service in a Service Oriented Architecture. It can transparently manage data sets independently of their schemas or formats.
   3.1 Service Architecture
   cmp REPOX Component Mo EUROPEANA UI
7. 2010, Nicola Aloia; Petz Georg, 10.10.2010, Some additions; de Blees, 07.02.2011, Some formulation changes
   Executive Summary
   Task 5.3 in EuropeanaConnect aims to provide a solution for Europeana to manage metadata harvesting of thousands of digital heritage content sources across Europe, which results in the REPOX service. IST has led the task of developing REPOX, together with ONB. The purposes of REPOX are the management of the aggregators, data providers and their harvested metadata records. This document describes the specification, design and implementation of REPOX. The remainder of the document continues with the integration of REPOX in the Europeana ingestion workflow, followed by the refactoring and code review done in the REPOX source code.
   Table of Contents (entries as far as legible)
   Executive Summary
   1 Introduction
   2 Definitions
   3 Design of REPOX
   3.1 Service Architecture
   3.2 Data Structures
   3.3 Database Model
   3.3.1 Internal REPOX database
   3.3.2 Repox2Sip database
   3.4 Integrating REPOX in the Europeana ingestion workflow
   4 User Interface
   4.1.1 URLs for CRUD operations
8. [Figure residue: rotated excerpt of JSP source with nested JSTL tags, apparently part of Figure 7; the text is not recoverable.]
9. a sources. This is an absolutely essential extension of the Europeana portal that will enable large-scale import and thereby rapid attainment of a critical mass of digital content. The two lines of action that this task will take are the management of a large number of aggregators and data providers, and the management of the large quantities of metadata records made available by those data providers. As an infrastructure service, this does not benefit end users directly, but rather indirectly, by supporting the integration and efficient management of more content. The direct beneficiaries are the administrators of the Europeana portal, who will be able to manage and integrate content more efficiently and hence at a lower per-unit cost.
   For the management of the OAI-PMH data providers, the solution (REPOX) provides the following functionalities:
   • The registration of aggregators, data providers, their collection descriptions (one data provider might make more than one collection available to Europeana) and the configurations for the harvesting of the related metadata.
   • The automatic and manual harvesting of the collections by OAI-PMH, according to configurations and options provided by the data providers and the decisions of the administrators of the central service.
   • Monitoring of the quality of service of the OAI-PMH servers, including statistical reporting.
   For the management of the harvested metadata records, the following functionalitie
10. e source_data, with the delivered content dump, and content_hash; all other fields should have the default values. The algorithm for generating content_hash is a SHA-256 hash with all the linefeeds stripped. The field status can assume the following values:
   • created (default): the record is created but not yet processed in any way
   • idle: nobody is touching it, waiting for more checks
   • processing: a checker is working on this record
   • problematic: something went wrong, human intervention might save the record
   • broken: record is invalid; some check showed this ESE is not acceptable
   • verified: all checks succeeded, could be sent to production
   The mdrecord_id is an identifier used by REPOX: when no id XPath is provided, it is automatically generated by REPOX; otherwise it is extracted using the XPath expression defined in the id_q_name field of the Data set table.
   3.4 Integrating REPOX in the Europeana ingestion workflow
   The SIP Manager is the Europeana software component responsible for managing and processing harvested data (http://europeanalabs.eu/wiki/SpecificationsRhineRequirements IngestSipManager). REPOX and the SIP Manager coordinate their activity by exchanging data via a shared data store. The next paragraph describes the database schema for data and the synchronization protocol for operation
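The content_hash rule stated above (SHA-256 over the content with all linefeeds stripped) could be sketched as follows. The text does not say which character encoding is hashed, whether carriage returns count as linefeeds, or how the digest is rendered; UTF-8, stripping both "\r" and "\n", and lowercase hex output are assumptions of this sketch, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch of a content_hash computation; encoding and output format are assumptions. */
final class ContentHashSketch {
    static String contentHash(String sourceData) {
        // Assumption: both carriage returns and line feeds are treated as linefeeds.
        String stripped = sourceData.replace("\r", "").replace("\n", "");
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            // Assumption: the stripped text is hashed as UTF-8 bytes.
            byte[] hash = digest.digest(stripped.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Stripping linefeeds before hashing means that two deliveries of the same record that differ only in line wrapping yield the same hash.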
11. e to create a visual mapping of them, which is stored internally in an intermediate XML format to allow editing. The records can be retrieved by OAI-PMH in their original format (ISO2709 can only be accessed in MarcXchange, as specified previously) or in any format for which a mapping has been configured. The mapping is performed on request and not stored, because the performance impact is not noticeable. The record identifiers used in REPOX can be assigned in two ways: generated by REPOX, or extracted from each record using an XPath expression. The advantage of using extracted identifiers is that it is possible to update just the changes, because the records can be recognized by their identifier. In the diagram there is a third option, provided ids, in which all records ingested must be sent with their identifier.
   3.3 Database Model
   In REPOX the records are stored in two databases: the internal REPOX database and the Repox2Sip database.
   3.3.1 Internal REPOX database
   There are only two tables in the internal REPOX database (Figure 3) for each Data Source: a record table and a timestamp table. The record table stores an internal id, a unique id used for OAI-PMH, a deleted flag, and the value in a blob (the value is the zipped XML representation of the record). The timestamp table has the same fields except for the value, which holds the date instead of the record XML representation. There are two tables and duplicated fields for performance reasons: when the record i
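The two identifier strategies described above (generated, or extracted with an XPath expression) can be sketched with the standard javax.xml.xpath API. This is an illustration under stated assumptions, not REPOX code: the class and method names are hypothetical, a random UUID stands in for whatever generation scheme REPOX actually uses, and the record is assumed to use no XML namespaces.

```java
import java.io.StringReader;
import java.util.UUID;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Sketch: pick a record identifier by XPath extraction, or generate one when no XPath is set. */
final class RecordIdSketch {
    static String recordId(String recordXml, String idXPath) {
        if (idXPath == null || idXPath.isEmpty()) {
            // Assumption: a random UUID stands in for REPOX's generated identifier.
            return UUID.randomUUID().toString();
        }
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            String id = xpath.evaluate(idXPath, new InputSource(new StringReader(recordXml))).trim();
            if (id.isEmpty()) {
                throw new IllegalArgumentException("XPath selected no identifier: " + idXPath);
            }
            return id;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath: " + idXPath, e);
        }
    }
}
```

An extracted identifier is stable across harvests, which is what makes incremental updates possible; a generated one differs on every ingest, so a re-harvested record cannot be matched to its earlier copy.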
12. [Figure 2: REPOX domain. Class diagram residue; legible class names: edTask (truncated), Data Source, Data Source Directory, IdGenerated, IdExtracted, Access Point, Access Point Record, Access Point Timestamp, File Extract Strategy, ISO2709 File Extract, Simple File Extract, MarcXchange File Extract.]
   Aggregators are entities that aggregate Data Providers and contain information like name, name code and the homepage. Data Providers are entities with one or more collections of records (record sets), each associated to REPOX by a Data Source. They typically represent an institution, e.g. a Library. Data Sources represent the source of a record set, which is then provided by OAI-PMH. Data Sources are either OAI-PMH or Directory Importer, the former meaning that the records will be harvested from an OAI-PMH server and the latter meaning a folder in the file system. To ingest the folders in the file system, REPOX recognizes three strategies: Simple File Extract, ISO2709 File Extract and MarcXchange File Extract. Simple File Extract is the default method, where there is no processing of the XML records; the only associated logic is validation of the XML. ISO2709 File Extract and MarcXchange File Extract specifically target those formats. ISO2709 File Extract requires the file Character Set and the format variant because, even though ISO2709 is a standard, some institutions do not follow it exactly. Because ISO2709 is not an XML f
13. est can be sent to production. When REPOX is parsing the file, the request should be in the state under construction, and REPOX may abort the request and set the status accordingly; when this process is done, the status value must be changed to import completed, and after this point REPOX cannot change the data. The field status can assume the following values:
   • under construction: REPOX is creating a new request
   • import completed: REPOX ready, SIP can take control when ready
   • aborted: something went wrong
   • sip processing: SIP has found the request; REPOX may no longer delete the request
   • pending validation sign off: all records for this request completed
   • pending AIP sign off
   • creating AIP
   • AIP completed
   • RequestMDRecord table: This table links all the metadata records belonging to a given request. REPOX can only insert records in this table for requests whose status is under construction. If the request is aborted, do not remove links from this table; this task will be done manually. This table is needed because we want to maintain the history of requests for the same data set.
   • MDRecord table: This table contains the original record and all its refinements. The value of the field content_hash will be provided by the Harvester and is used to identify the ESE record. REPOX may only insert new MDRecord items and cannot change or delete existing items. The two fields that must be actually filled by REPOX ar
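The write rules for the Request status can be summarized in a small sketch. It encodes only what the text states: REPOX inserts records while the request is under construction, and from that state it may set the status to aborted or import completed. The enum and method names are hypothetical, and all later transitions are left to the SIP Manager.

```java
/** Sketch of the Request status values and the REPOX-side rules stated in the text. */
enum RequestStatusSketch {
    UNDER_CONSTRUCTION,
    IMPORT_COMPLETED,
    ABORTED,
    SIP_PROCESSING,
    PENDING_VALIDATION_SIGN_OFF,
    PENDING_AIP_SIGN_OFF,
    CREATING_AIP,
    AIP_COMPLETED;

    /** REPOX may insert Request_MdRecord links only while the request is under construction. */
    boolean repoxMayInsertRecords() {
        return this == UNDER_CONSTRUCTION;
    }

    /** REPOX may only end the construction phase: by aborting it or by completing the import. */
    boolean repoxMaySetStatus(RequestStatusSketch next) {
        return this == UNDER_CONSTRUCTION && (next == IMPORT_COMPLETED || next == ABORTED);
    }
}
```

Guarding writes on the status field is what lets REPOX and the SIP Manager share one data store without locking each other out: once a request reaches import completed, it is handed over.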
14. iceImpl.java.
   [Figure 9: pt.utl.ist.repox.web.spring.session, containing Session.java, SessionService.java and SessionServiceImpl.java]
   5 REPOX source code
   The REPOX source code is available at https://europeanalabs.eu/svn/contrib/repox. The application uses Maven, according to the Europeana guidelines. The project is organized in two packages, main and test. Inside the main package there are the java package with the REPOX, OAI and Europeana classes, the resources package, which contains all configuration files and REPOX properties, and the webapp package, which includes the documentation and the JSPs. The main tests of REPOX and Repox2Sip functionalities are in the test package. The tree of REPOX is shown in Figure 10.
   [Figure 10 residue, directory tree as far as legible:]
   repox
     doc
     legacy
     src
       main
         java: eu.europeana (core, util, web, repox2sip); org.oclc.oai (harvester, server, util); pt.utl.ist (characters, marc, repox, util)
         resources
         webapp: documentation, jsp, WEB-INF
       test
         java: eu.europeana.repox2sip, pt.utl.ist.repox
         resources: test-application-context.xml, Test-configuration.properties
     CodeReview.txt, derby.log, log.txt, po
15. ince Version 3 is using the help of annotations to handle web requests. The @Controller annotation, which indicates that the annotated class is a controller class and can handle web requests, is used for this purpose.
   • New JavaServer Pages (JSPs) for resolving views have to be created, and existing ones have to be adapted.
   • Spring form-backing beans have to be applied to represent the forms in the JSPs. For the validation, JavaBean validation (@Valid) is used; Spring 3 includes support for JSR-303.
   Footnotes:
   http://de.wikipedia.org/wiki/Model_View_Controller
   http://stripes.sourceforge.net/docs/current/javadoc/net/sourceforge/stripes/action/ActionBean.html
   http://static.springsource.org/spring/docs/3.0.x/javadoc-api/org/springframework/stereotype/Controller.html
   http://jcp.org/en/jsr/summary?id=303
   [Figure 5, first part:]
   • MapMetadataActionBean.java: public class MapMetadataActionBean extends RepoxActionBean
   • CreateEditDataSourceDirectoryImporterActionBean.java: public class CreateEditDataSourceDirectoryImporterActionBean extends CreateEditDataSourceActionBean
   • SchedulerActionBean.java: public class SchedulerActionBean extends SchedulingActionHelper
   • DeleteAggregatorActionBean.java: public class DeleteAggregatorActionBean extends RepoxActionBean
   • CreateEditDataProviderActionBean.java: public
16. ion
   «Shared» Europeana Request: data_format; data_set_id: String; id: long; request_name: String; status: RequestStatus; time_created: Date
   «Sip» Europeana Vocabulary record: content_hash; date Created; date last_modified; id: long; link: URI; source_data
   «Sip» Europeana Labels: id; label; language; term_attribute
   Figure 4: Repox2Sip data model
   A brief description of the tables whose content is managed by REPOX is given below.
   • Aggregator table: The field name_code is set by the ingestion operator (currently something like 97). The field name is a human-readable name of the aggregator.
   • Provider table: The field country contains the code that identifies the country of the provider; this code can be the two-letter ISO 3166-1 alpha-2 code (http://www.iso.org/iso/english_country_names_and_code_elements) or the string "eu" if the provider is an EU organization. The field name_code is set by the ingestion operator (currently something like 004). The type values can be:
     • Museum
     • Archive
     • Library
     • Audio Visual Archive
     • Aggregator
     • Research/educational
     • Cross-sector
     • Publisher
     • Private
   • DataSet table: The value of the language field is a code identifying the language of the data set; this can be an ISO 639 two-letter code (h
17. m.xml, README.txt, repox.iml, External Libraries
   Figure 10: The REPOX tree
   Footnote: Apache Maven is a software project management and comprehension tool (http://maven.apache.org).
   6 Conclusions and open issues
   For the final Europeana Danube release, the main addressed issues were:
   • The refactoring of the source code (a clean and easy-to-maintain version).
   • Evaluation and improvement of the performance.
   • The migration from the Stripes interfaces to Spring was started.
   • Some other improvements were implemented:
     o enable authorized users to upload a file from a local computer to REPOX and then harvest the file content in REPOX;
     o enable authorized users to download harvested files that exist in REPOX;
     o improve the visual feedback of operations, especially for harvesting (more relevant and dynamic reporting).
   Issues for future consideration:
   For pragmatic reasons, the current REPOX version uses two databases, Derby and PostgreSQL. In the future it would be better to use only one database, to avoid data replication and to increase REPOX performance. In practice, the only implication would be the creation of a new class, specific to the Postgres database, that extends the AccessPointsManager class. Comparing the performance between REPOX installation at IST using onl
18. [Figure 7: JSTL tag <c…; rotated JSP source excerpt, text not recoverable]
   4.1.3 Status of the refactoring
   The following classes: CreateEditAggregatorActionBean, ViewAggregatorActionBean, DeleteAggregatorActionBean, CreateEditDataProviderActionBean, ViewDataProviderActionBean, DeleteDataProviderActionBean (see Figure 8) have been successfully ported to the Controller classes in pt.utl.ist.repox.web.spring. As described in 4.1.2, Spring MVC offers no flash scope.
   [Figure 8: pt.utl.ist.repox.web.spring, containing AggregatorController.java, AggregatorForm.java, DataProviderController.java, DataProviderForm.java, Homepage.java, HomepageValidator.java, PropertiesController.java, PropertiesForm.java]
   In package pt.utl.ist.repox.web.spring.session (Figure 9), a Session Bean was created to simulate flash scope. The bean SessionService.java is accessed by the controller classes via the Service SessionServ
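The idea of simulating flash scope with a session-held bean can be sketched in plain Java. This is not the REPOX SessionService API, which the text does not show; the class and method names are hypothetical, and a map stands in for the HTTP session attributes. The essential property is that a stored message can be read exactly once, by the next page.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of a read-once message store, approximating flash scope on top of a session. */
final class FlashScopeSketch {
    // Assumption: this map stands in for the attributes of one user's HTTP session.
    private final Map<String, Object> sessionAttributes = new HashMap<>();

    /** Called by the controller handling the current request, before redirecting. */
    void put(String key, Object value) {
        sessionAttributes.put(key, value);
    }

    /** Called while rendering the next page; removes the value so later pages do not see it. */
    Object take(String key) {
        return sessionAttributes.remove(key);
    }
}
```

A controller would call put("message", ...) after a successful save and redirect; the view for the following request calls take("message") and gets null on any later request.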
19. ormat and REPOX only handles XML, the format is ingested as MarcXchange, because there is no data loss transforming from one to the other. In the three scenarios the files may be zipped in the file system, and they will be unzipped prior to ingestion. Data Sources can have associated Scheduled Tasks, by scheduling an ingest of records or an export of the records to the file system. Those Scheduled Tasks are handled by the Task Manager. A Task is a managed action in REPOX. Scheduled Tasks are tasks that occur at specific times with a periodicity (unique harvest, daily, weekly, and every n months). Access points (AP) enable the retrieval of the records by more than only their identifiers. For that purpose, access points are associated to Data Sources to define how to process the pertinent information for indexing. These APs are used by the AccessPointsManager (APM) to extract the relevant data from each record and build the respective indexes. Those indexes are maintained in a relational database for efficiency, as they are not part of the fundamental model. A Metadata Transformation is a translation between two metadata formats (e.g. MarcXchange/Marc21 to ESE). Every Data Source can have any number of transformations. The transformations are stored as XSLT files, even though it is possibl
20. s. Figure 4 shows a diagram of the Repox2Sip DB schema using the UML static diagram symbols: class symbols represent tables, class attributes represent fields, and associations represent relationships and their cardinality. Stereotypes above table names indicate the component owning the write permissions on the table; for instance, the table Aggregator can be modified by the REPOX component, while the table URIs can be modified by the SIP Manager. Some tables are shared and can be modified by both.
   4 User Interface
   During the period of work reported in this Deliverable, it was decided to migrate the user interface from the Stripes Framework to Spring MVC. Both frameworks are based on the Model View Controller (MVC) pattern, but different techniques regarding its implementation are used. Hence annotations are used instead of interfaces, JSP sites are refactored to Spring Tag Libraries, and form validation is applied via JavaBean validation. In detail this means:
   • 17 Stripes Action Beans (Figure 5) have to be replaced by corresponding Spring Controllers. All of the aforementioned 17 classes implement the interface ActionBean. Implementations of this interface respond to user interface events and receive information about the events (usually a form submission). Whereas Spring's web framework s
21. s
   [Figure 4 residue (Repox2Sip data model), attribute lists as far as legible:]
   • (class name not legible): content_hash: String; id: long; mdrecordId: String; pid: long; source_data: String; status: Enumeration; time_created: Date; time_last_changhe: Date; uniqueness hash: String
   • (class name not legible): annotation: String; mdrecord_id: long; userid: long
   • (class name not legible): email: String; id: long; name: String
   • «Sip» Europeana Uris and «Sip» Europeana ProcessMonitoring (attributes interleaved in the extraction): date_lastcheck: Date; err_msg: String; id: long; md_rec_id; pid; status; uri source: URI; url: URL; xmlelement: String; pid: long; role; status; time_started
   • «Sip» Europeana URISource: dns_name: String; id: long; ip_nr: String; pid: long
   • «Repox» Europeana Provider: aggregator_id: long; country: String; description: String; home_page: URL; id: long; item_type: Enumeration; name: String; name_code: String; providerId: String
   • «Shared» Europeana Request_MdRecord: id: long; md_record_id: long; request_id: long
   • «Sip» Europeana Allignment: data_blob: Object; facet_id: long; facet_link: URI; facet_type: Enumeration; facet_value: String; id: long; md_record_id: long
   • «Sip» Europeana MDRecordErro: exit_code; id: long; md record; message; task name; time_stamp
   • «Repox» Europeana DataSet: description: String; home_page: URL; id: long; id_q_name: String; inputOaiSet: String; item_type: Enumeration; language: Language; name: String; name_code: String; outputOaiSet: String; provider_id: String; q_name: String; strategy: Enumerat
22. s Points Manager manages the indexes of the Records. For performance reasons, all record content and indexes are stored in the database. An Access Point is an Index to access specific fields of the records in the database. Currently REPOX only indexes the identifiers (to access the record's content) and the timestamp. The Metadata Transformation Manager is responsible for the registration of transformations between metadata formats/schemas and for the application of those transformations between specific metadata sets. The Task Manager is the component that handles tasks with time constraints. There is a background thread that checks for tasks that need to start and launches them when necessary. Repox2Sip is the component responsible for the integration between REPOX and the Europeana Sip Manager, which is the ingestion tool responsible for creating the Submission Information Packages (SIP). The integration process is described in detail in section 3.4. When available, the REPOX Service will be able to use the XSLT transformations provided by the Europeana Metadata Registry (EUMDR) to perform transformations from each original data format into the ESE profile.
   3.2 Data Structures
   [Figure 2 diagram residue:] class REPOX Classes: Aggregator, TaskManager (manages), Data Provider, Metadata Format (destination), Data Source, Schedul
23. s are provided:
   • Support for multiple metadata formats.
   • A metadata repository service for making the harvested metadata records available to Europeana.
   • A scalable solution able to hold a large number of aggregators and OAI-PMH data providers, with a virtually unlimited number of records.
   2 Definitions
   This document uses the following definitions (some of the definitions are redundant, in recognition of other usages that are also commonly found in related documents and bibliography):
   • Data Provider: an entity that contributes Descriptive Metadata Sets to Europeana.
   • Data Source: a service or location under the responsibility of a Data Provider from which it is possible to harvest at least one Data Set.
   • Data Set: a defined group (usually called a collection) of one or more Data Records.
   • Data Record: an instance of a structure of data attributes defined according to a Data Schema.
   • Data Model/schema: the definition of the data elements and the rules governing the use of these data elements to describe a resource.
   • Harvest: the process of collecting Data Sets from a Data Provider.
   • Data Provider: an entity that has relevant resources online and decides to make their respective descriptive metadata available to Europeana, directly through the OAI-PMH protocol
24. s not needed, only the timestamp table is used. There is another reason for using two tables: they represent indexes of the record, so if it is necessary to have another index for the record in the future, another table would be added with the value for that index. The tables are dynamically created by prefixing the Data Source identifier. This way every Data Source has its indexes separated, which makes managing (creating, editing and deleting) and accessing the data faster and easier.
   [Figure 3: REPOX database (class REPOX Database)]
   DB Table DataSource_record: id: long (PK); nc (name partly illegible): string (UNIQUE); deleted: boolean; value: blob
   DB Table DataSource_timestamp: id: long (PK); nc (name partly illegible): string (UNIQUE); deleted: boolean; value: date
   3.3.2 Repox2Sip database
   Tables in the Repox2Sip database contain information needed to create Submission Information Packages and information needed to synchronize the integration with the Sip Manager.
   [Figure 4 residue, class Europeana Database:]
   «Repox» Europeana Aggregator: aggregatorID long; home_page String; id URI; name String; name_code String
   «Sip» Europeana UserContribution: all the ESE field
25. ttp://www.loc.gov/standards/iso639-2 or the string "mul" for multiple languages. Only UTF-8 encoding is accepted for harvested files. The value of the field q_name is the qualified name used by the provider to identify the root element of the metadata record. The field id_q_name is the XPath expression identifying the record id; if the value is empty, REPOX will generate the identifier automatically. The name is the name of the data set as sent by the provider (it may be empty); the name_code is the name created by Europeana (e.g. 03901_Ag FR MCC joconde). The field file_name contains the original request as a single file, for traceability. The description is a mandatory REPOX field that describes the data set; it is used by the OAI server. The strategy field indicates the ingestion strategy adopted by the harvester for a specific data set. Possible values:
• DataSourceDirectoryImporter
• DataSourceOai
• DataSourceZ3950
About item_type: currently ESE is the only accepted metadata format, but in the future we will almost certainly extend the harvesting to other formats such as LIDO. The values of the oai_set field are defined here: http://www.openarchives.org/OAI/openarchivesprotocol.html#Set

• Request table
This table identifies a specific harvesting for a given data set. It also indicates if this requ
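The id_q_name rule described above (use the expression when it is set, generate an identifier when it is empty) can be sketched as below. This is an illustration under stated assumptions rather than REPOX code: the function name is hypothetical, Python's `xml.etree.ElementTree` supports only a limited XPath subset, the generated identifier format (a random UUID) is an assumption, and raising an error when a configured expression matches nothing is also an assumption, since the text does not say what REPOX does in that case.

```python
import uuid
import xml.etree.ElementTree as ET


def record_identifier(record_xml, id_q_name):
    """Return the record id located by id_q_name, or a generated one."""
    if not id_q_name:
        # Empty expression: the identifier is generated automatically.
        return uuid.uuid4().hex
    root = ET.fromstring(record_xml)
    node = root.find(id_q_name)  # path is evaluated relative to the root element
    if node is None or not (node.text or "").strip():
        # Assumed behaviour: a configured expression must match a non-empty value.
        raise ValueError(f"no record id found with expression {id_q_name!r}")
    return node.text.strip()
```

For a record such as `<record><id>abc-1</id></record>` with id_q_name set to `id`, the function returns the text of the `id` element; with an empty id_q_name it returns a fresh identifier on every call.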
26. user interface and program logic: There is too much programming logic in the UI layer. The JSTL tag <c:if> is used too often (Figure 7). This makes the JSP code too complicated.
• JSF vs JSP: To simplify the development of web-based user interfaces, JSF should be used rather than JSP. But this means porting REPOX from a request-based framework (Stripes or Spring MVC) to a component-based framework.

Figure 6: The Spring Web Flow 2 Distribution (Spring Faces, Spring Web Flow, Spring JavaScript, Spring Web MVC)

A FlashScope is an object that can be used to store objects and make them available as request parameters during this request cycle and the next one. http://stripes.sourceforge.net/docs/current/javadoc/net/sourceforge/stripes/controller/FlashScope.html

[Figure 7: JSP excerpt with nested JSTL conditional tags]
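The flash scope behaviour defined in the footnote above (objects are available during the current request cycle and the next one, then disappear) can be illustrated with a small, framework-independent sketch. The class and method names are hypothetical; this is neither the Stripes FlashScope API nor a Spring Web Flow API.

```python
class FlashScope:
    """Holds values for the current request cycle and the following one."""

    def __init__(self):
        self._current = {}  # values visible during this request
        self._next = {}     # values carried over to the next request

    def start_request(self):
        # Values stored during the previous request become visible once more;
        # anything older is dropped.
        self._current, self._next = self._next, {}

    def put(self, key, value):
        # Visible immediately and on the next request only.
        self._current[key] = value
        self._next[key] = value

    def get(self, key, default=None):
        return self._current.get(key, default)
```

A typical use is a confirmation message after a redirect: the request that saves an aggregator stores the message, the page rendered by the following request displays it, and a later page reload no longer shows it.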
27. y Derby database and the REPOX installation at Europeana, which uses Repox2Sip and Derby, we find that the Europeana installation is much slower; the performance of Repox2Sip should be improved in the near future.