
D4.2.1 Information and Data Lifecycle Management_ - iot


Contents

1. D4.2.1 Information and Data Lifecycle Management: Software prototype (Initial)
Cultivate resilient smart Objects for Sustainable city applicatiOnS
Grant Agreement No. 609043
SEVENTH FRAMEWORK PROGRAMME
www.iot-cosmos.eu
WP4 Information and Data Lifecycle Management
Due date: 30/6/2014
Delivery date: 24/7/2014
Nature: P
Dissemination level: PU
Lead partner: IBM
Authors: Jozef Krempasky (ATOS), Achilleas Marinakis (NTUA), Eran Rom (IBM), Paula Ta-Shma (IBM)
Internal reviewers: Adnan Akbar (Univ. Surrey)
Date: 19/07/2014 Grant Agreement number: 609043 Page 1 of 31
The research leading to these results has received funding from the European Community's Seventh Framework Programme under grant agreement n° 609043.
Version Control
Version | Date | Author | Author's Organization | Changes
0.1 | 17/07/2014 | Paula Ta-Shma and co-authors | IBM | First version for internal review
0.2 | 23/07/2014 | Paula Ta-Shma and co-authors | IBM | Second version incorporating review comments
Annexes
No. | File Name | Title
Contents
1.1 About this deliverable
2. 3. OpenStack Swift: Apache 2.0
Open source Python modules:
1. pyparsing: MIT License
2. pyes: new BSD licence
3. pika: Mozilla Public Licence version 1.1 and GPL v2.0 or newer
5.2.5 Download
Selected source code is available on the COSMOS SVN under SourceCode/M10/Prototypes/WP4/CloudStorage. The metadata search source code should be considered confidential, i.e. accessible only by COSMOS partners and reviewers from the EU.
6 Cloud Storage Storlets
6.1 Implementation
6.1.1 Functional description
Storlets are computational objects that run inside the object store system. Conceptually, they can be thought of as the object store equivalent of database stored procedures. Please see section 4.4 of the D4.1.1 "Information and Data Lifecycle Management: Design and open specification (Initial)" document regarding motivation and main innovations.
6.1.1.1 Fitting into overall COSMOS solution
Data in COSMOS will be stored as objects in the cloud storage. Examples of such objects are energy and temperatur
3. Download
6 Cloud Storage Storlets
6.1 Implementation
6.1.1 Functional description
6.1.2 Technical description
6.2 Delivery and usage
6.2.1 Package information
6.2.2 Installation instructions
6.2.3 User Manual
6.2.4 Licensing information
6.2.5 Download
7 Cloud Storage Security and Privacy
7.1 Implementation
7.1.1 Functional description
7.1.2 Technical description
7.2 Delivery and usage
7.2.1 Package information
7.2.2 Installation instructions
4. appears below. For examples, take a look at the json files in the cluster_config directory of swift_deployment.
• cluster_wide_constants: a json file representing the cluster-wide constants. For an example, take a look at the cluster_wide_constants file in swift_deployment. This constants file replaces the constants previously kept in the package deployment constants.py file.
• local_install.sh: the script to be run on each cluster node. It can be found in the deploy directory.
6.2.3 User Manual
Overview
This section describes how to write, deploy and execute a storlet. The instructions are user oriented and assume you already have a storlet-enabled Swift cluster deployed. Storlets can be invoked as follows:
1. As part of a GET, where the data of the object named in the GET request is the storlet's input and the response to the GET request is the storlet's output.
2. As part of a PUT, where the request body is the storlet's input and the actual object saved in Swift is the storlet's output.
How to Write a Storlet
In this paragraph we cover:
1. How to write a storlet
2. The best practices of writing a storlet
3. StorletTranscoder: an example of a storlet
Writing a Storlet
Storlets can currently only be written in Java. To write a storlet you will need the storletcommonapi-10.jar, which is built as part of the installation process. Import the jar into a Java project in Eclipse and implement the com.ibm.storlet.common.IStorlet interface. The interface has a
5. of the storlets and metadata search components.
7.1.2.2 Components description
This work is not a separate component, but rather is part of the storlets and metadata search components.
7.1.2.3 Technical specifications
The facial blurring privacy-preserving storlet uses the OpenCV open source computer vision and machine learning software library. See http://opencv.org. Storlet sandboxing has been implemented as part of the storlets implementation using LXC containers. Metadata search security has not yet been implemented and will be implemented in a later stage of the project.
7.2 Delivery and usage
7.2.1 Package information
Please see the relevant sections for storlets and metadata search.
7.2.2 Installation instructions
Please see the relevant sections for storlets and metadata search.
7.2.3 User Manual
Please see the relevant sections for storlets and metadata search.
7.2.4 Licensing information
Please see the relevant sections for storlets and metadata search. In addition, OpenCV is released under a BSD licence.
7.2.5 Download
Please see the relevant sections for storlets and metadata search.
8 Conclusions
This document describes the prototypes for the Information and Data Lifecycle Management Work Package. Each component has been i
6. 4.1.1 Functional description
The COSMOS platform needs to interoperate with Virtual Entities provided by different vendors. These entities may run on a variety of platforms, and they need to exchange information, such as experience, among themselves as well as with COSMOS subsystems. The message bus connects independent components through a message exchange mechanism.
4.1.1.1 Fitting into overall COSMOS solution
The purpose of the message bus is to integrate COSMOS components as well as external components such as Virtual Entities. For demo purposes, components will exchange messages according to a static message routing configuration between publishing and listening components.
4.1.2 Technical description
4.1.2.1 Prototype architecture
Figure 5: Message bus overview (1. Publish, 2. Receive)
From a high-level perspective, two roles interact with the message bus: a producer and a consumer. A producer sends messages and a consumer receives messages. The high-level architecture of the message bus is described further in chapter 4.6, Message Bus, of the D4.1.1 document.
4.1.2.2 Components description
No custom components are introduced for the prototype. Data format adapters are described in section 4.6.2 of the D4.1.1 document.
4.1.2.3 Technical specifications
RabbitMQ is implemented on top of the Erlang virtual runtime.
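The static message routing used for the demo can be pictured as a fixed table from topics to listening components. The following is a minimal in-memory sketch of that idea only. It does not use RabbitMQ, and the `StaticRouter` class, the topic name and the consumer names are hypothetical, introduced purely for illustration.

```python
from collections import defaultdict


class StaticRouter:
    """In-memory stand-in for a message bus with a fixed routing table.

    Hypothetical illustration only; it does not use or mimic the RabbitMQ API.
    """

    def __init__(self, routes):
        # routes maps a topic name to the consumers that listen on it.
        self._routes = {topic: list(names) for topic, names in routes.items()}
        self._queues = defaultdict(list)

    def publish(self, topic, message):
        """Deliver a message to every consumer statically bound to the topic."""
        consumers = self._routes.get(topic, [])
        for name in consumers:
            self._queues[name].append(message)
        return len(consumers)

    def receive(self, consumer):
        """Return and clear the messages queued for one consumer."""
        messages = self._queues[consumer]
        self._queues[consumer] = []
        return messages


# Assumed example: one producer topic routed to two listening components.
router = StaticRouter({"ve.experience": ["cep", "planner"]})
delivered = router.publish("ve.experience", {"bus": 42})
```

Because the table is fixed at construction time, adding a listener means changing configuration rather than code, which is the property the demo setup relies on.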
7. 7 Cloud Storage Security and Privacy
7.1 Implementation
7.1.1 Functional description
Note that this section describes work belonging to the WP3 work package. According to the DoW it also belongs in this document, because its prototype source code is closely tied to the prototype source code of the Cloud Storage components, which belong to WP4.
There are many important security and privacy aspects related to cloud storage. We mention here 3 security and privacy aspects of the cloud storage components developed for COSMOS:
1. Privacy-preserving storlets. Examples are:
• A facial blurring storlet, which operates on images stored in the cloud storage. It detects human faces and blurs the details so that the person cannot be identified.
• A storlet which masks exact street addresses and reveals only the neighbourhood or postcode.
• A storlet which masks the exact GPS location and reveals only an approximate location.
2. Sandboxing of storlets.
• Storlets are sandboxed using Linux containers and are only given access to the storage objects they are authorized to access. They are not given permissions to access the network or the file system of the underlying container. This allows running possibly buggy or potentially malicious code, written by a wide range of users, on the cloud storage system while still protecting the system as a whole as well as the rest o
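The GPS-masking idea mentioned above can be illustrated with a short sketch. Real storlets are written in Java; Python is used here only to show the transformation itself. The function name `mask_location`, the record layout and the choice of two decimal places are all assumptions made for this example, not part of the COSMOS prototype.

```python
def mask_location(record, decimals=2):
    """Return a copy of the record with its coordinates coarsened.

    Hypothetical helper. Rounding to two decimal places keeps latitude to
    within roughly one kilometre, which hides the exact position.
    """
    masked = dict(record)  # leave the stored (raw) record untouched
    masked["lat"] = round(record["lat"], decimals)
    masked["lon"] = round(record["lon"], decimals)
    return masked


# Assumed example reading; the field names are illustrative.
reading = {"bus": "line-27", "lat": 51.539012, "lon": -0.142634}
approx = mask_location(reading)
```

Returning a copy mirrors the approach described in the text: the complete raw data stays in the cloud storage and only the filtered version is handed to the user.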
8. Swift URL, again using the pre-configured account:
    http://sde.softlayer.com/AUTH_2dc1440a41e94fc696bced36c6e3c249/my_container/example.pdf
Here is how we can invoke the storlet using curl, where auth_header is the X-Auth header used with Swift:
    curl -i -X GET http://sde.softlayer.com/v1/AUTH_2dc1440a41e94fc696bced36c6e3c249/my_container/example.pdf -H "<auth_header>" -H "X-Run-Storlet: storlettranscoder-10.jar"
Note the extra header X-Run-Storlet, specifying the name of the storlet to execute. When this header is specified, the storlet engine WSGI middleware intercepts the request, activates the storlet and returns the computation result as a response.
To invoke a storlet whose logs will be available as an object, use the command below. Note that a container named storletlog needs to be created under the account prior to this.
    curl -i -X GET http://sde.softlayer.com/v1/AUTH_2dc1440a41e94fc696bced36c6e3c249/my_container/example.pdf -H "<auth_header>" -H "X-Run-Storlet: storlettranscoder-10.jar" -H "X-Storlet-Generate-Log: True"
Once the storlet has been executed with the generate log header set to True, one can download the resulting log object as follows. Note that the object name is derived from the storlet name by truncating the version number suffix and adding a log suffix.
c
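For readers who prefer Python to curl, the same kind of invocation can be sketched with the standard library. This is only an illustration under stated assumptions: the helper name `build_storlet_get`, the host `swift.example.org`, the account path and the token are placeholders, and `X-Auth-Token` is assumed to be the authentication header in use. The request is built but not sent.

```python
import urllib.request


def build_storlet_get(object_url, auth_token, storlet_name, generate_log=False):
    """Build (without sending) a Swift GET that asks for a storlet to be run.

    Hypothetical helper; header names follow the curl examples in the text.
    """
    headers = {
        "X-Auth-Token": auth_token,      # assumed Swift authentication header
        "X-Run-Storlet": storlet_name,   # name of the storlet jar to execute
    }
    if generate_log:
        # Ask the engine to store the storlet's log as an object.
        headers["X-Storlet-Generate-Log"] = "True"
    request = urllib.request.Request(object_url, headers=headers, method="GET")
    return request, headers


# Placeholder URL and token, for illustration only.
request, headers = build_storlet_get(
    "http://swift.example.org/v1/AUTH_account/my_container/example.pdf",
    "hypothetical-token",
    "storlettranscoder-10.jar",
    generate_log=True,
)
```

Passing `request` to `urllib.request.urlopen` would send it; the response body would then be the storlet's output rather than the stored object.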
9. UTH_PORT = '5000'
ACCOUNT = 'service'
USER_NAME = 'swift'
PASSWORD = 'password'
os_options = {'tenant_name': ACCOUNT}
url, token = c.get_auth('http://' + AUTH_IP + ':' + AUTH_PORT + '/v2.0', ACCOUNT + ':' + USER_NAME, PASSWORD, os_options=os_options, auth_version='2.0')
put_storlet_object(url, token, 'storlettranscoder-10.jar', '/tmp', 'com.ibm.storlet.transcoder.TranscoderStorlet', 'commons-logging-1.1.3.jar,fontbox-1.8.4.jar,jempbox-1.8.4.jar,pdfbox-app-1.8.4.jar')
put_storlet_dependency(url, token, 'commons-logging-1.1.3.jar', '/tmp')
put_storlet_dependency(url, token, 'fontbox-1.8.4.jar', '/tmp')
put_storlet_dependency(url, token, 'jempbox-1.8.4.jar', '/tmp')
put_storlet_dependency(url, token, 'pdfbox-app-1.8.4.jar', '/tmp')
How to Execute a Storlet
Once the storlet and its dependencies are deployed, the storlet is ready for execution and can be invoked. Invocation via PUT and GET involves adding an extra header to the original Swift PUT/GET requests. Below we invoke the TranscoderStorlet in both PUT and GET. Let us assume that we have uploaded a pdf document called example.pdf to a container called my_container, as appearing in the following
10. 1.2 Document Structure
2 Complex Event Processing
2.1 Implementation
2.1.1 Functional description
2.1.2 Technical description
2.2 Delivery and usage
2.2.1 Package information
2.2.2 Installation instructions
2.2.3 User Manual
2.2.4 Licensing information
2.2.5 Download
3 Data Mapping
3.1 Implementation
3.1.1 Functional description
3.1.2 Technical description
3.2 Delivery and usage
3.2.1 Package information
3.2.2 Installation instructions
11. 2.1.2 Technical description
The ability to dynamically change the evaluation of Complex Event Processor rules is provided by the CEP Management service via a REST API.
2.1.2.1 Prototype architecture
Figure 2: CEP Management service architecture (elements shown: Application / VE, TCP/IP middleware, CEP Management service, Jersey RESTful web services framework, Apache Tomcat host)
Figure 2 shows the architecture of the CEP Management service in more detail. For the high-level architecture, please see the D4.1.1 document.
2.1.2.2 Components description
For communication with external clients, the CEP Management service uses the Jersey [2] framework, which supports exposing data in a variety of representation media types without the need to implement low-level communication details. For hosting we decided to use the Apache Tomcat [3] web server, which is directly supported by the Jersey framework, on a 64-bit Linux operating system. In order to support distributed CEP deployment, administration and modification of all running CEP instances is controlled through a single CEP Management service.
2.1.2.3 Technical specifications
The prototype will be deployed and executed on a single 64-bit Linux operating system. Virtual entities and applications will be deployed on their
12. che Tomcat [3] web server.
2.1.2.5 Complex Event Processor
In order to achieve a high and stable event evaluation rate as well as low end-to-end latency, the Complex Event Processor is implemented in C. Distributed CEP components use enhanced middleware based on ZeroMQ [4] for fast and reliable data transfer.
Figure 4: Extensibility via plugins (elements shown: Event Collector, Event Detector and Event Publisher, each extended by network protocol, data format conversion or event evaluation plugins, connected through the communication network)
Figure 4 illustrates how the CEP can be extended, which is primarily achieved through an SPI plugin mechanism. New JSON data format plugins have been introduced for demo purposes.
2.2 Delivery and usage
2.2.1 Package information
The delivery of CEP contains the following files:
• General files
o Manifest.txt
o License.txt
• Configuration files
o Config solcep conf xml: default configuration file
o Config solcep detector xml: standalone detector configuration
o Config solcep collector xml: remote event collector configuration
o Config solcep publisher xml: remote event publisher configuration
o Config detect dolce: Dolce detecti
13. From the client perspective, RabbitMQ provides official support for all mainstream operating systems and programming languages. In addition, the RabbitMQ community has created numerous adapters and tools for specialized tasks such as integration with other existing platforms.
4.2 Delivery and usage
4.2.1 Package information
RabbitMQ is not delivered as a standalone package but is installed from an OS-specific repository.
4.2.2 Installation instructions
1. As mentioned in the previous chapter, RabbitMQ runs on top of the Erlang [1] virtual runtime. It is therefore necessary to install Erlang before installing RabbitMQ.
2. Install the RabbitMQ server. Packages are available at http://www.rabbitmq.com/download.html or in an OS-specific repository. By default the RabbitMQ server is installed as an OS service.
3. Install the RabbitMQ Management Plugin. The purpose of this plugin is to provide web-based management functionality.
4.2.3 User Manual
The manual describing how to connect to RabbitMQ and exchange messages is available online at http://www.rabbitmq.com/documentation.html.
4.2.4 Licensing information
RabbitMQ is licensed under the Mozilla Public License.
4.2.5 Do
14. d prototype introduces the ability to change the evaluation of Complex Event Processor rules dynamically, at run time. The main motivation for implementing this prototype is to increase the applicability and flexibility of the Complex Event Processor for the event detection and monitoring features provided by COSMOS. These features are further described in WP6.
2.1.1.1 Fitting into overall COSMOS solution
Figure 1: Situation monitoring and detection subsystem (elements shown: VE Developer, XML Configuration, Complex Event Processor subsystem with Event Source and Event Sink, Virtual Entity, Message Bus Topic Exchange, Decision making Services)
As depicted in figure 1, the prototyped solution for situation detection and monitoring collects information mainly from virtual entities, either directly or through the message bus. Detected situations are consumed by applications or by decision-making components within COSMOS, such as the decentralized management described by WP5. More detailed technical information can be found in the D4.1.1 document.
15. e data, do a getStream to get a java.io.InputStream on which you can just read.
The StorletOutputStream is a base class for the StorletObjectOutputStream. A storlet is never invoked with an instance of the base class itself. In the PUT scenario the storlet is called with an instance of StorletObjectOutputStream. You will first need to call the setMetadata function to set the appropriate metadata of the to-be-created object, and then use getStream to get a java.io.OutputStream on which you can call write with the content of the object. It is important to note that metadata cannot be set once you have started to stream out data via the java.io.OutputStream. Also, the metadata must be set at most 40 seconds from invocation, otherwise a timeout occurs.
The StorletLogger class supports a single method called emitLog, which accepts only a String. Each invocation of the storlet results in a newly created object that contains the emitted logs. This object is located in a designated container called storletlog and carries the name <storlet_name>.log. Creating an object containing the logs per request has its overhead. Thus, the actual creation of the logs object is controlled by a header supplied during storlet invocation. More information is given in the storlet execution section below.
When invoked via the Swift GET REST API (exact details b
16. e data for a building in Camden for a particular week, or the movements of buses in a Madrid bus line over the course of a particular day. Another example of a data object is images uploaded to the COSMOS system, for example pictures or video taken by a bus camera or by COSMOS users' mobile phones. These objects are stored in OpenStack Swift cloud storage. We augment this cloud storage with storlets, which enable computation to take place close to the data objects. For example, storlets could perform privacy-preserving filtering operations, or could be used to prepare data for visualization or reporting purposes. Storlets can also be used to pre-process data before it is fed into an analytics or machine learning computation. Alternatively, the machine learning computation could itself be run as a storlet.
The use of storlets has several advantages in the COSMOS context:
• Avoid sending large amounts of data across the network: apply storlets to send only the data which is necessary to send. For example:
o pre-process data, thereby reducing its size, and perform some needed calculations before sending it to machine learning for further processing
o apply machine learning algorithms as storlets directly to the data and avoid the need to send data across the network altogether
o prepare data for visualization. Such data may be presented to the user by a browser, or new objects may be created for visualization purposes. In the latter case, if these objects were to be cr
17. eated outside the cloud storage, the corresponding data would need to be sent across the network in both directions, assuming it needs to be retained in the cloud storage for future use. This can be avoided using storlets.
• Apply privacy-preserving filters, so that only privacy-filtered data leaves the cloud storage.
o Such filters could transform or hide certain information. Examples of privacy-preserving storlets are described later in this document.
A diagram of how storlets (Analysis Close to the Data) fit into the WP4 architecture can be found in section 3, High Level Architecture, of the D4.1.1 "Information and Data Lifecycle Management: Design and open specification (Initial)" document. In addition, deliverable D2.3.1 discusses the overall COSMOS architecture.
6.1.2 Technical description
6.1.2.1 Prototype architecture
The prototype is built of the following components (see figure 8 in section 4.4.2.3 of the M8 scientific report document):
1. A Swift cluster augmented with a middleware plug-in that allows invoking the storlet processing.
2. A Linux container that runs on each of the cluster nodes.
3. A per-storlet daemon: a JVM process that runs inside the Linux container. The daemon loads the storlet code on startup and awaits execution requests.
4. An agent running in
18. ection below.
Best Practices of Storlet Writing
• Storlets are tailored for stream processing, that is, they process the input as it is read and produce output while still reading. For example, a merge sort of the content of an object is not a good fit for a storlet, as it requires reading all the content into memory. Random reads are not an option, as the input is provided as a stream.
• While we currently do not place any restrictions on the CPU usage or memory consumption of a storlet, reading a large object into memory or doing very intensive computations will have an impact on overall system performance.
• Once the storlet has finished writing the response, it is important to close the output stream. Failing to do so will result in a timeout.
• With the current implementation, a storlet must start to respond within 40 seconds of invocation. Otherwise Swift will time out.
• The call to setMetadata must happen before the storlet starts streaming out the output data. Note that the 40 second timeout applies here as well.
• While this might be obvious, it is advisable to test the storlet prior to its deployment.
• Storlets are executed in an OpenJDK 7 environment. Thus, any dependencies that the storlet code requires which are outside of OpenJDK 7 should be stated as storlet dependencies and uploaded with the storlet. Exact details are found in the deployment section below.
Storlet Deployment
Storlet Deployment Principles
Storlet deployment i
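The stream-processing practice described above (read a piece, write a piece, close the output when done) can be sketched in a few lines. Storlets themselves are written in Java against the IStorlet interface; the Python below only illustrates the pattern, and the function name `stream_transform`, its parameters and the chunk size are assumptions made for this example.

```python
import io


def stream_transform(in_stream, out_stream, transform, chunk_size=64 * 1024):
    """Copy in_stream to out_stream chunk by chunk, applying transform.

    Hypothetical illustration of the streaming pattern: memory use is
    bounded by chunk_size, and the output stream is always closed.
    """
    try:
        while True:
            chunk = in_stream.read(chunk_size)
            if not chunk:  # an empty read signals end of input
                break
            out_stream.write(transform(chunk))
    finally:
        # Closing the output matters: leaving it open is what the text
        # warns will end in a timeout.
        out_stream.close()


# Example input held in memory purely for demonstration.
source = io.BytesIO(b"temperature,energy\n" * 3)
```

A transform that needs the whole object at once, such as a sort, does not fit this shape, which is the reason the text advises against it.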
19. from swiftclient import client as c

def put_storlet_object(url, token, storlet_name, local_path_to_storlet, main_class_name, dependencies):
    # Uploading to the same name replaces any previously deployed storlet.
    metadata = {'X-Object-Meta-Storlet-Language': 'Java',
                'X-Object-Meta-Storlet-Interface-Version': '1.0',
                'X-Object-Meta-Storlet-Dependency': dependencies,
                'X-Object-Meta-Storlet-Object-Metadata': 'no',
                'X-Object-Meta-Storlet-Main': main_class_name}
    # A jar is binary data, so open it in binary mode.
    f = open('%s/%s' % (local_path_to_storlet, storlet_name), 'rb')
    content_length = None
    response = dict()
    # Storlet jars go to the container named 'storlet'.
    c.put_object(url, token, 'storlet', storlet_name, f, content_length,
                 content_type='application/octet-stream', headers=metadata,
                 response_dict=response)
    print(response)
    f.close()
    status = response.get('status')
    assert (status == 200 or status == 201)

def put_storlet_dependency(url, token, dependency_name, local_path_to_dependency):
    # The version header is mandatory even though it is not parsed yet.
    metadata = {'X-Object-Meta-Storlet-Dependency-Version': '1'}
    f = open('%s/%s' % (local_path_to_dependency, dependency_name), 'rb')
    content_length = None
    response = dict()
    # Dependency jars go to the container named 'dependency'.
    c.put_object(url, token, 'dependency', dependency_name, f, content_length,
                 content_type='application/octet-stream', headers=metadata,
                 response_dict=response)
    print(response)
    f.close()
    status = response.get('status')
    assert (status == 200 or status == 201)

AUTH_IP = '127.0.0.1'
A
20. elow), the invoke method will be called as follows:
1. The inStreams array would include a single element representing the object to read.
2. The outStreams would include a single element representing the response returned to the user. Anything written to the output stream is effectively written to the response body returned to the user's GET request.
3. The parameters map includes the execution parameters sent. These parameters can be specified in the storlet execution request, as described in the execution section below. IMPORTANT: Do not use parameters that start with storlet_; these are kept for system parameters that the storlet can use. Currently we have storlet_execution_path, which carries the full path, as seen by the code running in the container, of the location where the storlet code runs. This is also where all dependencies reside.
4. A StorletLogger instance.
When invoked via the Swift PUT REST API, the invoke method will be called as follows:
1. The inStreams array would include a single element representing the object to read.
2. The outStreams would include a single element, which is an instance of StorletObjectOutputStream.
3. The parameters and StorletLogger are as in the GET call.
The compiled class that implements the storlet needs to be wrapped in a jar. This jar must not include the storletcommonapi-1.0.jar. Any jars that the class implementation depends on should be uploaded as separate jars, as shown in the deployment s
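A client that builds the parameters map could guard against the reserved names before sending a request. The sketch below is a hypothetical client-side check, not part of the storlet engine: the function name `check_user_parameters` is invented for illustration, and the reserved prefix is assumed to be `storlet_`, following the `storlet_execution_path` example.

```python
RESERVED_PREFIX = "storlet_"  # assumed spelling of the reserved prefix


def check_user_parameters(params):
    """Reject user parameters whose names collide with system parameters.

    Hypothetical helper: returns a copy of params, or raises ValueError
    naming every offending key.
    """
    reserved = sorted(
        name for name in params if name.lower().startswith(RESERVED_PREFIX)
    )
    if reserved:
        raise ValueError(
            "parameter names reserved for the system: " + ", ".join(reserved)
        )
    return dict(params)


# Illustrative user parameters; the names are made up for this example.
accepted = check_user_parameters({"width": "640", "format": "txt"})
```

Failing early on the client keeps a user-supplied value from shadowing a system parameter the storlet may rely on.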
21. eperated list of dependent jars. In our case: commons-logging-1.1.3.jar, fontbox-1.8.4.jar, jempbox-1.8.4.jar, pdfbox-app-1.8.4.jar
• X-Object-Meta-Storlet-Object-Metadata: currently not in use, but must appear. Use the value 'no'.
• X-Object-Meta-Storlet-Main: the name of the class that implements the IStorlet API. In our case: com.ibm.storlet.transcoder.TranscoderStorlet
2. The jar files that the storlet code is dependent on. The jars below are the storlettranscoder dependencies. These should be uploaded to a container named dependency. The metadata that must accompany a dependency is its version, given in the X-Object-Meta-Storlet-Dependency-Version header. While the engine currently does not parse this header, it must appear.
• commons-logging-1.1.3.jar
• jempbox-1.8.4.jar
• fontbox-1.8.4.jar
• pdfbox-app-1.8.4.jar
To update the storlet, just upload it again; the engine will recognize the update and load the updated code. Important: currently, dependency updates are not recognized. Only the storlet code itself can be updated.
Deploying a Storlet with Python
Here is a code snippet that uploads both the storlet and its dependencies. The code was tested against a Swift cluster with:
1. Keystone configured with a 'service' account, having a user 'swift' whose password is 'password'.
2. Under the service account there are already 'storlet', 'dependency' and 'storletlog' containers.
22. f the cloud storage data.
3. Metadata search whose results contain only resources that the user is authorized to access.
• The functionality was described in document D3.1.1, End-to-End Security (M8 deliverable).
7.1.1.1 Fitting into overall COSMOS solution
1. Privacy-preserving storlets can be applied when objects are retrieved, before returning data to the user. In this way, complete raw data can be stored within the cloud storage, but only privacy-filtered data is returned to the user.
2. Sandboxing of storlets is especially important if, in the future, we want to allow arbitrary users to write storlet code for the COSMOS platform.
3. Metadata search whose results contain only resources that the user is authorized to access: this is important in COSMOS in order to ensure that metadata search does not give users access to more data or metadata than they should have.
7.1.2 Technical description
This work is not a separate component, but rather is part of the storlets and metadata search components. Therefore, please see the relevant sections of this document describing storlets and metadata search.
7.1.2.1 Prototype architecture
This work is not a separate component, but rather is part
23. he scripts will search for the bin directory under the nemo storlets module. Once built in Eclipse, the first step is to go to the root dir of the module and run:
    ant all
Step 4: Deploying the code
1. Make sure you have lxc installed on all nodes:
    apt-get install lxc
2. Make sure you have the paramiko and scp Python libs installed. If you have already deployed Swift and Keystone using swift deployment, nothing more is needed. Otherwise do:
    cd swift_deployment/swift_cluster_install
    python install_dependencies.py ssh
3. Run the following from the nemo storlet deploy directory:
    python management_install.py install /root/workspace/storlets/nemo_storlet storlets_modules.json cluster_configuration cluster_wide_constants local_install.sh all
The parameters are:
• install: in the future we will also support remove.
• /root/workspace/storlets/nemo_storlet: the path to the root directory of the nemo storlet module.
• storlets_modules.json: a json file representing a list of all supported storlet modules. Located in the swift deployment module.
• cluster_configuration: a json file representing the cluster configuration. More information on the cluster configuration file
24. iption
5.1.2.1 Prototype architecture
Regarding the prototype architecture, please see section 4.3.2, Metadata Search Architecture, of the D4.1.1 "Information and Data Lifecycle Management: Design and open specification (Initial)" document. Diagrams depicting the architecture can be found there.
5.1.2.2 Components description
Regarding the components description, please see section 4.3.2, Metadata Search Architecture, of the same D4.1.1 document. Text describing the components in the architecture can be found there.
5.1.2.3 Technical specifications
This prototype is based on code developed by IBM SoftLayer and adapted for the needs of COSMOS. We designed and implemented a new search API which supports complex queries. For example, one can search for objects meeting multiple constraints. We also implemented data type support, which is needed for COSMOS data. The prototype uses the following open source components:
• Elastic Search: a search engine built using the Lucene search library, see http://www.elasticsearch.org
• RabbitMQ: used to queue the metadata indexing requests and submit them in bulk to Elastic Search, see http://www.rabbitmq.com
• OpenStack Swift object storage: see http://docs.openstack.org/developer/swift
The source code is deve
25. ld be:
• Timestamps
• Information like the number of a bus line, the number of a bus, etc.
A diagram of how Data Mapping fits into the WP4 architecture can be found in section 3 "High Level Architecture" of D4.1.1. In addition, D2.3.1 discusses the COSMOS overall architecture.

3.1.2 Technical description
This section describes the technical details of the implemented software.

3.1.2.1 Component description
For the component description, please see section 4.1.2 of D4.1.1. Text describing the design decisions and details can be found there.

3.1.2.2 Technical specifications
The prototype uses the following open source components:
• RabbitMQ, which is used as a message broker. It allows publishers to send messages and subscribers to receive them; please see http://www.rabbitmq.com
• OpenStack Swift, which is used to store these messages as data objects in the cloud storage; please see http://docs.openstack.org/api/openstack-object-storage/1.0/content/storage-object-services.html
The source code is developed in Java and uses the following Java ARchive (jar) files:
• json-simple: used for parsing JSON files
• rabbitmq-client: a Java client for RabbitMQ

3.2 Delivery and usage

3.2.1 Package information
The delivered package contains the following folders:
• dependencies: contains the jar files
26. loped in Python and uses the following open source Python libraries:
• pyparsing: used for parsing the search API requests
• pyes: a Python client for Elastic Search
• pika: a Python client for RabbitMQ

5.2 Delivery and usage

5.2.1 Package information
The swearch_hrl package has the following structure:
• setup.py: Python installation script
• bin: admin scripts
• etc: config files
• swearch: metadata index and search source code
  o middleware: OpenStack Swift middleware
• tests: unit tests

5.2.2 Installation instructions
1. Install Elastic Search. An installation script is provided in the swift deployment module described in the next section.
2. Install RabbitMQ.
3. Install OpenStack Swift.
4. Install metadata search using the following command:
    sudo python setup.py install
5. Set up the indexes using the following command:
    sudo python bin/swearch-prep

5.2.3 User Manual
Once metadata search has been installed, Swift objects are automatically indexed according to their metadata when they are created. Metadata search is accessed using an extension of the OpenStack Swift REST API. The metadata search API is described in Appendix 7.3 "Cloud Storage and Metadata search API" of D4.1.1 "Information and Data Lifecycle Management: Design and open specification (Initial)".

5.2.4 Licensing information
Dependencies:
1. Elastic Search: Apache 2.0
2. RabbitMQ: Mozilla Public Licence version 1.1
27. mentioned above
• input: contains the JSON files to be published through the RabbitMQ server
• src: contains the Java files

3.2.2 Installation instructions
Please follow these steps to install and start up the prototype. The Java SE/EE Runtime Environment is a prerequisite.
• Install OpenStack Swift
• Install the RabbitMQ server
• Download the package and install it under the main root of your machine
• Open the package in an IDE such as NetBeans 8.0 or Eclipse Kepler 4.3.0
• Run Receiver.java continuously
• Publish the input files through Sender.java

3.2.3 User Manual
For detailed information about how to configure the RabbitMQ publisher and subscriber, please see section 4.6.1 of D4.1.1. Please see also http://www.rabbitmq.com/tutorials/tutorial-five-java.html
For detailed information about how to use OpenStack Swift, please see http://docs.openstack.org/api/openstack-object-storage/1.0/os-objectstorage-devguide-1.0.pdf

3.2.4 Licensing information
Dependencies:
1. json-simple: Apache 2.0
2. RabbitMQ: Mozilla Public Licence version 1.1
3. OpenStack Swift: Apache 2.0

3.2.5 Download
The source code is available on the COSMOS SVN under SourceCode/M10/Prototypes/WP4/DataMapping

4 Message Bus

4.1 Implementation
28. mplemented independently, and now the various components need to be integrated. This is the initial prototype for our work in COSMOS, which will be revised in years 2 and 3 of the project.
29. 1 Introduction

1.1 About this deliverable
This document complements the software delivered as the prototype for deliverable D4.2.1 "Information and Data Lifecycle Management: Software prototype (Initial)". For information on the motivation, architecture and design of the components in this work package, please refer to document D4.1.1 "Information and Data Lifecycle Management: Design and open specification (Initial)".

1.2 Document structure
This document has a section for each component of WP4: Data Mapping, CEP, Message Bus, and two sections on Cloud Storage (Metadata Search and Storlets). An additional Cloud Storage section describes Security and Privacy; it covers work belonging to WP3 (End-to-end Security and Privacy) that is part of the current deliverable, D4.2.1.

2 Complex Event Processing

2.1 Implementation

2.1.1 Functional description
The delivere
30. nd the daemon factory and storlet daemon on the Linux container side. The channel is based on Unix domain sockets.

6.1.2.3 Technical specifications
Our prototype is built over Swift version 1.12. Swift, as well as our middleware, is written in Python using the WSGI framework. The daemon factory is written in Python; the storlet daemon and the Storlet API library are written in Java. Schannel is written in C, Python and Java (JNI). Most of the code is based on standard Python and Java libraries. The libraries below are used by various parts of the Storlet engine:
• json-simple: Apache 2.0
• logback-classic 1.1.2: Eclipse Public License v1.0 / GNU Lesser General Public License
• logback-core 1.1.2: Eclipse Public License v1.0 / GNU Lesser General Public License
• slf4j-api 1.7.7: MIT license
The libraries below are used as part of an example storlet that transforms PDF to text and extracts metadata from PDF:
• commons-logging 1.1.3: Apache 2.0
• fontbox 1.8.4: Apache 2.0
• jempbox 1.8.4: Apache 2.0
• pdfbox 1.8.4: Apache 2.0

6.2 Delivery and usage

6.2.1 Package information
The code is made of two modules:
1. swift deployment module: this module has the configuration files and installation scripts required for installing Swift with Keystone, as well as scripts for doing cluster-wide installation of storlets.
2. nemo sto
31. nguage focusing on the IoT domain. For detailed information about how to define custom events and event detection rules, please refer to the Dolce language specification mentioned in D4.1.1. Dynamic changing of rules will be available via a REST client.

2.2.4 Licensing information
Currently SOL CEP is distributed as closed source software. The CEP Management service is distributed under the Apache 2.0 license.

2.2.5 Download
The source code is available on the COSMOS SVN under SourceCode/M10/Prototypes/WP4/CEP

3 Data Mapping

3.1 Implementation

3.1.1 Functional description
Data mapping will be used in COSMOS to collect raw data that is published from virtual entities through the message bus and store it as data objects, with their associated metadata, in the cloud storage. Additional information on motivation can be found in section 4.1.1 "Functional Overview" of D4.1.1 "Information and Data Lifecycle Management: Design and open specification (Initial)".

3.1.1.1 Fitting into overall COSMOS solution
In COSMOS we would like to be able to store objects with enriching metadata in order to enable search on them, as described in chapter 5. This metadata cou
33. on specification
• System test files
• Executables
  o solcep-ctrl: server control for Debian
  o Solcep: standalone SOL CEP binary
  o Plugins: network protocol and data format plugins

The delivery of the CEP Management service contains the following files:
• Libraries: the libraries mentioned above
• Source files: Java sources of this component
• Configuration files

2.2.2 Installation instructions
This installation manual assumes that the following prerequisites are already installed and running:
  o Java SE/EE Runtime Environment
  o Apache Tomcat
It is recommended to create a new user and home directory before installing the services, by executing
    sudo adduser <user name>
and then logging in as the new user.
Steps:
1. Unpack the content of the provided package:
    tar xvzf <custom location>/Cosmos-CEP-Services.tar.gz
2. Ensure that the execution bits of the services are enabled. If not, execute:
    chmod +x Solcep
3. Copy the control script solcep-ctrl into the /etc/init.d directory.
4. Register the service with the operating system infrastructure:
    sudo update-rc.d solcep-ctrl defaults
5. Review and update the provided configuration files when interoperation with distributed CEP components is required.

2.2.3 User Manual
The event detection mechanism within SOL CEP is a variation of a rule-based inference engine. Rules are defined using specialized Dolce la
34. onment
1. Make sure you have an Eclipse installation with PyDev, CDT and Java.
2. Check out the Storlets and swift deployment repositories.
3. The storlets repo has an Eclipse project definition in its root directory (nemo storlets); you will need to use it so that the Java code gets compiled.

Step 2: Configure your development/deployment cluster
1. Edit your cluster configuration file. Examples can be found in the swift deployment repo under the cluster config directory. If you are on a dev machine you probably want to look at localhost.json, which has a single node.
2. Make sure that each node to be installed with storlets has the role "storlet". Also make sure that the root password is updated.
3. Locate your cluster wide constants file (in the swift deployment module). You can leave it as is; just make sure you know where it is. Note: the file has an entry called lxc_device. This entry points to a directory where all LXC-related persistent data will be kept. Make sure that:
   1. The directory exists.
   2. It has full permissions (777).
To deploy storlets on a node (or any storlet sub-module, as described below), the node must have the role "storlet".

Step 3: Building the code
1. Auto-build the storlet Java code.
2. Use ant to build the sub-module you are working with, or all modules if you are about to deploy everything. The storlet packaging scripts assume that the code was automatically built in Eclipse. More specifically t
35. own execution environment and communicate with COSMOS through the message bus using a standard network connection.

[Figure 3: Deployment model. Cosmos Node (execution environment, Linux x64) with the artefacts CEP Management Service, Complex Event Processor and Message Bus; Application Node and VE Node (Virtual Entity, Sensor/Actuator) as separate execution environments connected over the internet.]

The proposed deployment for the prototype is shown in Figure 3. The Complex Event Processor and the Message Bus are deployed on Linux as system services.

2.1.2.4 CEP Management Service
The CEP Management Service provides a RESTful web service based on HTTP [5] methods and the concept of REST. The service is accessible through a central URI, and the supported MIME type is JSON [6]. The service is implemented in Java using Jersey 2, the reference implementation for the JSR 311 [7] (Java Specification Request) specification. The service itself is executed in the Java servlet container and hosted on the Apa
36. rlet module: the module has the various components described above.

swift deployment module
The module consists of the following:
• cluster config directory: a set of JSON files, each describing a cluster where we install Swift, Storlets and metadata search. Used by the various installation scripts.
• cluster wide constants: a JSON file with installation defaults.
• md search install: installation scripts for the metadata search components.
• Swift cluster install: installation scripts for Swift and Keystone.
• scp.py-master: an LGPL library used by the installer for scp operations.
• Paramiko: an LGPL library used by the installer for ssh operations.

nemo storlet module
The module consists of the following:
• build.xml: ant build files.
• schannel: the implementation of the communication channel between the host and the Linux container mentioned above.
• storlet daemon factory: the implementation of the daemon factory mentioned above.
• Storlet Samples: mainly the PDF-to-text converter storlet mentioned above.
• system tests: a set of system tests.
• storlet daemon: the implementation of the storlet daemon mentioned above.
• StorletManager: a Java command line tool used for uploading storlets.

6.2.2 Installation instructions
Step 1: Preparing the Envir
37. s essentially uploading the storlet and its dependencies to designated containers in the account we are working with. While a storlet and a dependency are regular Swift objects, they must carry some metadata used by the storlet engine. When a storlet is first executed, the engine fetches the necessary objects from Swift and installs them in the Linux container. Note that the dependencies are meant to be small. Having a large list of dependencies, or a very large dependency, may result in a timeout on the first attempt to execute a storlet. If this happens, just re-send the request.

Following is an example of uploading a storlet that transforms PDF to text. It is called TranscoderStorlet and has 4 dependencies.
1. The storlet, packaged in a jar. In our case the jar was named storlettranscoder-10.jar. The jar needs to be uploaded to a container named storlet. The name of the uploaded storlet must be of the form <name>-<version>. The metadata that must accompany a storlet is as follows:
• X-Object-Meta-Storlet-Language: currently must be java
• X-Object-Meta-Storlet-Interface-Version: currently we have a single version, 1.0
• X-Object-Meta-Storlet-Dependency: A comma s
38. side the Linux container used to control the per-storlet daemons. We refer to it as the daemon factory below.

6.1.2.2 Components description

The Storlet middleware
The Storlet middleware is made of two pieces: one is plugged into the Swift proxy server and the other into the Swift object server. The role of the storlet proxy server middleware is twofold:
1. To intercept storlet uploads and validate that they carry all the necessary metadata, e.g. the language in which they are written.
2. To authorize storlet execution requests.
The roles of the storlet middleware in the object server are:
1. Fetch the storlet code from the cluster upon first invocation and copy it into the Linux container.
2. Bring up a daemon that can execute a certain storlet code. Specifically, this daemon loads the storlet code that was copied into the Linux container.
3. Forward storlet execution requests coming from the user to the above daemon.

The Storlet Daemon
The Storlet daemon is a generic daemon that can load given storlets and serve invocation requests on given data.

The Daemon Factory
A daemon process brought up with the Linux container, used to start and stop the execution of storlet daemons.

The Storlet API Library
A library that defines the interface a storlet needs to support and the API's class definitions. See section 7.4.1 in the M8 scientific report.

Schannel
A communication channel between the Storlet middleware on the host side a
39. single method that looks like this:

    public void invoke(ArrayList<StorletInputStream> inStreams,
                       ArrayList<StorletOutputStream> outStreams,
                       Map<String, String> parameters,
                       StorletLogger logger) throws StorletException;

Here is a class diagram illustrating the classes involved in the above API:

[Class diagram. IStorlet (interface): invoke(ArrayList<StorletInputStream> inputStreams, ArrayList<StorletOutputStream> outputStreams, Map<String, String> params, StorletLogger logger). StorletLogger: emitLog(String). StorletInputStream: getMetadata(): HashMap<String, String>; getStream(): java.io.InputStream. StorletOutputStream: getMetadata(): HashMap<String, String>; getStream(): java.io.OutputStream. StorletObjectOutputStream: setMetadata(Map<String, String>).]

The StorletInputStream is used to stream an object's data into the storlet. It is used both in the GET scenario and in the PUT scenario to stream in the object's content. In the GET scenario it is the content of the object in the store, to be processed and streamed to the user. In the PUT scenario it is the content of the user's uploaded data, to be streamed to the store as an object. To consume th
40. url -i -X GET http://sde.softlayer.com/v1/AUTH_2dc1440a41e94fc696bced36c6e3c249/storletlog/storlettranscoder.log

Passing parameters to the storlets is done using the query string, e.g.:
    curl -i -X GET "http://sde.softlayer.com/v1/AUTH_2dc1440a41e94fc696bced36c6e3c249/my_container/example.pdf?arg1=value1&arg2=value2" -H <auth header> -H "X-Run-Storlet: storlettranscoder-10.jar"

Now let's assume that we have a local file called example.pdf and we want to keep it as text only. Here is a regular PUT request:
    curl -i -X PUT http://sde.softlayer.com/v1/AUTH_2dc1440a41e94fc696bced36c6e3c249/my_container/example.txt -H <auth header> -F filedata=@/tmp/example.pdf

Here is how to PUT it while invoking a storlet:
    curl -i -X PUT http://sde.softlayer.com/v1/AUTH_2dc1440a41e94fc696bced36c6e3c249/my_container/example.txt -H <auth header> -H "X-Run-Storlet: storlettranscoder-10.jar" -F filedata=@/tmp/example.pdf

6.2.4 Licensing information
Swift is under the Apache License 2.0. LXC user space tools are under the LGPL. Otherwise we are using Python 2.7 standard libraries and standard OpenJDK 7 libraries. The licenses of the additional libraries appear in the technical specifications section above.
The storlets source code should be considered confidential, i.e. accessible only by COSMOS partners and reviewers from the EU.

6.2.5 Download
Selected source code is available on the COSMOS SVN under SourceCode/M10/Prototypes/WP4/CloudStorage
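The storlet invocation requests in section 6.2.3 all follow the same pattern: the object URL, an optional query string carrying the storlet parameters, and an X-Run-Storlet header naming the storlet jar. The following is a minimal Python sketch of that pattern, not part of the delivered code. The helper name build_storlet_request, the host swift.example.org and the use of X-Auth-Token as the authentication header are assumptions made for illustration; the sketch only builds the URL and headers and does not contact a server.

```python
from urllib.parse import quote, urlencode


def build_storlet_request(base_url, account, container, obj, storlet,
                          params=None, auth_headers=None):
    """Return (url, headers) for a Swift request that invokes a storlet.

    Hypothetical helper: it mirrors the curl examples by appending the
    storlet parameters as a query string and adding the X-Run-Storlet header.
    """
    # Percent-encode each path segment separately so that '/' inside a
    # name cannot change the account/container/object structure.
    path = "/".join(quote(part, safe="") for part in (account, container, obj))
    url = f"{base_url.rstrip('/')}/v1/{path}"
    if params:
        # Storlet parameters travel in the query string.
        url += "?" + urlencode(params)
    # Start from the caller's authentication headers (assumed to be supplied).
    headers = dict(auth_headers or {})
    headers["X-Run-Storlet"] = storlet
    return url, headers


if __name__ == "__main__":
    url, headers = build_storlet_request(
        "http://swift.example.org",      # placeholder host
        "AUTH_test",                     # placeholder account
        "my_container",
        "example.pdf",
        "storlettranscoder-10.jar",
        params={"arg1": "value1", "arg2": "value2"},
        auth_headers={"X-Auth-Token": "<token>"},
    )
    print(url)
    print(headers)
```

The returned pair can be passed to any HTTP client; the same helper covers the PUT case, since only the HTTP method and the request body differ.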
41. wnload
The source code is available on the COSMOS SVN under SourceCode/M10/Prototypes/WP4/MessageBus

5 Cloud Storage Metadata Search

5.1 Implementation

5.1.1 Functional description
Metadata search will be used in COSMOS to index objects according to metadata attributes and values, and therefore enable search on them. Additional information on motivation can be found in section 4.3.1 "Metadata Search" of D4.1.1.

5.1.1.1 Fitting into overall COSMOS solution
In COSMOS we would like to be able to index objects according to various properties, such as:
• Timestamps
• Geospatial locations
• Textual information, such as a residence street name
• Numerical values, such as temperature readings
This allows searching and retrieving objects according to their values for these properties. A diagram of how metadata search fits into the WP4 architecture can be found in section 3 "High Level Architecture" of D4.1.1 "Information and Data Lifecycle Management: Design and open specification (Initial)". In addition, deliverable D2.3.1 discusses the COSMOS overall architecture. Storlets, described in the next section, can both read and write metadata. Metadata which is written is indexed and therefore becomes searchable.

5.1.2 Technical descr
