1. The replicas should be changed to fit the system. A replica is generally tied to a specific physical location, though a physical location can have several replicas. These settings are found under settings.common.replicas:

    <settings>
      <common>
        <replicas>
          <replica>
            <replicaId>A</replicaId>
            <replicaName>ReplicaA</replicaName>
            <replicaType>bitArchive</replicaType>
          </replica>
          <replica>
            <replicaId>B</replicaId>
            <replicaName>ReplicaB</replicaName>
            <replicaType>bitArchive</replicaType>
          </replica>
        </replicas>
      </common>
    </settings>

The JMS broker is defined at the global level and should be set to the administration machine, e.g. the machine where the dk.netarkivet.common.webinterface.GUIApplication, the dk.netarkivet.archive.arcrepository.ArcRepositoryApplication and the instances of dk.netarkivet.archive.bitarchive.BitarchiveMonitorApplication run. It is defined in the setting settings.common.jms.broker:

    <settings>
      <common>
        <jms>
          <broker>kb-test-adm-001.kb.dk</broker>
        </jms>
      </common>
    </settings>

If more replicas are wanted, they have to be defined in the settings at the deployGlobal level. Each replica needs a unique replicaId and replicaName, and it also needs the following applications: dk.netarkivet.archive.bitarchive.Bitarchiv
2. 6.0_07 is specifically called here, though any Java version above 1.6.0 should be usable.

Files

When deploy is run, a number of files are created in the output directory. These include scripts to install, start and kill the applications on the distributed platform. The NetarchiveSuite package file is also copied to this location, unless it already exists in the output directory.

In addition to a NetarchiveSuite settings file, the following configuration files are created on a per-machine or per-application basis.

Jmxremote password file

This file is created from scratch for each machine. A large instructional header on the use of the jmxremote password file is written first; then the JMX username and JMX password for the monitor and for Heritrix are appended. Only the JMX logins (username and password) are used by the applications.

The login variables for the monitor are found through these paths in the settings of any of the applications: settings.monitor.jmxUsername and settings.monitor.jmxPassword.

The login variables for Heritrix are found through these paths in any of the application settings: settings.harvester.harvesting.heritrix.jmxUsername and settings.harvester.harvesting.heritrix.jmxPassword.

If any application has a monitor defined in the settings file, the monitor must have a JMX login defined. The monitor JMX logins have to be the same for all applications on a machine. This also applies for herit
3. NetarchiveSuite GUI that uses JMX to communicate with all running applications makes it easy to monitor a running NetarchiveSuite installation. This component gives you access to the 100 latest log messages from the applications, and a proper error message if any application is offline.

If you want more information about the current status of a particular application, you can use the program jconsole. You need to know on which machine the application is running (MACHINE), the JMX port (JMX_PORT) and RMI port (RMI_PORT) assigned to the application instance, and the password for the monitorRole (set in the jmx password file and in the settings settings.monitor.jmxUsername and settings.monitor.jmxPassword; see Configure Monitoring). Then you start jconsole, click on the "advanced" tab, and enter the JMX URL built from MACHINE, JMX_PORT and RMI_PORT.

When asked for a username, enter monitorRole and the password set for the application. Log entries can now be examined for the given application instance by selecting MBeans and unfolding dk.netarkivet.common.logging. Furthermore, you can examine the system resources allocated to any given application.

Appendix A: Necessary external software

Contents

• Windows specific
• Installing and configuring a JMS broker
• Obtaining a JMS broker
• Installing the JMS broker
• Configuring the JMS broker
• Starting and stopping JMS
• How to empty queues
• How to allocate additional JMS broker memory
• Installing and
4. conf/killall.sh
        ;;
    *)
        echo "Usage: $0 {start|stop}"
        exit 1
        ;;
    esac

Where USERNAME is the name of the user for the installation, and ENV_NAME is the environment name for NetarchiveSuite defined in the configuration file.

The following command has to be run for the netarkiv script to be run during start-up and shutdown of Linux:

    chkconfig --add netarkiv

The script can also be run manually with the commands:

    service netarkiv stop
    service netarkiv start

Windows

This is an example of how to make Windows 2003 Server automatically call a script during start-up. The restart script has to be run, since the applications might not have closed correctly last time (e.g. power failure, spontaneous reboot, etc.). This cleans up before the applications are restarted.

Create the service:

• Install Microsoft Resource Kit Windows 2003 Server.
• Run the program RkTools.exe and install with standard settings.
• Open a Command Prompt and go to the directory where the Resource Kit has been installed, e.g. C:\Program Files\Windows Resource Kits\Tools
• Install a service with the following command: Instsrv <ServiceName> <path to resource kit>\srvany.exe, e.g. Instsrv BitApp "C:\Program Files\Windows Resource Kits\Tools\srvany.exe"
• Open the registry with regedit and find the service through the path HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\<ServiceName>
• Make sure tha
5. >
          <deployDatabaseDir>myDatabaseDir</deployDatabaseDir>
          <settings>
            <common>
              <database>
                <url>jdbc:derby:myDatabaseDir/fullhddb</url>
              </database>
            </common>
          </settings>
          <applicationName name="myLinuxApplication">
          </applicationName>
        </deployMachine>
        <deployMachine name="myWindowsMachine" os="windows">
          <deployInstallDir>C:\myInstallationDirectory</deployInstallDir>
          <deployJavaOpt>-Xmx1150m</deployJavaOpt>
          <applicationName name="myWindowsApplication">
            <deployClassPath>lib/dk.netarkivet.common.jar</deployClassPath>
            <deployClassPath>lib/dk.netarkivet.harvester.jar</deployClassPath>
            <deployClassPath>lib/dk.netarkivet.viewerproxy.jar</deployClassPath>
          </applicationName>
        </deployMachine>
      </thisPhysicalLocation>
    </deployGlobal>

This defines two different machines, each with a single application. The machines have different operating systems (one with Windows and one with Linux), and therefore they have different installation directories and Java options.

The Linux machine inherits the Java option -Xmx1536m from the physical location, which inherits it from deployGlobal. The Windows machine has a Java option specified and therefore does not inherit the deployGlobal Java option.

The deployDatabaseDir is only specified on the Linux machine, and the database will therefore be unpacked
6. install a distributed NetarchiveSuite installation. The deploy software offers a way to gather settings for multiple machines in one configuration file, which eases the job of configuration and installation. This software generates the installation and start/stop scripts for a multi-server NetarchiveSuite system.

If you are hampered by any limitations in the deploy software, it is of course possible to make your own custom installation scripts. An inspection of the scripts generated by the deploy software will probably help you in this respect.

For a description of the configurations used for installation, please refer to the Configuration Manual.

Contents

Installation Overview
Choose an Installation Scenario
Functionality of the Deploy Software
The Deploy Configuration File
Manual installation of the NetarchiveSuite
Starting and stopping the NetarchiveSuite
Monitoring a running instance of NetarchiveSuite
Appendix A: Necessary external software
Appendix B: Starting Netarchivesuite automatically
Appendix C: Easy Installation of NetarchiveSuite

Installation Overview

• Contents
• Audience
• Limitations
• Installation Overview

Contents

The first part describes the functionality of the deploy software and how it can be used. This involves a description of how to run this module, mentioning the required and optional arguments, and the fu
7. 1 Installation Manual
1.1 Installation Overview
1.2 Choose an Installation Scenario
1.3 Functionality of the Deploy Software
1.4 The Deploy Configuration File
1.5 Manual installation of the NetarchiveSuite
1.6 Starting and stopping the NetarchiveSuite
1.7 Monitoring a running instance of NetarchiveSuite
1.8 Appendix A: Necessary external software
1.9 Appendix B: Starting Netarchivesuite automatically
1.10 Appendix C: Easy Installation of NetarchiveSuite

Installation Manual

This is a manual for installing the software in a distributed environment, including how to use the deploy software, which makes it easy to configure and install the software. It requires some technical background to understand and use this manual.

This manual describes how to install the NetarchiveSuite web archive software package.

We first describe how to use the included deploy software to configure and
8. L
• MySQL database

By default, the NetarchiveSuite uses an external Derby database. Note that from release 3.14, the choice of an embedded Derby database has been removed to allow several applications to access the database simultaneously. The choice of database is further described in the section on Plugins.

Besides the configuration of the plug-in (where the Derby database is the default), there are additional installations and configurations that must be done as described below.

Note that <deployInstallDir>, <deployDatabaseDir> and <deployMachine> will be used as references to the items corresponding to deploy settings. Their meaning is described in the Deploy Settings.

Derby Database

If you want to use a Derby database, you have to run it as a separate process:

    # Start Derby separately
    cd <directory with the extracted database>    # e.g. <deployInstallDir>/<deployDatabaseDir>
    export CLASSPATH=<deployInstallDir>/lib/db/derbynet-10.4.2.0.jar:<deployInstallDir>/lib/db/derby-10.4.2.0.jar
    java org.apache.derby.drda.NetworkServerControl start -p port

The default port is 1527.

For the NetarchiveSuite to use this kind of external database, you need to:

• Set the setting settings.common.database.class to dk.netarkivet.harvester.datamodel.DerbyServerSpecifics
• Set the setting settings.common.database.url to jdbc:derby://<deployMachine>:1527/fullhddb (substitute the server host for <deplo
9. MOD by default
    <Limit SITE_CHMOD>
      DenyAll
    </Limit>

    # This enables or disables the PAM authentication module.
    # The default is "on".
    AuthPAM off

    [directive illegible in source]
    # If the ftp does not exist, the server will fallback to the

Starting and stopping a Proftpd server

Log on as root to the server where Proftpd is installed; the following command will start the FTP server:

    [command illegible in source]

Appendix B: Starting Netarchivesuite automatically

Contents

• Linux
• Windows

This appendix describes how to make the applications start automatically when the operating system starts.

Currently, when a computer is rebooted, the applications have to be started manually. This describes how to make the operating system start the applications during start-up.

Linux

Note: This has been tested with Redhat Enterprise Linux 5, so it probably works on Fedora (Core) as well.

Log in as administrator. Create the following script in /etc/init.d/ (the name of the script will be referred to as netarkiv):

    #!/bin/bash
    # chkconfig: 345 80 20
    # description: netarkiv
    [ -x /home/USERNAME/ENV_NAME/conf/startall.sh ] || exit 0
    case $1 in
    start)
        su - netarkiv -c ENV_NAME/conf/startall.sh
        ;;
    stop)
        su - netarkiv -c ENV_NAME
10. Requirements

Deploy has the following requirements:

• The environmentName (settings.common.environmentName) has to be set in the settings on the global level.
• The environmentName (settings.common.environmentName) must be a combination of digits (0-9) and letters (a-z, lower or upper case). Deploy fails if the environmentName contains other characters.
• Different environmentNames at the physical location level, machine level and application level are not supported (or meaningful).
• Databases are not supported on Windows.
• The GUIApplication and the ArcRepositoryApplication must be placed on the same machine.
• The install directory on Windows must be "C:\Documents and Settings\user", where user is the username on the machine. The exception is Windows Vista (or the equivalent server OS), where the directory must be "C:\Users\user", where user is the username on the machine.
• All applications on the same machine with a JMX login for the monitor must have identical logins.
• All applications on the same machine with a JMX login for Heritrix must have identical logins.
• When creating a test instance, the arguments "http port" and "offset" are only supported as 4-digit numbers.
• Every physical location, machine and application must have the name attribute defined.
• Deploy does not handle network connection permissions. E.g. if there is a firewall, it has to be set up to allow the applications in NetarchiveSuite to communicate with each other.
• Permission to create the wanted directories
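The character rule for environmentName stated in the requirements above can be checked before deploy is run. The following is a minimal sketch in Bash, assuming the rule is exactly "one or more ASCII digits and letters"; the function name is hypothetical and is not part of the deploy software.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper; it is not part of the deploy software.

# Succeeds (exit status 0) when NAME is non-empty and contains only
# the digits 0-9 and the ASCII letters a-z / A-Z.
is_valid_environment_name() {
    local name="$1"
    local LC_ALL=C   # keep the bracket ranges limited to plain ASCII
    [[ "$name" =~ ^[0-9A-Za-z]+$ ]]
}

# Example use before running deploy:
#   is_valid_environment_name "test" || echo "bad environmentName" >&2
```

A check like this only catches the naming rule; the other requirements in the list (matching JMX logins, install directories, firewall rules) still have to be verified by hand.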
11. The latter must only be installed on one of the access servers, as there can only be one in the system.
• Wayback machine (one server): Here we deploy the WaybackIndexerApplication, the AggregatorApplication, and an instance of the wayback web application configured with the NetarchiveSuite plugin.

Apart from the HarvestControllerApplications, there is no requirement that the applications are placed like this, but we will use it as an example throughout the rest of the manual. In the standard set-up used in our test environment, we have 10 machines:

• 1 bitarchive server, on physical location WEST
• 2 bitarchive servers, on physical location EAST
• 1 admin machine, placed on physical location EAST
• 1 harvester machine, placed on physical location WEST
• 2 harvester machines, placed on physical location EAST
• 1 access server, placed on physical location WEST
• 1 access server, placed on physical location EAST
• 1 wayback server, placed on physical location EAST

Choose other plug-ins

Apart from the plug-ins described in this section, the installation of plug-ins consists only of their configuration.

Functionality of the Deploy Software

Contents

• Functionality of the Deploy Software
• Terminology
• Performing a deploy
• Deploy arguments
• Other dependencies
• Example
• Files
• Jmxremote password file
• Log property file
• Security policy file
• Evaluate
• Test insta
12. a password file, which is the same throughout the installation ($deployInstallDir/conf/jmxremote.password).

    [command illegible in source]

Note: For the StatusSiteSection to work, your logging must be configured to use java.util.logging with the dk.netarkivet.monitor.logging.CachingLogHandler enabled (see the Command Line Logging section). This is done automatically if the NetarchiveSuite deploy software is used to configure and install your NetarchiveSuite installation.

Select the appropriate settings file for the application

The conf/settings.xml (the new one configured to your environment) is probably OK for most applications. But you may need to use special-purpose settings files for some applications (e.g. BitarchiveApplications, since you can't allocate more than one baseFileDir on the command line). The settings file used by an application can be specified with the Java option:

    -Ddk.netarkivet.settings.file=$deployInstallDir/conf/settings.xml

JVM options

We need to set the maximum Java heap size to 1.5 Gbytes. You may use this to change that or add o
13. atform.
Some of the applications are supported on Windows, and therefore some machines with Windows as operating system can be used in the distributed system, just not the machine where the deployment takes place, since the deployment is done through the scripting language Bash, which only works on Linux/Unix.

The figure below shows what happens when the deploy application is run.

[Figure: deploy takes the IT configuration file, the NetarchiveSuite-xxx.zip software download, the log.prop logging properties, the security.policy security policies and, optionally, an output directory, and generates the following:]

    <output dir>    (default name is set in the environmentName setting)
        NetarchiveSuite-xxx.zip
        install_<physical location>.sh
        startall_<physical location>.sh
        killall_<physical location>.sh
        <deploy machine name defined in IT configuration>/
            jmxremote.password
            security.policy
            settings_<application name>_<deploy Application Instance Id>.xml
            log_<application name>_<deploy Application Instance Id>.prop
            killall.sh
            kill_<application name>_<deploy Application Instance Id>.sh
            startall.sh
            start_<application name>_<deploy Application Instance Id>.sh

Deploy arguments

Deploy takes the following arguments:

• -C: The configuration file for deploy (has to have the ".xml" suffix).
  • The required structure of this file is described in the Configuration file section. It has to
14. be XML parseable.
• -Z: The NetarchiveSuite file (has to be ".zip").
  • This is the NetarchiveSuite package file, which is unzipped on all the machines during installation. It contains the libraries that are used when the applications are run. The NetarchiveSuite package file is copied to the output directory when deploy is run.
• -L: The log property file (has to be ".prop").
  • This file contains the basic properties for logging. A copy of this file is made for each machine, where it is changed to fit the purposes of the machine. See the Log property file section under Files.
• -S: The security policy file (has to be ".policy").
  • The security policy file defines where the applications are allowed to operate. A copy of this file is made for each machine, where the required security properties for the applications are granted. See the Security policy file section under Files.
• -O (OPTIONAL): The output directory.
  • This is the directory on the root machine (the machine where deploy is run from) where the scripts and settings files are created by deploy (the environmentName is used as the default name for the output directory).
• -D (OPTIONAL): The database (has to be either ".zip" or ".jar").
  • The database where the harvesting information is to be located. If the database is not given as an argument, the default database in the NetarchiveSuite package file is used. The database has to be placed in an unzippable file (".zip" or ".jar"), and it is only unzip
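The suffix rules for the deploy arguments described above lend themselves to a quick check before deploy is started. The sketch below is an illustration in Bash only: both function names are hypothetical, the positional order (config, package, log properties, security policy, optional database) is an assumption of this example, and it checks file names only, not whether the files exist or are well formed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; they are not part of the deploy software.

# has_suffix FILE SUFFIX...: succeeds if FILE ends in one of the suffixes.
has_suffix() {
    local file="$1"; shift
    local suffix
    for suffix in "$@"; do
        [[ "$file" == *"$suffix" ]] && return 0
    done
    return 1
}

# check_deploy_inputs CONFIG PACKAGE LOGPROP POLICY [DATABASE]
# Mirrors the suffix rules listed for -C, -Z, -L, -S and the optional -D.
check_deploy_inputs() {
    local config="$1" package="$2" logprop="$3" policy="$4" database="${5:-}"
    has_suffix "$config"  .xml    || { echo "-C file must end in .xml" >&2; return 1; }
    has_suffix "$package" .zip    || { echo "-Z file must end in .zip" >&2; return 1; }
    has_suffix "$logprop" .prop   || { echo "-L file must end in .prop" >&2; return 1; }
    has_suffix "$policy"  .policy || { echo "-S file must end in .policy" >&2; return 1; }
    if [[ -n "$database" ]]; then
        has_suffix "$database" .zip .jar \
            || { echo "-D file must end in .zip or .jar" >&2; return 1; }
    fi
    return 0
}
```

For example, `check_deploy_inputs deploy_config.xml NetarchiveSuite.zip log.prop security.policy` succeeds, while passing a database named `db.tar` as the fifth argument fails with a message on standard error.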
15. bitarchive.BitarchiveApplication, then each application must have a unique temporary file directory defined (settings.common.tempDir).

Configuration example

Here is an example of a configuration file for deploy:

Example of deploy configuration file

The rest of this section describes how to change this configuration file template to fit your specific system. It describes how to make the changes, scope by scope, to fit a system with the same structure, and how to expand the scopes with new machines and applications.

Deploy Global

The deployGlobal scope contains two parts: the parameters and the settings.

Just leave the <deployClassPath> parameters as they are, since they will be overwritten for the applications that need other libraries.
The <deployJavaOpt>-Xmx1536m</deployJavaOpt> parameter sets the maximum heap size to 1.5 GB (1536 MB). This value should not be larger than the amount of accessible memory on a machine.

Within the settings scope of deployGlobal the following needs to be done:

The environment name does not have to be changed for the system to work, though it is usually a good idea to change it to a more appropriate name for the installation or system. This is the setting at settings.common.environmentName:

    <settings>
      <common>
        <environmentName>test</environmentName>
      </common>
    </settings>
16. cation of the instance dk.netarkivet.harvester.harvesting.HarvestControllerApplication is killed. This is because Heritrix is not thoroughly tested on Windows, and might not be supported.

The Deploy Configuration File

Contents

• Settings scope
• Deploy scope
• Parameters
• Application Instance Id
• Limitations and Requirements
• Configuration example
• Deploy Global
• Physical Locations
• Machine
• Application
• BitarchiveApplication
• HarvestControllerApplication
• IndexServerApplication and ViewerProxyApplication
• BitarchiveMonitorApplication

The deploy configuration file contains the definitions for the installation and distribution of NetarchiveSuite. This involves the scopes for the levels in the figure below, and their settings.

This figure also shows the pattern of inheritance of the settings (physicalLocation inherits settings and parameters from deployGlobal, deployMachine inherits from physicalLocation, etc.).

[Figure: the scope levels of the deploy configuration file, from Level 1 (the deploy global scope) down to Level 4.]

These levels can have several instances of the levels below them.

Settings scope

The settings scope is described in the Configuration Manual for NetarchiveSuite. It is no longer required that every variable within the settings scope is explicitly defined for an application, since undefined variables are replaced by the default settings when the application is run.

Each l
17. configuring FTP
• Starting and stopping a Proftpd server

The NetarchiveSuite is developed and tested with Sun Java SE (Standard Edition) JDK version 1.6.0_21. In any case, a Java 1.6 JDK will be necessary to compile and run the NetarchiveSuite, and we recommend that all applications use the same JDK.

The following external software is required for running the applications:

• JMS
• FTP: This is only required if FTPRemoteFile is the chosen RemoteFile plugin.
• SSH: Installed by default under Unix/Linux; WinSSHD from http://www.bitvise.com does the trick on Windows.
• Unzip: "unzip.exe" on Windows, and "unzip" on Linux.

Windows specific

Some applications require the Unix command sort, but they should be able to run under Windows if Cygwin is installed. This should only affect the ViewerProxy, the IndexServer, and the wayback AggregatorApplication.

Installing and configuring a JMS broker

The software has been tested with the free JMS broker from Sun (Open Message Queue 4.4) and the commercial JMS broker (Sun MQ 3.6 Enterprise Edition).

Obtaining a JMS broker

Sun's Open Message Queue can be obtained from the following site: https://mq.dev.java.net/downloads.html

Go to the section named "Legacy Versions", and click on the Linux link in the subsection "Open MQ 4.4 Binary Downloads". This will give you a jar file named "mq4_4-binary-Linux_X86-XXXXXXXX.jar". We have no reason to suppose that NetarchiveSuite will have problems with newer versions b
18. describes the architecture and any custom settings. This will also specify your environmentName, e.g. "MY_WEBARCHIVE".
5. Modify the other configuration files (logging and security properties) if necessary.
6. Run the Deploy utility. This will create a sub-directory MY_WEBARCHIVE with all the deploy scripts and configuration files you need.
7. Run the install scripts, then the start scripts. You should now have a running NetarchiveSuite installation.

Choose an Installation Scenario

Contents

• 1 Choose a platform
• 2 Choose Repository
• 3 Choose the type of database
  • 3.1 Derby Database
  • 3.2 MySQL Database
  • 3.3 PostgreSQL Database
• 4 Choose a JMS broker
• 5 Java
• 6 Choose the set of machines taking part in the installation/deployment
• 7 Choose other plug-ins

Choose a platform

NetarchiveSuite can be installed in a number of different ways, with varying numbers of machines on different sites. There are a number of separate applications in play, most of which can be put on separate machines as needed. To keep clear what is necessary for which set-ups, we will consider the following types of set-up:

• A: Single machine set-up. This corresponds to the set-up used in the Quick Start Manual, where all applications run on the same machine, and file transfers are done by simply copying files locally. It is the simplest set-up, but does not scale very well.
• B: Single site set-up. In this scena
19. e do have a couple of external calls to the Unix sort command. The parts of our software using this external command therefore only run on Linux/Unix, or on Windows with Cygwin installed. The parts in question are:

• The dk.netarkivet.common.GUIApplication, if the site section dk.netarkivet.viewerproxy.webinterface.QASiteSection is used
• The dk.netarkivet.archive.indexserver.IndexServerApplication

Specifically, the following methods all use an external call to the Unix sort command:

• FileUtils.sortCrawlLog()
  • Used in
    • dk.netarkivet.archive.indexserver.CrawlLogIndexCache
    • dk.netarkivet.viewerproxy.webinterface.Reporting
• FileUtils.sortCDX() (only used in dk.netarkivet.archive.indexserver.CrawlLogIndexCache)
• dk.netarkivet.archive.indexserver.CDXIndexCache.sortFile()
• dk.netarkivet.viewerproxy.LocalCDXCache.getIndex()

The software is mainly tested on a Linux platform, but with some of the BitarchiveApplications installed on a Windows platform.

Installation Overview

Using NetarchiveSuite's Deploy utility, the steps required to configure and start a web archive are:

1. Determine the required architecture, i.e. how many machines you will be using, their locations, their operating systems and which applications should run on each machine.
2. Configure the required machines, the required external software (see Appendices) and any relevant firewalls.
3. Unpack NetarchiveSuite.zip in a directory on a Linux machine.
4. Create the config.xml file which
20. e more than that number of applications of the same kind on the same bitarchive replica (for instance, more than 20 BitarchiveApplications).
• Set max producers to 100. You add the following line:

    imq.autocreate.destination.maxNumProducers=100

in the file:

    $INSTALLATION_DIR/mq/var/instances/imqbroker/props/config.properties

If you get an error like this:

    Producer can not be added to destination PROD_COMMON_MONITOR [Queue], limit of 100 producers would be exceeded

in the JMS broker log, you need to increase this value.

Starting and stopping JMS

The broker is started directly by running the imqbrokerd command of the broker installation:

    [command illegible in source]

The sysadmin may want to start the broker at machine start-up by inserting the statement above into /etc/rc.d/rc.local.

The broker is stopped in this way:

    # log on to the machine as root
    # find the process id for the broker
    ps auxw | grep imqbrokerd
    kill -9 $IMQ_PROCESSID

Alternatively, press Ctrl-C if the terminal where the broker was started is still available.

You can test that the JMS broker is alive by telnetting to its port, where it will give some technical information in reply:

    telnet localhost 7676
    Trying 127.0.0.1...
    Connected to localhost.localdomain (127.0.0.1).
    Escape character is '^]'.
    101 imqbroker 4.1
    portma
21. e the content of the configuration file when deploying, by giving the -E parameter with the argument "y" or "yes". This is a tool for finding bugs within a configuration file (e.g. a misspelled name or a wrongly placed branch).

This checks whether all the branches in the configuration file can be found within the default settings, and issues a warning for those it cannot find. It does not check whether the content of these branches is correct (e.g. the value given for the http port); it only checks whether the branches also exist in the default settings.

Deploy does not abort when unknown branches are found. It only generates a warning about each unknown branch and then continues with the deployment.

Some modules have plugins that use values within the settings which are not part of the default settings, and these will therefore be reported as unknown. Such plugin-specific branches should not be considered errors, even though warnings are issued for them.

Test instance

When test arguments are given, a new configuration file is created with _test appended to the name (e.g. deploy_config.xml will have the test instance configuration file deploy_config_test.xml).

The following test arguments are given: test_HttpOffsetPort, test_HttpPort, test_EnvironmentName and test_Mailreceivers. These arguments are given without spaces between them, in the above order. An Offset variable is calculated as the difference between the test_HttpPort and the test_H
22. eApplication, and dk.netarkivet.archive.bitarchive.BitarchiveMonitorApplication.

Physical Locations

The configuration example file has two physical locations (EAST and WEST). Every physical location needs to have a unique name.

    <thisPhysicalLocation name="EAST">
      ...
    </thisPhysicalLocation>
    <thisPhysicalLocation name="WEST">
      ...
    </thisPhysicalLocation>

For the settings of a physical location the following needs to be done:

A physical location needs to know which replica it uses. This replicaId has to be among the replicas defined in the deployGlobal scope. It has the path settings.common.useReplicaId:

    <settings>
      <common>
        <useReplicaId>A</useReplicaId>
      </common>
    </settings>

    <remoteFile>
      <serverName>kb-test-har-001.kb.dk</serverName>
      <userName>ftptestuser</userName>
      <userPassword>ftptestpasswd</userPassword>
    </remoteFile>

The notifications settings should be set up to tell where mails should be sent. The receiver should be changed to the mail address of the administrator of the system.

    <notifications>
      <sender>example@netarkivet.dk</sender>
      <receiver>example@netarkivet.dk</receiver>
    </notifications>

It is currently not possible to have more than two physical locations, but this problem
23. ebinterface.GUIApplication
    dk.netarkivet.archive.arcrepository.ArcRepositoryApplication
    dk.netarkivet.archive.bitarchive.BitarchiveMonitorApplication

   a. Now you can shut down the databases, if you like.
2. The BitarchiveApplication instances on all bitarchive servers are shut down:

    dk.netarkivet.archive.bitarchive.BitarchiveApplication

3. The applications on the harvester machines are shut down in arbitrary order.
4. The applications on the access servers are shut down by first killing the IndexServer and then the ViewerproxyApplication instances.

Remember to empty the JMS queues after shutting down the NetarchiveSuite if you are upgrading the system or want to reset the system. If any outstanding JMS messages are around the next time the NetarchiveSuite is started, they may cause deserialization errors if the message definitions have changed. To empty the JMS queues, you need to know which JMS environmentName your NetarchiveSuite instance has been using. The details of this are explained in Appendix A.

In the Danish installation, we empty the queues each time the system is restarted, so the effect of leaving messages in the queues over a restart, even when not upgrading, has not been tested in practice.

Monitoring a running instance of NetarchiveSuite

Contents

The Status component of the
24. eritrix>
    <serverDir>harvester_high_2</serverDir>
    </harvesting>
    </harvester>
    </settings>
    </applicationName>

How to configure which Heritrix reports are uploaded in the metadata ARC file

Three settings properties control which Heritrix reports are added to the metadata ARC file:

• settings.harvester.harvesting.metadata.heritrixFilePattern is a Java pattern that allows you to select which files in the crawl dir (not recursively) to include in the metadata ARC.
• settings.harvester.harvesting.metadata.reportFilePattern is also a Java pattern. It controls which subset of the files selected by heritrixFilePattern are to be considered report files. All the other files will be considered setup files.
• settings.harvester.harvesting.metadata.logFilePattern is a third Java pattern that controls which files in the logs subdirectory of the crawl dir are to be added as log files to the metadata ARC.

Appendix_B
25. ettings.common.applicationInstanceId) and its own distinct base directory (settings.viewerproxy.baseDir). They also belong to a replica (settings.archive.bitarchive.useReplicaId). In the start sample below, the instance uses the application instance id 'first' and 'viewerproxy_first' as base directory, and belongs to ReplicaOne with id ONE.

    cd $deployInstallDir
    export APP_OPTIONS="-Dsettings.common.applicationInstanceId=first \
        -Dsettings.viewerproxy.baseDir=viewerproxy_first \
        -Dsettings.archive.bitarchive.useReplicaId=ONE"
    export APP=dk.netarkivet.viewerproxy.ViewerProxyApplication
    java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS $APP_OPTIONS $APP

About the NetarchiveSuite support for wayback, see the Additional Tools Manual.

Starting and stopping the NetarchiveSuite

Contents

• NetarchiveSuite application startup order
• NetarchiveSuite application stopping order

This section describes how to start and stop the NetarchiveSuite.

Note that the deploy module can make scripts for this purpose. Please refer to the Configuration Manual 3.16 for more information on how to use the deploy module.

You need to start and stop the NetarchiveSuite applications in the correct order. The most critical part is that the BitarchiveMonitor must not start before the BitarchiveServers, as it might then initiate batch jobs before all BitarchiveServers are up and running and thus not receive the batch  me
26. evel (in the figure at the beginning of this section) inherits the settings from the level above it (up to deployGlobal), though only the variables which are not explicitly defined at the current level. The content of the settings scope at the application level (level 4) is printed into an application-specific settings file, which is used for running the application.

Some parts within the settings scope are used by deploy, and they will be described in the following section.

Deploy scope

The levels in the figure can have an instance of the settings scope defined. These settings are inherited through the hierarchy.

The scope levels of Deploy:

• <deployGlobal>
  Defines the deploy global level 1 scope, where settings can be set to overwrite setting defaults.
• <thisPhysicalLocation name="...">
  Defines the level 2 scope for a physical location. The settings for this scope will overwrite the settings for the 1st level scope (deployGlobal). The attribute 'name' for thisPhysicalLocation overwrites settings.common.thisPhysicalLocation.
• <deployMachine name="..." os="...">
  Defines a deploy machine level 3 scope, where common settings for the machine and the applications running on the machine can be set. These settings will overwrite 1st and 2nd level settings. The attribute 'name' for the machine is the network name of the machine, and will be used for communicating with the machine. The attribute 'os' is optional and defines the operating system 
27. fter having created the new settings to be used in the deployment of the software, zip together the NetarchiveSuite files including the new settings, and copy the modified NetarchiveSuite.zip to all machines taking part in the deployment.

    export USER=test
    export MACHINES="machine1.domain1 machine2.domain1 machine1.domain2 machine2.domain2"
    for MACHINE in $MACHINES; do
        scp NetarchiveSuite.zip $USER@$MACHINE:$deployInstallDir
        ssh $USER@$MACHINE "cd $deployInstallDir && unzip NetarchiveSuite.zip"
    done

NetarchiveSuite settings

The NetarchiveSuite settings can be set for applications in three different ways:

• use the default setting
• in a settings file
• on the command line

Using NetarchiveSuite default settings

If no settings are set, the default setting is used. Please refer to the Configuration Manual 3.16 DefaultSettings for more information on these.

Setting NetarchiveSuite settings on the command line

To set the value of a setting on the command line, add -Dkey=value to your java command line, for instance:

    java -Dsettings.common.http.port=8076 dk.netarkivet.common.webinterface.GUIApplication

will override the setting for the http port to be 8076.

Setting NetarchiveSuite settings with settings files

To set the values using a configurati
28. guration files:

settings1.xml:

    <settings>
      <common>
        <http>
          <port>8076</port>
        </http>
      </common>
    </settings>

settings2.xml:

    <settings>
      <common>
        <http>
          <port>8077</port>
        </http>
      </common>
    </settings>

    java -Ddk.netarkivet.settings.file=settings1.xml:settings2.xml -Dsettings.common.http.port=8078 dk.netarkivet.common.webinterface.GUIApplication

    java -Ddk.netarkivet.settings.file=settings1.xml:settings2.xml dk.netarkivet.common.webinterface.GUIApplication

    java -Ddk.netarkivet.settings.file=settings2.xml:settings1.xml dk.netarkivet.common.webinterface.GUIApplication

Standard commandline settings

The CLASSPATH

The CLASSPATH needed to start and run the java applications in NetarchiveSuite consists of 5 jar files: dk.netarkivet.harvester.jar, dk.netarkivet.archive.jar, dk.netarkivet.viewerproxy.jar, dk.netarkivet.wayback.jar, and dk.netarkivet.monitor.jar. The dk.netarkivet.common.jar and all our 3rd party dependencies need not be added explicitly to the CLASSPATH, as they are referenced indirectly in the jar files.

    export deployInstallDir=/path/to/netarchiveSuite
    export CLASSPATH=$CLASSPATH:$deployInstallDir/lib/dk.netarkivet.harvester.jar
    export CLASSPATH=$CLASSPATH:$deployInstallDir/lib/dk.netarkivet.archive.jar
    export CLASSPATH=$CLASSPATH:$deployInstallDir/lib/dk.netarkivet.viewerproxy.jar
    export CLASSPATH=$CLASSPATH:$deployIns
29. harvester machine and viewerproxy machine. Only one physical location has an administrator machine, which contains the GUI application, the Bitarchive monitors, the HarvestJobManager, the HarvestJobMonitor and the arc repository.

How to add one more harvester on the same machine and set all to HIGHPRIORITY selective harvesting

Using e.g. deploy_example.xml:

• Duplicate the existing harvester <applicationName> definition within <deployMachine>.
• In the new duplicate harvester config, change all of the following duplicate values to new values that are unique within <deployMachine>:
    <applicationInstanceId>
    <common><jmx><port> and <rmiPort>
    <heritrix><guiPort> and <jmxPort>
    <serverDir>harvester_high_2</serverDir>
  and set
• <queuePriority>HIGHPRIORITY</queuePriority>

    <applicationName name="dk.netarkivet.harvester.harvesting.HarvestControllerApplication">
      <settings>
        <common>
          <applicationInstanceId>high2</applicationInstanceId>
          <jmx>
            <port>8112</port>
            <rmiPort>8212</rmiPort>
          </jmx>
        </common>
        <harvester>
          <harvesting>
            <queuePriority>HIGHPRIORITY</queuePriority>
            <heritrix>
              <guiPort>8192</guiPort>
              <!-- T: jmxPort to be modified by test (was 8093) -->
              <jmxPort>8193</jmxPort>
              <jmxUsername>controlRole</jmxUsername>
              <jmxPassword>R_D</jmxPassword>
            </h
30. hive.arcrepository.baseDir    *    deployMachine
    settings.tempDir              *    applicationName

where * in Directory is the value of the path. All the directories along this path will be created if they do not already exist. A directory is only created if the path is defined under settings for the branch level (or inherited to the branch level) and it contains a non-empty value.

The installation of the directories will be executed from the installDir. The directories will only be installed if they do not already exist, with the optional exception of the tempDir, which will be removed before creation if the -R argument is set to 'yes'. Only the directory at the end of the path has its content removed, not all the directories along the path. E.g. a tempDir with the path myPath/myEndDir will only clean the directory 'myEndDir', and not the directory 'myPath'.

On Linux/Unix machines the directories are created directly through ssh, while Windows machines use a batch program, which is installed, run and then deleted.

Install scripts, settings and database

The jmxremote password file has to be non-writable when the applications are running, which means that a reinstallation of this file cannot happen before it is made writable again.

Then all the script and settings files are copied from the local directory with the machine name to the 'conf' directory in the installation directory on the machine.

Then the optional database is handled, though only on the 
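The directory rule described above (create every missing directory along the path, skip empty values, and clean only the last directory of a tempDir when a reset is requested) can be illustrated with a small shell sketch. The function name install_dir and its arguments are hypothetical and are not part of the scripts that deploy generates; the sketch only mirrors the behaviour the text describes.

```shell
#!/bin/sh
# install_dir is a hypothetical helper, not part of NetarchiveSuite.
#   $1: directory path taken from the settings (may be empty)
#   $2: "yes" to empty the final directory first (assumed to correspond
#       to a tempDir installed with the -R argument set to yes)
install_dir() {
  dir_path=$1
  reset=${2:-no}

  # An empty value means that no directory is created.
  [ -n "$dir_path" ] || return 0

  # Only the directory at the end of the path is removed, never its parents.
  if [ "$reset" = "yes" ] && [ -d "$dir_path" ]; then
    rm -rf -- "$dir_path"
  fi

  # Creates all missing directories along the path and leaves
  # directories that already exist untouched.
  mkdir -p -- "$dir_path"
}
```

For example, `install_dir myPath/myEndDir yes` would empty myEndDir but leave any other content of myPath in place.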
31. ines the directory for the database to be unzipped. This directory can be a full path or a path relative to the install directory. It is an optional parameter for defining where a machine should have the database unpacked, and if the machine does not include this parameter it will not have the database unpacked. It also requires settings.common.database.url to be set. Note: this must be set on the machines where the database is to be unpacked. Only one database directory is supported; if several are given, a warning is placed in the log and the first database directory is used.

• <deployBitpreservationDatabaseDir>
  Defines the directory for the bitpreservation database to be unzipped. This directory can be a full path or a path relative to the installation directory. It is an optional parameter for defining where a machine should have the bitpreservation database unpacked, and if a machine does not have this parameter it will not have the database unpacked.

An example of how this works is given below.

    <deployGlobal>
      <deployClassPath>lib/dk.netarkivet.common.jar</deployClassPath>
      <deployClassPath>lib/dk.netarkivet.archive.jar</deployClassPath>
      <deployJavaOpt>-Xmx1536m</deployJavaOpt>
      <thisPhysicalLocation name="myPhysicalLocation">
        <deployMachineUserName>myUserName</deployMachineUserName>
        <deployMachine name="myLinuxMachine">
          <deployInstallDir>/home/myUserName/myInstallationDirectory</deployInstallDir
32. instances of dk.netarkivet.archive.bitarchive.BitarchiveMonitorApplication should be placed on the same machine as the dk.netarkivet.common.webinterface.GUIApplication. These applications monitor the BitarchiveApplications at a given replica, though they do not have to be at the same physical location. They should therefore have settings.common.useReplicaId defined.

Manual installation of the NetarchiveSuite

Contents

• NetarchiveSuite settings
• Using NetarchiveSuite default settings
• Setting NetarchiveSuite settings on the command line
• Setting NetarchiveSuite settings with settings files
• The order of resolving NetarchiveSuite settings
• Standard commandline settings
• The CLASSPATH
• Logging
• JMX settings
• Select the appropriate settings file for the application
• JVM options
• Admin machine
• Starting the GUIApplication
• Starting the BitarchiveMonitorApplication instances
• Harvester machines
• Bitarchive machines
• Access servers

If the deploy software is not adequate for the installation needed, this section gives some hints on how to distribute and install the NetarchiveSuite software on a number of machines.

In the examples below, we assume that $deployInstallDir is set to the directory in which the NetarchiveSuite code is to be installed.

We assume that all machines in the chosen scenario are unix/linux servers. The procedure below may not work on other platforms. A
33. ion for this site section -->
    <webapplication>webpages/HarvestDefinition.war</webapplication>
    </siteSection>
    </webinterface>
    </common>

and similarly for the other site sections.

Now we are ready to start the application:

    cd $deployInstallDir
    export APP=dk.netarkivet.common.webinterface.GUIApplication
    java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS $APP

Starting the BitarchiveMonitorApplication instances

In the general setup with two distributed bitarchive replicas, we have a BitarchiveMonitorApplication associated with each replica. Here the replicas are ReplicaOne (with replicaId ONE) and ReplicaTwo (with replicaId TWO).

To distinguish the two instances from each other, we use the settings.common.applicationInstanceId setting as an identifier; here we use BMONE and BMTWO as the two identifiers.

Start the monitor for the bitarchive at ReplicaOne using BMONE as identifier thus:

    cd $deployInstallDir
    export APP=dk.netarkivet.archive.bitarchive.BitarchiveMonitorApplication
    export APP_OPTIONS="-Dsettings.common.archive.bitarchive.useReplicaId=ONE \
        -Dsettings.common.applicationInstanceId=BMONE"
    java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS $APP_OPTIONS $APP

    cd $deployInstallDir
    export APP=dk.netarkivet.archive.bitarchive.BitarchiveMonitorApplication
    export APP_OPTIONS="-Dsettings.common.archive.bitarchive.useReplicaId=TWO \
        -Dsettings.common.applicati
34. is required.
• The unzip command (or program) has to be accessible through 'ssh' on every machine.
• Two instances of the same application on the same machine must have different applicationInstanceIds.
• Several instances of the same setting cannot extend one setting. E.g. a physical location with several instances of remoteFile defined needs to have each remoteFile setting completely defined, since they are not extended by a single remoteFile in the global settings.

The deploy configuration has the following limitations in comparison to the manual installation:

• Only embedded Derby databases have been tested with the new Deploy, and other databases have to be installed manually.

The limitations and requirements for the configuration of the applications can be found in the Configuration Manual. Specific for deploy are the following:

• Every application must have a jmx port and an rmi port, and they must be unique for the machine where the application is running.
• dk.netarkivet.harvester.harvesting.HarvestControllerApplication does not run on Windows machines.
• A dk.netarkivet.archive.bitarchive.BitarchiveApplication must have at least one settings.archive.bitarchive.baseFileDir defined.
• Only the dk.netarkivet.archive.bitarchive.BitarchiveApplication is properly tested on the Windows platform. Some of the other applications should work, though they have not been tested enough to say for certain.
• If a machine has several instances of dk.netarkivet.archive.
35. ith two machines, one with Linux/Unix and one with Windows. The Linux/Unix machine has two applications ('myApplication' and 'myOtherApplication'), while the Windows machine has only one application ('myApplication').

Parameters

Each of the above scopes can have several of the following parameters defined. These parameters can be applied to each of the above scopes, and they are inherited from the parent scope in the same way as settings.

The parameters the scope levels can have:

• <deployClassPath>
  Defines a class path to be added for running an application. Note: several additional class paths can be specified within a scope, but new definitions in inner scopes will overwrite outer scopes.
• <deployJavaOpt>
  Defines a Java option for an application. Note: several additional Java options can be specified within a scope, but new definitions in inner scopes will overwrite all outer scopes.
• <deployInstallDir>
  Defines the installation directory for a deployMachine. Note: only one install directory is supported; if several are given, a warning is placed in the log and the first install directory is used.
• <deployMachineUserName>
  Defines the user name for a deployMachine. This is used when communicating with the machine. Note: only one machine user name is supported; if several are given, a warning is placed in the log and the first machine user name is used.
• <deployDatabaseDir>
  Def
36. machines with a specified database directory. This database overrides the existing standard database in the NetarchiveSuite package. The database is then unzipped to the database directory, but only if it is empty.

Then the scripts are made executable and the jmxremote password file is made read-only.

Start, Restart and Kill

The figure below shows how the applications are started. The same pattern is used for killing the applications again (replace start with kill in the figure).

[Figure: the output directory holds the install, startall and killall scripts for each physical location, the NetarchiveSuite zip package, and one directory per deploy machine (name defined in the IT configuration). The script startall_<physical location>.sh logs on to each DeployMachine with the user defined in DeployMachineUserName in the configuration file.]

Note that an application cannot be started if it is already running. How this is checked differs between the two supported platforms, Linux and Windows, as we will see below.

The restart script can be used for restarting the running applications. It starts by calling the killall script, then waits 5 seconds for the applications to terminate completely, and finally runs the startall script. This script ca
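The restart sequence described above (kill everything, wait, start everything) can be sketched in shell as follows. This is only an illustration under assumptions: the function name restart_all is hypothetical, and the script names killall.sh and startall.sh stand in for whatever names deploy actually generates for a machine. The wait time is a parameter so that the 5 seconds mentioned in the text is the default rather than hard-coded.

```shell
#!/bin/sh
# restart_all is a hypothetical sketch of a restart script.
#   $1: directory holding killall.sh and startall.sh (names are assumed)
#   $2: seconds to wait between kill and start (default 5, as in the text)
restart_all() {
  script_dir=$1
  wait_seconds=${2:-5}

  # Step 1: ask every application to stop.
  sh "$script_dir/killall.sh"

  # Step 2: give the applications time to terminate completely.
  sleep "$wait_seconds"

  # Step 3: start every application again.
  sh "$script_dir/startall.sh"
}
```

A call such as `restart_all conf` would run the two scripts from the conf directory with the default pause between them.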
37. Install script pseudo code

The install script for a physical location has the following procedure:

• For each machine do the following:
  1. Install the NetarchiveSuite file.
  2. Install the necessary directories.
  3. Install scripts, settings and database.

Install the NetarchiveSuite file

The NetarchiveSuite file is copied to the machine using scp (secure copy). Then the file is unzipped in the installation directory, which is created as a subdirectory in the local user directory.

Install necessary directories

In the config file a number of directories are defined, and these directories have to be created during the installation on a machine. The following table shows which directories are created, based on the main branch where they are defined and their path from this branch. The branch level represents where the applications have to be defined before they can be applied. They can easily be defined in a prior instance, and then be inherited to the given branch level.

    Path                                        Directory     Branch level
    settings.harvester.harvesting.serverDir     *             applicationName
    settings.archive.bitarchive.baseFileDir     *             applicationName
    settings.archive.bitarchive.baseFileDir     */filedir     applicationName
    settings.archive.bitarchive.baseFileDir     */tmpdir      applicationName
    settings.archive.bitarchive.baseFileDir     */atticdir    applicationName
    settings.viewerproxy.baseDir                *             applicationName
    settings.archive.bitpreservation.baseDir    *             deployMachine
    settings.arc
38. n be used for Windows Services (automatic execution during startup).

Linux

On the Linux platform an application is only started if no instance of this application can be found among the running processes. Likewise, an application is only killed if it can be found in the process list.

An instance of a specific application is found in the list of running processes by looking for any process with the same name which is using the same settings file.

When killing an instance of the application dk.netarkivet.harvester.harvesting.HarvestControllerApplication, the Heritrix application is also killed.

Windows

On Windows, several files are required to run an application while making sure that at most one instance of it is running: two scripts for killing it, two scripts for starting it, and one temporary file which tells whether an instance is running.

The application can only be started if the temporary run file does not exist. It is started by calling a VBS script for running the application. This script starts the application as a process and saves a method for killing this process in a kill process file.

The application can only be killed if the temporary run file exists. The kill process file is called for killing the process of the application. Then the temporary run file is removed, thus telling that the application is not running and can be started again.

The Heritrix application is not killed when an appli
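The Linux check described above (an application counts as running when some process has the same name and uses the same settings file) can be sketched as a small filter over a process listing. The helper name app_is_running is hypothetical, and the generated start and kill scripts may perform the check differently; the sketch only shows the matching rule.

```shell
#!/bin/sh
# app_is_running is a hypothetical helper, not taken from the generated scripts.
# It reads a process listing on standard input, one command line per row,
# and succeeds if at least one row mentions both the application name
# and the settings file.
#   $1: application class name
#   $2: settings file used by the instance
app_is_running() {
  app_name=$1
  settings_file=$2
  # First keep the rows naming the application, then look for the settings
  # file among them. -F matches fixed strings, so dots are taken literally.
  grep -F -- "$app_name" | grep -F -q -- "$settings_file"
}
```

With a POSIX ps, the listing could be supplied as `ps -eo args | app_is_running <application name> <settings file>`, starting the application only when the check fails.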
39. [Figure: files produced by deploy and their layout after installation. In the output directory from the deploy run, the directory for each deploy machine (name defined in the IT configuration) contains jmxremote.password, security.policy, settings_<application name>_<deploy Application Instance Id>.xml, log_<application name>_<deploy Application Instance Id>.prop, killall.sh, kill_<application name>_<deploy Application Instance Id>.sh, startall.sh and start_<application name>_<deploy Application Instance Id>.sh. The install script for the physical location logs on to each DeployMachine with the user defined in DeployMachineUserName in the configuration file. In the install dir on the machine, the NetarchiveSuite zip file is unzipped, and the conf directory receives jmxremote.password (read only), security.policy, the settings and log property files, and the killall, kill, startall and start scripts (executable).]
40. nce
• Install
• Install script pseudo code
• Install the NetarchiveSuite file
• Install necessary directories
• Install scripts, settings and database
• Start, Restart and Kill
• Linux
• Windows

Functionality of the Deploy Software

The main function of deploy is to install and configure NetarchiveSuite on a distributed system. This is done through scripts to install, start and stop the applications of NetarchiveSuite, based on a configuration file for the system. A sample file is provided with NetarchiveSuite in the file examples/deploy_distributed_example.xml.

The figure below shows the hierarchy of the instances in the deploy configuration file.

[Figure: hierarchy of the levels in the deploy configuration file; Level 1 defines the deploy global scope.]

Terminology

• environmentName: The required value in the deploy configuration file.
• machineUser: The login for the machine.
• installDir: The directory on a machine where the installation is done. This is the directory environmentName from the ssh initial directory (Linux path: /home/machineUser/environmentName; most versions of Windows use the path C:\Documents and Settings\machineUser\environmentName, except Windows Vista and the newest equivalent server, which have the path C:\Users\machineUser\environmentName).

Performing a deploy

The Deploy module has to be run from a Linux/Unix machine, since the scripts for handling the physical locations only work on this pl
41. nctionality of the scripts generated.

The second part describes the configuration file used by the deploy software: its structure, its content, and examples. It also describes the requirements and limitations of Deploy.

The third part describes the different possible installation scenarios.

The fourth part describes the means of deployment, which includes a description of how to obtain and install the required libraries and how to install the software on separate machines. Finally, the starting, stopping and monitoring of the system is described. This part is useful for those who want to go beyond the limitations inherent in the deploy software.

Some parts of NetarchiveSuite require external software to run. This is described in Appendix A.

This manual does not explain how to configure the applications themselves (see the Configuration Manual for this), how to extend the functionality of the system (see the development project for this), or how to use the running system (see the User Manual for this).

Audience

The intended audience of this manual is system administrators who will be responsible for the actual installation of NetarchiveSuite, as well as technical personnel responsible for the proper operation of NetarchiveSuite. Knowledge of Unix system administration is expected, and some familiarity with XML and Java is an advantage.

Limitations

Even though the NetarchiveSuite software is developed in Java, and therefore is mostly platform independent, w
42. nd it is only unzipped on machines where the <globalArchiveDatabaseDir> parameter is defined in the configuration. This is currently only supported on Linux machines.

Other dependencies

Deploy requires the following libraries in the classpath:

    dk.netarkivet.deploy.jar
    dk.netarkivet.archive.jar
    dk.netarkivet.common.jar
    dk.netarkivet.harvester.jar
    dk.netarkivet.monitor.jar
    dk.netarkivet.viewerproxy.jar
    dom4j-1.6.1.jar (or newer)
    commons-logging-1.0.4.jar (or newer)
    commons-cli-1.0.jar (or newer)
    jaxen-1.1.jar (or newer)

Deploy uses Java 1.6, and therefore this has to be put in the path before calling the java application.

Note that you only need to mention dk.netarkivet.deploy.jar explicitly in the classpath, because the others are referenced inside dk.netarkivet.deploy.jar.

Example

The complete call (without optionals) for running deploy will therefore be the following (with lib/ being the directory for the libraries):

    export JAVA_HOME=/usr/java/jdk1.6.0_07
    export PATH=$JAVA_HOME/bin:$PATH
    java -cp lib/dk.netarkivet.deploy.jar dk.netarkivet.deploy.DeployApplication \
        -Cdeploy_config.xml -ZNetarchiveSuite.zip -Ssecurity.policy -Llog.prop

where deploy_config.xml is the name and path of the configuration file, NetarchiveSuite.zip is the path of the NetarchiveSuite package, security.policy is the path of the security policy file, and log.prop is the path of the property file for logging. Java version 1
43. on file, save the settings in an XML file as described above. By default, NetarchiveSuite will look for the settings file in conf/settings.xml, that is, the file settings.xml under the directory conf from the current working directory. You can override this by specifying -Ddk.netarkivet.settings.file=/path/to/settings/file.xml on the command line, for instance:

    java -Ddk.netarkivet.settings.file=/home/netarchive/guisettings.xml dk.netarkivet.common.webinterface.GUIApplication

will read settings from the file /home/netarchive/guisettings.xml.

You can even specify multiple configuration files, if you wish. You do this by separating the paths with ':' on unix/linux/MacOS or ';' on windows. For instance:

    java -Ddk.netarkivet.settings.file=guisettings.xml:basicsettings.xml dk.netarkivet.common.webinterface.GUIApplication

will read settings from both guisettings.xml and basicsettings.xml in the current directory.

The order of resolving NetarchiveSuite settings

If a setting is set both on the command line and in settings files, or if it is set in multiple settings files, the setting is resolved as follows:

• If the setting is set with system properties (i.e. set on the command line), use these.
• Else if the setting is specified in configuration files, use the first specified value.
• Else use the default value.

As an example, consider the resulting value for http port (knowing that the default value is empty) when using the following two confi
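The resolution order listed above amounts to "take the first value that is set", looking first at the command line, then at the settings files in the order they are listed, and finally at the default. The following shell sketch makes that rule concrete. The function resolve_setting is purely illustrative and is not how NetarchiveSuite itself is implemented; the port numbers are the ones used in the manual's example (8078 on the command line, 8076 and 8077 in the two settings files, empty default).

```shell
#!/bin/sh
# resolve_setting is a hypothetical helper, not part of NetarchiveSuite.
# Arguments, in order of precedence:
#   $1      value given on the command line (-Dkey=value), "" if absent
#   $2...   value found in each settings file, in the order the files are listed
#   last    the default value (may be empty)
# Prints the first non-empty value, or an empty line if nothing is set.
resolve_setting() {
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  printf '\n'
}

# Command line value present: it wins over both files.
resolve_setting 8078 8076 8077 ""   # prints 8078
# No command line value, file with 8076 listed first.
resolve_setting ""   8076 8077 ""   # prints 8076
# No command line value, file with 8077 listed first.
resolve_setting ""   8077 8076 ""   # prints 8077
```

The order in which the settings files are listed therefore matters only for keys that appear in more than one file.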
44. on the machine. If 'os' is not set or has a value different from 'windows' (not case sensitive), then the default (Linux/Unix) is used.

• <applicationName name="...">
  Defines the level 4 scope where the application-specific settings are placed. These settings will overwrite the inherited 1st, 2nd and 3rd level settings. The attribute 'name' for applicationName is used for calling the application. Only the last part of the name is used for all purposes except running the application, and it overwrites settings.common.applicationName (e.g. the application dk.netarkivet.archive.bitarchive.BitarchiveApplication will have the name BitarchiveApplication).
  If the application has a specific applicationInstanceId, it is specified under settings.

One level can have several instances of a lower level, e.g. a deployMachine can have several applicationName instances (and not vice versa).

This will look like the following:

    <deployGlobal>
      <thisPhysicalLocation name="myPhysicalLocation">
        <deployMachine name="myMachine" os="linux">
          <applicationName name="myApplication">
          </applicationName>
          <applicationName name="myOtherApplication">
          </applicationName>
        </deployMachine>
        <deployMachine name="myOtherMachine" os="windows">
          <applicationName name="myApplication">
          </applicationName>
        </deployMachine>
      </thisPhysicalLocation>
    </deployGlobal>

This configuration has one physical location w
45. onInstanceId=BMTWO"
    java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS $APP_OPTIONS $APP

• one ARCRepository (this application handles all access to the bitarchives):

    cd $deployInstallDir
    export APP=dk.netarkivet.archive.arcrepository.ArcRepositoryApplication
    java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS $APP

Harvester machines

On each harvester machine, we have one or more HarvestControllerApplications. Settings related to the HarvestControllerApplication are:

• settings.common.applicationInstanceId (to distinguish between HarvestControllerApplications running on the same machine)
• settings.harvester.harvesting.queuePriority (to select which of two queues to accept jobs from: HIGHPRIORITY (jobs part of a selective harvest) or LOWPRIORITY (jobs part of a snapshot harvest))
• settings.harvester.harvesting.minSpaceLeft (how many bytes must be available in the serverdir to accept crawl jobs; the default is 400000000, about 400 Mbytes)

In the following, a low priority HarvestControllerApplication is started with application instance id SEL:

    cd $deployInstallDir
    export APP_OPTIONS="-Dsettings.harvester.harvesting.queuePriority=LOWPRIORITY \
      -Dsettings.common.applicationInstanceId=SEL"
    export APP=dk.netarkivet.harvester.harvesting.HarvestControllerApplication
    java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS $APP_OPTIONS $APP

Bitarchive machines

For each Replica, you can have BitarchiveServer(s) installed on one or more machine
46. only on this machine. It is specified in settings.common.database.url what type the database is, and where it is found after it is unpacked. If a specific database is not given as parameter when calling deploy, the default Derby database (fullhddb.jar) is used.

The application myLinuxApplication on the Linux machine does not have any class paths specified, and therefore inherits lib/dk.netarkivet.common.jar and lib/dk.netarkivet.archive.jar all the way from deployGlobal (through thisPhysicalLocation and deployMachine).

On the other hand, myWindowsApplication on the Windows machine does not inherit these libraries, since it has its own class paths specified. It has the libraries lib/dk.netarkivet.common.jar, lib/dk.netarkivet.harvester.jar and lib/dk.netarkivet.viewerproxy.jar in the class path, and therefore does not have lib/dk.netarkivet.archive.jar, since it is neither specified nor inherited.

The myLinuxApplication will be called with the following command:

    [command listing illegible in the source]

    java -Xmx1150m -cp "lib/dk.netarkivet.common.jar;lib/dk.netarkivet.harvester.jar;lib/dk.netarkivet.viewerproxy.jar" myWindowsApplication

The class paths are 
47. ped on machines where a database directory has been defined. Currently databases are only supported on Linux machines.

• -R [OPTIONAL] Whether the temporary file directory should be reset. Any argument different from 'y' or 'yes' will be considered a 'no'.
  • During installation some directories are created if they do not already exist. This argument defines whether the temporary directory should be cleared during installation (or reinstallation).
• -T [OPTIONAL] For creating a test instance.
  • The argument is required to have the following format: HttpOffsetPort HttpPort EnvironmentName MailReceivers (no spaces between them). A new config file is created based on these inputs and the given config file (this file has the same name, just with the extension '_test.xml' instead of '.xml'). See the Test instance section.
• -E [OPTIONAL] For evaluating the config file. Any argument different from 'y' or 'yes' will be considered a 'no'.
  • This evaluates whether the settings in the deploy configuration file are compatible with the standard settings. See the Evaluation section below.
• -A [OPTIONAL] The archive database (has to be either '.zip' or '.jar').
  • This database will be used for both the ArcRepository and the DatabaseBasedActiveBitPreservation. If the database is not given as an argument, a default empty archive database in the NetarchiveSuite package file is used. The database has to be placed in an unzippable file ('.zip' or '.jar'), a
48. portant to note that when a new application is added to a machine which already has an application of the same instance, these applications must have settings.common.applicationInstanceId defined with different values.

Some of the applications require specific settings to be defined. These are described in the following.

BitarchiveApplication

The dk.netarkivet.archive.bitarchive.BitarchiveApplication requires the setting settings.archive.bitarchive.baseFileDir to be defined. This path should be changed, and it has to be changed if the drive/partition in the path does not exist on the machine.

HarvestControllerApplication

For the dk.netarkivet.harvester.harvesting.HarvestControllerApplication, the following settings defined under settings.harvester.harvesting.heritrix should be changed to fit your system: guiPort and jmxPort.

A new instance of the dk.netarkivet.harvester.harvesting.HarvestControllerApplication requires the setting settings.harvester.harvesting.queuePriority to be defined as either LOWPRIORITY or HIGHPRIORITY. A system requires at least one HarvestControllerApplication with each priority.

IndexServerApplication and ViewerProxyApplication

Both the dk.netarkivet.archive.indexserver.IndexServerApplication and the dk.netarkivet.viewerproxy.ViewerProxyApplication should have settings.common.http.port and settings.viewerproxy.baseDir changed to fit your system.

BitarchiveMonitorApplication

All the 
49. pper tcp PORTMAPPER 7676 [sessionid=1729683678303517696]
    cluster_discovery tcp CLUSTER_DISCOVERY 46760
    jmxrmi rmi JMX 0 [url=service:jmx:rmi://udvikling.kb.dk/stub/...]
    admin tcp ADMIN 46763
    jms tcp NORMAL 46762
    cluster tcp CLUSTER 46764
    Connection closed by foreign host.

    $INSTALLATION_DIR/mq/lib/jms.jar
    $INSTALLATION_DIR/mq/lib/imq.jar

    [listing illegible in the source]

How to empty queues

Log on as root to the server where the JMS broker is installed. The following assumes that the JMS environmentName is PROD, and that the JMS password file resides in /root/.imq_passfile:

    export JMS_ENV=PROD
    export MQ_HOME=/usr/local
    # imqcmd using -u admin -passfile ~/.imq_passfile
    $MQ_HOME/bin/imqcmd list dst -t q -u admin -passfile ~/.imq_passfile | grep "${JMS_ENV}_" | cut -f1 -d" " | xargs -r -n 1 $MQ_HOME/bin/imqcmd destroy dst -t q -u admin -passfile ~/.imq_passfile -f -n
    $MQ_HOME/bin/imqcmd list dst -t t -u admin -passfile ~/.imq_passfile | grep "${JMS_ENV}_" | cut -f1 -d" " | xargs -r -n 1 $MQ_HOME/bin/imqcmd destroy dst -t t -u admin -passfile ~/.imq_passfile -f -n

    export MQ_HOME=/usr/local
    $MQ_HOME/mq/bin/imqbrokerd -vmargs "-Xms256m -Xmx512m" -reset store -tty &

which gives the broker a minimum of 256 MB and a maximum of 512 MB heap space.

Installing and configuring FTP

If yo
50. r must be accessible from all machines in the installation, not only on port 7676 but also on port 33700 (for RMI).

Java

All machines must run Java version 1.6.0 or higher.

Choose the set of machines taking part in the installation/deployment

When you have chosen a scenario, you must decide on the number of machines you want to use in the deployment of the NetarchiveSuite. For scenario A, the answer is of course one. For the scenarios B, C, and D, the answer is more complicated.

An extra complication is added by installing the system at two different physical locations (here referred to as EAST and WEST). The distinction between different physical locations is relevant if the system is installed at two different institutions with firewalls between them.

At the Danish installation, we operate with 5 kinds of machines:

• Admin machine (one server). Here we deploy one or more BitarchiveMonitorApplications (one for each bitarchive Replica), one ArcRepositoryApplication, one GUIApplication, and a JobManagerApplication, which takes care of job scheduling.
• Harvester machines (one or more). Here we deploy the HarvestControllerApplications.
• Bitarchive machines (one or more). These machines only run one BitarchiveApplication each (there must be at least one for each bitarchive Replica).
• Access servers (one or more). On these machines, we have the ViewerProxyApplication, enabling us to browse in already stored webpages, and the IndexServerApplication, 
51. rio, multiple machines are involved, necessitating file transfer between machines and multiple installations of the code. However, the machines are expected to be within the same firewall, so port setup should be no problem.
• C: Single site setup with duplicate archive. This expands on the single site setup in that more than one copy of the archived files is used, using the concept of separate 'Replicas' to indicate the duplicates.
• D: Multi site setup. When more than one site (physical location) is involved, separated by firewalls, extra issues of opening ports and specifying the correct site come into play. This is the most complex scenario, but also more secure against systematic errors, hacking, and other threats.

Choose Repository

Scenarios A and B from section Choose a platform involve having a local arcrepository without bitarchive replicas. This is configured by a plug-in (please refer to Configure Plugins in the Configuration Manual).

Scenarios C and D from section Choose a platform involve having distributed bitarchive replicas. In these scenarios we have at least two bitarchive replicas. The Replica information must be configured before deployment, either in the local settings file or included in the deploy configuration file for your system (please refer to Configure Repository in the Configuration Manual).

Choose the type of database

The NetarchiveSuite can use three types of database:

• Derby database (default)
• PostgreSQ
52. rix jmx logins, though the monitor jmx login and heritrix jmx login do not have to be the same.

Log property file

A log property file for each application is created. This file is given as input, and it is changed to fit the application.

The only changes in the log property file are:

• Changing the tag APPID to the identification of the application: applicationName + "_" + applicationInstanceId, where the applicationInstanceId is only appended to the applicationName if the application has an applicationInstanceId defined.
• Removing any ConsoleLoggers defined on Windows machines, as these have been found to cause applications to hang.

The name of this application specific log property file is "log_" + applicationIdentification + ".prop", where the applicationIdentification is given as applicationName + "_" + applicationInstanceId, as described above.

Security policy file

The security policy file for a machine is initially a copy of the security policy file given as argument. This machine specific security policy file is then modified to suit the needs of the machine and its applications.

The tag ROLE is replaced by the monitor jmxUsername for the machine. This has to be defined on the machine level in the deploy configuration file.

Permission to read the baseFileDir under bitarchive is granted for all applications. The paths to these directories are changed to fit the language of the security policy.

Evaluate

It is possible to evaluat
53. s. We suggest using just one BitarchiveServer for each machine, though it is possible to use more than one.

Each BitarchiveServer can have storage on several filesystems, so if archive storage is spread over more than one filesystem, you need to modify the settings file like this:

    <settings>
      <archive>
        <bitarchive>
          <baseFileDir>/home/fileSys1</baseFileDir>
          <baseFileDir>/home/fileSys2</baseFileDir>
        </bitarchive>
      </archive>
    </settings>

Starting a BitarchiveServer requires knowing what Replica it resides on, and the credentials required for correcting the data stored in the bitarchive. For ReplicaOne with id ONE this would be:

    cd $deployInstallDir
    export APP_OPTIONS="-Dsettings.archive.bitarchive.useReplicaId=ONE \
      -Dsettings.archive.bitarchive.thisCredentials=CREDENTIALS"
    export APP=dk.netarkivet.archive.bitarchive.BitarchiveApplication
    java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS $APP_OPTIONS $APP

Access servers

On the access servers, we deploy any number of ViewerProxyApplication instances, and maybe one IndexServerApplication (only one in all), used to generate indices needed by the harvesters and the ViewerProxyApplication instances.

    cd $deployInstallDir
    export APP=dk.netarkivet.archive.indexserver.IndexServerApplication
    java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS $APP

Each ViewerProxyApplication instance uses an application instance id s
54. separated with ':' on Linux/Unix and with ';' on Windows.

Application Instance Id

The scope settings.common.applicationInstanceId defines the identification of a single application instance (e.g. suffix for application specific scripts, suffix for directory to place files, etc.). This is needed to provide unique identifiers, and hence JMS queue names, for applications in cases where there are multiple instances of the same application on the same machine (e.g. BitarchiveMonitors or HarvestControllers).

An example of two identical applications with different application instance ids on the same machine is given below:

    <deployGlobal>
      <thisPhysicalLocation name="myPhysicalLocation">
        <deployMachine name="myMachine">
          <applicationName name="dk.netarkivet.archive.bitarchive.BitarchiveApplication">
            <settings>
              <common>
                <applicationInstanceId>myFirstInstance</applicationInstanceId>
              </common>
            </settings>
          </applicationName>
          <applicationName name="dk.netarkivet.archive.bitarchive.BitarchiveApplication">
            <settings>
              <common>
                <applicationInstanceId>mySecondInstance</applicationInstanceId>
              </common>
            </settings>
          </applicationName>
        </deployMachine>
      </thisPhysicalLocation>
    </deployGlobal>

These applications will be called BitarchiveApplication_myFirstInstance and BitarchiveApplication_mySecondInstance respectively.

Limitations and 
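The naming rule from snippet 54 (last part of the class name, plus an underscore and the instance id when one is defined) can be sketched as follows. The function name `application_identification` is hypothetical and the code is only an illustration of the rule as the text states it, not the deploy tool's implementation.

```python
def application_identification(application_name, instance_id=None):
    """Build the identification of an application as described in the text.

    application_name: fully qualified class name of the application.
    instance_id:      value of settings.common.applicationInstanceId,
                      or None/empty when it is not defined.
    """
    # Only the last part of the fully qualified name is used.
    short_name = application_name.rsplit(".", 1)[-1]
    # The instance id is appended only when one is defined.
    if instance_id:
        return short_name + "_" + instance_id
    return short_name


if __name__ == "__main__":
    app = "dk.netarkivet.archive.bitarchive.BitarchiveApplication"
    print(application_identification(app, "myFirstInstance"))
    print(application_identification(app, "mySecondInstance"))
    print(application_identification(app))
```

Two instances of the same class on one machine therefore get distinct identifications only when their instance ids differ.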
55. sion java.net.SocketPermission "127.0.0.1:3306", "connect, resolve";
    };

Firewall note: You will need to allow the GUIApplication and the HarvestTemplateApplication to access port 3306 on the server where you run the database.

This jar must then be added to the classpath for the applications that access the database (GUIApplication and HarvestTemplateApplication). You can do this manually when starting these applications. Alternatively, you can add the mysql-connector-java-5.0.X-bin.jar to the lib/db directory, and modify build.xml accordingly:

• Add a line db/mysql-connector-java-5.0.X-bin.jar to the property jarclasspath, just below the line db/derby-10.1.1.0.jar.
• Add a line <include name="db/mysql-connector-java-5.0.X-bin.jar"/> below <include name="db/derby-10.1.1.0.jar"/>.

You can then generate a new NetarchiveSuite zipball with

    [command illegible in the source]

This assumes that you have downloaded the source distribution of the NetarchiveSuite.

PostgreSQL Database

To be written.

Choose a JMS broker

NetarchiveSuite requires a JMS broker to run. The only type of JMS broker supported at this time is the SunMQ broker and its open source counterpart, Open Message Queue.

The installation and start up of a JMS broker is described in Appendix A. For a description of how to configure the JMS broker, please refer to Configure JMS Broker.

Firewall note: The machine that runs the JMS broke
56. ssage. The following is a suggested order of startup.

NetarchiveSuite application startup order

1. Start the databases used by NetarchiveSuite and the message broker.
2. The BitarchiveApplication (one or more) on all bitarchive servers is started.
3. The applications on the admin machine are started:
   • dk.netarkivet.common.webinterface.GUIApplication
   • dk.netarkivet.archive.arcrepository.ArcRepositoryApplication
   • dk.netarkivet.archive.bitarchive.BitarchiveMonitorApplication for Replica One
   • dk.netarkivet.harvester.scheduler.HarvestJobManagerApplication
   • dk.netarkivet.archive.bitarchive.BitarchiveMonitorApplication for Replica Two
4. The applications on the harvester machines are started. Start each HarvestControllerApplication instance deployed on this machine.
5. The applications on the access servers are started by first starting the IndexServer and then one or more ViewerProxyApplication instances.

NetarchiveSuite application stopping order

After locating the process id of any given process, the actual killing of the process is done on unix machines with the kill command:

    [command illegible in the source]

The killing itself is done in the following order:

1. The applications on the admin machine are killed:
   • dk.netarkivet.harvester.scheduler.HarvestJobManagerApplication
   • dk.netarkivet.common.w
57. t the start value is 2 (starting automatically).

• Create a new 'Key' called Parameters.
• In this 'Key', create a new 'String Value' called Application, which contains the complete path to the bat script, e.g. c:\users\USERNAME\ENV_NAME\conf\restart.bat.
• Also within the 'Key', create another 'String Value' called AppDirectory, which should contain a path to the directory where the bat script is placed, e.g. c:\users\USERNAME\ENV_NAME\conf\.

Now the application should start automatically during Windows startup.

Appendix C - Easy Installation of NetarchiveSuite

Contents
• Examples of deploy configuration files
• How to add one more harvester on the same machine and set all to HIGHPRIORITY (selective harvesting)
• How to configure which Heritrix report has to be uploaded in the metadata ARC file

• Verify that you have all the needed software installed according to the Quick Start Manual, e.g. in /home/test/netarchive, by starting the 'Quickstart'. Below, you find other deploy examples. (They have to be modified to fit your environment.)
• You can now create, run and browse according to the 'QuickStart' or the User Manual.

Examples of deploy configuration files

The following example configuration file requires adaptation to your own system before use:

deploy_distributed_example.xml: The instance with two replicas divided over two physical locations. Each physical location contains several machines. Bitarchive machines, 
58. tallDir/lib/dk.netarkivet.wayback.jar
    export CLASSPATH=$CLASSPATH:$deployInstallDir/lib/dk.netarkivet.monitor.jar

Logging

We use the apache commons logging framework, so we need to point to the wanted logger class (e.g. org.apache.commons.logging.impl.Jdk14Logger) as well as to the logging configuration file. You may want to use different logging properties for different applications, especially when more than one application logs to the same logging directory. E.g. you may want to change the line java.util.logging.FileHandler.pattern=./log/APPID%u.log in the conf/log.prop file to something different.

    export LOG_SETTINGS="-Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.Jdk14Logger \
      -Djava.util.logging.config.file=$deployInstallDir/conf/log.prop"

Note that if you use the MonitorSiteSection, your logging properties file must contain the handler dk.netarkivet.monitor.logging.CachingLogHandler:

    handlers=java.util.logging.FileHandler,java.util.logging.ConsoleHandler,\
      dk.netarkivet.monitor.logging.CachingLogHandler

JMX settings

Each application instance on a given machine has its own JMX and RMI port. For example, the JMX port could be 8100 and the associated RMI port 8200 (as in the example below) for the first application instance on the machine, then 8101/8201 for the second application instance, and so on. JMX also uses 
59. ther JVM options:

    [listing illegible in the source]

Admin machine

On the admin machine, we have to start the following 5 applications:

• 1 GUIApplication
• 1 HarvestJobManagerApplication (handles the scheduling of jobs)
• 2 instances of BitarchiveMonitorApplication (controlling the access to a single bitarchive replica), one for each bitarchive replica (e.g. EAST and WEST)
• 1 ArcRepositoryApplication (this application handles access to the bitarchive replicas)

Starting the GUIApplication

Before we can start the GUIApplication, the external database needs to be started in advance. (The deploy software does this for you if the external database is a derby database.)

We also need to prepare the JSP pages. You can unzip the war files in the webpages directory as below:

    cd $deployInstallDir/webpages
    rm -rf BitPreservation
    unzip -o BitPreservation.war -d BitPreservation
    rm -rf HarvestDefinition
    unzip -o HarvestDefinition.war -d HarvestDefinition
    rm -rf History
    unzip -o History.war -d History
    rm -rf QA
    unzip -o QA.war -d QA
    rm -rf Status
    unzip -o Status.war -d Status

    <common>
      <webinterface>
        <siteSection>
          <!-- A subclass of SiteSection that defines this part of the
               web interface. -->
          <class>dk.netarkivet.harvester.webinterface.DefinitionsSiteSection</class>
          <!-- The directory or war file containing the web applicat
60. ttpOffsetPort (e.g. Offset = test_HttpPort - test_HttpOffsetPort). The value of this Offset must be between 0 and 9.

The test argument is applied to the deploy_config_test file, where the following changes are made:

• The environmentName is changed to test_EnvironmentName.
• For every level, the test_HttpPort replaces the value in the settings path settings.common.http.port.
• For every level, the test_MailReceiver replaces the value in the settings path settings.common.notification.receiver.
• For every level, the offset replaces a single digit in some four digit ports under settings. This is seen in the table below.

    Path                                              Index
    settings.common.jmx.port                          3
    settings.common.jmx.rmiPort                       3
    settings.harvester.harvesting.heritrix.guiPort    2
    settings.harvester.harvesting.heritrix.jmxPort    2

E.g. Offset = 7 and a settings.common.jmx.port = 1234 will yield a new settings.common.jmx.port = 1274 for the test instance, whereas a settings.harvester.harvesting.heritrix.jmxPort = 1234 will yield a new settings.harvester.harvesting.heritrix.jmxPort = 1734.

Install

An installation script is created for each physical location. This script contains the commands for making the installation on all the machines of the physical location, as described in the pseudo code.

The figure below shows the pattern of installation.

    [Figure: <Output dir> containing install_<Physical Location>.sh, startall_<physical location>.sh, killall_<physical locatio
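The digit replacement described for the test instance in snippet 60 can be illustrated with a short Python sketch. It assumes, based only on the two worked examples in the text (1234 becoming 1274 and 1734), that the index counts digits from the left starting at 1. The function name `apply_test_offset` is hypothetical and this is not the deploy tool's own code.

```python
def apply_test_offset(port, offset, index):
    """Replace one digit of a four digit port with the test offset.

    port:   four digit port number, e.g. 1234.
    offset: single digit between 0 and 9.
    index:  position of the digit to replace, counted from the left
            starting at 1 (an assumption inferred from the examples).
    """
    digits = str(port)
    if len(digits) != 4 or not digits.isdigit():
        raise ValueError("port must have exactly four digits")
    if not 0 <= offset <= 9:
        raise ValueError("offset must be between 0 and 9")
    if not 1 <= index <= 4:
        raise ValueError("index must be between 1 and 4")
    position = index - 1
    # Swap the single digit at the given position for the offset.
    return int(digits[:position] + str(offset) + digits[position + 1:])


if __name__ == "__main__":
    # settings.common.jmx.port uses index 3 in the table.
    print(apply_test_offset(1234, 7, 3))
    # settings.harvester.harvesting.heritrix.jmxPort uses index 2.
    print(apply_test_offset(1234, 7, 2))
```

The two calls reproduce the worked examples from the text: index 3 changes the third digit and index 2 changes the second.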
61. u decide to use FTPRemote for file transfer in the NetarchiveSuite, you need to install and start one or more FTP servers before you begin the installation of the NetarchiveSuite. Any brand of FTP server will probably do, but we have good experience with Proftpd.

You can download Proftpd from http://www.proftpd.org/. We are using version 1.2.10, but any recent non-beta version will probably do.

The text below shows the part of proftpd.conf needed by NetarchiveSuite. Other parameters in proftpd.conf may be left with their default values.

    # Port 21 is the standard FTP port.
    Port 21

    # Umask 022 is a good standard umask to prevent new dirs and files
    # from being group and world writable.
    Umask 022

    # To prevent DoS attacks, set the maximum number of child processes
    # to 30. If you need to allow more than 30 concurrent connections
    # at once, simply increase this value. Note that this ONLY works
    # in standalone mode, in inetd mode you should use an inetd server
    # that allows you to limit maximum number of processes per service
    # (such as xinetd).
    MaxInstances 250

    # Set the user and group under which the server will run.
    User nobody
    Group nogroup
    Group nobody

    # To cause every FTP user to be "jailed" (chrooted) into their home
    # directory, uncomment this line.
    #DefaultRoot ~

    # Normally, we want files to be overwriteable.
    # This is necessary to allow the append operation
    AllowOverwrite on
    AllowStoreRestart on

    # Bar use of SITE CH
62. ut these are still untested with our software.

Note: We only support installation on the Linux platform here. However, you may want to install your JMS broker on a different platform. Binary versions are available at the site for Solaris Sparc, Solaris x86, Linux (x86) and Windows (x86). If you want to build a binary for another platform, the source can be downloaded from the download page.

Installing the JMS broker

Select the Linux server where you want to install the JMS broker, and select an installation directory. Then log on to the Linux server as root, and do the following:

    export INSTALLATION_DIR=/path/to/installationdir
    cd $INSTALLATION_DIR
    unzip mq4_1-binary-Linux_X86-XXXXXXXX.jar
    chmod +x mq/bin/imqbrokerd
    mq/bin/imqbrokerd -reset store -tty    (tests that the broker can start; CTRL-C to stop)

We are now ready to configure the JMS broker.

Configuring the JMS broker

• Edit the file $INSTALLATION_DIR/mq/etc/imqenv.conf to set IMQ_DEFAULT_JAVAHOME to a JDK 1.5.0.
• Changing the listening port number 7676 is done by editing the line
      imq.portmapper.port=7676
  in the file
      $INSTALLATION_DIR/mq/lib/props/broker/default.properties
• Set the maximum number of listeners on any given queue to 20. You need to make sure that the following line
      imq.autocreate.queue.maxNumActiveConsumers=20
  is present and not commented out in the file
      $INSTALLATION_DIR/mq/var/instances/imqbroker/props/config.properties
  (increase the number 20 if you hav
63. will be dealt with, and it will be possible in a future release.

Machine

The name of a machine must be set to either its network name or its IP address. The 'os' attribute should only be set for the Windows machines, which can only run applications of the instance dk.netarkivet.archive.bitarchive.BitarchiveApplication.

    [listing illegible in the source]

Change the following parameters to fit the machine definition. A machine needs to have the following parameters defined or inherited from a higher level:

    <deployMachineUserName>test</deployMachineUserName>
    <deployInstallDir>/home/test</deployInstallDir>

There are no specific settings required at the machine level which are not inherited from the outer scopes, and therefore no settings to change to fit your system.

Application

All applications need the following settings defined under settings.common.jmx. On any given machine, these parameters must have unique values for each application.

A new application needs the name attribute to be defined as the fully qualified classname of the application.

    [listing illegible in the source]

It is im
64. yMachine> and 1527 for correct port.
• Need to add a permission to the policy file used by your installation, if you use security (see below). The following will allow NetarchiveSuite to access a Derby database on port 1527:

    grant {
      permission java.net.SocketPermission "127.0.0.1:1527", "connect, resolve";
    };

Firewall note: You will need to allow the GUIApplication and the HarvestTemplateApplication to access port 1527 on the server where you run the database.

More details on using Derby as a server are available on the derby pages: http://db.apache.org/derby/docs/dev/adminguide/cadminov825266.html

MySQL Database

If you want to use a MySQL database, you have to:

• Set the setting settings.common.database.class to dk.netarkivet.harvester.datamodel.MySQLSpecifics.
• Set the setting settings.common.database.url correctly: jdbc:mysql://localhost/fullhddb?user=root&password=secret (substitute the server host for localhost, and username/password for root/secret).
• Install the MySQL database server (v. 5.0.X) on a machine of your choice.
• Create an empty database on the server using the schema definition in scripts/sql/createfullhddb.mysql.
• Download a mysql-connector-java-5.0.X-bin.jar from http://dev.mysql.com/downloads/connector/j/5.0.html
• Add a permission to the policy file used by your installation, if you use security. The following will allow NetarchiveSuite to access MySQL on localhost on the default port 3306:

    grant {
      permis
    