Oracle Ultra Search User's Guide
1. Oracle Ultra Search supports secure searches, which return only documents that satisfy the search criteria and that the search user is allowed to view. To turn on secure search in the query application, follow these steps:

   1. Deploy Oracle Ultra Search query (ultrasearch_query.ear).
   2. Edit the OC4J jazn.xml file to connect to Oracle Internet Directory. For example:

      <jazn provider="LDAP" default-realm="us" location="ldap://localhost:3060">
        <property name="ldap.user" value="orcladmin"/>
        <property name="ldap.password" value="welcome"/>
      </jazn>

   3. Restart OC4J.
   4. Edit applications/ultrasearch_query/META-INF/orion-application.xml to turn on JAZN LDAP.
   5. Edit applications/ultrasearch_query/query/WEB-INF/web.xml to enable login functionality in usearch.jsp. For example:

      <init-param>
        <param-name>login enabled</param-name>
        <param-value>true</param-value>
      </init-param>

   6. Enable mod_osso in Apache.
   7. Access http://hostname:port/ultrasearch/query/usearch.jsp to see the login function and test secure search.

   See Also: "Secure Search" on page 1-6

   Backend Reconfiguration After a Database Character Set Change
   If the database character set has been changed after Oracle Ultra Search installation, you must reconfigure the Oracle Ultra Search backend
2. ... 1-3
   Oracle Ultra Search Features 1-4
   Integration with Oracle Application Server 1-5
   Extensible Crawler and Crawler Agent 1-5
   Federated Search 1-6
   Secure Search 1-6
   Dependency on Oracle XML DB 1-7
   Sample Query Applications 1-8
   Sample Search Portlet 1-8
   Query API 1-9
   URL Rewrite 1-9
   Robots Exclusions 1-9
   Display URL Support 1-10
   Document and Search Attributes 1-10
   Metadata Loader 1-10
   Document Relevancy Boosting 1-11
   Data Harvesting Mode 1-11
   Instance Snapshot Support
3. for many things, including document management, access control, or version control. Different data sources can have attributes with different names that are used for the same purpose (for example, version and revision). They can also use the same attribute name for different purposes (for example, language as natural language in one data source but as programming language in another).

   Search attributes are created in three ways:
   - System-defined search attributes, such as title, author, description, subject, and mimetype
   - Search attributes created by the system administrator
   - Search attributes created by the crawler. During crawling, the crawler agent maps the document attribute to a search attribute with the same name and data type. If none is found, then the crawler creates a new search attribute with the same name and type as the document attribute defined in the crawler agent.

   The list of values (LOV) for a search attribute can help you specify a search query. If an attribute LOV is available, then the crawler registers the LOV definition, which includes the attribute value, the attribute value display name, and its translation.

   Crawling Process for the Schedule
   The first time the crawler runs, it must fetch Web pages, table rows, files, and so on, based on the data source. It then adds the document to the Oracle Ultra Search index. The crawling process for the schedule is broken into two phases:
   1. Queuing and Caching Documents
   2. Ind
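The mapping rule in this excerpt (reuse a search attribute that has the same name and data type, otherwise create a new one) can be sketched in Python. This is only an illustration, not Oracle Ultra Search code: the SearchAttributeRegistry class, its method names, and the exact-match comparison on name and type are assumptions made for the example.

```python
class SearchAttributeRegistry:
    """Hypothetical in-memory stand-in for the list of search attributes."""

    def __init__(self, system_defined):
        # Keyed by (name, data type), mirroring "same name and data type".
        self._attributes = {
            (name, data_type): "system" for name, data_type in system_defined
        }

    def map_document_attribute(self, name, data_type):
        """Return (key, created) for a document attribute seen during a crawl."""
        key = (name, data_type)
        created = key not in self._attributes
        if created:
            # No matching search attribute: the crawler creates a new one.
            self._attributes[key] = "crawler"
        return key, created

    def origin(self, name, data_type):
        """Return who created the attribute, or None if it does not exist."""
        return self._attributes.get((name, data_type))


registry = SearchAttributeRegistry([("title", "string"), ("author", "string")])
print(registry.map_document_attribute("title", "string"))
print(registry.map_document_attribute("revision", "number"))
```

Keying on both name and type keeps two attributes that share a name but differ in type apart, which matches the "same name and data type" wording above.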
4. Crawler Configuration APIs

   SET_ADMIN_READONLY
   Use this procedure to prevent a crawler configuration setting from being modified from the administration GUI page. This procedure is useful when a setting, such as the location of a cache directory, should not be controlled from the administration GUI. This might be the case, for example, when the people managing the server machine do not also manage Oracle Ultra Search.

   Syntax
      OUS_ADM.SET_ADMIN_READONLY(
        config_name  IN NUMBER,
        read_only    IN NUMBER DEFAULT YES,
        crawler_id   IN NUMBER DEFAULT LOCAL_CRAWLER
      );

   config_name
      The name of the crawler configuration setting. Possible values are:

      Configuration Name    Description
      CC_CACHE_DIRECTORY    crawler cache directory path
      CC_CACHE_SIZE         size of the cache in megabytes
      CC_CACHE_DELETION     enable/disable removing cache files after indexing
      CC_LOG_DIRECTORY      crawler log file location

   read_only
      Set to YES to prevent the setting from being modified from the GUI.

   crawler_id
      The ID of the crawler whose configuration you are modifying. This may be set either to LOCAL_CRAWLER or to the ID of a remote crawler.

   Examples
      OUS_ADM.SET_ADMIN_READONLY(ous_adm.CC_CACHE_DIRECTORY, ous_adm.YES);
      OUS_ADM.SET_ADMIN_READONLY(ous_adm.CC_LOG_DIRECTORY, ous_adm.NO);

   UPDATE_CRAWLER_CONFIG
5. 4. Each crawler thread fetches the document from the Web. The page is usually an HTML file containing text and hypertext links.
   5. Each crawler thread calculates a checksum for the newly retrieved page and compares it with the checksum of the cached page. If the checksum is the same, then the page is discarded and the crawler goes to step 3. Otherwise, the crawler moves to the next step.
   6. Each crawler thread scans the document for hypertext links and inserts new links into the URL queue. Links that are already in the document table are discarded.
   7. The crawler caches the document in the local file system. See Figure 7-2.
   8. The crawler registers the URL in the document table.
   9. If the file system cache is full or if the URL queue is empty, then Web page caching stops and indexing begins. Otherwise, the crawler thread starts over at step 3.

   Web Crawling Boundary Control
   Oracle Ultra Search provides the following mechanisms to control the scope of Web data source crawling:
   - URL boundary rule: domain rule and path rule
   - Robots.txt file and robots META tag
   - Crawling depth
   - URL Rewriter API

   URL Boundary Rule
   The URL boundary rule consists of domain rules and path rules. A domain rule specifies the set of Web sites allowed, using a host name prefix or suffix. A path rule specifies the URL file path allowed or disallowed for a particular host. You can specify an inclusion or exclusion ru
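The queuing and caching steps in this excerpt can be sketched as a single-threaded loop in Python. This is a rough illustration under several assumptions, not the crawler's implementation: the function name queue_and_cache, the in-memory dictionaries that stand in for the file system cache and document table, the SHA-256 checksum, and the regular expression used to find links are all hypothetical.

```python
import hashlib
import re
from collections import deque

HREF = re.compile(r'href="([^"]+)"')  # simplistic link extraction for the sketch


def queue_and_cache(seed_urls, fetch, cached_checksums, cache_limit):
    """Run one queuing-and-caching pass and return (cache, document_table)."""
    queue = deque(seed_urls)
    seen = set(seed_urls)      # URLs ever queued, so known links are discarded
    document_table = set()     # URLs registered during this pass
    cache = {}                 # url -> page text, stands in for the file cache

    # Stop when the cache is full or the URL queue is empty.
    while queue and len(cache) < cache_limit:
        url = queue.popleft()
        page = fetch(url)
        checksum = hashlib.sha256(page.encode("utf-8")).hexdigest()
        if cached_checksums.get(url) == checksum:
            continue           # unchanged page: discard it, take the next URL
        for link in HREF.findall(page):
            if link not in seen:
                seen.add(link)
                queue.append(link)
        cache[url] = page              # cache the document
        document_table.add(url)        # register the URL
        cached_checksums[url] = checksum
    return cache, document_table


site = {
    "/a": '<a href="/b">b</a>',
    "/b": '<a href="/a">a</a><a href="/c">c</a>',
    "/c": "leaf page",
}
checksums = {}
cache, table = queue_and_cache(["/a"], site.__getitem__, checksums, cache_limit=10)
print(sorted(table))
```

Passing fetch as a callable keeps the sketch free of network access; a second pass with the same checksums dictionary caches nothing, because the seed page is unchanged and its links are never rescanned.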
6. $ORACLE_HOME/jdbc/lib/classes12.jar
   $ORACLE_HOME/jdbc/lib/nls_charset12.zip
   $ORACLE_HOME/lib/xmlparserv2.jar
   $ORACLE_HOME/lib/activation.jar
   $ORACLE_HOME/lib/mail.jar

   Oracle Ultra Search Portlet uses the connection pooling functionality of the J2EE container. You must define a container-authenticated data source. This data source must return an Oracle connection. Oracle recommends using the Java class oracle.jdbc.pool.OracleConnectionCacheImpl for this data source. In addition, the data source should contain the following fields:
   - location equal to jdbc/UltraSearchPooledDS
   - user name and password equal to the Oracle Ultra Search instance owner's database user name and password
   - URL equal to the JDBC connection string, in the form jdbc:oracle:thin:@database_host:oracle_port:oracle_sid

   See Also: "Editing the data-sources.xml File" on page 3-21 for the data source configuration of the Oracle J2EE container

   Editing the data-sources.xml File
   Caution: Storing clear text passwords in data-sources.xml poses a security risk. Avoid this by using password indirection to specify the password. This lets you enter the password in jazn-data.xml, which automatically gets encrypted, and point to it from data-sources.xml. For more information, see "Creating An Indirect Password" in Oracle Application Server Containers for J2EE Security Guide.
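The JDBC connection string form named in this excerpt (jdbc:oracle:thin:@database_host:oracle_port:oracle_sid) can be assembled and sanity-checked with a small Python helper. The function name thin_jdbc_url and the sample host, port, and SID values are hypothetical; the helper only formats a string and does not connect to a database.

```python
def thin_jdbc_url(host, port, sid):
    """Format a thin-driver connection string as host:port:sid."""
    if not host or not sid:
        raise ValueError("host and sid must be non-empty")
    if not 0 < int(port) < 65536:
        raise ValueError("port must be between 1 and 65535")
    return f"jdbc:oracle:thin:@{host}:{int(port)}:{sid}"


# Placeholder values for illustration only.
print(thin_jdbc_url("dbhost.example.com", 1521, "orcl"))
```

Generating the string in one place makes it easier to keep the URL field of the data source consistent with the instance the portlet is meant to query.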
7. /sys/apps/ultrasearch
   /sys/apps/ultrasearch_acl.xml

   If you do not see this confirmation, then this step has failed and you cannot proceed. Recheck that all previous steps were performed correctly.

   Step 5: Turn On Secure Search Functionality in Oracle Ultra Search
   Because there is currently no way to programmatically verify a proper Oracle OID installation, the secure search functionality in Oracle Ultra Search is turned off by default. You must explicitly turn on this feature after completing all previous steps.

   Step 6: Turn On Secure Search in the Query Application
   To turn on secure search functionality in Oracle Ultra Search:
   1. Log in to the Oracle Ultra Search database using SQL*Plus as user WKSYS.
   2. Invoke the following PL/SQL API:

      exec WK_ADM.SET_SECURE_MODE(1)

      The argument 1 indicates that you are turning on secure search.

   After you have turned on secure search functionality, you can create Oracle Ultra Search instances that are secure search enabled.

   Note: At any subsequent point in time, you can turn off security by invoking WK_ADM.SET_SECURE_MODE(0). Doing so designates that any instances created after that will not support secure searches. However, existing secure search enabled instances are not modified. Hence, if the Oracle OID link ceases to function, you cannot perform searches on crawled documents that are secured
8. 'Dirt coming out wrong color', 'Solved by bleaching dirt');
   INSERT INTO problems VALUES (8, 'Claire', 5, '20-Jun-03', 'Overheats cheese', 'Solved by using better quality cheese');
   INSERT INTO problems VALUES (9, 'Dontenmann', 2, '08-Jan-03', 'Gum stuck', 'Solved by increasing power');
   INSERT INTO problems VALUES (10, 'Glass', 4, '22-Aug-03', 'Ferret allergic to magnets', 'Solved by washing magnets');
   INSERT INTO problems VALUES 11 Heyboll 1 03 Sep 02 Solved by ducking will not shake out Flying cantaloupes injuring family

   Crawl and Index Ultra Appliance's Intranet Documents
   This section describes the steps that you, acting as the Ultra Appliance search administrator, use to set up Oracle Ultra Search to crawl and index the company intranet. After you perform this setup, call center agents can use Oracle Ultra Search to obtain information about the Springmaster 2000 refrigerator.

   To crawl and index the Ultra Appliance intranet:
   1. Log on to Oracle Ultra Search using the Oracle Ultra Search Administration Tool screen. In your Web browser, enter the domain and port of the computer where you have Oracle Ultra Search installed, followed by /ultrasearch/admin/index.jsp:

      http://your_computer.domain:http_port/ultrasearch/admin/index.jsp

      For example: http://comp1.ultrasupply.com:7778/ultras
9. Oracle Ultra Search is a client program to the Oracle server at run time. It can be deployed in two configurations: in the backend or in the middle tier. The Oracle Ultra Search query interface and the administration tool can be accessed from any HTML browser client. The administration tool relies on certain Java classes in the middle tier. This logical middle tier can be the same physical computer as the one that runs the database server, or a different one running Oracle Application Server.

   The Oracle Ultra Search database backend consists of the Oracle Ultra Search data dictionary, which stores metadata on all the different repositories, as well as the schedules and Java classes needed to drive the crawler. The crawler itself can run either on the database server computer or remotely on another computer.

   See Also: Chapter 3, "Installing and Configuring Oracle Ultra Search" for more information about the components

   Figure 1-1 illustrates the Oracle Ultra Search system configuration.

   Figure 1-1 Oracle Ultra Search System Configuration
   [Figure: client; Oracle Ultra Search admin tool; middle tier (Web server, J2EE engine); Oracle Ultra Search backend (Java support files, PL/SQL packages, Oracle Text for indexing, DBMS_JOBS); remote crawler component]
10. Use this procedure to update crawler configurations.

    Syntax
       OUS_ADM.UPDATE_CRAWLER_CONFIG(
         config_name   IN NUMBER,
         config_value  IN VARCHAR2
       );

    config_name
       The name of the crawler configuration.

    config_value
       The configuration value. Possible values are:

       Configuration Name    Description                                   Value
       CC_CACHE_DIRECTORY    crawler cache directory path                  any valid directory path
       CC_CACHE_SIZE         size of the cache in megabytes                any positive integer
       CC_CACHE_DELETION     enable/disable removing cache files           OUS_ADM.DELETE_CACHE or
                             after indexing                                OUS_ADM.KEEP_CACHE
       CC_LOG_DIRECTORY      crawler log file location                     any valid directory path
       CC_DATABASE           connection string to the backend database     any valid JDBC connection string
       CC_PASSWORD           database connection password for the          database connect password for the
                             crawler to connect to the database            schema that owns the instance
       CC_JDBC_DRIVER        JDBC driver type used by the local crawler    OUS_ADM.THIN_DRIVER or
                                                                           OUS_ADM.OCI_DRIVER

    Example
       OUS_ADM.UPDATE_CRAWLER_CONFIG(ous_adm.CC_CACHE_DIRECTORY, '/private1/ultrasearch/cache/');
       OUS_ADM.UPDATE_CRAWLER_CONFIG(ous_adm.CC_CACHE_SIZE, 15);

    A Loading Metadata into Oracle Ultra Search
    Oracle Ultra Search provides a command-line tool to load metadata into an Oracle Ultra Search database. If you have a large amount of data, then this is probably faster than using the HTML ba
11.             </xsd:element>
              </xsd:sequence>
            </xsd:complexType>
          </xsd:element>
        </xsd:schema>

    XML Schema for LOVs and LOV Display Names
    The XML schema for LOV entries and display names is described as follows:

        <?xml version="1.0" encoding="UTF-8"?>
        <!-- Generated by XML Authority. Conforms to w3c http://www.w3.org/2001/XMLSchema -->
        <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
          <xsd:element name="lov_list">
            <xsd:complexType>
              <xsd:sequence>
                <xsd:element name="lov" maxOccurs="unbounded">
                  <xsd:complexType>
                    <xsd:sequence>
                      <xsd:element name="default" minOccurs="0">
                        <xsd:complexType>
                          <xsd:sequence>
                            <xsd:element name="lov_values" minOccurs="0">
                              <xsd:complexType>
                                <xsd:sequence>
                                  <xsd:element name="entry" maxOccurs="unbounded">
                                    <xsd:complexType>
                                      <xsd:attribute name="value" use="required" type="xsd:string"/>
                                    </xsd:complexType>
                                  </xsd:element>
                                </xsd:sequence>
                              </xsd:complexType>
                            </xsd:element>
                            <xsd:element name="lov_display_names" minOccurs="0" maxOccurs="unbounded">
                              <xsd:complexType>
                                <xsd:sequence>
                                  <xsd:element name="entry" maxOccurs="unbounded">
                                    <xsd:complexType>
                                      <
12. Upgrading Oracle Ultra Search

    Upgrading Oracle Ultra Search to Oracle Collaboration Suite Release 1

    See Also: "Oracle Ultra Search Release Information" describes the Oracle Ultra Search release numbering.

    Pre-Upgrade Steps
    Before you upgrade, log on to the Oracle Ultra Search administration tool. Stop and disable all crawler synchronization schedules in every Oracle Ultra Search instance. You can enable all crawler synchronization schedules after the upgrade. See "Schedules Page" on page 8-34 for details on how to stop and disable the synchronization schedule.

    Upgrading Oracle Ultra Search Shipped with Oracle Database
    To upgrade Oracle Ultra Search shipped with the Oracle Database release, do the following:
    1. Run the Oracle Ultra Search backend upgrade. This includes upgrading the Oracle Ultra Search database schemas and server files. Install the new Oracle software and run Oracle Database Upgrade Assistant to upgrade the database and the Oracle Ultra Search component to the new release. See the Oracle Database Upgrade Guide for details.
    2. Follow the steps in "Installing the Oracle Ultra Search Middle Tier on Web Server Hosts" on page 3-11 to install the new Oracle Ultra Search middle tier.

    Upgrading Oracle Ultra Search Shipped with Oracle Application Server
    To upgrade Oracle Ultra Search shipped with the Oracle Application Server, do the following:
    1. Install the new Oracle Application Se
13. Searches run against a secure search enabled Oracle Ultra Search instance are slower than those run against an instance that is not secure search enabled. This is because each candidate hit could require an ACL evaluation. ACLs are evaluated natively by the Oracle server for optimum performance. Nevertheless, each evaluation takes a finite amount of time. Therefore, the time taken to return hits in a secure search varies with the number of ACL evaluations that must be made.

    Dependency on Oracle XML DB
    Oracle Ultra Search stores ACLs in the Oracle XML DB repository. Oracle Ultra Search also uses Oracle XML DB functionality to evaluate ACLs. This dependency exists only for users who make use of secure searching.

    The ACLs are managed by Oracle Ultra Search. ACLs are uniquely referenced by documents from a single Oracle Ultra Search instance. ACLs are not shared by multiple Oracle Ultra Search instances. For acceptable performance, the ACL cache size must be large enough to contain all ACLs evaluated at run time.

    ACLs in the XML DB repository are protected by other ACLs, known as protector ACLs. Oracle Ultra Search ensures that the protector ACLs grant appropriate privileges for Oracle Ultra Search to invoke the XML DB ACL evaluation mechanism.

    The evaluation performance is primarily affected by the total number of ACLs used by all XML DB client applications that also use its ACL evaluation mechanism. This set of applications includes Oracle Ultra
14. When the file system cache is full (the default maximum size is 20 megabytes), document caching stops and indexing begins. In this phase, Oracle Ultra Search augments the Oracle Text index using the cached files referred to by the document table. See Figure 7-3.

    Figure 7-3 Indexing Documents
    [Figure: Oracle uses cached HTML files to create index data with Oracle Text and Oracle Ultra Search; the document table (base table) contains the cached file names; the result is the Oracle Ultra Search index]

    Data Synchronization
    After the initial crawl, a URL page is crawled and indexed only if it has changed since the last crawl. The crawler determines whether it has changed with the HTTP If-Modified-Since header field or with the checksum of the page. URLs that no longer exist are marked and removed from the index.

    To update changed documents, the crawler uses an internal checksum to compare new Web pages with cached Web pages. Changed Web pages are cached and marked for reindexing.

    The steps involved in data synchronization are the following:
    1. Oracle spawns the crawler according to the synchronization schedule you specify with the administration tool. The URL queue is populated with the data source URLs assigned to the schedule.
    2. The crawler initiates multiple crawling threads.
    3. Each crawler thread removes the next URL in the queue
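The two change-detection mechanisms named in this excerpt (the HTTP If-Modified-Since header and a page checksum) can be sketched in Python as follows. This is an illustration under stated assumptions, not the crawler's logic: the function names, the use of SHA-256 as the checksum, and the treatment of status codes 404 and 410 as "no longer exists" are all choices made for the example.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime


def conditional_headers(last_crawl):
    """Build the If-Modified-Since header for a recrawl request."""
    stamp = format_datetime(last_crawl.astimezone(timezone.utc), usegmt=True)
    return {"If-Modified-Since": stamp}


def classify(status, body, cached_checksum):
    """Return 'unchanged', 'removed', or 'changed' for a recrawled URL."""
    if status == 304:
        return "unchanged"   # the server reports no change since the last crawl
    if status in (404, 410):
        return "removed"     # assumption: treat these codes as a vanished URL
    # Fall back to comparing checksums of the new and cached pages.
    checksum = hashlib.sha256(body).hexdigest()
    return "unchanged" if checksum == cached_checksum else "changed"


last_crawl = datetime(2004, 8, 10, 12, 0, tzinfo=timezone.utc)
print(conditional_headers(last_crawl))
old = hashlib.sha256(b"<html>v1</html>").hexdigest()
print(classify(200, b"<html>v2</html>", old))
```

The checksum comparison covers servers that ignore the conditional header and always return the full page.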
15. File              Description
    mail.css          Style sheet for sample email Web application

    Sample JavaServer Page Mailing List Browser Application Files
    File              Description
    mail.jsp          Mailing list browser application that selectively includes HTML code returned by other JSP files, depending on what the end user wants to view
    mailindex.jsp     JSP page that displays all email sources (mailing lists) of an Oracle Ultra Search instance
    mailmsgs.jsp      JSP page that displays all emails for an email source (mailing list)
    mailreader.jsp    JSP page that displays an email
    mailutil.jsp      JSP page that defines various functions that are used by mailreader.jsp

    Graphics Files for All Applications
    File                             Description
    images/ultra_mediumbanner.gif    Oracle Ultra Search banner
    images/wsd.gif                   Background image used in sample query application

    Setting up the Sample Mailing List Browser Application
    For detailed instructions on setting up the sample JSP mailing list browser application, see "Installing the Oracle Ultra Search Middle Tier on Web Server Hosts" on page 3-11.

    Oracle Ultra Search URL Rewriter API
    A URL rewriter is a user-supplied Java module that implements the Oracle Ultra Search UrlRewriter Java interface. When activated, it is used by the crawler to filter and rewrite extracted URL links before they are inserted into the URL queue. Web crawling generally cons
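The idea of filtering and rewriting extracted links before they reach the URL queue can be illustrated with a short Python sketch. The real rewriter is a Java module that implements the UrlRewriter interface; the code below does not reproduce that interface. The function name rewrite_link, the sessionid parameter, and the .zip suffix are hypothetical choices for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def rewrite_link(url, drop_params=("sessionid",), blocked_suffixes=(".zip",)):
    """Return the rewritten URL, or None when the link should be filtered out."""
    parts = urlsplit(url)
    if parts.path.lower().endswith(blocked_suffixes):
        return None  # filter: this link never reaches the URL queue
    # Rewrite: drop volatile query parameters and the fragment so that
    # equivalent links collapse to one queue entry.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in drop_params
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))


print(rewrite_link("http://example.com/a.html?sessionid=42&page=2#top"))
print(rewrite_link("http://example.com/archive.zip"))
```

Returning None for rejected links keeps filtering and rewriting in one call, so the caller only has to skip empty results before queuing.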
16.    name            IN VARCHAR2,
       interval        IN VARCHAR2,
       crawl_mode      IN NUMBER DEFAULT REGULAR_CRAWL,
       recrawl_policy  IN NUMBER DEFAULT RECRAWL_WHEN_MODIFIED,
       crawler_id      IN NUMBER DEFAULT LOCAL_CRAWLER
    ) return number;

    name
       The name of the schedule to create.
    interval
       The schedule interval. This is a string generated from the OUS_ADM.INTERVAL function.
    crawl_mode
       The crawl mode can be REGULAR_CRAWL, CRAWL_ONLY, or INDEX_ONLY.
    recrawl_policy
       The recrawl condition can be RECRAWL_WHEN_MODIFIED or RECRAWL_ON_EVERYTHING.
    crawler_id
       The ID of the crawler used to execute the schedule. This can be LOCAL_CRAWLER or the remote crawler ID.

    Example
    This example creates a crawler schedule that only crawls a marketing Web site, with no indexing. It is started every 6 hours by the local crawler.

       schedule_id := OUS_ADM.CREATE_SCHEDULE(
         'marketing site schedule',
         OUS_ADM.INTERVAL(ous_adm.HOURLY, 6),
         ous_adm.CRAWL_ONLY
       );

    DROP_SCHEDULE
    Use this procedure to drop a crawler schedule.

    Syntax
       OUS_ADM.DROP_SCHEDULE(
         name IN VARCHAR2
       );

    name
       The name of the schedule to drop.

    Example
       OUS_ADM.DROP_SCHEDULE('marketing site schedule');

    Schedule-Related APIs

    INTERVAL
    Use this function to generate a schedule interval string.

    Syntax
       OUS_ADM.INTERVAL(
         type       IN NUMBER,
         frequency  IN NUMBER DEF
17. 9 Oracle Ultra Search Developer's Guide and API Reference

    This chapter explains the Oracle Ultra Search APIs and related information. This chapter contains the following topics:
    - Overview of Oracle Ultra Search APIs
    - Oracle Ultra Search Query API
    - Customizing the Query Syntax Expansion
    - Oracle Ultra Search Query Tag Library
    - Oracle Ultra Search Crawler Agent API
    - Oracle Ultra Search Java Email API
    - Oracle Ultra Search URL Rewriter API
    - Oracle Ultra Search Sample Query Applications

    See Also: Oracle Ultra Search Java API Reference

    Overview of Oracle Ultra Search APIs
    Oracle Ultra Search provides the following APIs:
    - The query API works with indexed data. The Java API does not impose any HTML rendering elements. The application can completely customize the HTML interface.
    - The crawler agent API crawls and indexes proprietary document repositories.
    - The email API is used by the Oracle Ultra Search query application to display emails. It can also be used when building your own custom query application.
    - The URL rewriter API is used by the crawler to filter and rewrite extracted URL links before they are inserted into the URL queue.

    Oracle Ultra Search also includes highly functional query applications to query and display search results. The query applications ar
18. A data source type is an abstraction of a data source. You can define new data source types with the following attributes:
    - Name of data source type: For example, Lotus Notes. The name cannot be more than 100 bytes.
    - ID of data source type: This is automatically assigned.
    - Description of the data source type: The limit is 4000 bytes.
    - Agent Java class name: For example, WebDbAgent. The location of this class is predefined by Oracle Ultra Search in $ORACLE_HOME/ultrasearch/lib/agent and cannot be changed.
    - Agent Java jar file name: The agent class can be stored in a Java jar file. This jar file must be in $ORACLE_HOME/ultrasearch/lib/agent, where $ORACLE_HOME is the Oracle home directory where the Oracle Ultra Search backend (not the middle tier) is installed.
    - Parameters: Parameters are the properties of a data source; for example, seed URL, inclusion pattern, and robots exclusion for a Web data source. Define a parameter by specifying a parameter name (100 bytes maximum) and a description (4000 bytes maximum). By default, a parameter is not encrypted.
    - Encryption: Should the value of this parameter be encrypted when stored?

    Oracle Ultra Search does not enforce the occurrence of parameters. You cannot specify that a particular parameter must have 0 or more, at least 1, or only 1 occurrence.

    Data Source Registra
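The attribute limits listed in this excerpt (100 bytes for names, 4000 bytes for descriptions, parameters unencrypted by default) can be captured in a small Python validation sketch. The class names DataSourceType and Parameter, the field names, and the choice of UTF-8 for measuring byte length are assumptions for illustration; they do not correspond to an Oracle Ultra Search API.

```python
from dataclasses import dataclass, field

NAME_LIMIT = 100          # bytes
DESCRIPTION_LIMIT = 4000  # bytes


def _check(label, text, limit):
    size = len(text.encode("utf-8"))  # UTF-8 is an assumption of this sketch
    if size > limit:
        raise ValueError(f"{label} is {size} bytes; the limit is {limit}")


@dataclass
class Parameter:
    name: str
    description: str = ""
    encrypted: bool = False  # by default, a parameter is not encrypted

    def __post_init__(self):
        _check("parameter name", self.name, NAME_LIMIT)
        _check("parameter description", self.description, DESCRIPTION_LIMIT)


@dataclass
class DataSourceType:
    name: str
    agent_class: str
    description: str = ""
    jar_file: str = ""
    parameters: list = field(default_factory=list)

    def __post_init__(self):
        _check("data source type name", self.name, NAME_LIMIT)
        _check("data source type description", self.description, DESCRIPTION_LIMIT)


notes = DataSourceType(
    name="Lotus Notes",
    agent_class="WebDbAgent",
    parameters=[Parameter("seed URL"), Parameter("inclusion pattern")],
)
print(notes.name, [p.name for p in notes.parameters])
```

Measuring encoded bytes rather than characters matters for multibyte names, because the limits above are stated in bytes.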
19. (ADDRESS=(PROTOCOL=TCP)(HOST=cls02a)(PORT=3001))
    (ADDRESS=(PROTOCOL=TCP)(HOST=cls02b)(PORT=3001))
    (CONNECT_DATA=(SERVICE_NAME=sales.us.acme.com))

    Remote Crawler Profiles
    Use this page to view and edit remote crawler profiles. A remote crawler profile consists of all parameters needed to run the Oracle Ultra Search crawler on a remote computer other than the Oracle Ultra Search database. To register a remote crawler, use the PL/SQL API wk_crw.register_remote_crawler. You can choose either RMI-based or JDBC-based remote crawling.

    To configure the remote crawler, click Edit. You can change the following configuration parameters for the remote crawler:
    - Cache file access mode: You have two options for the remote crawler to handle cache files:
      - Through a JDBC connection: In this case, the remote crawler sends cache files over the crawler's JDBC connection to the server's cache directory.
      - Through a mounted file system: If you choose this option, the cache file is saved in the remote crawler cache directory. The remote crawler cache directory must be mounted to the server-side crawler cache directory specified under the Crawler Settings tab; otherwise, the documents cannot be indexed.

      See Also: For more on crawling with JDBC connections, see "Using the Remote Crawler" on page 5-6.
    - Cache directory location (absolute path)
    - Crawler log file directory
20. - Chapter 3, "Installing and Configuring Oracle Ultra Search"
    - "Changing Oracle Ultra Search Schema Passwords" on page 4-2 for information about changing the WKSYS password
    - "Instances Page" on page 8-6 for more information about creating Oracle Ultra Search instances
    - "Users Page" on page 8-43 for more information about granting permission to other users
    - "Logging On and Managing Instances as SSO Users" on page 8-5 for more information about how Oracle Ultra Search handles SSO users

    Logging On and Managing Instances as SSO Users
    Note: Single Sign-On (SSO) is available only if the Oracle Identity Management infrastructure is installed.

    Logging On to Oracle Ultra Search
    When a single sign-on (SSO) user logs on to the SSO-protected Oracle Ultra Search administration tool, the user is first prompted with the SSO login screen. Enter the SSO user name and password. After the SSO server authenticates the user, the user sees a list of Oracle Ultra Search instances that they have the privilege to manage.

    There are different URLs for different users. For example:
    - SSO users: http://<host>:<http port>/ultrasearch/admin_sso/index.jsp
    - Portal users: http://<host>:<http port>/pls/portal
    - Enterprise Manager users: http://<host>:<em port>/

    Granting Privileges to SSO Users
    You might need to grant super-user privileges, or privileges for managing an Oracle Ultra Search instance, to an SSO
21.    Click Proceed to step 3.
       c. Create Schedule: Step 3 of 3 screen: Select Web from the drop-down menu. Select Ultra Appliance from the Available Sources menu and transfer it to the Assigned Sources menu by clicking >>. Click Finish.
    7. Execute crawling and indexing. On the Synchronization Schedules page, locate Ultra Appliance in the Schedules column.
       a. In the Status column for the Ultra Appliance row, verify that the status is in the Scheduled condition.
       b. Click the Scheduled link. The Synchronization Schedule Status page is displayed.
       c. Click the Execute immediately button so that Oracle Ultra Search can crawl and index the Ultra Appliance intranet site. Click the Refresh status button so that you see schedule status changes. When the schedule status displays Scheduled, the crawling is complete.

    Crawl and Index Ultra Appliance's Database Documents
    This section describes how you, acting as the Ultra Appliance search administrator, would set up Oracle Ultra Search to search the Ultra Appliance company database for problem information about the Springmaster 2000 refrigerator. You can configure the Oracle Ultra Search crawler to crawl the database you set up in "Setting up the Ultra Appliance Demo".

    To crawl and index the Ultra Appliance database:
    1. Follow steps 1 through 4 in the Crawl and Index Ultra Appliance
22. Example 2-1 appliances.sql script

       DROP TABLE product;
       CREATE TABLE product (
         id           NUMBER PRIMARY KEY,
         Description  VARCHAR2(200),
         Parts        VARCHAR2(80)
       );
       INSERT INTO product VALUES (1, 'Springmaster 2000', 'Cantaloupe Tray');
       INSERT INTO product VALUES (2, 'TipNClear 2000', 'TipNVac Tray');
       INSERT INTO product VALUES (3, 'Spew 2000', 'Extra Dirt');
       INSERT INTO product VALUES (4, 'Hold Em 2000', 'Spare Magnets');
       INSERT INTO product VALUES (5, 'Pizza Legend 2000', 'No 7 Pizza Tube');
       INSERT INTO product VALUES (6, 'SnoozePower 2000', 'Lint Screen');

       DROP TABLE problems;
       CREATE TABLE problems (
         Problem_ID           NUMBER PRIMARY KEY,
         Customer_Name        VARCHAR2(40),
         Product_ID           NUMBER,
         Date_ID              DATE,
         Problem_Description  VARCHAR2(200),
         Resolution_Text      VARCHAR2(200)
       );
       INSERT INTO problems VALUES (1, 'Jones', 4, '10-Aug-03', 'Magnets pointed wrong way', 'Solved by reversing pet');
       INSERT INTO problems VALUES (2, 'Smith', 1, '01-Oct-02', 'Cantaloupe wrong color', 'Solved by icing down melons');
       INSERT INTO problems VALUES (3, 'Chan', 3, '10-Apr-03', 'Clogged by cat hair', 'Solved by getting new cat');
       INSERT INTO problems VALUES (4, 'Ali', 5, '29-May-03', 'Will not work with anchovies', 'Cannot solve');
       INSERT INTO problems VALUES (5, 'Johnson', 2, '28-Feb-03', 'Husband on couch', 'Solved by removing husband');
       INSERT INTO problems VALUES (6, 'Kawamoto', 6, '11-Nov-02', 'Pillow too loud', 'Solved by turning pillow over');
       INSERT INTO problems VALUES (7, 'Weiss', 3, '15-May-03',
23. Set the following values in the initialization file:
    - PROCESSES: Set this to 50 or more.
    - SORT_AREA_SIZE: Set this to 5MB or more.
    - SORT_AREA_RETAINED_SIZE: Set this to 5MB or more.
    - JOB_QUEUE_PROCESSES: Set this to three or higher; at a minimum, set it to one. This is needed because the Oracle Ultra Search crawler is launched by scheduling a database job. If this is zero, then no database jobs are run. As a result, any attempts to launch the Oracle Ultra Search crawler fail. Also consider other requirements for job queue processes when you set this value.

    For the latest information on initialization parameters relating to Oracle Ultra Search, see the Oracle Ultra Search Readme.

    Step 2: Create and Assign the Temporary Tablespace to the CTXSYS User
    The starter database created by the Oracle Installer may create a temporary tablespace that is too small. Oracle Ultra Search uses the Oracle Text engine intensively. Therefore, a large temporary tablespace must be created for the Oracle Text system user CTXSYS. If you want greater read and write performance, create the tablespace on raw devices.

    When you have created the temporary tablespace, assign it as the temporary tablespace for the CTXSYS user. To do so, you must log on as the SYSTEM or SYS user. Assign the temporary tablespace to the CTXSYS user with the following statement:

       ALTER USER CTXSYS TEMPORARY TABLESPACE new_temporary_tab
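The minimum values listed in this excerpt can be checked mechanically before a crawl is attempted. The Python sketch below is a hypothetical helper, not an Oracle utility: the function name check_init_parameters, the dictionary input, and the interpretation of 5MB as 5 * 1024 * 1024 bytes are assumptions. It uses the hard minimum of one for JOB_QUEUE_PROCESSES rather than the suggested value of three.

```python
MB = 1024 * 1024  # assumption: 5MB is read as 5 * 1024 * 1024 bytes

MINIMUMS = {
    "PROCESSES": 50,
    "SORT_AREA_SIZE": 5 * MB,
    "SORT_AREA_RETAINED_SIZE": 5 * MB,
    "JOB_QUEUE_PROCESSES": 1,  # zero means no database jobs run at all
}


def check_init_parameters(settings):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name, minimum in MINIMUMS.items():
        value = settings.get(name)
        if value is None:
            problems.append(f"{name} is not set")
        elif value < minimum:
            problems.append(f"{name}={value} is below the minimum {minimum}")
    return problems


# Sample values for illustration only.
current = {
    "PROCESSES": 150,
    "SORT_AREA_SIZE": 1 * MB,
    "SORT_AREA_RETAINED_SIZE": 5 * MB,
    "JOB_QUEUE_PROCESSES": 0,
}
for problem in check_init_parameters(current):
    print(problem)
```

A check like this is most useful for JOB_QUEUE_PROCESSES, because a value of zero makes every attempt to launch the crawler fail.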
24. Tune the Oracle Database

    Increase the Size of the Oracle Redo Logs (if necessary)
    Every instance of an Oracle database has an associated online redo log, which is a set of two or more online log files that record all committed changes made to the database. Online redo logs protect the database in the event of an instance failure.

    The size of the redo log files determines the frequency of redo log file switches. This, in turn, significantly impacts text indexing speed. To reduce the frequency of log file switches, ensure that the redo log files are each 100MB or more. The following section lists some tips on how to increase the redo log file sizes, if necessary. Enter the statements in the following section with the appropriate Oracle administrator privileges.

    See Also:
    - Oracle Database Performance Tuning Guide
    - Oracle Database Administrator's Guide

    Locate redo log files and determine their sizes:

       SELECT v$logfile.member, v$logfile.group#, v$log.status, v$log.bytes
       FROM v$log, v$logfile
       WHERE v$log.group# = v$logfile.group#;

    Add larger redo log files:

       ALTER DATABASE ADD LOGFILE 'redo_log_directory/newredo1.log' SIZE 100M;
       ALTER DATABASE ADD LOGFILE 'redo_log_directory/newredo2.log' SIZE 100M;
       ALTER DATABASE ADD LOGFILE 'redo_log_directory/newredo3.log' SIZE 100M;

    A production database should have more log members for each log group and
25. in SCHEDULED status until you explicitly invoke data synchronization with the Execute Immediately button of the admin tool (see "Launching Synchronization Schedules" on page 8-37). 3. Assign data sources to the schedule. After a data source has been assigned to a group, it cannot be assigned to other groups. Updating Schedules: Update the indexing option in the Update Schedule page. Editing Synchronization Schedules: After a synchronization schedule has been defined, you can do the following in the Synchronization Schedules List: To assign the schedule to either a crawler that runs on the database host or a remote crawler that runs on a separate host, click Hostname. To change its frequency, click the schedule interval text. To alter its status, click Status. To delete it, click Delete. To edit its name, data source assignments, recrawl policy, or crawling mode, click Edit. When the crawler retrieves a document, it checks to see if it has changed. By default, if the document has not changed, the crawler does not process it. In certain situations, you might want to force the crawler to reprocess all documents. Click Edit to edit schedules in the following ways: Update schedule name: This step is optional. To change the schedule name, specify a name for the schedule and click Update Schedule Name. Assign data sources to schedule: To assign a data source, select on
26. in the Oracle Ultra Search EAR files is deployed. The application field describes the application name. It should match the application name in server.xml. The name field describes the Web application name. This should match the WAR file name within the EAR file corresponding to the application. For root, specify the virtual path for this Web application. The virtual path is the path under the URL. For the administrative Web application, access it using http://hostname.domainname:port/ultrasearch/admin. Note: The virtual path for a particular Web application is defined in three files: default-web-site.xml, mod_oc4j.conf, and application.xml in the META-INF directory of the EAR file. The META-INF directory is created by extracting the EAR file. You must modify the root attribute of web-app in default-web-site.xml and the value enclosed by the context-root tag in application.xml to change the virtual path that points to each Web application. 2. Modify the mod_oc4j configuration files. Add the following to $ORACLE_HOME/Apache/Apache/conf/mod_oc4j.conf: Oc4jMount /ultrasearch OC4J_Portal Oc4jMount /ultrasearch/* OC4J_Portal Oc4jMount /ultrasearch/query OC4J_Portal Oc4jMount /ultrasearch/query/* OC4J_Portal Oc4jMount /ultrasearch/ohw OC4J_Portal Oc4jMount /ultrasearch/ohw/* OC4J_Portal O
27. or table data sources, mappings are created manually when you create the data source. For user-defined data sources, mappings are automatically created on subsequent crawls. Click Edit Mappings to change this mapping. Editing the existing mapping is costly, because the crawler must recrawl all documents for this data source. Avoid this step unless necessary. Note: There are no user-managed mappings for email sources. There are two predefined mappings for emails. The From field of an email is intrinsically mapped to the Oracle Ultra Search author attribute. Likewise, the Subject field of an email is mapped to the Oracle Ultra Search subject attribute. The abstract of the email message is mapped to the description attribute. Sources Page: A collection of documents is called a source. The data source is characterized by the properties of its location, such as a Web site or an email inbox. The Oracle Ultra Search crawler retrieves data from one or more data sources. The different types of sources are: Web Sources, Table Sources, Email Sources, File Sources, Oracle Sources, and User-Defined Sources (requires a crawler agent). See Also: "Schedules Page" on page 8-34 to assign one or more data sources to a synchronization schedule; "Queries Page" on page 8-39 to assign data sources to data groups to enable restrictive querying. You can create as ma
28. privilege, then the user can only administer instance 2 and 3. If the SSO user has the admin privilege on a particular Oracle Ultra Search instance (for example, instance 2), then the user can administer the instance (instance 2) that is associated with the subscriber (subscriber A). Privileges to Create and Drop an Oracle Ultra Search Instance: To create or drop an Oracle Ultra Search instance, the user (either the database or the SSO user) must have the super-user privilege. In non-SSO mode, the database user can create or drop an instance and associate the instance with any subscriber, including the default subscriber. In SSO mode: If the login SSO user belongs to the default subscriber, then the user can create or drop an instance and associate the instance with any subscriber, including the default subscriber. If the login SSO user belongs to a particular subscriber, then when the user creates an instance, the instance is created and associated with the subscriber to which the login user belongs. Because the user might not have access to create the instance schema in the database, the user must ask the hosting company (default subscriber) to create the database schema for hosting the instance. Privileges to Grant or Revoke a Super-User: To grant or revoke a super-user, log in to the administration tool as a super-user. In non-SSO mode, database user
29. properties connectionURL, userName, password, and instanceName in the oc4j-ra.xml file. This file is typically located under the J2EE_HOME/application-deployments/default/FederatorSearchlet directory. For example: <connector-factory location="eis/Federator" connector-name="Federator Adapter"> <config-property name="connectionURL" value="jdbc:oracle:thin:@dbhost:1521:sid"/> <config-property name="userName" value="wk_test"/> <config-property name="password" value="wk_test"/> <config-property name="InstanceName" value="wk_test"/> </connector-factory> After editing oc4j-ra.xml, restart the OC4J instance. If you do not see errors upon restart, then the searchlet has been successfully instantiated and bound to JNDI. User-Defined Sources: Oracle Ultra Search lets you define, edit, or delete your own data sources and types in addition to the ones provided. You might implement your own crawler agent to crawl and index a proprietary document repository or management system, such as Lotus Notes or Documentum, which contain their own databases and interfaces. For each new data source type, you must implement a crawler agent as a Java class. The agent collects document URLs and associated metadata from the proprietary document source and returns the information to the Oracle Ultra Search crawler, which enqueues it f
30. return the list of URLs that have been updated, inserted, and deleted. The crawler only crawls URLs returned by the agent and does not recrawl existing ones. For URLs that were deleted, the crawler removes them from the URL table. If the smart agent can only return updated or inserted URLs but not deleted URLs, then deleted URLs are not detected by the crawler. In this case, you must change the schedule crawler recrawl policy to periodically run the schedule in force recrawl mode. Force recrawl mode signals to the agent to return every URL in the data source. The agent API isDeltaCrawlingCapable() tells the crawler whether the agent it invokes is a standard agent or a smart agent. The agent API startCrawling(boolean forceRecrawl, Date lastCrawlTime) lets the crawler tell the agent the last crawl time and whether the crawler is running in force recrawl mode. Document Attributes and Properties: Document attributes, or metadata, describe document properties. Some attributes can be irrelevant to your application. The crawler agent creator must decide which document attributes should be extracted and saved. The agent can also be created such that the list of collected attributes is configurable. Oracle Ultra Search automatically registers attributes returned by the agent. The agent can decide which attributes to return for a document. Crawler Agent Functionality: This section describes aspects of the crawler agent. Data Source Type Registration
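The difference between delta crawling and force recrawl mode can be sketched with a small in-memory agent. This is an illustration under stated assumptions: the class SketchSmartAgent, its put and urlsToCrawl methods, and the doc:// URLs are hypothetical, and the class does not implement the real Oracle Ultra Search agent interface. Only the method names isDeltaCrawlingCapable and startCrawling(boolean, Date) are taken from the text, and deleted URLs are not modeled.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory agent for illustration only.
final class SketchSmartAgent {
    // URL mapped to its last modification time in a made-up repository.
    private final Map<String, Date> repository = new LinkedHashMap<>();
    private final List<String> pending = new ArrayList<>();

    void put(String url, Date modified) {
        repository.put(url, modified);
    }

    // A smart agent reports that it can return only changed URLs.
    boolean isDeltaCrawlingCapable() {
        return true;
    }

    // In force recrawl mode every URL is returned; otherwise only URLs
    // modified after the last crawl time are returned.
    void startCrawling(boolean forceRecrawl, Date lastCrawlTime) {
        pending.clear();
        for (Map.Entry<String, Date> entry : repository.entrySet()) {
            boolean changed = lastCrawlTime == null
                    || entry.getValue().after(lastCrawlTime);
            if (forceRecrawl || changed) {
                pending.add(entry.getKey());
            }
        }
    }

    List<String> urlsToCrawl() {
        return new ArrayList<>(pending);
    }
}
```

With two documents modified at different times, a normal crawl returns only the one changed after the last crawl time, while a force recrawl returns both. This is why a periodic force recrawl is a workable fallback when an agent cannot report deletions.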
31. several LOVs for the string type search attribute Department are loaded to Oracle Ultra Search. They are: default LOV entries for search attribute Department; search attribute Department LOV for data source "data source a"; search attribute Department LOV for data source "data source b". XML Schema for Document Relevance Boosting: The XML schema for document relevance boosting terms and scores is described as follows: <?xml version="1.0" encoding="UTF-8"?> <!-- Generated by XML Authority. Conforms to w3c http://www.w3.org/2001/XMLSchema --> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified"> <xsd:element name="doc_list"> <xsd:complexType> <xsd:sequence> <xsd:element name="doc" maxOccurs="unbounded"> <xsd:complexType> <xsd:sequence> <xsd:element name="term" maxOccurs="unbounded"> <xsd:complexType> <xsd:simpleContent> <xsd:extension base="xsd:string"> <xsd:attribute name="score" use="required" type="xsd:integer"/> </xsd:extension> </xsd:simpleContent> </xsd:complexType> </xsd:element> </xsd:sequence> <xsd:attribute name="url" use="required" type="xsd:string"/> <xsd:attribute name="data_source_name" use="required" type="xsd:string"/> </xsd:complexType>
32. the log table. There can be at most eight primary key columns in the base table. Each column in the log table that corresponds to a primary key column must be named Kx, where x is a number from one to eight. Each column in the log table that corresponds to a primary key column must be of type VARCHAR2(1000). There must be exactly one column named mark that has type CHAR(1). The column named mark must have a default value F. For example, the base table employees has the following structure (Column Name, Column Type): ID NUMBER; NAME VARCHAR2(200); ADDRESS VARCHAR2(400); TELEPHONE VARCHAR2(10); USERNAME VARCHAR2(24). If the primary key of the employees table comprises the ID and NAME columns, then a log table WK$LOG (whose name is generated automatically) is created with the following structure (Column Name, Column Type): K1 NUMBER; K2 VARCHAR2(200). The SQL statement for creating the log table is as follows: CREATE TABLE WK$LOG (K1 VARCHAR2(1000), K2 VARCHAR2(1000), MARK CHAR(1) default 'F'); Create Log Triggers: An INSERT trigger, UPDATE trigger, and DELETE trigger are created. The Oracle trigger definitions are as follows. INSERT Trigger Statement: Every time a row is inserted into the employees base table, the INSERT trigger inserts a row into the log table. The row in the log t
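The log table rules above (one to eight key columns named K1 through K8, each VARCHAR2(1000), plus a MARK column of type CHAR(1) with default F) are regular enough to generate. The following sketch assumes a hypothetical helper class LogTableDdl; it only builds the statement text and is not the mechanism Oracle Ultra Search itself uses to create the log table.

```java
// Hypothetical helper for illustration; not an Oracle API.
final class LogTableDdl {
    // Builds a CREATE TABLE statement that follows the log table rules:
    // K1..Kn of type VARCHAR2(1000) and MARK CHAR(1) with default 'F'.
    static String createStatement(String logTableName, int primaryKeyColumns) {
        if (primaryKeyColumns < 1 || primaryKeyColumns > 8) {
            throw new IllegalArgumentException(
                    "a log table supports one to eight primary key columns");
        }
        StringBuilder sql = new StringBuilder("CREATE TABLE ")
                .append(logTableName)
                .append(" (");
        for (int i = 1; i <= primaryKeyColumns; i++) {
            sql.append("K").append(i).append(" VARCHAR2(1000), ");
        }
        sql.append("MARK CHAR(1) DEFAULT 'F')");
        return sql.toString();
    }
}
```

For a two-column primary key, the generated text has the same shape as the CREATE TABLE statement shown in the example. The table name is passed in unchanged, so the caller is responsible for supplying a valid identifier.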
33. transportable tablespaces have multiple Oracle Ultra Search instances. For example, an organization could have separate Oracle Ultra Search instances for its marketing, human resources, and development portals. The administration tool requires you to specify an instance before it lets you make any instance-specific changes. To select an instance, do the following: 1. Click Select on the Instances Page. 2. Select an instance from the pull-down menu. 3. Click Apply. Note: Instances do not share data. Data sources, schedules, and indexes are specific to each instance. Deleting an Instance: To delete an instance, do the following: 1. Click Delete on the Instances Page. 2. Select an instance from the pull-down menu. 3. Click Apply. Note: To delete an Oracle Ultra Search instance, the user must be granted super-user privileges. Editing an Instance: To edit an instance, click Edit on the Instances Page. You can change the instance mode, make the instance updatable, or change the instance password. Instance Mode: You can change the instance mode to updatable or read-only. Updatable instances synchronize themselves to the search domain on a set schedule, whereas read-only instances (snapshot instances) do not do any synchronization. To set the instance mode, select the box corresponding to the mode you want and click Apply. Schema Passwor
34. two instances can answer more queries about that domain than one instance. Because snapshot instances do not involve crawling and indexing, snapshot instance creation is fast and inexpensive. Thus, snapshot instances can improve scalability. Backups: If the master instance becomes corrupted, its snapshot can be transformed into a regular instance by editing the instance mode to updatable. Because the snapshot and its master instance cannot reside on the same database, a snapshot instance should be made updatable only to replace a corrupted master instance. A snapshot instance does not inherit authentication from the master instance. Therefore, if you make a snapshot instance updatable, you must re-enter any authentication information needed to crawl the search domain. To create a snapshot instance, do the following: 1. Prepare the database user. As with regular instances, snapshot instances require a database user. This user must have been granted the WKUSER role. 2. Copy the data from the master instance. This is done with the transportable tablespace mechanism, which does not allow renaming of tablespaces. Therefore, a snapshot instance cannot be created on the same database as its master. Identify the tablespace or the set of tablespaces that contain all the master instance data. Then copy it and plug it into the database user from step 1. 3. Follow snapshot instance creation in the Oracle Ultra Search administration tool. F
35. Deploying the Oracle Ultra Search EAR File on a Third-Party Middle Tier: Because Oracle Ultra Search EAR files contain only Web applications (WAR files), they can be deployed on any J2EE 1.2 container. To do so, you must know the Oracle Ultra Search WAR file name, the predefined URL root, and the Java library required. The following section explains the Oracle Ultra Search EAR files that you deploy in a standard J2EE 1.2 container. It does not contain information on the configuration of each J2EE 1.2 container. See Also: The documentation of the third-party J2EE container for its configuration; "Configuring the Middle Tier with Oracle HTTP Server and OC4J" on page 3-14. Deploying the Administration Tool: The Oracle Ultra Search administration tool is a J2EE-compliant Web application: $ORACLE_HOME/ultrasearch/webapp/ultrasearch_admin.ear. You can use Enterprise Manager to deploy or undeploy this Web application. To see the file structure of ultrasearch_admin, run the following command: jar tvf ultrasearch_admin.ear META-INF/ META-INF/application.xml META-INF/orion-application.xml admin.war admin_sso.war ohw.war Deploying the Sample Query Applications: Oracle Ultra Search sample query applications are Web applications contained in the $ORACLE_HOME/ultrasearch/sample.ear file. This file is already compliant to th
36. Preferences 8-43; Super-Users 8-43; (entry illegible) 8-44; Globalization Page 8-44; Search Attribute Name 8-45; LOV Display Name 8-46; Data Group Name 8-46. Oracle Ultra Search Developer's Guide and API Reference: Overview of Oracle Ultra Search APIs 9-2; Oracle Ultra Search Query API 9-2; Customizing the Query Syntax Expansion 9-3; Default Query Syntax Expansion Implementation 9-4; End-User Query Syntax 9-4; Scoring Classes 9-6; Expansion Rules 9-7; Examples of Applying the Rules 9-7; Customizing the Rules 9-8; Oracle Ultra Search Query Tag Library 9-9; Query Tag Descriptions 9-11; <instance> Tag: Connecting to the Oracle Ultra Search Insta
37. Oracle Ultra Search Remote Crawler: To increase crawling performance, set up the Oracle Ultra Search crawler to run on one or more computers separate from your database. These computers are called remote crawlers. However, each computer must share log and mail archive directories with the database computer. To configure a remote crawler, you must first install the Oracle Ultra Search middle tier on a computer other than the database host. During installation, the remote crawler is registered with the Oracle Ultra Search system, and a profile is created for the remote crawler. After installing the Oracle Ultra Search middle tier, you must log on to the Oracle Ultra Search administration tool and edit the remote crawler profile. You can then assign a remote crawler to a crawling schedule. To edit remote crawler profiles, use the Crawler Settings page in the administration tool. See Also: "Using the Remote Crawler" on page 5-6. Oracle Ultra Search Crawler Status Codes: The crawler uses a set of codes to indicate the crawling result of the crawled URL. Besides the standard HTTP status codes, it uses its own codes for non-HTTP-related situations. Only URLs with status 200 are indexed. Table 7-1 shows these URL status codes. Table 7-1 Oracle Ultra Search URL Status Codes (Code: Explanation): 200: URL OK; 400: bad request; 401: authorization required; 402: payment required; 403: access forbidden; 40
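The rule that only URLs with status 200 are indexed can be expressed as a small lookup. The sketch below is illustrative: the class UrlStatus is a hypothetical name, and it covers only the five codes that are legible in Table 7-1 above, not the crawler's full code set.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical lookup for illustration; covers only the codes listed above.
final class UrlStatus {
    private static final Map<Integer, String> EXPLANATIONS = new LinkedHashMap<>();

    static {
        EXPLANATIONS.put(200, "URL OK");
        EXPLANATIONS.put(400, "bad request");
        EXPLANATIONS.put(401, "authorization required");
        EXPLANATIONS.put(402, "payment required");
        EXPLANATIONS.put(403, "access forbidden");
    }

    // Only URLs crawled with status 200 are indexed.
    static boolean isIndexed(int code) {
        return code == 200;
    }

    // Returns the explanation for a listed code, or a fallback otherwise.
    static String explain(int code) {
        return EXPLANATIONS.getOrDefault(code, "not listed in this sketch");
    }
}
```

A report over crawled URLs could use isIndexed to separate indexed documents from problematic ones and explain to label the latter.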
38. Data Harvesting Mode: For initial planning purposes, you might want the crawler to collect URLs without indexing. After crawling is done, you can examine document URLs and status, remove unwanted documents, and start indexing. You can update the crawling mode to the following: automatically accept all URLs for indexing; examine URLs before indexing; index only. See Also: "Schedules Page" on page 8-34. Instance Snapshot Support: You can create a read-only snapshot of a master Oracle Ultra Search instance. This is useful for query processing or for a backup. You can also make a snapshot instance updatable. This is useful when the master instance is corrupted and you want to use a snapshot as a new master instance. See Also: "Instances Page" on page 8-6. Integration with Oracle Internet Directory: Oracle Internet Directory (OID) is Oracle's native LDAP v3-compliant directory service, built as an application on top of the Oracle Database. Oracle Internet Directory hosts the Oracle common identity. All Oracle Web-based products integrate with the SSO server for single sign-on support. Oracle Ultra Search Administration Groups in Oracle Internet Directory: An Oracle Ultra Search administration group contains a set of users. Each user can belong to one or multiple groups. All groups are created using the groupOfUniqueNames and orclGroup object classes. The
39. For example, if the user's input is "cat", using the stemming feature you can construct a Text query "$cat", which finds documents with "cat" or "cats". You can use any tool to construct the Text query, as long as it is a String object. Depending on the complexity of the user's query syntax, you might want to leverage some existing lexers in Java. 2. Construct a CtxContains using the Text query. For example: String textQuery = "$cat"; oracle.ultrasearch.query.Query query = new oracle.ultrasearch.query.CtxContains(textQuery); The preceding code constructs a query for documents with "cat" or "cats". You can also limit that query to document titles, not content, as follows: String textQuery = "$cat"; StringAttribute titleAttribute = instanceMetaData.getStringAttribute("TITLE"); oracle.ultrasearch.query.Query query = new oracle.ultrasearch.query.CtxContains(textQuery, titleAttribute); 3. You can optionally combine the CtxContains with any other Oracle Ultra Search query by joining them with the And/Or query operators. 4. Run the query by invoking the getResult() method with the constructed query object. See Also: Oracle Ultra Search Java API Reference for detailed information on the oracle.ultrasearch.query.CtxContains API. Oracle Ultra Search Query Tag Library: On top of the Java query API, Oracle Ultra Search provides a JSP tag library as an alternative for developing search applications. Based on the Sun Microsystems JavaServer Pages specification v
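The first step, turning end-user input into a Text query string, can be sketched without any Oracle classes. The example assumes that the stemmed form is written with the Oracle Text stem operator ($) in front of each word. The class StemQuery is a hypothetical name, and the sketch does not escape reserved words or special characters, which a real application would need to handle before passing the string to CtxContains.

```java
// Hypothetical helper for illustration; not part of the Ultra Search API.
final class StemQuery {
    // Prefixes every whitespace-separated word with "$" so that each word
    // is matched by its stem. Words are rejoined with single spaces.
    static String stem(String userInput) {
        StringBuilder out = new StringBuilder();
        for (String word : userInput.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // blank input produces an empty query string
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append('$').append(word);
        }
        return out.toString();
    }
}
```

The resulting string is what would be handed to the CtxContains constructor in step 2. Keeping the string construction in its own method makes it easy to swap in a fuller lexer later.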
40. 6 Security in Oracle Ultra Search: The ability to control user access to Web content is critical. This chapter describes the architecture and configuration of security for Oracle Ultra Search. This chapter contains the following sections: About Oracle Ultra Search Security; Configuring a Security Framework for Oracle Ultra Search; Configuring Oracle Ultra Search Security. See Also: Oracle Application Server 10g Security Guide for an overview of Oracle Application Server security and its core functionality; Oracle Identity Management Concepts and Deployment Planning Guide for guidance on the Oracle security infrastructure. About Oracle Ultra Search Security: This section describes the Oracle Ultra Search security model. It contains the following sections: Oracle Ultra Search Security Model; Classes of Users and Their Privileges; Oracle Ultra Search Admin Privilege Model in the Hosted Environment; Resources Protected by Oracle Ultra Search; Authorization and Access Enforcement; How Oracle Ultra Search Leverages Security Services; How Oracle Ultra Search Leverages the Identity Management Infrastructure; Oracle Ultra Search Extensibility and Security. Oracle Ultra Search Security Model: Security problems, such as unauthorized access to information, can lead to loss of productivity. Search engines like Oracle Ultra Searc
41. LOV for a document attribute can help specify a search query. An attribute value can have a display name. For example, the attribute country might use a country code as the attribute value but show the name of the country to the user. There could be multiple translations of the attribute display name. To define a search attribute, use the Search Attributes subtab. Oracle Ultra Search provides some system-defined attributes, such as author and description. You can also define your own. After defining search attributes, you must map between document attributes and global search attributes for data sources. To do so, use the Mappings subtab. Note: Oracle Ultra Search provides a command-line tool to load metadata, such as search attribute LOVs and display names, into an Oracle Ultra Search database. If you have a large amount of data, this is probably faster than using the HTML-based administration tool. For more information, see Appendix A, "Loading Metadata into Oracle Ultra Search". Search Attributes: Search attributes are attributes exposed to the query user. Oracle Ultra Search provides system-defined attributes, such as author and description. Oracle Ultra Search maintains a global list of search attributes. You can add, edit, or delete search attributes. You can also click Manage LOV to change the list of values (LOV) for the search attribute. There are two categories of attribute LOVs: one is global across all data sourc
42. Oracle Ultra Search. Note: At any point after installation, you can run an Oracle Application Server Portal script to alter the running mode from non-hosted to hosted. Whenever this is done, the Oracle Application Server Portal script invokes an Oracle Ultra Search script to inform Oracle Ultra Search of the change from non-hosted to hosted mode. See Also: Hosting Developer's Guide at http://otn.oracle.com. Instances Page: After successfully logging on to the Oracle Ultra Search administration tool, you find yourself on the Instances Page. This page manages all Oracle Ultra Search instances in the local database. In the top left corner of the page, there are tabs for creating, selecting, editing, and deleting instances. Before you can use the administration tool to configure crawling and indexing, you must create an Oracle Ultra Search instance. An Oracle Ultra Search instance is identified with a name and has its own crawling schedules and index. Only users granted super-user privileges can create Oracle Ultra Search instances. Creating an Instance: To create an instance, click Create. You can create a regular instance or a read-only snapshot instance. Only users with super-user privileges can create new instances. Note: If you define the same data source within different Oracle Ultra Search instances, then there could be crawling conflicts for table data s
43. Regular Instance 8-7; Creating a Snapshot Instance 8-8; Selecting an Instance 8-10; Deleting an Instance 8-11; Editing an Instance 8-11; Instance Mode 8-11; Schema Passwords 8-11; Crawler Page 8-12; (entry illegible) 8-12; Remote Crawler Profiles 8-16; Crawler Statistics 8-17; Summary of Crawler Activity 8-17; Detailed Crawler Statistics 8-17; Crawler Progress 8-17; Problematic URLs 8-17; Web Access Page 8-18; Proxies 8-18; Authentication 8-18; HTTP Authentication 8-18; HTML Forms
44. $ORACLE_HOME/ultrasearch/adapter directory. To deploy the Oracle Ultra Search searchlet in OC4J standalone, use admin.jar as follows: java -jar admin.jar ormi://<hostname> <admin> <welcome> -deployconnector -file ultrasearch.rar -name UltraSearchSearchlet At this point, ultrasearch.rar has been deployed in OC4J. However, it has not been instantiated to connect to any Oracle Ultra Search instance. The Oracle Ultra Search searchlet can be instantiated multiple times to connect to several Oracle Ultra Search instances by repeating the following steps. To instantiate the searchlet, configuration parameter values must be specified, and a JNDI location must be specified to which the searchlet instance should be bound. To do this, you must manually edit oc4j-ra.xml. This file is typically located under the J2EE_HOME/application-deployments/default/UltraSearchSearchlet directory. The Oracle Ultra Search searchlet requires four configuration properties: connectionURL, userName, password, and instanceName. For example, to bind a searchlet under eis/UltraSearch to connect to the default instance wk_test on machine dbhost, the following entry can be used: <connector-factory location="eis/UltraSearch" connector-name="Ultra Search Adapter"> <config-property name="connectionURL" value="jdbc:oracle:thin:@dbhost:1521:sid"/> <c
45. SERVICE_NAME=acme.us.com See Also: Oracle Database JDBC Developer's Guide and Reference. Java Crawler: The connect string used by the Oracle Ultra Search crawler is initialized during installation and can be changed with the WK_ADM.SET_LAUNCH_INSTANCE API. When there is a system configuration change, such as adding or dropping a node, the connect string is changed automatically. Choosing a JDBC Driver: The Oracle Ultra Search administrator optionally can configure the local crawler to use the JDBC OCI driver to log on to the database. This is done with the following PL/SQL API: WK_ADM.SET_JDBC_DRIVER(driver_type), where: Thin driver (default): driver_type = 0; OCI driver: driver_type = 1. This API requires super-user privileges. The change affects all Oracle Ultra Search instances. Note: The OCI driver requires that environment variables, such as LD_LIBRARY_PATH and NLS_LANG, be set properly on the launching database instance. The crawler inherits the environment setting from the Oracle process. Therefore, you must configure them appropriately before starting Oracle. See Also: Oracle Database JDBC Developer's Guide and Reference for configuration details on using the OCI driver. The following PL/SQL API determines which kind of JDBC driver is currently used: WK_ADM.GET_JDBC_DRIVER RETURN NUMBER. Table Data Source Synchronization O
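The driver_type values can be kept readable in administration scripts by naming them. The enum below is a sketch under assumptions: JdbcDriverType and its methods are hypothetical names, and the anonymous PL/SQL block it builds assumes that WK_ADM.SET_JDBC_DRIVER can be called as a procedure with a single numeric argument, as the text suggests. Verify the exact call form against the PL/SQL reference before using it.

```java
// Hypothetical enum for illustration; only the codes 0 and 1 come from the text.
enum JdbcDriverType {
    THIN(0), // default driver
    OCI(1);

    final int code;

    JdbcDriverType(int code) {
        this.code = code;
    }

    // Maps the number returned by WK_ADM.GET_JDBC_DRIVER back to a name.
    static JdbcDriverType fromCode(int code) {
        for (JdbcDriverType type : values()) {
            if (type.code == code) {
                return type;
            }
        }
        throw new IllegalArgumentException("unknown driver_type: " + code);
    }

    // Builds the text of an anonymous block that sets this driver type.
    // Assumption: the API is callable as a one-argument procedure.
    String setDriverCall() {
        return "BEGIN WK_ADM.SET_JDBC_DRIVER(" + code + "); END;";
    }
}
```

The generated text would be run by a user with super-user privileges, since the setting affects all Oracle Ultra Search instances.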
46. [Figure: Oracle Ultra Search architecture. Client: administration tool. Middle tier: J2EE engine, Oracle Ultra Search Java support files, JDBC driver, Single Sign-On. Backend: PL/SQL packages, Oracle Text for indexing, DBMS_JOBS for scheduling, Java VM for launching, JDK or JRE for crawling.] Browser Requirements: To use the administration tool, your browser must be Netscape version 4.0 or Microsoft Internet Explorer version 4.0 or higher. Installing the Middle Tier with the Oracle Database Release: The Oracle Ultra Search middle tier is installed as part of the Oracle Database Server install, which is accomplished using the Oracle Universal Installer (OUI). The Oracle Universal Installer automatically configures the Oracle Ultra Search middle tier. For more information, see the Oracle Universal Installer Concepts Guide. Use the following command to start the Oracle Ultra Search middle tier. You must run this command manually to bring up the Oracle Ultra Search middle tier after installation: $ORACLE_HOME/bin/searchctl start Use this command to stop the Oracle Ultra Search middle tier: $ORACLE_HOME/bin/searchctl stop Installing the Middle Tier with the Oracle Application Server Release: Start the OUI on the relevant host. Choose the destination ORACLE_HOME name and full path, and complete the foll
47. Privileges: To grant super-user administrative privileges to another user, enter the user name of the user. Specify also whether the user should be allowed to grant super-user privileges to other users. Then click Add. Only instance owners, users that have been granted general administrative privileges on this instance, or super-users are allowed to access this page. Instance owners must have been granted the WKUSER role. Single sign-on (SSO) users can use a delegated administrative service (DAS) list of values to add privileges to another SSO user. These users are authenticated by the SSO server before being allowed access. Database users can add privileges to another database user. Note: Database users cannot grant privileges to SSO users, and SSO users cannot grant privileges to database users. The DAS list of values only shows SSO users. Granting general administrative privileges to a user allows that user to modify general settings for this instance. To do this, enter the user name and specify whether the user should be allowed to grant administrative privileges to other users. Then click Add. To remove one or more users from the list of administrators for this instance, select one or more user names from the list of current administrators and click Remove. Note: General administrative privileges do not include the ability to create or delete an instance. These
48. Search Backend
    sqlplus /nolog @$ORACLE_HOME/ultrasearch/admin/wk0setup.sql $ORACLE_HOME cstr sys syspw as sysdba wksyspw tblspc tmptblspc portal cfs oui psep jdbcdrv jdbcnls jexec ctxhx jdbc_node jdbc_all $ORACLE_HOME
where the various parameters are as follows (parameters should be enclosed in quotes to avoid misinterpretation):
    cstr: TNS alias preceded with @, for example @inst1; this parameter can also be passed in as a single white space
    syspw: password for the sys database user schema
    wksyspw: password to be used for the Oracle Ultra Search schema (wksys)
    tblspc: tablespace for wksys
    tmptblspc: temporary tablespace for wksys
    cfs: if ORACLE_HOME is on a Cluster File System (CFS), then true, else false
    psep: path separator; for example, on UNIX this is /, on Windows it is \
    jdbcdrv: path to JDBC drivers (classes12.zip), for example $ORACLE_HOME/jdbc/lib/classes12.zip
    jdbcnls: path to nls_charset12.zip or orai18n.jar, for example $ORACLE_HOME/jdbc/lib/nls_charset12.zip
    jexec: Java executable path, for example /packages/jdk1.4.1/bin/java. Note that this has to point to a JDK 1.4.1 or later installation
    ctxhx: path to INSO_FILTER (ctxhx), for example $ORACLE_HOME/ctx/bin/ctxhx
    jdbc_node: thin JDBC connect string, and only the part after the @, for example host:port:sid; note that in case of RAC this
49. Search scheduling mechanism runs within the Oracle Database, it automatically uses the database's high availability features. The Oracle Database uses one of two mechanisms to send launch requests to the remote crawler hosts. The first is Java remote method invocation (RMI). The second is Java database connectivity (JDBC). Both mechanisms establish a launching subsystem on the remote host. You can conceptualize the launching subsystem as a process that uses either RMI or JDBC to listen for launch requests. This chapter refers to the launching subsystem as the launcher. Upon receipt of a launch request, the launcher spawns a new Java process. It is this process that is the actual remote crawler. You should use JDBC-based remote crawling if you do not want the dependency on RMI, for example because of network restrictions. Understanding the Launcher: The launcher is the subsystem that listens for launch requests and launches remote crawler processes. When you register a remote crawler, either RMI-based or JDBC-based, you are actually registering the launcher with the Oracle Ultra Search backend. By registering a launcher, you make it available to be used by all Oracle Ultra Search instances within an Oracle Ultra Search installation. Thus, after registration, an administrator of an Oracle Ultra Search instance may subsequently choose to associate the launcher and assign schedules to be launched with that launcher. There is no way to
50. The JNDI name that identifies a JDBC data source. Users should set either the URL or data source name properties. This is optional if URL is specified. The name of the Oracle Ultra Search instance that is owned by the schema user. If the schema user owns only one Oracle Ultra Search instance, then this is optional. The URL path of the Web application that renders the contents of a database table. The URL path of the Web application that renders the contents of an email. The URL path of the Web application that renders the contents of a file. This tag defines a scripting variable of the name set by the instanceId property. All the other tag properties correspond to a property in the oracle.ultrasearch.query.QueryInstance class. Either the URL or the dataSourceName attribute should be set. They are exclusive of each other. The following example uses the URL property to connect to the database:
    <US:instance instanceId="mybookstore" url="jdbc:oracle:thin:@dbhost:1521:inst1" username="scott" password="tiger" tablePage="display.jsp" emailPage="mail.jsp" filePage="display.jsp" />
<iterAttributes> Tag: Show All Search Attributes. When a user wants to perform an advanced query, the application needs to show the list of attributes that are available, the list of groups, and the list of languages defined in the instance.
51. This can be done using some iteration tags that define script variables for page rendering. Each attribute in Oracle Ultra Search has a name, a type, and a display name that is translated depending on the locale that is set for the QueryInstance tag. The attribute type should be used to determine which operators can be used on this attribute and how to parse the user's input.
    Attribute Name / Description
    instance="name": This is a mandatory attribute to refer to the object defined by the instance tag.
    locale="locale": This determines the display name fetched using this tag.
This tag is an iteration tag. It loops through all the search attributes in the instance referred to by the instance tag attribute. In each loop, it defines a scripting variable named attribute, which is an oracle.ultrasearch.query.Attribute object. It also defines a string variable named displayname, which is the localized name of the attribute. The following example shows all the attributes in the mybookstore instance using their English display names:
    <US:iterAttributes instance="mybookstore" locale="<%=Locale.ENGLISH%>"> <%= attribute %> <%= displayname %> </US:iterAttributes>
<iterGroups> Tag: Show All Search Groups. Similar to the showAttributes tag, the showGroups tag iterates through all the groups defined in an instance.
    Attribute Name / Description
    instance="name": This is a mandatory attribute to refer to th
52. a Database Character Set Change; Configuring the Default Oracle Ultra Search Instance; Installing the Oracle Ultra Search Middle Tier on Web Server Hosts; Web Applications Concepts; Browser Requirements; Installing the Middle Tier with the Oracle Database Release; Installing the Middle Tier with the Oracle Application Server Release; Configuring the Middle Tier with Oracle HTTP Server and OC4J; Configuring the Administration Tool with Single Sign-On Server; Deploying the Oracle Ultra Search EAR File on a Third Party Middle Tier; Editing the data-sources.xml File; Editing the ultrasearch.properties File; Starting the Web Server; Testing the Oracle Ultra Search Administration Tool; Testing the Oracle Ultra Search Sample Query Applications
53. and their meanings, which explain some of the terms used in the preceding rules. A <plus expression> is an AND expression of all plus tokens. A <minus expression> is a NOT expression of all minus tokens. A <phrase expression> is a PHRASE formed by all tokens in the <main expression>. A <near expression> is a NEAR expression of all tokens but minus tokens. An <accum expression> is an ACCUMULATE expression of all tokens but minus tokens. A <simple query expression> is used only when the end-user query has multiple tokens and does not have any operator or a double quote. Otherwise, a <generic query expression> is used. If there is no token that is neither a plus token nor a minus token, then the <plus expression> and the <accum expression> are eliminated. Examples of Applying the Rules: The following table illustrates how the default query syntax expansion implementation converts end-user query strings into Oracle Text compatible query strings. End User Query String / Expanded Query String Understandable by Oracle Text: Oracle Oracle within TITLE__ 31 2 Oracle Oracle Applications Applications 10 10 amp Oracle Applications 2 Oracle Applications within TITLE__ 3 2 Applications 10 10 amp Oracle Applicat O
54. com accessibility Accessibility of Code Examples in Documentation: JAWS, a Windows screen reader, may not always correctly read the code examples in this document. The conventions for writing code require that closing braces should appear on an otherwise empty line; however, JAWS may not always read a line of text that consists solely of a bracket or brace. Accessibility of Links to External Web Sites in Documentation: This documentation may contain links to Web sites of other companies or organizations that Oracle does not own or control. Oracle neither evaluates nor makes any representations regarding the accessibility of these Web sites. What's New in Oracle Ultra Search: This section describes Oracle Ultra Search new features, with pointers to additional information. It also explains the Oracle Ultra Search release history. Secure Crawling: Oracle Ultra Search provides secure crawling with the following types of authentication. Digest Authentication: Oracle Ultra Search supports HTTP digest authentication, and the Oracle Ultra Search crawler can authenticate itself to Web servers employing the HTTP digest authentication scheme. This is based on a simple challenge-response paradigm; however, the password is encrypted. HTML Form Authentication: HTML form-based authentication is the most commonly used authentication scheme on the Web. Oracle Ultra Search lets you register HTML forms that you want the Oracle Ultra Search crawler to auto
55. different storage devices should be used to increase performance and reliability. Drop the old log files: For each old redo log file, enter the ALTER SYSTEM SWITCH LOGFILE statement until that log file's status is INACTIVE. This is necessary to ensure that Oracle is not using that log file when you try to drop it. Then drop the old redo log files with the following statements:
    ALTER DATABASE DROP LOGFILE 'redo_log_directory/redo01.log';
    ALTER DATABASE DROP LOGFILE 'redo_log_directory/redo02.log';
    ALTER DATABASE DROP LOGFILE 'redo_log_directory/redo03.log';
Manually delete the old log files from the file system: For each old redo log file, use the appropriate operating system statement to delete the unwanted log file from the file system. Increase the Size of the Undo Space: Every Oracle database must have a method of maintaining information that is used to roll back, or undo, changes to the database. Such information consists of records of the actions of transactions, primarily before they are committed. Oracle refers to these records collectively as undo. The undo space created by the Oracle Installer may be too small. Oracle recommends that you use automatic undo management and increase the undo space. See Also: Oracle Database Administrator's Guide for details on using automatic undo management. Tune Oracle Initialization Parameters
56. Example of the LOV XML Files; XML Schema for Document Relevance Boosting; XML Schema for LOVs and LOV Display Names. B Altering the Crawler Java Classpath: Reasons for Altering the Crawler Java Classpath; Difference Between the Crawler Classpath and the Remote Crawler Classpath; Altering the Crawler Java Classpath on the Oracle Ultra Search Server Host; Altering the Crawler Java Classpath on a Remote Crawler Host. C Oracle Ultra Search Views: OUS_INSTANCES; OUS_SCHEDULES; OUS_DEFAULT_CRAWLER_SETTINGS; OUS_CRAWLER_SETTINGS. Index. Send Us Your Comments: Oracle Ultra Search User's Guide, 10g Release 1 (10.1), Part No. B10731-01. Oracle Corporation welcomes your comments and suggestions on the quality and usefulness of this publication. Your input is an important part of the information used for revision. Did you find any errors? Is the information clearly presented? Do you ne
57. failures and Documents rejected. Documents indexed. Documents non-indexable: This could be a file directory, a portal page that is a discovery node, or a robot metatag that specifies no index. Document conversion failures: The binary file filter failed. Index Optimization: To ensure fast query results, the Oracle Ultra Search crawler maintains an active index of all documents crawled over all data sources. This lets you schedule when you would like the index to be optimized. The index should be optimized during hours of low usage. Note: Increasing the crawler cache directory size can reduce index fragmentation. Index Optimization Schedule: You can specify the index optimization schedule frequency. Be sure to specify all required data for the option that you select. You can optimize the index immediately, or you can enable the schedule. Optimization Process Duration: Specify a maximum duration for the index optimization process. The actual time taken for optimization does not exceed this limit, but it could be shorter. Specifying a longer optimization time results in a more optimized index. Alternatively, you can specify that the optimization continue until it is finished. If your Oracle Ultra Search instance is secure-search enabled, then the index optimization process also triggers garbage collection of unused access control lists (ACLs). Q
58. front of a word requires that the word be found in all matching documents. For example, searching for Oracle +Applications only finds documents that contain the word Applications. Note: In a multiple-word search, you can attach a + in front of every token, including the very first token. Compulsory exclusion: Attaching a - in front of a word requires that the word not be found in any matching document. For example, searching for Oracle -Applications only finds documents that do not contain the word Applications. Note: In a multiple-word search, you can attach a - in front of every token except the very first token. Phrase matching: Putting quotes around a set of words only finds documents that contain that precise phrase. For example, searching for "Oracle Applications" finds only documents that contain the string Oracle Applications. Wildcard matching: Attaching a * to the right-hand side of a word returns left-side partial matches. For example, searching for the string Ora* finds documents that contain all words beginning with Ora, such as Oracle and Orator. You can also insert an asterisk in the middle of a word. For example, searching for the string A*e retrieves documents that contain words such as Apple, Ate, Ape, and so on. Wildcard matching requir
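The inclusion, exclusion, phrase, and wildcard rules above can be illustrated with a small tokenizer. This is a minimal sketch and not the Oracle Ultra Search parser: the class name QueryTokens, the Kind values, and the handling of a leading - on the first token (treated as a plain word, since the rule does not allow exclusion there) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: classifies end-user query tokens by the +, -, "phrase",
// and * rules described above. Not the Oracle Ultra Search implementation.
final class QueryTokens {
    enum Kind { PLUS, MINUS, PHRASE, WILDCARD, PLAIN }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + ":" + text; }
    }

    static List<Token> parse(String query) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < query.length()) {
            char c = query.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (c == '"') {
                // Phrase: everything up to the closing quote (or end of input).
                int close = query.indexOf('"', i + 1);
                int end = close < 0 ? query.length() : close;
                tokens.add(new Token(Kind.PHRASE, query.substring(i + 1, end)));
                i = end + 1;
                continue;
            }
            int end = i;
            while (end < query.length() && !Character.isWhitespace(query.charAt(end))) end++;
            String word = query.substring(i, end);
            i = end;
            if (word.length() > 1 && word.charAt(0) == '+') {
                tokens.add(new Token(Kind.PLUS, word.substring(1)));
            } else if (word.length() > 1 && word.charAt(0) == '-' && !tokens.isEmpty()) {
                // Assumption: '-' on the very first token is not an exclusion.
                tokens.add(new Token(Kind.MINUS, word.substring(1)));
            } else if (word.indexOf('*') >= 0) {
                tokens.add(new Token(Kind.WILDCARD, word));
            } else {
                tokens.add(new Token(Kind.PLAIN, word));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(parse("Oracle +Applications -Forms \"Ultra Search\" Ora*"));
    }
}
```

A classification like this is only the first step; how the classified tokens are combined into an Oracle Text query is governed by the expansion rules, which this sketch does not attempt to reproduce.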
59. generation. In this situation, allow normalization of the extracted links so that URLs pointing to the same page have the same URL. The algorithm for rewriting these URLs is application dependent and cannot be handled by the crawler in a generic way. When a URL link goes through a rewriter, there are the following possible outcomes: The link is inserted with no changes made to it. The link is discarded; it is not inserted. A new display URL is returned, replacing the URL link for insertion. A display URL and an access URL are returned. The display URL may or may not be identical to the URL link. Creating and Using a URL Rewriter: Follow these steps to create and use a URL rewriter. 1. Create a new Java file implementing the UrlRewriter interface open, close, and rewrite methods. A sample rewriter, SampleRewriter.java, is available for reference under $ORACLE_HOME/ultrasearch/extension. 2. Compile the rewriter Java file into a class file. For example:
    /jdk1.3.1/bin/javac -O -classpath $ORACLE_HOME/ultrasearch/lib/ultrasearch.jar SampleRewriter.java
3. Package the rewriter class file into a jar file under the $ORACLE_HOME/ultrasearch/lib/agent directory. For example:
    /jdk1.3.1/bin/jar cv0f $ORACLE_HOME/ultrasearch/lib/agent/sample.jar SampleRewriter.class
4. Specify the rewriter class name and ja
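As a hedged illustration of the kind of application-dependent normalization a rewriter might perform, the sketch below removes a session parameter so that links to the same page compare equal. It deliberately does not implement the UrlRewriter interface, whose method signatures are not shown in this excerpt; the class name SessionIdNormalizer and the parameter name sessionid are hypothetical.

```java
import java.util.Locale;

// Illustrative only: one possible normalization rule. The "sessionid"
// parameter name is an assumption; real applications differ.
final class SessionIdNormalizer {
    static String normalize(String link) {
        int q = link.indexOf('?');
        if (q < 0) {
            return link; // no query string, nothing to normalize
        }
        String base = link.substring(0, q);
        StringBuilder kept = new StringBuilder();
        for (String pair : link.substring(q + 1).split("&")) {
            // Drop empty pairs and the (hypothetical) session parameter.
            if (pair.isEmpty() || pair.toLowerCase(Locale.ROOT).startsWith("sessionid=")) {
                continue;
            }
            if (kept.length() > 0) {
                kept.append('&');
            }
            kept.append(pair);
        }
        return kept.length() == 0 ? base : base + "?" + kept;
    }

    public static void main(String[] args) {
        System.out.println(normalize("http://shop.example.com/item?id=7&sessionid=abc"));
    }
}
```

In a real rewriter, a method like this would sit behind the rewrite step, and the returned string would play the role of the new URL in the third outcome listed above.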
60. information. oid.app_entity_cn specifies the Oracle Ultra Search middle tier application entity name. domain specifies the common domain for the IM (identity management) machine and the Oracle Ultra Search middle tier machine. This enables the delegated administrative service (DAS) list of values to work with Internet Explorer. For example, if the Oracle Ultra Search middle tier is in us.oracle.com and the IM machine is in uk.oracle.com, then the common domain is oracle.com. Add the following line in ultrasearch.properties:
    domain=oracle.com
Starting the Web Server: With the Oracle Application Server release, start the Web server using the Oracle Enterprise Manager Application Server Control. See Also: Oracle Application Server 10g Administrator's Guide for information on the Application Server Control. With the database release, do the following:
    java -jar $ORACLE_HOME/oc4j/j2ee/home/oc4j.jar -config $ORACLE_HOME/oc4j/j2ee/OC4J_SEARCH/config/server.xml
Testing the Oracle Ultra Search Administration Tool: Check that the Web server is running. Test your changes by attempting to log on to the administration tool. Visit http://hostname.domainname:port/ultrasearch/admin/index.jsp, where hostname.domainname is the full name of the host where you have installed the Oracle Ultra Search middle tier and port is the default Web serve
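The common-domain rule in the example above (us.oracle.com and uk.oracle.com share oracle.com) amounts to taking the longest shared suffix of dot-separated labels. The following is a minimal sketch of that calculation; the class name CommonDomain is hypothetical, and the result still has to be entered in ultrasearch.properties by hand.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Locale;

// Illustrative only: longest common suffix of dot-separated host labels.
final class CommonDomain {
    static String of(String hostA, String hostB) {
        String[] a = hostA.toLowerCase(Locale.ROOT).split("\\.");
        String[] b = hostB.toLowerCase(Locale.ROOT).split("\\.");
        Deque<String> shared = new ArrayDeque<>();
        int i = a.length - 1;
        int j = b.length - 1;
        // Walk both names from the rightmost label while the labels match.
        while (i >= 0 && j >= 0 && a[i].equals(b[j])) {
            shared.addFirst(a[i]);
            i--;
            j--;
        }
        return String.join(".", shared); // empty string if nothing is shared
    }

    public static void main(String[] args) {
        System.out.println(of("us.oracle.com", "uk.oracle.com"));
    }
}
```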
61. list. For RMI-based remote crawlers, you will see the host:port combination that uniquely identifies the RMI subsystem. For JDBC-based remote crawlers, you will see the launcher name. Click Edit to complete the configuration process for the remote crawler profile. Unregistering a Remote Crawler: If you enter any wrong values for the register.sql script, you should unregister the remote crawler using the unregister.sql script. Invoke the unregister script the same way as you invoke the registration script. The unregister.sql script calls the wk_crw.unregister_remote_crawler PL/SQL API. After you have successfully unregistered the remote crawler, you can rerun the register.sql script. Configuring Oracle Ultra Search in a Hosted Environment: Oracle Ultra Search is configured to be non-hosted during the default install. To change to a hosted environment, perform the following steps. Preconfiguration Tasks for a Hosted Environment: Make sure the hosting mode is enabled. Also make sure the subscriber is created in the Oracle Internet Directory server. See Also: OracleAS Portal Configuration Guide, section "Enabling Hosting on an Out-of-Box Portal", for instructions on how to enable the hosting mode, and section "Adding Subscribers" for instructions on how to add a subscriber to t
62. only way to grant a user administration privileges is to assign them to an administration group. Oracle Ultra Search authorizes the user administration privileges based on the administration groups to which the user belongs. The following groups are created for each Oracle Ultra Search instance. Super-users: Users in this group can create or drop Oracle Ultra Search instances and can administer Oracle Ultra Search instances within the installation. Super-users must obey the rules for document relevancy boosting and ACL defined for each of the documents associated with the Oracle Ultra Search instance. For example, if a document ACL does not grant access to the super-user or group, then the super-user cannot search and browse the document. Instance administrators: Users in this group can administer the Oracle Ultra Search instance. Only the instance database schema user and members in the super-users group can drop the instance. Authorization of the Administration Privileges: The authorization of the administration user is performed in the following steps. After the administration user is successfully authenticated by the SSO server or the Oracle Ultra Search database, the Oracle Ultra Search GUI brings up the first screen for the user to choose an Oracle Ultra Search instance. The Oracle Ultra Search GUI looks up the Oracle Internet Directory server or Oracle Ultra Se
63. privileges belong to super-users. See Also: "Step 4: Create and Configure New Users for Oracle Ultra Search Instances" on page 4-5. Globalization Page: Oracle Ultra Search lets you translate names to different languages. This page lets you enter multiple values for search attributes, list of values (LOV) display names, and data groups. Search Attribute Name: This section lets you translate attribute display names to different languages. The pull-down menu lists the following languages: English, Arabic, Brazilian Portuguese, Canadian French, Czech, Danish, Dutch, Finnish, French, German, Greek, Hebrew, Hungarian, Italian, Japanese, Korean, Latin American Spanish, Norwegian, Polish, Portuguese, Romanian, Russian, Simplified Chinese, Slovak, Spanish, Swedish, Thai, Traditional Chinese, Turkish. LOV Display Name: This section lets you translate data group names to different languages. Select a search attribute from the pull-down menu: author, description, mimetype, subject, or title. Select the LOV type, and then select the language from the pull-down menu. Data Group Name: This section lets you translate data group display names to different languages. The pull-down menu lists the language options.
64. process involved configuring ORACLE_HOME for directory usage. You must make sure to restart the Oracle listener to inherit the changes made to ORACLE_HOME. Restart the listener if you have not already done so. Step 3: Install or upgrade Oracle Ultra Search, if necessary. After you have configured the Oracle Ultra Search database to work with Oracle Internet Directory, you can install or upgrade the Oracle Ultra Search backend into the Oracle Server, if you have not already done so. Step 4: Create the /sys/apps/ultrasearch folder. Immediately after installation or upgrade, you must run a SQL script to create the /sys/apps/ultrasearch folder in the XML DB repository. This folder stores all Oracle Ultra Search ACLs in XML DB. To create the /sys/apps/ultrasearch folder, do the following: 1. cd to $ORACLE_HOME/ultrasearch/admin. 2. Log in to the Oracle Ultra Search database using SQL*Plus as user WKSYS. 3. Invoke the SQL script wk0prepxdb.sql. See Also: "Changing Oracle Ultra Search Schema Passwords" on page 4-2 for information on changing the WKSYS password. Upon termination, the wk0prepxdb.sql script lists all Oracle Ultra Search related XML DB resources by running the following SQL:
    SELECT any_path FROM resource_view WHERE any_path LIKE '%ultrasearch%';
Execution of that SQL statement must show two rows
65. pure Java application that runs in a Java virtual machine. A Java virtual machine uses the Java classpath to find classes during runtime. When Oracle Ultra Search is installed, the default crawler classpath is stored in the database. Whenever a new Oracle Ultra Search instance is created, this default classpath is copied and used as the crawler classpath for that specific instance. Reasons for Altering the Crawler Java Classpath: Usually, you do not need to alter the crawler Java classpath. However, there are certain reasons for you to do so. One reason could be to replace the JavaMail reference implementation with a third party JavaMail implementation. Difference Between the Crawler Classpath and the Remote Crawler Classpath: The crawler classpath is the classpath of a crawler that runs on the same host as the Oracle Ultra Search backend. However, Oracle Ultra Search allows remote crawlers to be run on other hosts for scalability. Remote crawler activation uses Java remote method invocation (RMI) technology. As a result, the classpath setting of a remote crawler is inherited from the classpath settings of the RMI registry and RMI daemon. See Also: "Using the Remote Crawler" on page 5-6. Altering the Crawler Java Classpath on the Oracle Ultra Search Server Host: Log on to the host where the Oracle Ultra Search backe
66. so that it can adapt to the new character set. Two SQL scripts, wk0prefcheck.sql and wk0idxcheck.sql, located in $ORACLE_HOME/ultrasearch/admin, are used for this reconfiguration. wk0prefcheck.sql is invoked under wksys to reconfigure default cache character set and index preferences. wk0idxcheck.sql is needed for reconfiguring instances created before the database character set change, for example the default instance. This script must be invoked by the instance owner, and wk0prefcheck.sql must be run first, because wk0idxcheck.sql depends on the reconfigured default settings generated by wk0prefcheck.sql. Running wk0idxcheck.sql also drops and recreates the Oracle Text index used by Oracle Ultra Search. If there are already data sources indexed, you must force a recrawl of all of the data sources. wk0idxcheck.sql must be run once for each instance. For example, if there are two instances, inst1 and inst2, owned by owner1 and owner2 respectively, then wk0idxcheck.sql should be run twice: once by owner1 and once by owner2. Configuring the Default Oracle Ultra Search Instance: The Oracle Ultra Search installer creates a default, out-of-the-box Oracle Ultra Search instance based on the default Oracle Ultra Search test user, so you can test Oracle Ultra Search functionality based on the default instance after installation. The default instance name is WK_INST. It is created based on the database user WK_TEST. The default user pa
67. system Real Application Clusters, the NFS mount can be limited to the specific node where the Oracle instance is serving the remote crawler. There is no advantage to mounting the remote file system to all nodes; it could lead to stale NFS handles when nodes go down. When there is a configuration change to move to a different Oracle instance, the remote file system should be NFS mounted to the new node accordingly. Logging on to the Oracle Instance: All components of Oracle Ultra Search use the JDBC Thin Driver with the connect string consisting of hostname:port:SID or the full connect descriptor as seen in tnsnames.ora. The administration middle tier connects to the Oracle database with a JDBC connection specified in the ultrasearch.properties file. If the client serving node is down, then you must manually edit the ultrasearch.properties file to connect to a different Oracle instance. Query Search Application for Real Application Clusters: Query components should fully utilize Real Application Clusters. You can specify the JDBC connection string as a database connect descriptor so that it can connect to any Oracle instance in Real Application Clusters. For example:
    jdbc:oracle:thin:@(DESCRIPTION=(LOAD_BALANCE=yes)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=cls02a)(PORT=3999))(ADDRESS=(PROTOCOL=TCP)(HOST=cls02b)(PORT=3999)))(CONNECT_DATA
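A connect descriptor of this shape is easy to mistype by hand, so a small helper that assembles it from a host list may be useful. This is a sketch under assumptions: the class name RacJdbcUrl is hypothetical, and because the CONNECT_DATA clause is cut off in the example above, the sketch fills it with a SERVICE_NAME entry whose value (mysvc in the usage line) is a placeholder, not a value from the guide.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: assembles a JDBC Thin URL with a load-balanced
// connect descriptor. The service name is a caller-supplied placeholder.
final class RacJdbcUrl {
    static String build(List<String> hosts, int port, String serviceName) {
        StringBuilder sb = new StringBuilder(
            "jdbc:oracle:thin:@(DESCRIPTION=(LOAD_BALANCE=yes)(ADDRESS_LIST=");
        for (String host : hosts) {
            // One ADDRESS entry per cluster node.
            sb.append("(ADDRESS=(PROTOCOL=TCP)(HOST=").append(host)
              .append(")(PORT=").append(port).append("))");
        }
        // Close ADDRESS_LIST, then add CONNECT_DATA and close DESCRIPTION.
        sb.append(")(CONNECT_DATA=(SERVICE_NAME=").append(serviceName).append(")))");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(Arrays.asList("cls02a", "cls02b"), 3999, "mysvc"));
    }
}
```

Generating the string keeps the parentheses balanced as nodes are added or removed, which is the most common source of errors in hand-edited descriptors.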
68. the $ORACLE_HOME/tools/remotecrawler/scripts/unix/runall.sh Bourne shell script for RMI-based remote crawling. Source runall_jdbc.sh for JDBC-based remote crawling. If the remote crawler is running on a Windows host, then run the %ORACLE_HOME%\tools\remotecrawler\scripts\winnt\runall.bat file for RMI-based remote crawling. Run runall_jdbc.bat for JDBC-based remote crawling. For RMI-based remote crawling, the runall scripts perform the following tasks in sequence: 1. define_env is invoked to define necessary environment variables. 2. runregistry is invoked to start up the RMI registry. 3. runrmid is invoked to start up the RMI daemon. 4. register_stub is invoked to register the necessary Java classes with the RMI subsystem. Note: You can invoke runregistry, runrmid, and register_stub individually. However, you must first invoke define_env to define the necessary environment variables. For JDBC-based remote crawling, the runall_jdbc scripts perform the following tasks in sequence: 1. define_env is invoked to define necessary environment variables. 2. The JDBC launcher is started with a command-line Java process invocation. There are the following command-line arguments for the JDBC launcher: -name: name of launcher that you used to register this launcher. -url: JDBC connection URL to the backend Oracle Ultra Search database. -user
69. the language. For documents with no language specification, the Oracle Ultra Search crawler attempts to automatically detect language. Click Yes to turn on this feature. The language recognizer is trained statistically using trigram data from documents in various languages (Danish, Dutch, English, French, German, Italian, Portuguese, and Spanish). It starts with the hypothesis that the given document does not belong to any language and ultimately refutes this hypothesis for a particular language where possible. It operates on the Latin-1 alphabet and any language with a deterministic Unicode range of characters (Chinese, Japanese, Korean, and so on). The crawler determines the language code by checking the HTTP header Content-Language or the LANGUAGE column, if it is a table data source. If it cannot determine the language, then it takes the following steps: 1. If the language recognizer is not available or if it is unable to determine a language code, then the default language code is used. 2. If the language recognizer is available, then the output from the recognizer is used. This language code is populated in the LANG column of the wk$url and wk$doc tables. Multilexer is the only lexer used for Oracle Ultra Search. All document URLs are stored in wk$doc for indexing and wk$url for crawling. Default Language: If automatic language detection is disabled, or if a Web document does not ha
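The fallback order described above (declared language first, then the recognizer, then the default) can be sketched as a single function. This is an illustration of the decision order only: the class name LanguageResolver, the use of null for "not available", and the recognizer being passed in as a function are assumptions, and no real language recognizer is implemented here.

```java
import java.util.function.UnaryOperator;

// Illustrative only: mirrors the order of checks described above.
final class LanguageResolver {
    /**
     * @param declared    language from Content-Language or the LANGUAGE column, or null
     * @param recognizer  function from document text to language code, or null if
     *                    no recognizer is available; may itself return null
     * @param text        document text handed to the recognizer
     * @param defaultCode default language code configured for the crawler
     */
    static String resolve(String declared, UnaryOperator<String> recognizer,
                          String text, String defaultCode) {
        if (declared != null && !declared.trim().isEmpty()) {
            return declared; // explicit specification wins
        }
        if (recognizer != null) {
            String detected = recognizer.apply(text);
            if (detected != null) {
                return detected; // recognizer produced a language code
            }
        }
        return defaultCode; // recognizer missing or undecided
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, t -> null, "some text", "en"));
    }
}
```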
70. the express written permission of Oracle Corporation. If the Programs are delivered to the U.S. Government or anyone licensing or using the programs on behalf of the U.S. Government, the following notice is applicable: Restricted Rights Notice: Programs delivered subject to the DOD FAR Supplement are commercial computer software, and use, duplication, and disclosure of the Programs, including documentation, shall be subject to the licensing restrictions set forth in the applicable Oracle license agreement. Otherwise, Programs delivered subject to the Federal Acquisition Regulations are restricted computer software, and use, duplication, and disclosure of the Programs shall be subject to the restrictions in FAR 52.227-19, Commercial Computer Software - Restricted Rights (June 1987). Oracle Corporation, 500 Oracle Parkway, Redwood City, CA 94065. The Programs are not intended for use in any nuclear, aviation, mass transit, medical, or other inherently dangerous applications. It shall be the licensee's responsibility to take all appropriate fail-safe, backup, redundancy, and other measures to ensure the safe use of such applications if the Programs are used for such purposes, and Oracle Corporation disclaims liability for any damages caused by such use of the Programs. Oracle is a registered trademark, and Oracle9i, Oracle8i, PL/SQL, Oracle Store, SQL*Plus are trademarks or registered trademarks of Oracle Corporation. Other names may be trademarks of t
71. the term. The document is also indexed the next time the schedule is run. With manual URL entry, you can only assign URLs for Web data sources. Users get an error message on this page if no Web data source is defined. Note: Oracle Ultra Search provides a command-line tool to load metadata, such as document relevance boosting, into an Oracle Ultra Search database. If you have a large amount of data, this is probably faster than using the HTML-based administration tool. For more information, see Appendix A, "Loading Metadata into Oracle Ultra Search". Query Statistics. Enabling Query Statistics: This section lets you enable or disable the collection of query statistics. The logging of query statistics reduces query performance. Therefore, Oracle recommends that you disable the collection of query statistics during regular operation. Note: After you enable query statistics, the table that stores statistics data is truncated every Sunday at 1:00 A.M. Viewing Statistics: If query statistics is enabled, you can click one of the following categories: Daily Summary of Query Statistics; Top 50 Queries; Top 50 Ineffective Queries; Top 50 Failed Queries. Daily Summary of Query Statistics: This summarizes all query activity on a daily basis. The statistics gathered are: Average query time (the average time taken over all queries); Number of queries (the total number of queries made in the day); Number
72. to display the table data retrieved. If a display URL column is available, then Oracle Ultra Search uses the column to get the URL to display the table data source content. You can also specify display URL templates in the following format: http://hostname:port/path?parameter_name=$(key1), where key1 is the corresponding table's primary key column. For example, assume that you can use the following URL to query the bug number 1234567, and the bug number is the primary key of the table: http://bug:7777/pls/bug?rptno=1234567. You can set the table source display URL template to http://bug:7777/pls/bug?rptno=$(key1). The Table Column to Key Mappings section provides mapping information. Oracle Ultra Search supports table keys in STRING, NUMBER, or DATE type. If key1 is of NUMBER or DATE type, then you must specify the format model used by the Web site so that Oracle knows how to interpret the string. For example, the date format model for the string 11-Nov-1999 is DD-Mon-YYYY. You can also map other table columns to Oracle Ultra Search attributes. Do not map the text column. 7. Specify the ACL (access control list) policy for the data source. When a user performs a search, the ACL controls which documents the user can access. The default is no ACL, with all documents considered public and visible. Alternatively, you can specify to use Oracle Ultra Se
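The display URL template in entry 72 amounts to a placeholder substitution using the row's primary key. The Python sketch below illustrates the idea only. It assumes the placeholder is written as `$(key1)`, which is a reconstruction because the punctuation did not survive extraction, and `build_display_url` is a hypothetical helper, not part of Oracle Ultra Search.

```python
from urllib.parse import quote


def build_display_url(template: str, keys: dict) -> str:
    """Replace each $(name) placeholder with the URL-encoded key value."""
    url = template
    for name, value in keys.items():
        placeholder = "$(" + name + ")"
        # Encode the key so that spaces or '&' cannot break the query string.
        url = url.replace(placeholder, quote(str(value), safe=""))
    return url


# Template and key taken from the bug-number example in the text.
bug_url = build_display_url(
    "http://bug:7777/pls/bug?rptno=$(key1)", {"key1": 1234567}
)
```

With the example values from the text, `bug_url` is `http://bug:7777/pls/bug?rptno=1234567`.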
73. to index and query data retrieved from your data sources. The backend is not visible to users; it indexes information from the crawler and serves up the query results. See Also: "Installing the Oracle Ultra Search Backend" on page 3-3. Oracle Ultra Search Administration Tool: The administration tool is a J2EE-compliant Web application. You can use it to manage Oracle Ultra Search instances, and you can access it from any browser in your intranet. The administration tool is independent from the Oracle Ultra Search query application. Therefore, the administration tool and query application can be hosted on different computers to enhance security and scalability. See Also: Chapter 8, "Understanding the Oracle Ultra Search Administration Tool". Oracle Ultra Search APIs and Sample Applications: Oracle Ultra Search provides the following APIs: The query API works with indexed data. The Java API does not impose any HTML rendering elements. The application can completely customize the HTML interface. The crawler agent API crawls and indexes proprietary document repositories. The email Java API accesses archived emails and is used by the query application to display emails. It can also be used when building your own custom query application. The URL rewriter API is used by the crawler to filter and rewrite extracted URL links before they are inserted into the URL que
74. ultrasearch_query.jar; ultrasearch/webapp/config; jlib/uix2.jar; jlib/share.jar; jlib/regexp.jar; jdbc/lib/nls_charset12.zip; jlib/repository.jar; jlib/ohw.jar; jlib/ldapjclnt9.jar; j2ee/home/jazn.jar; portal/jlib/ptlshare.jar; portal/jlib/pdkjava.jar. The preceding libraries are required for the Oracle Ultra Search administration tool and query Web applications to run. Note: $ORACLE_HOME/ultrasearch/webapp/config contains the ultrasearch.properties file. For more information, see "Editing the ultrasearch.properties File" on page 3-23. For default-web-site.xml: Under the <web-site> tag, add the following: <web-app application="UltrasearchQuery" name="query" root="/ultrasearch/query" /> <web-app application="UltrasearchQuery" name="welcome" root="/ultrasearch" /> <web-app application="UltrasearchAdmin" name="admin" root="/ultrasearch/admin" /> <web-app application="UltrasearchAdmin" name="admin_sso" root="/ultrasearch/admin_sso" /> <web-app application="UltrasearchAdmin" name="ohw" root="/ultrasearch/ohw" /> The preceding lines describe which Web application WAR file
75. under /products/database and /products/ias, but not under /products/ias/web_cache, will be crawled. Domain inclusion: www.oracle.com. Domain inclusion: otn.oracle.com. Path inclusion for otn.oracle.com: /products/database, /products/ias. Path exclusion for otn.oracle.com: /products/ias/web_cache. robots.txt Protocol and robots META Tag: The robots.txt protocol is the webmaster's path rule for any spider or crawler that visits his or her Web site. It is described in the document "A Standard for Robot Exclusion" at http://www.robotstxt.org/wc/norobots.html. The following example robots.txt file specifies that no robots should visit any URL starting with /cyberworld/map/ or /tmp/ or /foo.html: # robots.txt for http://www.acme.com/ User-agent: * Disallow: /cyberworld/map/ Disallow: /tmp/ Disallow: /foo.html By default, the Oracle Ultra Search crawler observes the robots.txt protocol, but it also allows the user to override it. If the Web site is under the user's control, a specific robots rule can be tailored for the crawler by specifying the Ultra Search crawler agent name, "User-agent: Oracle Ultra Search". For example: User-agent: Oracle Ultra Search Disallow: /tmp/ The robots META tag instructs the crawler whether to index a Web page or follow the links within it. It is described in "HTML Author's Guide to the Robots META tag", http www robo
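To see how a crawler that observes the robots.txt protocol would interpret the example file in entry 75, the rules can be checked with Python's standard `urllib.robotparser` module. This is a sketch for illustration only; it does not show how the Oracle Ultra Search crawler is implemented. The slashes in the rules are reconstructed from the robots exclusion standard, and the helper name `allowed` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from the text, with path separators restored.
ROBOTS_TXT = """\
# robots.txt for http://www.acme.com/
User-agent: *
Disallow: /cyberworld/map/
Disallow: /tmp/
Disallow: /foo.html
"""


def allowed(url: str, agent: str = "Oracle Ultra Search") -> bool:
    """Return True if the given agent may fetch the URL under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)
```

Because the file contains only a `User-agent: *` record, the same rules apply to every crawler, including one that identifies itself as Oracle Ultra Search. A record addressed to that agent name would let a webmaster give it different rules.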
76. version if an unexpected error occurs, such as a power failure or system failure. For in-place migration, back up the database before starting migration. For ETL migration, because all previous data is kept, you can switch back to the previous (for example, 9.0.1) system. Oracle Ultra Search Migration Logs: The upgrade script provides log files to show which actions the migration has taken. The upgrade script writes the following contents to the log file: the current execution step; any error message raised from the stored procedures; the number of data records backed up; the number of data records copied or migrated. For in-place migration, the wk0upgrade.sql script writes the execution logs to the file wk0upgrade.log in the $ULTRASEARCH_HOME/admin directory. For ETL migration, the wk0migrate.sql script writes the execution logs to the file wk0migrate.log in the $ULTRASEARCH_HOME/admin directory. Upgrade from Oracle Ultra Search 9.0.2 to 9.0.3: To upgrade Oracle Ultra Search 9.0.2 to 9.0.3, perform the following steps: 1. Copy all Oracle Ultra Search 9.0.2 files recursively under the Oracle Application Server 9.0.2 infrastructure tier $ORACLE_HOME/ultrasearch to a different directory, in case you need to downgrade to 9.0.2 later. 2. Log on to the Oracle Ultra Search 9.0.2 administration tool. Stop and disable all crawler synchroniz
77. within one installation of Oracle. Note: Oracle Ultra Search requires that each Oracle Ultra Search virtual instance belong to a unique database user. Therefore, as part of the installation process, you must create one or more new database users to own all data for your Oracle Ultra Search instance. If you intend to create more than one database instance, you should also create multiple user tablespaces, one for each user. You must grant the WKUSER role to database users hosting new Oracle Ultra Search instances. See Also: "Users Page" on page 8-43. Enter the following statements to create and configure a new user. Run these statements as the WKSYS, SYSTEM, or SYS database user. CREATE USER username IDENTIFIED BY password DEFAULT TABLESPACE default_tbs TEMPORARY TABLESPACE temporary_tbs QUOTA UNLIMITED ON default_tbs; where username is the name of the Oracle Ultra Search instance owner, password is the password of the Oracle Ultra Search instance owner, default_tbs is the default tablespace for the Oracle Ultra Search instance (created in step 3), and temporary_tbs is the temporary tablespace (created in step 2). GRANT WKUSER TO username; After these steps are completed, WKSYS or an Oracle Ultra Search super-user can create an Oracle Ultra Search instance on this user schema. If you want this user to have the general administrative pr
78. xsd attribute name value use required type xsd string gt lt xsd attribute name display_name use required type xsd string gt lt xsd complexType gt lt xsd element gt lt xsd sequence gt lt xsd attribute name lang use required gt lt xsd simpleType gt lt xsd restriction base xsd string gt lt xsd length value 5 gt lt xsd pattern value a zA Z 2 a zA 2 2 gt lt xsd restriction gt lt xsd simpleType gt lt xsd attribute gt lt xsd complexType gt lt xsd element gt lt xsd sequence gt lt xsd complexType gt lt xsd element gt lt xsd element name data_source minOccurs 0 maxOccurs A 6 Oracle Ultra Search User s Guide unbounded gt XML Schema for LOVs and LOV Display Names lt xsd complexType gt lt xsd sequence gt lt xsd element name lov_values minOccurs 0 gt lt xsd complexType gt lt xsd sequence gt lt xsd element name entry maxOccurs unbounded gt lt xsd complexType gt lt xsd attribute name value use required type xsd string gt lt xsd complexType gt lt xsd element gt lt xsd sequence gt lt xsd complexType gt lt xsd element gt lt xsd element name lov_display_names minOccurs 0 gt lt xsd complexType gt lt xsd sequence gt lt xsd element name entry maxOccurs unbounded gt lt xsd complexType gt lt xsd attribute name value use re
80. 3 a Oracle Ultra Search Java API Reference for the oracle.ultrasearch.query.Query interface. Size the shared pool: The shared pool stores the library cache and the dictionary cache. The library cache stores recently run SQL and PL/SQL code. A cache miss on the data dictionary cache or library cache is more expensive than a miss on the buffer cache. For this reason, the shared pool should be sized to ensure that frequently used data is cached. The shared pool size is controlled by the SHARED_POOL_SIZE initialization parameter. See Also: Oracle Database Performance Tuning Guide for information on tuning this parameter. Define JDBC connection pooling: The Oracle Ultra Search middle tier connects to the database through JDBC. Because creation of a connection is an expensive operation in JDBC, a pool of open connections is used to improve the response time of queries. With Oracle Application Server, OC4J can manage the connection pool for the applications. The minimum size, maximum size, and allocation algorithm of the pool can be specified in the data-sources.xml configuration file of OC4J. The following is an example of a data source definition with minimum 2 and maximum 30 open connections. Each connection closes after 30 seconds of inactivity, and new connections are created dynamically according to load. The other caching schemes are FIXED_WAIT_SCHEME and FIXED
81. 4 not found; 405 method not allowed; 406 not acceptable; 407 proxy authentication required; 408 request timeout; 409 conflict. Table 7-1 (Cont.) Oracle Ultra Search URL Status Codes (Code: Explanation): 410 gone; 414 request URI too large; 500 internal server error; 501 not implemented; 502 bad gateway; 503 service unavailable; 504 gateway timeout; 505 HTTP version not supported; 902 timeout reading document; 903 filtering failed; 904 IOEXCEPTION in processing URL; 906 connection refused; 907 socket bind exception; 908 filter not available; 909 duplicate document detected; 910 duplicate document ignored; 911 empty document; 951 URL not indexed; 952 URL crawled; 953 meta tag redirection; 954 HTTP redirection; 955 blacklist URL; 956 URL is not unique; 957 sentry URL (URL as placeholder); 958 document read error; 959 form login failed; 1001 data type is not TEXT/HTML; 1002 broken network datastream; 1003 HTTP redirect location does not exist; 1004 bad relative URL; 1005 HTTP error; 1006 error parsing HTTP header; 1007 invalid URL table column name; 1008 JDBC driver missing; 1009 binary document reported as text document; 1010 invalid display URL.
82. Launching Synchronization Schedules: A schedule's synchronization frequency can be identical to another schedule's synchronization frequency. This gives you maximum flexibility in managing data source synchronization. You can launch a synchronization schedule in the following ways: Set a schedule frequency and wait for the predetermined launch time. Run it immediately; to do so, click Status, then Execute Immediately. Manually start the schedule. Note: Launching a synchronization schedule can take a very long time. If a schedule has been launched before, then the next time the schedule is launched, all URLs that belong to the data source to be crawled by the schedule are updated and put into a queue. Depending on the number of URLs associated with that data source, the enqueue operation may take a long time. The administration tool displays the schedule state as "Launching" the entire time. The launch of a schedule does not perform any enqueue if the URL queue is not empty or if there is a new seed added since the last crawl. For example, if the user stopped the crawler earlier, or if the crawler terminated because of insufficient Oracle table space, then the URL queue is not empty. So, on the next launch, the crawler does not try to enqueue; instead, it works on the existing URL queue until it is empty. In other words, enqueue is only performed when the queue is empty a
83. ATH environment variable to include $ORACLE_HOME/ctx/lib, and on Windows, set the PATH environment variable to include %ORACLE_HOME%\bin. Configure the Oracle Database for Oracle Ultra Search: After you have installed all Oracle Ultra Search components, you can optionally configure the Oracle database. See "Configuring the Oracle Server for Oracle Ultra Search" on page 4-2 for more information. Configure a Secure Oracle Ultra Search Installation. Step 1: Check the database version requirements and configure Oracle Identity Management. Before you can set up a secure Oracle Ultra Search installation, you must do the following: Install or upgrade the Oracle database to 9.2.0.4 or higher. The middle tier and IM (identity management) version should be 9.0.4 or higher. If you have a 9.2.0.4 database, you can use RepCA to convert a 9.2.0.4 database to an Oracle Application Server 9.0.4 metadata repository. Install and configure the Oracle Internet Directory (OID). The middle tier and IM (identity management) version should be 9.0.4 or higher. Register the database to Oracle Internet Directory. You can use RepCA to register the database to Oracle Internet Directory. After registration, you need to perform these manual steps: Add the distinguished name of the database to the database server parameter file as an RDBMS_SERVER_DN
84. AULT 1, start_hour IN NUMBER DEFAULT 1, start_day IN NUMBER DEFAULT 1) return varchar2; type: The schedule interval type. The allowed values are defined as package constants: HOURLY, DAILY, WEEKLY, MONTHLY, and MANUAL. frequency: The schedule frequency. This depends on the interval type; it can be every x number of hours, days, weeks, or months. Not used for the MANUAL interval type. start_hour: The schedule's launching hour in 24-hour format, where 1 represents 1 AM. Not used for HOURLY and MANUAL schedules. start_day: The schedule's start day; this parameter is only used for WEEKLY and MONTHLY intervals. The day of the week is specified as 0 through 6, where 0 is Sunday, and the day of the month is specified as 1 through 31. Examples: This specifies an interval of every 5 days starting at 6 PM: OUS_ADM.INTERVAL(OUS_ADM.DAILY, 5, 18). This specifies launch on demand: OUS_ADM.INTERVAL(OUS_ADM.MANUAL). This specifies every 2 weeks on Monday starting at 6 AM: OUS_ADM.INTERVAL(type=>OUS_ADM.WEEKLY, frequency=>2, start_day=>2, start_hour=>6). This specifies every 3 months starting on the first day of the month at 11 PM: OUS_ADM.INTERVAL(OUS_ADM.MONTHLY, 3, 23, 1). SET_SCHEDULE: Use this procedure to execute, resume, or stop a schedule. Syntax: OUS_ADM.SET_S
85. CHEDULE(name IN VARCHAR2, operation IN NUMBER); name: The name of the schedule. operation: This may be EXECUTE, RESUME, or STOP. Example: OUS_ADM.SET_SCHEDULE('marketing site schedule', ous_adm.EXECUTE). UPDATE_SCHEDULE: Use this procedure to update a crawler schedule. Syntax: OUS_ADM.UPDATE_SCHEDULE(name IN VARCHAR2, operation IN NUMBER, value IN VARCHAR2 DEFAULT null); name: The name of the schedule to update. operation: The desired update operation. Some operations may need a value. Possible values include RENAME, ADD_DS, REMOVE_DS, SET_INTERVAL, CRAWL_MODE, RECRAWL_POLICY, and SET_CRAWLER. Values that are not allowed include ENABLE_SCHEDULE and DISABLE_SCHEDULE. value: This parameter is context-sensitive to the update operation. It can be a new schedule name (RENAME), a data source name (ADD_DS or REMOVE_DS), an interval string (SET_INTERVAL), a crawl mode value (CRAWL_MODE), a recrawl policy (RECRAWL_POLICY), or a crawler ID (SET_CRAWLER). Examples: OUS_ADM.UPDATE_SCHEDULE('marketing site schedule', ous_adm.SET_INTERVAL, OUS_ADM.INTERVAL(ous_adm.HOURLY, 3)); OUS_ADM.UPDATE_SCHEDULE('marketing site schedule', OUS_ADM.RENAME, 'marketing site'); OUS_ADM.UPDATE_SCHEDULE('marketing site', OUS_ADM.ADD_DS, 'marketing primary site'); OUS_ADM.UPDATE_SCHEDULE('marketin
86. Display Names" on page A-5. Loading Search Attribute LOVs and LOV Display Names: Example of the LOV XML File: <?xml version="1.0" encoding="UTF-8"?> <lov_list> <lov search_attr_name="Department" search_attr_type="string"> <default> <lov_values> <entry value="100"></entry> <entry value="200"></entry> </lov_values> <lov_display_names lang="en-US"> <entry value="100" display_name="Human Resource"></entry> <entry value="200" display_name="Finance"></entry> </lov_display_names> </default> <data_source name="data source a"> <lov_values> <entry value="300"></entry> <entry value="400"></entry> </lov_values> <lov_display_names lang="en-US"> <entry value="300" display_name="Sales"></entry> <entry value="400" display_name="Marketing"></entry> </lov_display_names> </data_source> <data_source name="data source b"> <lov_values> <entry value="500"></entry> <entry value="600"></entry> </lov_values> <lov_display_names lang="en-US"> <entry value="500" display_name="Production"></entry> <entry value="600" display_name="Research"></entry> </lov_display_names> </data_source> </lov> </lov_list> In the previous example
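The structure of the LOV file in entry 86 is easier to follow when read programmatically. The Python sketch below parses a shortened copy of the example with the standard `xml.etree.ElementTree` module and collects display names per scope (the default scope and each data source). It is an illustration only and is unrelated to the Oracle Ultra Search metadata loader; the function names and the returned dictionary layout are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example file (default scope and one data source).
LOV_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<lov_list>
  <lov search_attr_name="Department" search_attr_type="string">
    <default>
      <lov_values><entry value="100"/><entry value="200"/></lov_values>
      <lov_display_names lang="en-US">
        <entry value="100" display_name="Human Resource"/>
        <entry value="200" display_name="Finance"/>
      </lov_display_names>
    </default>
    <data_source name="data source a">
      <lov_values><entry value="300"/><entry value="400"/></lov_values>
      <lov_display_names lang="en-US">
        <entry value="300" display_name="Sales"/>
        <entry value="400" display_name="Marketing"/>
      </lov_display_names>
    </data_source>
  </lov>
</lov_list>
"""


def _names(scope, lang):
    """Collect value -> display_name pairs for one language in one scope."""
    names = {}
    for block in scope.findall("lov_display_names"):
        if block.get("lang") == lang:
            for entry in block.findall("entry"):
                names[entry.get("value")] = entry.get("display_name")
    return names


def display_names(xml_bytes, lang="en-US"):
    """Return {attribute: {scope: {value: display_name}}} for the file."""
    result = {}
    for lov in ET.fromstring(xml_bytes).findall("lov"):
        scopes = {}
        default = lov.find("default")
        if default is not None:
            scopes["default"] = _names(default, lang)
        for source in lov.findall("data_source"):
            scopes[source.get("name")] = _names(source, lang)
        result[lov.get("search_attr_name")] = scopes
    return result


LOVS = display_names(LOV_XML)
```

Here `LOVS["Department"]["default"]` maps value `100` to Human Resource, while `LOVS["Department"]["data source a"]` holds the values that apply only to that data source.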
89. LE_HOME, either on the same computer or on a different computer. If the new 9.0.3 system is installed on the same computer as the old 9.0.1 system, then the database listener port number should be configured to a different number than the old 9.0.1 database. This lets both the old and the new database run at the same time. Re-create all Oracle Ultra Search 1.0.3 user instance schemas in the new database. Also, for each table data source created in Oracle Ultra Search 1.0.3, if the base table is located in the local database, then you must copy the base table to the new database. If the table data source base table is set to a remote database table, then you must re-create the database link from the new database to the remote database. Use the SQL script wk0migrate.sql to run the ETL migration steps three and four. The script is located in the $ULTRASEARCH_HOME/admin directory. It requires the following input parameters: WKSYSPW (password of the user WKSYS); CONN_STRING (database connect string); SRC_WKSYSPW (password of the source database (9.0.1 database) user WKSYS); SRC_CONN_STRING (source database connect string). The fifth step requires the system administrator to re-activate all crawling schedules through the Oracle Ultra Search administration tool. Note: The upgrade script does not roll back the Oracle Ultra Search system to the old
90. OME/ultrasearch/lib/ultrasearch_query.jar; ORACLE_HOME/lib/mail.jar; ORACLE_HOME/lib/activation.jar. Figure 9-1 shows how your Web query application calls the Oracle Ultra Search Java query API. [Figure 9-1: Calling JavaServer Pages. The browser calls a JSP page by URL on the Web server; the J2EE engine calls the Oracle Ultra Search Java Query API, which uses the Oracle Ultra Search index.] 10 Administration PL/SQL APIs: This chapter provides reference information on PL/SQL APIs available for use with Oracle Ultra Search. The APIs are grouped as follows: Instance-Related APIs; Schedule-Related APIs; Crawler Configuration APIs. The following tables show the contents of each API group. Table 10-1 shows the Instance-Related APIs. Table 10-1 Instance-Related APIs (Name: Function): CREATE_INSTANCE: create an Oracle Ultra Search instance; DROP_INSTANCE: drop an Oracle Ultra Search instance; GRANT_ADMIN: grant instance administrator privileges; REVOKE_ADMIN: revoke instance administrator privileges; SET_INSTANCE: operate on an Oracle Ultra Search instance. Table 10-2 shows the Schedule-Related A
91. Oracle Ultra Search User's Guide, 10g Release 1 (10.1). Part No. B10731-01. December 2003. Copyright © 2003 Oracle Corporation. All rights reserved. Primary Author: Michele Cyran. Contributors: Sandeepan Banerjee, Stefan Buchta, Eddy Chee, Chung Ho Chen, Will Chin, Jack Chung, Ray Hachem, Kurt Heiss, Cindy Hsin, Hassan Karraby, Yasuhiro Matsuda, Colin McGregor, Valarie Moore, Visar Nimani, Steve Yang, David Zhang. The Programs (which include both the software and documentation) contain proprietary information of Oracle Corporation; they are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright, patent, and other intellectual and industrial property laws. Reverse engineering, disassembly, or decompilation of the Programs, except to the extent required to obtain interoperability with other independently created software or as specified by law, is prohibited. The information contained in this document is subject to change without notice. If you find any problems in the documentation, please report them to us in writing. Oracle Corporation does not warrant that this document is error-free. Except as may be expressly permitted in your license agreement for these Programs, no part of these Programs may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without
92. Oracle Ultra Search used centralized search to gather data on a regular basis and update one index that cataloged all searchable data. This provided fast searching, but it required that the data source be crawlable before it could be searched. Oracle Ultra Search now also provides federated search, which allows multiple indexes to perform a single search. Each index can be maintained separately. By querying the data source at search time, search results are always the latest results. User credentials can be passed to the data source and authenticated by the data source itself. Queries can be processed efficiently using the data's native format. To use federated search, you must deploy an Oracle Ultra Search search adapter, or searchlet, and create an Oracle source. A searchlet is a Java module deployed in the middle tier (inside OC4J) that searches the data in an enterprise information system on behalf of a user. When a user's query is delegated to the searchlet, the searchlet runs the query on behalf of the user. Every searchlet is a JCA 1.0 compliant resource adapter. See Also: "Federated Sources" on page 8-30. Secure Search: Oracle Ultra Search supports secure searches, which return only documents satisfying the search criteria that the search user is allowed to view. For secure searches, each indexed document should be protected by an access control list (ACL). During searches, the ACL is evaluated. If the user performing the search has permi
93. PIs. Table 10-2 Schedule-Related APIs (Name: Function): CREATE_SCHEDULE: create a crawler schedule; DROP_SCHEDULE: drop a crawler schedule; INTERVAL: generate a schedule interval string; SET_SCHEDULE: execute, resume, or stop a schedule; UPDATE_SCHEDULE: update a crawler schedule. Table 10-3 shows the Crawler Configuration APIs. Table 10-3 Crawler Configuration APIs (Name: Function): IS_ADMIN_READONLY: check if a crawler configuration setting is read-only; SET_ADMIN_READONLY: make a crawler configuration read-only; UPDATE_CRAWLER_CONFIG: update crawler configurations. Instance-Related APIs: This section provides reference information for using the instance-related APIs. CREATE_INSTANCE: Use this procedure to create an Oracle Ultra Search instance. Syntax: OUS_ADM.CREATE_INSTANCE(inst_name IN VARCHAR2, schema_name IN VARCHAR2, password IN VARCHAR2 DEFAULT NULL, lexer IN VARCHAR2 DEFAULT NULL, stop_list IN VARCHAR2 DEFAULT NULL, data_store IN VARCHAR2 DEFAULT NULL, snapshot IN NUMBER DEFAULT ous_adm.NO); inst_name: The name of the instance. schema_name: The name of the schema. password: The password for the schema. lexer: The Oracle Text index lexer preference. stop_list: The Oracle Text index stoplist preference. data_store: The Oracle Text index datastore preference. snapshot: If set to OUS_ADM.YES, create an instance for th
94. SQL*Plus to the Oracle backend database and registers the remote crawler host. 1. Locate the correct ORACLE_HOME. The Oracle Ultra Search middle tier is installed under a common directory known as ORACLE_HOME. If you have installed other Oracle products prior to the Oracle Ultra Search middle tier, then you could have multiple ORACLE_HOMEs on your host. The registration script requires that you enter the ORACLE_HOME directory in which the Oracle Ultra Search middle tier is installed. 2. Locate the WKSYS super-user password. You must run the registration script as the WKSYS super-user or as a database user that has been granted super-user privileges. 3. Start SQL*Plus. Be sure to run the correct version of SQL*Plus, because multiple versions can reside on the same host if you have previously installed some Oracle products. On UNIX platforms, make sure that the correct values for the PATH, ORACLE_HOME, and TNS_ADMIN variables are set. On Windows platforms, choose the correct menu item from the Start menu. After you have identified how to run the correct SQL*Plus client, you must log on to the Oracle Ultra Search database. To do this, you might need to configure an Oracle Net service setting for the Oracle Ultra Search database. See Also: Oracle Net Services Administrator's Guide for informat
95. SSO login mode and the non-SSO login mode. Privileges to Administer an Oracle Ultra Search Instance: In non-SSO mode, only database users can log in to the admin tool. If the login database user has the super-user privilege, then the user can administer all Oracle Ultra Search instances across the default subscriber and any other subscribers. If the login database user only has the admin privilege on a particular Oracle Ultra Search instance, then the user can administer the instance, regardless of whether the instance is associated with the default subscriber or any other subscribers. In SSO mode, only SSO users can log in to the admin tool. If the SSO user belongs to the default subscriber, then the following is true: If the SSO user has the super-user privilege, then the user can administer all Oracle Ultra Search instances across the default subscriber and any other subscribers (for example, instances 1, 2, 3, 4). If the SSO user has the admin privilege on a particular Oracle Ultra Search instance (for example, instance 1) within the default subscriber, then the user can administer the instance (instance 1) that is associated with the default subscriber. If the SSO user belongs to a subscriber, then the following is true: If the SSO user has the super-user privilege, then the user can administer only Oracle Ultra Search instances within the subscriber to which he belongs. For example, if the user from subscriber A has the super user
96. Search Security Considerations When Using Restricting Access to a Data Source. An Oracle Ultra Search data source can be protected by a single administrator-specified ACL. This ACL specifies which users and groups are allowed to view the documents belonging to that data source. Oracle Ultra Search uses the Oracle Server's ACL evaluation engine to evaluate permissions when queries are performed by search users. This ACL evaluation engine is a feature of Oracle XML DB. If an Oracle Ultra Search query attempts to retrieve a document that is protected by an administrator-specified ACL, the ACL is evaluated and subsequently cached. The duration for which an ACL is cached is controlled by an XML DB configuration parameter. For more information, consult the Oracle XML DB Developer's Guide. To change the duration, modify the /xdbconfig/sysconfig/acl-max-age parameter. The value is a number, in seconds, that determines how long ACLs are cached. Because ACLs are cached, changes to an administrator-specified ACL may not propagate immediately. This applies only to database sessions that existed before the change was made. See Also: Oracle XML DB Developer's Guide and "Configure a Secure Oracle Ultra Search Installation" on page 3-6. Sample Query Applications: Oracle Ultra Search includes fully functional sample query applications to query and display search results. T
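The effect of a maximum cache age on ACL changes can be illustrated with a generic time-to-live cache. This is a minimal sketch of the caching behavior described above, not the Oracle XML DB implementation: `AclCache`, `load_acl`, and the injectable `clock` are hypothetical names chosen for the example.

```python
import time


class AclCache:
    """Caches evaluated ACLs for at most max_age_seconds (a stand-in for acl-max-age)."""

    def __init__(self, max_age_seconds, clock=time.monotonic):
        self.max_age = max_age_seconds
        self.clock = clock          # injectable so the behavior can be tested without sleeping
        self._entries = {}          # acl_id -> (time fetched, acl)

    def get(self, acl_id, load_acl):
        now = self.clock()
        hit = self._entries.get(acl_id)
        if hit is not None and now - hit[0] < self.max_age:
            return hit[1]           # still fresh: a changed ACL is not yet visible
        acl = load_acl(acl_id)      # expired or missing: evaluate again and re-cache
        self._entries[acl_id] = (now, acl)
        return acl
```

With a maximum age of 60 seconds, a user added to an ACL 30 seconds after it was cached stays invisible to that cache until the entry expires, which is the delayed propagation the text warns about.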
97. Search Admin Privilege Model in the Hosted Environment; Admin Privilege Model; Resources Protected by Oracle Ultra Search; Authorization and Access Enforcement; How Oracle Ultra Search Leverages Security Services; How Oracle Ultra Search Leverages the Identity Management Infrastructure; Oracle Ultra Search Extensibility and Security; Configuring a Security Framework for Oracle Ultra Search; Configuring Security Framework Options for Oracle Ultra Search; Configuring Oracle Identity Management Options for Oracle Ultra Search; Configuring Oracle Ultra Search Security. Understanding the Oracle Ultra Search Crawler and Data Sources: Overview of the Oracle Ultra Search Crawler; Crawler Settings; Crawler Data Sources
98. Search User's Guide. Installing the Oracle Ultra Search Middle Tier on Web Server Hosts.
    For server.xml, under the <application-server> tag, add the following:
        <application name="UltrasearchAdmin" path="$ORACLE_HOME/ultrasearch/webapp/ultrasearch_admin.ear" />
        <application name="UltrasearchQuery" path="$ORACLE_HOME/ultrasearch/ultrasearch_query.ear" />
        <application name="UltrasearchPortlet" path="$ORACLE_HOME/ultrasearch/webapp/ultrasearch_portlet.ear" />
    Note: These lines tell OC4J that it must deploy the Oracle Ultra Search EAR files and where those EAR files are located. ultrasearch_admin.ear contains the Oracle Ultra Search administration tool Web application. The sample.ear file contains the sample query JSP pages. After OC4J deploys sample.ear, you can see the $ORACLE_HOME/ultrasearch/sample directory. Use the JSPs in this directory to create your own query Web pages. For more information on this directory, see "Testing the Oracle Ultra Search Sample Query Applications" on page 3-25.
    For application.xml, under the <orion-application> tag, add the following twelve library entries:
        <library path="$ORACLE_HOME/ultrasearch/lib
99. Tier on Web Server Hosts. Oracle Application Server Infrastructure.
    The Oracle Ultra Search Oracle Application Server query API uses the data source functionality of the J2EE container. Under the directory ORACLE_HOME/j2ee/OC4J_Portal/config, edit the file data-sources.xml. Under the <data-sources> tag, add the following:
        <data-source class="oracle.jdbc.pool.OracleConnectionCacheImpl" name="UltraSearchDS" location="jdbc/UltraSearchPooledDS" username="username" password="password" url="jdbc:oracle:thin:@database_host:oracle_port:oracle_sid" />
    Where username and password are the Oracle Ultra Search instance owner's database user name and password, database_host is the host name of the back-end database computer, oracle_port is the port of the user's Oracle database, and oracle_sid is the SID of the user's Oracle database.
    In addition to the user name, password, and JDBC URL, data-sources.xml also allows configuration of the connection cache size and the cache scheme. The following tag specifies the minimum and maximum limits of the cache size, the inactivity time-out interval, and the cache scheme. If you are adding the data source for the default Oracle Ultra Search instance user, wk_test, then make sure to unlock wk_test first.
    See Also: "Configuring the Default Oracle Ultra Search Instance" on page 3-10.
        <data-source class="oracle.jdbc.pool.OracleConnectionCacheImpl" name="UltraSearchDS" location="jdbc/UltraSear
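The data source entry described above is mostly a matter of assembling a JDBC thin URL from the host, port, and SID and placing it in a `data-source` element. The sketch below generates such an element with Python's standard library, as a way to check the attribute layout before editing data-sources.xml by hand. The helper names `thin_url` and `data_source_element`, and the sample host, port, SID, and credentials, are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET


def thin_url(host: str, port: int, sid: str) -> str:
    """Build a JDBC thin URL in the host:port:SID form used by the excerpt."""
    return f"jdbc:oracle:thin:@{host}:{port}:{sid}"


def data_source_element(username: str, password: str, host: str, port: int, sid: str) -> ET.Element:
    """Return a <data-source> element carrying the attributes named in the text."""
    return ET.Element("data-source", {
        "class": "oracle.jdbc.pool.OracleConnectionCacheImpl",
        "name": "UltraSearchDS",
        "location": "jdbc/UltraSearchPooledDS",
        "username": username,
        "password": password,
        "url": thin_url(host, port, sid),
    })


if __name__ == "__main__":
    # Placeholder values; substitute the instance owner's real settings.
    elem = data_source_element("wk_test", "example_password", "dbhost", 1521, "orcl")
    print(ET.tostring(elem, encoding="unicode"))
```

Generating the element rather than typing it keeps the URL separators (`:` and `@`) consistent, which is where hand-edited entries most often go wrong.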
100. Ultra Search User's Guide. Understanding the Oracle Ultra Search Crawler and Data Sources. This chapter contains the following topics: Overview of the Oracle Ultra Search Crawler; Crawler Settings; Crawler Data Sources; Document Attributes; Crawling Process for the Schedule; Data Synchronization; Web Crawling Boundary Control; Oracle Ultra Search Remote Crawler; Oracle Ultra Search Crawler Status Codes. See Also: "Tuning Query Performance" on page 5-3.
    Overview of the Oracle Ultra Search Crawler: The Oracle Ultra Search crawler is a Java process activated by your Oracle server according to a set schedule. When activated, the crawler spawns processor threads that fetch documents from various data sources. These documents are cached in the local file system. When the cache is full, the crawler indexes the cached files using Oracle Text. This index is used for querying.
    Note: An empty index is created when an Oracle Ultra Search instance is created. You can alter the index using SQL. The existing preferences, such as language-specific parameters, are defined in the $ORACLE_HOME/ultrasearch/admin/wk0pref.sql file.
    Crawler Settings: Before you can use the crawler, you must set its operating parameters, such as the number of crawler threads, the crawler timeout threshold, the database connect stri
101. Ultra Search User's Guide. 8 Understanding the Oracle Ultra Search Administration Tool. The Oracle Ultra Search administration tool lets you manage Oracle Ultra Search instances. This chapter guides you through the screens of the Oracle Ultra Search administration tool. It contains the following topics: Oracle Ultra Search Administration Tool; Logging On to Oracle Ultra Search; Logging On and Managing Instances as SSO Users; Instances Page; Crawler Page; Web Access Page; Attributes Page; Sources Page; Schedules Page; Queries Page; Users Page; Globalization Page.
    Oracle Ultra Search Administration Tool: The Oracle Ultra Search administration tool is a J2EE-compliant Web application. You can use it to manage Oracle Ultra Search instances. To use the administration tool, log on as a database user, an Enterprise Manager super-user, a Portal user, or an SSO user through any browser.
    Note: The Oracle Ultra Search administration tool and the Oracle Ultra Search query applications are part of the Oracle Ultra Search middle tier. However, the Oracle Ultra Search administration tool is independent from the Oracle Ultra Search query application. Therefore, they can be hosted on different computers to enhance security or scalability.
    With the administration tool, you can do the fol
102. Ultra Search welcome page: http://hostname.domainname:port/ultrasearch/index.html. The sample query applications are shipped as a deployed J2EE Web application, sample.ear. This component depends on a J2EE container to host the Web pages, a JDBC driver, and the JavaMail API for displaying email results. After the sample.ear file is deployed by Oracle Containers for J2EE (OC4J), you see a set of JSP files that demonstrate the query API usage. The sample query applications include a sample search portlet. The sample Oracle Ultra Search portlet demonstrates how to write a search portlet for use in Oracle Application Server Portal. When the user issues a query in any of the query applications, a hit list containing query results is returned. The user can select a document to view from the hit list. A hit list can include HTML documents, files, database table content, archived emails, or Oracle Application Server items. The Oracle Ultra Search sample query applications also incorporate an email browser for reading and browsing emails. The Oracle Ultra Search administration tool and the Oracle Ultra Search sample query applications are part of the Oracle Ultra Search middle tier. However, the Oracle Ultra Search administration tool is independent from the Oracle Ultra Search sample query applications. Therefore they can be hosted
103. Note: DYNAMIC_SCHEME = 1, FIXED_WAIT_SCHEME = 2, and FIXED_RETURN_NULL_SCHEME = 3.
        <data-source class="oracle.jdbc.pool.OracleConnectionCacheImpl" name="UltraSearchDS" location="jdbc/UltraSearchPooledDS" username="user" password="pass" url="jdbc:oracle:thin:@hostname:1521:oracle_sid" min-connections="2" max-connections="30" inactivity-timeout="30">
          <property name="cacheScheme" value="1" />
        </data-source>
    Pin the query package in memory: Pin frequently used packages in the shared memory pool. When a package is pinned, it remains in memory no matter how full the pool gets or how frequently you access the package. You can pin packages using the supplied package DBMS_SHARED_POOL. The PL/SQL package used for Oracle Ultra Search query is WKSYS.WK_QRY.
    See Also: PL/SQL Packages and Types Reference.
    Using the Remote Crawler: Without the Oracle Ultra Search remote crawler, you must run the Oracle Ultra Search crawler on the same host as the Oracle Database. For large data sets, you can improve performance by running the Oracle Ultra Search crawler on one or more hosts separate from the Oracle Database. The Oracle Ultra Search crawler is a pure Java application, and it communicates with the Oracle Database through JDBC. Because the Oracle Ultra
104. Document Attributes and Properties; Crawler Agent Functionality; Data Source Type Registration; Data Source Registration; Data Source Attribute Registration; User Implemented Crawler Agent; Interaction Between the Crawler and the Crawler Agent; Crawler Agent APIs and Classes; Sample Agent Files; Setting up the Sample Crawler Agent; Compiling and Building the Agent Jar File; Creating a Data Source Type; Defining Data Source Parameters; Defining a Data Source of this Type
105. able records the new values of the id and the name into the k1 and k2 columns. An F is inserted into the mark column to signal the crawler that work needs to be done for this row. For example:
        CREATE TABLE employees (id NUMBER, name VARCHAR2(10));
        CREATE OR REPLACE TRIGGER wk$ins
        AFTER INSERT ON employees
        FOR EACH ROW
        BEGIN
          INSERT INTO WK$LOG (k1, k2, mark) VALUES (:new.id, :new.name, 'F');
        END;
    UPDATE Trigger Statement: Every time a row is updated in the employees base table, the UPDATE trigger inserts two rows into the log table. The first row in the log table records the old values of the id and the name into the k1 and k2 columns. An F is inserted into the mark column to signal the crawler that work needs to be done for this row. The second row in the log table records the new values of the id and the name into the k1 and k2 columns. For example:
        CREATE OR REPLACE TRIGGER wk$upd
        AFTER UPDATE ON employees
        FOR EACH ROW
        BEGIN
          INSERT INTO WK$LOG (k1, k2, mark) VALUES (:old.id, :old.name, 'F');
          INSERT INTO WK$LOG (k1, k2, mark) VALUES (:new.id, :new.name, 'F');
        END;
    DELETE Trigger: Every time a row is deleted from the employees base table, the DELETE trigger inserts a row into the log table. The row in the log table records the old values of the id and the name into the k1 and
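The logging behavior of the INSERT and UPDATE triggers can be tried out without an Oracle database. The sketch below reproduces the same pattern in SQLite through Python's built-in `sqlite3` module; it is an analogy under stated assumptions, not the Oracle syntax. The log table is named `wk_log` here because this example avoids the `$` in the original name, SQLite uses `NEW.`/`OLD.` without the leading colon, and the DELETE trigger is left out because its description is cut off in the excerpt.

```python
import sqlite3

SCHEMA = """
CREATE TABLE employees (id INTEGER, name TEXT);
CREATE TABLE wk_log (k1 INTEGER, k2 TEXT, mark TEXT);

-- One log row per inserted employee, flagged 'F' for the crawler.
CREATE TRIGGER wk_ins AFTER INSERT ON employees FOR EACH ROW
BEGIN
    INSERT INTO wk_log (k1, k2, mark) VALUES (NEW.id, NEW.name, 'F');
END;

-- Two log rows per update: the old key values first, then the new ones.
CREATE TRIGGER wk_upd AFTER UPDATE ON employees FOR EACH ROW
BEGIN
    INSERT INTO wk_log (k1, k2, mark) VALUES (OLD.id, OLD.name, 'F');
    INSERT INTO wk_log (k1, k2, mark) VALUES (NEW.id, NEW.name, 'F');
END;
"""


def demo():
    """Insert and then update one employee; return the resulting log rows in order."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO employees VALUES (1, 'Ann')")
    conn.execute("UPDATE employees SET name = 'Anna' WHERE id = 1")
    rows = conn.execute("SELECT k1, k2, mark FROM wk_log ORDER BY rowid").fetchall()
    conn.close()
    return rows
```

Calling `demo()` returns three log rows: one from the insert, then the old and new values from the update, each marked F so that a crawler polling the log knows the row still needs work.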
106. acle Ultra Search User's Guide. Index.
    A: access control lists; access URL; administration groups; authentication; single sign-on.
    B: backend reconfiguration; boundary control of Web crawling; boundary rule.
    C: cache files, remote crawler; caching documents; character set change, reconfiguration after; codes, URL status; crawler (classpath; crawler agents; crawling process; data sources; overview; parameters; read-only schedule configuration; remote crawler; settings; statistics; URL status codes); crawler agent (API; functionality; sample agent files; setting up; smart agent; standard agent); crawler agent API; crawler agents; crawling depth; CREATE_INSTANCE procedure; CREATE_SCHEDULE procedure; creating a schedule; creating a schedule interval string; creating an instance; CTXSYS user.
    D: data groups; data harvesting mode; data sources (email; file; synchronizing; table; synchronization; user-defined; Web); data-sources.xml file; DB_CACHE_SIZE parameter; DBMS_JOB package; default instance; display URL; document attributes; domain r
107. addressed to mailing lists that have been indexed by the Oracle Ultra Search system. The API can also be used to build your own custom query application. The application user interface logic is entirely controlled in the JSP. Therefore, you can customize the look and feel to your needs. Email documents contain valuable information, but they are not structured in a way that makes specific, relevant information easy to find. Oracle Ultra Search lets you retrieve and index emails on a server that supports the IMAP4 protocol. An email source is a data source that derives its content from emails sent to a specific email address. When the Oracle Ultra Search crawler searches an email source, the crawler collects all emails that have the specific email address in any of the To or Cc email header fields. Note: Oracle Ultra Search stores copies of all retrieved emails in the local file system of the Oracle Ultra Search server installation. A possible application of an email source is one where the email source represents all emails sent to a mailing list. In such a scenario, multiple email sources are defined, where each email source represents an email list. Oracle Ultra Search email crawling and rendering is built on top of the JavaMail API, using the Sun Microsystems reference implementation of JavaMail. This enables Oracle Ultra Search to provide a Java API for accessing index
108. ading Oracle Ultra Search to Oracle Collaboration Suite Release 1" on page 4-11 for details.
    2. Install the latest Oracle Collaboration Suite release and use Oracle Collaboration Suite Upgrade Assistant to upgrade both the Oracle Ultra Search middle tier and backend. See the Oracle Collaboration Suite installation guide that is appropriate for your platform.
    Upgrading Oracle Ultra Search to Oracle Collaboration Suite Release 1: Oracle Ultra Search supports the following upgrades:
    - Upgrade from Oracle Ultra Search 1.0.3 to 9.0.3
    - Upgrade from Oracle Ultra Search 9.0.2 to 9.0.3
    - Upgrade from Oracle Ultra Search 9.2 to 9.0.3
    Upgrade is based on the backend only. Upgrade on the middle tier is not supported. Install the 9.0.3 middle tier in a separate Oracle home.
    Upgrade from Oracle Ultra Search 1.0.3 to 9.0.3: Upgrading from Oracle Ultra Search 1.0.3 (Oracle9i Database 9.0.1) to 9.0.3 requires running the upgrade script and performing some manual steps. The Oracle Ultra Search upgrade script first verifies the version of the current system, then upgrades the system and migrates user data. User data includes all dictionary and table data, such as information about the metadata, data sources, mappings, crawler schedules, authentication, and query statistics. All crawler schedules and jobs created in the older version are disabled before data and system migration. When migrat
109. SET_INSTANCE. Schedule-Related APIs: CREATE_SCHEDULE; DROP_SCHEDULE; INTERVAL; SET_SCHEDULE; UPDATE_SCHEDULE. Crawler Configuration APIs: IS_ADMIN_READONLY; SET_ADMIN_READONLY; UPDATE_CRAWLER_CONFIG. Loading Metadata into Oracle Ultra Search: Launching the Loading Tool; Loading Documents and Relevance Scores; The Input XML Files; Example of the Document Relevance Boosting XML File; Loading Search Attribute LOVs and LOV Display Names; The LOV XML File
110. amed language, which is a java.util.Locale object. The display name for the language is provided by Java as a property of the object itself, through the getDisplayName method. The following example shows all the languages in the mybookstore instance using their English display names:
        <US:iterLanguages instance="mybookstore">
          <%= language.getDisplayName(Locale.ENGLISH) %>
        </US:iterLanguages>
    <iterLOV> Tag: Show All Values Defined for a Search Attribute
    Attribute Name: Description
    - instance="name": This is a mandatory attribute that refers to the object defined by the instance tag.
    - locale="locale": This determines the display name fetched using this tag.
    - attributeName="attname": The name of the attribute whose LOV is being fetched.
    - attributeType="string | number | date": The type of the attribute whose LOV is being fetched. This is needed because the attribute name does not uniquely identify an attribute in the instance.
    This tag is an iteration tag. It loops through all the values in a search attribute's LOV. In each loop, it defines a scripting variable named value, which is a java.lang.String, java.util.Date, or java.math.BigDecimal object, depending on the attribute type. It also defines a string variable named displayname, which is the localized display name of the val
111. ample:
        <%@ taglib uri="/WEB-INF/ultrasearch-taglib.tld" prefix="US" %>
    The Oracle Ultra Search tag library definition (TLD) file can be found in $ORACLE_HOME/ultrasearch/sample/query/WEB-INF/ultrasearch-taglib.tld after sample.ear has been deployed. It is also packaged with ultrasearch_query.jar under the name META-INF/taglib.tld.
    Query Tag Descriptions: The following section describes each Oracle Ultra Search tag, its attributes, and its action. Examples are shown without any static HTML, which can be inserted to format the output.
    <instance> Tag: Connecting to the Oracle Ultra Search Instance. This tag establishes a connection to an Oracle Ultra Search instance. Some basic parameters must be established for this tag to work, such as the JDBC connection string, schema user name, password, Oracle Ultra Search instance name, and so on.
    Attribute Name: Description
    - instanceId="name": Names the instance defined by this tag. This name is then used by other Oracle Ultra Search tags to specify the instance being searched.
    - username: Creates a database connection.
    - password: Creates a database connection.
    - url: Gets the URL used to create a JDBC connection. This attribute is optional if dataSourceName is specified.
    - dataSourceName, instanceName, tablePagePath, emailPagePath, filePagePath
112. an append the correct JSP file name at the end of the URL root. The sample query application is shipped as ORACLE_HOME/ultrasearch/ultrasearch_query.ear. The URL root for the query is http://hostname.domainname:port/ultrasearch/query/. The URL root for the Oracle Application Server query is http://hostname.domainname:port/ultrasearch/query/. The portlet is shipped as ORACLE_HOME/ultrasearch/webapp/ultrasearch_portlet.ear.
    Installing the Backend on Remote Crawler Hosts: The Oracle Ultra Search remote crawler allows multiple crawlers to run in parallel on different hosts. However, all remote crawler hosts must share common resources, such as common directories and a common Oracle Ultra Search database. The Oracle Ultra Search remote crawler is part of the Oracle Ultra Search backend. Therefore, the installation procedure is the same as installing the Oracle Ultra Search backend. On each remote crawler host, the Oracle Ultra Search backend is installed under a common directory known as ORACLE_HOME. You should have been prompted by the Oracle Universal Installer to enter this directory. The remote ORACLE_HOME directory is referred to as REMOTE_ORACLE_HOME. If you choose not to install the Oracle HTTP Server during the Oracle Appli
113. ance3 to an SSO user in subscriber B.
    Resources Protected by Oracle Ultra Search: All publicly crawled data is publicly accessible. The following resources are protected by Oracle Ultra Search:
    - Access control list (ACL) aware crawled data is protected; in other words, it is private to users named by the ACL.
    - All passwords are protected.
    - User-defined data source parameters are protected.
    Authorization and Access Enforcement: There are three possible entry points to Oracle Ultra Search:
    - The database. This contains all data. All data and metadata is protected with row-level security. All passwords are encrypted.
    - The Oracle Ultra Search administration tool. This does not contain crawled data. You must authenticate with SSO or database authentication.
    - The Oracle Ultra Search query tool. This contains crawled data. Unauthenticated users can see only public data. Authenticated users can see public data and ACL-protected information. Users must authenticate themselves to see private information.
    How Oracle Ultra Search Leverages Security Services: Oracle Ultra Search uses the following to leverage security services:
    - Oracle Ultra Search uses Secure Sockets Layer (SSL), the industry standard protocol for managing the security of message transmission on the Internet. This is used for securing RMI connections, HTTPS crawling, and secure JDBC.
    - JAZN: Oracle Application Server Containers for J2EE (OC4J) implements a Java authentication and aut
114. arch ACL. You can add more than one group and user to the ACL for the data source. The option to choose is available only if the instance is security-enabled. See Also: Oracle Database SQL Reference for more on format models.
    Editing Table Sources: On the main Table Sources page, click Edit to change the name of the table source. You can change, add, or delete table column and search attribute mappings; change the display URL template or column; and view values of the table source settings.
    Table Sources Comprised of More Than One Table: If a table source has more than one table, then a view joining the relevant tables must be created. Oracle Ultra Search then uses this view as the table source. For example, two tables with a master-detail relationship can be joined through a SELECT statement on the master table and a user-implemented PL/SQL function that concatenates the detail table rows.
    Limitations With Database Links: The following restrictions apply to base tables or views on a remote database that are accessed over a database link by the crawler:
    - If the text column of the base table or view is of type BLOB or CLOB, then the table must have a ROWID column. A table or view might not have a ROWID column for various reasons, including the following:
      - A view is comprised of a join of one or more tables.
      - A view is based on a single table using a GROUP BY clause.
    T
115. arch repository to find all Oracle Ultra Search instances within the installation that the administration user has privileges to administer. The administration user chooses the Oracle Ultra Search instance from the list. See Also: Oracle Internet Directory Administrator's Guide.
    Single Sign-On Authentication: The Oracle Ultra Search administration tool supports the following modes of logging on, depending on the type of user. You can log on as:
    - A single sign-on (SSO) user managed in the Oracle Internet Directory (OID) and authenticated with the SSO server
    - A local database schema user in the Oracle Ultra Search database (non-SSO mode)
    - A Portal user
    - An Enterprise Manager user
    Note: Single Sign-On (SSO) is available only with the Oracle Identity Management infrastructure. See Also: "Logging On to Oracle Ultra Search" on page 8-3.
    Query Syntax Expansion: Oracle Ultra Search translates each user query into a database query. This process is called query syntax expansion. The expansion logic determines the relevancy and recall of the search results. The Oracle Ultra Search default expansion boosts the relevancy of documents that match the user's query in their title. The query syntax expansion can be customized with the query API. See Also: "Customizing the Query Syntax Expansion" on page 9-3.
    Oracle Ultra Search System Configuration
116. as defined in the wk0pref.sql file. After the instance is created, the lexer can no longer be changed.
    - Stoplist: Specify the name of a stoplist you want to use during indexing. The default stoplist is wksys.wk_stoplist, as defined in the wk0pref.sql file. Try to avoid modifying the stoplist after the instance has been created.
    - Storage: Specify the name of the storage preference for the index of your instance. The default storage preference is wksys.wk_storage, as defined in the wk0pref.sql file. After the instance is created, the storage preference cannot be changed.
    See Also: Oracle Text Reference for more information on creating and modifying lexers, stoplists, and storage; "Managing Stoplists" on page 4-7.
    Creating a Snapshot Instance: A snapshot instance is a copy of another instance. Unlike a regular instance, a snapshot instance is read-only; it does not synchronize its index to the search domain. After the master instance re-synchronizes to the search domain, the snapshot instance becomes out of date. At that point, you should delete the snapshot and create a new one.
    Note: The snapshot and its master instance cannot reside on the same database.
    A snapshot instance is useful for the following purposes:
    - Query Processing: Two Oracle Ultra Search instances can answer queries about the same search domain. Therefore in a set amount of time
117. ase SID
    - WK_TABLESPACE: tablespace for Oracle Ultra Search
    - WK_TEMPTABLESPACE: temporary tablespace
    - CONN_STRING: database connect string
    - ORACLE_HOME: the path of Oracle home
    - JAVA_EXE_PATH: Java executable file path
    - PATH_SEPARATOR: Java classpath separator (use ":" for UNIX or ";" for Windows)
    The sixth step requires the system administrator to re-activate all crawling schedules through the Oracle Ultra Search administration tool.
    Oracle Ultra Search Extract-Transform-Load Migration: Extract-transform-load (ETL) migration extracts the useful subset of configuration data from the source installation, transforms necessary data, and loads or merges this data into a new installation of Oracle Ultra Search. This approach might require more disk space, but it offers the following benefits:
    - No destabilization of the source installation
    - Stability of the target installation
    - No installer integration requirement
    With the ETL approach, data migration involves the following five steps:
    1. Install the new system (for example, 9.0.3) in a new ORACLE_HOME.
    2. Re-create user instance schemas and related database objects.
    3. Re-create user instances.
    4. Restore data.
    5. Rebuild the index.
    The first two steps in the ETL approach must be done manually:
    - Install Oracle Ultra Search 9.0.3 in a separate ORAC
118. ation for Oracle Database Real Application Clusters.
    Configuring Storage Access: The disk of any node in a Real Application Clusters system can be shared (cluster file system) or not shared (raw disk). For Real Application Clusters on a cluster file system (CFS), the cache files generated by the crawler on any node are visible to any Oracle instance and can be indexed by any Oracle instance that performs index synchronization. If the disk is not shared, then the crawler must run on one particular Oracle instance to ensure that all cache files can be indexed. This is due to the nature of Oracle Text indexing, where rows inserted into one table by different sessions go to the same pending queue, and whoever initiates index synchronization attempts to index all of the inserted rows. Because of this limitation, on a CFS Oracle Ultra Search is configured to launch the crawler on any database instance. If it is not on a CFS, then Oracle Ultra Search launches the crawler on the database instance where INSTANCE_NUMBER = 1. The Oracle Ultra Search administrator can configure which instance runs the crawler with the following PL/SQL API:
        WK_ADM.SET_LAUNCH_INSTANCE(instance_name, connect_url);
    where instance_name is the name of the launching instance (or the database name if it is to be launched on any node) and connect_url is the connect descriptor. For c
119. ation schedules in every Oracle Ultra Search instance. 3. Launch the Oracle Collaboration Suite release 1 installer and perform the infrastructure install. 4. Specify the directory of the Oracle Application Server 9.0.2 infrastructure as the Oracle home. 5. The Oracle Universal Installer then detects a previously installed database and automatically upgrades the infrastructure database and the Oracle Ultra Search backend. Upgrade from Oracle Ultra Search 9.2 to 9.0.3: Because Oracle Ultra Search 9.2 uses the same database schema as Oracle Ultra Search 9.0.2, the upgrade procedure is the same. See Also: "Upgrade from Oracle Ultra Search 9.0.2 to 9.0.3" on page 4-14. Configuring the Query Application: The Oracle Ultra Search query application is deployed automatically with the Oracle Ultra Search installation. However, because Oracle Ultra Search allows multiple instances using different schema users, the query application is not automatically configured to connect to the database. The database connection is configured by creating a data source in OC4J (not to be confused with an Oracle Ultra Search data source). This is done by editing the data-sources.xml file. Step 1: Edit the data-sources.xml File. The data-sources.xml file is the OC4J connection management facility. The Oracle Ultra Search query application uses OC4J to connect to the database. This is different from the administration tool, because the query user is not a database u
120. base Concepts; Oracle Database Administrator's Guide; Oracle Database Performance Tuning Guide; Oracle Enterprise Manager Concepts. Many books in the documentation set use the sample schemas of the seed database, which is installed by default when you install Oracle Database. Refer to Oracle Database Sample Schemas for information on how these schemas were created and how you can use them yourself. Printed documentation is available for sale in the Oracle Store at http://oraclestore.oracle.com/. To download free release notes, installation documentation, white papers, or other collateral, please visit the Oracle Technology Network (OTN). You must register online before using OTN; registration is free and can be done at http://otn.oracle.com/membership/. If you already have a user name and password for OTN, then you can go directly to the documentation section of the OTN Web site at http://otn.oracle.com/docs/index.htm. To access the database documentation search engine directly, please visit http://tahiti.oracle.com/. Conventions: This section describes the conventions used in the text and code examples of this documentation set. It describes: Conventions in Text; Conventions in Code Examples; Conventions for Windows Operating Systems. Conventions in Text: We use various conventions in text to help you more quickly identify special terms. The following table describes those conventions and provides examples of t
121. boundary rules checking ensures that submitted URLs comply with the URL boundary rules of the Web data source. You can allow or disallow URL boundary rules checking. Relevancy Boosting: Relevancy boosting lets administrators override the search results and influence the order in which documents are ranked in the query result list. This can be used to promote important documents to higher scores, which also makes them easier to find. See Also: "Document Relevancy Boosting" on page 1-11. There are two methods for locating URLs for relevancy boosting: locate by search or manual URL entry. Locate by Search: To boost a URL, first locate it by performing a search. You can specify a host name to narrow the search. After you have located the URL, click Information to edit the query string and score for the document. Manual URL Entry: If a document has not been crawled or indexed, then it cannot be found in a search. However, you can provide a URL and enter the relevancy boosting information with it. To do so, click Create and enter the following: 1. Specify the document URL. You must assign the URL to a data source. This document is indexed the next time it is crawled. 2. Enter scores in the range of 1 to 100 for one or more query strings. When a user performs a search using the exact query string, the score applies for this URL. The document is searchable after the document is loaded for
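The relevancy boosting rule described above (a score from 1 to 100 attached to a URL for an exact query string) can be sketched as a simple lookup. This is only an illustration: the class `RelevancyBoostSketch` and its methods are hypothetical and are not part of the Oracle Ultra Search API, and the sketch assumes that "exact query string" means a case-sensitive string match, which the text does not state.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of relevancy boosting; not the Oracle Ultra Search API. */
final class RelevancyBoostSketch {
    // URL -> (exact query string -> boosted score)
    private final Map<String, Map<String, Integer>> boosts = new HashMap<>();

    /** Registers a boost; scores outside the documented 1..100 range are rejected. */
    void boost(String url, String queryString, int score) {
        if (score < 1 || score > 100) {
            throw new IllegalArgumentException("score must be in the range 1 to 100");
        }
        boosts.computeIfAbsent(url, k -> new HashMap<>()).put(queryString, score);
    }

    /**
     * Returns the boosted score when the user's query matches a registered
     * query string exactly (assumed case-sensitive); otherwise returns the
     * score computed by the engine unchanged.
     */
    int effectiveScore(String url, String userQuery, int engineScore) {
        Map<String, Integer> perUrl = boosts.get(url);
        if (perUrl == null) {
            return engineScore;
        }
        Integer boosted = perUrl.get(userQuery);
        return boosted != null ? boosted : engineScore;
    }
}
```

With a boost of 100 registered for the query "benefits", a search for "benefits" would use the boosted score, while a search for "employee benefits" would keep the engine's own score.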
122. bute whose LOV is being fetched in this number date LOV. This is needed because attribute name does not uniquely identify an attribute in the instance. Each occurrence of the <fetchAttribute> tag adds to the list of attributes passed to getResult, which is invoked by the <getResult> tag. The following example shows the same search in the <getResult> tag, but fetching the title and publication date attributes of each book: <US:getResult resultId="searchresult" instance="mybookstore" query=... queryLocale=... documentLanguage=... from="1" to="20"> <US:fetchAttribute attributeName="title" attributeType="string"/> <US:fetchAttribute attributeName="publication date" attributeType="date"/> </US:getResult>. <showHitCount> Tag: Show Estimated Hit Count. After the search is performed, the result must be rendered. If withCount="true" is in the <US:getResult> tag, then the result contains a count of total hits, and the <showHitCount> tag can be used to display it. Attribute Name / Description: result name: This refers to the resultId specified in the <US:getResult> tag. This tag outputs the hit count to the page. The following shows the hit count of a search result: <US:showHitCount result="searchresult"/>. <iterResult> Tag: Render the Results. This tag i
123. c, but you can specify more than one path rule for each host. For example, on the same host, you can include path files://host/doc and exclude path files://host/doc/unwanted. With these mechanisms, only URL links that meet the filtering criteria are processed. However, there are other criteria that users might want to use to filter URL links. For example: allow URLs with certain file name extensions; allow URLs only from a particular port number; disallow any PDF file if it is from a particular directory. The set of possible criteria could be very large, which is why filtering is delegated to a user-implemented module that the crawler can use when evaluating an extracted URL link. URL Link Rewriting: For some applications, for security reasons, the URL crawled is different from the one seen by the end user. For example, crawling is done on an internal Web site behind a firewall without security checking, but when queried by an end user, a corresponding mirror URL outside the firewall must be used. A display URL is a URL string used for search hit display. This is the URL used when users click the search hit link. An access URL is a URL string used by the crawler for crawling and indexing. An access URL is optional. If it does not exist, then the crawler uses the display URL for crawling and indexing. If it does exist, then it is used by the crawler instead of the display URL for crawling.
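The three example filtering criteria above (file name extension, port number, and PDF files under a particular directory) can be sketched as a single predicate. This is a minimal sketch and not the Oracle Ultra Search URL rewriter or filter interface: the class `UrlFilterSketch`, the method `accept`, the allowed extensions, the port 80, and the `/drafts/` directory are all assumptions chosen for illustration.

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical link filter; the rules and names are illustrative only. */
final class UrlFilterSketch {
    private static final int ALLOWED_PORT = 80;            // assumed rule
    private static final String BLOCKED_PDF_DIR = "/drafts/"; // assumed rule

    /** Returns true if the extracted link passes all three example criteria. */
    static boolean accept(String link) {
        URI uri;
        try {
            uri = URI.create(link);
        } catch (IllegalArgumentException e) {
            return false; // malformed links are rejected
        }
        String path = uri.getPath();
        if (uri.getHost() == null || path == null) {
            return false;
        }
        String lower = path.toLowerCase(Locale.ROOT);

        // Criterion 1: allow only certain file name extensions.
        boolean extensionOk = lower.endsWith(".html")
                || lower.endsWith(".htm")
                || lower.endsWith(".pdf");

        // Criterion 2: allow only a particular port (no explicit port is treated as 80).
        int port = uri.getPort() == -1 ? 80 : uri.getPort();
        boolean portOk = port == ALLOWED_PORT;

        // Criterion 3: disallow PDF files from a particular directory.
        boolean blockedPdf = lower.endsWith(".pdf") && lower.startsWith(BLOCKED_PDF_DIR);

        return extensionOk && portOk && !blockedPdf;
    }
}
```

Keeping each criterion as a separate boolean makes it easy to add or drop rules, which is the reason the text gives for delegating this logic to a user-implemented module.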
124. c4jMount /ultrasearch/admin_sso OC4J_Portal; Oc4jMount /ultrasearch/admin_sso/* OC4J_Portal; Oc4jMount /ultrasearch/admin OC4J_Portal; Oc4jMount /ultrasearch/admin/* OC4J_Portal. 3. Oracle Ultra Search sample pages require JDBC connections to the database as the instance owner. Due to JServ limitations in the Oracle9i release, the user name, password, and connection string used to create the JDBC connection are hard-coded inside the sample JSP code. To configure the JSP to query a specific instance, edit the JSP source code and replace the user name, password, and connection string values. All sample JSP source code is in the OC4J applications directory. The following files contain user name, password, and connection string values: 9i/gsearch.jsp, 9i/display.jsp, 9i/gsearchf.jsp, 9i/gutil.jsp, 9i/mail.jsp. Note: The Oracle9i JSP files are being deprecated. It is not necessary to configure them if you do not plan to use them. Configuring the Administration Tool with Single Sign-On Server. Note: Single sign-on is available only with the Oracle Identity Management infrastructure. To configure the Oracle Ultra Search administration tool with the Oracle Single Sign-On (SSO) server, you must follow these steps in addition to the configuration in "Configuring the Middle Tier with Oracle HTTP Serv
125. cation Server installation, then you must perform the following steps manually for remote crawling: Locate $REMOTE_ORACLE_HOME/ultrasearch/tools/remotecrawler/scripts/unix/define_env on a UNIX system, or %REMOTE_ORACLE_HOME%\ultrasearch\tools\remotecrawler\scripts\winnt\define_env.bat on a Windows system. Replace ORACLE_HOME with the value of the REMOTE_ORACLE_HOME environment variable. Replace s_jreLocation with the directory path of a Java runtime environment (JRE) version 1.2.2 or higher. You should specify the root directory of the JRE. Replace s_jreJDBCclassfile with the full path and file name of the Oracle JDBC Thin driver version 12. Configuring the Remote Crawler: The remote crawler requires a communication channel between the backend database and the remote crawler host. There are two modes of communication: RMI and JDBC. Configuration of the remote crawler differs depending on which mechanism you use. The primary difference is that the JDBC-based mechanism requires you to supply a database user or role during the registration process. See Also: "Using the Remote Crawler" on page 5-6 for more information on the RMI and JDBC mechanisms. The registration process is done by running a SQL script on the Oracle Ultra Search remote crawler host. The SQL script connects over
126. ccount and set the password to be WK_TEST. The password expires after the installation. If the password is changed to anything other than WK_TEST, then you must also update the cached schema password using the administration tool Edit Instance page after you change the password in the database. WKSYS: This is a database super-user. WKSYS can grant super-user privileges to other users, such as WK_TEST. All Oracle Ultra Search database objects are installed in the WKSYS schema. Note: The WKUSER role is required to host instances. Oracle Ultra Search Admin Privilege Model in the Hosted Environment: In a hosted environment, one enterprise (for example, an application service provider) makes Oracle Ultra Search available to other enterprises and stores information for them. The enterprise performing the hosting is called the default subscriber, and the enterprises that are hosted are called subscribers. Note: This is available with the Oracle Application Server release and the Oracle Collaboration Suite release. This is not available with the Oracle Database release. The default subscriber and its search base are specified in the following attributes of the Oracle Internet Directory entry cn=Common,cn=Products,cn=OracleContext: orclDefaultSubscriber; orclSubscriberSearchBase. In a non-hosted environment, in whic
127. ce. The option to choose is only available if the instance is security-enabled. Oracle Sources: You can create, edit, or delete Oracle sources. You can choose federated or Oracle Application Server Portal (crawlable) data sources. A federated source is a repository that maintains its own index. Oracle Ultra Search can issue a query, and the repository can return query results. Oracle Ultra Search also supports the crawling and indexing of Oracle Application Server Portal installations. This enables searching across multiple portal installations. Oracle Portal Sources: Oracle Ultra Search can only crawl public Oracle AS Portal sources. See the Oracle Application Server Portal Configuration Guide for how to set up public pages. To create Portal sources, you must first register your portal with Oracle Ultra Search. To register your portal: 1. Provide a name and portal URL base. The portal name is used to identify this portal entry in the Oracle Portal List page. The URL base is the beginning portion of the portal homepage. This includes host name, port number, and DAD. After it is created, the portal URL base is not updatable. Click Register Portal. Oracle Ultra Search attempts to contact the Oracle Application Server Portal instance and retrieve information about it. 2. Choose one or more page groups for indexing. A portal data source is created for each page g
128. chPooledDS" username="wk_test" password="wk_test" url="jdbc:oracle:thin:@localhost:1521:isearch" min-connections="3" max-connections="30" inactivity-timeout="30"> <property name="cacheScheme" value="1"/> </data-source>. Note: The URL of the JDBC data source can be provided in the form jdbc:oracle:thin:@hostname:port:sid or in the form of a TNS keyword-value syntax, such as jdbc:oracle:thin:@(DESCRIPTION=(LOAD_BALANCE=yes)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=cls02a)(PORT=3999))(ADDRESS=(PROTOCOL=TCP)(HOST=cls02b)(PORT=3999)))(CONNECT_DATA=(SERVICE_NAME=acme.us.com))). There are three types of caching schemes: DYNAMIC_SCHEME = 1; FIXED_WAIT_SCHEME = 2; FIXED_RETURN_NULL_SCHEME = 3. See Also: Oracle Application Server Containers for J2EE Security Guide. Oracle Database: The data-sources.xml file is located in the $ORACLE_HOME/oc4j/j2ee/OC4J_SEARCH/config directory. Editing the ultrasearch.properties File: The $ORACLE_HOME/ultrasearch/webapp/config/ultrasearch.properties file contains configuration information used by the Oracle Ultra Search middle tier. You do not need to edit this file, because it is automatically configured by the Oracle installer. H
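The two JDBC URL forms mentioned in the note above (host:port:sid and a TNS keyword-value descriptor) can be assembled as plain strings. The following is a minimal sketch; the class `JdbcUrlSketch` and its method names are hypothetical helpers written for this illustration, and the descriptor layout simply follows the example values given in the text (hosts cls02a and cls02b, port 3999, service acme.us.com).

```java
/** Hypothetical helper that builds Oracle thin-driver JDBC URL strings. */
final class JdbcUrlSketch {

    /** Builds the simple form: jdbc:oracle:thin:@hostname:port:sid */
    static String thinUrl(String host, int port, String sid) {
        return "jdbc:oracle:thin:@" + host + ":" + port + ":" + sid;
    }

    /**
     * Builds the TNS keyword-value form with one ADDRESS entry per host,
     * all on the same port, and load balancing turned on.
     */
    static String tnsUrl(String serviceName, int port, String... hosts) {
        StringBuilder sb = new StringBuilder(
                "jdbc:oracle:thin:@(DESCRIPTION=(LOAD_BALANCE=yes)(ADDRESS_LIST=");
        for (String host : hosts) {
            sb.append("(ADDRESS=(PROTOCOL=TCP)(HOST=").append(host)
              .append(")(PORT=").append(port).append("))");
        }
        // Close ADDRESS_LIST, then add CONNECT_DATA and close DESCRIPTION.
        sb.append(")(CONNECT_DATA=(SERVICE_NAME=").append(serviceName).append(")))");
        return sb.toString();
    }
}
```

The resulting string would go in the url attribute of the data source entry. Building the descriptor in code mainly helps keep the parentheses balanced when several cluster nodes are listed.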
129. cle Ultra Search is installed as part of the Oracle Database Server installation, which uses Oracle Universal Installer. Using the Oracle Universal Installer: Insert the Oracle Installation CD and start the Oracle Universal Installer. Follow the installation wizard instructions to perform an Oracle Database Server install. For more information, please refer to Chapter 3, "Installing and Configuring Oracle Ultra Search". Along with the Oracle Database Server, the Oracle Universal Installer also installs the Oracle Ultra Search backend and the Oracle Ultra Search middle tier. Note: If you have not already done so, unlock the Oracle Ultra Search schema user. 1. Log in to the database as a DBA user (for example, as sys). 2. Unlock the Oracle Ultra Search schema wksys and set its password: ALTER USER wksys ACCOUNT UNLOCK IDENTIFIED BY desired_password; 3. Unlock the Oracle Ultra Search wk_test schema. Its password is wk_test: ALTER USER wk_test ACCOUNT UNLOCK IDENTIFIED BY wk_test; Accessing the Ultra Search Administration Application: You must have an Oracle Ultra Search instance. During installation, an instance was already created for you, so you need only configure it by following the directions in "Configuring the Default Oracle Ultra Search Instance" on page 3-10. Upon installation, the default Oracle Ultra Search instance wk_inst is available; it is built on the default w
130. cle database and retrieves the contents of a table for the crawler to collect and index. The sample agents are fully functional and can be customized to adapt to other database-based data sources. These agents perform the following tasks: read data source parameters; connect to the database that contains the data source; initialize fetching of document URLs and attributes from the data source; fetch document URLs and attributes from the data source; disconnect from the data source. Crawler Agent Overview: A crawler agent does the following: authenticates the crawler for accessing the data source; provides access to the data source document through an HTTP URL (display URL); provides the metadata of the document in the form of document attributes; maps each document attribute to a common attribute name used by end users; provides a flattened view of the data source, such that documents are retrieved one by one in a streaming fashion; instructs the crawler to parse the URL document for standard metadata, like author and title, if necessary; optionally provides the list of URLs that have changed since a given time stamp; optionally provides an access URL, in addition to the display URL, for the processing of the document. From the crawler's perspective, the agent retrieves the list of URLs from the target data sou
131. connect string must be to the current node. dbc_all: same as JDBC_NODE, but in case of RAC with CFS true, this JDBC string should include all the RAC nodes (hint: use TNS syntax). See Also: "Changing Oracle Ultra Search Schema Passwords" on page 4-2 for information on changing the wksys password. See your installation guide for information on setting environment variables. Post-Installation Tasks for the Oracle Ultra Search Backend: This section covers Oracle Ultra Search backend post-installation tasks. Enabling Oracle Ultra Search to Process Binary Files: The Oracle Ultra Search crawler uses the Oracle Text INSO filter (ctxhx) for processing binary files. These are non-text, non-HTML files, like PDF files, Microsoft Word files, and so on. For Oracle Ultra Search to be able to use the INSO filter, the shared library path environment variable must contain the $ORACLE_HOME/ctx/lib path. At installation, the Oracle Installer automatically sets the variable to include $ORACLE_HOME/ctx/lib. However, if you restart the database after the installation, then you must manually set your shared library path environment variable to include $ORACLE_HOME/ctx/lib before starting the Oracle process. You must restart the database to pick up the new value for filtering to work. For example, on UNIX, set the LD_LIBRARY_P
132. crawling with one of the following methods: 1. Add stopwords to the instance stoplist. Choosing to add stopwords to the instance stoplist does not affect any documents already crawled or indexed. This is not an expensive operation. For example, to add the stopword "web" to the instance stoplist, log on as the owner of the instance in SQL*Plus and run the following statement: ALTER INDEX wk$doc_path_idx REBUILD PARAMETERS ('add stopword web'); 2. Replace the instance stoplist after initial crawling. Defining a new stoplist and replacing the instance stoplist with it invalidates the entire index. If you choose this method, you must force the Oracle Ultra Search crawler to recrawl all documents in the index. To do this, click Process All Documents in the Edit Schedule page. This is a very expensive operation. Therefore, this option should be the last resort. Upgrading Oracle Ultra Search: Oracle Ultra Search is shipped with the Oracle Database, the Oracle Application Server, and the Oracle Collaboration Suite. To upgrade Oracle Ultra Search from a previous release to the most recent release, you must apply different procedures based on the product you are using. This section contains the following topics: Pre-Upgrade Steps; Upgrading Oracle Ultra Search Shipped with Oracle Database; Upgrading Oracle Ultra Search Shipped with Oracle Application Server; Upgrading Oracle Ultra Search Shipped with Oracle Collaboration Suite
133. ctly or extend if necessary. The class is DocAttributes, with a constructor that has no argument. The agent might decide to create a pool of UrlData objects and cycle through them during crawling. In the simplest implementation, the agent creates one DocAttributes object, repeatedly resets and populates the data, and returns this object. LovInfo: The crawler agent uses this interface to submit attribute LOV definitions. DataSourceParams: The crawler agent uses this interface to read and write data source parameters. AgentException: The crawler agent uses this exception class when an error occurs. CrawlerAgent: This interface lets the crawler communicate with the user-defined data source. The crawler agent must implement this interface. Sample Agent Files: The sample agent files are located in the $ORACLE_HOME/ultrasearch/extension directory. You can view the sample agent source code using your preferred text editor. There is a SampleAgent_readme.htm file and a SampleAgent.java file. These are for the sample crawler agent implementation using agent APIs. Setting up the Sample Crawler Agent: This section describes how to set up the sample crawler agent. Compiling and Building the Agent Jar File: The Java source code for the sample agent must first be compiled into class files and put into a jar file in the $ORACLE_HOME/ultrasearch/lib/agent directory, where $ORACLE_HOME is the Oracle home directory where the Oracle Ultra Sea
134. d. Note: The crawler cannot use a proxy server that requires proxy authentication. You can also set domain exceptions. Use this page to enter authentication information global to all data sources. Note: Data source-specific authentication takes precedence over this global authentication. HTTP Authentication: Specify the user name and password for the host and realm for which HTTP authentication is required. Oracle Ultra Search supports both basic and digest authentication. HTML Forms: Register HTML forms that you want the Oracle Ultra Search crawler to fill out automatically during Web crawling. HTML form support requires that HTTP cookie functionality is enabled. You can register HTML forms manually or with the form registration wizard. If the HTML form contains JavaScript, then the wizard might fail, and you will need to use manual registration. Note: The Oracle Ultra Search crawler chooses the form to use based on the form's URL and the form name. URL parameters are not included during matching; thus, they are truncated during form registration. Attributes Page: When your indexed documents contain metadata, such as author and date information, you can let users refine their searches based on this information. For example, users can search for all documents where the author attribute has a certain value. The list of values
135. d. An Oracle Ultra Search instance must know the password of the database user in which it resides. The instance cannot get this information directly from the database. During instance creation, Oracle provides the database user password, and the instance caches this information. If this database user password changes, then the password that the instance has cached must be updated. To do this, enter the new password and click Apply. After the new password is verified against the database, it replaces the cached password. Crawler Page: The Oracle Ultra Search crawler is a Java application that spawns threads to crawl defined data sources, such as Web sites, database tables, or email archives. Crawling occurs at regularly scheduled intervals, as defined in the Schedules Page. With this page, you can do the following: Configure the Settings. Crawler Threads: Specify the number of crawler threads to be spawned at run time. Number of Processors: Specify the number of central processing units (CPUs) that exist on the server where the Oracle Ultra Search crawler will run. This setting determines the optimal number of document conversion threads used by the system. A document conversion thread converts multiformat documents into HTML documents for proper indexing. Automatic Language Detection: Not all documents retrieved by the Oracle Ultra Search crawler specify
136. dards/naming_convention.html?nsdnv=14z1 http://itweb.oraclecorp.com/aboutit/network/npe/standards/naming_convention.html?nsdnv=14 The question mark in the URL indicates that the rest of the string consists of input parameters. The duplicate hits are essentially the same page with different side menu expansion. Ideally, the same query should yield only one hit: http://itweb.oraclecorp.com/aboutit/network/npe/standards/naming_convention.html. Dynamic page index control applies to the whole data source. So if a Web site has both kinds of dynamic pages, you need to define them separately as two data sources in order to control the indexing of those dynamic pages. See Also: "Oracle Ultra Search URL Rewriter API" on page 9-29; "Using Crawler Agents" on page 7-3; "Crawler Page" on page 8-12 for information on default languages. Table Sources: A table source represents content in a database table or view. The database table or view can reside in the Oracle Ultra Search database instance or in a remote database. Oracle Ultra Search accesses remote databases using database links. See Also: "Limitations With Database Links" on page 8-26. Creating Table Sources: To create a table source, click Create Table Source and follow these steps: 1. Specify a table source name and the name of the database link, schema, and table. Click Locate Table. Specify settings for your tabl
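The duplicate-hit example for dynamic pages in the item above (two URLs that differ only in the parameters after the question mark) can be illustrated by stripping the query string so that both variants map to one key. This is a sketch of the general idea only, not how the Oracle Ultra Search crawler handles dynamic pages; the class `DynamicUrlSketch` and the method `withoutQuery` are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper that maps dynamic-page URL variants to one key. */
final class DynamicUrlSketch {

    /** Returns the URL with its query string and fragment removed. */
    static String withoutQuery(String url) {
        URI parsed = URI.create(url);
        try {
            // Rebuild the URI from scheme, authority, and path only.
            return new URI(parsed.getScheme(), parsed.getAuthority(),
                           parsed.getPath(), null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot normalize: " + url, e);
        }
    }
}
```

Applied to the two example URLs, both results would be the single naming_convention.html address that the text describes as the ideal hit. This only suits pages where the parameters do not select different content.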
137. database management system, SQL, SQL*Plus, and PL/SQL. Organization: This document contains: What's New in Oracle Ultra Search? This section describes new features and provides pointers to additional information. Chapter 1, "Introduction to Oracle Ultra Search": This chapter provides an overview of Oracle Ultra Search and describes the system configuration. Chapter 2, "Getting Started with Oracle Ultra Search": This chapter provides an example scenario that shows installation and use of Oracle Ultra Search. Chapter 3, "Installing and Configuring Oracle Ultra Search": This chapter describes how to install and configure Oracle Ultra Search. Chapter 4, "Post-Installation Information": This chapter provides post-installation information, such as how to configure the Oracle Database server for Oracle Ultra Search and how to manage stoplists. It also describes how to upgrade to the most recent Oracle Ultra Search release. Chapter 5, "Tuning and Performance": This chapter describes various ways to tune Oracle Ultra Search and improve performance. These include tuning the Web crawling process, tuning query performance, using the remote crawler, using Oracle Ultra Search on Real Application Clusters, and table data source synchronization. Chapter 6, "Security in Oracle Ultra Search": This chapter describes the architecture and configuration of security for Oracle Ultra Search. Chapter 7, "Understanding the Oracle Ultra S
138. database user to connect; password: database user password; rw: wait time in seconds between attempts to re-establish JDBC connections; ra: maximum number of attempts to re-establish JDBC connections; kw: wait time in milliseconds between keep-alive signals. You must edit the contents of the runall_jdbc script and specify the values for each parameter before running it. 6. Launch the remote crawler from the administration tool and verify that it is running. The state of the schedule is listed in the Schedules page. The remote crawler launching process takes up to 90 seconds to change state from LAUNCHING to FAILED if failure occurs. To view the schedule status, click the crawler status in the schedules list. To view more details, especially in the event of failure, click the schedule status itself. This brings up a detailed schedule status. The RMI-based remote crawler fails to launch in any of the following cases: the RMI registry is not running and listening on the port specified at installation; the RMI daemon is not running and listening on the port specified at installation; the necessary Java objects have not been successfully registered with each RMI registry. The JDBC-based remote crawler fails to launch in any of the following cases: The JDBC launch
139. Schedules Page: Use this page to schedule data synchronization and index optimization. Data synchronization means keeping the Oracle Ultra Search index up to date with all data sources. Index optimization means keeping the updated index optimized for best query performance. See Also: "Synchronizing Data Sources" on page 7-3. Data Synchronization: The tables on this page display information about synchronization schedules. A synchronization schedule has one or more data sources assigned to it. The synchronization schedule frequency specifies when the assigned data sources should be synchronized. Schedules are sorted first by name. Within a synchronization schedule, individual data sources are listed and can be sorted by source name or source type. Creating Synchronization Schedules: To create a new schedule, click Create New Schedule and follow these steps: 1. Name the schedule. 2. Pick a schedule frequency and determine whether the schedule should automatically accept all URLs for indexing or examine URLs before indexing. For initial planning purposes, you might want the crawler to collect URLs without indexing. After crawling is done, you can examine document URLs and status, remove unwanted documents, and start indexing. You can also associate the schedule with a remote crawler profile. You can set the frequency to Manual Launch. In this case the interval remains
140. dresses heading, enter the complete URL of the location of the Ultra Appliance intranet Web site demo: http://otn.oracle.com/products/ultrasearch/gettingstarted. Click the Add button to add the Ultra Appliance to the list of Web addresses. Click Next. Create Web Source: Step 2, URL Boundary Rules. Accept the default values and click Next. Create Web Source: Step 3, Document Types. Specify the types of document you would like the Oracle Ultra Search crawler to crawl. Under the Document Types header, select HTML, Microsoft Word Document, and PDF Document. After you make each selection, click the >> button to add the document type to the list of document types for crawling. Click Next. d. Create Web Source: Step 4. Accept the default values and click Finish. The Ultra Appliance Web site demo should be added to the Web Source List. 6. Schedule the Ultra Search Crawler: Select the Schedule tab and click the Create new schedule button. a. Create Schedule: Step 1 of 3 screen. In the Name field, enter Ultra Appliance. Click the Proceed to step 2 button. b. Create Schedule: Step 2 of 3 screen. Select Every 1 week(s) on Monday starting at 01:00 hours. Under the Indexing option heading, select Automatically accept all URLs for indexing. Under the Remote Crawler Profiles, select database host from the drop-down list
141. ds or tens of thousands of links referring to it. This is an example of how URL looping can occur. Monitor the crawler statistics in the Oracle Ultra Search administration tool to determine which URLs and Web servers are being crawled the most. If you observe an inordinately large number of URL accesses to a particular site or URL, then you might want to do one of the following: Exclude the Web Server: This prevents the crawler from crawling any URLs at that host. You cannot limit the exclusion to a specific port on a host. Reduce the Crawling Depth: This limits the number of levels of referred links the crawler will follow. If you are observing URL looping effects on a particular host, then you should take a visual survey of the site to estimate the depth of the leaf pages at that site. Leaf pages are pages that do not have any links to other pages. As a general guideline, add three to the leaf page depth and set the crawling depth to this value. Be sure to restart the crawler after altering any parameters in the Crawler Page. Your changes take effect only after restarting the crawler. Tuning Query Performance: This section contains suggestions on how to improve the performance of the Oracle Ultra Search query. Query performance is generally affected by response time and throughput. Tune the DB_CACHE_SIZE initialization parameter: The database buffer cache keeps frequently accessed data read from datafile
142. e 3-10. Monitoring Oracle Ultra Search Components with Oracle Enterprise Manager. You can use Enterprise Manager's Grid Control to monitor Oracle Ultra Search components. Using Grid Control, you can set up notification rules to send out email notification automatically whenever a schedule status reaches certain severity states. For more information on using Grid Control to monitor Oracle Ultra Search components, see the Oracle Enterprise Manager Concepts guide. Crawler Recrawl Policy. You can update the recrawl policy to process documents that have changed or to process all documents. In previous releases, Process All Documents did not help when the crawling scope had been narrowed. For example, if crawling depth was reduced from seven to five, the PDF mimetype was deleted, or a host inclusion rule was removed, then you had to remove the affected documents manually in a SQL*Plus session. With this release, all crawled URLs are subject to crawler setting enforcement, not just newly crawled URLs. See Also: Editing Synchronization Schedules on page 8-35. Federated Search. Traditionally, Oracle Ultra Search used centralized search to gather data on a regular basis and update one index that cataloged all searchable data. This provided fast searching, but it required the data source to be crawlable before it could be searched. Oracle Ultra Search now also provides federated search, which allows multiple indexes to perform a single sea
143. e J2EE 1.2 standard. You should not have to change this file to deploy it. The following is the file structure of the sample ear file. List the contents of the archive by running the following command: jar tf ultrasearch_query.ear. The output is: META-INF/application.xml, META-INF/orion-application.xml, query.war, welcome.war, rewriter/SampleRewriter.java, agent/SampleAgent.java, agent/README.html. All the query JSP pages are contained in query.war. This file is a Servlet 2.2 compliant Web application. Deploy it alone with any Servlet 2.2 engine. The context root for query.war is /ultrasearch/query. It is defined in the META-INF/application.xml of the sample ear file. You can change it by editing this file. The following are the Java libraries needed for the Oracle Ultra Search sample query application: $ORACLE_HOME/ultrasearch/webapp/config, $ORACLE_HOME/jdbc/lib/classes12.jar, $ORACLE_HOME/jdbc/lib/nls_charset12.zip, $ORACLE_HOME/ldap/jlib/ldapclnt9.jar, $ORACLE_HOME/lib/xmlparserv2.jar, $ORACLE_HOME/lib/activation.jar, $ORACLE_HOME/lib/mail.jar. Oracle Ultra Search query applications also use the connection pooling functionality of the J2EE container. You must define a container-authenticated data source. This data source must return an Oracle connection. Oracle recommends using the Java class equal to oracle.jdbc.pool.OracleConnectio
144. e J2EE compliant Web applications. Oracle Ultra Search Query API. Oracle Ultra Search provides a Java API for querying indexed data. The API methods retrieve and display query results. Because it is written in Java, it is compatible with a large spectrum of Web application servers that support any Java-based technology, such as JSP version 1.1 and higher. The API uses JDBC connection pooling for scalability. The Java API does not impose any HTML rendering elements. The application can completely customize the HTML interface. For example: basic search form, advanced search form, query result display, help page, feedback page, register URL. You embed Oracle Ultra Search query functionality in your Web application with the supplied Oracle Ultra Search Java query API. The API supports two kinds of methods: methods that retrieve query result data only, and methods that retrieve HTML code containing query result data. The data-only methods do not return any HTML and can be used when you require full control over the HTML code to be rendered. The methods that retrieve HTML code support features such as allowing you to embed query input boxes and result lists in your Web application. Some features of the Oracle Ultra Search Java query API include the following: Lets you retrieve query results. Lets you set query properties such as the total number of h
145. e Ultra Search URL Rewriter API. For regular Web crawling, there are only display URLs available. But in some situations, the crawler needs an access URL for crawling the internal site while keeping a display URL for external use. For every internal URL, there is an external mirrored one. For example: http://www.acme-qa.us.com:9393/index.html and http://www.acme.com/index.html. When the URL link http://www.acme-qa.us.com:9393/index.html is extracted, and before it is inserted into the queue, the crawler generates a new display URL and a new access URL for it. Access URL: http://www.acme-qa.us.com:9393/index.html. Display URL: http://www.acme.com/index.html. The extracted URL link is rewritten, and the crawler crawls the internal Web site without exposing it to the end user. Another example is when the links that the crawler picks up are generated dynamically and can differ depending on the referencing page or another factor, even though they all point to the same page. For example: http://compete3.acme.com/rt/rt.wwv_media.show?p_type=text&p_id=4424&p_currcornerid=281&p_textid=4423&p_language=us and http://compete3.acme.com/rt/rt.wwv_media.show?p_type=text&p_id=4424&p_currcornerid=498&p_textid=4423&p_language=us. Because the crawler detects different URLs with the same contents only when there is a sufficient number of duplicates, the URL queue could grow to a huge number of URLs, causing excessive URL link
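The two rewriting cases described above can be sketched as plain string transformations. This is a minimal illustration and deliberately does not implement the Oracle Ultra Search UrlRewriter interface, whose exact signatures are not shown in this excerpt. The class RewriteSketch, its methods, and the two prefix constants are assumptions; the punctuation in the example URLs is reconstructed from the text.

```java
// Hypothetical sketch; not the Oracle Ultra Search UrlRewriter interface.
final class RewriteSketch {

    // Assumed internal (access) and external (display) prefixes.
    static final String INTERNAL_PREFIX = "http://www.acme-qa.us.com:9393/";
    static final String EXTERNAL_PREFIX = "http://www.acme.com/";

    // Maps an internal access URL to its external display URL.
    // URLs outside the internal site are returned unchanged.
    static String toDisplayUrl(String accessUrl) {
        if (accessUrl.startsWith(INTERNAL_PREFIX)) {
            return EXTERNAL_PREFIX + accessUrl.substring(INTERNAL_PREFIX.length());
        }
        return accessUrl;
    }

    // Removes one query parameter so that dynamically generated links
    // that differ only in that parameter collapse to a single URL.
    static String dropParameter(String url, String name) {
        int queryStart = url.indexOf('?');
        if (queryStart < 0) {
            return url;
        }
        StringBuilder kept = new StringBuilder();
        for (String pair : url.substring(queryStart + 1).split("&")) {
            if (pair.equals(name) || pair.startsWith(name + "=")) {
                continue; // skip the volatile parameter
            }
            if (kept.length() > 0) {
                kept.append('&');
            }
            kept.append(pair);
        }
        String base = url.substring(0, queryStart);
        return kept.length() == 0 ? base : base + "?" + kept;
    }
}
```

Applied to the two dynamic links in the example, dropping p_currcornerid yields the same URL for both, so only one entry would reach the URL queue. A real rewriter would apply the same logic inside the crawler's rewriter callback.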
146. e based Web application, Oracle Ultra Search lets you specify a URL that displays the retrieved data in a browser. The URL points to a screen in the Web application corresponding to the data in the database tables. This is available for table data sources, file data sources, and user-defined data sources. See Also: Using Crawler Agents on page 7-3. Document and Search Attributes. Document attributes, or metadata, describe the properties of a document. Each data source has its own set of document attributes. The value is retrieved during the crawling process and then mapped to one of the search attributes and stored and indexed in the database. This lets you query documents based on their attributes. Document attributes in different data sources can be mapped to the same search attribute. Therefore, you can query documents from multiple data sources based on the same search attribute. The list of values (LOV) for a search attribute can help you specify a search query. If an attribute LOV is available, then the crawler registers the LOV definition, which includes the attribute value, the attribute value display name, and its translation. See Also: Synchronizing Data Sources on page 7-3. Metadata Loader. Oracle Ultra Search provides a command-line tool to load metadata into an Oracle Ultra Search database. If you have a large amount of data, this is probably faster than usi
147. e object defined by the instance tag. locale (locale): This determines the display name fetched using this tag. This tag loops through all the search groups in the instance referred to by the instance tag attribute. In each loop, it defines a scripting variable named group, which is an oracle.ultrasearch.query.Group object. It also defines a string variable named displayname, which is the localized name of the group. The following example shows all the groups in the mybookstore instance, using their English display names: <US:iterGroups instance="mybookstore" locale="<%=Locale.ENGLISH%>"> <%=group%> <%=displayname%> </US:iterGroups>. The <iterLanguages> Tag: Show All Search Languages. Similar to the showAttributes tag, the showLanguages tag iterates through all the languages defined in an instance. Because each language is defined by a java.util.Locale object, their display names are not handled by Oracle Ultra Search. Therefore, this tag does not define the displayname scripting variable. Attribute Name / Description: instance (name): This is a mandatory attribute to refer to the object defined by the instance tag. This tag is an iteration tag. It loops through all the search languages in the instance referred to by the instance tag attribute. In each loop, it defines a scripting variable n
148. e or more available sources and click >>. After a data source has been assigned to a group, it cannot be assigned to any other group. To undo assignments of a data source, select one or more scheduled sources and click <<. Update crawler recrawl policy. You can update the recrawl policy to the following: Process Documents That Have Changed. This is maintenance crawling. Only documents that have changed are recrawled and indexed. For Web data sources, if there are new links in the updated document, then they are followed. For file data sources, new files are collected if their parent directory has changed. Process All Documents. The crawler recrawls the data source. For example, suppose you want to crawl only text and HTML on a Web site. Later, you also want to crawl Microsoft Word and Adobe PDF documents. You must modify the document types for the source, edit the schedule to select Process All Documents, then rerun the schedule so that the crawler picks up PDF and doc document types for this data source. The crawler treats every document as if it has been changed, which means each document is fetched and processed again. Upon relaunching the schedule, the following rules determine which URLs will be recrawled: If the previous crawl did not finish (for example, you stopped the crawling or the database tablespace was full), then the crawle
149. e snapshot. Example: OUS_ADM.CREATE_INSTANCE('Scott instance', 'scott', 'tiger'); DROP_INSTANCE. Use this procedure to drop an Oracle Ultra Search instance. Syntax: OUS_ADM.DROP_INSTANCE(inst_name IN VARCHAR2); inst_name: The name of the instance to drop. Example: OUS_ADM.DROP_INSTANCE('Scott instance'); GRANT_ADMIN. Use this procedure to grant instance administrator privileges to the specified user, either for the current instance or all instances. Syntax: OUS_ADM.GRANT_ADMIN(user_name IN VARCHAR2, user_type IN NUMBER DEFAULT DB_USER, scope IN NUMBER DEFAULT CURRENT_INSTANCE, grant_option IN NUMBER DEFAULT NO_OPTION); user_name: The name of the user to whom the administrator privilege should be assigned. user_type: The user type: OUS_ADM.DB_USER (database user) or OUS_ADM.LDAP_US (lightweight SSO user). scope: The scope of the granting: CURRENT_INSTANCE or ALL_INSTANCE. grant_option: Options for granting privileges: NO_OPTION, or WITH_GRANT, which allows the grantee to grant the privilege to other users. Example: OUS_ADM.GRANT_ADMIN('scott', ous_adm.DB_USER, ous_adm.ALL_INSTANCE); REVOKE_ADMIN. Use this proced
150. e source, such as the default language and the primary key column. You can also specify the column where final content should be delivered and the type of data stored in that column, for example, HTML, plain text, or binary. For information on default languages, see Crawler Page on page 8-12. Verify the information about your table source. Decide whether or not to use the Oracle Ultra Search logging mechanism to optimize the crawling of table data sources. When logging is enabled, only newly updated documents are revisited during the crawling process. You can enable logging for Oracle tables, enable logging for non-Oracle tables, or disable the logging mechanism. If you enable logging, then you are prompted to create a log table and log triggers. Oracle SQL statements are provided for Oracle tables. If you are using non-Oracle tables, then you must manually create a log table and log triggers. Follow the examples provided to create the log table and log triggers. After you have created the table, enter the table name in Log Table Name. Map table columns to search attributes. Each table column can be mapped to exactly one search attribute. This lets the search engine seamlessly search data from the table source. Specify the display URL template or column for the table source. This step is optional. Oracle Ultra Search uses a default text viewer for table data sources. If you specify a display URL, then Oracle Ultra Search uses the Web URL defined
151. earch Crawler and Data Sources. This chapter explains how the crawler works. It also describes crawler settings, data sources, document attributes, data synchronization, and the remote crawler. Chapter 8, Understanding the Oracle Ultra Search Administration Tool. This chapter describes how to use the Oracle Ultra Search administration tool to configure and schedule the Oracle Ultra Search crawler. Chapter 9, Oracle Ultra Search Developer's Guide and API Reference. This chapter explains the following Oracle Ultra Search APIs: query API, crawler agent API, email API, and URL rewriter API. It also provides related API information, such as details about the sample query applications, the query tag library, and query syntax expansion customization. Chapter 10, Administration PL/SQL APIs. This chapter details some of Oracle Ultra Search's PL/SQL APIs for administration, including those for crawler configuration, crawler scheduling, and instance administration. Appendix A, Loading Metadata into Oracle Ultra Search. This appendix describes the command-line tool for loading metadata into an Oracle Ultra Search database. Appendix B, Altering the Crawler Java Classpath. This appendix explains why and how to alter the crawler Java classpath. Appendix C, Oracle Ultra Search Views. This appendix shows the various views available with Oracle Ultra Search. Related Documentation. For more information, see these Oracle resources: Oracle Data
152. earch/admin/index.jsp 2. Enter the User Name and Password to Login to Oracle Ultra Search. For this example: Login: WK_TEST, Password: WK_TEST. The login screen is displayed in Figure 2-1. [Figure 2-1: Oracle Ultra Search Login Screen] 3. Select an Oracle Ultra Search Instance screen. Select the Instances tab on the browser view. Select the WK_INST instance from the pull down menu and click Apply. 4. Configure the Oracle Ultra Search crawler settings. Select the Crawler tab on the browser view. Use the following values for crawler settings: Crawler Threads: 20. Number of Processors: 1. Automatic Language Detection: No. Default Language: English. Crawling Depth: No Limit. Crawler Timeout Threshold: 30. Default Character Set: ISO Latin-1. Temporary Directory Location and Size: /tmp, 5. Crawler Logging: /tmp, English. Database Connect String: Leave this field unchanged. 5. Create a Web Data Source. Select the Sources tab on the browser view. Under the Web Sources header, click the Create web source button. a. Create Web Source: Step 1. Enter Ultra Appliance as the Source name. Under the Starting Ad
153. Integration with Oracle Internet Directory, 1-12; Oracle Ultra Search Administration Groups in Oracle Internet Directory, 1-12; Authorization of the Administration Privileges, 1-12; Single Sign-On Authentication, 1-13; Query Syntax Expansion, 1-13; Oracle Ultra Search System Configuration, 1-14. Getting Started with Oracle Ultra Search: Overview, 2-1; Installation, 2-2; Using the Oracle Universal Installer, 2-2; Accessing the Ultra Search Administration Application, 2-3; Setting up the Sample Query Application, 2-4; Setting up the Ultra Appliance Demo, 2-4; Crawl and Index Ultra Appliance's Intranet Documents, 2-6; Crawl and Index Ultra Appliance's Database Docume
154. ed emails. The API is known as the Oracle Ultra Search Java Email API. This API lets you retrieve information such as email header information, email body content, and attachments of an email. Use this API to embed Oracle Ultra Search email browsing functionality into JavaServer Page (JSP) or servlet-based Web applications. Oracle Ultra Search ships a fully functional JSP Web application that directly uses this API to render indexed emails. Because the source code is viewable, you can use it as an example for building your own customized email browser. JavaMail Implementation. Oracle Ultra Search requires a JavaMail 1.1 compliant implementation. The reference implementation by Sun Microsystems is JavaMail version 1.2. This reference implementation is shipped with Oracle Ultra Search. Java Email API. The Oracle Ultra Search Java Email API is encapsulated in the oracle.ultrasearch.query package. Sample Mailing List Browser Application Files. The sample mailing list browser application files are located in the $ORACLE_HOME/ultrasearch/sample/query directory. You can directly view the sample mailing list browser application source code using your preferred text editor. The following tables describe all sample mailing list browser application files. README file and stylesheets (File: Description). SampleAgent_readme.html: Readme
155. ed more information? If so, where? Are the examples correct? Do you need more examples? What features did you like most about this manual? If you find any errors or have any other suggestions for improvement, please indicate the title and part number of the documentation and the chapter, section, and page number (if available). You can send comments to us in the following ways: Electronic mail: infodev_us@oracle.com. FAX: (650) 506-7227, Attn: Server Technologies Documentation Manager. Postal service: Oracle Corporation, Server Technologies Documentation, 500 Oracle Parkway, Mailstop 4op11, Redwood Shores, CA 94065, USA. If you would like a reply, please give your name, address, telephone number, and electronic mail address (optional). If you have problems with the software, please contact your local Oracle Support Services. Preface. Oracle Ultra Search User's Guide describes how to configure and use Oracle Ultra Search and Ultra Search APIs. This preface contains these topics: Audience, Organization, Related Documentation, Conventions, Documentation Accessibility. Audience. Oracle Ultra Search User's Guide is intended for database administrators and application developers who perform the following tasks: install and configure Oracle Ultra Search, administer Oracle Ultra Search instances, and develop Oracle Ultra Search applications. To use this document, you should have experience with the Oracle
156. eed to be adjusted. The Oracle Ultra Search administration tool is a J2EE 1.2 standard Web application. Therefore, it can be installed and run on a separate host from the Oracle Ultra Search backend. However, you might want to install and run this on the same host as the Oracle Ultra Search backend. Regardless of your choice, allocate enough memory for the J2EE engine. Oracle recommends using the Oracle HTTP Server with the Oracle J2EE container. Allocate enough memory for the HTTP Server, as well as the JDK that runs the J2EE engine. Sufficient Disk Space. Because customer requirements vary widely, Oracle cannot recommend a specific amount of disk space. However, as a general guideline, the minimal requirements are as follows: Approximately 3GB of disk space for the Oracle Application Server infrastructure or database and the Oracle Ultra Search backend. 15MB of disk space for the Oracle Ultra Search middle tier, on top of the Web server's disk requirements. For each remote crawler host, the same amount of disk space as needed to install the Oracle Ultra Search backend. Disk space for a large TEMPORARY tablespace. As a general guideline, create a TEMPORARY tablespace as large as possible, depending on the RAM on your host. Disk space for the Oracle Ultra Search instance user's tablespace. The Oracle Ultra Search instance user is a database user that you must explicitly create. All data that is collected and processed as
157. Installing the Backend on Remote Crawler Hosts, 3-26; Installing the Backend on Remote Crawler Hosts, 3-26; Configuring the Remote Crawler, 3-27; Unregistering a Remote Crawler, 3-29; Configuring Oracle Ultra Search in a Hosted Environment, 3-29; Preconfiguration Tasks for a Hosted Environment, 3-30; Configuring Oracle Ultra Search in the Subscriber Context, 3-30. Post-Installation Information: Changing Oracle Ultra Search Schema Passwords, 4-2; Configuring the Oracle Server for Oracle Ultra Search, 4-2; Step 1: Tune the Oracle Database, 4-2; Step 2: Create and Assign the Temporary Tablespace to the CTXSYS User, 4-4; Step 3: Create a Large Tablespace for Each Oracle Ultra Search Instance User, 4-4; Step 4: Create and Configure New Users for Oracle Ultra Search Instances, 4-5; Step 5: Alter the I
158. efault value. This tag looks up the document attribute value and renders it on the page. If the attribute was not fetched as part of the search result, then nothing is output to the page. The following example shows the title and publication dates of all documents in a search result: <US:iterResult result="searchresult" instance="mybookstore"> <US:showAttributeValue attributeName="title" attributeType="string" default="No Title"/> <US:showAttributeValue attributeName="publication date" attributeType="date"/> </US:iterResult>. Oracle Ultra Search Crawler Agent API. You can implement a crawler agent to crawl and index a proprietary document repository, such as Lotus Notes or Documentum. In Oracle Ultra Search, the proprietary repository is called a user-defined data source. The module that enables the crawler to access the data source is called a crawler agent. The agent collects document URLs and associated metadata from the user-defined data source and returns the information to the Oracle Ultra Search crawler, which enqueues it for later crawling. The crawler agent must be implemented in Java using the Oracle Ultra Search crawler agent API. Oracle Ultra Search provides a sample implementation of user-defined crawler agents using the Oracle Ultra Search agent API. Upon invocation, this sample agent connects to a specified Ora
159. eir searches. Searches can be limited to document attributes and data groups. Attributes. Search attributes can be mapped to HTML metatags, table columns, document attributes, and email headers. Some attributes, such as author and description, are predefined and need no configuration. However, you can customize your own attributes. To set custom search attributes to expose to the query user, use the Attributes Page. Data Groups. Data source groups are logical entities exposed to the search engine user. When entering a query, the search engine user is asked to select one or more data groups to search from. A data group consists of one or more data sources. To define data groups, use the Queries Page. Online Help in Different Languages. Oracle Ultra Search provides context-sensitive online help, which can be viewed in different languages. You can change the language preferences in the Users Page. Logging On to Oracle Ultra Search. The following users can log on to the Oracle Ultra Search administration tool: Single Sign-on (SSO) users. These users are managed by the Oracle Internet Directory (OID) and are authenticated by the SSO server. The Oracle Ultra Search administration tool identifies all Oracle Ultra Search instances to which the SSO user has access. This is available only if you have the Oracle Identity Management infrastructure installed. Database users (non-SSO). These users exist in the database on which Oracle Ultra Search
160. election of metadata returned in query result. See Also: Oracle Ultra Search Query API on page 9-2; Oracle Ultra Search Java API Reference. The URL rewriter is a user-supplied Java module for implementing the Oracle Ultra Search UrlRewriter interface. It is used by the crawler to filter or rewrite extracted URL links before they are put into the URL queue. URL filtering removes unwanted links, and URL rewriting transforms the URL link. This transformation is necessary when access URLs are used. See Also: Web Sources on page 8-21; Oracle Ultra Search URL Rewriter API on page 9-29; Oracle Ultra Search Java API Reference. Robots Exclusions. Robots exclusion lets you control which parts of your sites can be visited by robots. If robots exclusion is enabled (the default), then the Web crawler traverses the pages based on the access policy specified in the Web server robots.txt file. For example, when a robot visits http://www.foobar.com, it checks for http://www.foobar.com/robots.txt. If it finds it, the crawler analyzes its contents to see if it is allowed to retrieve the document. If you own the Web sites, then you can disable robots exclusions. However, when crawling other Web sites, you should always comply with robots.txt by enabling robots exclusion. See Also: Web Sources on page 8-21. Display URL Support. When gathering information from a databas
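The robots.txt check described above can be illustrated with a small, simplified parser. This is a sketch only and does not reflect how the Oracle Ultra Search crawler is implemented: the class RobotsSketch and its methods are hypothetical, it reads only Disallow rules that directly follow a "User-agent: *" line, and it ignores Allow rules, wildcards, and groups that list several user agents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified robots.txt reader; not the crawler's own logic.
final class RobotsSketch {

    // Collects the Disallow path prefixes that apply to all robots.
    static List<String> disallowedPrefixes(String robotsTxt) {
        List<String> prefixes = new ArrayList<>();
        boolean inWildcardGroup = false;
        for (String raw : robotsTxt.split("\\r?\\n")) {
            String line = raw.trim();
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash).trim(); // strip comments
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                inWildcardGroup = value.equals("*");
            } else if (field.equals("disallow") && inWildcardGroup && !value.isEmpty()) {
                prefixes.add(value);
            }
        }
        return prefixes;
    }

    // A path may be fetched unless it starts with a disallowed prefix.
    static boolean isAllowed(String robotsTxt, String path) {
        for (String prefix : disallowedPrefixes(robotsTxt)) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```

With a file that contains "User-agent: *" followed by "Disallow: /private/", the sketch rejects /private/a.html and accepts /index.html, which is the decision the crawler makes when robots exclusion is enabled.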
161. ents a choice of two or more options within brackets or braces Enter one of the options Do not enter the vertical bar Horizontal ellipsis points indicate either a That we have omitted parts of the code that are not directly related to the example a That you can repeat a portion of the code Vertical ellipsis points indicate that we have omitted several lines of code not directly related to the example You must enter symbols other than brackets braces vertical bars and ellipsis points as shown Italicized text indicates placeholders or variables for which you must supply particular values Uppercase typeface indicates elements supplied by the system We show these terms in uppercase in order to distinguish them from terms you define Unless terms appear in brackets enter them in the order and with the spelling shown However because these terms are not case sensitive you can enter them in lowercase Brackets enclose one or more optional Example DECIMAL digits precision ENABLE DISABLE ENABLE DISABLE COMPRESS NOCOMPRESS CREATE TABLE AS subquery SELECT coll employees col2 coln FROM SQL gt SELECT NAME FROM VSDATAFILE NAME f s1 dbs tbs_01 dbf f s1 dbs tbs_02 dbf s1 dbs tbs_09 dbf 9 rows selected acctbal NUMBER 11 2 acct CONSTANT NUMBER 4 3 CONNECT SYSTEM system_password DB_NAME database_name SELECT last_name employee_id FROM empl
162. er and OC4J on page 3-14. 1. For OC4J configuration, modify the following OC4J configuration files: application.xml and default-web-site.xml in $ORACLE_HOME/j2ee/OC4J_Portal/config. For application.xml, under the <orion-application> tag, add the following: <library path="$ORACLE_HOME/jlib/repository.jar"/> <library path="$ORACLE_HOME/jlib/jndi.jar"/> <library path="$ORACLE_HOME/jlib/ldapjclnt9.jar"/> <library path="$ORACLE_HOME/j2ee/home/jazn.jar"/> <library path="$ORACLE_HOME/j2ee/home/jaas.jar"/>. For default-web-site.xml, under the <web-site> tag, add the following: <web-app application="UltrasearchAdmin" name="admin" root="/ultrasearch/admin_sso"/>. Modify mod_oc4j configuration files. Add the following to mod_oc4j.conf: Oc4jMount /ultrasearch/admin_sso OC4J_Portal. Confirm the following: $ORACLE_HOME/Apache/Apache/conf/httpd.conf includes oracle_apache.conf. $ORACLE_HOME/Apache/Apache/conf/oracle_apache.conf includes ultrasearch.conf. $ORACLE_HOME/ultrasearch/webapp/config/ultrasearch.conf has the following content: # add alias for ultra search online help and welcome page. Alias /ultrasearch/doc /private/nli/ora9ias/ultrasearch/doc. Alias /ultrasearch /private/nli/ora9ias/ultrasearch/sample. <IfModule mod_osso.c> <Location /ultrasearch/admin_sso> require valid-user authType Basic </Location> </IfModule>
163. er is not running. The JDBC launcher is running, but the connect user or role specified is incorrect. After a remote crawler is launched, verify that it is running with one or more of the following methods: For RMI-based crawling, check for active Java processes on the remote crawler host. A simple way to confirm that the remote crawler is running on the remote crawler host is to use an operating system command, such as ps on UNIX systems. Look for active Java processes. For JDBC-based crawling, check that the launcher is up and running and that there are no errors. When you start the JDBC-based launcher, it will output text to standard output. You may optionally redirect output to a file. Monitor this output for any errors. Monitor the contents of the schedule log file. If the remote crawler is running successfully, then you should see the contents of the schedule log file changing periodically. The schedule log file is located in the shared log directory. Oracle Ultra Search on Real Application Clusters. Oracle Ultra Search can crawl on one fixed node or on any node, depending on the storage access configuration of the Real Application Clusters system. PL/SQL APIs are provided to specify which node should run the crawler, if needed. For Oracle Ultra Search administration and the Oracle Ultra Search query application, you can configure the connection string to connect to any node of Real Application Clusters. See Also: The document
164. er the following: java MetaLoader -db database_connection_string -u user_name -p password -i instance_name -type loader_type -f input_file. Where: -db is the database connection string; -u is the database schema user name; -p is the database schema password; -i is the Oracle Ultra Search instance name; -type is the loader metadata type (lov or doc); -f is the input metadata XML filename. For example, suppose you use the tool to load attribute LOVs specified in the XML file test.xml with the following arguments: Database connection string: dlsun576:5521:isearch. Schema user name: wk_test. Schema password: welcome. Oracle Ultra Search instance name: wk_inst. The following statement launches the loader program: java MetaLoader -db dlsun576:5521:isearch -u wk_test -p welcome -i wk_inst -type lov -f test.xml. Loading Documents and Relevance Scores. To use the loader tool to add documents and their relevancy boosting scores into Oracle Ultra Search, the parameter type value should be doc. The Input XML File. The document URL and relevance boosting scores are defined in an XML file. You can define one or more documents to be boosted. Each document can have one or more boosting score pairs. The definition of the XML file is stored in the XML schema. See Also: XML Schema for Document Relevance Boosting on
165. ere is an example of the ultrasearch.properties file:
    connection.driver=oracle.jdbc.driver.OracleDriver
    connection.url=jdbc:oracle:thin:@ldap://dlsun8888.cn.oracle.com:3060/iasdb,cn=oraclecontext
    oracle.net.encryption_client=REQUESTED
    oracle.net.encryption_types_client=(RC4_56,DES56C,RC4_40,DES40C)
    oracle.net.crypto_checksum_client=REQUESTED
    oracle.net.crypto_checksum_types_client=(MD5)
    oid.app_entity_cn=ml6bi.sgtcnsun03.cn.oracle.com
    domain=us.oracle.com
    Where:
    - connection.driver specifies the JDBC driver you are using.
    - connection.url specifies the database to which the middle tier connects. Oracle Ultra Search supports the following formats:
        - host:port:SID, where host is the full host name of the Oracle Database instance running Oracle Ultra Search, port is the listener port number for the Oracle Database instance, and SID is the Oracle Database instance ID.
        - An HA-aware string; for example, TNS keyword-value syntax.
      Here is an example connection.url string:
        connection.url=jdbc:oracle:thin:@ultrasearch.us.oracle.com:1521:myInstance
    - oracle.net.encryption_client, oracle.net.encryption_types_client, oracle.net.crypto_checksum_client, and oracle.net.crypto_checksum_types_client control the properties of the secure JDBC connection made to the database. See Oracle Database JDBC Developer's Guide and Reference for more
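To make the host:port:SID format concrete, here is a minimal Python sketch that reads key=value lines like those in the properties example and builds a thin-driver URL. The function names load_properties and thin_url are hypothetical, and the key=value layout of the file is an assumption about how the example above was originally punctuated.

```python
def load_properties(text):
    """Parse simple key=value lines; blank lines and '#' comments are skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values such as
        # '...iasdb,cn=oraclecontext' keep their own '=' characters.
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def thin_url(host, port, sid):
    """Build a connection URL in the host:port:SID form."""
    return f"jdbc:oracle:thin:@{host}:{port}:{sid}"


sample = """
connection.driver=oracle.jdbc.driver.OracleDriver
connection.url=jdbc:oracle:thin:@ultrasearch.us.oracle.com:1521:myInstance
"""
props = load_properties(sample)
print(props["connection.url"] == thin_url("ultrasearch.us.oracle.com", 1521, "myInstance"))
```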
166. ersion 1.1, the Oracle Ultra Search tag library better separates the dynamic Java development effort from the static HTML development effort, and enables Web developers who are unfamiliar with Java to incorporate search functionality into their applications. The Oracle Ultra Search tag library provides a subset of the features in the Java Query API. Advanced features, such as custom query expansion and URL submission, are not available as tags. The main features of the tag library are the following: the ability to retrieve search attributes, groups, languages, and LOVs for rendering the advanced query form, and the ability to iterate through the resulting hit set and retrieve document attributes and properties for rendering the result page. The tag library is summarized in the following table:
    Tag             Description
    instance        Establishes a connection to an Oracle Ultra Search instance.
    showAttributes  For an advanced query, shows the list of attributes available.
    showGroups      For an advanced query, shows the list of groups.
    showLanguages   For an advanced query, shows the list of languages defined in the instance.
    showLOV         Shows all values defined for a search attribute.
    getResult       Performs the search.
    Attributes: instanceId username password URL da
167. es host doc and exclude path files host doc unwanted
    4. Specify the types of documents the Oracle Ultra Search crawler should process for this file source. HTML and plain text are default document types that the crawler always processes.
    5. Oracle Ultra Search displays file data sources in text format. However, if you specify a display URL for the file data source, then Oracle Ultra Search uses the URL to display the file data source. A display URL uses a network protocol, such as HTTP or HTTPS, to access the file data source. To generate the display URL, specify the prefix of the original file URL and the prefix of the display URL. Oracle Ultra Search replaces the prefix of the file URL with the prefix of the display URL. For example, if your file URL is file:///home/operation/doc/file.doc and the display URL is https://webhost/client/doc/file.doc, then you can set the file URL prefix to file:///home/operation and the display URL prefix to https://webhost/client.
    6. Specify the ACL (access control list) policy for the data source. When a user performs a search, the ACL controls which documents the user can access. The default is no ACL, with all documents considered searchable and visible. Alternatively, you can specify using the Oracle Ultra Search ACL. You can add more than one group and user to the ACL for the data sour
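The prefix substitution described for display URLs can be illustrated with a short Python sketch. This is only an illustration of the idea, not the crawler's implementation: the function name to_display_url is hypothetical, and the URL spellings are assumed from the example in the text.

```python
def to_display_url(file_url, file_prefix, display_prefix):
    """Replace the file URL prefix with the display URL prefix."""
    if not file_url.startswith(file_prefix):
        # A URL outside the configured prefix is returned unchanged.
        return file_url
    return display_prefix + file_url[len(file_prefix):]


print(to_display_url(
    "file:///home/operation/doc/file.doc",
    "file:///home/operation",
    "https://webhost/client",
))
# prints https://webhost/client/doc/file.doc
```

Only the leading prefix is swapped; the rest of the path is carried over as is, which is why the two prefixes must describe the same directory seen through different protocols.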
168. es the other is data source specific.
    To define your own attribute, enter the name of the attribute in the text box, select string, date, or number, and click Add. You can add or delete LOV entries and display names for search attributes. The display name is optional. If the display name is absent, then the LOV entry is used in the query screen.
    Note: LOV is only represented as string type. If LOV is in date format, then you must use DD MM YYYY to enter the LOV.
    To update the policy value, click Manage LOV for any attribute. A data source-specific LOV can be updated in three ways:
    - Update the LOV manually.
    - The crawler agent can automatically update the LOV during the crawling process.
    - New LOV entries can be automatically added by inspecting attribute values of incoming documents.
    Caution: If the update policy is agent-controlled, then the LOV and all translated values are erased in the next crawling.
    Mappings
    This section displays mapping information for all data sources. For user-defined data sources, mapping is done at the agent level, and document attributes are automatically mapped to search attributes with the same name initially. Document attributes and search attributes are mapped one-to-one. For each user-defined data source, you can edit the global search attribute to which the document attribute is mapped. For Web file
169. es more computational processing power and is generally slower than other types of queries.
    Scoring Classes
    There are three ways documents are matched against an end-user query string. These three ways are known as scoring classes. Documents are scored and ranked higher if they satisfy the requirements for a higher class. Within each class, documents are also ranked differently depending on how well they match the conditions of that scoring class.
    Class 1 is the most heavily weighted class. The score is derived from the number of occurrences of a precise phrase in a document. A document that has more instances of the precise phrase has a higher score than another document that has fewer occurrences of the precise phrase.
    Class 2 is the next most heavily weighted class. In this class, the closer the tokens appear in a document, the higher the score becomes. For example, an end-user query string Oracle Applications Financials can result in three documents found. None of the three documents contains the precise phrase Oracle Applications Financials. However, document X contains all three tokens (Oracle, Applications, and Financials) in the same sentence, separated by other words. Document Y contains the individual tokens in the same paragraph but in different sentences. Document Z contains the same three tokens, but each token resides in a different paragraph. In this scenario, document X has the highest score because the tokens are clo
170. escape character in a command prompt is the caret (^). Your prompt reflects the subdirectory in which you are working. Referred to as the command prompt in this manual.
        Example: C:\oracle\oradata>
    Convention: Special characters
        Meaning: The backslash (\) special character is sometimes required as an escape character for the double quotation mark (") special character at the Windows command prompt. Parentheses and the single quotation mark (') do not require an escape character. Refer to your Windows operating system documentation for more information on escape and special characters.
        Example: C:\>exp scott/tiger TABLES=emp QUERY=\"WHERE job='SALESMAN' and sal<1600\"
        Example: C:\>imp SYSTEM/password FROMUSER=scott TABLES=(emp, dept)
    Convention: HOME_NAME
        Meaning: Represents the Oracle Database home name. The home name can be up to 16 alphanumeric characters. The only special character allowed in the home name is the underscore.
        Example: C:\> net start OracleHOME_NAMETNSListener
    Convention: ORACLE_HOME and ORACLE_BASE
        Example: Go to the ORACLE_BASE\ORACLE_HOME\rdbms\admin directory.
        Meaning: In releases prior to Oracle8i release 8.1.3, when you installed Oracle Database components, all subdirectories were located under a top-level ORACLE_HOME directory that by default used one of the following names:
        - C:\orant for Windows NT
        - C:\orawin98 for Windows 98
        This release complies with Optimal Flexible A
171. ... 9-26
    Oracle Ultra Search Java Email API 9-26
        JavaMail Implementations 9-27
        Java Email API 9-28
        Sample Mailing List Browser Application Files 9-28
        Setting up the Sample Mailing List Browser Application 9-29
    Oracle Ultra Search URL Rewriter API 9-29
        URL Link Filtering 9-29
        URL Link Rewriting 9-30
        Creating and Using a URL Rewriter 9-32
    Oracle Ultra Search Sample Query Applications 9-33
        Sample Query Applications 9-34
        JavaServer Page Concepts 9-34
    10 Administration PL/SQL APIs
    Instance-Related APIs 10-3
        CREATE_INSTANCE 10-3
        DROP_INSTANCE 10-5
        GRANT_ADMIN 10-6
        REVOKE_ADMIN
172. exing Documents
    Queuing and Caching Documents
    Figure 7-1 on page 7-6 and Figure 7-2 on page 7-7 illustrate an instance of the crawling cycle in a sequence of nine steps. The example uses a Web data source, although the crawler can also crawl other data source types. Figure 7-1 illustrates how the crawler and its crawling threads are activated. It also shows how the crawler queues hypertext links to control its navigation. This figure corresponds to Steps 1 to 5. Figure 7-2 illustrates how the crawler caches Web pages. This figure corresponds to Steps 6 to 8. The steps are the following:
    1. Oracle spawns the crawler according to the schedule you specify with the administration tool. When crawling is initiated for the first time, the URL queue is populated with the seed URLs (Figure 7-1).
    2. Crawler initiates multiple crawling threads.
    3. Crawler thread removes the next URL in the queue.
    4. Crawler thread fetches the document from the Web. The document is usually an HTML file containing text and hypertext links.
    5. Crawler thread scans the HTML file for hypertext links and inserts new links into the URL queue. Duplicate links already in the document table are discarded.
    6. Crawler caches the HTML file in the local file system (Figure 7-2 on page 7-7).
    7. Crawler registers the URL in the document table.
    8. Crawler thread starts over by repeating Step 3.
    Fetching a doc
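The queue-driven cycle described above (seed the URL queue, remove the next URL, fetch, scan for links, queue new links, cache, register) can be sketched in a few lines of Python. This is a single-threaded toy under stated assumptions, not the Oracle Ultra Search crawler: the PAGES dictionary stands in for the Web, the names crawl and PAGES are hypothetical, and duplicates are tracked with one in-memory set rather than a database document table.

```python
from collections import deque

# Hypothetical in-memory "Web": URL -> hypertext links found on that page.
PAGES = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/"],
    "http://example.com/b": [],
}


def crawl(seed_urls):
    """Return the URLs registered in the document table, in crawl order."""
    queue = deque(seed_urls)        # URL queue, populated with the seed URLs
    seen = set(seed_urls)           # known URLs, used to discard duplicates
    cache = {}                      # stands in for the local file system cache
    document_table = []             # URLs registered so far
    while queue:
        url = queue.popleft()       # remove the next URL in the queue
        links = PAGES.get(url, [])  # "fetch" the document
        for link in links:          # scan for links and queue only new ones
            if link not in seen:
                seen.add(link)
                queue.append(link)
        cache[url] = links          # cache the fetched document
        document_table.append(url)  # register the URL in the document table
    return document_table


print(crawl(["http://example.com/"]))
# prints ['http://example.com/', 'http://example.com/a', 'http://example.com/b']
```

A real crawler would run the loop body in several threads and persist the queue and document table, but the order of operations within one pass is the same.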
173. export the shared directories on host X using the UNIX export command. Then use the UNIX mount command on hosts Y2 and Y3 to mount the exported directories. For host Y1, you must purchase a third-party NFS client for Windows and use it to mount the shared directories. If host X is a Linux server, you can create Samba shares and mount those shares on Windows without needing any third-party software.
    If for some reason there is no shared file system between the database and remote crawler hosts, you can instruct the remote crawler to transfer all cache and mail archive data across JDBC to the database host. The files are then saved locally on the database host. You can choose this option by selecting "through JDBC connection" for the Cache file access mode setting in the next step.
    Configure the remote crawler with the administration tool. To edit the remote crawler profile, navigate to the Crawler: Remote Crawler Profiles page and click Edit for the remote crawler profile you want to edit. Edit that profile by manually entering all mount points for the shared crawler resources that you defined.
    Cache and mail archive directories: If the backend database host and remote crawler host are on a shared file system such as NFS, select "through mounted file system" for the Cache file access mode setting. Then specify values for the following parameters:
    - Mount point for cache d
174. g User-Defined Data Source Types 8-32
        Creating User-Defined Sources 8-33
    Schedules Page 8-34
        Data Synchronization 8-34
        Creating Synchronization Schedules 8-34
        Updating Schedules 8-34
        Editing Synchronization Schedules 8-35
        Launching Synchronization Schedules 8-37
        Synchronization Status and Crawler Progress 8-37
        Index Optimization 8-38
    Queries Page 8-39
        Data Groups 8-39
        URL Submission 8-40
        Relevancy Boosting 8-40
        Query Statistics 8-41
        Configuration 8-42
    Users Page 8
175. g site', ous_adm.DISABLE_SCHEDULE);
    OUS_ADM.UPDATE_SCHEDULE('marketing site', OUS_ADM.RECRAWL_POLICY, ous_adm.RECRAWL_ON_EVERYTHING);
    In this example, 1001 is the ID of a remote crawler:
    OUS_ADM.UPDATE_SCHEDULE('marketing site', ous_adm.CRAWLER_ID, 1001);
    Crawler Configuration APIs
    This section provides reference information for using the crawler configuration APIs.
    IS_ADMIN_READONLY
    Use this function to check whether a crawler configuration setting is read-only. IS_ADMIN_READONLY returns 1 if the configuration is read-only, and 0 if it is not.
    Syntax
    OUS_ADM.IS_ADMIN_READONLY(
      config_name IN NUMBER,
      crawler_id  IN NUMBER DEFAULT LOCAL_CRAWLER
    ) return number;
    config_name: The name of the crawler configuration. Possible values are:
      Configuration Name    Description
      CC_CACHE_DIRECTORY    Crawler cache directory path
      CC_CACHE_SIZE         Size of the cache, in megabytes
      CC_CACHE_DELETION     Enable or disable removing cache files after indexing
      CC_LOG_DIRECTORY      Crawler log file location
    crawler_id: The ID of the crawler whose configuration you are checking. This may be set either to LOCAL_CRAWLER or to the ID of a remote crawler.
    Example
    If OUS_ADM.IS_ADMIN_READONLY(ous_adm.CC_CACHE_DIRECTORY) then
    end if;
176. h middle tier be installed on its host. By using several remote crawler hosts and carefully allocating schedules to specific hosts, you can achieve scalability and load balancing of the entire crawling process.
    Installation and Configuration Sequence
    1. Make sure that you have installed the Oracle Ultra Search Backend server component, as well as a Server component on each host that is to be used to run remote crawlers.
       See Also: Chapter 3, Installing and Configuring Oracle Ultra Search
    2. Understand the cache and mail archive directories. All remote crawlers must cache crawled data in a common file system location that is accessible by the backend Oracle Ultra Search database. Likewise, when crawling email sources, all emails must be saved in a common central location. The simplest way to achieve this is to ensure that the cache and mail archive directories seen by the remote crawler are mounted through NFS to point to the cache and mail directories used by the Oracle Ultra Search backend database. For example, your Oracle Ultra Search installation might consist of four hosts: one database server (host X) running Solaris, on which the Oracle Ultra Search backend is installed; one remote crawler host (host Y1) running Windows; one remote crawler host (host Y2) running Solaris; and one remote crawler host (host Y3) running Linux. In this scenario
177. h provide access to a vast variety of content repositories in a single gateway. Each one of these repositories has its own security model that determines whether a particular end user can access a particular document. Because Oracle Ultra Search provides access to data from multiple repositories, existing security information in each repository must be carefully supported to avoid unauthorized access. This section describes the security architecture of Oracle Ultra Search. Security is implemented at the following levels:
    - User authentication: This is the identification of a user through LDAP and Oracle Internet Directory at Oracle Ultra Search front-end interfaces.
    - User entitlement: This determines whether a user can access information about a particular item in the results list. It is implemented by access control lists (ACLs). Oracle Ultra Search provides mapped security to third-party repositories by retrieving the access control list for each document at the time of indexing and storing it in Oracle Ultra Search. Oracle Ultra Search does not need any connection with the repository itself to validate access privileges.
    - Secure communications: All content crawling, indexing, and querying is encrypted using secure socket layer (SSL), a worldwide standard for encryption over the HTTP protocol (HTTPS). For Oracle Ultra Search to access secure Web sites, y
178. h there are no subscribers, the enterprise installing Oracle Ultra Search is the default subscriber. All Oracle Ultra Search administration groups (super-user and instance administrator groups) are created under the Default Subscriber Oracle Context (for example, cn=OracleContext,dc=us,dc=oracle,dc=com) in the Oracle Internet Directory Information Tree (DIT).
    Figure 6-1 shows an example of the Oracle Internet Directory topology of a hosted environment. There are two subscribers, A and B, and the default subscriber. Each subscriber has its own super-user privilege group associated with it. There are four Oracle Ultra Search instances created in the Oracle Ultra Search back-end Install 1. Instance 1 is associated with the default subscriber. Instance 2 and Instance 3 are associated with Subscriber A. Instance 4 is associated with Subscriber B. Each Oracle Ultra Search instance has its instance administration group associated with it.
    [Figure 6-1: Oracle Internet Directory Topology of a Hosted Environment]
    Admin Privilege Model
    This section describes the privilege model of the Oracle Ultra Search administration tool in the hosted environment. The model applies to both the
179. he Query Application 4-15
        Step 1: Edit the data-sources.xml File 4-15
        Step 2: Deploy Multiple Query Applications Against Multiple Instances 4-16
    5 Tuning and Performance
    Tuning the Web Crawling Process 5-2
        Web Crawling Strategy 5-2
        Monitoring the Crawling Process 5-2
        URL Looping 5-2
    Tuning Query Performance 5-3
    Using the Remote Crawler 5-6
        RMI-Based Remote Crawling 5-7
        JDBC-Based Remote Crawling 5-7
        Security With Remote Crawlers 5-8
        Scalability and Load Balancing 5-8
        Installation and Configuration Sequence 5-9
    Oracle Ultra Search on Real Application Clusters 5-13
        Configuring Storage Access
180. he SSO Oracle Internet Directory
    Configuring Oracle Ultra Search in the Subscriber Context
    For each subscriber, run the following script to configure Oracle Ultra Search in the Oracle Internet Directory subscriber context. The script does the following:
    - Creates the reference objects in the subscriber context
    - Creates the default privilege group entry in the subscriber context
    - Updates the subscriber information in the Oracle Ultra Search metadata repository
    Script usage:
    $ORACLE_HOME/ultrasearch/setup/usca.sh -action add_subscriber -user OID_user_DN -password orcladmin_password -subscriber subscriber_DN
    The Oracle Internet Directory user must have the iASAdmins privilege. Before you run the script, make sure that you have execute permission on the script and that the ORACLE_HOME environment variable is set.
    The following example configures Oracle Ultra Search in the subscriber dc=us,dc=oracle,dc=com:
    $ORACLE_HOME/ultrasearch/setup/usca.sh -action add_subscriber -user cn=orcladmin -password welcome1 -subscriber dc=us,dc=oracle,dc=com
    To drop the subscriber, first run the following script to remove Oracle Ultra Search entries from the Oracle Internet Directory subscriber context:
    $ORACLE_HOME/ultrasearch/setup/usca.sh -action remove_subscriber -user OID_user_DN -password orcladmin_password -subscriber subscriber_DN
    4 Post-Installation Information
    This chapter conta
181. he best way to know if a remote table or view can be safely crawled by Oracle Ultra Search is to check for the existence of the ROWID column. To do so, run the following SQL statement against that table or view using SQL*Plus:
    SELECT MIN(ROWID) FROM <table_name or view_name>;
    The base table or view cannot have text columns of type BFILE or RAW.
    Email Sources
    An email source derives its content from emails sent to a specific email address. When the Oracle Ultra Search crawler searches an email source, it collects all emails that have the specific email address in any of the To or Cc email header fields. The most popular application of an email source is one where the email source represents all emails sent to a mailing list. In such a scenario, multiple email sources are defined, where each email source represents an email list.
    To crawl email sources, you need an IMAP account. At present, the Oracle Ultra Search crawler can only crawl one IMAP account. Therefore, all emails to be crawled must be found in the inbox of that IMAP account. For example, in the case of mailing lists, the IMAP account should be subscribed to all desired mailing lists. All new postings to the mailing lists are sent to the IMAP email account and subsequently crawled. The Oracle Ultra Search crawler is IMAP4 compliant. When the Oracle Ultra Search crawler retrieves an email message, it deletes the email message from the IMAP server. Then it converts the email message con
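The rule that an email source collects messages carrying a given address in the To or Cc header can be sketched with Python's standard email package. This is an illustration of the matching rule only, not the crawler's code: the function name addressed_to and the sample addresses are hypothetical.

```python
from email import message_from_string
from email.utils import getaddresses


def addressed_to(raw_message, source_address):
    """Return True if source_address appears in the To or Cc header."""
    msg = message_from_string(raw_message)
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    # getaddresses splits header values into (display name, address) pairs.
    recipients = {addr.lower() for _name, addr in getaddresses(headers)}
    return source_address.lower() in recipients


raw = (
    "To: Dev List <dev-list@example.com>\n"
    "Cc: bob@example.com\n"
    "Subject: build status\n"
    "\n"
    "All green.\n"
)
print(addressed_to(raw, "dev-list@example.com"))
```

With several email sources defined, the same message would be tested once per source address, so a posting cross-sent to two lists belongs to both sources.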
182. he sample query applications include a sample search portlet. The sample Oracle Ultra Search portlet demonstrates how to write a search portlet for use in Oracle Application Server Portal. This same portlet is installed as a feature of the Oracle Application Server Portal product.
    See Also: Oracle Ultra Search Query API on page 9-2
    Sample Search Portlet
    Oracle Ultra Search provides a search portlet that can be embedded in Oracle Application Server Portal pages. It is implemented as a JavaServer Page application. The Oracle Ultra Search search portlet supports most of the functionality provided by the Query API Complete Sample application.
    See Also:
    - The Oracle Application Server Portal documentation for more information about portlets
    - Oracle Ultra Search Sample Query Applications Readme for more information about the Query API Complete Sample application
    Query API
    Oracle Ultra Search offers a flexible query API to incorporate search functionality into your sites. The query API includes the following functionality:
    - Three attribute types: string, number, and date
    - Multivalued attributes
    - Display name support for attributes, attribute list of values (LOV), and data groups
    - Document relevancy boosting
    - Arbitrary grouping of attribute query operators using the operators AND and OR, with control over attribute operator evaluation order
    - S
183. heir respective owners.
    Contents
    Send Us Your Comments xv
    Preface xvii
        Audience xvii
        Organization xvii
        Related Documentation xix
        Conventions xx
        Documentation Accessibility xxv
    What's New in Oracle Ultra Search xxvii
        Oracle Ultra Search Release Information xxxi
    1 Introduction to Oracle Ultra Search
    Overview of Oracle Ultra Search 1-2
    Oracle Ultra Search Components 1-2
        Oracle Ultra Search Crawler 1-2
        Oracle Ultra Search Backend 1-3
        Oracle Ultra Search Administration Tool 1-3
        Oracle Ultra Search APIs and Sample Applications
184. heir use.
    Convention: Bold
        Meaning: Bold typeface indicates terms that are defined in the text or terms that appear in a glossary, or both.
        Example: When you specify this clause, you create an index-organized table.
    Convention: Italics
        Meaning: Italic typeface indicates book titles or emphasis.
        Example: Oracle Database Concepts
        Example: Ensure that the recovery catalog and target database do not reside on the same disk.
    Convention: UPPERCASE monospace (fixed-width) font
        Meaning: Uppercase monospace typeface indicates elements supplied by the system. Such elements include parameters, privileges, datatypes, RMAN keywords, SQL keywords, SQL*Plus or utility commands, packages and methods, as well as system-supplied column names, database objects and structures, user names, and roles.
    Convention: lowercase monospace (fixed-width) font
        Meaning: Lowercase monospace typeface indicates executables, filenames, directory names, and sample user-supplied elements. Such elements include computer and database names, net service names, and connect identifiers, as well as user-supplied database objects and structures, column names, packages and classes, user names and roles, program units, and parameter values.
        Note: Some programmatic elements use a mixture of UPPERCASE and lowercase. Enter these elements as shown.
    Convention: lowercase italic monospace (fixed-width) font
        Meaning: Lowercase italic monospace font represents placeholders or variables.
    Con
185. horization service (JAAS) provider called JAZN. This provides application developers with user authentication, authorization, and delegation services to integrate into their application environments.
    See Also: Configure a Secure Oracle Ultra Search Installation on page 3-6
    How Oracle Ultra Search Leverages the Identity Management Infrastructure
    Oracle Ultra Search uses the SSO server and Oracle Internet Directory to leverage the Oracle Identity Management infrastructure. With the SSO server, you can log on once for all components, and the Oracle Ultra Search administrative interface allows user management operations on either database users or SSO users. Authenticated SSO users never see the Oracle Ultra Search logon screen. Instead, they can immediately choose an instance. The Oracle Ultra Search administration tool and the query tool use SSO.
    Oracle Internet Directory (OID) is Oracle's native LDAP v3-compliant directory service, built as an application on top of the Oracle database. Oracle Internet Directory hosts the Oracle common identity. All Oracle Ultra Search instances are registered with Oracle Internet Directory.
    See Also: Integration with Oracle Internet Directory on page 1-12
    Oracle Ultra Search has native identity management; therefore, in the absence of the identity management infrastructure, Oracle Ultra Search uses nati
186. 1 Introduction to Oracle Ultra Search
    This chapter contains the following topics:
    - Overview of Oracle Ultra Search
    - Oracle Ultra Search Components
    - Oracle Ultra Search Features
    - Oracle Ultra Search System Configuration
    Overview of Oracle Ultra Search
    Oracle Ultra Search is built on the Oracle Database and Oracle Text technology that provides uniform search-and-locate capabilities over multiple repositories: Oracle databases, other ODBC-compliant databases, IMAP mail servers, HTML documents served up by a Web server, files on disk, and more.
    Oracle Ultra Search uses a crawler to collect documents. You can schedule the crawler to suit the Web sites that you want to search. The documents stay in their own repositories, and the crawled information is used to build an index that stays within your firewall in a designated Oracle database. Oracle Ultra Search also provides APIs for building content management solutions.
    In addition, Oracle Ultra Search offers the following:
    - A complete text query language for text search inside the database
    - Full integration with the Oracle Database and the SQL query language
    - Advanced features like concept searching and theme analysis
    - Attribute mapping to facilitate attribute search across disparate repositories
    - Indexing of all popular file formats (150+)
    - Full globalization, including suppo
187. ication Server and Oracle Collaboration Suite also have the capability to crawl and make searchable Portal's own repository.
    Oracle Application Server includes a Single Sign-On (SSO) server. SSO users can log on once for all components of the Oracle Application Server product, and the Oracle Ultra Search administrative interface allows user management operations on either database users or SSO users. Authenticated SSO users never see the Oracle Ultra Search logon screen. Instead, they can immediately choose an instance. If the SSO user does not have permissions to manage Oracle Ultra Search, set in the Users Page, then the SSO user receives an error. SSO is available only with the Oracle Identity Management infrastructure.
    See Also: http://portalstudio.oracle.com
    Extensible Crawler and Crawler Agents
    You can define, edit, or delete your own data sources and types in addition to the ones provided. You might implement your own crawler agent to crawl and index a proprietary document repository, such as Lotus Notes or Documentum, which contain their own databases and interfaces. The proprietary repository is called a user-defined data source. The module that enables the crawler to access the data source is called a crawler agent.
    See Also:
    - Oracle Ultra Search Crawler Agent API on page 9-19
    - Oracle Ultra Search Java API Reference
    Federated Search
    Traditionally
188. iddle Tier with Oracle HTTP Server and OC4J on page 3-14 to configure your existing Web server. You can also deploy Oracle Ultra Search Web applications using Oracle Enterprise Manager. See Also: Oracle Database Administrator's Guide for more information on Enterprise Manager; the Troubleshooting appendix in Oracle Application Server 10g Installation Guide for more information on Oracle Application Server Configuration Assistants. Configuring the Middle Tier with Oracle HTTP Server and OC4J. Note: For Oracle Database, Oracle Containers for J2EE (OC4J) is configured by default. You can still configure the HTTP Server and OC4J, but they will be in a different ORACLE_HOME. To deploy Oracle Ultra Search Web applications, you must have a J2EE 1.2 container. Oracle recommends using Apache Web server and OC4J. See Also: Deploying the Oracle Ultra Search EAR File on a Third-Party Middle Tier on page 3-19 if you use a third-party J2EE container or servlet engine. 1. For OC4J configuration, modify the following OC4J configuration files: server.xml, application.xml, and default-web-site.xml in $ORACLE_HOME/j2ee/OC4J_Portal/config. The configuration of OC4J works with Oracle Ultra Search J2EE applications. See Also: Oracle Application Server Containers for J2EE documentation for more information on deploying EAR and WAR applications and for the more advanced functionality of OC4J.
189. ing documents are stored in the cache directory. Every time the preset size is reached, crawling stops and indexing starts. In previous releases, the cache file was always deleted when indexing was done. You can now specify not to delete the cache file when indexing is done. This option applies to all data sources. The default is to delete the cache file after indexing. See Also: Crawler Page on page 8-12. URL Boundary Rules Include Port Number Inclusion or Exclusion. You can set URL boundary rules to refine the crawling space. You can now include or exclude Web sites with a specific port. For example, you can include www.oracle.com but not www.oracle.com:8080. By default, all ports are crawled. See Also: Creating Web Sources on page 8-22. Hostname Prefix Allowed in Web Data Source URL Boundary Specification. In previous releases, you could only specify suffix inclusion rules. For example, crawl only URLs ending with oracle.com. You can now also specify prefix rules. For example, crawl oracle.com but not stores.oracle.com. See Also: Creating Web Sources on page 8-22. Default Oracle Ultra Search Instance and Schema. Oracle Ultra Search automatically creates a default Oracle Ultra Search instance based on the default Oracle Ultra Search test user, so you can test Oracle Ultra Search functionality based on the default instance after installation. See Also: Configuring the Default Oracle Ultra Search Instance on pag
190. initialization parameter value. Restart the database so that the new initialization parameter takes effect. Configure the Oracle-Oracle Internet Directory SSL link. To establish a secure connection between the database and Oracle Internet Directory, follow the instructions in the following books. Configuring Oracle Internet Directory for SSL: Chapter 13, Secure Sockets Layer (SSL) and the Directory, in the Oracle 9.2 release of the Oracle Internet Directory Administrator's Guide. Configuring the database for SSL: Chapter 15, Managing Enterprise User Security (Part II, Task 1 - Task 3), in the Oracle Database 9.2 release of the Oracle Advanced Security Administrator's Guide. Secure search functionality requires that the Oracle Ultra Search database is Oracle version 9.2.0.4 or higher and that the Oracle Ultra Search database is linked to a compatible instance of Oracle Internet Directory. This is necessary because Oracle Ultra Search utilizes XML DB functionality, which requires a certain version of Oracle. XML DB, in turn, requires a live link to Oracle Internet Directory, through which it retrieves all LDAP principal information. The Oracle-OID link must be running at all times for secure search to work. To set up this link, configure the Oracle Database to use Oracle Identity Management. Step 2: Restart the Oracle listener. In the previous step, you configured the Oracle Database to use Identity Management. That
191. ins; Using Crawler Agents; Synchronizing Data Sources; Display URL and Access URL; Document Attributes; Crawling Process for the Schedule; Queuing and Caching Documents; Indexing Documents; Data Synchronization; Web Crawling Boundary Control; URL Boundary Rules; robots.txt Protocol and robots META Tag; Crawling Depth; URL Rewriter; URL Redirection and Boundary Rule Enforcement; Oracle Ultra Search Remote Crawler
192. ins the following topics: Changing Oracle Ultra Search Schema Passwords; Configuring the Oracle Server for Oracle Ultra Search; Managing Stoplists; Upgrading Oracle Ultra Search; Configuring the Query Application. Changing Oracle Ultra Search Schema Passwords. There are two Oracle Ultra Search system schemas created during installation: WKSYS and WKPROXY. You can update the schema password in the following way. For the Oracle Database Release: After the database is installed, all user schema accounts are locked. To log on as user WKSYS or WKPROXY, unlock WKSYS or WKPROXY by running the following statement as the SYSTEM or SYS database user: ALTER USER WKSYS ACCOUNT UNLOCK IDENTIFIED BY desired_password; For the Oracle Application Server or the Oracle Collaboration Suite Release: After the infrastructure database is installed, all user schema passwords are randomized. To log on as user WKSYS or WKPROXY, change the WKSYS or WKPROXY schema password by following the link Change Schema Password from the Oracle Enterprise Manager Infrastructure page. See Also: Oracle Enterprise Manager Grid Control Installation and Basic Configuration. Configuring the Oracle Server for Oracle Ultra Search. The operations described in this section are database administration operations. They can be performed using Oracle Enterprise Manager or SQL*Plus. Step 1
193. ion on how to configure a service setting. After SQL*Plus is running, log on to the database using the schema and password that you located in Step 2. 4. Invoke the registration script. Start up SQL*Plus as the WKSYS super-user and enter the following: @full_path_of_registration_script. The registration script for RMI-based remote crawling is the following: $REMOTE_ORACLE_HOME/ultrasearch/tools/remotecrawler/scripts/<platform>/register.sql. The registration script for JDBC-based remote crawling is the following: $REMOTE_ORACLE_HOME/ultrasearch/tools/remotecrawler/scripts/<platform>/register_jdbc.sql. For example, if the value for REMOTE_ORACLE_HOME on a UNIX host is /home/oracle9i, then enter the following at the SQL*Plus prompt to register an RMI-based remote crawler: @/home/oracle9i/ultrasearch/tools/remotecrawler/scripts/unix/register.sql. Likewise, if you are running SQL*Plus on Windows and REMOTE_ORACLE_HOME is in d:\Oracle\Oracle9i, then enter the following at the SQL*Plus prompt to register a JDBC-based remote crawler: @d:\Oracle\Oracle9i\ultrasearch\tools\remotecrawler\scripts\winnt\register_jdbc.sql. The RMI-based registration script prompts you for three variables. RMI_HOSTNAME: The remote hostname. This is where the RMI registry daemon will run. RMI_REGISTRY_PORT: The port that the RMI registry is listening on. ORACLE_HOME: The Oracle home located in S
194. ion is complete, the system administrator should re-activate the crawling schedule to re-index the documents. You do not need to reconfigure the system or re-enter any data. You can still query documents that were crawled and indexed by the previous version. Oracle Ultra Search Migration Approaches. There are two approaches to migrate user data: the in-place approach and the ETL (extract, transform, load) approach. With the in-place approach, the current ORACLE_HOME is used. With the ETL approach, a new ORACLE_HOME is created. Oracle Ultra Search In-Place Migration. In-place migration upgrades existing configurations and user data to the latest Oracle Ultra Search release. Upgraded files are left in place, and the source installation is modified. The benefit of this approach is that it might conserve disk space. With the in-place approach, data migration involves the following six steps: 1. Back up user data. 2. Deinstall previous database objects. 3. Install new database objects. 4. Re-create user instances. 5. Restore data. 6. Rebuild index. Use the SQL script wk0upgrade.sql to run in-place migration steps one through five in the preceding list. The script is located in the $ULTRASEARCH_HOME/admin directory. It requires the following input parameters. SYSPW: password of the user SYS. WKSYSPW: password of the user WKSYS. HOST: database host computer. PORT: database port number. ORACLE_SID: datab
195. ions Tuning the Web Crawling Process; Tuning Query Performance; Using the Remote Crawler; Oracle Ultra Search on Real Application Clusters; Table Data Source Synchronization. Tuning the Web Crawling Process. The Oracle Ultra Search crawler is a powerful tool for discovering information on Web sites in an organization's intranet. This feature is especially relevant to Web crawling. The other data sources (for example, table or email data sources) are defined such that the crawler does not follow any links to other documents that you might not be aware of. Web Crawling Strategy. Your Web crawling strategy can be as simple as identifying a few well-known sites that are likely to contain links to most of the other intranet sites in your organization. You could test this by crawling these sites without indexing them. After the initial crawl, you have a good idea of the hosts that exist in your intranet. You could then define separate Web sources to facilitate crawling and indexing on individual sites. However, in reality, the process of discovering and crawling your organization's intranet is an interactive one, characterized by periodic analysis of crawling results and modification to crawling parameters to direct the crawling process somewhat. For example, if you observe that the crawler is spending days crawling one Web host, then you might want to exclude crawling at that host
196. irectory path as seen by the remote crawler; mount point for mail archive path as seen by the remote crawler, if you are using the Oracle Ultra Search mailing list feature. Otherwise, if there is no shared file system between the remote crawler host and the backend database host, you must select "through JDBC connection" for the Cache file access mode setting. Then specify values for the following parameters: local cache directory as seen by local crawlers on the backend database host; local mail archive directory as seen by local crawlers on the backend database host. Crawler log directory: It is not necessary that the remote crawler log directory be an NFS mount (a central location accessible by the backend Oracle Ultra Search database). However, it is beneficial to do so if you want to be able to monitor all crawler logs (local as well as all remote crawlers) in one central location. Additionally, you must specify the following crawler parameters before you can begin crawling: number of crawler threads that the remote crawler uses for gathering documents; number of processors on the remote crawler host; initial Java heap size; maximum Java heap size; Java classpath. It is not usually necessary to specify this classpath. The classpath that remote crawler processes use is inherited from the RMI subsystem. The RMI subsystem classpath is configured by the scripts used to launch it. You will need to modify the classpath onl
197. is added to the alias list, then emails sent to that address are treated as if they were sent to list@company.com. Specify the ACL (access control list) policy for the data source. When a user performs a search, the ACL controls which documents the user can access. The default is no ACL, with all documents considered searchable and visible. You can add more than one group and user to the ACL for the data source. File Sources. A file source is the set of documents that can be accessed through the file protocol on the local machine. To edit the name of a file source, click Edit. Creating File Sources. To create a new file source, do the following: 1. Specify a name for the file source and the default language. 2. Designate files or directories to be crawled. If a URL represents a single file, then the Oracle Ultra Search crawler searches only that file. If a URL represents a directory, then the crawler recursively crawls all files and subdirectories in that directory. 3. Specify inclusion and exclusion paths to modify the crawling space associated with this file source. This step is optional. An inclusion path limits the crawling space. An exclusion path lets you further define the crawling space. If neither path is specified, then crawling is limited to the underlying file system access privileges. Path rules are host-specific, but you can specify more than one path rule for each host. For example, on the same host, you can include path fil
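The file-versus-directory behavior described in step 2 can be illustrated with a small sketch. This is not the crawler's code; it only assumes the rule stated above (a single file yields just that file, a directory is walked recursively), and the function name `files_to_crawl` is hypothetical.

```python
import os
import tempfile


def files_to_crawl(path):
    """Hypothetical helper: list the files a file-source crawl would visit."""
    if os.path.isfile(path):
        return [path]  # a single file: only that file is searched
    found = []
    for root, _dirs, files in os.walk(path):  # a directory: recurse into subdirectories
        for name in files:
            found.append(os.path.join(root, name))
    return sorted(found)


# Build a throwaway directory tree so the sketch runs without touching real data.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(top, rel), "w") as fh:
            fh.write("x")
    listed = [os.path.relpath(p, top) for p in files_to_crawl(top)]
    single = [os.path.relpath(p, top) for p in files_to_crawl(os.path.join(top, "a.txt"))]

print(listed)
print(single)
```

Inclusion and exclusion paths from step 3 would then be applied as a filter over this list.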
198. is installed with the Oracle Ultra Search backend during the database server install. This is also part of the database client install. The Oracle Ultra Search middle tier is installed and configured with the Oracle J2EE container (OC4J). See Also: Oracle Application Server 10g Administrator's Guide for information on how to change the Infrastructure Services (for example, a different Oracle Internet Directory or Metadata Repository) used by an Oracle Ultra Search middle tier. Web Applications Concepts. The Oracle Ultra Search administration tool and the Oracle Ultra Search query applications are J2EE-compliant Web applications. These are three-tier architecture applications. Figure 3-1 shows the relationship between the browser (the first tier), the Web server and the servlet engine (the middle tier), and the Oracle Database (the third tier). The Web server accepts requests from the browser and forwards the requests to the servlet engine for processing. The Oracle Ultra Search middle tier then communicates with the Oracle database through JDBC, as in Figure 3-1. You can use any browser to access the Oracle Ultra Search administration tool or Oracle Ultra Search sample query application. The URLs are described in the following section. Figure 3-1 Oracle Ultra Search Architecture Oracle Ultra
199. ists of the following steps: 1. Get the next URL from the URL queue. (Web crawling stops when the queue is empty.) 2. Fetch the contents of the URL. 3. Extract URL links from the contents. 4. Insert the links into the URL queue. The generated new URL link is subject to all existing host, path, and mimetype inclusion and exclusion rules. There are two possible operations that can be done on the extracted URL link: filtering removes the unwanted URL link; rewriting transforms the URL link. URL Link Filtering. Users control what type of URL links are allowed to be inserted into the queue with the following mechanisms supported by the Oracle Ultra Search crawler: robots.txt file on the target Web site (for example, disallow URLs from the cgi directory); hosts inclusion and exclusion rules (for example, only allow URLs from www.acme.com); file path inclusion and exclusion rules (for example, only allow URLs under the archive directory); mimetype inclusion rules (for example, only allow HTML and PDF files); robots metatag NOFOLLOW (for example, do not extract any link from that page); black list URL (for example, URL explicitly singled out not to be crawled). Note: All URLs must pass domain rules before being checked for path rules. Path rules let you further restrict the crawling space. Path rules are host specifi
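The four crawling steps, together with rewriting and filtering of extracted links, can be sketched as a simple queue loop. This is a toy model under stated assumptions, not the Oracle Ultra Search implementation: the function `crawl`, its callback parameters, and the in-memory `SITE` map standing in for real HTTP fetches are all hypothetical. The rewritten link is checked against the boundary rules before it is queued, following the statement that the generated URL is subject to the existing inclusion and exclusion rules.

```python
from collections import deque


def crawl(seeds, fetch_links, allowed, rewrite=lambda url: url):
    """Return URLs in the order they were fetched (hypothetical sketch)."""
    queue = deque(seeds)
    seen = set(queue)
    fetched = []
    while queue:                        # crawling stops when the queue is empty
        url = queue.popleft()           # 1. get the next URL from the queue
        links = fetch_links(url)        # 2-3. fetch the contents and extract links
        fetched.append(url)
        for link in links:
            new_link = rewrite(link)    # rewriting transforms the link
            if new_link is None or not allowed(new_link):
                continue                # filtering removes the unwanted link
            if new_link not in seen:
                seen.add(new_link)
                queue.append(new_link)  # 4. insert the link into the queue
    return fetched


# Toy "Web": each URL maps to the links found on that page.
SITE = {
    "http://www.acme.com/": ["http://www.acme.com/a?session=1", "http://other.example/x"],
    "http://www.acme.com/a": ["http://www.acme.com/", "http://www.acme.com/cgi/run"],
}


def fetch_links(url):
    return SITE.get(url, [])


def allowed(url):
    # Host inclusion rule plus a robots.txt-style exclusion of the cgi directory.
    return url.startswith("http://www.acme.com/") and "/cgi/" not in url


def strip_query(url):
    # Example rewrite: drop the query string so session URLs collapse to one page.
    return url.split("?", 1)[0]


order = crawl(["http://www.acme.com/"], fetch_links, allowed, strip_query)
print(order)
```

Keeping the filter and the rewriter as plain callbacks mirrors the text's separation of the two operations and lets either be replaced independently.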
200. ith links to the Springmaster 2000 Cantaloupe Tray Problem table. 3 Installing and Configuring Oracle Ultra Search. This chapter contains the following topics: Oracle Ultra Search Requirements; Installing the Oracle Ultra Search Backend; Configuring the Default Oracle Ultra Search Instance; Installing the Oracle Ultra Search Middle Tier on Web Server Hosts; Installing the Backend on Remote Crawler Hosts; Configuring Oracle Ultra Search in a Hosted Environment. Oracle Ultra Search Requirements. This section describes the Oracle Ultra Search system requirements. Hardware Requirements. Oracle Ultra Search hardware requirements are based on the quantity of data that you plan to process using Oracle Ultra Search. Oracle Ultra Search uses Oracle Text as its indexing engine and the Oracle database as its repository. See Also: Oracle Text Application Developer's Guide and Oracle Database Performance Tuning Guide. Sufficient RAM: Along with the resource requirements for the database and the Text indexing engine, also consider the memory requirements of the Oracle Ultra Search crawler. The Oracle Ultra Search crawler is a pure Java program. Out of the box, when the crawler is launched, the JVM is configured to start with 25MB and grow to 256MB. When crawling very large amounts of data, these values might n
201. its to return, and so on. Lets you set the query session language. Lets you access Oracle Ultra Search tables to retrieve Oracle Ultra Search dictionary data, such as all defined data groups and attributes. Lets you customize and generate your query interface and search result screen with procedures that return blocks of HTML code that you can embed into your Web application. Lets the search end user submit URLs to the seed URL list. The Oracle Ultra Search Java query API is encapsulated in the oracle.ultrasearch.query package. See Also: Tuning Query Performance on page 5-3. Customizing the Query Syntax Expansion. Oracle Ultra Search uses the Oracle Text engine to index and search documents. When an end user specifies a certain query string, Oracle Ultra Search takes that string and transforms it into an Oracle Text query expression. This process is called query syntax expansion. You can customize Oracle Ultra Search to use your own implementation of the query syntax expansion. The default query expansion lets you specify a query syntax similar to most internet search engines. The syntax boosts scores for documents that match the user's query in the document title string attribute. The syntax for Contains is the same when used on the document content and on string attributes. The default query syntax expansion is implemented in the oracle.ultrasearch.query.Contains class. To customize query expansion, use the oracle.ultrasea
202. ivilege or the super-user privilege, then log on as an Oracle Ultra Search super-user or WKSYS and go to the Users page in the administration tool to grant the appropriate privilege. Step 5: Alter the Index Preferences. This step is optional. An empty index is created when an Oracle Ultra Search instance is created. The existing index preferences, such as language-specific parameters, are defined in the $ORACLE_HOME/ultrasearch/admin/wk0pref.sql file. You can modify these preferences so that all new Oracle Ultra Search instances use the modified preferences, or you can alter the index using your own preferences immediately after an instance is created. Alter the index using SQL. Note: The crawler transforms all documents into HTML files with binary document filtering before indexing begins. See Also: Oracle Text Application Developer's Guide; Oracle Text Reference. Managing Stoplists. Every Oracle Ultra Search instance has a stoplist associated with it. A stoplist is a list of words that are ignored during the indexing process. These words are known as stopwords. Stopwords are not indexed because they are deemed not useful, or even disruptive, to the performance and accuracy of indexing. Default Oracle Ultra Search Stoplist. During the installation process, a default stoplist is created for the Oracle Ultra Search product. Subsequently, when a
203. k2 columns. An F is inserted into the mark column to signal the crawler that work needs to be done for this row. For example: CREATE OR REPLACE TRIGGER wk$del AFTER DELETE ON employees FOR EACH ROW BEGIN INSERT INTO WK$LOG(k1, k2, mark) VALUES(:old.id, :old.name, 'F'); END; / Synchronizing Crawling of Non-Oracle Databases. For tables in non-Oracle remote databases, you must perform the following steps: 1. Manually create the log table. The log table must conform to the rules for log tables described earlier. Also, it must reside in the same schema and database instance as the base table. 2. Create three triggers that record inserts, updates, and deletes on the base table. These triggers must exhibit the same behavior as the triggers described earlier for Oracle tables. 3. Associate the log table. When you have completed these tasks, choose the Enable logging mechanism (non-Oracle tables) option during the creation of an Oracle Ultra Search table data source. When you choose that option, the Oracle Ultra Search administration tool prompts you for the name of the log table in the remote database. Oracle Ultra Search associates this log table with the base table. Oracle Ultra Search assumes that you have correctly performed steps 1 and 2.
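The delete trigger and log table can be tried out locally with a small analog. The sketch below uses SQLite through Python's standard `sqlite3` module, not Oracle, so the syntax differs slightly (`old.id` instead of `:old.id`, no `OR REPLACE`). The table layout, the `wk_log` and `wk_del` names, and the sample rows are assumptions made for illustration; only the k1, k2, and mark columns and the 'F' marker come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER, name TEXT, salary INTEGER);

-- Assumed log table layout: the two key columns plus the mark column.
CREATE TABLE wk_log (k1 INTEGER, k2 TEXT, mark TEXT);

-- SQLite analog of the delete trigger: record the old key values with mark 'F'.
CREATE TRIGGER wk_del AFTER DELETE ON employees
FOR EACH ROW
BEGIN
    INSERT INTO wk_log (k1, k2, mark) VALUES (old.id, old.name, 'F');
END;

INSERT INTO employees VALUES (1, 'Ada', 100), (2, 'Bob', 90);
""")

# Deleting a base-table row should leave one log row for the crawler to process.
conn.execute("DELETE FROM employees WHERE id = 2")
log_rows = conn.execute("SELECT k1, k2, mark FROM wk_log").fetchall()
print(log_rows)
```

The insert and update triggers required in step 2 would follow the same pattern, using the new or old key values as appropriate.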
204. k_test schema. To access the administration application, start up the Oracle Ultra Search middle tier: $ORACLE_HOME/bin/searchctl start. Once the middle tier has been started, you can access the Administration GUI by means of an HTML browser pointed to http://your_computer.com:http_port/ultrasearch/admin/index.jsp. At the end of the installation, you should be able to find http_port on the Oracle Universal Installer screen. You can also find the value of http_port by looking at $ORACLE_HOME/oc4j/j2ee/OC4J_SEARCH/config/http-web-site.xml. Setting up the Sample Query Application. Oracle Ultra Search provides APIs with which you can build customized J2EE query applications. Oracle Ultra Search also includes a sample application of this sort. To set up the sample query application, do the following (put in your own values for the host name, the port, and the SID): 1. Add the following entry to the $ORACLE_HOME/oc4j/j2ee/OC4J_SEARCH/config/data-sources.xml file: <data-source class="oracle.jdbc.pool.OracleConnectionCacheImpl" name="UltraSearchDS" location="jdbc/UltraSearchPooledDS" username="wk_test" password="wk_test" url="jdbc:oracle:thin:@database_host:oracle_port:oracle_sid"/> 2. If you started the Oracle Ultra Search middle tier in order to access the Oracle Ultra Search administration application, you must st
205. le for both a domain rule and a path rule. Exclusion rules always override inclusion rules. Path rules are always host-specific. For example, an inclusion domain ending with oracle.com limits the Oracle Ultra Search crawler to hosts belonging to Oracle worldwide. Anything ending with oracle.com is crawled, but http://www.oracle.com.tw is not crawled. If you change the inclusion domain to someurl.com with a new seed http://www.someurl.com, then all oracle.com URLs are dropped by the crawler. An exclusion domain uk.oracle.com prevents the crawler from crawling Oracle hosts in the United Kingdom. You can also include or exclude Web sites with a specific port. By default, all ports are crawled. You can have port inclusion or port exclusion rules for a specific host, but not both. All URLs must pass domain rules before being checked for path rules. Path rules let you further restrict the crawling space. Path rules are host-specific, but you can specify more than one path rule for each host. For example, on the same host, you can include the path host/doc and exclude the path host/doc/private. Note that path rules are prefix-based. Regular-expression-based domain and path rules are not supported in the current release. The following rules restrict the crawler to only crawl www.oracle.com and otn.oracle.com. Furthermore, only URLs
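The way domain and path rules combine can be made concrete with a short sketch. It is a simplified model, not the crawler's actual logic: the function `in_boundary`, its parameters, and the `host/path` notation for path rules are assumptions. It encodes only what the text states: domain rules are suffix matches, exclusion overrides inclusion, domain rules are checked before path rules, and path rules are host-specific prefix matches. Port rules are left out.

```python
from urllib.parse import urlsplit


def in_boundary(url, include_domains, exclude_domains,
                include_paths=(), exclude_paths=()):
    """Hypothetical check: does the URL fall inside the crawling space?"""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()

    # Domain rules come first; exclusion always overrides inclusion.
    if any(host.endswith(d) for d in exclude_domains):
        return False
    if not any(host.endswith(d) for d in include_domains):
        return False

    # Path rules are prefix-based and host-specific, written here as "host/path".
    target = host + parts.path
    if any(target.startswith(p) for p in exclude_paths):
        return False
    host_includes = [p for p in include_paths if p.split("/", 1)[0] == host]
    if host_includes and not any(target.startswith(p) for p in host_includes):
        return False
    return True


# Domain examples taken from the text.
ok_oracle = in_boundary("http://www.oracle.com/index.html", ["oracle.com"], ["uk.oracle.com"])
ok_taiwan = in_boundary("http://www.oracle.com.tw/", ["oracle.com"], ["uk.oracle.com"])
ok_uk = in_boundary("http://www.uk.oracle.com/", ["oracle.com"], ["uk.oracle.com"])

# Path examples: include host/doc, exclude host/doc/private.
ok_doc = in_boundary("http://host/doc/a.html", ["host"], [], ["host/doc"], ["host/doc/private"])
ok_private = in_boundary("http://host/doc/private/x", ["host"], [], ["host/doc"], ["host/doc/private"])
ok_other = in_boundary("http://host/other", ["host"], [], ["host/doc"], ["host/doc/private"])
print(ok_oracle, ok_taiwan, ok_uk, ok_doc, ok_private, ok_other)
```

Plain string matching is used deliberately, since the text says regular-expression rules are not supported; note that a bare suffix or prefix test is coarse and would also accept a host such as notoracle.com or a path such as host/docs.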
206. lespace. See Also: Oracle Database Administrator's Guide for information on how to create a temporary tablespace. Step 3: Create a Large Tablespace for Each Oracle Ultra Search Instance User. For each Oracle Ultra Search instance, you must create a tablespace large enough to contain all data obtained during the crawling and indexing processes. This amount is subject to the amount of data you intend to crawl and index. However, it is often not possible to know in advance how much data you intend to collect. Try to obtain an estimate of the cumulative size of all data you want to crawl. If you cannot estimate the size, then try to allocate as much space as possible. If you run out of disk space, then Oracle Ultra Search is able to resume crawling after you add more datafiles to the instance tablespace. Here is an example of how to create a new tablespace: CREATE TABLESPACE lmtbsb DATAFILE '/u02/oracle/data/lmtbsb01.dbf' SIZE 150M; Pay attention to the STORAGE clause in your CREATE TABLESPACE statement. The amount of data to be stored in the tablespace can be very large. This can cause the Oracle server to progressively allocate many new extents when more storage space is needed. If the extent management clause specifies that each new extent is to be larger than the previous extent (that is, the PCTINCREASE setting is nonzero), then you could enc
207. lication Server Metadata Repository (MR) creation; it can also be installed as a standalone component. Installing As Part of Oracle Application Server Metadata Repository Creation. The Oracle Ultra Search backend is installed as part of the Oracle Application Server Metadata Repository (MR) creation. You can create the Metadata Repository using the Oracle Universal Installer (OUI), or you can create the MR on top of an existing customer database using RepCA (Repository Creation Assistant). For more information, refer to the Oracle Application Server 10g Installation Guide and the Oracle Application Server Repository Creation Assistant guide, respectively. IMPORTANT: If you are using RepCA to create the Metadata Repository, then before you can use Oracle Ultra Search you need to perform the following post-install configuration steps. If not already present, install JDK 1.4.1 or later on the system on which the MR was installed. Go to the ultrasearch/admin directory of the RepCA CD. Then execute the wkrepca.sql script with SQL*Plus. You will have to connect as the wksys user and pass to the script the path to the JDK 1.4.1 (or later) Java executable. For example: sqlplus wksys/schema_password @repca_cd/ultrasearch/admin/wkrepca.sql /usr/local/jdk1.4/bin/java. This will tell the Oracle Ultra Search backend where to find the Java executable. Installi
208. login, only super-users can grant or revoke the super-user privilege to and from other database users. In SSO mode: If the login SSO user belongs to the default subscriber, then the user can do the following: grant or revoke the super-user privilege to SSO users in the default subscriber; grant or revoke the super-user privilege to SSO users in a particular subscriber. If the login SSO user belongs to a particular subscriber, then the user can grant or revoke the super-user privilege to users within the same subscriber to which the login user belongs. Privileges to Grant or Revoke an Instance Administrator. To grant or revoke an instance administrator, log in to the admin tool as a super-user or an instance administrator. In non-SSO mode (database user login), only super-users or instance administrators can grant or revoke the instance admin privilege to and from other database users. In SSO mode: The login SSO user can grant or revoke only the instance admin privilege to SSO users within the subscriber the instance is associated with. For example, the user can grant the admin privilege on Instance 2 or Instance 3 to an SSO user in subscriber A. The login SSO user cannot grant or revoke the instance admin privilege to SSO users within a different subscriber. For example, the user cannot grant the admin privilege on Instance 2 or Inst
209. lowing: Log on to Oracle Ultra Search; Create Oracle Ultra Search instances; Manage administrative users; Define data sources and assign them to data groups; Configure and schedule the Oracle Ultra Search crawler; Set query options; Translate search attributes, LOV, and data group display names to different languages. Setting Crawler Parameters. To configure the Oracle Ultra Search crawler, you must do the following. Set crawler parameters, such as the crawler log file directory. To do so, use the Crawler Page. Set Web access parameters, such as authentication and the proxy server. To do so, use the Web Access Page. Define data sources. Data sources can be Web pages, database tables, files, email mailing lists, Oracle Sources (for example, Oracle Application Server Portals or federated sources), or user-defined data sources. You can assign one or more data sources to a crawler schedule. To define data sources, use the Sources Page. You can also set parameters for the source, such as domain inclusions or exclusions for Web sources or the display URL template or column for table sources. Define synchronization schedules. The crawler uses the synchronization schedule to reconcile the Oracle Ultra Search index with current data source content. To define crawling schedules, use the Schedules Page. Setting Query Options. Use query options to let users limit th
210. m and click Apply. e. Create Table Source: Step 4. Specify the table columns and search attribute mappings you would like the Oracle Ultra Search crawler to crawl on the database. For this example, select RESOLUTION_TEXT for table column with search attribute of Description (String), and CUSTOMER_NAME for table column with search attribute of Author (String). After you make each selection, click the Map button to add the document types to the list of document types for crawling. Click Proceed to step 5. f. Create Table Source: Step 5. Select No display URL and click Finish. 3. Schedule the Ultra Search Crawler. Refer to Step 6 on page 2-10 in Crawl and Index Ultra Appliance's Intranet Documents for procedures on how to set up a crawler schedule. Select Table from the drop-down list seen on the Create Schedule: Step 3 of 3 screen. 4. Execute crawling and indexing as shown in Step 7 on page 2-11 in Crawl and Index Ultra Appliance's Intranet Documents. Issuing a Query. This section describes the steps that you, acting as the Ultra Appliance call center agent, take to query the company intranet and database with Oracle Ultra Search. You can query the Ultra Appliance information sources after you have allowed Oracle Ultra Search to crawl and index the Ultra Appliance intranet and database. To query the Ultra Appliance intranet: 1. Enter the URL for the Oracle Ultra Search locati
211. matically fill out during Web crawling. HTML form authentication requires that HTTP cookie functionality is enabled. See Also: Creating Web Sources on page 8-22. Indexing Control of Dynamically Generated Web Pages: The crawler can be configured to not index Web pages that are dynamically generated (for example, if a URL contains a question mark). See Also: Creating Web Sources on page 8-22. HTTPS: Oracle Ultra Search now supports HTTPS (HTTP over SSL). The Oracle Ultra Search crawler can now crawl HTTPS URLs, for example https://www.foo.com. See Also: Creating Web Sources on page 8-22. Secure Searching: Oracle Ultra Search now supports secure searches. Secure searches return only documents that the search user is allowed to view. Each indexed document can be protected by an access control list (ACL). During searches, the ACL is evaluated. If the user performing the search has permission to read the protected document, then the document is returned by the query API. Otherwise, it is not returned. Oracle Ultra Search stores ACLs in the Oracle XML DB repository. Oracle Ultra Search also uses Oracle XML DB functionality to evaluate ACLs. See Also: Secure Search on page 1-6. Remote Crawler JDBC Caching Support: It is now possible to use the remote crawler without mounting the remote cache directory to the server machine. Instead, the cache files are sent over the crawler's JDBC connection to the server cache di
212. mple, if a schedule for data source 23 of instance 3 is launched at 10 pm, July 8th, then the log file name is i3ds23.07082200.log. Each successive schedule launching will have a unique log file name. If the total number of log files for a data source reaches the system-specified limit, then the oldest log file will be deleted. The number of log files is a scheduler property and applies to all of the data sources assigned to the scheduler. Database Connect String: The database connect string is a standard JDBC connect string used by the remote crawler when it connects to the database. The connect string can be provided in the form hostname:port:sid or in the TNS keyword-value syntax, for example (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=...)(PORT=1521)...). See Also: Oracle Database JDBC Developer's Guide and Reference. You can update the JDBC connect string to a different format, for example an LDAP format. However, you cannot change the JDBC connect string to point to a different database. The JDBC connect string must be set to the database where the middle tier points; that is, the middle tier and the JDBC connect string should point to the same database. In a Real Application Clusters environment, the TNS keyword-value syntax should be used, because it allows connection to any node of the system. For example: (DESCRIPTION=(LOAD_BALANCE=yes)
213. n Oracle Ultra Search instance is created, a copy of the default stoplist is created for the Oracle Ultra Search instance. The default stoplist is created under the WKSYS schema. The default stoplist name is wk_stoplist. This list is defined in the file $ORACLE_HOME/ultrasearch/admin/wk0pref.sql, which is run at installation. Modifying Instance Stoplists: Modify the default stoplist by adding or removing stopwords from it. However, remember that these modifications do not affect existing Oracle Ultra Search instances. They only affect Oracle Ultra Search instances that are created after the modifications are made. Modifying instance stoplists should be done as a last resort. Use one of the following methods: modify the default stoplist before creating the instance, or replace the instance stoplist immediately after creating the instance. Replacing the instance stoplist immediately after creating the instance affects only that instance. You must first create a user-defined stoplist. In both cases, the result is that the Oracle Ultra Search instance stoplist is modified and defined before initial crawling. This means that all documents collected by the Oracle Ultra Search crawler are evaluated against the correct stoplist. It is important to modify the stoplist before initial crawling to avoid having to recrawl all documents. Modifying Instance Stoplists Before Ini
214. n an HTML document does not have its character set specified. Cache Directory: Specify the absolute path of the cache directory. During crawling, documents are stored in the cache directory. Every time the preset size is reached, crawling stops and indexing starts. If you are crawling sensitive information, then make sure that you set the appropriate file system read permission on the cache directory. You can choose whether or not to have the cache cleared after indexing. Crawler Logging: Specify the following: the level of detail (everything or only a summary), the crawler logfile directory, and the crawler logfile language. The log file directory stores the crawler log files. The log file records all crawler activity, warnings, and error messages for a particular schedule. It includes messages logged at startup, runtime, and shutdown. Logging everything can create very large log files when crawling a large number of documents. However, in certain situations, it can be beneficial to configure the crawler to print detailed activity to each schedule log file. The crawler logfile language is the language the crawler uses to generate the log file. The crawler maintains multiple versions of its log file. The format of the log file name is i<instance_id>ds<data_source_id>.MMDDhhmm.log, where MM is the month, DD is the date, hh is the launching hour in 24-hour format, and mm is the minutes. For exa
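The log file naming rule above can be illustrated with a short Java sketch. This is not Oracle Ultra Search code: the class and method names are hypothetical, and the dot separators in the pattern are an assumption, since punctuation was lost from the extracted text.

```java
import java.util.Locale;

// Hypothetical helper, not part of Oracle Ultra Search.
final class CrawlerLogNames {
    private CrawlerLogNames() {}

    // Builds i<instance_id>ds<data_source_id>.MMDDhhmm.log from the launch time.
    // The "." separators are assumed; the hour is in 24-hour format.
    static String logFileName(int instanceId, int dataSourceId,
                              int month, int day, int hour, int minute) {
        return String.format(Locale.ROOT, "i%dds%d.%02d%02d%02d%02d.log",
                instanceId, dataSourceId, month, day, hour, minute);
    }
}
```

Under these assumptions, a schedule for data source 23 of instance 3 launched at 22:00 on July 8th would map to `logFileName(3, 23, 7, 8, 22, 0)`.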
215. nCacheImpl for this data source. In addition, the data source should contain the field location equal to jdbc/UltraSearchPooledDS, user name and password equal to the Oracle Ultra Search instance owner's database user name and password, and URL equal to the JDBC connection string in the form jdbc:oracle:thin:@database_host:port:oracle_sid. See Also: Editing the data-sources.xml File on page 3-21 for the data source configuration of the Oracle J2EE container. Deploying the Oracle Ultra Search Portlet: Oracle Ultra Search Portlet is a Web application contained in the $ORACLE_HOME/ultrasearch/webapp/ultrasearch_portlet.ear file. This file is compliant with the J2EE 1.2 standard. This file is similar to sample.ear in terms of file structure. Extract the archived file by running the following command: jar xvf ultrasearch_portlet.ear. The contents of ultrasearch_portlet.ear are META-INF/application.xml, query.war, agent, and index.html. All the query JSP pages are contained in query.war. This file is a servlet 2.2 compliant Web application. You can deploy it alone with any servlet 2.2 engine. The context root for query.war is /provider/ultrasearch. It is defined in the META-INF/application.xml of the ultrasearch_portlet.ear file. You can change it by editing this file. The following Java libraries are needed for the Oracle Ultra Search Portlet
216. nce 9-11; <iterAttributes> Tag: Show All Search Attributes 9-13; <iterGroups> Tag: Show All Search Group 9-13; <iterLanguages> Tag: Show All Search Languages 9-14; <iterLOV> Tag: Show All Values Defined for a Search Attribute 9-15; Formulating the Query 9-15; <getResult> Tag: Perform Search 9-15; <fetchAttribute> Tag: Metadata Selection 9-16; <showHitCount> Tag: Show Estimated Hit Count 9-17; <iterResult> Tag: Render the Results 9-18; <showAttributeValue> Tag: Render a Document Attribute 9-18; Oracle Ultra Search Crawler Agent API 9-19; Crawler Agent Overview 9-19; Standard Agent 9-20; Smart Agent
217. nd is installed. Locate the file $ORACLE_HOME/ultrasearch/admin/wk0addcpath.sql. Using SQL*Plus, run the wk0addcpath.sql script as the WKSYS super-user or as a database user that has been granted the super-user privileges. This script only updates the CRAWLER_CONFIG_DEFAULT table. You also need to reconfigure your crawlers to get the WK$CRAWLER_CONFIG table updated correctly. When prompted, specify whether you want to alter the default classpath or an instance-specific classpath. Altering the default classpath causes all subsequently created instances to use that classpath. Existing instances are not modified. When prompted, enter the Oracle Ultra Search instance name if you are attempting to modify an instance-specific classpath. If you are modifying the default classpath, then you do not need to enter anything here. When prompted, specify whether you want to update the entire classpath or append to it. Appending to a classpath adds entries to the beginning of it. Usually, earlier entries in the classpath override later entries in the case of duplicate classes. When prompted, enter the new classpath if updating the entire classpath. If you are appending one or more directories or library files to the classpath, then enter these separated by the classpath separator for the platform where the Oracle Ultra Search backend is installed: the colon on UNIX platforms or the semicolon on Windows. Altering the Crawler Java Cla
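The effect of "appending" described above (new entries land at the beginning of the classpath, so they take precedence over duplicates later in the list) can be sketched in Java. This illustrates only the string handling, not the wk0addcpath.sql script itself; the class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical illustration of how appended entries are placed on a classpath.
final class ClasspathEdits {
    private ClasspathEdits() {}

    // Puts newEntries in front of the existing classpath, joined by the
    // platform separator (":" on UNIX, ";" on Windows).
    static String prepend(String existing, List<String> newEntries, String separator) {
        String added = String.join(separator, newEntries);
        if (existing == null || existing.isEmpty()) {
            return added;
        }
        if (added.isEmpty()) {
            return existing;
        }
        return added + separator + existing;
    }
}
```

In real code the separator would typically come from `java.io.File.pathSeparator`; it is passed explicitly here so that the behavior is the same on every platform.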
218. ndex Preferences 4-6; Managing Stoplists 4-7; Default Oracle Ultra Search Stoplist 4-7; Modifying Instance Stoplists 4-7; Modifying Instance Stoplists Before Initial Crawling 4-8; Modifying Instance Stoplists After Initial Crawling 4-9; Upgrading Oracle Ultra Search 4-9; Pre-Upgrade Steps 4-10; Upgrading Oracle Ultra Search Shipped with Oracle Database 4-10; Upgrading Oracle Ultra Search Shipped with Oracle Application Server 4-10; Upgrading Oracle Ultra Search Shipped with Oracle Collaboration Suite 4-11; Upgrading Oracle Ultra Search to Oracle Collaboration Suite Release 1 4-11; Upgrade from Oracle Ultra Search 1.0.3 to 9.0.3 4-11; Upgrade from Oracle Ultra Search 9.0.2 to 9.0.3 4-14; Upgrade from Oracle Ultra Search 9.2 to 9.0.3 4-15; Configuring t
219. ... 7-12; Oracle Ultra Search Crawler Status Codes 7-12; 8 Understanding the Oracle Ultra Search Administration Tool: Oracle Ultra Search Administration Tool 8-1; Setting Crawler Parameters 8-2; Setting Query Options 8-3; ... 8-3; Data Groups 8-3; Online Help in Different Languages 8-3; Logging On to Oracle Ultra Search 8-3; Logging On and Managing Instances as SSO Users 8-5; Logging On to Oracle Ultra Search 8-5; Granting Privileges to SSO Users 8-5; Instances Page 8-6; Creating an Instance 8-7; Creating a
220. ng and the default character set. To do so, use the Crawler Settings Page in the administration tool. See Also: Crawler Page on page 8-12. Crawler Data Sources: In addition to the Web access parameters, you can define specific data sources on the Sources page in the administration tool. You can define one or more of the following data sources: Web sites, database tables, files, mailing lists, Oracle Application Server Portal page groups, and user-defined data sources (requires a crawler agent). Using Crawler Agents: If you are defining a user-defined data source to crawl and index a proprietary document repository or management system, such as Lotus Notes or Documentum, then you must implement a crawler agent as a Java class. The agent collects document URLs and associated metadata from the proprietary document source and returns the information to the Oracle Ultra Search crawler, which enqueues it for later crawling. For more information on defining a new data source type, see the User-Defined sub-tab in the Sources page in the administration tool. Synchronizing Data Sources: You can create synchronization schedules with one or more data sources attached to them. Synchronization schedules define the frequency at which the Oracle Ultra Search index is kept up to date with existing information in the associated data sources. To define a synchronization schedule, use the So
221. ng Into an Existing Database: Oracle Application Server MR can be installed on top of an existing database; this is the preferred way of installing Oracle Ultra Search into an existing database. Although doing so carries the overhead of installing every other Oracle Application Server component's schema, it does give you the benefits of Oracle Application Server infrastructure: well-defined High Availability practices, automated IM integration and re-association, and so on. Nevertheless, you can install just Oracle Ultra Search into an existing database. Let's assume ORACLE_HOME is where you want to install the Oracle Ultra Search backend. To install into an existing database, follow these steps: 1. If $ORACLE_HOME/ultrasearch already exists, back it up by renaming it, for example to $ORACLE_HOME/ultrasearch.old. Copy repca_dir/ultrasearch to $ORACLE_HOME/ultrasearch, where repca_dir represents the topmost directory on the RepCA CD containing the ultrasearch directory. Change the current directory to $ORACLE_HOME/ultrasearch/admin. If the Oracle Ultra Search schema (wksys) already exists on the target database, then uninstall it by executing: sqlplus /nolog @$ORACLE_HOME/ultrasearch/admin/wk0deinstall.sql sys syspw cstr. Following is the meaning of each parameter. Next, execute the SQL*Plus script wk0setup.sql. For example
222. ng the HTML-based administration tool. The loader tool supports the following types of metadata: search attribute list of values (LOVs) and display names, and document relevance boosting and document loading. See Also: Appendix A, Loading Metadata into Oracle Ultra Search. Document Relevancy Boosting: You can override the search results and influence the order in which documents are ranked in the query result list with document relevancy boosting. This can promote important documents to higher scores and make them easier to find. Relevancy boosting assigns a score to a document for specific queries entered by the search user. Note: The document still has a score computed by Oracle Text if you enter a query that is not one of the boosted queries. Relevancy boosting has the following limitations. Comparison of the user's query against the boosted queries uses exact string match. This means that the comparison is case-sensitive and space-aware. Therefore, a document with a boosted score for "Ultra Search" is not boosted when you enter "ultrasearch". Relevancy boosting requires that the query application pass in the search term in the Query API getResult method call. The sample applications are designed to pass the basic search terms as the boost term. Advanced search criteria based on search attributes are ignored. See Also: Queries Page on page
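The exact-match limitation can be made concrete with a small Java sketch. It is only a model of the comparison rule described above, not the Oracle Ultra Search implementation; `BoostTable` and its methods are hypothetical names, and the table holds the boosted queries for a single document to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical model of boosted-query lookup for one document.
final class BoostTable {
    private final Map<String, Integer> boostedScores = new HashMap<>();

    // Registers a boosted score for one exact query string.
    void boost(String query, int score) {
        boostedScores.put(query, score);
    }

    // Exact string match: case-sensitive and space-aware, with no
    // normalization, so "ultrasearch" does not match "Ultra Search".
    OptionalInt boostedScoreFor(String userQuery) {
        Integer score = boostedScores.get(userQuery);
        return score == null ? OptionalInt.empty() : OptionalInt.of(score);
    }
}
```

When the lookup comes back empty, the document would keep the score computed by Oracle Text, as the note above explains.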
223. nstance mode UPDATABLE READ
OUS_SCHEDULES: This view displays schedule data for the current instance.
Column Name | Type | Description
SCH_ID | NUMBER | Schedule ID
SCH_NAME | VARCHAR2(100) | Schedule name
SCH_INTERVAL | VARCHAR2(30) | Schedule interval
SCH_LAST_RUN | DATE | Time stamp of last run
SCH_NEX | DATE | Next scheduled execution
SCH_STATUS | VARCHAR2(100) | Schedule status
SCH_ERROR | VARCHAR2(4000) | Schedule error if failed
SCH_CRAWLER_LOG_NAME | VARCHAR2(100) | Log file name of current running crawler
SCH_CRAWLER_ID | NUMBER | ID of the crawler to execute this schedule
SCH_FORCE_RECRAWL | NUMBER | Force recrawl of the data source
SCH_CRAWL_HOME | VARCHAR2(30) | Schedule crawling mode
OUS_DEFAULT_CRAWLER_SETTINGS: This view shows default crawler settings, like crawling depth and log file directory.
Column Name | Type | Description
DCS_NAME | VARCHAR2(30) | Crawler setting name
DCS_VALUE | VARCHAR2(4000) | Crawler setting value
OUS_CRAWLER_SETTINGS: This view shows crawler settings at the data source level for the current instance.
Column Name | Type | Description
CWS_DS_ID | NUMBER | Data source ID
CWS_NAME | VARCHAR2(30) | Crawler setting name
CWS_VALUE | VARCHAR2(4000) | Crawler setting value
224. nts 2-11; Issuing a Query 2-13; 3 Installing and Configuring Oracle Ultra Search: Oracle Ultra Search Requirements 3-1; Hardware Requirements 3-1; Software Requirements 3-3; Installing the Oracle Ultra Search Backend 3-3; Database Release 3-3; Oracle Application Server Release 3-3; Installing As Part of Oracle Application Server Metadata Repository Creation 3-3; Installing Into an Existing Database 3-4; Post-Installation Tasks for the Oracle Ultra Search Backend 3-6; Enabling Oracle Ultra Search to Process Binary Files 3-6; Configure the Oracle Database for Oracle Ultra Search 3-6; Configure a Secure Oracle Ultra Search Installation 3-6; Backend Reconfiguration After
225. ny data sources as you want. The following section explains how to create and edit data sources. A Web source represents the content on a specific Web site. Web sources facilitate maintenance crawling of specific Web sites. Creating Web Sources: To create a new Web source, do the following: 1. Specify a name for the Web source and a starting address. This is the URL for the crawler to begin crawling. The starting address can be HTTP or HTTPS. Set URL boundary rules to refine the crawling space. You can include or exclude hosts or domains beginning with, ending with, or equal to a specific name. For example, an inclusion domain ending with oracle.com limits the Oracle Ultra Search crawler to hosts belonging to Oracle worldwide. Anything ending with oracle.com is crawled, but http://www.oracle.com.tw is not crawled. If you change the inclusion domain to yahoo.com with a new seed http://www.yahoo.com, then all oracle.com URLs are dropped by the crawler. An exclusion domain uk.oracle.com prevents the crawler from crawling Oracle hosts in the United Kingdom. You can also include or exclude Web sites with a specific port. By default, all ports are crawled. You can have port inclusion or port exclusion rules for a specific host, but not both. Exclusion rules always override inclusion rules. Specify the types of documents the Oracle Ultra Search crawler sh
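The interplay of inclusion and exclusion domains can be sketched as a small Java filter. This is a simplified model of the "ending with" rules only, under the assumption that a host must match at least one inclusion rule; the class and method names are hypothetical and are not part of the crawler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical model of "ending with" URL boundary rules.
final class DomainRules {
    private final List<String> includeSuffixes = new ArrayList<>();
    private final List<String> excludeSuffixes = new ArrayList<>();

    DomainRules includeEndingWith(String suffix) {
        includeSuffixes.add(suffix.toLowerCase(Locale.ROOT));
        return this;
    }

    DomainRules excludeEndingWith(String suffix) {
        excludeSuffixes.add(suffix.toLowerCase(Locale.ROOT));
        return this;
    }

    // Exclusion rules are checked first because they always override
    // inclusion rules. Assumption: a host matching no inclusion rule is rejected.
    boolean allows(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        for (String suffix : excludeSuffixes) {
            if (h.endsWith(suffix)) {
                return false;
            }
        }
        for (String suffix : includeSuffixes) {
            if (h.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }
}
```

With an inclusion of oracle.com and an exclusion of uk.oracle.com, this model accepts www.oracle.com, rejects www.oracle.com.tw because the host does not end with oracle.com, and rejects any host under uk.oracle.com.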
226. of hits: the average number of results returned by each query. Top 50 Queries: This summarizes the 50 most frequent queries in the past 24 hours. Query string: the query string. Average query time: the average time to return a result. Number of queries: the total number of queries in the past 24 hours. Number of hits: the average number of results returned by each query. Frequency: the number of queries divided by the total number of queries over all query strings. Percentage of ineffective queries: the number of ineffective queries divided by the total number of queries over all query strings. Top 50 Ineffective Queries: This summarizes the 50 most frequent queries in the past 24 hours. Each row in the table describes statistics for a particular query string. Query string: the query string. Number of queries: the total number of queries made in the past 24 hours. Percentage of ineffective queries: the number of ineffective queries divided by the total number of queries for that string. Top 50 Failed Queries: This summarizes the top 50 queries that failed over the past 24 hours. A failed query is one where the search engine end user did not locate any query results. The columns are: Query string: the query string. Number of queries: the total number of queries made in the past 24 hours. Frequency: the percentage occurrence of a failed query. Cumulative fre
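The ratio columns above reduce to simple arithmetic. A minimal Java sketch, with hypothetical names and with the zero-total handling chosen here as an assumption:

```java
// Hypothetical helpers for the ratio columns of the query statistics tables.
final class QueryStats {
    private QueryStats() {}

    // Frequency: queries for one string divided by the total over all strings.
    static double frequency(long queriesForString, long totalQueries) {
        if (totalQueries <= 0) {
            return 0.0; // assumption: report zero when nothing was logged
        }
        return (double) queriesForString / totalQueries;
    }

    // Percentage of ineffective queries relative to the given total.
    static double ineffectivePercentage(long ineffectiveQueries, long totalQueries) {
        if (totalQueries <= 0) {
            return 0.0;
        }
        return 100.0 * ineffectiveQueries / totalQueries;
    }
}
```

For example, a query string issued 25 times out of 100 logged queries has a frequency of 0.25.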
227. on. For example: http://your_computer.com:http_port/ultrasearch/query/search.jsp. 2. Enter Springmaster 2000 in the Search For field. You should see the output displayed in Figure 2-2. [Figure 2-2: Oracle Ultra Search Search Output] To query the Ultra Appliance database: a. Enter Cantaloupe Wrong Color in the Search For field. You should see output similar to the display in Figure 2-2 w
228. on different computers to enhance security or scalability. If you do not want to use the sample query applications, you can build your own query application by directly invoking the Oracle Ultra Search Java Query API. Because the API is coded in Java, you can invoke the API methods from any Java-based application, such as from a Java servlet or a JavaServer Page (as in the case of the provided sample query applications). For rendering emails that have been crawled and indexed, you can also directly invoke the Oracle Ultra Search Java email API methods. Sample Query Applications: The sample query applications are located in the $ORACLE_HOME/ultrasearch/sample directory. JavaServer Page Concepts: As mentioned earlier, you can use JSP code and the supplied Java APIs to create your Web application. Typically, your Web application runs in an application server, such as Oracle Application Server. The application server typically runs on a separate computer from the Oracle server for performance and scalability reasons. The Oracle server holds the Oracle Ultra Search indexes. JSP applications are compiled into Java servlets at runtime. The compiled servlets run in one or more Java Virtual Machine processes. The JSP application communicates with the Oracle server through the Oracle JDBC driver. As in any Java application, you must include the following files in your servlet engine classpath to use the Java query and email APIs: ORACLE_H
229. on is necessary when access URLs are used. The UrlRewriter provides the following possible outcomes for links: there is no change to the link, and the crawler inserts it as it is; the link is discarded, and there is no insertion; a new display URL is returned, replacing the URL link for insertion; or a display URL and an access URL are returned, where the display URL may or may not be identical to the URL link. The generated new URL link is subject to all existing host, path, and mimetype inclusion and exclusion rules. You must put the implemented rewriter class in a jar file and provide the class name and jar file name here. If Index Dynamic Page is set to Yes, then dynamic URLs are crawled and indexed. For data sources already crawled with this option, setting Index Dynamic Page to No and recrawling the data source removes all dynamic URLs from the index. Some dynamic pages appear as multiple search hits for the same page, and you may not want them all indexed. Other dynamic pages are each different and need to be indexed. You must distinguish between these two kinds of dynamic pages. In general, dynamic pages that only change in menu expansion without affecting their contents should not be indexed. Consider the following three URLs: http://itweb.oraclecorp.com/aboutit/network/npe/standards/naming_convention.html http://itweb.oraclecorp.com/aboutit/network/npe/stan
230. onfig-property name="userName" value="wk_test"/> <config-property name="password" value="wk_test"/> <config-property name="instanceName" value="wk_test"/> </connector-factory> After editing oc4j-ra.xml, restart the OC4J instance. If you do not see errors upon restart, then the searchlet has been successfully instantiated and bound to JNDI. Deploying and Binding the Federator Searchlet: The Federator searchlet interacts with other searchlets to provide a single point of search against multiple repositories. For example, the Federator searchlet can invoke multiple Oracle Ultra Search searchlets to simultaneously query against multiple Oracle Ultra Search instances. In the same manner, the Federator searchlet can invoke searchlets for Oracle Files, Email, and so on. The Federator searchlet is configured and managed with the Oracle Ultra Search administration tool under the Federated Sources tab. The Federator searchlet is packaged as federator.rar and is shipped under the $ORACLE_HOME/ultrasearch/adapter directory. The deployment procedure for federator.rar is similar to the deployment of the Oracle Ultra Search searchlet. To deploy the Federator searchlet in OC4J standalone, use admin.jar as follows: java -jar admin.jar ormi://<hostname> <admin> <welcome> deployment file federator.rar name FederatorSearchlet. To instantiate the searchlet, the Federator searchlet requires four configuration
231. onnect to the remote crawler host through the RMI registry port specified at installation time. If successful, then the ActivationClient receives a remote reference to a Java object running on the remote host. This remote Java object is known as the ActivatableCrawlerLauncher. The ActivationClient then instructs the ActivatableCrawlerLauncher to launch the Oracle Ultra Search crawler on the remote host. The ActivatableCrawlerLauncher launches the Oracle Ultra Search crawler as a separate Java process on the remote host. The RMI registry and daemon ports are inflexible. Therefore, if you have other RMI services running on the same host, you will not be able to use RMI-based remote crawling. Also, you cannot run two RMI-based launchers, because they will both conflict on the RMI ports. JDBC-Based Remote Crawling: JDBC-based remote crawling requires that the launcher be up and running. When a crawling schedule is activated, the Oracle Ultra Search scheduler sends a message to the launcher. If the launcher is running and properly connected to the database as the appropriate launch user or role, then it can receive the launch messages. Otherwise, the message times out after 30 seconds and launch failure is reported. The launcher then deciphers the launch message and spawns an Oracle Ultra Search crawler as a separate Java process on the remote host. The launcher maintains
232. onnection to a single database instance, the descriptor can be in the short form host:port:SID or the connect descriptor (Oracle Net keyword-value pair). For example: (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=cls02a)(PORT=3999)))(CONNECT_DATA=(SERVICE_NAME=acme.us.com))). To connect to any database instance, the full database connect descriptor must be used. For example: (DESCRIPTION=(LOAD_BALANCE=yes)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=cls02a)(PORT=3999))(ADDRESS=(PROTOCOL=TCP)(HOST=cls02b)(PORT=3999)))(CONNECT_DATA=(SERVICE_NAME=acme.us.com))). See Also: Oracle Database JDBC Developer's Guide and Reference for configuration details. You cannot configure Oracle Ultra Search to launch the crawler on any node on a non-cluster file system. To query the existing launching instance configuration, use the following PL/SQL API: WK_ADM.GET_LAUNCH_INSTANCE RETURN VARCHAR2. This returns the name of the launching instance, or the database name if any node can launch the crawler. Remote Crawler File Cache: The Oracle Ultra Search remote crawler requires that the remote file system be mounted on the Oracle instance for indexing. For cluster file system Real Application Clusters, the file system of the remote computer should be NFS mounted to all nodes of the system. For non-cluster file
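The two connect string forms can be assembled programmatically. The Java sketch below builds only the strings; it opens no connection, and the class and method names are hypothetical. The parenthesized layout follows standard Oracle Net keyword-value syntax, which is an assumption here because the punctuation was lost from the extracted examples.

```java
import java.util.List;

// Hypothetical helper that builds connect strings; it does not connect.
final class ConnectDescriptors {
    private ConnectDescriptors() {}

    // Short form for a single database instance: host:port:SID.
    static String shortForm(String host, int port, String sid) {
        return host + ":" + port + ":" + sid;
    }

    // Full descriptor listing every node, so that any instance can be reached.
    static String loadBalanced(List<String> hosts, int port, String serviceName) {
        StringBuilder sb = new StringBuilder("(DESCRIPTION=(LOAD_BALANCE=yes)(ADDRESS_LIST=");
        for (String host : hosts) {
            sb.append("(ADDRESS=(PROTOCOL=TCP)(HOST=").append(host)
              .append(")(PORT=").append(port).append("))");
        }
        sb.append(")(CONNECT_DATA=(SERVICE_NAME=").append(serviceName).append(")))");
        return sb.toString();
    }

    // Prefixes either form with the Oracle thin driver scheme.
    static String thinUrl(String descriptor) {
        return "jdbc:oracle:thin:@" + descriptor;
    }
}
```

Calling `loadBalanced` with hosts cls02a and cls02b, port 3999, and service name acme.us.com yields a descriptor with one ADDRESS entry per node, which is what lets a client reach whichever instance is available.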
233. op and restart it, with the query application now configured, in order to use the query application: $ORACLE_HOME/bin/searchctl stop, then $ORACLE_HOME/bin/searchctl start. Once the middle tier has been started, you can access the query application by means of an HTML browser pointed to http://your_computer.com:http_port/ultrasearch/query/search.jsp. At the end of the installation, you should be able to find http_port on the Oracle Universal Installer screen. You can also find out the value of http_port by looking at $ORACLE_HOME/oc4j/j2ee/OC4J_SEARCH/config/http-web-site.xml. Setting up the Ultra Appliance Demo: You need to access and set up the Ultra Appliance demo before you begin this exercise. To access the Ultra Appliance intranet site, go to http://otn.oracle.com/products/ultrasearch/gettingstarted. To set up the Ultra Appliance company database: 1. Copy the appliances.sql script shown in Example 2-1 into a text editor and save this file as appliances.sql. Place the appliances.sql script on your database server computer. Upload the appliances.sql file to your database schema WK_TEST by executing the following commands: prompt> sqlplus WK_TEST/WK_TEST, SQLPLUS> @appliances.sql, SQLPLUS> commit, SQLPLUS> exit. Crawl and Index Ultra Appliance's Intranet Documents
234. or later crawling See Also Oracle Ultra Search Crawler Agent API on page 9 19 To define a new data source you first define a data source type to represent it Creating User Defined Data Source Types To create edit or delete data source types click Manage Source Types To create a new type click Create New Type 1 Specify data source type name description and crawler agent Java class file or jar file name The crawler agent Java classpath is predefined at installation time The agent collects the list of document URLs and associated metadata from the proprietary document source and returns it to the Oracle Ultra Search crawler which enqueues the information for later crawling The agent class file or jar file must be located under SORACLE_HOME ultrasearch lib agent 8 32 Oracle Ultra Search User s Guide Sources Page 2 Specify parameters for this data source type If you add parameters you must enter the parameter name and a description Also you must decide whether to encrypt the parameter value Edit data source type information by changing the data source type name description crawler agent Java class jar file name or parameters Creating User Defined Sources To create a user defined data source select the type and click Go 1 Specify a name default language and parameter values for the data source For information on default languages see the Crawler Page Specify the authentication set
235. or limit the crawling depth. Monitoring the Crawling Process Monitor the crawling process by using a combination of the following methods: monitoring the schedule status with the administration tool; monitoring the real-time schedule progress with the administration tool; monitoring the crawler statistics with the administration tool; monitoring the log file for the current schedule. URL Looping URL looping refers to the scenario where, for some reason, a large number of unique URLs all point to the same document. One particularly difficult situation is where a site contains a large number of pages, and each page contains links to every other page in the site. Ordinarily this would not be a problem, because the crawler eventually analyzes all documents in the site. However, some Web servers attach parameters to generated URLs to track information across requests. Such Web servers might generate a large number of unique URLs that all point to the same document. For example, http://mycompany.com/somedocument.html?p_origin_page=10 might refer to the same document as http://mycompany.com/somedocument.html?p_origin_page=13, but the p_origin_page parameter is different for each link, because the referring pages are different. If a large number of parameters are specified and if the number of referring links is large, then a single unique document could have thousan
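The URL looping scenario in this excerpt can be made concrete with a small sketch that groups URLs differing only in a tracking parameter. It uses only the Python standard library. The helper names and the choice of `p_origin_page` as the ignored parameter are assumptions taken from the example URLs; this is not how the Oracle Ultra Search crawler itself detects or handles loops.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonicalize(url, ignored_params=("p_origin_page",)):
    """Drop the tracking parameters so equivalent URLs compare equal."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored_params
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_duplicates(urls, ignored_params=("p_origin_page",)):
    """Map each canonical URL to the list of raw URLs that collapse onto it."""
    groups = {}
    for url in urls:
        groups.setdefault(canonicalize(url, ignored_params), []).append(url)
    return groups


if __name__ == "__main__":
    seen = [
        "http://mycompany.com/somedocument.html?p_origin_page=10",
        "http://mycompany.com/somedocument.html?p_origin_page=13",
    ]
    for canonical, raw in group_duplicates(seen).items():
        print(canonical, len(raw))
```

A grouping like this is useful mainly for diagnosing a crawl log: a canonical URL with an unusually large group is a likely sign of looping.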
236. or scheduling Java VM for launching JDK or JRE for crawling t file local file PA gt N LA Mailing IMAP list Server Introduction to Oracle Ultra Search 1 15 Oracle Ultra Search System Configuration 1 16 Oracle Ultra Search User s Guide 2 Getting Started with Oracle Ultra Search Overview This chapter contains the following topics Overview Installation Setting up the Sample Query Application Setting up the Ultra Appliance Demo Crawl and Index Ultra Appliance s Intranet Documents Crawl and Index Ultra Appliance s Database Documents Issuing a Query This chapter provides information about getting started with Oracle Ultra Search It features an example scenario that describes installation and use of Oracle Ultra Search It enables you to create a browser based search application to query data sources For this example a fictional customer service call center named Ultra Appliance Inc is used Ultra Appliance is a major retail company that sells and supports hundreds of different appliances from dozens of different manufacturers in stores nationwide Customers contact the customer service call center every day to receive technical help for and assistance with an appliance Ultra Appliance call center agents must access a variety of online resources to provide the information needed by the customer Getting Started with Oracle Ultra Search 2 1 In
237. ou may need to import certificates into the crawler's truststore and the Oracle Containers for J2EE (OC4J) JVM's truststore. The Oracle Ultra Search administration tool is a Web application that runs within the OC4J JVM. Secure portal instances require clients to authenticate with SSL. To discover page groups in secure portal instances, the Oracle Ultra Search administration tool must make HTTPS network calls. By default, the OC4J JVM recognizes certificates of well-known certificate authorities. However, if the secure portal instance uses a self-signed certificate or a certificate signed by an unknown certificate authority, then you must import the portal's certificate into the OC4J JVM's truststore. This can be done with the keytool utility provided by Sun Microsystems. The OC4J JVM default truststore is located at $ORACLE_HOME/jdk/jre/lib/security/cacerts. See Also: Sun Microsystems documentation for more information about using Sun's keytool key and certificate management utility, for information on customization of the SSL service, and for information on truststore management; Oracle Application Server Containers for J2EE documentation for information on configuring OC4J to use a different truststore. Security of Oracle Ultra Search Actual Oracle Ultra Search security is handled by the dictionary data in the Oracle Ultra Search database: the administrative user and password data. Classes of Users and Their Privilege
238. ould process for this source HTML and plain text are default document types that the crawler always processes Specify the authentication settings This step is optional Under HTTP Authentication specify the user name and password for host realm for which authentication is required Under HTML Forms you can register HTML forms that you want the Oracle Ultra Search crawler to automatically fill out during Web crawling HTML form support requires that HTTP cookie functionality is enabled Click Register HTML Form to register authentication forms protecting the data source Note For the form URL to be crawled you must verify that the URL is not excluded in the robots txt file If so then you must disable robot exclusion for this data source By default Oracle Ultra Search enables robot exclusion Choose either No ACL or Ultra Search ACL for the data source When a user performs a search the ACL access control list controls which documents the user can access The default is no ACL with all documents considered searchable and visible You can add more than one group and user to the ACL for the data source The option to choose is only available if the instance is security enabled 8 22 Oracle Ultra Search User s Guide Sources Page Define edit or delete metatag mappings for your Web source Metatags are descriptive tags in the HTML document header One metatag can map to only one search attribute Override the default crawler
239. ounter the situation where the next extent that the Oracle server wants to allocate is larger than what is available. In such a situation, indexing halts until new extents can be added to the tablespace. To mitigate this problem, certain instance-specific tables have explicit storage parameter settings. The initial extent size, next extent size, and PCTINCREASE setting are defined for these tables. These tables are created when a new instance is created. The tables and their storage clause settings are as follows: DR$WK$DOC_PATH_IDX$I: initial extent size 5M, next extent size 50M, PCTINCREASE 1; DR$WK$DOC_PATH_IDX$K: initial extent size 5M, next extent size 50M, PCTINCREASE 1. If you want greater read and write performance, create the tablespace on raw devices. Be sure to create a new large tablespace for each Oracle Ultra Search instance user. See Also: Oracle Database SQL Reference for more information on creating tablespaces and managing storage settings; Oracle Database Administrator's Guide for information on how to create a tablespace. Step 4: Create and Configure New Users for Oracle Ultra Search Instances Oracle Ultra Search uses Oracle's fine-grained access control feature to support multiple Oracle Ultra Search instances within one physical database. This is especially useful for large organizations or application service providers (ASPs) that want to host multiple disjoint search indexes
240. ources with logging enabled email data sources and some user defined data sources Creating a Regular Instance To create an instance do the following 1 Prepare the database user Every Oracle Ultra Search instance is based on a database user schema with the WKUSER role The database user you create to house the Oracle Ultra Search instance should be assigned a dedicated self contained tablespace This is important if you plan to ever create snapshot instances of this instance To do this create a new tablespace Then create a new database user whose default tablespace is the one you just created See Also Configuring the Oracle Server for Oracle Ultra Search on page 4 2 for information and instructions on configuring database users for Oracle Ultra Search Creating a Snapshot Instance on page 8 8 2 Follow instance creation in the Oracle Ultra Search administration tool Understanding the Oracle Ultra Search Administration Tool 8 7 Instances Page From the main instance creation page click Create Instance and provide the following information a Instance name a Database schema this is the user name from step 1 a Schema password You can also enter the following optional index preferences a Lexer Specify the name of the lexer you want to use for indexing The lexer breaks text into tokens according to your language These tokens are usually words The default lexer is wk sys wk_lexer
241. owing steps 1 Choose the option OracleAS Application Server 10g and click Next 2 Choose the option B Portal and Wireless and click Next 3 On the Configuration Options screen make sure OracleAS Portal is checked This allows the Oracle Portal Configuration Assistant OPCA to configure Oracle HTTP Server and OC4J with Oracle Ultra Search If you uncheck this option then you must follow the instructions under Configuring the Middle Tier with Oracle HTTP Server and OC4J to set up Oracle HTTP Server and OC4J manually 4 Continue with the installation until Oracle Application Server is successfully installed Note If you decide to use a third party J2EE container or a servlet engine then uncheck the option OracleAS Portal on the Configuration Options screen of Oracle Installer and see the Deploying the Oracle Ultra Search EAR File on a Third Party Middle Tier on page 3 19 Upon completion of this step all middle tier files are copied under the ORACLE_HOME Installing and Configuring Oracle Ultra Search 3 13 Installing the Oracle Ultra Search Middle Tier on Web Server Hosts If you checked the OracleAS Portal option on the Configuration Options Oracle Installer screen then the configuration steps in the following section are automatically performed by the Oracle Portal Configuration Assistant OPCA If not then you must manually perform the steps under Configuring the M
242. oyees SELECT FROM USER_TABLES DROP TABLE hr employees Convention Meaning Example lowercase Lowercase typeface indicates SELECT last_name employee_id FROM programmatic elements that you supply employees For example lowercase indicates names sqlplus hr hr of tables columns or files CREATE USER mjones IDENTIFIED BY ty3MU9 Note Some programmatic elements use a mixture of UPPERCASE and lowercase Enter these elements as shown Conventions for Windows Operating Systems The following table describes conventions for Windows operating systems and provides examples of their use Convention Meaning Example Choose Start gt How to start a program To start the Database Configuration Assistant File and directory names C gt File and directory names are not case choose Start gt Programs gt Oracle HOME_ NAME gt Configuration and Migration Tools gt Database Configuration Assistant c winnt system32 is the same as sensitive The following special characters Cc WINNT SYSTEM32 are not allowed left angle bracket lt right angle bracket gt colon double quotation marks slash pipe 1 and dash The special character backslash is treated as an element separator even when it appears in quotes If the file name begins with then Windows assumes it uses the Universal Naming Convention Represents the Windows command prompt of the current hard disk drive The
243. page A-5 Example of the Document Relevance Boosting XML File: <?xml version="1.0" encoding="UTF-8"?> <doc_list> <doc url="http://www.oracle.com" data_source_name="Data Source A"> <term score="100">database</term> <term score="90">internet</term> <term score="80">software</term> </doc> <doc url="http://www-st.us.oracle.com" data_source_name="Data Source B"> <term score="100">Server Technology</term> <term score="100">ST Web site</term> <term score="95">st</term> </doc> </doc_list> In the previous example, the document URL http://www.oracle.com is loaded to the data source "Data Source A". This is defined in Oracle Ultra Search with the following relevance boosting terms and scores: term "database" with score 100, term "internet" with score 90, and term "software" with score 80. Note: The data source name is the original data source name, not the data source display name. Loading Search Attribute LOVs and LOV Display Names To use the loader tool to add LOV entries and display names to Oracle Ultra Search, the parameter type value should be lov. The LOV XML File The LOV entries and display names are defined in an XML file. You can define one or more search attribute LOVs in the XML file. Both default LOVs and data source specific LOVs are put in the XML file. The definition of the XML file is stored in the XML schema. See Also: XML Schema for LOVs and LOV
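Before handing a relevance boosting file to the loader tool, it can help to check that the file parses and that every score is an integer. The sketch below reads the doc_list structure from the example in this excerpt using Python's standard XML parser. The function name `load_boost_terms` is hypothetical, the sample contains only the "Data Source A" entry, and nothing here calls the Oracle Ultra Search loader itself.

```python
import xml.etree.ElementTree as ET

# Sample document following the doc_list / doc / term layout of the excerpt.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<doc_list>
  <doc url="http://www.oracle.com" data_source_name="Data Source A">
    <term score="100">database</term>
    <term score="90">internet</term>
    <term score="80">software</term>
  </doc>
</doc_list>
"""


def load_boost_terms(xml_text):
    """Return {(data source name, url): {term: score}} for each doc element."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    boosts = {}
    for doc in root.findall("doc"):
        key = (doc.get("data_source_name"), doc.get("url"))
        # int() raises ValueError if a score attribute is not a whole number.
        boosts[key] = {
            term.text: int(term.get("score")) for term in doc.findall("term")
        }
    return boosts


if __name__ == "__main__":
    for (source, url), terms in load_boost_terms(SAMPLE).items():
        print(source, url, terms)
```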
244. part of crawling and indexing is stored in this user s schema Asa general guideline create the tablespace as large as the total amount of data that you want to index For example if you approximate that the total amount of data to be crawled and indexed is 10GB then create a tablespace that is at least 10GB for the Oracle Ultra Search instance user Make sure to assign that tablespace as the default tablespace of the Oracle Ultra Search instance user 3 2 Oracle Ultra Search User s Guide Installing the Oracle Ultra Search Backend Software Requirements The Oracle Ultra Search middle tier components are Web applications Therefore they require a Web server to run Oracle recommends using Oracle Application Server Installing the Oracle Ultra Search Backend The Oracle Ultra Search backend consists of the following a Oracle Ultra Search database schema data dictionary and PL SQL packages a Oracle Ultra Search crawler Java program plus supporting files libraries and so forth a Oracle Ultra Search remote crawler a crawler residing on a remote Oracle home Database Release The Oracle Ultra Search backend is installed as part of the Oracle Database Server install which is accomplished using the Oracle Universal Installer OUI For more information please refer to the Oracle Universal Installer Concepts Guide Oracle Application Server Release The Oracle Ultra Search backend is installed as part of the Oracle App
245. pecifies that the token immediately following it must appear in all documents included in the search result. The minus operator specifies that the token immediately following it cannot appear in any document included in the search result. The asterisk specifies a wildcard search. It matches zero or more characters. A token starting with the asterisk is ignored. The asterisk can only be specified at the end (right side) or middle of a token. For example, [hel*o] and [hell*] use the asterisk correctly, but [*ello] is unacceptable. The following table summarizes the rules for the Oracle Ultra Search end-user query syntax. Note: All end-user query strings are encased in square braces. For example, the end-user query string Oracle Applications is notated as [Oracle Applications]. Rule and description: Single word search: Entering one word finds documents that contain that word. For example, searching for [Oracle] finds all documents that contain the word Oracle anywhere in that document. Note: Searching for [Oracle] is not equivalent to [Oracle*]. Multiple word search: Entering more than one word finds documents that each contain any of those words in any order. For example, searching for [Oracle Applications] finds documents that contain Oracle or Applications or Oracle Applications. Compulsory inclusion: Attaching a [+] in
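The operator rules in this excerpt can be sketched as a small token classifier. This is an illustration only, assuming the plus, minus, and asterisk operators are written as +, - and *, and that phrases are enclosed in double quotes. The function `classify_tokens` is hypothetical and does not reproduce the expansion performed by the default query syntax implementation.

```python
import re

# A token is either an optionally prefixed quoted phrase or a run of
# non-space characters.
TOKEN_RE = re.compile(r'[+-]?"[^"]*"|\S+')


def classify_tokens(query):
    """Sort the tokens of an end-user query into rule-based buckets."""
    result = {"required": [], "excluded": [], "optional": [], "ignored": []}
    for raw in TOKEN_RE.findall(query):
        bucket = "optional"
        if raw[0] == "+":
            bucket, raw = "required", raw[1:]
        elif raw[0] == "-":
            bucket, raw = "excluded", raw[1:]
        token = raw.strip('"')
        if not token:
            continue  # a bare operator carries no token
        if token.startswith("*"):
            # A token starting with the asterisk is ignored.
            result["ignored"].append(token)
        else:
            result[bucket].append(token)
    return result


if __name__ == "__main__":
    print(classify_tokens('+Oracle -"Oracle Applications" hel*o *ello'))
```

Keeping the ignored tokens in their own bucket makes it easy for a query page to tell the user which part of the input had no effect.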
246. quency the cumulative percentage occurrence of all failed queries See Also Tuning Query Performance on page 5 3 You can configure the query application and the federation engine with several parameters including the maximum number of hits and enabling relevancy boosting 8 42 Oracle Ultra Search User s Guide Users Page Users Page Preferences Super Users Use this page to manage Oracle Ultra Search administrative users You can assign a user to manage an Oracle Ultra Search instance You can also select a language preference This section lets you set preference options for the Oracle Ultra Search administrator You can specify the date and time format The pull down menu lists the following languages a English a Brazilian Portuguese a French German a Italian Japanese a Korean a Simplified Chinese Spanish a Traditional Chinese You can also select the number of rows to display on each page A user with super user privileges can perform all administrative functions on all instances including creating instances dropping instances and granting privileges Only super users can access this page Single sign on SSO users can use a delegated administrative service DAS list of values to add another SSO user as a super user These users are authenticated by the SSO server before allowing access Database users can add another database user as a super user Understanding the Oracle Ultra
247. quired type xsd string gt lt xsd attribute name display_name use required type xsd string gt lt xsd complexType gt lt xsd element gt lt xsd sequence gt lt xsd attribute name lang use required gt lt xsd simpleType gt lt xsd restriction base xsd string gt lt xsd length value 5 gt lt xsd pattern value a zA Z 2 a zA 2 2 gt lt xsd restriction gt lt xsd simpleType gt lt xsd attribute gt lt xsd complexType gt lt xsd element gt lt xsd sequence gt lt xsd attribute name name use required type xsd string gt lt xsd complexType gt lt xsd element gt lt xsd sequence gt lt xsd attribute name search_attr_name use required type xsd string gt lt xsd attribute name search_attr_type use required gt lt xsd simpleType gt lt xsd restriction base xsd string gt Loading Metadata into Oracle Ultra Search A 7 XML Schema for LOVs and LOV Display Names lt xsd enumeration value string gt lt xsd enumeration value number gt lt xsd enumeration value date gt lt xsd restriction gt lt xsd simpleType gt lt xsd attribute gt lt xsd complexType gt lt xsd element gt lt xsd sequence gt lt xsd complexType gt lt xsd element gt lt xsd schema gt A 8 Oracle Ultra Search User s Guide B Altering the Crawler Java Classpath The Oracle Ultra Search crawler is a
248. r port a During the installation of the Oracle Ultra Search backend a new ultrasearch instance owner WK_TEST is created Log on to the Oracle Ultra Search administration tool by entering the Oracle Ultra Search instance owner s database user name and password The nature of JSP pages is such that the first time any page is accessed it takes a few seconds to compile Subsequent accesses are much faster If you log on to the Oracle Ultra Search administration tool successfully then you have completed the Oracle Ultra Search administration tool configuration process Testing the Oracle Ultra Search Sample Query Applications After you verify that the Oracle Ultra Search administration tool is working you should be able to run the Oracle Ultra Search sample query applications To test the Oracle Ultra Search sample query applications do one of the following m Visit http hostname domainname port ultrasearch query search jsp a Follow the links in the Oracle Ultra Search welcome page http hostname domainname port ultrasearch index html See Also Configuring the Middle Tier with Oracle HTTP Server and OC4J on page 3 14 for information about configuring the JSP to query a specific instance Locations for sample query applications are listed in the following section Access the sample query source code by going to the directories list You can also see a working demo of each sample query JSP page with the URL root and you c
249. r example, if the document has 4 attributes (Company Name, Category, Revenue, S&P Rating), then it is specified as: Company Name:Company:1, Category:Classification:1, Revenue:Revenue:0, Rating:Analyst Rating:1. Log File Name: log file. Log Directory: location of log file. Defining a Data Source of this Type A data source is defined, which initializes the data source parameters. For example, the value specified accesses a table whose schema is the following: TABLE NEWS (ARTICLE_NO NUMBER, NEWS_URL VARCHAR2(740), TITLE VARCHAR2(200), AUTHOR VARCHAR2(100), PUB_DATE DATE default SYSDATE, PUBLISHER VARCHAR2(100), PRICE NUMBER, LANG VARCHAR2(10), IGNORE NUMBER DEFAULT 0, PRIMARY KEY (NEWS_URL)). Database Connect String: dlsun1710:5521:search. User Name: SCOTT. Password: TIGER. Table Name: NEWS. URL Column: NEWS_URL. Ignore Flag Column: IGNORE. Language Column: LANG. Attribute List: ARTICLE_NO:Article Number:0, TITLE:Article Title:1, AUTHOR:Author:1, PUB_DATE:Report Date:2, PUBLISHER:Newspaper:1, PRICE:Download Cost:0. Log File Name: testagent.log. Log Directory: /tmp/ultrasearch. Oracle Ultra Search Java Email API Oracle Ultra Search provides a Java API for accessing archived emails. The Oracle Ultra Search query application uses the API to display emails
250. r file name for example SampleRewriter and sample jar in the administration tool in step 2 of Creating Web Sources on page 8 22 or in the crawler parameters page of an existing Web data source Enable the UrlRewriter option from Web Sources page in the administration tool Crawl the target Web data source by launching the corresponding schedule The crawler log file confirms the use of the URL rewriter with the message Loading URL rewriter SampleRewriter Note URL rewriting is available for Web data sources only 9 32 Oracle Ultra Search User s Guide Oracle Ultra Search Sample Query Applications See Also a Oracle Ultra Search Java API Reference for the API oracle ultrasearch crawler package The sample URL rewriter SampleRewriter java under SORACLE_HOME ultrasearch extension a Web Sources on page 8 21 Oracle Ultra Search Sample Query Applications Oracle Ultra Search provides several sample query applications and a sample crawler agent Use the sample query applications as examples for creating your own query application The query applications are written as J2EE compliant Web applications Your query application uses the Oracle Ultra Search query API You can also use the sample crawler agent to create your own crawler agent Note Pointers to the sample query applications and the sample crawler agent Java source code as well as their corresponding readmes are in the Oracle
251. r only crawls URLs left in the URL queue. URLs already crawled are not touched on recrawl. If the URL queue is empty but there is a new seed added since the last crawl, then the crawler only crawls the new seed. If the URL queue is empty and there is no new seed URL, then the crawler recrawls all crawled URLs. Therefore, if you stop the crawler and set Index Dynamic Pages to No, this only affects the URLs in the queue yet to be crawled. The already crawled dynamic pages are removed from the index on the third recrawl, when the queue is empty. Note: All crawled URLs are subject to crawler setting enforcement, not just newly crawled URLs. Update crawling mode: You can update the crawling mode to the following. Automatically accept all URLs for indexing: This mode crawls and indexes. Examine URLs before indexing: This mode crawls only. For initial planning purposes, you might want the crawler to collect URLs without indexing. After crawling is done, you can examine document URLs and status, remove unwanted documents, and start indexing. Index only: This mode indexes only. The crawler behaves differently for the documents collected. Crawling mode and recrawl policy can be combined for six different combinations. For example, Process All Documents and Index Only forces reindexing existing documents in this data source, while Process Documents That Have Changed and Index Only reindexes only changed documents
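The recrawl rules in this excerpt reduce to a three-way decision, and the six combinations come from pairing two recrawl policies with three crawling modes. The sketch below models both. The function name, the constants, and the use of plain lists are illustrative assumptions rather than the crawler's internal data structures.

```python
from itertools import product

RECRAWL_POLICIES = (
    "Process All Documents",
    "Process Documents That Have Changed",
)
CRAWLING_MODES = (
    "Automatically accept all URLs for indexing",
    "Examine URLs before indexing",
    "Index only",
)


def select_recrawl_targets(url_queue, new_seeds, crawled_urls):
    """Pick the URLs a recrawl visits, following the three stated cases."""
    if url_queue:
        # URLs are still waiting: only those are crawled.
        return list(url_queue)
    if new_seeds:
        # Queue is empty but a seed was added since the last crawl.
        return list(new_seeds)
    # Queue is empty and there is no new seed: recrawl everything.
    return list(crawled_urls)


if __name__ == "__main__":
    for policy, mode in product(RECRAWL_POLICIES, CRAWLING_MODES):
        print(policy, "+", mode)
```

The ordering of the checks matters: a non-empty queue always wins, which is why a settings change made after stopping the crawler first affects only the URLs still in the queue.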
252. racle Applications Oracle Applications Oracle Applications within TITLE__ 31 2 Oracle Applications Oracle Applications Oracle Applications within TITLE__ 31 2 Oracle Applications Ora Ora within TITLE__31 2 Ora Oracle Applications Oracle Applications 2 Oracle Applications 2 Orac le Application within TITLE_ 31 2 Oracle Applications 2 Oracle Applications 2 Oracle Applications n Customizing the Rules Customize this expansion to suit your organization s purposes by defining and implementing your own query syntax expansion You should have detailed understanding of Oracle Text queries using the ctxsys contains operator Oracle Text offers a rich set of linguistic features such as thesaurus theme stemming and soundex as a part of its query language See Also a Oracle Text Application Developer s Guide a Oracle Text Reference To customize Oracle Ultra Search to use your own implementation of the query syntax expansion use the oracle ultrasearch CtxContains class in your query application instead of the oracle ultrasearch query Contains class CtxContains lets you use any Oracle Text query as a part of an Oracle Ultra Search query Use the following steps 9 8 Oracle Ultra Search User s Guide Oracle Ultra Search Query Tag Library 1 Construct a Oracle Text query based on the user s input
253. racle Ultra Search crawls database tables in the local Oracle Database instance where Oracle Ultra Search is installed Additionally it can crawl remote databases if they have been linked to the main Oracle Database Remote databases are linked to the main Oracle instance with database links See Also Oracle Database Administrator s Guide for instructions on how to create database links Oracle Ultra Search provides a logging mechanism to optimize crawling of table sources Using this logging mechanism only newly updated documents are revisited during the crawling process If the source database is not an Oracle database then you must perform a sequence of steps to use this feature Synchronizing Crawling of Oracle Databases Before creating log tables and log triggers make sure that the Oracle Ultra Search instance schema has the CREATE ANY TABLE and CREATE ANY TRIGGER system privileges For tables in Oracle databases data definition language DDL statements are provided to create the following Create Log Table The log table stores changes that have occurred in the base table The Oracle Ultra Search crawler uses the change information to figure out which rows need to be recrawled For example a log table generated by Oracle Ultra Search could be named WKS LOG The structure of the log table conforms to the following rules a For every primary key column of the base table a column must be created in
254. rce and saves it in the crawler queue before processing it. Note: If the crawler is interrupted for any reason, then the agent invocation process is repeated with the original last crawl time stamp. If the crawler finished enqueuing URLs fetched from the agent and is halfway done crawling, then the crawler only starts the agent but does not try to fetch URLs from the agent. Instead, it finishes crawling the URLs already enqueued. There are two kinds of crawler agents: Standard Agent and Smart Agent. Standard Agent The standard agent returns the list of URLs currently existing in the data source. It does not know whether any of the URLs had been crawled before, and it relies on the crawler to find any updates to the target data source. The standard agent's interaction with the crawler is the following. The crawler marks all existing URLs of this data source for garbage collection, assuming they no longer exist in the target data source. The crawler calls the agent to get an updated list of URLs. It marks for crawling every URL that already exists. If a URL is new, it inserts it into the URL table and queue. The crawler deletes the URLs that are still marked for garbage collection. The crawler goes through every URL marked for crawling and checks for updates. Smart Agent The smart agent uses a modified-since time stamp provided by the crawler to
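The standard agent interaction described in this excerpt is essentially a mark-and-sweep pass over the URL table. The sketch below simulates one cycle with an in-memory set. The function `standard_agent_cycle` and its return structure are hypothetical, and the real crawler works against database tables and the agent API rather than Python collections.

```python
def standard_agent_cycle(url_table, agent_urls):
    """Simulate one standard-agent crawl cycle against a set of known URLs.

    url_table is modified in place; the returned dict reports what happened.
    """
    # Mark every known URL for garbage collection.
    garbage = set(url_table)
    recheck, new_urls = [], []

    # Walk the agent's current list: known URLs are unmarked and queued for
    # an update check, unknown URLs are recorded as new.
    for url in agent_urls:
        if url in url_table:
            garbage.discard(url)
            if url not in recheck:
                recheck.append(url)
        elif url not in new_urls:
            new_urls.append(url)

    # Delete whatever is still marked, then add the new URLs.
    url_table.difference_update(garbage)
    url_table.update(new_urls)

    # The crawler would now visit the recheck list and look for updates.
    return {"recheck": recheck, "new": new_urls, "deleted": garbage}


if __name__ == "__main__":
    table = {"doc/a", "doc/b", "doc/c"}
    print(standard_agent_cycle(table, ["doc/b", "doc/c", "doc/d"]))
    print(sorted(table))
```

Because the agent returns everything that currently exists, the cost of a cycle grows with the size of the source, not with the number of changes.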
255. rch. Each index can be maintained separately. By querying the data source at search time, search results are always the latest results. User credentials can be passed to the data source and authenticated by the data source itself. Queries can be processed efficiently using the data's native format. To use federated search, you must deploy an Oracle Ultra Search search adapter, or searchlet, and create an Oracle Database source. A searchlet is a Java module deployed in the middle tier (inside OC4J) that searches the data in an enterprise information system on behalf of a user. When a user's query is delegated to the searchlet, the searchlet runs the query on behalf of the user. Every searchlet is a JCA 1.0 compliant resource adapter. See Also: Federated Sources on page 8-30. Oracle Ultra Search Release Information Oracle Ultra Search is released with the Oracle Database, Oracle Application Server, and Oracle Collaboration Suite. Because of different release numbers in the past, the Oracle Ultra Search release numbers are somewhat confusing. Oracle Ultra Search 9.0.4 is part of Oracle Application Server release 10g (9.0.4). Oracle Ultra Search release 9.0.3 is part of the Oracle Collaboration Suite release 9.0.3. Oracle Ultra Search release 9.2 is part of Oracle9i release 9.2. Oracle Ultra Search release 1.0.3 was part of Oracle9i release 1 (9.0.1). Oracle Ultra Search release 9.0.2 is part of Oracle9iAS release 2 (9.0.2).
256. rch backend not the middle tier is installed The classes needed for compilation are the JDK class classes zip Oracle JDBC Thin Driver classes12 zip and ultrasearch jar For example 9 24 Oracle Ultra Search User s Guide Oracle Ultra Search Crawler Agent API javac J ms16m J mx96m O classpath jdk1 2 2 05 lib classes zip lib classes12 zip SORACLE_HOME ultrasearch lib ultrasearch jar SampleAgent java To build the SampleAgent jar file enter the following jdk1 2 2_05 bin jar cv0f oracle ultrasearch lib agent SampleAgent jar SampleAgent class SampleAgent DocNode class Creating a Data Source Type A data source type that uses the sample agent must be created first Name URL table type a Description Table with rows of URLs a Agent Name SampleAgent a Agent Jar File sampleagent Defining Data Source Parameters Define parameters for a data source type a Database Connect String DB connection a User Name schema owner of the URL table a Password schema owner password encrypted a Table Name URL table name a URL Column Column holding doc URLs Ignore Flag Column 1 for ignoring 0 otherwise Language Column Document Language a Attribute List List of column for attributes a Itis in the following format column name attribute name lt data type gt column name attribute name lt data type gt where lt data type gt 0 is number 1 is string and 2 is date Fo
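The attribute list parameter described in this excerpt packs a column name, an attribute name, and a data type code into each entry. The sketch below parses such a list. It assumes that fields are separated by colons and entries by commas, because the separators are not visible in the flattened text, and the function name `parse_attribute_list` is hypothetical.

```python
# Data type codes as stated in the excerpt: 0 is number, 1 is string, 2 is date.
DATA_TYPES = {0: "number", 1: "string", 2: "date"}


def parse_attribute_list(spec):
    """Parse 'column:attribute:type, ...' into (column, attribute, type) tuples.

    Assumes ':' between fields and ',' between entries. Raises ValueError
    for a malformed entry and KeyError for an unknown type code.
    """
    attributes = []
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        column, name, type_code = (
            part.strip() for part in entry.rsplit(":", 2)
        )
        attributes.append((column, name, DATA_TYPES[int(type_code)]))
    return attributes


if __name__ == "__main__":
    spec = "ARTICLE_NO:Article Number:0, TITLE:Article Title:1, PUB_DATE:Report Date:2"
    for column, name, data_type in parse_attribute_list(spec):
        print(column, "->", name, f"({data_type})")
```

Validating the list before saving the data source catches an out-of-range type code early, instead of at crawl time.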
257. rch.query.CtxContains class. Customizing the Query Syntax Expansion This section describes the default query expansion rules and how to customize the query syntax expansion to suit your organization's preferences. Default Query Syntax Expansion Implementation The default query syntax expansion implementation directly affects the following: the way the end user enters a query string, known as the end-user query syntax; the way the documents matching the query are scored, known as scoring; the way the end user's query string is transformed into an Oracle Text query string, known as the expansion rules. The default query syntax expansion is implemented in the oracle.ultrasearch.query.Contains class. The sample query applications make use of this syntax expansion for content search as well as string attribute search. End User Query Syntax The end-user query syntax defined by the default query syntax expansion implementation is similar to the standard text query syntax employed by most search engines on the Web. Token: A token is a string enclosed in double quotes. It can be a single word or a phrase. Operators: The default implementation defines three operators. They are the plus [+], minus [-], and asterisk [*] operators. Because these operators are defined by the default implementation, you can change them to whatever you prefer in your own custom implementation. The plus operator s
258. rchitecture (OFA) guidelines. All subdirectories are not under a top-level ORACLE_HOME directory. There is a top-level directory called ORACLE_BASE that by default is C:\oracle. If you install the latest Oracle Database release on a computer with no other Oracle software installed, then the default setting for the first Oracle Database home directory is C:\oracle\orann, where nn is the latest release number. The Oracle home directory is located directly under ORACLE_BASE. All directory path examples in this guide follow OFA conventions. Refer to Oracle Database Platform Guide for Windows for additional information about OFA compliance and for information about installing Oracle products in non-OFA-compliant directories. Documentation Accessibility: Our goal is to make Oracle products, services, and supporting documentation accessible, with good usability, to the disabled community. To that end, our documentation includes features that make information available to users of assistive technology. This documentation is available in HTML format and contains markup to facilitate access by the disabled community. Standards will continue to evolve over time, and Oracle is actively engaged with other market-leading technology vendors to address technical obstacles so that our documentation can be accessible to all of our customers. For additional information, visit the Oracle Accessibility Program Web site at http www oracle
259. rectory. See Also: "JDBC-Based Remote Crawling" on page 5-7 and "Remote Crawler Profiles" on page 8-16. Manual Launch Scheduling: A schedule can be created with no scheduled launch time, so that it can only be started on demand. See Also: "Data Synchronization" on page 8-34. Crawler Log File Versioning: For each data source, the crawler preserves the latest three log files. This avoids overwriting previous crawling log files on recrawl. See Also: "Crawler Logging" on page 15. New PL/SQL Administration APIs: Oracle Ultra Search now includes APIs for various administration tasks, such as crawler, schedule, and instance administration. See Also: Chapter 10, "Administration PL/SQL APIs". Integration with Oracle Internet Directory: Oracle Internet Directory (OID) is Oracle's native LDAP v3-compliant directory service, built as an application on top of the Oracle Database. Oracle Ultra Search integrates with Oracle Internet Directory in the following areas: Oracle Ultra Search administration groups and group membership are stored in Oracle Internet Directory; users are authenticated through the single sign-on (SSO) server and Oracle Internet Directory; and Oracle Internet Directory performs authorization on Oracle Ultra Search users' administration privileges. See Also: "Integration with Oracle Internet Directory" on page 1-12. Cookie Support: Cookie support is enabled by default. Crawler Cache Deletion Control: During crawl
260. restrict a launcher to specific Oracle Ultra Search instances. Once registered, all Oracle Ultra Search instances may use it. However, the launcher is only a subsystem for launching remote crawler processes. Having multiple instances use the same launcher poses no security problems for most customers. Both RMI and JDBC launchers are simply Java processes. They are started from the command line. Oracle provides scripts for starting these launchers, described in the following section. Also, the JDBC launcher must establish JDBC connections to the Oracle Ultra Search backend database to listen for launch events. You must specify the launch user or role at registration time. Oracle strongly recommends that you create a new database user or role specifically for launching remote crawlers. You should not use this user or role for any other purpose. RMI-Based Remote Crawling: RMI-based remote crawling depends on the standard RMI infrastructure. Therefore, each remote crawler host must have an RMI registry and an RMI daemon running. These are started when you run the scripts to start the RMI-based launcher. When a crawling schedule is activated, the Oracle Ultra Search scheduler launches a Java program as a separate process on the database host. This Java program is known as the ActivationClient. This program attempts to c
261. rom the main instance creation page, click Create Read-Only Snapshot Instance and provide the following information: snapshot instance name; snapshot schema name (this is the database user from step 1); snapshot schema password; database link (this is the name of the database link to the database where the master instance lives); and master instance name. 4. Enable the snapshot for secure searches. If the master instance for the snapshot is secure-search enabled, and if the destination database in which you are making the snapshot supports secure-search-enabled instances, then you must also run a PL/SQL procedure in the destination database where you are creating the snapshot. Running this procedure translates the IDs of the access control lists (ACLs) in the destination database, rendering them usable. Log on to the database as the WKSYS user. Invoke the procedure as follows:
    exec WK_ADM.USE_INSTANCE('instance_name');
    exec WK_ADM.TRANSLATE_ACL_IDS();
where instance_name is the name of the snapshot instance. Make sure that this statement completes successfully without error. Selecting an Instance: You can See Also: Chapter 4, "Post-Installation Information", for information on changing the WKSYS password and for instructions on configuring database users for Oracle Ultra Search; Oracle Database Administrator's Guide for details on using
262. roup. Click Delete to remove existing portal data sources. You can edit the types of documents the Oracle Ultra Search crawler should process for a portal source. HTML and plain text are default document types that the crawler always processes. To edit document types, click Edit for the portal source after it has been created. See Also: The Oracle Application Server Portal documentation. Federated Sources: To create federated sources, specify the name and JNDI for the new data source. By default, no resource adapter is available. To create a federated source, you must manually deploy the Oracle Ultra Search resource adapter, or searchlet. A searchlet is a Java module deployed in the middle tier (inside OC4J) that searches the data in an enterprise information system on behalf of a user. When a user's query is delegated to the searchlet, the searchlet runs the query on behalf of the user. Every searchlet is a JCA 1.0-compliant resource adapter. See Also: The JCA 1.0 specification from JavaSoft for detailed information on resource adapters and the Java Connector Architecture. Deploying and Binding the Oracle Ultra Search Searchlet: The Oracle Ultra Search searchlet enables queries against one Oracle Ultra Search instance. The Oracle Ultra Search searchlet is packaged as ultrasearch.rar and is shipped under the
263. rt for Chinese, Japanese, and Korean (CJK) and Unicode. Oracle Ultra Search Components: Oracle Ultra Search is made up of the following components: the Oracle Ultra Search crawler, the Oracle Ultra Search backend, the Oracle Ultra Search administration tool, and the Oracle Ultra Search APIs and sample applications. Oracle Ultra Search Crawler: The Oracle Ultra Search crawler is a Java process activated by your Oracle server according to a set schedule. When activated, the crawler spawns a configurable number of processor threads that fetch documents from various data sources and index them using Oracle Text. This index is used for querying. Data sources can be Web sites, database tables, files, mailing lists, Oracle Application Server Portal page groups, or user-defined data sources. The crawler maps links and analyzes relationships. The crawler schedule is integrated with and driven from the DBMS_JOB queue mechanism. Whenever the crawler encounters embedded non-HTML documents during crawling, it uses Oracle Text filters to automatically detect the document type and to filter and index the document. See Also: Chapter 7, "Understanding the Oracle Ultra Search Crawler and Data Sources". Oracle Ultra Search Backend: The Oracle Ultra Search backend consists of an Oracle Ultra Search repository and Oracle Text. Oracle Text provides the text indexing and search capabilities required
264. runs. Enterprise Manager users. Portal SSO users. To log on to the administration tool, point your Web browser to one of the following URLs.
    For non-SSO mode: http://hostname:port/ultrasearch/admin/index.jsp
    For SSO mode: http://hostname:port/ultrasearch/admin_sso/index.jsp
Immediately after installation, the only users able to create and manage instances are the following: the WKSYS database user; the Enterprise Manager user; the PORTAL SSO user belonging to the default company (not supported in the Oracle database release); and the ORCLADMIN SSO user belonging to the default company (this is available only if the Oracle Identity Management infrastructure is installed). After you are logged on as one of these special users, you can grant permission to other users, enabling them to create and manage Oracle Ultra Search instances. Using the Oracle Ultra Search administration tool, you can only grant and revoke Oracle Ultra Search-related permissions to and from existing users. To add or delete users, use Oracle Internet Directory for single sign-on users or Oracle Enterprise Manager for local database users. Note: The Oracle Ultra Search product database dictionary is installed in the WKSYS schema. Logging On and Managing Instances as SSO Users: See Also
265. rver and use Oracle Application Server Upgrade Assistant to upgrade the middle tier. See the Oracle Application Server 10g Upgrading to 10g (9.0.4) section "Upgrading the Middle Tier" for details. 2. Perform the upgrade on the Oracle Ultra Search schema in the Oracle Application Server Metadata Repository. See the Oracle Application Server 10g Upgrading to 10g (9.0.4) section "Upgrading the Metadata Repository" > "Executing the Oracle Ultra Search Schema Upgrade Script" for details. Upgrading Oracle Ultra Search Shipped with Oracle Collaboration Suite: If you are using Oracle Collaboration Suite release 1 and want to upgrade to the most recent Oracle Collaboration Suite release, install the latest Oracle Collaboration Suite release and use the Oracle Collaboration Suite Upgrade Assistant to upgrade both the Oracle Ultra Search middle tier and backend. See the Oracle Collaboration Suite installation guide that is appropriate for your platform. If you are using Oracle Ultra Search 9.0.2 (shipped with the Oracle Application Server release) or Oracle Ultra Search 1.0.3 or 9.2 (shipped with the Oracle Database release) and want to upgrade to the most recent Oracle Collaboration Suite release, then perform the following upgrade procedures: 1. Get the Oracle Collaboration Suite release 1 software and upgrade your Oracle Ultra Search to Oracle Collaboration Suite release 1 first. See section Upgr
266. s. To grant an Oracle Ultra Search user administration privileges, you must assign the user to an administration group. Each user can belong to one or more groups. The following groups are created for each Oracle Ultra Search instance. Instance administrators: Users in this group can only manage instances for which they have privileges. Super-users: Users in this group can manage all instances, including creating instances, dropping instances, and granting privileges. Oracle Ultra Search also has two classes of users. Single sign-on (SSO) users: These users are managed by Oracle Internet Directory (OID) and are authenticated by the SSO server. The Oracle Ultra Search administration tool identifies all Oracle Ultra Search instances to which the SSO user has access. This is available only if you have the Oracle Identity Management infrastructure installed. Database users (non-SSO): These users exist in the database on which Oracle Ultra Search runs. Oracle Ultra Search Default Users: New Oracle Ultra Search instances contain the following users. WK_TEST: This is the instance administrator user that hosts the default instance, called WK_INST. In other words, WK_TEST is the instance administrator for WK_INST. For security purposes, WK_TEST is locked after the installation. The administrator should log in to the database with the DBA role, unlock the WK_TEST user a
267. s. Efficient usage of the buffer cache can improve Oracle Ultra Search query performance. The cache size is controlled by the DB_CACHE_SIZE initialization parameter. See Also: Oracle Database Performance Tuning Guide for information on how to tune this parameter. Optimize the index: Optimize the Oracle Ultra Search index after the crawler has made substantial updates. To do so, schedule index optimization on a regular basis. Make sure index optimization is scheduled during off-peak hours, because query performance is significantly degraded during index optimization. See Also: "Index Optimization" on page 8-38. Optimize the index based on tokens: Optimize the Oracle Ultra Search index by basing it on frequently searched tokens. To log queries, use the administration tool to turn on query statistics collection. The frequently searched tokens can then be passed to CTX_DDL.OPTIMIZE_INDEX in token mode. The Oracle Ultra Search index name is WK$DOC_PATH_IDX. See Also: Oracle Text Reference for more information on OPTIMIZE_INDEX. Simplify query expansion: The search response time is directly influenced by the Oracle Text query string used. Although Oracle Ultra Search provides a default mechanism to expand user input into a Text query, simpler expansions can greatly reduce search time. See Also: "Customizing the Query Syntax Expansion" on page 9
268. s, 8-18
    Attributes Page, 8-19
    Search Attributes, 8-19
    Mappings, 8-20
    Sources Page, 8-21
    Web Sources, 8-21
    Creating Web Sources, 8-22
    Table Sources, 8-24
    Creating Table Sources, 8-25
    Editing Table Sources, 8-26
    Table Sources Comprised of More Than One Table, 8-26
    Limitations With Database Links, 8-26
    Email Sources, 8-27
    Creating Email Sources, 8-27
    File Sources, 8-28
    Creating File Sources, 8-28
    Oracle Sources, 8-29
    Oracle Portal Sources, 8-29
    Federated Sources, 8-30
    User-Defined Sources, 8-32
    Creatin
269. Crawler Page: Mail archive path. Number of crawler threads. Number of processors. Initial Java heap size (in megabytes). Maximum Java heap size (in megabytes). Java classpath. Crawler Statistics: Use this page to view the following crawler statistics. Summary of Crawler Activity: This provides a general summary of crawler activity: aggregate crawler statistics, total number of documents indexed, and crawler statistics by data source type. Detailed Crawler Statistics: This includes the following: a list of hosts crawled and indexed, document distribution by depth, document distribution by document type, and document distribution by data source type. Crawler Progress: This displays crawler progress for the past week. It shows the total number of documents that have been indexed for exactly one week prior to the current time. The Time column rounds the current time to the nearest hour. Problematic URLs: This lists errors encountered during the crawling process. It also lists the number of URLs that cause each error. Web Access Page (Proxies, Authentication): Use this page to set up authentication and proxies. Specify a proxy server if the search space includes Web pages that reside outside your organization's firewall. Specifying a proxy server is optional. Currently, only the HTTP protocol is supporte
270. s Intranet Documents section. Crawl and Index Ultra Appliance's Database Documents: 2. Create a table data source. Select the Sources tab and then the Table sub-tab in the browser view. Under the Table Sources header, click the Create new table source button. Create Table Source, Step 1: Enter a name, such as Ultra Appliance DB, in the Name field for the table source name. The name should not be the same as the one you entered when you crawled and indexed intranet documents. Under the Database Table heading, do not use a database link. Enter the schema name WK_TEST in the Schema field. Enter the name Problems in the Table name field. Click the Locate table button. The following note is displayed if the table is present in your database: Note: Successfully located table problems. Click Proceed to Step 2. Create Table Source, Step 2: Accept the default values for Language. Under the Complex Primary Key heading, add the PROBLEM_ID table column by clicking the Add column button. Under the Content Column and Type heading, select PROBLEM_DESCRIPTION for Column and Plaintext for Type. Click Proceed to Step 3. Create Table Source, Step 3: Under the Verify Table Source Details heading, confirm the settings and values that will be displayed in the Oracle Ultra Search output. Click Create Table Source. Table Source Logging: Select Disable logging mechanis
271. s an iteration tag. It loops through all the documents in a search result. Attribute Name and Description: result="name": This refers to the resultId specified in the <US:getResult> tag. instance="name": This refers to the instanceId specified in the <US:instance> tag. The tag loops through all the documents in a search result and defines a scripting variable, doc, that is an oracle.ultrasearch.query.Document object. In addition, it can have nested <showAttributeValue> tags, which help to render the document's attributes. It is an error if the result specified is not one obtained from a search on the instance specified. In other words, the result must come from the instance. The following example shows the URL of all documents in a search result: <US:iterResult result="searchresult" instance="mybookstore"> </US:iterResult> <showAttributeValue> Tag (Render a Document Attribute): This tag shows an attribute of a document within the <US:iterResult> tag. Attribute Name and Description: attributeName="attname": The name of the document attribute. attributeType="string | number | date": The type of the document attribute. This is needed because the attribute name does not uniquely identify an attribute in the instance. default="default string": A value to output when the document has no value for this attribute. This is useful when a document has no title. The string No Title can be displayed as the d
272. s to data source properties. For example, if an attribute ID is the unique ID of a document, then the agent should return (document_key, 4), where ID has been mapped to the property document_key and its value is 4 for this particular document. If attribute LOVs are available, then the agent returns them upon request. Interaction Between the Crawler and the Crawler Agent: The crawler crawls data sources defined by the user through the invocation of the user-supplied crawler agent. The crawler can do the following: invoke the crawler agent of the defined data source; supply data source parameter information to the agent; authenticate itself with the agent if needed; retrieve a list of URLs and associated attributes and properties that must be crawled; use the URL provided by the agent to retrieve the document; detect inserts, updates, and deletes to the data source; and retrieve attribute LOV data if available. Crawler Agent APIs and Classes: The crawler agent API is a collection of methods used to implement a crawler agent. A sample implementation of a crawler agent, SampleAgent.java, is provided under $ORACLE_HOME/ultrasearch/extension. UrlData: The crawler agent uses this interface to populate document properties and attribute values. Oracle Ultra Search provides a basic implementation of this interface that the agent can use dire
273. sed administration tool. The loader tool supports the following types of metadata: search attribute list of values (LOVs) and display names; document relevancy boosting and document loading. The metadata loader is a Java application. To use the program, you must put the metadata in an XML file that conforms to the XML schema formats described in the following sections. You can then launch the Java program with the XML filename, the database-related parameters, and the loader type parameter. The program parses the XML file and uploads the metadata. Status and error messages are displayed in the terminal console. See Also: "Document Relevancy Boosting" on page 1-11. Launching the Loading Tool: The loader program binary file is located at $ULTRASEARCH_HOME/bin/MetaLoader.class. Your computer should have a Java 1.2-compliant Java Runtime or higher. The following Java libraries should be included in the system Java CLASSPATH: Oracle JDBC Thin Driver, version 1.2 (classes12.zip); Oracle XML parser for Java, version 2 (xmlparserv2.jar); Oracle XML schema processor for Java (xmlschema.jar); Oracle Ultra Search Java library (ultrasearch.jar); and Oracle JDBC globalization support, version 1.2 (nls_charset12.zip). To launch the file, ent
274. ser; therefore, it does not know the database login password. By editing data-sources.xml, the database user and password information is configured with OC4J. The Oracle Ultra Search query application finds the data source by using its location, jdbc/UltraSearchPooledDS. See Also: "Editing the data-sources.xml File" on page 3-21. Step 2: Deploy Multiple Query Applications Against Multiple Instances. Oracle Ultra Search lets multiple instances use different schema users, so multiple query applications can coexist on the same database. Each query application requires its database connection information to be defined in data-sources.xml. They must be defined to have different location values, such as jdbc/UltraSearchPooledDS1, jdbc/UltraSearchPooledDS2, and so on. Correspondingly, the query application must be deployed multiple times in OC4J. See Also: "Deploying the Sample Query Applications" on page 3-19. Finally, each application deployment must be configured to use the correct entry in data-sources.xml. This is done by editing the JSP source for the query application. For the complete search application, edit common_customize_instance.jsp and change the following line to use the correct location value:
    String m_datasource_name = "jdbc/UltraSearchPooledDS";
Tuning and Performance: This chapter contains the following sect
275. sest together. Likewise, Y has a higher score than Z. Class 3 is the least weighted class. A document that has more tokens gets a higher score. For example, an end-user query string of Oracle Applications Financials can result in three documents being found. Document X might contain all three tokens. Document Y might contain only the tokens Oracle and Applications. Document Z might contain only the token Oracle. In this scenario, document X has a higher score than Y. Likewise, Y has a higher score than Z. Expansion Rules: As mentioned previously, the end-user query is expanded to an Oracle Text query. The expanded query string rules are captured in Backus-Naur Form (BNF) notation. Again, these rules are the rules that Oracle Ultra Search uses as a default query syntax expansion implementation. The rules that define an expanded query: <expanded query> <expression> within <title section> 2 <expression> <expression> <generic query expression> <simple query expression> <generic query expression> <plus expression> 100 & <main expression> <minus expression> <simple query expression> <phrase expression> 2 <main expression> <main expression> <near expression> 2 <accum expression> The following list contains some terms
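The Class 3 ordering in the example above (X before Y before Z) can be reproduced by counting how many of the query tokens each document contains. The sketch below is only an illustration of that ordering; it is not Oracle Text's scoring, and the class TokenCoverage and the sample document strings are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of "more matching tokens means a higher score".
final class TokenCoverage {

    /** Returns how many of the given query tokens occur as whole words in the document. */
    static int matchedTokens(String document, String... queryTokens) {
        // Split on non-word characters and compare case-insensitively.
        Set<String> words = new HashSet<>(
                Arrays.asList(document.toLowerCase(Locale.ROOT).split("\\W+")));
        int count = 0;
        for (String token : queryTokens) {
            if (words.contains(token.toLowerCase(Locale.ROOT))) {
                count++;
            }
        }
        return count;
    }
}
```

For the query tokens Oracle, Applications, and Financials, a document containing all three yields a count of 3, a document containing only Oracle and Applications yields 2, and a document containing only Oracle yields 1, which matches the X, Y, Z ordering described in the text.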
276. settings for each Web source. This step is optional. The parameters you can override are the crawling depth, the number of crawler threads, the language, the crawler timeout threshold, the character set, the maximum cookie size, the maximum number of cookies, and the maximum number of cookies for each host. You can also enable or disable robots exclusion, language detection, the UrlRewriter, indexing of dynamic pages, HTTP cookies, and whether the content of the cookie log file is shown. You can also edit these settings in Edit Web Sources. Robots exclusion lets you control which parts of your sites can be visited by robots. If robots exclusion is enabled (the default), then the Web crawler traverses the pages based on the access policy specified in the Web server's robots.txt file. For example, when a robot visits http://www.foobar.com/, it checks for http://www.foobar.com/robots.txt. If it finds the file, the crawler analyzes its contents to see if it is allowed to retrieve the document. If you own the Web sites, then you can disable robots exclusion. However, when crawling other Web sites, you should always comply with robots.txt by enabling robots exclusion. The URL rewriter is a user-supplied Java module that implements the Oracle Ultra Search UrlRewriter interface. It is used by the crawler to filter or rewrite extracted URL links before they are put into the URL queue. URL filtering removes unwanted links, and URL rewriting transforms the URL link. This transformati
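The robots.txt lookup described above (a robot visiting a site first checks for /robots.txt at the root of the same host) can be sketched with the standard java.net.URI class. This is a minimal illustration of how the robots.txt location is derived from a page URL; it is not the Oracle Ultra Search crawler's code, and the class name RobotsTxtLocator is hypothetical.

```java
import java.net.URI;

// Hypothetical helper; shows only how the robots.txt URL relates to a page URL.
final class RobotsTxtLocator {

    /** Returns the robots.txt URL for the site that hosts the given absolute page URL. */
    static URI robotsUrlFor(String pageUrl) {
        URI page = URI.create(pageUrl);
        if (page.getScheme() == null || page.getHost() == null) {
            throw new IllegalArgumentException("Not an absolute URL with a host: " + pageUrl);
        }
        // Resolving an absolute path keeps the scheme, host, and port of the page
        // and drops its path and query string.
        return page.resolve("/robots.txt");
    }
}
```

For example, the page http://www.foobar.com/products/list.html?page=2 maps to http://www.foobar.com/robots.txt. A crawler would fetch that file once for each host and apply its rules before retrieving individual pages.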
277. ssion to read the protected document, then the document is returned by the query API. Otherwise, it is not returned. There are two ways to secure a data source. Specify a single ACL for protecting all documents of a data source: The administrator specifies the permissions of the single ACL in the Oracle Ultra Search administration tool. The resulting ACL is used to protect all documents belonging to that data source. Crawl ACLs from the data source: The data source is expected to provide the ACL together with the document. This lets each document be protected by its own unique ACL. Oracle Ultra Search performs ACL duplicate detection. This means that if a crawled document's ACL already exists in the Oracle Ultra Search system, then the existing ACL is used to protect the document, instead of creating a new ACL within Oracle Ultra Search. This policy reduces storage space and increases performance. Oracle Ultra Search supports only a single LDAP domain. The LDAP users and groups specified in the ACL must belong to the same LDAP domain. Caution: If ACLs are crawled from data sources, then it is the responsibility of the administrator to ensure that the data sources being crawled belong to the same LDAP domain. Otherwise, search users can inadvertently be granted permission to access documents that they should not be able to access
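The ACL duplicate detection described above can be pictured as a registry that stores each distinct ACL once and hands back the existing identifier when an equivalent ACL is crawled again. The following is a minimal sketch under stated assumptions, not the Oracle Ultra Search implementation: the class AclRegistry is hypothetical, an ACL is modeled as a collection of entry strings such as "user:alice=read", and two ACLs are treated as equal when they contain the same set of entries.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical model of ACL duplicate detection; the entry format is an assumption.
final class AclRegistry {

    private final Map<String, Integer> idsByCanonicalForm = new HashMap<>();
    private int nextId = 1;

    /** Returns the id of an equivalent stored ACL, or stores this ACL and returns a new id. */
    int idFor(Collection<String> entries) {
        // Sorting and de-duplicating the entries gives an order-independent key.
        String key = String.join("\n", new TreeSet<>(entries));
        Integer existing = idsByCanonicalForm.get(key);
        if (existing != null) {
            return existing;
        }
        int id = nextId++;
        idsByCanonicalForm.put(key, id);
        return id;
    }

    /** Number of distinct ACLs currently stored. */
    int size() {
        return idsByCanonicalForm.size();
    }
}
```

Because many documents in one data source tend to share the same permissions, storing one ACL per distinct permission set rather than one per document is what yields the storage and performance benefit the text mentions.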
278. sspath on a Remote Crawler Host. 1. Log on to the remote crawler host where the Oracle Ultra Search middle tier is installed. 2. On a UNIX computer, locate and open the file $ORACLE_HOME/ultrasearch/tools/remotecrawler/scripts/unix/define_env. On a Windows computer, locate and open the file %ORACLE_HOME%\ultrasearch\tools\remotecrawler\scripts\winnt\define_env.bat. The define_env file specifies all environment settings used by the RMI subsystem. To alter the classpath, use a text editor to modify the APPLICATION_CLASSPATH variable. 3. Restart the RMI subsystem for these changes to take effect. See Also: "Using the Remote Crawler" on page 5-6 for more details on starting up the RMI subsystem. C Oracle Ultra Search Views: This appendix lists all of the views provided by Oracle Ultra Search. The system provides the following views: OUS_INSTANCES, OUS_SCHEDULES, OUS_DEFAULT_CRAWLER_SETTINGS, and OUS_CRAWLER_SETTINGS. OUS_INSTANCES: This view displays all instance information. Any user can query it.
    Column Name    Type             Description
    INS_ID         NUMBER           Instance ID
    INS_NAME       VARCHAR2(100)    Instance name
    INS_SCHEMA     VARCHAR2(30)     Instance owner
    INS_MODE ONLY VARCHAR2 30 I
279. 5-14
    Remote Crawler File Cache, 5-15
    Logging on to the Oracle Instance, 5-15
    Query Search Application for Real Application Clusters, 5-15
    Java Crawler, 5-16
    Choosing a JDBC Driver, 5-16
    Table Data Source Synchronization, 5-17
    Synchronizing Crawling of Oracle Databases, 5-17
    Create Log Table, 5-17
    Create Log Triggers, 5-18
    Synchronizing Crawling of Non-Oracle Databases, 5-20
    Security in Oracle Ultra Search
    About Oracle Ultra Search Security, 6-2
    Oracle Ultra Search Security Model, 6-2
    Classes of Users and Their Privileges, 6-3
    Oracle Ultra Search Default Users, 6-4
    Oracle Ultra
280. ssword is WK_TEST. For security purposes, WK_TEST is locked after the installation. The administrator should log in to the database with the DBA role, unlock the WK_TEST user account, and set the password to WK_TEST. The password expires after the installation. If the password is changed to anything other than WK_TEST, then you must also update the cached schema password using the administration tool Edit Instance page after you change the password in the database. The default instance is also used by the Oracle Ultra Search sample query application. Make sure to update the data-sources.xml file. See Also: "Schema Password" on page 8-11; "Editing the data-sources.xml File" on page 3-21. Installing the Oracle Ultra Search Middle Tier on Web Server Hosts: The Oracle Ultra Search middle tier includes the following: the Oracle Ultra Search administration tool, the Oracle Ultra Search Java query API, and the Oracle Ultra Search sample query applications. For the Oracle Application Server release, the Oracle Ultra Search middle tier is part of the Application Server installation. You must choose the OracleAS Portal and Wireless option from the Oracle Universal Installer menu to install and configure the Oracle Ultra Search middle tier with the Application Server install. For the database release, the Oracle Ultra Search middle tier
281. stallation. In this example, the Ultra Appliance call center agents will have access to two types of information:
- An intranet that contains documents on Web pages with detailed information about appliances sold and serviced by Ultra Appliance
- A database that contains information about previous maintenance issues and solutions for appliances sold and serviced by Ultra Appliance
For this example, the Ultra Appliance call center agents will search the company intranet and the problem database for information on, and any issues associated with, the Springmaster 2000 refrigerator. This chapter describes how you, acting as the Ultra Appliance search administrator, can set up a browser-based search application that enables call center agents to find information for the appliances that they support. The final section of the chapter describes how you, acting as an Ultra Appliance call center agent, can use Oracle Ultra Search to query the company intranet and database. Refer to this guide, the Oracle Ultra Search User's Guide, for detailed instructions on installation, configuration, and administration of Oracle Ultra Search.
Installation
The instructions in this section cover installation only briefly. For further installation information, see Chapter 3, "Installing and Configuring Oracle Ultra Search", in this book. You may also want to refer to the Oracle Universal Installer Concepts Guide for detailed installation instructions. Ora
282. t
boostTerm="string": Specifies the search term that is used for relevance boosting. This is optional.
withCount="true|false": Specifies whether the result has an estimate of the total hit count. This is optional. If unspecified, the behavior is the same as withCount="false".
The <getResult> tag corresponds to the getResult method of the oracle.ultrasearch.query.Instance class. The attributes of the tag map to the parameters of the method, with the exception that the getResult method can specify the attributes to fetch. The <getResult> tag requires the use of the nested <fetchAttribute> tag to accomplish metadata selection.
The following example shows a search for the first 20 documents of a query in English that appears in French documents:
    <US:getResult resultId="searchresult" instance="mybookstore" query=... queryLocale=... documentLanguage=... from="1" to="20">
    </US:getResult>
<fetchAttribute> Tag: Metadata Selection
This tag is used as a nested tag inside <getResult>. It specifies which attributes of each document should be fetched along with the query result. Each <getResult> can have any number of nested <fetchAttribute> tags.
Attribute Name: Description
attributeName="attname": The name of the attribute whose LOV is being fetched in this LOV.
attributeType="string": The type of the attri
283. t launch time.
Synchronization Status and Crawler Progress
Click the link in the status column to see the synchronization schedule status. To see the crawling progress for any data source associated with this schedule, click Statistics. If you decide to examine URLs before indexing for the schedule, then after you run the schedule, the schedule status is shown as "Indexing Pending". In data harvesting mode, you should begin crawling first. After crawling is done, click Examine URL to examine document URLs and status, remove unwanted documents, and start indexing. After you click Begin Index, you see the schedule status change from launching to executing, scheduled, and so on.
The crawling progress contains the following information: data source type, data source name, start time, finish time, elapsed time, total indexing time, total size of document data collected, average document size, and average fetch throughput.
It also contains the following statistics:
- Documents to fetch
- Documents fetched: This is the sum of Document non-indexable, Document conversion failure, and Documents indexed.
- Document fetch failures: This could be an Oracle HTTP Server timeout or another HTTP server error.
- Documents rejected: The document is not within the URL boundary rule.
- Documents discovered: This is the sum of Documents to fetch, Documents fetched, Document fetch
284. taSourceName tablePagePath emailPagePath filePagePath instance locale instance locale instance instance locale attributeName attributeType resultId instance query queryLocale documentLanguage from to boostTerm withCount
Oracle Ultra Search Query Tag Library
Tag: Description (Attributes)
fetchAttribute: A nested tag within getResult that specifies which attributes of each document should be fetched along with the query results. There can be any number of nested fetchAttribute tags. (Attributes: attributeName, attributeType)
showHitCount: If withCount="true" in the getResult tag, then the result includes a total number of hits, and you can use showHitCount to display this number. (Attributes: result)
showResults: Renders the results of the search. (Attributes: result, instance)
showAttributeValue: Renders a document attribute. (Attributes: attributeName, attributeType)
Details of these tags are described in the following subsections. Note the following requirements for using Oracle Ultra Search tags:
- Install the file ultrasearch_query.jar and include it in the classpath or in the WEB-INF/lib directory of the Web application. This file is provided with the Oracle Ultra Search installation under the ultrasearch/lib directory.
- Make sure that the tag library description file, ultrasearch-taglib.tld, is deployed with the application and is in the location specified in the taglib directives of your JSP pages, such as in the following ex
285. tent to HTML and temporarily stores that HTML in the cache directory for indexing. Next, the Oracle Ultra Search crawler stores all retrieved messages in a directory known as the archive directory. The email files stored in this directory are displayed to the search end user when referenced by a query hit. To crawl email sources, you must specify the user name and password of the email account on the IMAP server. Also specify the IMAP server host name and the archive directory.
Creating Email Sources
To create email sources, you must enter an email address and a description. Optionally, you can specify email aliases and an ACL policy. The description can be viewed by all search end users, so you should specify a short but meaningful name. When you create (register) an email source, the name you use is the email address of the mailing list. Emails that are not sent to one of the registered mailing lists are not crawled.
You can specify email address aliases for an email source. Specifying an alias for an email source causes all emails sent to the main email address, as well as to the alias address, to be gathered by the crawler. An alias is useful when two or more email addresses are logically the same. For example, an email source representing the distribution list list@company.com can have the alternate address list@my.company.com. If list@my.company.com
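The alias behavior described above can be illustrated with a minimal sketch. The class, its methods, and the example addresses are hypothetical and are not part of the Oracle Ultra Search API; the case-insensitive comparison is an assumption made for the illustration.

```java
// Illustrative sketch only: names are hypothetical, not Oracle Ultra Search API.
final class EmailSourceSketch {
    private final String address;
    private final java.util.Set<String> aliases = new java.util.HashSet<>();

    EmailSourceSketch(String address, String... aliases) {
        this.address = normalize(address);
        for (String alias : aliases) {
            this.aliases.add(normalize(alias));
        }
    }

    // Assumption: addresses are compared without regard to case or padding.
    private static String normalize(String value) {
        return value.trim().toLowerCase(java.util.Locale.ROOT);
    }

    /** Returns true if mail sent to the recipient belongs to this source. */
    boolean gathers(String recipient) {
        String r = normalize(recipient);
        // Mail to the main address or to any registered alias is gathered;
        // mail to any other address is not crawled for this source.
        return r.equals(address) || aliases.contains(r);
    }
}
```

For example, a source created as `new EmailSourceSketch("list@example.com", "list@mail.example.com")` gathers mail sent to either address and ignores mail sent anywhere else.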
286. tep 1. For example, /u01/oracle9i on a UNIX host or d:/u01/oracle9i on a Windows host. Remember to use forward slashes for Windows hosts.
The JDBC-based registration script prompts you for three variables:
- LAUNCHER_NAME: An arbitrary string used to identify a JDBC-based remote crawler launcher, which is needed when you start up the JDBC-based remote crawler launcher.
- CONNECTUSER: The database user or role that the JDBC-based remote crawler launcher will use to establish a database connection and listen for launch events.
- ORACLE_HOME: The Oracle home located in Step 1.
The registration script invokes the wk_crw.register_remote_crawler PL/SQL API. The REMOTE_CRAWLER_HOSTNAME and ORACLE_HOME variables are used to compose arguments for the wk_crw.register_remote_crawler API. You may optionally choose to call this API, especially if you need to register multiple remote crawlers programmatically.
5. Verify and complete the remote crawler profile configuration. Be sure to enter the correct values for both variables. To verify that the registration has completed correctly, log on to the Oracle Ultra Search administration tool. Click the Remote Crawler Profiles subtab in the Crawler tab. You should see the remote crawler launcher you registered in the remote crawler profile
287. tial Crawling
1. Modify the default stoplist before creating the instance. For example, to add the stopword "web" to the default stoplist, log on as user WKSYS in SQL*Plus and run the following statement:
    EXEC ctx_ddl.add_stopword('wk_stoplist', 'web')
To remove the stopword "web" from the default stoplist, log on as user WKSYS in SQL*Plus and run the following statement:
    EXEC ctx_ddl.remove_stopword('wk_stoplist', 'web')
Subsequently, the stoplists of all new instances reflect the modifications made to the default stoplist.
2. Replace the instance stoplist immediately after creating the instance. You must create a new user-defined stoplist. Log on as the owner of the instance in SQL*Plus and run the following statements:
    BEGIN
      ctx_ddl.create_stoplist('example_stoplist');
      ctx_ddl.add_stopword('example_stoplist', 'example_stopword');
      -- add more stopwords by repeating the previous line with new stopwords
    END;
    /
To replace an instance stoplist with this new stoplist, log on as the owner of the instance in SQL*Plus and run the following statement:
    ALTER INDEX wk$doc_path_idx REBUILD PARAMETERS ('replace stoplist example_stoplist');
See Also: "Changing Oracle Ultra Search Schema Passwords" on page 4-2 for information about changing the WKSYS password
Modifying Instance Stoplists After Initial Crawling
If necessary, alter an instance stoplist after initial
288. tings. This step is optional. Under HTTP Authentication, specify the user name and password for the host and realm for which authentication is required. Under HTML Forms, you can register HTML forms that you want the Oracle Ultra Search crawler to automatically fill out during Web crawling. HTML form support requires that HTTP cookie functionality is enabled. Click Register HTML Form to register authentication forms protecting the data source.
Specify the ACL (access control list) policy for the data source: no ACL, repository-generated ACL, or Oracle Ultra Search ACL. When a user performs a search, the ACL controls which documents the user can access. The default is no ACL, with all documents considered searchable and visible. For the Oracle Ultra Search ACL, you can add more than one group and user to the ACL for the data source.
Specify mappings. This step is optional. Document attributes are automatically mapped directly to the search attribute with the same name during crawling. If you want document attributes to map to another search attribute, then you can specify it here. The crawler picks up attributes that have been returned by the crawler agent or specified here.
Edit crawling parameters. Specify the document types that the crawler should process for this data source. By default, HTML and plain text are always processed.
You can edit user-defined data sources by changing the name, type, default language, or starting address. Understan
289. tion
After a data source type is defined, any instance of that data source type can be defined:
- Data source name
- Description of the data source (limit to 4000 bytes)
- Data source type ID
- Default language (default is en, English)
- Parameter values, for example: seed http://www.oracle.com, depth 8
Data Source Attribute Registration
You can add new attributes to Oracle Ultra Search by providing the attribute name and the attribute data type. The data type can be string, number, or date. Attributes with the same name but a different data type can be added. Attributes returned by an agent are automatically registered if they have not been defined.
User-Implemented Crawler Agent
The crawler agent has the following requirements:
- The agent must be implemented in Java.
- The agent must support the Java agent APIs defined by Oracle Ultra Search.
- The agent must return the URL attributes and properties.
- The agent optionally can authenticate the crawler's access to the data source.
- The agent must flatten the data source such that each document is retrieved one by one in a streaming fashion. This is to encapsulate the crawling logic of a specific data source into the agent.
- The agent must decide which document attributes Oracle Ultra Search should keep. Any attribute not defined in Oracle Ultra Search is registered automatically.
- The agent can map attribute
290. tstxt.org/wc/meta-user.html
Crawling Depth: Crawling depth controls how deep the crawler follows a link starting from the given seed URL. Because crawling is multi-threaded, this is not a deterministic control, as there may be different routes to a particular page. The crawling depth limit applies to all Web sites in a given Web data source.
URL Rewriter: You implement the URL rewriter API as a Java class to perform link filtering or rewriting. Extracted links within a crawled Web page are passed to this module for checking. This enables ultimate control over which links extracted from a Web page are allowed and which ones should be discarded. See "Oracle Ultra Search URL Rewriter API" on page 9-29 for details.
URL Redirection and Boundary Rule Enforcement
With regard to HTTP redirection, earlier Oracle Ultra Search releases (9.0.2, 9.0.3, and 9.2.0.4) applied the same boundary checking to a redirected URL. Thus, a redirected URL would be rejected if it was outside the boundary rule. If the redirected URL was to be crawled, you had to make sure it was covered by the boundary rule. In 9.2.0.5, iAS 10g, and Oracle Database 10g, the redirected URL is always allowed if it is a temporary redirection (HTTP status 302, 307). For permanent redirection (status 301), the redirected URL is still subject to boundary rules. HTTP meta tag redirection is always checked against boundary rules.
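The redirection rules above can be summarized in a short sketch. The class and method names are hypothetical and are not part of the Oracle Ultra Search API; treating any status other than 302 and 307 like a permanent redirect is an assumption made for the illustration.

```java
// Illustrative sketch only: names are hypothetical, not Oracle Ultra Search API.
final class RedirectPolicySketch {
    private RedirectPolicySketch() {}

    /** Later behavior (9.2.0.5, iAS 10g, Oracle Database 10g) as described above. */
    static boolean allowRedirect(int httpStatus, boolean targetWithinBoundary) {
        // Temporary redirects are always followed.
        if (httpStatus == 302 || httpStatus == 307) {
            return true;
        }
        // Permanent redirects (301) remain subject to the boundary rules.
        // Assumption: every other status is handled the same way.
        return targetWithinBoundary;
    }

    /** Earlier releases applied boundary checking to every redirected URL. */
    static boolean allowRedirectEarlierReleases(boolean targetWithinBoundary) {
        return targetWithinBoundary;
    }
}
```

Under this sketch, a 302 redirect to a host outside the boundary rule is crawled in the later releases but rejected under the earlier behavior, while a 301 redirect to the same host is rejected in both.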
291. two permanent JDBC connections to the backend database. If either connection goes down at any time, then the JDBC launcher attempts to re-establish it. The number of attempts to re-establish connections is configurable as a command-line parameter. The wait time between attempts is also configurable.
Note: The JDBC launcher can be configured to periodically trigger a keep-alive signal. This is useful to prevent inadvertent closing of the JDBC connections by firewalls. The time between signals is configurable with a command-line parameter.
Security With Remote Crawlers
When launching a remote crawler, the Oracle Ultra Search backend database communicates with the remote computer through Java remote method invocation (RMI) or JDBC. Oracle Ultra Search encrypts all RMI communication. However, the JDBC launcher uses the Oracle Thin JDBC driver. If security is a concern, then encrypt all JDBC traffic by securing the Oracle Thin JDBC driver.
See Also: Oracle Advanced Security Administrator's Guide for more information on Thin JDBC support
Scalability and Load Balancing
Each Oracle Ultra Search schedule can be associated with exactly one crawler. The crawler can run locally on the Oracle database host or on a remote host. There is no limit to the number of schedules that can be run. Similarly, there is no limit to the number of remote crawler hosts that can be run. However, each remote crawler host requires that the Oracle Ultra Searc
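The reconnection behavior described above (a configurable number of attempts with a configurable wait between them) follows a common retry pattern, sketched below. This is not the launcher's actual code: the Connector interface and all names are hypothetical stand-ins, and the real launcher takes these values from command-line parameters whose names are not given here.

```java
// Illustrative sketch only: names are hypothetical, not Oracle Ultra Search API.
final class ReconnectSketch {
    private ReconnectSketch() {}

    /** Hypothetical stand-in for opening a JDBC connection; true means success. */
    interface Connector {
        boolean tryConnect();
    }

    /**
     * Tries to connect up to maxAttempts times, sleeping waitMillis between
     * attempts. Returns the 1-based attempt that succeeded, or -1 on failure.
     */
    static int reconnect(Connector connector, int maxAttempts, long waitMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (connector.tryConnect()) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(waitMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and give up.
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
        }
        return -1;
    }
}
```

A bounded attempt count keeps a launcher from retrying forever against a database that is permanently unreachable, while the wait interval avoids flooding the listener during a short outage.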
292. ue
Oracle Ultra Search includes highly functional query applications to query and display search results. The query applications are based on JSP and work with any JSP 1.1-compliant engine.
See Also: Chapter 9, "Oracle Ultra Search Developer's Guide and API Reference"; Oracle Ultra Search Java API Reference
Oracle Ultra Search Features
This section explains some features in Oracle Ultra Search. It includes the following topics: Integration with Oracle Application Server; Extensible Crawler and Crawler Agents; Federated Search; Secure Search; Sample Query Applications; Sample Search Portlet; Query API; URL Rewrite; Robots Exclusions; Display URL Support; Document and Search Attributes; Metadata Loader; Document Relevancy Boosting; Data Harvesting Mode; Instance Snapshot Support; Integration with Oracle Internet Directory; Single Sign-On Authentication; Query Syntax Expansion.
Integration with Oracle Application Server
Although Oracle Ultra Search in the Oracle Application Server is the same product as Oracle Ultra Search in Oracle Collaboration Suite and Oracle Ultra Search in the Oracle Database, there are a couple of differences. The Oracle Database is not integrated with Oracle Application Server Portal. With Oracle Application Server and Oracle Collaboration Suite, Portal users add powerful multi-repository search to their Portal pages. Oracle Appl
293. ue
The following example shows all the values for a string attribute named Dept in the mybookstore instance, using their English display names:
    <US:iterLOV instance="mybookstore" attribute_name="Dept" attribute_type="String">
      <%= value %> <%= displayname %>
    </US:iterLOV>
Formulating the Query
Oracle Ultra Search supports a set of classes for building queries. Currently, these classes do not have any tag equivalents.
<getResult> Tag: Perform Search
This tag performs the search and returns the result by defining a scripting variable of the type oracle.ultrasearch.query.Result.
Attribute Name: Description
resultId="name": Names the result generated by this tag. This name is then used by other tags to render the result on the page.
instance="name": A mandatory attribute that refers to the object defined by the instance tag.
query="<%= expression %>": Specifies a query object to search with.
queryLocale="locale": Specifies the locale of the query object.
documentLanguage="locale": Specifies the language of the documents for which to search. This is optional. If it is not specified, then all languages are included in the search.
from="number": Specifies the index of the first hit.
to="number": Specifies the index of the last hi
294. ueries Page
This section lets you specify query-related settings, such as data source groups, URL submission, relevancy boosting, and query statistics.
Data Groups
Data groups are logical entities exposed to the search engine user. When entering a query, the user is asked to select one or more data groups from which to search. A data group consists of one or more data sources. A data source can be assigned to multiple data groups. Data groups are sorted first by name. Within each data group, individual data sources are listed and can be sorted by source name or source type.
To create a new data source group, do the following:
1. Specify a name for the group.
2. Assign data sources to the group. To assign a Web or table data source to this data group, select one or more available Web sources or table sources and click >>. After a data source has been assigned to a group, it cannot be assigned to any other group. To unassign a Web or table data source, select one or more scheduled sources and click <<.
3. Click Finish.
URL Submission
URL Submission Methods: URL submission lets query users submit URLs. These URLs are added to the seed URL list and included in the Oracle Ultra Search crawler search space. You can allow or disallow query users to submit URLs.
URL Boundary Rules Checking: URLs are submitted to a specific Web data source. URL
295. ules 7-9, 9-30; DROP_INSTANCE procedure 10-5; DROP_SCHEDULE procedure 10-10; dropping a schedule 10-10; dropping an instance 10-5
E: email API 1-4, 9-26, 9-27; Enterprise Manager 3-14, 3-24, 4-2, 8-3
F: federated search 1-6; Federator searchlet 8-31; federator.rar 8-31
G: GRANT_ADMIN procedure 10-6; granting user privileges 10-6
H: HTTPS 6-3, 8-22, 8-29
index (altering 4-7, 7-2; optimizing 8-39); indexing documents 7-7; instance (creating 10-3; dropping 10-5; setting 10-8); instance snapshot 1-12; INTERVAL procedure 10-11; interval string 10-11; IS_ADMIN_READONLY procedure 10-16
J: Java classpath B-1; JAZN 6-8; jazn-data.xml 6-10; JDBC 3-20, 3-21, 3-23, 3-27, 5-5, 5-15, 5-16, 6-8, 8-15, 9-2, 9-11, 9-12, 9-24, 9-33, 9-34, A-1, A-2; JOB_QUEUE_PROCESSES initialization parameter 4-4
L: list of values (LOV) 1-9, 1-10, 7-4, 8-19, 8-20, 8-44, 8-46, 9-10, 9-24, A-1
metadata 7-3 (loading A-1); metadata loader 1-10; migration logs 4-14
O: OC4J 3-11, 3-13, 3-14, 3-15, 3-18, 5-5, 6-8, 9-33; Oracle Internet Directory 1-12, 3-7, 6-4, 6-9, 8-3; Oracle Text 1-2, 1-3, 4-4, 5-14, 7-2, 7-5, 7-7, 9-3; OracleAS Portal 1-5; OUS_CRAWLER_SETTINGS view C-2; OUS_DEFAULT_CRAWLER_SETTINGS view C-2; OUS_INSTANCES view C-1; OUS_SCHEDULES view C-1
P: path rules 7-9, 8-28, 9-30; privileges (granting 10-6; revoking 10-7); procedure (CREATE_INSTANCE 10-3; CREATE_SCHEDUL
296. ument, as shown in Step 4, can be time-consuming because of network traffic or slow Web sites. For maximum throughput, multiple threads fetch pages at any given time.
Note: URLs remain visible until the next crawling run. When the crawler detects that the URL is no longer there, it is removed from the wk$doc table, where Oracle Text automatically marks this document as deleted, even though the index data still exists. Cleanup is done through index optimization, which can be scheduled separately.
Figure 7-1 Queuing URLs. Labels in the figure: Oracle spawns a crawler according to schedule. The crawler initiates multiple crawling threads. The URL queue is initially populated with seed URLs. A crawler thread removes the next URL in the queue. A crawler thread fetches the URL page from the Web. A crawler thread extracts the hypertext links from the URL page and inserts new links into the URL queue.
Figure 7-2 Caching URLs. Labels in the figure: The crawler registers the URL and associated information in the document table. The crawler caches the HTML file in the local file system.
Indexing Documents
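The queuing steps shown in Figure 7-1 can be sketched as a small simulation. This is a single-threaded illustration over an in-memory link graph, not the crawler's implementation: the real crawler runs multiple threads and fetches pages over the network, and every name below is hypothetical.

```java
// Illustrative sketch only: names are hypothetical, not Oracle Ultra Search API.
final class UrlQueueSketch {
    private UrlQueueSketch() {}

    /**
     * Simulates one crawl over an in-memory link graph and returns the URLs
     * in the order in which they were "fetched".
     */
    static java.util.List<String> crawl(
            java.util.List<String> seedUrls,
            java.util.Map<String, java.util.List<String>> linksByUrl) {
        java.util.Deque<String> queue = new java.util.ArrayDeque<>();
        java.util.Set<String> seen = new java.util.HashSet<>();
        // The queue is initially populated with the seed URLs.
        for (String seed : seedUrls) {
            if (seen.add(seed)) {
                queue.addLast(seed);
            }
        }
        java.util.List<String> fetched = new java.util.ArrayList<>();
        while (!queue.isEmpty()) {
            // Remove the next URL in the queue and "fetch" its page.
            String url = queue.removeFirst();
            fetched.add(url);
            // Extract the page's links and queue those not seen before.
            java.util.List<String> links = linksByUrl.get(url);
            if (links == null) {
                continue;
            }
            for (String link : links) {
                if (seen.add(link)) {
                    queue.addLast(link);
                }
            }
        }
        return fetched;
    }
}
```

The seen set is what keeps a page that links back to an already queued URL from being fetched twice, which matters because Web link graphs routinely contain cycles.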
297. urces page in the administration tool.
Display URL and Access URL
For some applications, for security reasons, the URL crawled is different from the one seen by the end user. For example, crawling on an internal Web site inside a firewall might be done without security checking, but when queried by the end user, a corresponding mirror URL outside the firewall must be used. This mirror URL is called the display URL. By default, the display URL is treated as the access URL unless a separate access URL is provided. The display URL must be unique in a data source, so two different access URLs cannot have the same display URL.
See Also: "Sources Page" on page 8-21
Document Attributes
Document attributes, or metadata, describe the properties of a document. Each data source has its own set of document attributes. The value is retrieved during the crawling process, mapped to one of the search attributes, and then stored and indexed in the database. This lets you query documents based on their attributes. Document attributes in different data sources can be mapped to the same search attribute. Therefore, you can query documents from multiple data sources based on the same search attribute. If the document is a Web page, the attribute can come from the HTTP header, or it can be embedded inside the HTML in metatags. Document attributes can be used
298. ure to revoke instance administrator privileges from the specified user.
    OUS_ADM.REVOKE_ADMIN(
      user_name IN VARCHAR2,
      user_type IN NUMBER DEFAULT DB_USER,
      scope IN NUMBER DEFAULT CURRENT_INSTANCE
    );
user_name: The name of the user whose privileges are to be revoked.
user_type: The user type: OUS_ADM.DB_USER (database user) or OUS_ADM.LDAP_USER (lightweight SSO user).
scope: The scope of the granting: CURRENT_INSTANCE or ALL_INSTANCE.
Example:
    OUS_ADM.REVOKE_ADMIN('scott', ous_adm.DB_USER);
SET_INSTANCE
Use this procedure to operate on an Oracle Ultra Search instance. Almost all OUS_ADM APIs require that SET_INSTANCE be called first.
Syntax: This procedure takes two forms. In the first, you specify the name of the instance to set:
    OUS_ADM.SET_INSTANCE(
      inst_name IN VARCHAR2
    );
inst_name: The name of the instance to set.
In the second form, you specify the ID of the instance to set:
    OUS_ADM.SET_INSTANCE(
      inst_id IN NUMBER
    );
inst_id: The ID of the instance to set.
Example:
    OUS_ADM.SET_INSTANCE('Scott Instance');
Schedule-Related APIs
This section provides reference information for using the schedule-related APIs.
CREATE_SCHEDULE
Use this procedure to create a crawler schedule. It returns an ID for the schedule.
Syntax: OUS_ADM.CREATE_SCHEDULE
299. user. This process is slightly different depending on whether Oracle Application Server Portal is running in hosted mode or non-hosted mode, as described in the following list.
Note: An SSO user is uniquely identified by Oracle Ultra Search with an SSO nickname/subscriber nickname combination.
- In non-hosted mode, the subscriber nickname is not required when granting privileges to an SSO user. This is because there is exactly one subscriber in Oracle Application Server Portal in non-hosted mode.
- In hosted mode, the subscriber nickname is required when granting privileges to an SSO user. This is because there can be more than one subscriber in Oracle Application Server Portal, and two or more users with the same SSO nickname (for example, PORTAL) could be distinct SSO users distinguished by their subscriber nickname.
When running in hosted mode, also note the following:
- When granting permissions for the default subscriber user, always specify DEFAULT COMPANY for the subscriber nickname, even though the actual nickname could be different (for example, ORACLE). The actual nickname is not recognized by Oracle Ultra Search.
- When logging in to SSO as the default subscriber user, leave the subscriber nickname blank. Alternatively, enter DEFAULT COMPANY instead of the actual subscriber nickname (for example, ORACLE) so that it is recognized by
300. ve a specified language, then the crawler assumes that the Web page is written in this default language. This setting is important, because language directly determines how a document is indexed.
Note: This default language is used only if the crawler cannot determine the document language during crawling. Set the language preference in the Users Page.
You can select a default language for the crawler or for data sources. Default language support for indexing and querying is available for the following languages: Polish, Chinese, Hungarian, Norwegian, Romanian, Finnish, Japanese, Spanish, Slovak, English, Turkish, Danish, Swedish, Russian, German, Korean, Dutch, Italian, Greek, Portuguese, Czech, Hebrew, French, Arabic.
Crawling Depth
A Web document could contain links to other Web documents, which could contain more links. This setting lets you specify the maximum number of nested links the crawler will follow.
See Also: "Tuning the Web Crawling Process" on page 5-2 for more information on the importance of the crawling depth
Crawler Timeout Threshold
Specify, in seconds, a crawler timeout. The crawler timeout threshold is used to force a timeout when the crawler cannot access a Web page.
Default Character Set
Specify the default character set. The crawler uses this setting whe
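The default-language rule in this excerpt (the default applies only when the crawler cannot determine the document language) amounts to a simple fallback, sketched below. The class and method are hypothetical and not part of the Oracle Ultra Search API; representing an undetermined language as null or an empty string is an assumption of the sketch.

```java
// Illustrative sketch only: names are hypothetical, not Oracle Ultra Search API.
final class LanguageFallbackSketch {
    private LanguageFallbackSketch() {}

    /**
     * Returns the language to index a document under: the detected language
     * when there is one, otherwise the configured default language.
     */
    static String languageForIndexing(String detectedLanguage, String defaultLanguage) {
        // Assumption: null or blank means the language could not be determined.
        if (detectedLanguage == null || detectedLanguage.trim().isEmpty()) {
            return defaultLanguage;
        }
        return detectedLanguage;
    }
}
```

Because language determines how a document is indexed, a wrong default affects only those documents whose language could not be detected.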
301. ve user management available with the Oracle database.
Oracle Ultra Search Extensibility and Security
Oracle Ultra Search is extensible (for example, the crawler agent), but this poses no extra security considerations.
Configuring a Security Framework for Oracle Ultra Search
This section describes special security configuration steps within Oracle Ultra Search.
Configuring Security Framework Options for Oracle Ultra Search
Storing clear-text passwords in data-sources.xml poses a security risk. Avoid this by using password indirection to specify the password. This lets you enter the password in jazn-data.xml, where it is automatically encrypted, and point to it from data-sources.xml.
See Also: "Editing the data-sources.xml File" on page 3-21; Oracle Application Server Containers for J2EE Services Guide
Configuring Oracle Identity Management Options for Oracle Ultra Search
To configure the Oracle Ultra Search administration tool with the SSO server, you must follow certain steps.
See Also: "Configuring the Administration Tool with Single Sign-On Server" on page 3-17
Configuring Oracle Ultra Search Security
Oracle Ultra Search has no specific security passwords.
See Also: "Configuring Security Framework Options for Oracle Ultra Search" on page 6-10 for more information on Oracle Ultra Search configuration issues to leverage security
302. ventions in Code Examples
Examples: You can specify this clause only for a NUMBER column. You can back up the database by using the BACKUP command. Query the TABLE_NAME column in the USER_TABLES data dictionary view. Use the DBMS_STATS.GENERATE_STATS procedure. Enter sqlplus to open SQL*Plus. The password is specified in the orapwd file. Back up the datafiles and control files in the /disk1/oracle/dbs directory. The department_id, department_name, and location_id columns are in the hr.departments table. Set the QUERY_REWRITE_ENABLED initialization parameter to true. Connect as oe user. The JRepUtil class implements these methods. You can specify the parallel_clause. Run old_release.SQL, where old_release refers to the release you installed prior to upgrading.
Code examples illustrate SQL, PL/SQL, SQL*Plus, or other command-line statements. They are displayed in a monospace (fixed-width) font and separated from normal text as shown in this example:
    SELECT username FROM dba_users WHERE username = 'MIGRATE';
The following table describes typographic conventions used in code examples and provides examples of their use.
Convention: Other notation; Italics; UPPERCASE
Meaning: items. Do not enter the brackets. Braces enclose two or more items, one of which is required. Do not enter the braces. A vertical bar repres
303. y in special circumstances where you require the classpath to be different from the RMI subsystem classpath.
Complete the crawler configuration with the administration tool. Create schedules and data sources. Assign one or more data sources to each schedule. Each schedule must then be assigned to a remote crawler or the local crawler. The local crawler is the crawler that runs on the local Oracle database host itself. To assign a schedule to a remote crawler host or the local database host, click the host name of a schedule in the Schedules page.
You can also turn off the remote crawler feature for each schedule, thereby forcing the schedule to launch a crawler on the local database host instead of the specified remote crawler host. To turn off the remote crawler feature, click the host name of a schedule in the Synchronization Schedules page. If a remote crawler host is selected, the RMI-based remote crawler host name or JDBC-based launcher name is displayed. Change this to the local database host to disable remote crawling.
See Also: Chapter 8, "Understanding the Oracle Ultra Search Administration Tool"
Start the remote crawler launching subsystem on each remote crawler host. Use the helper scripts in $ORACLE_HOME/tools/remotecrawler/scripts/operating_system to do this.
- If the remote crawler is running on a UNIX platform, then source