Bilkent University
Department of Computer Engineering
Senior Design Project

Vortex Sentinel: Tool for Constructing Website Maps

Final Report

Supervisor: Uğur Doğrusöz
Jury Members: Bedir Tekinerdoğan, David Davenport

Project Group Members:
Serkan Özkul (20601353)
Fakih Karademir (20602294)
Mehmet Yayan (20502090)
Can Haznedaroğlu (20602445)
İsmail Hıztürk (20500771)

Contents

1 INTRODUCTION
1.2 Comparisons with the Existing Systems
2 ARCHITECTURE AND DESIGN
2.1 General View of Packages
2.1.1 Database Package
2.1.2 Layout Package
2.1.3 Parser Package
2.1.4 Crawling Engine Package
2.2 Class Documentation
2.2.1 Interface Documentation Guidelines
2.2.2 Classes of the System
3 FINAL STATUS OF THE PROJECT
4 IMPACT OF THE ENGINEERING SOLUTION
4.1 Economic Constraints
4.2 Environmental Constraints
4.3 Social Constraints
4.4 Political Constraints
4.5 Ethical Constraints
4.6 Health Constraints
4.7 Safety
4.8 Manufacturability
4.9 Sustainability
4.10 Professional and Ethical Responsibility
4.11 Low Cost and High Performance
4.12 Robustness
4.13 Etiquette and Speed Control
4.14 Manageability and Reconfigurability
4.15 Novel Solutions to Accomplish Project
5 CONTEMPORARY ISSUES ABOUT AREA OF THE PROJECT
6 TOOLS AND TECHNOLOGIES USED
6.1 Adobe Flex Builder 3
6.2 Swish Max 3
6.3 Rapid PHP Editor
6.4 WireShark Network Sniffer
6.5 Apache Server with cPanel 11 Interface
6.6 PHPMyAdmin
6.7 Webalizer
6.8 Ulead PhotoExpress
7 USE OF RESOURCES
7.1 Open Sources
7.2 Books
7.3 Library Resources and Internet Resources Used
8 GLOSSARY
9 CONCLUSION
10 REFERENCES
11 APPENDIX
11.1 User Manual
1 INTRODUCTION

The tool that we have designed and implemented constructs visual website maps. It is called Vortex Sentinel. Its basic purpose is simple: presenting an elegant web crawler tool for a wide variety of users. The application takes website URLs as input. After the crawling process, internal web documents (such as HTML, JavaScript, and PHP files) and internal references (i.e., the links within HTML <a> tags) are represented as the graphical components of a visual website map. Users and webmasters are able to overview websites and see the distribution of the documents (nodes) and the links (edges). The tool should facilitate the work of webmasters especially: they are able to view the link statuses between pages (e.g., they have the chance of viewing broken links). Another important facility expected from our tool is that it should list the e-mail references within a particular website, so that authoritative users can take action to prevent spam mails. A list of the multimedia objects (videos, SWFs, MP3s, etc.) within a web document such as an HTML file is also provided via the Vortex Sentinel tool.

In our application, the graph displayed after crawling a particular website consists of nodes, which represent the inter-referenced website objects (e.g., HTML, PHP, and JPEG files), and edges, which represent the references among these objects. This graph is the main output of our application, so it should be displayed well and designed consistently. When we think of large websites to be crawled, the graph definitely needs a well-designed layout algorithm to display the nodes and edges effectively. For constructing graphs in a proper way, we use a layout scheme made available to our project group by our supervisor; it is called the CoSe layout.

The crawling process, the generation of the graph layout, and the dynamic actions performed by users (such as adding a new node) should be accomplished in a reasonable time interval; these processes are meant to be as fast as possible. The system should also respond to dynamic changes in the visualization part at high speed. The application is intended to work on almost all platforms which have the necessary supporting components, namely Adobe Flash support and a capable web browser. The crawling process and the generation of the information set for constructing graphs are server-side operations; this provides more efficiency than a desktop application would. Our system also has user interfaces designed to facilitate the user's activities.
1.2 Comparisons with the Existing Systems

Although web crawling is a well-known and comprehensive topic in the computer technology sector, there is no widely used and known software which satisfies the various requirements in this area. There are some software tools, like WebSphinx and PHPCrawler, which address different aspects such as keyword searching. These crawlers have some deficiencies, as follows:

- Graphical representations of links are not good enough or comprehensive. [Figure 1: A visual link representation of the WebSphinx crawler tool.]
- They do not give the whole link map, because of inefficient algorithms. Some of them are not recursive, so some links and sites cannot be made visible.
- They spend too much time, usually minutes, on crawling and visual representation, especially on big websites.
- Most of them have no search option, which would provide efficiency and save the user time.
- They consume too much computer memory (RAM) while crawling, so that after a while the user cannot work on anything else.
- Generally they do not have user-friendly interfaces; their menus are ineffective and limited. [Figure 2: Menu of the WebSphinx Crawler Workbench, set up to crawl the subtree of http://www.bilkent.edu.tr.]
- They are platform dependent, and most of them work only in a Windows environment, so they are not suitable for every user. Most of them are distributed as executable files, so they require downloading and executing every time.
- They only show the links in web pages, plus a few small features; they have no options for updating and fixing invalid links. In other words, they are tools which can only crawl and show the links, and a missing point is that they should also update links and be able to fix them.
- Because of their limited purposes, they are not able to show various information about websites, such as site rankings or thumbnails.
According to our observations of other crawler tools, they have significant deficiencies, and their use cases are also quite limited. By observing the basic deficiencies of these crawlers, we determined the fundamental features and use cases of our tool, as well as the extra abilities that we added.

Most crawler applications are designed and implemented with the following components; the relations between the components are tentative. [Figure 3: Crawler components. The two basic components of a crawler: the crawling application sends URL requests to the World Wide Web, and the downloaded pages are passed to the storage system.]

2 ARCHITECTURE AND DESIGN

2.1 General View of Packages

Our project is composed of three major packages: the Database Management Package, the Crawler Package, and the GUI Package. The packages and their relations can be seen in Figure 4. [Figure 4: Package view of Xentinel, showing the Crawling Engine, Parser, and Node Calculator, the URL, Content, and Page classes, and the Database Information Collector.]
2.1.1 Database Package

The Database Package includes the Information Collector class, which is responsible for holding node information in a database. The table structure of the database is shown below:

    CREATE TABLE `evaturke_vs`.`site_info` (
      `url` VARCHAR(30) NOT NULL,
      `name` VARCHAR(30) NOT NULL,
      `links` INT(10) NOT NULL,
      `xml_file` VARCHAR(15) NOT NULL,
      `crawl_time` INT(5) NOT NULL,
      `last_crawled` DATE NOT NULL,
      `extra_information` VARCHAR(150) NOT NULL
    ) ENGINE = MYISAM;

[Figure 5: Structure of the site_info table.]

Attributes:
- url: the URL of the node.
- name: the name of the node.
- links: the number of links that the source code of the given URL contains.
- xml_file: the path of the XML node-structure file for the given URL.
- crawl_time: the time elapsed while crawling.
- last_crawled: the last date on which the given URL was crawled.
- extra_information: miscellaneous information about the URL.

[Figure 6: The phpMyAdmin interface showing our database.]
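To make the Information Collector's job concrete, the following is a minimal sketch of recording one crawled site into the site_info table defined above. The connection credentials and the sample row are illustrative placeholders, not the project's real ones, and the actual class may structure this differently.

    <?php
    // Hypothetical sketch: store one crawl result in site_info.
    // Host, user, password and the sample values are placeholders.
    $db = mysqli_connect('localhost', 'dbuser', 'dbpass', 'evaturke_vs');
    if (!$db) {
        die('Connection failed: ' . mysqli_connect_error());
    }

    $stmt = mysqli_prepare($db,
        'INSERT INTO site_info
             (url, name, links, xml_file, crawl_time, last_crawled, extra_information)
         VALUES (?, ?, ?, ?, ?, CURDATE(), ?)');

    $url   = 'www.bilkent.edu.tr'; // crawled site
    $name  = 'Bilkent University'; // display name of the node
    $links = 42;                   // number of links found in the source
    $xml   = 'bilkent.xml';        // path of the generated XML map file
    $time  = 7;                    // crawl time in seconds
    $extra = '';                   // miscellaneous information

    mysqli_stmt_bind_param($stmt, 'ssisis', $url, $name, $links, $xml, $time, $extra);
    mysqli_stmt_execute($stmt);
    mysqli_stmt_close($stmt);
    mysqli_close($db);
    ?>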
2.1.2 Layout Package

The Layout Package includes the classes that are combined with the Flex tool. This package basically manages the visual components of the project by managing the creation of nodes and links in the graph. It reads the data sent by PHP, which holds the node and edge information, and then it constructs the nodes and edges on the Flash layout. The integration between PHP and the Flash layout is critical, and we used many resources to accomplish it. It requires advanced engineering skills; we succeeded at this goal, whereas another senior group of Uğur Doğrusöz, using the same graphical layout tool, could not make the integration work.
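The data handed from PHP to the Flash layout travels as XML. Since this report does not reproduce the exact schema, the sketch below shows one plausible way the PHP side could emit node and edge information; the element and attribute names are our own illustrative choices, not the project's real format.

    <?php
    // Hypothetical sketch: serialize nodes and edges to an XML file
    // that the Flash layout can load. Names are illustrative.
    function writeMapXml(array $nodes, array $edges, $path) {
        $xml = new SimpleXMLElement('<graph/>');
        foreach ($nodes as $id => $url) {
            $node = $xml->addChild('node');
            $node->addAttribute('id', $id);
            $node->addAttribute('url', $url);
        }
        foreach ($edges as $edge) {
            $e = $xml->addChild('edge');
            $e->addAttribute('source', $edge[0]);
            $e->addAttribute('target', $edge[1]);
        }
        $xml->asXML($path); // the Flash side reads this file and draws the map
    }

    writeMapXml(
        array(0 => 'index.html', 1 => 'about.html'), // two nodes
        array(array(0, 1)),                          // one edge: index -> about
        'sample_map.xml');
    ?>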
2.1.3 Parser Package

The Parser Package includes the classes that are used to search for links in a given URL. This package is generally responsible for parsing all possible URLs. An additional feature of this package is parsing the source code to find any search key given by the user. It can also find media files (video, music, etc.), Word documents, and mail addresses.
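A minimal sketch of this kind of extraction is given below, using regular expressions over the fetched HTML. The patterns are simplified for illustration (relative links are returned as-is, for example), and the function names are ours rather than the package's.

    <?php
    // Hypothetical sketch of the Parser Package's extraction work.
    function extractLinks($html) {
        // collect the href targets of <a> tags
        preg_match_all('/<a[^>]+href=["\']([^"\']+)["\']/i', $html, $m);
        return $m[1];
    }

    function extractEmails($html) {
        // collect anything shaped like a mail address
        preg_match_all('/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/', $html, $m);
        return array_unique($m[0]);
    }

    $html = file_get_contents('http://www.example.com/');
    print_r(extractLinks($html));
    print_r(extractEmails($html));
    ?>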
2.1.4 Crawling Engine Package

The Crawling Engine Package includes the classes that are responsible for the main crawling, determining the recursive crawling levels. It has advanced options: the user can set the crawling level and the crawling algorithm (Breadth-First Search or Depth-First Search), which increases the functionality. The Crawling Engine Package works with the Parser Package: the Crawling Engine Package sends the HTML source code to the Parser Package, which finds the URLs included in that source code. The Parser Package then sends these URLs back to the Crawling Engine Package, which adds the links to a queue structure to keep the crawling process going. After that, the Crawling Engine Package selects the next URL in the queue, fetches its source code, and sends it to the Parser Package, repeating the same process recursively.
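The queue-driven loop just described can be condensed as follows. This is a sketch under our own naming, reusing the extractLinks() helper from the Parser Package sketch; the real engine additionally honors robots.txt, throttles its speed, and supports depth-first order (a stack in place of the queue).

    <?php
    // Hypothetical sketch of the breadth-first crawling loop.
    function crawl($startUrl, $maxLevel) {
        $queue   = array(array($startUrl, 0)); // pairs of (url, level)
        $visited = array($startUrl => true);   // prevents redundant crawling

        while (!empty($queue)) {
            list($url, $level) = array_shift($queue); // BFS: take from the front
            if ($level >= $maxLevel) {
                continue;                             // user-set level limit
            }
            $html = @file_get_contents($url);         // fetch the source code
            if ($html === false) {
                continue;                             // tolerate broken links
            }
            // extractLinks() is the parser routine sketched in Section 2.1.3;
            // a real crawler would also resolve relative URLs here.
            foreach (extractLinks($html) as $link) {
                if (!isset($visited[$link])) {
                    $visited[$link] = true;
                    $queue[] = array($link, $level + 1);
                }
            }
        }
        return array_keys($visited); // every page reached within the limit
    }
    ?>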
You can examine the component diagram of the Xentinel tool for a better understanding. [Component diagram of the Xentinel tool.]

When the Vortex Sentinel application is launched, the user is expected to enter an input URL. CrawlingEngine takes this URL and tries to extract the source code of the base file (e.g., index.html) to trigger the actual crawling process. It then sends the fetched code to URLParser. Regular updates are made in the related parts of the database components. CrawlingEngine also collects the cumulative data to determine and form node packages, which include parameters like node info, edge info, etc. These packages are sent to LayoutManager, and via VisualMapManager the overall website map is displayed. [General class diagram of the proposed solution.]

2.2 Class Documentation

2.2.1 Interface Documentation Guidelines

The format of the class documentation is given below.

2.2.2 Classes of the System

The classes of the system include CrawlingEngine.php, Page.php, and Link.as. For example, Link.as has the attributes edgeColor: String and edgeType: int, and the operation refresh(), which refreshes a link.
3 FINAL STATUS OF THE PROJECT

Over two whole semesters our group completed nearly all of the requirements we had proposed, with few exceptions. Vortex Sentinel works as intended: it finds the links of a given website and transposes them into a connected graph with the help of Flex. Vortex Sentinel is also able to crawl the links found at first hand recursively, over multiple levels.

4 IMPACT OF THE ENGINEERING SOLUTION

4.1 Economic Constraints

Our system runs with low cost and high performance: we need only a 24-hour online server to serve clients properly. After the installation of the necessary software on our server, there is no additional cost to keep Vortex Sentinel online. There may be a slight monthly maintenance cost for the server, which can be ignored, so the performance/cost ratio stays high.

4.2 Environmental Constraints

The system has no impact on the physical environment, but it has a considerable effect on the digital environment: our system crawls the web pages of a domain swiftly, so the target server which hosts that domain should be powerful enough not to crash.
4.3 Social Constraints

Vortex Sentinel helps users spend less time on websites by providing the whole site as a connected graph, so that any particular site can be surveyed easily. Since the only interaction is between the user and the computer system, it has no other social aspect.

4.4 Political Constraints

Our system strives to crawl as objectively as possible. Since it is a tool for crawling websites in order to create a connected graph of the links between their pages, it has no political constraints.

4.5 Ethical Constraints

Vortex Sentinel is responsible for ethical issues while crawling a website: pages can be identified as not to be crawled by being disallowed in a separate text file, and the crawler sustains this responsibility. Vortex Sentinel also treats functionality as a design principle and uses several design approaches (the divide-and-conquer approach, a top-down strategy) as design solutions. Vortex Sentinel takes account of privacy and ethical rules during the crawling process by checking robots.txt before starting to crawl. Webmasters may put a robots.txt file in the root folder of their websites to indicate which URLs they do not want crawled. A sample robots.txt file (the robots.txt file for www.ornek.com):

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /images/private/
    Disallow: /private.html

As we see from the example above, this robots.txt file states that the cgi-bin folder, the images/private folder, and the private.html file should not be crawled, for privacy and security reasons. So Vortex Sentinel honors the robots.txt file in order to follow ethical rules.
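A minimal sketch of this robots.txt check could look like the following, under the simplifying assumption that every Disallow rule applies to us (per-agent sections are ignored); the function names are illustrative.

    <?php
    // Hypothetical sketch of the robots.txt courtesy check.
    function disallowedPaths($siteRoot) {
        $robots = @file_get_contents($siteRoot . '/robots.txt');
        if ($robots === false) {
            return array(); // no robots.txt: nothing is disallowed
        }
        $paths = array();
        foreach (explode("\n", $robots) as $line) {
            if (preg_match('/^Disallow:\s*(\S+)/i', trim($line), $m)) {
                $paths[] = $m[1];
            }
        }
        return $paths;
    }

    function mayCrawl($siteRoot, $path) {
        foreach (disallowedPaths($siteRoot) as $prefix) {
            if (strpos($path, $prefix) === 0) {
                return false; // the path falls under a Disallow rule
            }
        }
        return true;
    }

    // e.g. mayCrawl('http://www.ornek.com', '/cgi-bin/form.cgi') returns false
    ?>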
4.6 Health Constraints

Our system does not constitute any impairment to health. Feel free to use it, though no more than 10 hours at a time.

4.7 Safety

Vortex Sentinel uses encryption to protect the login information of its users and does not store this information after the crawling process. We take care in the implementation to prevent any vulnerability in our system.

4.8 Manufacturability

Since this is a software project that is meant to be installed on a server and run from a server, it has no manufacturing aspect in the physical sense. Because we decided not to make an additional client version of Vortex Sentinel that would run on a client PC without requiring a server, "manufacturing" the system amounts to deploying it on a server; in that sense it has a pretty high manufacturability aspect and does not incur any cost.

4.9 Sustainability

The sustainability of Vortex Sentinel is quite satisfying: it keeps serving its clients without crashing. We may limit the maximum number of users served at a time according to the results of stress testing, so that our system is not crashed by denial of service. On the other hand, we use a database system to keep track of website information, and it needs to be optimized monthly to stabilize performance.

4.10 Professional and Ethical Responsibility

Vortex Sentinel is responsible for ethical issues while crawling a website: it can identify pages that are not to be crawled because they are disallowed in a separate text file, and it sustains this responsibility. Vortex Sentinel also crawls with functionality as a design principle and uses several design approaches (the divide-and-conquer approach, a top-down strategy) as design solutions.
4.11 Low Cost and High Performance

Our system is built to crawl up to several hundred pages per second, which leads to millions of pages per run. The system also runs on low-cost hardware. Efficient use of disk access is extremely crucial for this speed, with the help of the main data structures, such as the structure that records the URLs already seen. This consideration only becomes significant when crawling several million pages.

4.12 Robustness

Since the system interacts with several million servers, it is developed to be resilient against bad HTML, broken links, strange server behavior and configurations, and many other situations that cause crawling errors. The goal is to avoid as many broken links and bad requests as possible, since in many applications the program is going to download only a subset of the pages anyway. The system also needs to be tolerant of computer crashes and network interruptions, since a crawl can take days or weeks; therefore the state of the system is kept on disk at all times. Since the system does not require strict ACID properties, periodic synchronization of the main structures to disk is appropriate.
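To illustrate keeping the crawler's state on disk, here is a small sketch that periodically saves the crawl frontier and the seen-URL set; the file name and the state layout are our own assumptions, not taken from the project.

    <?php
    // Hypothetical sketch of periodic state checkpointing. serialize() is
    // sufficient because strict ACID guarantees are not required here.
    function checkpoint($queue, $visited, $file = 'crawler.state') {
        $state = serialize(array('queue' => $queue, 'visited' => $visited));
        file_put_contents($file . '.tmp', $state);
        rename($file . '.tmp', $file); // atomic swap: never a half-written file
    }

    function restore($file = 'crawler.state') {
        if (!file_exists($file)) {
            return null;               // fresh start, no previous state
        }
        return unserialize(file_get_contents($file));
    }
    ?>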
4.13 Etiquette and Speed Control

The system was designed to control access speed in several different ways. We have to avoid putting too much load on a single server; we do this by contacting each site only once per second, unless specified otherwise. It is also desirable to throttle the speed at the domain level, in order not to overload small domains and for other reasons explained later. Finally, since we are in a campus environment where our connection is shared with many other users, we also need to control the total download rate of our crawler, crawling at low speed during the peak usage hours of the day and at a much higher speed during the late night and early morning, limited mainly by the load tolerated by our main campus router. To control the speed, we added a crawling speed controller which puts the crawler to sleep after it fetches the HTML source code of every page. In addition, we have limited the number of users who can get service from Xentinel at the same time to 5.
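The per-host throttling described above can be sketched as follows; the once-per-second default comes from the text, while the bookkeeping and the names are illustrative.

    <?php
    // Hypothetical sketch of the crawling speed controller.
    $lastContact = array(); // host => timestamp (float) of the last fetch

    function politeFetch($url) {
        global $lastContact;
        $host = parse_url($url, PHP_URL_HOST);
        if (isset($lastContact[$host])) {
            $wait = 1.0 - (microtime(true) - $lastContact[$host]);
            if ($wait > 0) {
                usleep((int)($wait * 1000000)); // at most 1 request/s per host
            }
        }
        $lastContact[$host] = microtime(true);
        return @file_get_contents($url);
    }
    ?>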
4.14 Manageability and Reconfigurability

A suitable interface for monitoring the crawler is provided by the hosting company, covering the speed of the crawler and the sizes of the main data sets, together with statistics about hosts and pages. The admin is able to alter the speed and has the options of adding and removing components, shutting down the system, forcing a checkpoint, and adding hosts that produce broken links or bad requests to the list of places the crawler should avoid. The system can be reconfigured after any crash or shutdown and fixes any problems that occur, in order to continue crawling with a different machine configuration.

4.15 Novel Solutions to Accomplish Project

We used a divide-and-conquer approach to provide a design solution, and we carried out design principles such as unity, harmony, and functionality during our project analysis and design. We used URL, e-mail, and multimedia object identification and normalization while constructing the web crawler architecture. In the project reports we provided mock-ups describing the design principles used for identifying visual components, such as color, line, type, and texture. We used a novel solution for the design problem of e-mail and multimedia file extraction from the source code of web pages, which is not a functionality of typical web crawlers. There was also a design problem involving the recording of website information, performance, saving backups of websites for future usage, and the implementation of extra features; we solved it with a novel solution by adding a database management system. As a result, we can keep track of every website, save previously crawled website information, increase the performance of Vortex Sentinel, and implement additional features easily.
5 CONTEMPORARY ISSUES ABOUT AREA OF THE PROJECT

Xentinel (Vortex Sentinel) is a web-based crawling engine for obtaining information about websites and constructing visual web maps. It crawls a given website, stores the information about the website in a database, and constructs a visual web map of that site. Besides this, Xentinel supports search queries on the crawled website. To compete with other crawlers, our crawling engine must crawl fast, produce reliable results every time, and parse the source code quickly and properly; it must also be robust while crawling, because scanning a website is a complex and long process: a website can have thousands of pages, and each page can have many lines of HTML source code, all of which we have to search and mine. Today's most popular search engine, Google, is outstanding on many of these issues: it searches very fast among billions of web pages and brings the results to you within a few milliseconds. Our crawling engine gives satisfying results on most of these issues and has been tuned to return quick and proper results.

As an extra feature, our crawling engine offers fast searching: the user can run queries on our database and get a quick result. On that subject, a new feature, semantic searching, has become popular among search engines. A semantic search is based not only on the query word itself: it also issues further queries about the meanings of that word, so that it gives you more accurate results. At the moment there is no crawling engine which has integrated semantic search completely. To compete with other crawlers, we designed a semantic search feature: our server keeps a dictionary for obtaining the semantics of a given search word and runs multiple searches; besides the given word, it also searches for the 3 most accurate meanings of that word and then merges all the results. Thus Xentinel can give more accurate search results to the user.

Another point is safety and security: we implemented Xentinel in the safest way, with database access protected and tracked carefully, so that the system promises reliability and robustness.
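The merge step of this semantic search can be sketched as below. The report does not say how the dictionary is stored, so both the in-memory dictionary and the searchDatabase() helper here are stand-ins for illustration only.

    <?php
    // Hypothetical sketch of the semantic search merge.
    function semanticSearch($word) {
        $dictionary = array(
            'car' => array('automobile', 'vehicle', 'auto'),
        );
        $terms = array($word);
        if (isset($dictionary[$word])) {
            // add the 3 closest meanings, as described in the text
            $terms = array_merge($terms, array_slice($dictionary[$word], 0, 3));
        }
        $results = array();
        foreach ($terms as $term) {
            $results = array_merge($results, searchDatabase($term));
        }
        return array_unique($results); // merged result set for all meanings
    }

    function searchDatabase($term) {
        // placeholder: the real system queries the crawled pages in MySQL
        return array();
    }
    ?>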
6 TOOLS AND TECHNOLOGIES USED

6.1 Adobe Flex Builder 3

Adobe Flex is a software development kit released by Adobe Systems for the development and deployment of cross-platform rich Internet applications based on the Adobe Flash platform. Flex applications can be written using Adobe Flash Builder or using the freely available Flex compiler from Adobe. We used Flex Builder 3 for visualizing the web pages. After filtering the HTML source of a page, we construct the nodes from the links found; we then send this data to Flex and see the nodes and edges. Basic interactions can be performed on the nodes, such as changing their size, position, color, or label, and any nodes and edges can be connected or disconnected as desired.

6.2 Swish Max 3

SWiSH Max is a Flash creation tool that is commonly used to create interactive and cross-platform movies, animations, and presentations. It is developed and distributed by Swishzone.com Pty Ltd, based in Sydney, Australia. SWiSH Max primarily outputs to the SWF format, which is currently under the control of Adobe Systems. We used SWiSH Max to improve the graphical layout and effects of our system.
6.3 Rapid PHP Editor

Rapid PHP editor is a powerful, quick, and sophisticated PHP editor with the features of a fully loaded PHP IDE and the speed of Notepad. Convenient features enable you to instantly create and edit not only PHP but also HTML, XHTML, CSS, and JavaScript code, while integrated tools allow you to easily debug, validate, reuse, navigate, and format source code. We used Rapid PHP Editor to implement the server side of Xentinel, which includes the integration between PHP and the Flash layout.

6.4 WireShark Network Sniffer

Wireshark is a free and open-source packet analyzer. It is used for network troubleshooting and analysis, software and communications protocol development, and education. Wireshark is cross-platform: it uses the GTK+ widget toolkit to implement its user interface and pcap to capture packets; it runs on various Unix-like operating systems, including Linux, Mac OS X, BSD, and Solaris, and on Microsoft Windows. We used WireShark to sniff the data transfer between the Flash layout and the Chisio web server and to test whether the data was transmitted successfully.
6.5 Apache Server with cPanel 11 Interface

We used an Apache server with a cPanel interface to deploy our server-side files and to offer the service to users. We also monitored and tested our system by using the features of cPanel. cPanel is a Unix-based web hosting control panel that provides a graphical interface and automation tools designed to simplify the process of hosting a web site. cPanel utilizes a 3-tier structure that provides functionality for administrators, resellers, and end-user website owners to control the various aspects of website and server administration through a standard web browser.

6.6 PHPMyAdmin

phpMyAdmin is an open-source tool written in PHP, intended to handle the administration of MySQL over the World Wide Web. It can perform various tasks, such as creating, modifying, or deleting databases, tables, fields, or rows; executing SQL statements; and managing users and permissions. We used phpMyAdmin for the database integration of Xentinel and to test whether our system works correctly.
6.7 Webalizer

The Webalizer is a GPL application that generates web pages of analysis from access and usage logs; i.e., it is web log analysis software. It is one of the most commonly used web server administration tools. It was initiated by Bradford L. Barrett in 1997. Statistics commonly reported by Webalizer include hits, visits, referrers, the visitors' countries, and the amount of data downloaded. These statistics can be viewed graphically and presented over different time frames, such as per day, hour, or month. We used Webalizer in the testing phase of our system, which involved crawling speed control, and to improve our crawling engine.

6.8 Ulead PhotoExpress

Ulead PhotoExpress is a graphics editing tool developed by the Ulead company. We used this tool to design backgrounds, button icons, and graphical effects.
7 USE OF RESOURCES

During the design and implementation of our project we found beneficial information in a number of resources. There were two main kinds: open-source resources such as websites, and books related to ActionScript 3.0 and PHP. Besides these, friends of ours who are experienced with ActionScript and PHP helped us when we had problems with the implementation.

7.1 Open Sources

During our project implementation we mainly used Internet resources to get all kinds of help and ideas. We chiefly used the official sites of PHP and ActionScript. In addition, the PHP forums host many people who are really interested in PHP applications and their problems, and we drew on them during implementation.

7.2 Books

Moreover, we used some books related to ActionScript 3.0 and PHP during the implementation of our project. Among these books are:

- PHP Bible, 2nd Edition, by Tim Converse and Joyce Park
- Programming PHP, by Rasmus Lerdorf, Kevin Tatroe, and Peter MacIntyre
- ActionScript 3.0 Cookbook: Solutions for Flash Platform and Flex Application Developers, by Joey Lott, Darron Schall, and Keith Peters
- ActionScript 3.0 Bible, by Roger Braunstein

7.3 Library Resources and Internet Resources Used

We used Adobe Flash CS4, which our instructor suggested to us, in order to better implement the graph component of our web crawler by using its graph library and its filter library. We also used the HTTP and filtering libraries of PHP in order to extract the URLs from source code, and the MySQL libraries in order to communicate with the database in which we keep all the website information. Beyond that, we mainly used developer forums for ActionScript and PHP to aid us in our project; you can find these forums in the references section.
8 GLOSSARY

- Vortex Sentinel: The name of our system and tool.
- User: The person who uses the Vortex Sentinel system via a web browser.
- Site administrator: The manager of a website who wants to see the map of the site via our tool.
- Node: A graphical element that represents a linked or referenced web object within a particular website (e.g., an HTML file, a JSP file, or a JPEG image).
- Edge: A graphical element that represents the connection between two web documents within a particular website.
- XML: Extensible Markup Language, designed to transport and store data.
- GraphML: Graph Markup Language, used to describe the structural properties of a graph, with a flexible extension mechanism for adding application-specific data.
- AS 3.0: ActionScript 3.0, a Flash scripting language.
- PHP: Hypertext Preprocessor, a general-purpose scripting language.
- Parser: A component in our project which scans and mines the HTML source code to filter undesired content.
9 CONCLUSION

Vortex Sentinel is a web crawler tool that aims to present a website as a graphical map, and we designed and implemented the tool within the scope of this purpose. The system consists of a crawler part and a part supported by a graphical framework. The graphical support tool was provided to us, so that we could use its layouts for the website maps, and we implemented the web crawler part with the help of the PHP programming language. The communication between the crawler and the graphical part is achieved with the help of XML technology. For the crawler part we used a crawling algorithm which starts from a web page and finds the links leading out of this page; the pages found are then processed with the same operation, and visited pages are recorded in order to prevent redundant crawling operations. The graphical part takes the website map elements from an external XML file, which is produced by the crawler part at runtime. The specification in the XML file is read and, accordingly, the graph, that is, the map of the website, is presented. Nodes and edges are placed together with related information, such as the names of the pages. In this final report we have described the final state of the project: improvements in the tool, the system architecture, and the software packages are given and explained.
10 REFERENCES

[1] i-Vis, Bilkent University CS Department, http://www.cs.bilkent.edu.tr/~ivis/layout_demo/lw1x.html
[2] Web Crawler, Wikipedia, http://en.wikipedia.org/wiki/Web_crawler
[3] WebSPHINX, Carnegie Mellon University CS Department, http://www.cs.cmu.edu/~rcm/websphinx/
[4] Web Crawler, Polytechnic Institute of New York University CS Department, http://cis.poly.edu/tr/tr-cis-2001-03.pdf
[5] Crawling the Web, University of Iowa, http://dollar.biz.uiowa.edu/~pant/Papers/crawling.pdf
[6] Deep Web Crawl, Cornell University CS Department, http://www.cs.cornell.edu/~lucja/Publications/i03.pdf
[7] Deep Web, Wikipedia, http://en.wikipedia.org/wiki/Deep_Web
[8] Focused Crawling, Indian Institute of Technology Bombay, Department of CS&E, http://www.cse.iitb.ac.in/~soumen/focus/
[9] Effective Web Crawling, University of Chile CS Department, http://www.chato.cl/papers/crawling_thesis/effective_web_crawling.pdf
[10] Distributed Web Crawling, Wikipedia, http://en.wikipedia.org/wiki/Distributed_web_crawling
[11] Extensible Web Crawler, University of Illinois MIAS, http://www.mias.uiuc.edu/files/tutorials/mercator.pdf
[12] DevNetwork Forums, http://forums.devnetwork.net
[13] Dev Shed Forums, http://forums.devshed.com
[14] Adobe Flex, Wikipedia, http://en.wikipedia.org/wiki/Adobe_Flex
11 APPENDIX

11.1 User Manual

When the user opens the website of our crawler (http://www.evaturkey.com/CS491/crawler), he can type in the URL he wants to crawl and choose any advanced search options he wants. The user may search for a keyword in the crawled pages, adjust the recursive crawling level, and change the crawling type to either Breadth-First Search or Depth-First Search as advanced crawling options. [Screenshot: the crawler's front page, with a URL box, a Search button, and the Crawling Level and Crawling Type options.]

After the user chooses his advanced crawling options and clicks the Vortex Crawl button, our crawler goes to the URL given as input, starts finding links to other URLs, and lists all the URLs it has found on our website. Most importantly, our crawler can visualize the found URLs as a connected graph rather than a list if the user clicks View Visual Map. Once the user views the map, a new window appears which contains the connected graph of the crawled URLs. [Screenshot: the visual map window showing the connected graph of the crawled URLs.]

Below is a mock-up screenshot that we designed at the beginning of our project, so that you can compare it with what we actually designed and implemented. [Screenshot: the initial mock-up, including the View e-mail list and View MM list buttons.]
