Home

Universiteit Leiden Computer Science

1. esses 21 2 9 3 Conclusion of FilterdImageCrawler esses 24 Chapter 3 Website for UML Database Collecton 25 3 1 Improvement to Distributed Web Crawler sss 25 3 2 Website for Database Collection ssssseseseseseeeeeeenee nennen 25 3 2 L Introd ction of Drupal 4e inm aea md te n As 26 3 22 Website Design escis eR HER RR E REUS 26 3 2 3 Future Work for the Website eesesssssseeeneeneeene eene 27 3 3 Change of mageCrawler eese ener nennen nnns 27 Chapter 4 Website for XMI Storage and Ouer 28 4 T Overview Ob XML is oret a ae eerte eet re e ert 28 4 1 1 Overview of XML Schema 29 4 2 Overview ot XML e PR A O tele eos he ee a a e 30 4 2 1 XMI vs Relational Database eu 30 4 3 Related Work etur rU aae ut ds 31 4 31 XML SOTA LE ione PR IE PR Po p e qu Deeg 31 4 3 2 XML Query gefemmt i noe tree ee TE e Een P UP nee Gr notis 31 4 4 XM I Databases et RARIOR enlace EEN EE 32 441 Database Design icine e a ai S EE NUES 32 RTE 34 4 5 Website Design for XMI Storage and Query sse 34 Z5 I XML Storage oi tere re ee rU p tU Ur red per Erde 34 A S 2X IR 35 dE EE ER Chapt r 5 E Ee EE 37 Appendix A User Manual for mageCrawler essere 38 A Image Downloading Part t ei sts Reads aon Rese AR STORIE 38 A 2 Domain Rank Part x e See hen EROR REO Bem 39 Appendix B User Manual for Filter4ImageCrawl
2. All the URL in the database VRL Black List 41 Figure 19 The initial interface of ImageCrawler A 1 Image Downloading Part The image downloading part is to download images from the result of Google image search based on the keywords the user has input Then the information of those images will be saved into an access database Step 1 Select the database that will restore the information of images to be downloaded There are two buttons to the right of the Database textbox providing two ways of the database selection The Create New button can create an empty access database file that contains all the five tables of our designed database pic picblack picwhite blackcount whitecount The new access file will be created in the same folder with the executable program file It will be named pic with the current date and time attached to it For example if the database is created on 18 30 25 December 21 2012 the name will be pic20121221183025 mdb In this way the conflict of duplicated file name will be saved Another way is to click the Browse button to select an access file that already contains the five tables After the user has selected a database file the program will check whether the database has the tables and keys required by the program When a database file is selected or created the full path will be shown in the Database textbox 38 Step 2 Select the folder the images will be saved
3. The mageCrawler program is implemented by a PC which does not have the ability to deal with great capacity of information from the Internet A web crawler of high performance can be reached by distributed system P Most search engines have different IDCs Internet Data Center Each IDC is in charge of a range of the IPs Within each IDC there are many instances of web crawler working parallel There are two methods used by the distributed web crawler 1 All web crawlers of the distributed system rely on the same connection to the Internet They co operate to fetch data and combine the result within a local area network The advantage of this method is that it s easy to expand the hardware resource The bottleneck will appear in the bandwidth of the connection to the Internet that are shared by all instances of web crawlers 2 The web crawlers of a distributed system are far away from each other geographically The advantage of this method is that there is no problem about the insufficiency of bandwidth However the conditions of the network differ from place to place Thus how to integrate the result of different web crawlers is the most important problem 3 2 Website for Database Collection As the hardware for building a distributed system 1s not possible at the moment a website can be established to collecting databases from users who have downloaded the mageCrawler program The website provides the download link of the program Wh
4. The path is shown in the Save Path textbox There is a default value C UML class diagram The user can change it by clicking the Browse button next to the textbox Step 3 Input the key words to search The default keyword is uml class diagram The user can input his choice Step 4 Set the number of images to be downloaded The default number is 100 The user can input his choice Step 5 When step 1 to step 4 is finished the user can start the downloading process by clicking the Start button The program will first check if the database save path keywords and number of images have been set If not there will be a hint to tell the user to finish that first The downloading process may take a while to finish It depends on the number of images the user wants and the condition of the user s Internet connection When finished a message box will pop up to acknowledge the user A 2 Domain Rank Part The domain rank part implements an analysis function The user can build a blacklist that contains the images not related to the keywords Then he can get the statistical data of the domains What s more the program can give an output of the database in the form of csv files Function 1 Load the database Click the Load Database button to choose a database After a database is chosen the program will check whether the tables and keys of the database meet the requirement Then the URLs of the images will be loaded The
5. 1 1 Overview of UML Unified Modeling Language UML is a standard modeling language that was created in 1997 by the Object Management Group Since then it has been the industry standard for modeling software intensive systems In UML 2 2 there are 14 types of diagrams They are divided into 3 categories structure diagrams behavior diagrams and interaction diagrams Here s the table of their role in the UML standard Category Name Role Structure diagrams Class diagram Contains the attributes methods and relationship between the classes of a system Component diagram Composite structure Splits a system into components and shows the dependencies among them Shows the detailed structure of a diagram class and the collaborations that this structure makes possible Deployment diagram Describes the hardware deployment of a system Object diagram Shows a complete or partial view of the structure of an example modeled system at a specific time Package diagram Splits a system into different packages and shows the dependencies among them Profile diagram Shows the classes and packages at the metamodel level together with the extensions of them Behavior diagrams Activity diagram Shows the detailed workflow of the components in a system State machine diagram Describes the states inside a system and the transitions among them Use case diagram Describes the interact
6. Universiteit Leiden Computer Science A System for Collecting and Analyzing UML Models and Diagrams Name Chengcheng Li Student no s1026968 Date 13 08 2012 1st supervisor Michel Chaudron 2nd supervisor Bilal Karasneh Contents List of EE 2 Fast ot Tabless een Sege ed nni e I eR EE 3 Abstract ERE RERO RERO ERROR Odo 4 Chapter T Introduction ern ORE e REG HR MM 5 LJ Overviewnof UML atten uet edic utn eive a E EE Pita 5 1 2 Overview of UML Class Diagrams nennen nnne 6 1 3 Overview of Our Project sciet IURE ERR ENEE 7 Chapter 2 image Crawler d castes te ERREUR GR RN E eet 8 GT Overview of Web Crawler ide eH E We eite teet 8 2 2 Related Work sis itte im tp et etn e ave AERE 10 2 3 Software Requirements etie ee tet a eet ie 11 Di Why Google ien te ot ee DNO Oe CIO A 11 2 5 Implementation of mageCrawler esses eene 12 2 5 1 Database Design RR RR RH ROW S 12 2 5 2 Image Downloading Pat 14 2 5 3 Domain Rank Part 5 ento tod eO USER OU e ie 16 2 6 Validation i ette i REGINE TRIP UR EE S aT ad T dta 17 2 6 1 Source Code with UML ni teca eae ee Pede edes 18 2 7 The Limitation of mageCrawler esee eene nennen 18 2 8 Focused Web Crawler tee Rep petiere 18 2 9 Perceptual Hash Algorithm Ar tette tenete ere i o Ane ee 19 2 9 1 Implementation of FilterdImageCrawler eee 20 2 9 2 Validation of FilterdImageCrawler
7. attribute The id of the class thi owner varchar 200 j S E attribute belongs to The id for thi xmi_id varchar 200 ue Se 3 attribute The date and time thi datetime varchar 20 e R 3 RSEN item is generated The URL of the url varchar 200 source image of this attribute Table 7 The table to store attributes tblOperation Key Type Comment name varchar 200 Name of the operation visibility varchar 20 Public private Whether it s an isAbstract varchar 20 abstract operation The id of the class this owner varchar 200 operation belongs to di The id for this xmi_id varchar 200 B operation The date and time this datetime varchar 20 item is generated The URL of the url varchar 200 source image of this operation Table 8 The table to store operations tblAssociation Key Type Comment N fth name varchar 200 ENS b association visibility varchar 20 Public private y The direction of this ordering varchar 200 NJ association The type of the aggregation varchar 200 aggregation The id of the association varchar 200 association this item belongs to The id of the class this typeofAsso varchar 200 K association points to The id for thi xmi id varchar 200 E operation 33 The date and time this datetime varchar 20 item is generated The URL of the url varchar 200 source image of thi
8. implementation class P OptimizationProblem OptimizationProblemComparison U double currentbest OptimizationProblemSolution activeproblems OptimizationProblem M double nodesGenerated long elapsedTime double opc solve unspecified selectProblem unspecified getElapsedTime double getNodeCount long compare boolean interface OptimizationProblem getRelaxation OptimizationProblem getSolution OptimizationProblemSolutio isFeasible boolean branch OptimizationProblem performReduction Figure 10 Example of black and white normal UML class diagram Result 45 69 of the result is the same as in the standard list Experiment 3 Black and white complicated UML class diagram Example image che Cache LRULinkedTableCache cheUpdatePriority Integer Thread MIN Search Request HioadFromCacheAndDB List Search Result List of Keys interface Cacheable IsicadFromids Collectio interface Cache IgetMaxSize Integer A s A 1 implementation class LRULinkedMapCache waxSize Integer 0 failedGetCounter Integer 1 successfulGetCounter Integer 1 getMaxSiza Integer put get Object j removeEldestEntry Boolean implementation class LRULinkedTableCache cache LRULinkedMapCach
9. 23 Figure 24 Figure 25 Figure 26 Figure 27 Figure 28 Figure 29 Figure 30 Figure 31 Figure 32 Process of BFS Process of URL queuing Interface of ImageCrawler Class diagram of ImageCrawler Links of pages of Google image search Class diagram for Filter4ImageCrawler Interface for Filter4IlmageCrawler Example of black and white simple UML class diagram Example of black and white normal UML class diagram Example of black and white complicated UML class diagram Example of colored simple UML class diagram Example of colored normal UML class diagram Example of colored complicated UML class diagram An example of XML file An example of XML Schema The architecture of MOOGLE The interface of XML2DB The initial interface of ImageCrawler The interface of mageCrawler in process The initial interface of Filter4ImageCrawler User login page User registration page Request new password page Upload page My settings page Databases page Database showing page Initial interface of XML2DB Upload page for SQL files Search models page Query of URL page List of Tables Table 1 14 kinds of UML diagrams Table 2 Relationships of class diagrams Table 3 Process of operation in a queue Table 4 Design of the database Table 5 Comparison of XML and relational database Table 6 The table to store classes Table 7 The table to store attributes Table 8 The table to store operations
10. 5 1 XMI Storage Taken safety into account we only let the administrator of the website to upload the SQL file generated from the XML file After the SQL file is uploaded the website will abstract the commands inside and execute them Thus the content of the XML file is written in our database 34 4 5 2 XMI Query We provide a query function for the user of the XMI database There are two ways to make a query of the XMI database e Search by URL After a search of a database of the information of images the result page will show a list of URLs of the images The user can select by choosing the checkboxes in front of the items to search for all the information classes attributes operations and associations contained in the selected images The result page will show the images and all their information The information of each image will be shown in four tables class attribute operation and association e Search by keywords The user can input keywords and the models containing the keywords will be selected We provide all the information of the images containing the models to the user If the user inputs more than one keyword we take the keywords separated by space as the relation AND So the result should be items containing all the keywords The user can use OR to connect keywords Then the result will show models that contain either of the keyword connected by the OR condition By default we provide a fuzzy enquiry
11. 64 results on 8 pages There are tools based on this API and they can only get at most 64 images It s far from enough Firefox has a plug in named Save images that can save images from the current tab page the user has opened The problem is that the result of Google image search shown on the webpage is just thumbnails When we use the add on in Firefox all we have got are just thumbnails from this one page The images downloaded are too small to use What s more the tool can only download images from pages that have opened in the browser It s not possible to open all the pages of the list of images and download them one page by one page Bulk Image Downloader is a software that can download full sized images from almost any thumb nailed web gallery However it takes 24 95 to buy it Otherwise functions will be incomplete Thus we have developed our own software named mageCrawler to collect images from the Internet 2 3 Software Requirements The requirements of our software have two points 1 We want to get as many images as possible 2 We want to finish the process efficiently To finish the task we have to implement a web crawler for images all over the Internet A web crawler with high performance should have two features 1 It should be able to grab great capacity of data from the Internet That s why we use it 2 It should run on a distributed system As the quantity of data is extremely large in the Internet a
12. It is the parent node of any other elements An element can have attributes For example every lt book gt element has an attribute called category The value of an element appears between the start and end tags of the element For example Everyday Italian is the value of the first lt title gt element 4 1 1 Overview of XML Schema XML Schema describes the structure of an XML file It is also a standard of W3C It contains several aspects l An XML Schema defines the elements that appear in the XML file An XML Schema defines the attributes that appear in the XML file An XML Schema defines the sequence of the appearance of elements An XML Schema defines the number of elements An XML Schema defines whether an element is empty An XML Schema defines the data type of elements and attributes An XML Schema defines the default value of elements and attributes XML Schema uses the same grammar with XML Thus it is easy to understand and write Here s the XML Schema corresponding to the XML example above 1 lt xml version 1 0 encoding UTF 8 gt 2 Ej xs schema xmlns xs http f wer w3 org 2001 XMLSchema elementFormDefault qualified gt 3 lt xs element name bookstore gt 4 lt xs complexType gt lt xs sequence gt 6 lt xs element maxOccurs unbounded ref book gt lt xsisequence gt E xs complexType 9 m xs element 10 lt xs element name bo
13. Table 9 The table to store associations Abstract UML is a most widely used language for building and describing models There are 14 kinds of UML diagrams Among them class diagram is one of the most important in software designing Within a class diagram the basic elements and structure of a software system is described For the developers class diagrams are the guide of the process of development phase The standard of designing UML class diagrams is rather loose Various styles can be found from different developers When researchers want to get research on class diagrams they meet similar problem that there is no efficient way to collect large number of UML class diagrams to work in analyzing querying and so on As most of models on the Internet are images we focus on collecting them providing usability of storing and querying models that are saved in images and supporting reusability of these models In this thesis a software for collecting a large number of UML class diagram images from the Internet is developed It can download images of UML class diagrams efficiently and build a database that contains information about these images As a part of the project to transform models in images to XMI the paper provides a method of transforming XMI files into relational database Then a website is built for storing and online querying models Keywords UML class diagram web crawler XMI model search engine Chapter 1 Introduction
14. URLs in the table pic which contains all the URLs of the images will be shown in the list box on the left The URLs in the table picblack which contains all the URLs of images in the blacklist will be shown in the list box in the middle Function 2 View the images in the database When a URL of either the left or the middle list box has been selected the program will show the corresponding image in the picture box on the top right But before that the user must click on the Select image folder button to select where the images are saved Function 3 Manipulating the blacklist The user can manually add or delete items from the blacklist There are two buttons between the list of all URLs and the list of URLs in the blacklist By clicking the button with the arrow gt the selected URL in the list of all URLs will be added to blacklist The function of the button with the arrow lt can delete a URL from the blacklist Function 4 Ranking the domains After the blacklist is established the user may want to know which domains contribute many items to it After clicking the button Domain Rank the program will take an analysis of all the URLs that are in or not in the blacklist Then the domains of the images will be abstracted The program will make a list of how many images have been downloaded from each domain The result will be shown in the list box on the right side in descending orde
15. click on the Show chosen database button a new page will appear and show the user the corresponding content There is a setting of Show thumbnail When it s chosen and the database to be shown contains the URLs of images the thumbnail of an image will be shown when the mouse moves over the corresponding URL However this function may slow down the user s computer C 5 Database Showing Page Here s the screenshot of the database showing page Home ShowDB View Edit Submitted by admin on Sun 07 22 2012 12 45 The chosen table is picwhite Search for content id url Thumbnails r 1 http hi csdn net attachment 201101 18 0 129531961715s8 gif Figure 28 Database showing page This page does not provide a direct link for the users It can only be accessed via the Databases page or the My Settings page by selecting a database to show It shows the content of the chosen database If the database contains the URLs of images the thumbnail will be shown to the user In 45 the meantime the user can choose by clicking on the checkboxes in front of the items and clicking the Search for content button the information classes attributes operations and associations of the selected images will be shown to the user 46 Appendix D User Manual for XMI2DB This program aims at transforming XML files with models stored in to SQL files that contain Insert commands for our XMI database Here s the scre
16. for periods hyphens apostrophes and underscores E mail address A valid e mail address All e mails from the system will be sent to this address The e mail address is not made public and will only be used if you wish to receive anew password or wish to receive certain news or notifications by e mail Create new account Figure 23 User registration page If a user has forgot his password just use the Request new password function to get a new one 42 When the username or email address has been input in this page a one time log in link will be provided to the user via his email address Then the user can reset the password the same as when creating a new account User account Create new account Log in Request new password Username or e mail address E mail new password Figure 24 Request new password page C 2 Upload Page Here s the screenshot of the upload page Upload View Edit Submitted by admin on Fri 07 20 2012 10 15 Upload File Browse Please choose the type of table you are uploading pic It contains all the information of the images picblack It contains the url of images in black list picwhite It contains the url of images in white list blackcount It contains the domain information of the url in black list whitecount It contains the domain information of the url in white list Submit Download ImageCrawler This Program aims to build a database and download UML class diagra
17. for the user The result will include models that ec the keywords partly match For example if the user inputs edu models with education will be matched The user can choose the precise search mode by selecting the option on the page Then models with education will only be matched by the keyword education 4 6 Validation The advantages of our website are Simpler transformation As we have our own OCR process to generate XML files the transformation from XMI to relational database is simpler and faster Good efficiency Relational database is used to store the elements in XML files We split the models into four parts class attribute operation and association We sacrifice the space to get the advantage of querying efficiency Thus the query process is faster than searching through model files on the server Model images provided As our project aims at querying models that are saved in images users of our website will get the access of images we have collected from the Internet Presenting models in a graphical view is friendlier than by text even the text has been formatted Access for users to communicate Our website is based on a content management system so there s access for the users to make comments and communicate their ideas with each other For future development a smarter search engine can be developed Our project can also expand to 35 search with text files As no official stan
18. page provides the search function of the models in our XMI database Query in Models View Edit Submitted by admin on Thu 08 09 2012 12 25 Please input your keywords You can use OR to expand your keywords Please use brackets to split the keywords For example education OR agriculture T Precise Enquiry Mode Search Figure 31 Query in models page To use this function the user just input the keywords in the textbox and click Search button The result will be shown in the same page with the URL of the images corresponding to the models found The user can view the images instead of text to have a better impression of the models The keywords can be separated by spaces or connected by the OR keyword Keywords 48 separated by spaces will be regarded as the relationship of AND All the keywords should be matched in the result Keywords connected by the OR function will be matched at least one of them in the result If the checkbox Precise Enquiry Mode is selected the keywords will be precisely matched in the result Otherwise the result will include items that partly contain the keywords required E 3 Query of URL Page When a user chooses to search several images in the list of URL this page shows all the information stored in the XMI database of the images Here s the screenshot of this page Query of URL View Edit Submitted by admin on Sun 08 12 2012 19 05 BankAcc
19. the process is done there will be a list of URLs of images shown in the list box on the right side They are considered similar to the example image The user can view the images by selecting a URL in the list box 41 Appendix C User Manual for Database Collection Website C 1 User Control System We use the user control system that s provided by Drupal If a user has not logged in or registered he doesn t have the permission to use the functions A message will be shown on the webpage to let the user log in or register Here s the screenshot of the log in part User login Username Password e Create new account e Request new password Log in Figure 22 User login page The Create new account link can let an anonymous use create his account A unique username and email address is required for the user registration When those information have been provided a confirmation letter will be sent to the email address Within this email a link for one time log in is provided Considering the aspect of safety the link will expire after one day The user should log in via that link and set the password and other information Then the account will be activated to use the functions without restrictions Here s the screenshot of the Create new account page Home User account User account Create new account Log in Request new password Username Spaces are allowed punctuation is not allowed except
20. the starting node A queue is needed as the data structure to save the nodes The algorithm is 1 Starting node V is in the queue 2 If the queue is not empty continue 3 Node V is signaled as visited 4 Get the nodes that adjacent to node V and push them into the queue Then go to step 2 Here is an example Figure 2 Process of BFS In this figure we use node A as the starting point and the process will be Operation Nodes in the queue Ain A A out empty BCDEF in BCDEF B out CDEF C out DEF D out EF E out F Hin FH F out H Gin HG lin GI G out I Tout empty and end Table 3 Process of operation in a queue So the order that nodes are visited will be A B C D E F H G I The web crawler starts from a URL as the starting node If there is a link that leads to a file like a PDF file for the user to download it s a termination because the crawler cannot get links from it The process is to start from the initial URL put links inside it into the queue The links that s visited will be signaled visited If a link is already visited it will be ignored next time it s found The process can be shown by this picture New URL Initial URL URL visited Queue New Queue Visited Figure 3 Process of URL queuing 1 Put the initial URL into the Queue New 2 Get URL A from Queue New and get all links it has 3 If the links exist in th
21. the table of the design of this database Table pic Contains all the information of images Key Type Primary Key ID Auto increment Url Text Yes Width Text Height Text Pixelform Text Fname Text Comments Text isUML Boolean Table picblack Contains the black list Key Type Primary Key ID Auto increment Url Text Yes Table picwhite Contains the white list Key Type Primary Key ID Auto increment Url Text Yes Table blackcount Contains the domain statistics of the black list Key Type Primary Key ID Auto increment Domain Text Yes Count Text Table whitecount Contains the domain statistics of the white list Key Type Primary Key ID Auto increment Domain Text Yes Count Text Table 4 Design of the database 13 The interface of the program is as follows EB ImageCrawler LE Ox Database Dy Documents VE APic2OIZT30DG4D ndb B Create New Save Path EW dee Zeen Browse Keyword uml class dian Stet Number of Images fioo Ready Load Database Select image folder Domain Rank Generate csv All the URL in the database VRL Black List on Figure 4 Interface of ImageCrawler cn OleDbConnection savePath string cmd OleDbCommand keyword string reader OleDbDataReader numImage int ac AccessConnection Here s the class diagram buildConnection ImagePatiern str
22. two images by their fingerprints If two images have something in common the more unique it is for example the Eiffel Tower or the portrait of a movie star the more similar the fingerprints of the images are As for UML class diagrams the components are mostly shapes like rectangles triangles and diamonds together with some text They are not very unique features in an image Thus the fingerprint of a UML class diagram tends to be an ordinary one As a result the correctness of this program is not good enough because it mistakes some images as UML class diagrams that are not according to their fingerprints A suggestion for the future development of this program is to use shape recognition technology The fingerprints of UML class diagrams are the elements If the program can detect the shapes like rectangles with text inside arrows pointing to rectangles and so on the accuracy will be improved to a higher level After all this program is takes an example image instead of keywords to filter images It s difficult to reach a high accuracy 24 Chapter 3 Website for UML Database Collection 3 1 Improvement to Distributed Web Crawler There is a problem in mageCrawler that the number of images has been limited by Google image search the solution is to improve the hardware Because we have implemented this mageCrawler on a PC to start from zero is not efficient That s why we use the result of Google image search as the starting node
23. 02235 UMLClass 29 UMLAttribute 30 201281002235 1 3 pet into tblOperation values opr02 public false UMLClass 29 UMLOperstion 33 201281002235 Insert into tblClass values class3 public false UMLClass 34 201281002235 Insert into tblAttribute values atr l public UMLClass 34 UMLAttribute 35 201281002235 Insert into tblAttribute values atr02 private UMIClass 34 UMLAttribute 36 201281002235 Insert into tblOperation values opr01 public false UMLClass 34 UMLOperation 3T 201281002235 Insert into tblOperation values opr02 private false UMLClass 34 UMLOperation 38 201281002235 Insert into tblClass values class4 public false UMLClass 39 201281002235 Insert into tblAttribute values atr l public UMLClass 39 UMLAttribute 40 201281002235 Insert into tblAttribute values atr 2 private UMLClass 39 UMLAttribute 41 201281002235 Insert into tbl peration values Copr0i public false UMLClass 39 UMLOperation 42 201281002235 Insert into tblOperation values opr02 private false UMLClass 39 UML peration 43 201281002235 gt v Figure 18 The interface of XML2DB 4 5 Website Design for XMI Storage and Query There are two functions of the website XMI storage and query 4
24. acy problem that not all images downloaded are really UML class diagrams a way to solve it is to use the Perceptual Hash Algorithm 7 51 1P l The algorithm can generate the fingerprint for images and compare them The more similar the fingerprints of two images are the more similar the two images are The algorithm consists of several steps 1 To zoom an image into a small size For example zooming an image to a small square that the width and height are both 8 pixels So the small image is combined by 64 pixels Then the details of the image are eliminated Only the structure and level of brightness are preserved Thus the difference of images with different sizes is eliminated 2 Make the image a greyscale one 3 Calculate the average of gray value of all the 64 pixels 4 Compare the gray value of each pixel with the average one If larger than or equal to it the corresponding pixel is marked 1 Otherwise mark it 0 5 Now we have a table with 64 bit of 1 or 0 This is the table that represents the image When we want to know how similar two images are just compare the tables of them to see how many bits are the same 2 9 1 Implementation of FilterdImageCrawler Filter4ImageCrawler is the program that uses perceptual hash algorithm as a filter for the images downloaded by mageCrawler It loads the database that s built by the mageCrawler program locates the folder where the images downloaded are and takes an image as the
25. al blackcount It contains the domain information of the url in black list c whitecount It contains the domain information of the url in white list Figure 26 My settings page The page has been divided into two parts The top part is the general settings of the user Now here s only one option whether to share the user s database with the other users The user can click on the Save Settings button to save his choice The bottom part is for the user to view his database Choose one table and click on the button the chosen table will be shown in a new page C 4 Databases Page Here s the page of the list of the databases 44 Databases View Edit Submitted by admin on Sun 07 22 2012 12 08 databases list Show chosen database T Show thumbnail Warning this may slow down your computer Core Database pic It contains all the information of the images Core Database picblack It contains the url of images in black list Core Database picwhite It contains the url of images in white list Core Database blackcount It contains the domain information of the url in black list Core Database whitecount It contains the domain information of the url in white list usertable User No 1 pic usertable User No 1 picblack MI i icartahla l lear Ma 1 nicwhita Figure 27 Databases page In this page the core database and the databases that are shared by the users are listed Choose one of the tables in the list and
26. an be generated by the mageCrawler program If it s the first time a user uploads the database his database containing the five tables will be created on the server The five tables will be named after the user s id Thus the users databases won t have the conflict of the same names What s more using user s id instead of username although unique to name his database can give the most privacy for the user The function to show the users the core database and databases of other users A user can choose whether his personal database is allowed to be shown to others When a database contains the URL of images the user can choose whether to show the thumbnails in the webpage If chosen the corresponding thumbnail will be shown dynamically when the mouse moves over the URL of an image However the thumbnail showing function will take some resource of the user s system which may slow down his computer 26 3 2 3 Future Work for the Website Till now the function of the website is mainly for database operation Drupal is originally a content management system So our website can be extended to a community for the users to make comments for the databases and exchange ideas with each other There are two drawbacks of the website 1 When the user uploads his database the csv file will be uploaded Although there s a function to check whether the file type is csv there s no way to check the content of the file If it s not in the same format as t
27. art 20 The number of images starts with 0 which means the second page shows the images from 21 to 40 There are other parameters such as hl en which means the language of this page is in English They are not important for this process so we ignore them Thus the regular expression of links that point to other pages of list of images is search u003Fq a zA Z0 9 amp start 0 9 a zA Z0 9 amp When we have found a link just add http www google com in front of them and the link is fully constructed Then the link is stored in a list that has all the links found The second regular expression we have to construct is the URL for images This is relatively an easier one The URL for images should start with the http s and end with an extension of image file So the regular expression is http s 2 w w Gpgljpeg pnglico bmp gif When we have found a link to an image we first check whether it is in the database which means it has been downloaded before If not we save it to the list that contains all the URL of images found Then there is the process to download it 2 5 2 3 Image Download After we have got the URL for an image we can download it using HttpWebRequest to get the data stream of this image Then save it as an instance of the System Drawing Image object The next step is to save the information of the image into our database We use the Oledb namespace to implemen
28. atabase It s useful for the following steps of digital image processing Some methods can only apply to specific kinds of images For example the ExhaustiveTemplateMatching class which implements exhaustive template matching algorithm and can be used to detect shapes in the UML class images can only be applied to grayscale 8 bpp and color 24 bpp images Another key we have added to the database is isUML It is a Boolean value that shows whether the image is an UML class diagram Although the key words we have input are uml class diagram no search engine can ensure that all the images that found are strictly uml class diagram So there should be a key that shows which images are UML class diagram that can be used in other operations As not all images downloaded are UML class diagrams there is a list needed that can save the URLs of such images to save the trouble to download them again next time Thus a blacklist is added into the database together with a whitelist The table picwhite and picblack will store the urls of the images that are are not UML class diagrams Not only URLs of the images but also the domains are interested by us We want to find out which websites can provide UML class diagrams more than others There are two tables named blackcount and whitecount that store the domains where the images of the black list or white 12 list come from and how many images have been downloaded from each domain Here s
29. bsite Web developers around the world have contributed a lot in making it more and more powerful At the same time it has a useful access control system Different permissions can be provided for different users of the websites built on Drupal Last but not least it is appreciated by graphic designers because the appearance of Drupal websites is flexible to adapt Various theme templates are available for different requirements 3 2 2 Website Design The target of our website is to set up a database that contains the index of images of UML class diagrams The functions of our website are 1 User control system People can register and log in through this user control system The only requirement is a unique username and the email address A confirmation letter will be sent to the email address after register to provide the user with a one time log in link Through this link the user can set his password portrait and other information The user control system can let the users build their own databases and protect their privacy The user can also choose whether to share his database with others The download link of the mageCrawler program for the users to build their own database of images of UML class diagrams The function for the users to upload their databases and merge to our core database Our core database contains the entire index that the users have collected The required format of the upload file is csv which c
30. d explicitly For example the element book in the XML file has four elements embedded so in line 11 and 19 it is described as a complex type 4 2 Overview of XMI XMI is short for XML based Metadata Interchange It provides a method for the developers to exchange the metadata by XML XMI integrates three industry standards They are XML UML and MOF Meta Object Facility an OMG Object Management Group language for specifying meta models P 4 2 1 XMI vs Relational Database Our project uses XMI as the output of the OCR process for images of UML class diagrams We have built a website for XMI storage and query First there s a comparison of the difference between an XMI file and the relational database XMI RDB Data storage File system Database system Data structure Only related to logical Only related to logical structure structure Pattern description DTD Schema Relational patterns Logical interface SAX DOM ODBC JDBC Query language XQuery SQL Safety level Low High Concurrent query No Yes Usage Standard for Internet data Method for storing and exchange querying for data Operating language No standard language SQL Internet usage Directly Need application Data description Structural or semi structural Structural Table 5 Comparison of XML and relational database From the table we can see that XMI is really a good method for exchanging data through the Internet H
31. dard of XMI for models is established many configuration files will be needed to transform from different model files to our standard in order to insert into our database The website can be expanded to a community for researchers in this field to communicate and exchange ideas 36 Chapter 5 Conclusion In this project we have built a system that establishes databases of images of UML class diagrams from the Internet turns the images to XMI models via image processing technology stores the models into relational database and queries within the models A website is established to share our results It can also let the users make comments and communicate with each other For future research the programs we have developed is planned to be written in the form of web applications and integrate with our website to make a complete online system for collecting recognizing and querying models that are presented in images More efficient web crawler algorithms can be developed to improve the percentage of UML class diagrams abstracted by the ImageCrawler program 37 Appendix A User Manual for ImageCrawler This is the user interface when the program starts The top part is the image downloading part and the bottom part is the domain rank part E ImageCrawler Browse Create New Browse Keyword bi class diaran Steet Number of Images 100 Ready Load Database Select image folder Domain Rank Generate csv
32. date the actual result may be less So to solve this we use several key words for search to build our database Another weak point is accuracy As the searching process is motivated by key words the images found are those with descriptions that include the key words Thus not all images found are really UML class diagrams Some of them maybe the screenshot of a presentation named UML class diagram or a photo of a book that relates to UML class diagram 2 8 Focused Web Crawler A way to improve the efficiency of a web crawler is the focused web crawler Different from normal web crawlers a focused web crawler does not rely on a huge coverage of websites It mainly focuses on the websites that belong to a certain topic Thus focused web crawler is better performed for fetching data for the users with a specific topic 21415161 18 There are three aspects that should be concerned in designing a focused web crawler 1 The definition of the topic which the crawler is going for 2 The analysis and filter of the content of data fetched by the crawler 3 The method to search through the Internet In this case the topic is the UML class diagram As discussed above normal web crawler often uses breadth first search to explore the Internet The implementation of BFS is relatively simple In general websites of the same topic tend to gather together Thus BFS can be used in a focused web crawler Given the start node the content of no
33. des adjacent to it is considered belonging to the same area The drawback of BFS in implementing focused web crawler is that as the coverage becomes bigger more irrelevant pages will be fetched and downloaded What s more when the number of irrelevant pages is much larger than that of relevant ones the web crawler may be trapped in a vicious cycle There is another way to avoid this problem When the links within a page are detected the web crawler analyses them before downloading the data The analysis is to find which links are most relevant to the topic and ignore the others The efficiency is better than BFS However the prediction cannot be 100 accurate Some relevant pages may be thrown away There are some methods to analyze the quality of a webpage In general within a website the links to the most important pages will be provided on its homepage or other places that are easy for the user to find If a page has been linked to by many websites it can be considered the one with high importance PageRank is a technique that is based on this idea On the other hand if the number of links to certain page is too large it may not be a valuable one because there is possibility that it s just a junk page that appears many times just to improve the PageRank value Focused web crawler can be embedded to our mageCrawler program However the time of webpage analyzing may slow down the efficiency 2 9 Perceptual Hash Algorithm As for the accur
34. e Synchronized Wrapper Cache 1 interface Timestamped V rgetid String getTimestamp Long interface java util Map og get Object D java util LinkedHashMap Pull get Object HremoveEldestEntry Boolean Figure 11 Example of black and white complicated UML class diagram Result 46 87 of the result is the same as in the standard list Experiment 4 Colored simple UML class diagram Example image 22 Figure 12 Example of colored simple UML class diagram Result 46 09 of the result is the same as in the standard list Experiment 5 Colored normal UML class diagram Example image Figure 13 Example of colored normal UML class diagram Result 47 55 of the result is the same as in the standard list Experiment 5 Colored complicated UML class diagram Example image Figure 14 Example of colored complicated UML class diagram Result 47 16 of the result is the same as in the standard list 2 9 3 Conclusion of Filter4ImageCrawler From the result we can see that the correctness the program can give is nearly 50 no matter the example image is black and white or colored or the example image is simple or complicated The results the program gives remain in a stable level but not high enough to be trusted The reason comes from the algorithm itself The perceptual hash algorithm compares similarity between
35. e Queue Visited ignore Otherwise put them into Queue New 4 Put the URL A into the Queue Visited Go to step 2 BFS is the most widely used method by the web crawlers The main reason is that when we are designing a webpage we usually put links to the pages that have contents related to the current page Using BFS can find them by first time A web crawler should work continuously Good stability and reasonable usage of resource is required Web crawler is a vulnerable part of a search engine It deals with web servers that cannot be under control of the search engine system A bug from one webpage may put the web crawler into severe situations like halt crash or other unpredictable problems Thus the design of a web crawler should take different conditions of the network into account 2 2 Related Work Other than the large search engines like Google and Yahoo it is difficult to get a large number of UML class diagrams in a short time Even if people use the search engines no downloading service is available to save them to the local disk There are several tools that are specially designed for downloading images from the Internet However they have drawbacks Some of the tools cannot meet the requirement of downloading images with a large number and full sized Most of them are commercial Google has provided an API for downloading images from the result of image search However the API has a limitation of getting a maximum of
36. e and more powerful filter for the images The most important reason we choose Google is that the URL of Google image search can be easily constructed with different parameters We can use Google image search in our program without going to the web browser first 2 5 Implementation of mageCrawler The program of mageCrawler does not rely on the API of Google image search It is the program that can download as many images as possible from the result of Google image search The key words can be set by the user as well as the number of images that he wants to get It simulates the process of downloading images from Google manually It implements the theory of web crawler gets the data from Google image search and downloads them automatically After that the information of the images will be saved in the database At the same time the program has the function to build a blacklist of URLs of images that are not UML class diagrams There is another function that gives a rank of domains of images to see which websites contribute how many images 2 5 1 Database Design The database 1s designed for keeping the information of images we have downloaded Most of the information of the images is saved in the table pic As URL is unique in the Internet we use the URL as the primary key The width and height as the basic attributes of an image is also saved into the database What s more there is the record of the pixel format of an image in the d
37. en a user has built a database he can upload it in the website After a database has been uploaded the website will read its content and write them into the database saved in the server that contains all the information from the databases uploaded It is an alternative way to the distributed system of web crawler If more and more people use the program to collect images and upload the URLs of the images together with the whitelist blacklist showing which URLs contain images that are not UML class diagrams and the statistics of the domain from the whitelist blacklist a database will be available on the server that contains the index of a large number of images that are UML class diagrams It s like the index of images built by a search engine only that people using the software take the place of web crawlers 25 3 2 1 Introduction of Drupal Drupal is a free open source content management system CMS that s developed by PHPP It s best used for building a backend system About 2 websites in the world are based on Drupal including http www whitehouse gov the website for the White House Here s a list that shows the famous websites that use Drupal as the framework http www seo expert blog com list the most incomplete list of drupal sites Drupal is combined by modules It is convenient to add or remove functions on it In fact rather than a CMS it is more regarded as a framework for a content management we
38. en shot of this program EFE Biel x Path of the XML file Start Figure 29 Initial interface of XML2DB Step 1 Open the XML file The user should click on the Browse button and select the XML file The path of the file will be shown in the textbox near the button Step 2 Start the transformation By clicking the Start button the program will abstract the classes attributes operations and associations from the XML files In the end of the process a message will pop up to inform the user with the path of the SQL file created The SQL file is named after the current data and time to avoid the problem of duplicated file name Step 3 Result of the process The SQL commands generated during the process will be shown in the textbox at the bottom part It s a friendly interface for the user to check the elements detected and transformed in the XML file Appendix E User Manual for XMI Query Website E 1 Upload Page for SQL Files Here s the screenshot of the upload page for SQL files generated by the XML2DB program uploadXML View Edit Upload File Browse Submit Figure 30 Upload page for SQL files It s a very easy function The user just click on the Browse button choose the SQL file generated and click the Submit button The SQL commands in the uploaded file will be executed For safety concern only the administrators have the access to this page E 2 Query in Models Page This
39. ent model search engine Here s the architecture of the Moogle search engine Query HTTP Query d ur We sere Ss ser User Interface Indexes Model descriptors Model Extractor e Model repositories database local file system web Figure 17 The architecture of MOOGLE The system consists of three parts the model extractor the searcher and the user interface The model extractor can find and read model files from different repositories It reads the content of the model files extract necessary content and generate model descriptors which will be used in the searcher function The searcher is based on an open source search engine Apache SOLR With SOLR an index for the model descriptors is established that contains all the necessary information of them The graphical user interface of Moogle provides the users with three functions simple search 31 advanced search and metamodel browsing Moogle is a good idea for us to build a website in searching for models It supports query in different kinds of models However it has some drawbacks The core function model extractor uses the configuration file to get necessary content from them A valid unique namespace declaration is required Thus if the namespace declaration of a model file is damaged it will stop generating no matter how well the other part is preserved Moogle is a search engine for model files that are presented with text What we want to achi
40. er sese 41 Appendix C User Manual for Database Collection Website cccceccessesseeeeteeteeeeeeeeeseeeesseeneens 42 C TI User Control Systems peret eee ted ed Bereich 42 C 2 Upload Page ee Hee ei eet mete se 43 C3 My Settings Page o eoo rima pp ane edt 44 54 Databases Paten e NUR RISO OUR RUSO DU 44 C5 Database Showing Pages pee Sae ten cope ei odi S uere tecti ces 45 Appendix D User Manual for XS MIDD 47 Appendix E User Manual for XMI Query Website 48 E l Upload Page for SQL Piles 43 42 t stet vec ati a IL n a IIS cud 48 E 2 Query in Models Page ise RR UR UR EUREN cues 48 ES Queryiot URL Regel REIR EN 49 Appendix F Statistics of UML Class Diagrams Collection eee 50 ET Table pic EE 50 F2 Table picwhite ice tee EH Rer Ho e RO RH s 50 E3 Tables pricblack saute aU ate eed de 50 EA Table whitecount 2 oer PEDE HX FOR o UP sts 50 E 5 Table blackco nt aeter ree De eta a Ri Ee eti ations 50 Acknowledgements 5 bees ede A ERE auo ed 51 References Sereno Debet eet eebe M Rec ul edi 52 This paper is a gift for my family who have supported me all the way List of Figures Figure 1 An example of a class diagram Figure 2 Figure 3 Figure 4 Figure 5 Figure 6 Figure 7 Figure 8 Figure 9 Figure 10 Figure 11 Figure 12 Figure 13 Figure 14 Figure 15 Figure 16 Figure 17 Figure 18 Figure 19 Figure 20 Figure 21 Figure 22 Figure
41. eve is a project for models that are presented in UML class diagrams As there are already different search engines that deal with text it s more difficult to turn images into text models and do the query Another drawback is the searching method Moogle uses indexes for model descriptors to implement the query process It is a search within the files Thus the efficiency will not be as good as querying in a relational database 4 4 XMI Database Our standard is based on the XMI files exported from starUML There are four categories that we take into account class attribute operation and association Our database is built on that foundation 4 4 1 Database Design Our database contains four tables tb Class tblAttribute tblOperation and tblAssociation corresponding to classes attributes operations and associations Here are the keys in the tables tbIClass Key Type Comment name varchar 200 Name of the class visibility varchar 20 Public private Whether it s an isAbstract varchar 20 abstract class xmi_id varchar 200 The id for this class The date and time this datetime varchar 20 item is generated The URL of the url varchar 200 source image of this class Table 6 The table to store classes tblAttribute Key Type Comment name varchar 200 Name of the attribute visibility varchar 20 Public private typeofAttr varchar 200 The type of this 32
42. example for the filtering process As the result the database loaded will be changed There is a key in the table pic in the database named isUML It is a Boolean value that shows whether the image is a UML class diagram The value of this column will be changed according to the result of the program If an image is similar enough to the example image its corresponding isUML attribute will be true The threshold of similarity is set to 25 It means that after the process of perceptual hash algorithm if there are less than 25 pixels different between the testing image and the example image we can regard them to be similar Here s the class diagram of the program Filier4hnages Access Connection imgPath string ac ccess Connection cn OleDb Connection Database Operation cmd OleDbCommand FilterdImages reader OleDbDataReader Image Operation build Connection Database Operation execute SQLO FilterdImages execute S QLReader Image Operation Figure 7 Class diagram for Filter4ImageCrawler Here s the screen shot of this program iE Filter4ImageCrawler mf x The folder of images C 1000 UNL class Browse Database C black done 1000 pic mdb mem http upload wikimedia org wikipedia commons thumb b bl Visitor UML class diagr a http www ibn com developerworks rational library content Rati onalEdge sep04 be http www ibm com developerworks rational library content RationalEd
43. ferent domains involved in Among them http upload wikimedia org ranks at the top because 30 UML class diagrams come from the domain Following it there are websites like http www ibm com with 29 UML class diagrams downloaded http ars els cdn com with 25 and http docs nullpobug com with 21 12 domains contain 10 to 17 UML class diagrams for each 58 domains contain 4 to 9 UML class diagrams 641 domains have less than 3 UML class diagrams downloaded In the blacklist 417 different domains are involved in http img docstoccdn com with 53 images that are considered not UML class diagrams ranks at the top of the blacklist There are 13 domains that contain over 10 images that are not UML class diagrams 133 domains contain 2 to 9 images that are not UML class diagrams The rest 270 domains have only one image that is not UML class diagram for each 2 6 1 Source Code with UML Another work we want to fulfill is to find out whether there are source codes provided together with the UML class diagrams We have searched on the domains on the whitelist that have provided more than 3 UML class diagrams Among the 29 websites 11 of them have source code provided They can be classified into 4 categories 1 UML education standard http www xml com 6 images provided http www uml org cn 15 images provided and http www c jump com 6 images provided belong to this category The source codes on these websites are provided for t
44. ge sep04 be http www databaseanswers org data_models uml_class_diagram_for_shopping_cart i http blog caplin con wp content uploads UML Class diagram gif http www codeproject com KB 1ibrary inSNMPWr epper class_ lt diagram png http www edrawmax com images software UML Class Di agram_full png http www agiledata org images ool01ClassDi agram gif http edn embarcadero Con arti cl e insges 31053 clessdi agranne3d gif http viki audesn con nedi a hd helpdesk8 png http svn pjsip ElK ap osip jeo jae UD x pjnath docs UML class di agran http www clear rice edu comp20 Wen a AT png http www agilemodeling com images models E jpe http i msdn microsoft con dynimg IC315445 p http www databaseanswers EE class diagram for risk management http blog joycode com wp content uploads images blog TERR con joy 414 o whi http www tutori alspoint com images unl class di agrami http staff aist go Gye a Seed Gal eat eae http www codeguru com ime legacy mise unldamll gi http 518 acing EE if http www dannyfowler com uploads 800px Composite_ WL cl ass_diagran_fr svg_ png http www agilemodeling re E UR ete SSE zif bttp ff enx org content nl 1658 Latest 1istvisi tors p e e Se eege ege Seene Sen afimaeafI ine Figure 8 Interface for Filter4ImageCrawler 20 2 9 2 Validation of Filter4I
45. hat has similar features with the original picture the algorithm takes it as a confident match Then Google will list the confident matches on the website for the user The advantage of this algorithm is that it simulates the process of human being watching an image If the image for query has a unique appearance the result will be very good The result for unique landmarks like the Eiffel Tower is fantastic However when we are using the function to search for UML class diagram the result is not so efficient The reason is that UML class diagram are mainly combined by shapes like rectangles arrows and text There are no unique symbols When we upload a UML class diagram to the Google search by image the features Google will extract are just normal ones Thus the result is not only UML class diagram but also graphs with curves and rectangles that relate to math research stock market and medical research and so on Thus the function we want to use is the traditional way to search images by key words At least for UML class diagrams the accuracy of searching by image does not take advantage of searching 11 by key words In the meantime Google provides details of personal result which collects images related to your most websites that the user has visited or is interested in which is helpful for collecting models the user wants Other search engines also have similar functions of image search but Google provides a better user interfac
46. he URL that directly links to the resource of images It s the end of a process of the crawler s work There comes a problem When we have got the data of the page not all links are what we need We need only two types of URL as discussed above Thus we have to give two regular expressions that correspond to the two types of URL to the crawler for matching in the page data The first is the URL which links to another page that has the result of image search First we should analyze the structure of Google s page of image search In the web browser the list of images is shown in standard version It dynamically shows pictures in other pages as we pull the scroll down However the page data the program can get is of the basic version If we want to go to another page we have to click on the number that has the link Searches related to uml class diagram uml class diagram example uml sequence diagram 123 45 6 7 8 9 10 Nex Google Images Home Help Figure 6 Links of pages of Google image search Thus what we want to get are the links of numbers represented They start with search q and the key words There is the parameter tbm isch which means the search is for images Another important parameter is start followed by a number After the searching process there are thousands of images The number after the start parameter is to show which is the first image that the URL links to For example the URL of page 2 contains st
47. he csv files created by the mageCrawler undefined content will be merged into the databases on the server 2 Although there s an option for the user to choose whether he wants to share his database with other users the content of his database will be included in our core database Thus when the core database is shown a user s privacy may have been violated One way to solve this problem is not to show our core database to the users However we suppose that users of our website are mostly willing to share what he has collected from our program Thus showing the users our core database is a better choice 3 3 Change of ImageCrawler To cooperate with the website the mageCrawler has been changed It can search by URL instead of by key words The program behaves as a normal web crawler 1 It uses the URL provided by the user as a starting node 2 Itgets the page data of the URL and analyzes the page content 3 It uses the breadth first search to collect all the URLs included in the content of the page and saves them into a list The list will only receive URLs that are not already included Otherwise the program may deal with a URL that s visited before and trapped in an infinite cycle 4 It detects all the links to the images within the page content and downloads them The other functions remain the same The program no longer relies on the result of the Google image search Users can focus on one website and collect the images from
48. he purpose of education to explain how UML class diagrams are used 2 Software reference http mdp toolkit sourceforge net 10 images provided and http gdal org 6 images provided belong to this category Both the websites are reference to certain libraries or software toolkit The source code provided are examples to explain how certain functions are used 3 Software design http www felix colibri com 14 images provided http staff aist go jp 4 images provided http www codeproject com 17 images provided http www c sharpcorner com 3 images provided and http tw mapandroute de 6 images provided belong to this category The source codes provided by these websites are the implementation for the software 4 Blog of IT articles http www blogjava net 4 images provided is the only one in this category It s the blog of articles related to the IT industry This category may contain one of the three classes above The source code provided may be the implementation of certain software or example codes attached to some educational content of UML standard 2 7 The Limitation of ImageCrawler The advantage of mageCrawler is based on Google image search so is the weakness The reason is that the pages of the result of an image search are at most 50 The average number of images within one page is 20 Thus the maximum number of images the program can get with one piece of key word is 1000 With the URL of some images are out of
49. hed the blacklist using the function in the Domain Rank Part of ImageCrawler Images that are not UML class diagrams or UML class diagrams but too blur to distinguish or a screen shot that contains only a part of a UML class diagram are put into the blacklist As a result 947 of the 2341 images have been put into the blacklist with 1394 UML class diagrams Compared to other tools that collect images from the Internet mageCrawler has its advantage 1 The tool that uses Google API to download images from Internet has a limitation of 64 images ImageCrawler does not have this limitation so that it s easy for the users to collect a large number of images 2 The plug in of Firefox named Save images has two drawbacks It can only download the images appear in the current tab opened in the Internet browser The tool will just download images as they are represented in the page When it s used to download the images listed as thumbnails in the result page of Google image search it cannot download the images with their original sizes which are really what the users need 3 ImageCrawler 1s free for the users It s developed by NET framework Thus it has a good expandability In future development it can be transformed to an online application instead of Windows application to provide more flexible usability The result of the statistics of domain name has been saved in the database In the whitelist there are 715 dif
50. imeon From XML Schema to Relations A Cost Based Approach to XML Storage Data Engineering 2002 29 Ye Kaizhen The Study of XML Storage Technique in Relational Database Master Thesis SUN YAT SEN UNIVERSITY 2007 30 Daniel Lucredio Renata P de M Fortes Jon Whittle MOOGLE a metamodel based model search engine Softw Syst Model 2012 53
51. ing setName name String void getSocialSecurityNumber String setSocialSecurityNumber socialSecurityNumber String void getDateOfBirthQ Date setDateOfBirthidateOfBirth Date void calcAgelnY earsQ int Figure 1 An example of a class diagram There are several relationships between the classes Here s a table of their definition and how they are represented in the UML class diagrams Name Description Representation Generalization It describes the relationship between An arrow pointing to the particularity and generality parent class Realization It describes the relationship between an An arrow with dash line interface and a class implementing it pointing to the interface Association It s the relationship if one class has certain A line with an arrow pointing members or methods of another class to the class that is associated Aggregation It s the relationship if one class owns A line with an empty diamond another pointing to the owner Composition It s the relationship if one class is a part of A line with a solid diamond another that cannot exist alone pointing to the owner Dependency If a class needs another one to implement A dash line with an arrow they have the relationship of dependency pointing to the class that s relied on Table 2 Relationships of class diagrams UML class diagrams are very important They can show the static structure of a
52. ing executeSQLO URL string execute S QLReader NEXTPAGEGO OGLE string DomainPattern string ImageDownload Database Operation domainName string DomainRankQ domainCount int getName getCouni setName setCouni Figure 5 Class diagram of ImageCrawler 2 5 2 Image Downloading Part 2 5 2 1 Initialization The program takes the key word the user has input and constructed a URL for searching images 14 from Google The URL for downloading images from Google is constructed by two parameters key word and search category image search For example the URL for downloading images that relates to UML is http Awww google com search q uml amp tbm isch At the same time the program takes the number of images that the user has input and saves it into a variable Before the process starts the program requires the user to select a database to save the information of the images to be downloaded There is a function checkDB to check whether the database the user has selected meets the format of the database we have designed In this function the program gets the names of all the tables inside the database and save them into DataTable objects For the keys inside the tables there is a function getCols that also uses the DataTable objects The function will put the names of the columns in a table and compare them with the database we have designed If all the tables and columns exist the database is ready for the
53. ions between the system and its users A use case is the target of a functionality provided by the system Interaction diagrams Communication Shows the interactions between diagram different kinds of components within the system It describes both the static structure and the dynamic behavior of the system Interaction overview Describes an overview of a control diagram flow with nodes that can contain interaction diagrams Sequence diagram Uses messages to show the communication between objects within the system as well as the lifecycles of them Timing diagram Focuses on timing constraints to explore the behavior of objects during a period of time Table 1 14 kinds of UML diagrams 1 2 Overview of UML Class Diagrams A UML class diagram represents the content of classes and the relationships between them A class is represented by a rectangle with three parts lined vertically On top of a class is the name of it It is mandatory In the middle part of the rectangle of a class there are members and attributes On the bottom there are methods within the class Here s an example of a class Person with four members and seven methods The sign in front of attributes or methods means that this attribute or operation is private and the sign means it is public name String socialSecurityNumber String dateOfBirth Date emailAddress String getNamed Str
54. it Compared to the result of Google image search which contains images from different websites the result of searching by URL can be more focused on one topic because the images downloaded are mainly from one website Combining the mageCrawler and the website an alternative way for distributed system 1s finished It s a better way because the website is not only for uploading and showing databases but also a way for people to communicate and has a good extensibility for more functions 27 Chapter 4 Website for XMI Storage and Query 4 1 Overview of XML XML is short for Extensible Markup Language It is a language that uses tags to mark the electronic files to make them structural It can be used to mark the data and define the data types The tags can be defined by the users which provide it with a strong extensibility XML is a subset of SGML Standard Generalized Markup Language and the recommended standard by the W3C World Wide Web Consortium It is very popular and useful in web transmission The grammar of XML is similar with HTML However XML is not an alternative of HTML XML is used to store data while HTML is used to format and print data to the webpage XML consists of tags and the tags must appear in pairs which is different with HTML XML is case sensitive which requires a stricter attitude in using it An XML file is like a tree It starts from a root node and expand to the leaves Here s an example of an XML fi
55. le 1 K9xml version 1 0 encoding IS0 8859 1 2 lt bookstore gt 5 lt book category COOKING gt 6 lt title lang en gt Everyday Italian lt title gt lt author gt Giada De Laurentiis lt author gt lt year gt 2005 lt year gt lt price gt 30 00 lt price gt 10 r book io cO mm H H N l lt book category CHILDREH gt lt title lang en gt Harry Potter lt title gt author J K Rowling lt author gt lt year gt 2005 lt year gt lt price gt 29 99 lt price gt r book HH HH H JO C bh c ip 0 H H lt book category WEB gt lt title lang en gt XQuery Kick Start lt title gt lt author gt James McGovern lt author gt AA lt author gt Per Bothner lt author gt 23 lt author gt Kurt Cagle lt author gt 24 lt author gt James Linn lt author gt 25 lt author gt Vaidyanathan Nagarajan lt author gt 26 lt year gt 2003 lt year gt 27 lt price gt 49 99 lt price gt lt book gt A N E A A lo o A z bookstore Figure 15 An example of XML file 28 In this file the first line is the declaration of this file It defines the version of XML is 1 0 and the encoding is ISO 8859 1 From line 2 to line 30 the entities in the tags are called elements The lt bookstore gt and lt bookstore gt are the two parts of the root element The root element should appear in every XML file
56. le com images google Ze a Lum 2er google Conf inages nav Logo IZ Domain Rank http www ibm com D http www elix colibri com 14 http upload wikimedia org http blog caplin con wp content uplot http www agilemodeling com images moc http rollerjm free fr images ejb uml http www agilemodeling com images moc http www tutorialspoint com images un http www codeproject com KB library http www edrawmax com images software http JL fine agiledata org images oo101C1 iff edn embarcadero com article ima http oldresources visual paradigm con http Gel linkvp ieee aA http bpl blogger com _sDOeSHxTdk SBi http www jetbrains com idea features http www altova com images shots UML_ http diacinstsller de doc en graphic http www jetbrains com img webhelp r http www modeliosoft eter http images visusl p con do http www osZezine EE TS ML a http images vi sual paradi gn com docs http images visual paradi gm com docs http SE GE gn sch nat imacac lacslhact http www ples resources org http wiki eclipse org http mdp toolkit sourceforge net http www xml com http www codeproject com http staff aist go jp http www c sharpcorner com http www uml org cn http www agilemodeling com httpi fars els cdn com http www c7 jump com ht
57. mageCrawler To validate the correctness of recognizing UML class diagrams of the program several experiments have been implemented Before the experiment a database of 1000 UML class diagrams has been established The attribute of is UML of the database is manually filled So the database is a standard version of the list to show whether each image downloaded is actually a UML class diagram In the experiment the program will take an image of UML class diagram as the example image abstract its structure using the perceptual hash algorithm and compare the structure with the structures of the images we have collected Then a Boolean value of isUML is given for every image in the database After the program is finished we compare the database it has changed with the standard list to validate the efficiency of the program As an image is needed as an example for the process of the filter we take 6 images to perform 6 experiments They are divided into two groups colored and black and white In each group three subgroups has been made by the content of the images simple normal and complicated Experiment 1 Black and white simple UML class diagram Example image Figure 9 Example of black and white simple UML class diagram Result 47 06 of the result is the same as in the standard list Experiment 2 Black and white normal UML class diagram Example image 21 implementation class BnB
58. ms Figure 25 Upload page In the bottom part of this page the user can download the mageCrawler program to build his database Then he can upload the csv files generated by the mageCrawler program by clicking on the Browse button As our program will generate five csv file for a database that correspond to the five tables the user should choose which table the csv file to be uploaded contains When all the required information is set just click the Submit button to upload the file The upload function will first check the type of the file uploaded If a non csv file has been uploaded there will be a message to warn the user If uploaded successfully the information of the uploaded file will be shown to the user like the name and size of the file Then the content of the 43 csv files will be merged to his existed database on the server as well as our core database If it s the first time the user uploads his database the five tables will be built automatically on the server C 3 My Settings Page Here s the screen shot of the page for the user settings My Settings View Edit Submitted by admin on Sat 07 28 2012 18 02 v would like to share my database with others Save settings databases list Show chosen database c pic It contains all the information of the images c picblack It contains the url of images in black list c picwhite It contains the url of images in white list V
59. n integrated system cannot meet the requirement The crawlers should work on a distributed system parallel to improve the efficiency However as the task should be finished in a normal PC the efficiency of a normal web crawler is not good Different from a normal web crawler that gets every piece of information from the Internet the crawler we want to implement is just to download images of UML class diagrams from the Internet to establish the database Thus we can use the result of an image searching website as our starting node 2 4 Why Google Google as one of the most widely used search engines in the world can meet our requirements It can get a large number of images focused on a special topic When we go to the image search page of Google we can get the list of images from various websites It s the assembly of images related to the key words we have input The time for Google image search is extremely fast because the information of billions of images has already been saved in the server of Google Thus if our web crawler can use the result of Google image search as a starting node the efficiency is great In October 2009 Google has released the function to search by Images TT When an image is uploaded the algorithm is applied on it to abstract the features from it The features can be textures colors and shapes Then the features are sent to the backend of Google and compared with the images on the server If there is an image t
60. ok gt 11 d lt xs complexType gt 12 lt xsisequence gt 13 lt xs element ref title gt 14 lt xs element maxOccurs unbounded ref author gt 15 lt xs element ref year gt 16 lt xs element ref price gt 17 p xs sequence 18 xs attribute name category use required type xs HCHame gt 19 F xs complexType 20 Fr lt xs element gt 21 E lt xs element name title gt 22 H lt xs complexType mixed true gt 23 xs attribute name lang use required type xs NCName gt 24 E xs complexType 25 Fr lt xs element gt 26 lt xs element name author type xs string gt E lt xs element name year type xs integer 26 lt xs element name price type xs decimal gt 29 xs schema Figure 16 An example of XML Schema The xs that appears in front of every element is the namespace The lt schema gt tag is the root element of an XML Schema file It must appear 29 The lt element gt elements describe the elements that appear in the XML file In this case there are bookstore book title author year and price If the value of an element is a simple type it will be described as an attribute of the element For example the author element contains the type string Thus in line 26 the string type is described as an attribute of the author element If the value of an element is a complex type it will be describe
61. ond Proceedings of the 17th international conference on World Wide Web 2008 17 Niu Xiamu Jiao Yuhua An Overview of Perceptual Hashing ACTA ELECTRONICA SINCA Vol 36 No 7 2008 18 Schaathun H G On Watermarking Fingerprinting for Copyright Protection Innovative Computing Information and Control 2006 19 Monga V Evans B L Perceptual Image Hashing Via Feature Points Performance Evaluation and Tradeoffs Image Processing IEEE 2006 20 Zhao Yuxin A research on hashing algorithm for multi media application PhD thesis Nanjing University of Technology 2009 21 Bai He Tang Dibin Wang Jinlin Research and Implementation of Distributed and Multi topic Web Crawler System Computer Engineering Vol 35 No 19 2009 22 Wu Xiaohui An improvement of filtering duplicated URL of distributed web crawler Journal of Pingdingshan University Vol 24 No 5 2009 23 http drupal nl over drupal Over Drupal 24 http www w3 org TR xml11 The specification of XML 1 1 52 25 http www w3 org TR xmlschema11 1 W3C XML Schema Definition Language XSD 1 1 Part 1 Structures 26 http www omg org spec XMI 2 4 1 Official documents for XMI specification 27 Xiao Jihai A research of the transformation from XML Schema to entity relationship database Master thesis Taiyuan University of Technology 2006 28 Philip Bohannon Juliana Freire Prasan Roy Jerome S
62. or the search engine The basic operation of the web crawler is to fetch information from the website The process of fetching data is the same as we surf online The web crawler mostly deals with URL It gets the content of the files according to the URL and classifies it Thus it s very important for the web crawler to understand what a URL stands for When we want to read a website the web browser as the client end sends a request to the server end gets the source of the page explains it and shows it to us The address of a webpage is called uniform resource locater URL which is a subset of universal resource identifier URI URI is the identifier for every resource on the website html documentation images videos programs and so on URL is the string to make the description of the information online It contains three parts the protocol the IP address of the server where the resource is and the path of the resource at the server end When we put the web crawler into practice we use it to traverse the entire Internet and get the related information which is just like crawling During this procedure it regards the Internet as a huge graph and every page is a node inside with the links in a page as directed edges Thus the crawler will take either depth first search DFS or breadth first search BFS However in DFS the crawler might go too deep or get trapped So BFS are mostly chosen In the process of BFS a node is chosen as
63. ount owner String balance Dollars deposit amount Dollars witha awal amount Dollars Figure 32 Query of URL page First the image will be shown to the user Then under each image four tables will be listed to show the classes attributes operations and associations included in this image 49 Appendix F Statistics of UML Class Diagrams Collection F 1 Table pic It is the table that contains the full information of the images downloaded There are 2341 items in this table with the width height pixel format and other attributes included F 2 Table picwhite It is the table that contains the URL of the images that are considered UML class diagrams 1394 items are included in this table F 3 Table picblack It is the table that contains the URL of the images that are considered not UML class diagrams 947 items are included in this table F 4 Table whitecount It is the table that contains the statistics of domains where the images in the table picwhite come from There are 715 different domains included in this table F 5 Table blackcount It is the table that contains the statistics of domains where the images in the table picblack come from There are 417 different domains included in this table 50 Acknowledgements Hereby I would like to thank my supervisors Dr Michel Chaudron and Mr Bilal Karasneh for their guidance and support of my thesis With
64. out their help it would have been rather hard for me to go through the final step of my master program A special thank to Leiden University and the Netherland where I have found a platform to finish my master program during which I have made an improvement not only in knowledge but also in the way of thinking I also want to express my deepest gratitude for my family It s hard for one to study and live alone in a foreign country with a different culture Their support and encouragement far from China has helped me to go through the toughest period of my thesis 51 References 1 http www uml org Object Management Group UML 2 Frakes W B Pole T P An empirical study of representation methods for reusable software components Software Engineering IEEE Transactions 1994 3 Scott Henninger An evolutionary approach to constructing effective software reuse repositories ACM Transactions on Software Engineering and Methodology TOSEM Volume 6 Issue 2 April 1997 Pages 111 140 4 Mascena J C C P de Lemos Meira S R de Almeida ES Cardoso Garcia V Towards an effective integrated reuse environment 5Th ACM International Conference on Generative Programming and Component Engineering GPCE short paper Portland Oregon USA 2006 5 Vanderlei T A Garcia V C de Almeida E S de Lemos Meira S R Folksonomy in a software component search engine coop erative classi cation through shared me
65. owever it s not good in operating with data On the other hand relational database can efficiently query within the data but it needs a flexible way to publish the data into the Internet Thus to combine XMI with relational database is a good solution 30 4 3 Related Work 4 3 1 XMI Storage Phil Bohannon s paper has provided a way to map XMI files into relational database It comes up with a cost based method for XMI storage in relational database In this method the first step is to generate a physical XML Schema A physical XML Schema is an equivalent way of XML Schema It eliminates the multi valued elements by generating separate new types for each of them Then for the attributes of each type the size minimum and maximum value and the number of distinct values is inserted in the physical XML Schema Then it is ready to map from the Schema to the relations The physical XML Schema is a great method of mapping from XML Schema to relational database However it ignores the constraints of the elements like minoccurs maxoccurs or P l What s more it deals with the XML Schema so that it is used to create the tables instead of build a whole database But it provides us a good idea about how to design a database for an XMI file unique 4 3 2 XMI Query System In Daniel Lucredio Renata P de M Fortes and Jon Whittle s paper a metamodel based searching engine named Moogle is created The system is an effici
66. process If the user has selected a database we suppose that there is already information about many images saved in the database To avoid downloading images that are already in the database a list of URL will be established When the URL of an image is found in the next steps the program will check whether the image has already been in the database If so the downloading part will be ignored 2 5 2 2 URL Analysis When we have constructed the URL we should download the data of the page In Net the WebClient class is used WebClient class provides the method to send and receive data from a resource that is signaled by a URI The DownloadData method is to download the page data from the URI provided So we take the URL we have constructed as the parameter of the DownloadData method and get the source of the page Within the page data there are links to kinds of resources This is like the starting node in the theory of web crawler So how do we get the links from the source code of the page One of the most efficient ways is by regular expression Regular expression is a string which can be used for parsing and manipulating text In complex search and replace operations and validating whether a piece of text is well formed regular expression takes great advantage than other methods For the image crawler we have two kinds of strings to match One is the URL that represents a link to another page It s like the adjacency node The other is t
67. r by the number of images contributed by the domains 39 Function 5 Output to csv files As our online database requires csv files as the input the program can generate csv files from the database The user can click on the Generate csv button to generate five csv files corresponding to the five tables in the database The five csv files will be created in the same folder with the executable program file They will be named in the format of csv for and the name of the access database file and _ with the table name For example if a database is named pictures mdb the five csv files will be 39 ce csv for pictures pic csv csv_for_pictures_picblack csv csv for pictures picwhite csv csv for pictures blackcount csv and csv for pictures whitecount csv Here s a screen shot of the program in running Cat o ImageCrarler Database E black done blackcount done 1000 pic mdb Browse Create Hew BankAccount DEE bie Dolars Save Path EW class diagram Browse nmeenmem se Fe depasit amourt Dolars wirasat amer Zeiss Keyword Number of Images Ready oruslltresPats arortaga raf ceres Gol Desen eines Ce wwitdrewa amcunt br cens vtirmerest than are dn URL Black List http www uml org cn sj jm images figur a http www drkT jp MT drk images 090322 Load Database Select image folder File does not exist http www goog
68. s operation Table 9 The table to store associations 4 4 2 SOL Generator We have developed a program XML2DB to convert from an XML file to SOL commands to insert the content to our database The main feature we have used is the System Xml namespace of Net In this program we load the XML file generated by the OCR process get the attributes of classes attributes operations and associations in this file generate the Insert into command and write them into a SQL file The SOL commands generated will be shown in the textbox Here s a screen shot of this program Path of the XML file E Nexample01 xml Browse tblClass values Classl public false UMLClass 24 201281002235 tblAttribute values atr01 public UMLClass 24 UMLAttribute 25 201281002235 tbl ttribute values atr02 private UMLClass 24 UMLAttribute 26 201281002235 tblOperation values opr l public false UMLClass 24 UMLOperation 2T 201281002235 tblOperation values opr 2 private false UMLClass 24 UML peration 28 201281002235 tblClass values Class2 public false tblAttribute values atr l public tblAttribute values atr02 private UMLClass 29 UMLAttribute 31 201281002235 tblOperation values opr01 public false UMLClass 29 UMIOperation 32 201281002235 UMLClass 29 2012810
69. system They have provided the basic elements for other diagrams They are especially useful for the developers They are mainly used in the purpose of analysis and design During the analysis phase one should not take the technology into account Abstracting the system into different modules is an efficient way The UML class diagram is a way to represent the abstraction When a project goes into the phase of design the UML class diagram used in analysis phase may be changed to suit for the development environment They become the basis of the code for the project Different people have different styles of codes The most efficient way for them to communicate is by the UML class diagrams 1 3 Overview of Our Project Our project aims for building a system that can establish databases of images of UML class diagrams from the Internet transfer the images to XMI models and make queries within the models It includes the image collecting part transforming part and querying part Image is different from text It s very difficult to abstract information from images Current researches of model search engine focus on dealing with the models stored in text files Our project supports the usability of collection conversion storage and query for models from images As it s expensive for developers to find software assets which results to lack of knowledge about reusable assets this is one of main reasons why developers are not willing to reuse sof
70. t it The URL filename width height and pixel format of the images will be saved into the database 2 5 3 Domain Rank Part As not all images downloaded are UML class diagrams there should be the function to distinguish them The program can show the list of all URLs in the database and show the corresponding image of selected URL The user can put them into the blacklist or delete them from blacklist manually The whitelist contains the URLs of images that are UML class diagrams 16 Based on the blacklist and the whitelist the program can abstract the domains where the images come from Then the program can give the statistical data of the domains and make a rank of them Our database is built as an access file As the database will be used in the website and csv file is more popular in the web applications there is a way to generate csv files from our database The user can click on the Generate csv button and five csv files corresponding to the five tables within the database will be created 2 6 Validation With the mageCrawler we have developed a database with 2341 images has been established The key words we have used are uml class diagram uml class diagram example and uml class model The result of images differs with different key words The whole process has cost an hour The time cost depends not only on the algorithm but also on the Internet condition We have manually establis
71. tadata Brazilian Symposiumon Software Engineering Tool Session Florianopolis Brazil 2006 6 Brian Pinkerton Webcrawler Finding what people want PhD thesis University of Washington 2000 3 Xu Yuanchao Liu Jianghua Liu Lizhen Guan Yong Design and Implementation of Spider on Web based Full text Search Engine Control and Automation Publication Vol 23 2007 8 https developers google com image search v l devguide Image Search Developer s Guide of Google 9 http support google com images bin answer py hl en amp canswer 1 325808 Features of Google s search by image 10 http support google com websearch bin answer py hl en amp answer 2409866 Details of personal results by Google 11 http googleimagedownloader com Google Image Downloader the tool to download images with Google API 12 Zhou Lizhu Lin Ling Survey on the research of focused crawling technique Computer Applications Vol 25 No 9 2005 13 Marc Ehrig Alexander Maedche Ontology focused crawling of Web documents Proceedings of the 2003 ACM symposium on Applied computing 2003 14 Zhou Demao Li Zhoujun Survey of High performance Web Crawler Computer Science Vol 36 No 8 2009 15 Liu Jinhong Lu Yuliang Survey on topic focused Web Crawler Application Research of Computers Vol 24 No 10 2007 16 Hsin Tsang Lee Derek Leonard Xiaoming Wang Dmitri Loguinov IRLbot scaling to 6 billion pages and bey
72. tp www uml diagrams org http www clear rice edu http kennii files wordpress com S W tnc ie manandvonta da il H Figure 20 The interface of ImageCrawler in process 40 Appendix B User Manual for Filter4ImageCrawler Here s the screenshot of the FilterdImageCrawler El Filter4ImageCrovler DIE The folder of images Browse Database Browse pen Example Image Start Figure 21 The initial interface of Filter4ImageCrawler Step 1 Choose the folder of the images Click on the Browse button next to the textbox on the top to set the path of the folder of images Step 2 Choose the database Click on the Browse button next to the textbox of Database to set the path of the database The program will check whether it meets the format of the database ImageCrawler has created If not there will be a message to warn the user to select again The full path of the chosen database file will be shown in the Database textbox Step 3 Choose the example image The example image is the image that is taken as an example in the process of perceptual hash algorithm All the images will be compared to the example for similarity Click on the Open Example Image button to choose an image as the example The image will be shown in the picture box below the button Step 4 Start the process Click on the Start button and the process will be started Step 5 View the result When
73. tware 2 3 4 5 For this finding reusable assets is a fundamental aspect of an effective reuse program 30 so we try to find models and support reusability of these models via converting them into suitable format that is easy to be modified In this paper the image collecting part and the XMI querying part 1s included For the image collecting part we have developed a program mageCrawler that behaves as a web crawler to collect images of UML class diagrams from the Internet Then a website is established for building a database of images we have collected For the XMI querying part we implement functions by mapping XMI models into relational databases and make queries within them Chapter 2 ImageCrawler 2 1 Overview of Web Crawler We all know that it s convenient for us to Google something Google as the biggest search engine in the world has information of billions of websites and keeps them up to date How can that be done Search engine is the technology which started in 1995 As more and more information has been put online it becomes more and more important Search engine collects information from the Internet classifies them and provides the search function of the information for the users It has become one of the most important Internet services The web crawler is a program that can fetch information from the Internet automatically It plays an important part in search engine because it can download data from the Internet f

Universiteit Leiden Computer Science

Contents

Download Pdf Manuals

Related Search

Related Contents