
NWB manual - Network Workbench

1. The resulting network is unconnected, i.e., it has many subnetworks, many of which have only one node. These single unconnected nodes, also called isolates, can be removed using Preprocessing > Delete Isolates. Deleting isolates is a memory-intensive procedure. If you experience problems at this step, you may wish to consult the tutorial entitled Increasing Memory for NWB. The FourNetSciResearchers dataset has exactly 65 isolates. Removing those leaves 12 networks, shown in Figure 2 (right), using the same color and size coding as in Figure 1 (left). In GUESS, Display > Information Window reveals detailed information for any node or edge. Here, the node with the highest GCC value was selected. Alternatively, nodes could have been color- and/or size-coded by their degree, e.g., coloring them by out-degree from gray to black. Note that the out-degree corresponds to the LCC within the given network, while the in-degree reflects the number of references, which helps to visually identify review papers. The complete paper-paper citation network can be split into its subnetworks using Analysis > Unweighted & Directed > Weak Component Clustering with the default values. The largest component has 163 nodes, the second largest 45, and the third 24; the fourth and fifth have 12 and 11 nodes, respectively. The largest component, also called the giant component, is shown in Figure 3. The top 20 papers by times cited in ISI have been ...
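The two preprocessing steps above can be sketched in plain Python: deleting isolates, and splitting the network into weakly connected components. This is a minimal illustration, not NWB's implementation. The node/edge-list input and the function names `delete_isolates` and `weak_components` are hypothetical. Weak components ignore edge direction, so each directed edge is treated as an undirected link in a union-find structure.

```python
from collections import defaultdict


def delete_isolates(nodes, edges):
    """Return the nodes that take part in at least one edge."""
    connected = {n for edge in edges for n in edge}
    return [n for n in nodes if n in connected]


def weak_components(nodes, edges):
    """Group nodes into weakly connected components, largest first."""
    parent = {n: n for n in nodes}

    def find(x):
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:  # direction is ignored for weak connectivity
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    groups = defaultdict(list)
    for n in nodes:
        groups[find(n)].append(n)
    return sorted(groups.values(), key=len, reverse=True)


nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("c", "b"), ("d", "e")]  # "f" is an isolate
print(delete_isolates(nodes, edges))  # ['a', 'b', 'c', 'd', 'e']
print(weak_components(nodes, edges))  # [['a', 'b', 'c'], ['d', 'e'], ['f']]
```

Sorting the components by size puts the giant component first, which matches the way the component sizes are reported above.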
2. Correlation (if desired, Pearson's R) on any of the above. 1.2.1 Data Extraction: Data are either compiled by hand or downloaded in bulk from major databases. For details, see Section 1.1. 1.2.2 Units of Analysis: The major units of science studied in scientometrics are authors, papers, and journals, as well as other institutional, geospatial, and topical units. Note that a laboratory and/or center can be interdisciplinary, whereas a department/school is typically discipline-specific. Authors have an address with information on affiliation and geolocation. Most author-consumed/produced records have a publication date, a publication type (e.g., journal paper, book, patent, grant, etc.), and topics (e.g., keywords or classifications assigned by authors and/or publishers). Because authors and records are associated, the geolocation(s) and affiliation(s) of an author can be linked to the author's papers. Similarly, the publication date, publication type, and topic(s) can be associated with a paper's author(s). 1.2.3 Selection of Measures: Statistics can be computed such as the number of papers, grants, co-authorships, and citations over time per author; bursts of activity (number of citations, patents, collaborators, funding, etc.); or changes of topics and geolocations for authors and their institutions over time. Derived networks are examined to count the number of papers or co-authors per author, the number of citations per paper or journal, etc., but also to determine the stren...
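The following sketch shows one way to compute per-author statistics and correlate them. It counts papers and citations per author and then computes Pearson's R between the two counts. The record layout (`authors` and `citations` keys) and the function names are assumptions made for the example. They are not an NWB data format.

```python
import math
from collections import Counter


def per_author_counts(records):
    """Count papers and total citations per author.

    Each record is assumed to be a dict with an 'authors' list and an
    integer 'citations' field (illustrative field names).
    """
    papers, cites = Counter(), Counter()
    for rec in records:
        for author in rec["authors"]:
            papers[author] += 1
            cites[author] += rec["citations"]
    return papers, cites


def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


records = [
    {"authors": ["A", "B"], "citations": 10},
    {"authors": ["A"], "citations": 4},
    {"authors": ["C"], "citations": 1},
]
papers, cites = per_author_counts(records)
authors = sorted(papers)
r = pearson_r([papers[a] for a in authors], [cites[a] for a in authors])
```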
3. Layout > Bin Pack to compact and center the network layout. Use the mouse pointer to hover over a node or edge and see its properties in the Information window. Right-click on a node to center on, color, remove, or modify a field of that node (see Figure 8). Interact with the visualization as follows. Pan: grab the background by holding down the left mouse button and move it using the mouse. Zoom: use the scroll wheel on the mouse, OR press the zoom buttons in the upper left-hand corner, OR right-click and move the mouse left or right. Center the graph by selecting View > Center. Figure 9: GUESS Graph Modifier. Use the Graph Modifier to change node attributes. For example, select all nodes in the Object drop-down menu and click the Show Label button. To select nodes based on a property, select nodes in the Object drop-down menu, then select wealth from the Property drop-down menu, choose an operator (e.g., >) from the Operator drop-down menu, and finally type 50 into the Value box. Then select a color, size, or shape code (see Figure 9). 5.3.2 Interpreter: Use Jython, a combination of Java and Python, to write code that can be interpreted. Here we list some GUESS commands that can be used to modify the layout: coloring all nodes uniformly; circle filling, circle ring, and circle label; size coding; and making the labels of the most productive authors visible. Figure 15 shows ...
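A Graph Modifier selection built from Property, Operator, and Value amounts to filtering nodes by a comparison on one attribute. Below is a minimal Python sketch of that idea. It assumes nodes are stored as a dict of attribute dicts. `select_nodes` and the toy node ids and values are hypothetical, and this is not the GUESS API.

```python
import operator

# Comparison operators keyed by the symbols a user might pick.
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def select_nodes(nodes, prop, op, value):
    """Return ids of nodes whose attribute `prop` satisfies `op value`.

    `nodes` maps node id -> dict of attributes; nodes without the
    attribute are skipped.
    """
    compare = OPERATORS[op]
    return [nid for nid, attrs in nodes.items()
            if prop in attrs and compare(attrs[prop], value)]


# Toy data (hypothetical values, not taken from florentine.nwb).
toy = {"n1": {"wealth": 80}, "n2": {"wealth": 20}, "n3": {"label": "x"}}
print(select_nodes(toy, "wealth", ">", 50))  # ['n1']
```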
4. Figure 4: Output Data Type window. 2 Data Conversion Service: A more detailed explanation will be provided shortly. CIShell & OSGi. Figure 1: Converter graph. 3 Compute Basic Network Statistics: For a network, it is often advantageous to know the following:
   - whether it is directed or undirected
   - the number of nodes
   - the number of isolated nodes
   - a list of node attributes
   - the number of edges
   - whether the network has self-loops (if so, a list of all self-loops)
   - whether the network has parallel edges (if so, a list of all parallel edges)
   - a list of edge attributes
   - the average degree
   - whether the graph is weakly connected
   - the number of weakly connected components
   - the number of nodes in the largest connected component
   - strong connectedness, for directed networks
   - graph density
   In the NWB tool, use Analysis > Network Analysis Toolkit (NAT) to get these basic properties, e.g., for the network of Florentine families available in yournwbdirectory/sampledata/networks/florentine.nwb. The result for this dataset includes: Node attributes present: label, wealth, priorates ... No parallel edges ... Density (disregarding weights): 0.225. 4 Tree Visualizations: Many network datasets come in tree format. Examples include family trees, organizational charts, classification hierarchies, and directory structures. Mathematically, a tree graph is a set of straight line segments (edges) connected at their ends and containing no closed loops (cycles). A tree g...
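Several of the NAT properties listed above follow directly from a node list and an edge list. The sketch below takes plain Python lists as input. `basic_stats` is a hypothetical helper, not NAT itself. It uses the usual density definitions, m / (n(n-1)) for directed networks and 2m / (n(n-1)) for undirected ones, where n is the number of nodes and m the number of edges.

```python
from collections import Counter


def basic_stats(nodes, edges, directed=False):
    """Compute a few of the network properties listed above.

    `nodes` is a list of ids and `edges` a list of (source, target) pairs.
    Self-loops and parallel edges are reported but still counted in m.
    """
    n, m = len(nodes), len(edges)
    self_loops = [(u, v) for u, v in edges if u == v]
    # In an undirected network, (u, v) and (v, u) are the same edge.
    keys = [(u, v) if directed else tuple(sorted((u, v))) for u, v in edges]
    parallel = sorted(k for k, count in Counter(keys).items() if count > 1)
    touched = {x for edge in edges for x in edge}
    isolates = [x for x in nodes if x not in touched]
    possible = n * (n - 1) if directed else n * (n - 1) / 2
    return {
        "nodes": n,
        "edges": m,
        "isolates": isolates,
        "self_loops": self_loops,
        "parallel_edges": parallel,
        # Each edge adds one to the (total) degree of both endpoints.
        "average_degree": 2 * m / n,
        "density": m / possible,
    }
```

For example, `basic_stats(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("b", "a")])` reports "d" as an isolate. It also flags ("a", "b") as a parallel edge, because in an undirected network the edges a-b and b-a coincide.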
5. Korea to be identified as the same word. Because the Garfield ISI data is very different in character from the rest, it is left out of the burst analysis done here. One particular difference is that most of the works in the Garfield dataset have no ISI keywords. In the NWB tool, use File > Load and Clean ISI File to load ThreeNetSciResearchers.isi. This file contains all of Wasserman's, Vespignani's, and Barabási's ISI records and is provided as a sample dataset in yournwbdirectory/sampledata/scientometrics/isi/ThreeNetSciResearchers.isi. The result is two new tables in the Data Manager. The first is a table with all ISI records. The second is a derived (indented) table of unique ISI records, named 262 Unique ISI Records. In the latter, ISI records with the same unique ID number (UT field) are merged, and only the record with the higher citation count (TC value) is kept. Select the 262 Unique ISI Records table and run Analysis > Textual > Burst Detection using the parameters: Note: Throughout the tutorial, we will use a comment marker to indicate comments or further instructions within parameter or code blocks. These do not need to be entered into the tool. A third table, derived from 262 Unique ISI Records and labeled Burst detection analysis, will appear in the Data Manager. On a PC running Windows, right-click on this table and select View to see the data in Excel. On a Mac or a Linux system, right-click and sa...
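The merge rule behind the 262 Unique ISI Records table can be expressed as a small deduplication step. This sketch assumes each record is a dict with `UT` and `TC` fields, the ISI tags for the unique ID and the times-cited count. `unique_records` and the sample IDs are hypothetical, not the NWB loader.

```python
def unique_records(records):
    """Keep one record per UT (unique ISI ID): the one with the higher TC.

    Records are assumed to be dicts holding at least 'UT' and 'TC';
    TC may be an int or a numeric string.
    """
    best = {}
    for rec in records:
        ut = rec["UT"]
        if ut not in best or int(rec["TC"]) > int(best[ut]["TC"]):
            best[ut] = rec
    return list(best.values())


records = [
    {"UT": "000001", "TC": "3"},
    {"UT": "000001", "TC": "7"},  # same ID with a higher citation count
    {"UT": "000002", "TC": "1"},
]
deduplicated = unique_records(records)
```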
6. Figure 2: Network Workbench tool interface components. All operations, such as loading, viewing, or saving datasets and running various algorithms, are logged sequentially in the Console window and in the log files, together with the algorithm parameters used. The Console window also displays acknowledgement information: the original authors of the algorithm, the developers, the integrators, a reference paper, and the URL to the reference if available, together with a URL to the algorithm description in the NWB community wiki. The Data Manager window displays all currently loaded and available datasets. The type of a loaded file is indicated by its icon:
   - Table: .csv file
   - Matrix: Pajek .mat data
   - Plot: plain text file that can be plotted using gnuplot
   - Text: other text file
   - GUESS: GUESS visualizations
   - Tree: tree data structure
   - Network: network data, either an in-memory Graph/Network object or network files saved as GraphML, XGMML, NWB, Pajek .net, or edge list format
   Derived datasets are indented under their parent datasets. That is, the children datasets are the results of applying certain algorithms to the ...
7. & Bauin, 1983; Callon, Law, & Rip, 1986). Salton's term frequency inverse document frequency (TFIDF) is a statistical measure used to evaluate how important a word is to, e.g., a paper in a corpus. The importance increases proportionally to the number of times a word appears in the paper but is offset by the frequency of the word in the corpus (Salton & Yang, 1973). Dimensionality reduction techniques such as self-organizing maps (SOM) or the topics model (see Table 1) are commonly used to project high-dimensional information spaces (e.g., the matrix of all unique papers times their unique terms) into a low-dimensional (typically 2-dimensional) space. See Section 6.2 Co-Occurrence Linkages for examples of how to use the NWB Tool for word co-occurrence analysis (Deerwester, Dumais, Furnas, Landauer, & Harshman, 1990; Griffiths & Steyvers, 2002; Kohonen, 1995; Kruskal, 1964; Landauer & Dumais, 1997; Landauer, Foltz, & Laham, 1998). 6 Network Analysis: The study of networks aims to increase our understanding of natural and manmade networks. It builds on social network analysis (Carrington, Scott, & Wasserman, 2005; Scott, 2000; Wasserman & Faust, 1994), physics (Barabási, 2002), information science (Börner, Sanyal, & Vespignani, 2007), bibliometrics (Borgman & Furner, 2002; Nicolaisen, 2007), scientometrics (Narin & Moll, 1977; White & McCain, 1998), informetrics (Wilson, 2001), webometrics (Thelw...
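Below is a compact TF-IDF computation. It assumes whitespace tokenization and one common weighting variant: the raw term count times log(N/df), where N is the number of documents and df is the number of documents containing the term. Other variants normalize or smooth these factors differently, so the numbers need not match any particular tool.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    tf  = raw count of the term in the document
    idf = log(N / df), where df counts the documents containing the term
    """
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


docs = [
    "network science network analysis",
    "citation analysis",
    "network growth",
]
w = tfidf(docs)
```

A term that occurs in every document gets a weight of zero under this variant, because log(N/N) = 0. This is the "offset by the frequency of the word in the corpus" effect taken to its extreme.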
8. covering 1988-2007. The top 10 of the most cited references are selected; 415 nodes and 7,147 links are laid out in the network. An alternative, larger figure of a document co-citation network derived via the NWB Tool is given in Section 6.3.1 Document Co-Citation Network (DCA), Figure 7. Figure 2: Document co-citation network of FourNetSciResearchers with the NWB Tool (left) and using CiteSpace II (right). 9.3.2 Author Co-Occurrence (Co-Author) Network: Figure 3 shows the CiteSpace rendering of the co-authorship network for the FourNetSciResearchers dataset using five 4-year time slices covering 1988-2007. In this network, the top 100 most frequently occurring authors from each slice are selected. The network has 249 nodes and 907 links. Compare it to the NWB rendering in Section 6.2.1 Author Co-Occurrence (Co-Author) Network. Figure 3: Co-authorship network of FourNetSciResearchers in CiteSpace II. 9.3.3 Word Co-Occurrence Network: Figure 4 shows the CiteSpace rendering of the keyword (descriptor and identifier) co-occurrence network for the FourNetSciResearchers dataset using ten 2-year time slices covering 1988-2007. The top 50 most often occurring keywords from each slice are selected for the network, which has 247 nodes and 830 links. Compare it to the NWB rendering in Section 6.2.2 Word Co-Occurrence Network. Figure 4: Keyword co-occurrence network of FourNetSciResearchers in CiteSpace II. 9.3...
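The time-slicing step can be sketched in three parts: cut the year range into fixed-width slices, keep the most frequent items within each slice, and take the union of those selections. This is an illustrative reading of the description above, not CiteSpace's code. The `(year, item)` input format and the function names are assumptions.

```python
from collections import Counter


def time_slices(start, end, width):
    """Split the inclusive year range [start, end] into consecutive slices."""
    return [(y, min(y + width - 1, end)) for y in range(start, end + 1, width)]


def top_per_slice(occurrences, start, end, width, top_n):
    """Select the top_n most frequent items within each time slice.

    `occurrences` is a list of (year, item) pairs, e.g. one pair per
    author per paper. Returns the union of the per-slice selections.
    """
    selected = set()
    for lo, hi in time_slices(start, end, width):
        counts = Counter(item for year, item in occurrences if lo <= year <= hi)
        selected.update(item for item, _ in counts.most_common(top_n))
    return selected


print(time_slices(1988, 2007, 4))
# [(1988, 1991), (1992, 1995), (1996, 1999), (2000, 2003), (2004, 2007)]
```

With a width of 4, the slicing reproduces the five 4-year slices described for the co-authorship network. A width of 2 yields the ten 2-year slices used for keywords.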
9. window, then run Visualization > Tree Map (prefuse beta). A window similar to Figure 2 will appear, displaying the Tree Map visualization. Use the search box in the bottom right corner to enter a search term, and matching nodes will be highlighted. Figure 2: Tree Map visualization of the complete nwb directory hierarchy. 4.3 Balloon Graph Visualization: A balloon graph places the focus node in the middle of the canvas and all its children in a circle around it. Children of children are again placed in a circle around their parents, and so on. In the NWB tool, select a tree dataset, e.g., one generated using the Directory Hierarchy Reader, and run Visualization > Balloon Graph (prefuse alpha). A window similar to Figure 3 will appear, displaying the Balloon Graph visualization. Double-click on a node to focus on it and observe the change of the layout. As in all other Prefuse layouts, hold down the left mouse button to pan and the right mouse button to zoom. Figure 3: Balloon Graph visualization of the complete nwb directory hierarchy. 4.4 Radial Tree Visualization: The radial tree layout uses a focus+context (fisheye) technique for visualizing and manipulating very large trees. The focused node is placed in the center of the display, and all other nodes are rendered on appropriate circular levels around that selected node. The further away a node is from the center, the smaller it is rendered. This way, potentially very large rooted or unrooted t...
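Balloon and radial tree layouts share a basic geometry: nodes are placed on concentric circles around a focus node. The sketch below illustrates that geometry under simple assumptions. The tree is a dict mapping each node to its children, and each subtree gets an angular wedge proportional to its number of leaves. It is a static sketch and omits the fisheye scaling and animation of the Prefuse implementations.

```python
import math


def radial_layout(children, root, ring_gap=1.0):
    """Return {node: (x, y)} with the root at the origin.

    `children` maps each node to a list of its children. Nodes at depth d
    sit on a circle of radius d * ring_gap, and every subtree receives an
    angular wedge proportional to its number of leaves.
    """
    leaves = {}

    def count_leaves(node):
        kids = children.get(node, [])
        leaves[node] = 1 if not kids else sum(count_leaves(k) for k in kids)
        return leaves[node]

    count_leaves(root)
    pos = {root: (0.0, 0.0)}

    def place(node, depth, start, end):
        angle = start
        for kid in children.get(node, []):
            span = (end - start) * leaves[kid] / leaves[node]
            mid = angle + span / 2
            radius = (depth + 1) * ring_gap
            pos[kid] = (radius * math.cos(mid), radius * math.sin(mid))
            place(kid, depth + 1, angle, angle + span)
            angle += span

    place(root, 0, 0.0, 2 * math.pi)
    return pos


tree = {"root": ["a", "b"], "a": ["c", "d"]}
pos = radial_layout(tree, "root")
```

Making the focused node the root and recomputing the layout reproduces the "focus in the center" behavior in its simplest form.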
10. Annotation, Fruchterman-Reingold with Annotation, and Small World layouts; GUESS, a tool supporting more customized visualizations and diverse output formats; DrL, for visualizing very large networks (up to 1 million nodes); and LaNet. 5.1 JUNG-based Circular, Kamada-Kawai, Fruchterman-Reingold, and Spring Layouts: Visualizations of the yournwbdirectory/sampledata/networks/florentine.nwb network using the Circular, Kamada-Kawai, Fruchterman-Reingold, and Spring layouts are given in Figure 5. Figure 5: Partial Circular, Kamada-Kawai, Fruchterman-Reingold, and Spring JUNG layouts. 5.2 Prefuse-based Specified, Radial Tree/Graph, Force Directed, Fruchterman-Reingold, and Small World Layouts: The Specified layout requires pre-computed x, y values for each node. Node positions can be computed, e.g., using Visualization > DrL. Visualizations of the yournwbdirectory/sampledata/networks/florentine.nwb network using the Radial Tree/Graph, Force Directed with Annotation, and Fruchterman-Reingold with Annotation layouts are given in Figure 6. Note that algorithms that do not read the nwb format, e.g., Balloon Graph, are grayed out and not selectable. Figure 6: Radial Tree/Graph and Fruchterman-Reingold with Annotation Prefuse layouts. The Fruchterman-Reingold and the Force Directed with Annotation layouts were run using the parameter values shown in Figure 7 (left). The menu to the right of the Force Directed layout lets one increase the DefaultSp...
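A circular layout, and the handoff of precomputed x, y values to a Specified layout, can both be sketched in a few lines. The `node x y` text produced by `to_xy_lines` is an illustrative format only. It is not the NWB or Prefuse input format.

```python
import math


def circular_layout(nodes, radius=1.0):
    """Place nodes evenly on a circle, in the given order."""
    n = len(nodes)
    return {
        node: (radius * math.cos(2 * math.pi * i / n),
               radius * math.sin(2 * math.pi * i / n))
        for i, node in enumerate(nodes)
    }


def to_xy_lines(positions):
    """Format positions as 'node x y' text lines (illustrative format only)."""
    return [f"{node} {x:.3f} {y:.3f}" for node, (x, y) in positions.items()]


pos = circular_layout(["a", "b", "c", "d"])
print(to_xy_lines(pos)[:2])  # ['a 1.000 0.000', 'b 0.000 1.000']
```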
11. Java SE 5 (version 1.5.0) or later to be pre-installed on your local machine. You can check the version of your Java installation from the command line. If Java is not already installed on your computer, download and install Java SE 5 or 6 from http://www.java.com/en/download/index.jsp. To download the NWB Tool, go to http://nwb.slis.indiana.edu/download.html and select your operating system from the pull-down menu (see Figure 1). Figure 1: Select operating system and download NWB Tool. Save the jar file in a new, empty yournwbdirectory directory, then double-click the jar file or run it from the command line with java -jar. After a successful installation, two NWB icons will appear on the Desktop. To run the NWB tool, double-click the Network Workbench icon. To uninstall the NWB tool, i.e., to delete all files in yournwbdirectory, double-click the Uninstall NWB icon. 3 User Interface: The general NWB Tool user interface is shown in Figure 2 (see also Herr II et al., 2007).
12. NWB is able to distinguish these two records, which have unique ISI IDs but are both book reviews by the same reviewer on the same page in the same journal issue. HistCite identifies 901 edges between the 360 papers. The NWB Tool originally identified 5,335 nodes and 9,595 edges, because it extracts not only linkages between papers in the set but also linkages to their references. The reference nodes can be excluded by removing nodes with a globalCitationCount value of -1 (see Scientometrics Section 6.1.1). The resulting network has 341 nodes and 738 edges, or 276 nodes and 738 edges after deleting isolates. In HistCite, this network can be visualized using Tools > Graph Maker. The Graph Maker takes the nodes of the network as input and lays them out chronologically from the top of the screen to the bottom. The size of the nodes reflects either the Local Citation Score (LCS) or the Global Citation Score (GCS), depending on the type of graph selected. The script examples in Scientometrics Section 6.1.1 suggest how to resize the nodes within GUESS to achieve something similar with NWB. In the Graph Maker, the nodes included can be limited either by their rank in the sequence of LCS or GCS or by their LCS/GCS values. In NWB, this corresponds to Extract Top Nodes or Extract Nodes Above or Below Value. To see all nodes, set the limit to a number above the number of nodes and click on the Mak...
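The filtering steps described here can be sketched as plain dictionary operations: drop nodes by an attribute value, then extract the top nodes or the nodes above a threshold. The helper names and the `LCS` attribute are hypothetical and do not correspond to NWB's implementation. The sentinel value that marks references outside the record set is an assumption and is passed in as a parameter.

```python
OUTSIDE_SET = -1  # assumed sentinel for references outside the record set


def drop_by_value(nodes, attr, value):
    """Remove nodes whose attribute `attr` equals `value`."""
    return {n: a for n, a in nodes.items() if a.get(attr) != value}


def drop_dangling_edges(nodes, edges):
    """Keep only edges whose endpoints both survived."""
    return [(u, v) for u, v in edges if u in nodes and v in nodes]


def top_nodes(nodes, attr, k):
    """Return the k node ids with the highest `attr` value."""
    return sorted(nodes, key=lambda n: nodes[n].get(attr, 0), reverse=True)[:k]


def nodes_above(nodes, attr, threshold):
    """Return node ids whose `attr` value is at least `threshold`."""
    return [n for n, a in nodes.items() if a.get(attr, 0) >= threshold]


# Toy citation data; "LCS" is a hypothetical local citation score attribute.
papers = {
    "p1": {"globalCitationCount": 12, "LCS": 3},
    "p2": {"globalCitationCount": 5, "LCS": 1},
    "r1": {"globalCitationCount": OUTSIDE_SET},
}
links = [("p1", "p2"), ("p1", "r1")]
kept = drop_by_value(papers, "globalCitationCount", OUTSIDE_SET)
kept_links = drop_dangling_edges(kept, links)
```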
13. links in alternative paths, i.e., the length of path up to which the triangle inequality must be maintained. A network of N nodes can have a maximum path length of q = N - 1. With q = N - 1, the triangle inequality is maintained throughout the entire network. For details on the method and its applications, see Schvaneveldt (1990). In the NWB Tool, the algorithm can be found under Preprocessing > Pathfinder Network Scaling. Shown below is a visualization of the FourNetSciResearchers dataset. 7.2.2 Main Path Analysis is an alternative method from Hummon and Doreian (Hummon & Doreian, 1989) that only works on directed acyclic graphs, such as ideal citation networks. Real-world citation networks can have loops, frequently due to citations into the future. There are three variants, called node pair projection count (NPPC), search path link count (SPLC), and search path node pair (SPNP), and all of them produce similar results. The SPLC method is frequently used because of its lower computational requirements, and main path analysis using that method will be available in the NWB Tool shortly. All three methods count how many times links are traversed, based on various models of information flow through the network. The SPLC method counts how many times every edge is traversed over all possible search paths through the network from every origin node. A main path is found by traversing from a source (a node with no in-links) along each edge with the highes...
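The sketch below computes SPLC weights and then follows a simple greedy main path. It assumes the citation network is given as a list of directed edges that already form a DAG. An edge (u, v) is weighted by the number of search paths that start at u or any of its ancestors and continue through (u, v) to a sink. The greedy traversal, which always follows the heaviest outgoing edge, is one simple way to extract a main path. It is not necessarily the exact procedure used by Hummon and Doreian or by NWB.

```python
from collections import defaultdict, deque


def splc_weights(edges):
    """Search path link count (SPLC) for every edge of a DAG.

    Edge (u, v) gets paths_into(u) * paths_out(v): the number of paths
    that start at u or one of its ancestors, use (u, v), and end at a sink.
    Returns (weights, successor lists).
    """
    succ, pred = defaultdict(list), defaultdict(list)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))

    # Kahn's algorithm gives a topological order and detects cycles.
    indeg = {n: len(pred[n]) for n in nodes}
    queue = deque(sorted(n for n in nodes if indeg[n] == 0))
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    if len(order) != len(nodes):
        raise ValueError("main path analysis needs an acyclic network")

    into = {}  # paths ending at n, including the one-node path starting at n
    for n in order:
        into[n] = 1 + sum(into[p] for p in pred[n])
    out = {}  # paths from n to any sink
    for n in reversed(order):
        out[n] = 1 if not succ[n] else sum(out[s] for s in succ[n])

    return {(u, v): into[u] * out[v] for u, v in edges}, succ


def main_path(edges, start):
    """Greedily follow the highest-SPLC outgoing edge from `start`."""
    weights, succ = splc_weights(edges)
    path = [start]
    while succ[path[-1]]:
        node = path[-1]
        path.append(max(succ[node], key=lambda s: weights[(node, s)]))
    return path


citations = [("A", "C"), ("B", "C"), ("C", "D"), ("C", "E"), ("D", "F"), ("D", "G")]
weights, _ = splc_weights(citations)
print(weights[("C", "D")])         # 6
print(main_path(citations, "A"))   # ['A', 'C', 'D', 'F']
```

When two outgoing edges tie, `max` keeps the first one encountered. Where that choice matters, a real analysis would need an explicit tie-breaking rule.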
14. parent dataset. The Scheduler lets users keep track of the progress of running algorithms. If an algorithm implements the ProgressTrackable interface, users can pause, resume, or cancel that algorithm through the Scheduler while it is still running. 4 Workflow Design: Many, if not all, network studies require the application of more than one algorithm. A sequence of algorithm calls is also called a workflow. A common workflow comprises data loading, sampling, modeling, then preprocessing and analysis, and finally visualization. Figures 3 and 4 show the menu items available in the v1.0 beta release. There are also domain-specific plug-in sets, e.g., for scientometrics (see Domain Specific: Scientometrics, Figure 1). Algorithms that are not applicable to the currently loaded and selected dataset are grayed out. Figure 3: File, Preprocessing, Modeling, and Visualization menus. The Analysis menu has four submenus, as shown in Figure 4. Weighted network analysis algorithms will be added shortly, leading to two more submenus.
15. the network after coloring and resizing the nodes based on their betweenness. Read https://nwb.slis.indiana.edu/community/?n=VisualizeData.GUESS for more information on how to use the interpreter. Figure 10: GUESS Interpreter. 5.4 DrL (Large Networks Layout): See https://nwb.slis.indiana.edu/community/?n=VisualizeData.DrL. 5.5 LaNet: See https://nwb.slis.indiana.edu/community/?n=VisualizeData.K-CoreDecomposition. 6 Saving Visualizations for Publication: Use File > Export Image to export the current view or the complete network in diverse file formats, such as jpg, png, raw, pdf, gif, etc. Domain Specific: Information Science. 1 Read and Display a File Hierarchy: It is interesting to see the structure of a file directory hierarchy. It can be thought of as a tree with a specific folder at its root and its subfolders and files as its branches. In the NWB tool, use File > Read Directory Hierarchy to parse a file directory hierarchy recursively. Change the default input by checking Recurse the entire tree and unchecking Read directories only (skips files), as shown in Figure 1. After you click the OK button, a Directory Tree - Prefuse (Beta) Graph shows up in the Data Manager window. Select this dataset before selecting any other analysis or visualization (see General Tutorial, Section 4 for different tree visualizations).
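Reading a directory hierarchy recursively into a tree can be sketched with the Python standard library. `read_directory_tree` is a hypothetical helper. Its `directories_only` flag mimics the Read directories only option described above, and its nested-dict output stands in for NWB's tree dataset.

```python
import os
import tempfile


def read_directory_tree(root, directories_only=False):
    """Return a nested dict describing the hierarchy below `root`.

    Directories map to dicts of their entries and files map to None.
    With directories_only=True, files are skipped.
    """
    tree = {}
    with os.scandir(root) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if entry.is_dir(follow_symlinks=False):
                tree[entry.name] = read_directory_tree(entry.path, directories_only)
            elif not directories_only:
                tree[entry.name] = None
    return tree


# Usage: build a tiny hierarchy in a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    open(os.path.join(tmp, "sub", "a.txt"), "w").close()
    open(os.path.join(tmp, "b.txt"), "w").close()
    full_tree = read_directory_tree(tmp)
    dirs_only = read_directory_tree(tmp, directories_only=True)

print(full_tree)  # {'b.txt': None, 'sub': {'a.txt': None}}
print(dirs_only)  # {'sub': {}}
```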
16. the paper citation network (see Section Paper-Paper Citation Linkages) and run Scientometrics > Extract Document Co-Citation Network. The co-citation network will become available in the Data Manager. It has 5,335 nodes, 213 of which are isolates, and 193,039 edges. Isolates can be removed by running Preprocessing > Delete Isolates. The resulting network has 5,122 nodes and 193,039 edges and is too dense for display in GUESS. Edges with low weights can be eliminated by running Preprocessing > Extract Edges Above or Below Value with the following parameter values. Here, only edges with a local co-citation count of five or higher are kept. The giant component in the resulting network has 265 nodes and 1,607 edges. All other components have only one or two nodes. The giant component can be visualized in GUESS (see Figure 7, right), using the same size coding, color coding, and labeling as for the bibliographic coupling network (see the explanation above). Simply run GUESS: File > Run Script and select yournwbdirectory/sampledata/isi/reference-co-occurence-nw.py. Figure 7: Undirected, weighted bibliographic coupling network (left) and undirected, weighted co-citation network (right) of the FourNetSciResearchers dataset with isolate nodes removed. 6.3.2 Author Co-Citation Network (ACA): Authors of works that are repeatedly juxtaposed in references-cited lists are assumed to be related. Clusters in ACA networks often re...
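Document co-citation counting and the edge threshold can be sketched together. Every pair of references that appear together in one paper's reference list gets its co-citation count increased by one. Setting `min_count=5` mirrors keeping only edges with a local co-citation count of five or higher. The input format and the function name are assumptions, not the NWB extractor.

```python
from collections import Counter
from itertools import combinations


def cocitation(reference_lists, min_count=1):
    """Co-citation counts for pairs of cited works.

    `reference_lists` holds one list of cited-work ids per citing paper.
    Returns {(a, b): count} for pairs co-cited at least `min_count` times.
    """
    counts = Counter()
    for refs in reference_lists:
        # Each unordered pair of distinct references counts once per paper.
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}


reference_lists = [["x", "y", "z"], ["x", "y"], ["y", "z"]]
print(cocitation(reference_lists, min_count=2))  # {('x', 'y'): 2, ('y', 'z'): 2}
```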
17. then backbone identification and community detection methods, discussed below, should be applied to identify major structures. 7.2 Backbone Identification. 7.2.1 Pathfinder Network Scaling: Pathfinder Network Scaling (PFNet) is a structural modeling technique originally developed by Schvaneveldt, Durso, and Dearholt for the analysis of proximity data in psychology (Schvaneveldt, Durso, & Dearholt, 1985). According to Chen (Chen, 1999), Pathfinder Network Scaling provides a fuller representation of the salient semantic structures than minimal spanning trees, and also a more accurate representation of local structures than multidimensional scaling techniques. The algorithm takes a proximity matrix as input and defines a network representation of the items that preserves only the most important links. It relies on the so-called triangle inequality to eliminate redundant or counter-intuitive links. Given two links or paths in a network that connect the same two nodes, the link or path with the greater weight, defined via the Minkowski metric, is preserved. The assumption is that the link or path with the greater weight better captures the interrelationship between the two nodes. The alternative link or path with less weight is considered redundant or even counter-intuitive and is pruned from the network. Two parameters, r and q, influence the topology of a pathfinder network. The r parameter influences the weight of a path based on the Minkowski metric. The q parameter defines the number of
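The following is a minimal sketch of PFNET(r, q = n - 1). It assumes the proximity data are given as distances (dissimilarities), which is the usual convention for Pathfinder input. In that convention, the stronger link described above is the one with the shorter Minkowski path length. The sketch runs a Floyd-Warshall pass in which path segments are combined with the Minkowski r-metric instead of plain addition. It then keeps a direct link only if no alternative path is shorter. Restricting paths to at most q links (q < n - 1) is not covered.

```python
import math


def pathfinder(dist, r=math.inf):
    """Links kept by PFNET(r, q = n - 1) for a symmetric distance matrix.

    dist[i][j] is the direct link length between items i and j
    (math.inf if there is no link, 0 on the diagonal). r >= 1.
    """
    n = len(dist)

    def combine(a, b):
        # Minkowski r-metric length of two consecutive path segments.
        if math.isinf(a) or math.isinf(b):
            return math.inf
        if math.isinf(r):
            return max(a, b)
        return (a ** r + b ** r) ** (1.0 / r)

    # Floyd-Warshall, with `combine` taking the place of "+".
    best = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = combine(best[i][k], best[k][j])
                if via_k < best[i][j]:
                    best[i][j] = via_k

    # A link survives only if no alternative path is strictly shorter.
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if not math.isinf(dist[i][j]) and dist[i][j] <= best[i][j] + 1e-12]


D = [[0, 1, 3],
     [1, 0, 1],
     [3, 1, 0]]
print(pathfinder(D))  # [(0, 1), (1, 2)]: the long 0-2 link is pruned
```

With r = infinity, a path is only as long as its longest link. This is the setting that prunes most aggressively, because any detour made of shorter links removes a longer direct link.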
18. which is visualized by NWB. Figure 12: The network of bibliographic coupling among authors with Leydesdorff's software. Figure 13: The network of bibliographic coupling among journals with Leydesdorff's software. 9.5.4 Processing of Scopus Data: Leydesdorff's software cannot process data from Scopus directly. Instead, it can convert the data to the ISI format for analysis, or use Scopus.exe to organize Scopus output into files for relational database management (MS Access, dBase). A distinguishing characteristic of NWB, however, is that all kinds of data formats can be loaded into it, so the trouble of converting data formats can be avoided. Figure 14 lays out the co-authorship network using the Scopus data format directly, while Figure 15 shows the same network using the ISI data format produced by Scop2ISI.exe. The network structures produced by Leydesdorff's software and NWB are similar, and the main nodes appear in both networks. The differences can be seen in Table 5.
   Table 5: Comparison of co-authorship network structure between two data formats
   Attributes                                | Scopus   | ISI
   Direction in the graph                    | Directed | Undirected
   Nodes                                     | 405      | 395
   Edges                                     | 1581     | 1575
   Weakly connected components               | 44       | 36
   Nodes in the largest connected component  | 121      | (illegible)
19. Marshakova, I. V. (1973). Co-citation in scientific literature: A new measure of the relationship between publications. Scientific and Technical Information Serial of VINITI, 6, 3-8.
   Meho, Lokman I., & Yang, Kiduk. (2007). Impact of data sources on citation counts and rankings of LIS faculty: Web of Science versus Scopus and Google Scholar. Journal of the American Society for Information Science and Technology, 58(13), 2105-2125. http://www.interscience.wiley.com/cgi-bin/fulltext/116311060/PDFSTART (accessed on 9/23/08).
   Monge, P. R., & Contractor, N. (2003). Theories of Communication Networks. New York: Oxford University Press.
   Narin, F., & Moll, J. K. (1977). Bibliometrics. Annual Review of Information Science and Technology, 12, 35-38.
   Nicolaisen, Jeppe. (2007). Citation analysis. In Blaise Cronin (Ed.), Annual Review of Information Science and Technology (Vol. 41, pp. 609-641). Medford, NJ: Information Today, Inc.
   Nisonger, T. E. (2004). Citation autobiography: An investigation of ISI database coverage in determining author citedness. College & Research Libraries, 65(2), 152-163.
   O'Madadhain, Joshua, Fisher, Danyel, & Nelson, Tom. (2008). JUNG: Java Universal Network/Graph Framework. University of California, Irvine. http://jung.sourceforge.net/
   OSGi Alliance. (2008). OSGi Alliance. http://www.osgi.org/Main/HomePage (accessed on 7/15/08).
   Palla, Gergely, Derényi, Imre, Farkas, Illés, & Vicsek, Tamás. (2005). Uncovering the overlapping community structure of complex networks in ...
20. The superset of the four data files is called FourNetSciResearchers in the remainder of this tutorial. Table 2: Names, ages, number of citations for the highest cited paper, h-index, and number of papers and citations over time, as rendered in the Web of Science, for four major network science researchers (Eugene Garfield, Stanley Wasserman, Alessandro Vespignani, and Albert-László Barabási). The table reveals that a higher age, i.e., more time for publishing, typically results in more papers. It also reveals the enormous differences in citation dynamics between physics and social sciences such as scientometrics or sociology: Vespignani and Barabási are both physicists, and their papers acquire citations very rapidly. Note that neither books nor conference papers are captured in this dataset. 2.1.2 Scopus Data: The NWB Tool reads publication data from Scopus; documentation will be provided shortly. 2.1.3 Google Scholar: Google Scholar data can be acquired using Publish or Perish (Harzing, 2008), which can be freely downloaded from http://www.harzing.com/pop.ht...
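The h-index reported in Table 2 is the largest number h such that h of an author's papers have at least h citations each. The definition translates directly into a short sketch:

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


print(h_index([10, 8, 5, 4, 3]))  # 4
```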
21. Bursts for each word are shown as horizontal red bars across the time dimension. Bursts with a strength above 10 are colored a darker red. All bursts are shown for authors and keywords, but for cited references and abstract terms only the words in the fifteen most powerful bursts are shown. Within each type of word, words are sorted by the start of their first burst and the end of their last burst. Stop words are removed from the abstract-term bursts. This chart was made in MS Excel by following the steps enumerated to create Figure 1, with some minor modifications for display. Figure 2: Visualization of bursts for author names, ISI keywords, and cited references. It is interesting to note that many of the strongest cited-reference bursts are the most recent. This confirms that many of the foundational techniques of network science hav...
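The ordering and coloring rules for the burst chart can be sketched as follows. Each burst is assumed to be a dict with `word`, `strength`, `start`, and `end` keys. This is a hypothetical layout, not the exact columns of NWB's burst detection table. Burst detection itself is not reproduced here.

```python
def chart_rows(bursts, strong=10.0, top_n=None):
    """Order and color bursts as in the chart description.

    Optionally keep only words that occur among the top_n strongest
    bursts; then sort words by the start of their first burst and the end
    of their last burst. Bursts stronger than `strong` get a darker color.
    """
    if top_n is not None:
        strongest = sorted(bursts, key=lambda b: b["strength"], reverse=True)[:top_n]
        keep = {b["word"] for b in strongest}
        bursts = [b for b in bursts if b["word"] in keep]

    first_start, last_end = {}, {}
    for b in bursts:
        w = b["word"]
        first_start[w] = min(first_start.get(w, b["start"]), b["start"])
        last_end[w] = max(last_end.get(w, b["end"]), b["end"])

    def order(b):
        w = b["word"]
        return (first_start[w], last_end[w], w, b["start"])

    return [(b["word"], b["start"], b["end"],
             "dark red" if b["strength"] > strong else "red")
            for b in sorted(bursts, key=order)]


bursts = [
    {"word": "network", "strength": 12, "start": 1999, "end": 2003},
    {"word": "fractal", "strength": 4, "start": 1992, "end": 1995},
    {"word": "network", "strength": 3, "start": 2005, "end": 2007},
]
rows = chart_rows(bursts)
```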
22. 2008, the NWB tool supports loading the following input file formats: GraphML (*.xml or *.graphml), XGMML (*.xml), Pajek NET (*.net), Pajek Matrix (*.mat), NWB (*.nwb), TreeML (*.xml), Edge list (*.edge), CSV (*.csv), and ISI (*.isi). It supports the following network file output formats: GraphML (*.xml or *.graphml), Pajek MAT (*.mat), Pajek NET (*.net), NWB (*.nwb), and XGMML (*.xml). These formats are documented at https://nwb.slis.indiana.edu/community/?n=Formats.HomePage. Code Library: The NWB Tool is an empty shell filled with plug-ins. Some plug-ins run the core architecture (see the OSGi and CIShell plug-ins). Others convert loaded data, behind the scenes, into in-memory objects or into formats that the different algorithms can read. Most interesting for users are the algorithm plug-ins, which can be divided into algorithms for preprocessing, analysis, modeling, and visualization. Last but not least, there are other supporting libraries and entire tools that are plugged and played. 1 OSGi Plugins: org.eclipse...
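Several of the supported formats share an extension, so the extension alone does not always identify the format. The lookup below simply restates the input formats listed above. `candidate_formats` is a hypothetical helper, not part of the NWB Tool.

```python
import os

# Input formats listed above, keyed by extension. ".xml" is ambiguous:
# GraphML, XGMML, and TreeML files may all use it.
INPUT_FORMATS = {
    ".graphml": ["GraphML"],
    ".xml": ["GraphML", "XGMML", "TreeML"],
    ".net": ["Pajek NET"],
    ".mat": ["Pajek Matrix"],
    ".nwb": ["NWB"],
    ".edge": ["Edge list"],
    ".csv": ["CSV"],
    ".isi": ["ISI"],
}


def candidate_formats(filename):
    """Return the input formats a file might be in, judged by extension."""
    ext = os.path.splitext(filename)[1].lower()
    return list(INPUT_FORMATS.get(ext, []))


print(candidate_formats("florentine.nwb"))  # ['NWB']
```

For an ambiguous `.xml` file, a loader would have to inspect the file contents to choose among the candidates.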
23. See https://nwb.slis.indiana.edu/community/?n=AnalyzeData.NetworkAnalysisToolkit. The output includes: This graph claims to be directed. ... Weakly connected components ... 0 isolates ... The graph is not strongly connected. Density (disregarding weights): 0.06803 ... Isolates or self-loops in your data could pose problems for several of the algorithms that are currently implemented. NWB offers two options for dealing with these issues: Preprocessing > Remove Self Loops and Preprocessing > Delete Isolates. These options create a new network file in the Data Manager window. You can then select this network and save it under a new name. 3 Network Analysis: NWB implements a few basic algorithms that you can use with an unweighted, directed network like PSYCHCONSULT. Make sure your network is highlighted in the Data Manager window.
   - In-degree and out-degree centrality can be calculated with Analysis > Unweighted & Directed > Node Indegree. Note that this actually creates two files: a text file with the sequence of in-degree centralities of the nodes, and a new network file with the in-degree appended as an integer attribute. Choose this network file and apply the Node Outdegree centrality algorithm from Analysis > Unweighted & Directed > Node Outdegree to create a new network with both measures as attributes.
   - Reciprocity can be c...
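In-degree, out-degree, and reciprocity can be computed for a plain directed edge list. The description of reciprocity above is cut off, so the sketch assumes one common definition: the share of directed edges (u, v) whose reverse edge (v, u) is also present. Self-loops are removed first, mirroring Preprocessing > Remove Self Loops, because a self-loop would otherwise count as its own reciprocal edge. The function names are hypothetical.

```python
from collections import Counter


def remove_self_loops(edges):
    """Drop edges that start and end at the same node."""
    return [(u, v) for u, v in edges if u != v]


def degrees(nodes, edges):
    """Map each node to its (in-degree, out-degree) pair."""
    indeg = Counter(v for _, v in edges)
    outdeg = Counter(u for u, _ in edges)
    return {n: (indeg[n], outdeg[n]) for n in nodes}


def reciprocity(edges):
    """Share of distinct directed edges (u, v) whose reverse (v, u) exists."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    return sum((v, u) in edge_set for u, v in edge_set) / len(edge_set)


edges = remove_self_loops([("a", "b"), ("b", "a"), ("a", "c"), ("c", "c")])
print(degrees(["a", "b", "c"], edges))  # {'a': (1, 2), 'b': (1, 1), 'c': (1, 0)}
print(reciprocity(edges))
```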
24. 9.3.4 Burst Detection: Burst detection using the NWB Tool was discussed in Section 3.2 Burst Detection. Figure 5: Network of co-authors' institutions for FourNetSciResearchers with burst phrases in CiteSpace II. 9.3.5 Comparison. Supported Data Formats: The NWB Tool supports GraphML (*.xml or *.graphml), XGMML (*.xml), Pajek NET (*.net), Pajek Matrix (*.mat), NWB (*.nwb), TreeML (*.xml), Edge list (*.edge), CSV (*.csv), and ISI (*.isi). CiteSpace II can load the ISI export format. It also offers converters from SDSS, NSF, Scopus, and Derwent to WOS. See also Table 1. Table 2: Network Workbench Tool vs. CiteSpace II visualizations. The functions compared are:
   - data extraction from ISI, Scopus, Google Scholar, Medline, NSF, and CiteSeer
   - paper-paper citation, author-paper (consumed/produced), document co-citation, author co-citation, journal co-citation, co-authorship, co-authors' institutions, co-authors' countries, word co-occurrence, subject category co-occurrence, cited reference co-occurrence, and bibliographic coupling networks
   - cluster view, time zone view, and time slicing
   - Pathfinder network scaling
   - detecting and merging duplicate nodes
   - betweenness centrality
   - extracting K-co...
25. ...component. 9.5.2 Word Co-occurrence Network. Leydesdorff's software can perform word co-occurrence analysis of titles and full text, while NWB can analyze titles, abstracts, keywords, and so on. Because the number of words that Leydesdorff's software can analyze is limited, we deleted stop words such as "a", "an", "and", "with", "2001", and "by" using stopword.exe, which Dr. Leydesdorff provides. The word co-occurrence network produced by NWB has 908 nodes and 10,694 edges, so we extracted the top 1,000 edges by weight. However, NWB does not yet have a function for deleting stop words (coming soon). As a result, the main nodes in Figure 23 are stop words, which affects the result of the analysis. Table 4 shows the differences between the two word co-occurrence networks. Table 4: Comparison of word co-occurrence network structure between the two software packages, Leydesdorff's software vs. NWB. Rows: directed/undirected; number of nodes (755, 308); number of edges (1,000); weakly connected components (1, 3); nodes in the largest connected component (295); stop-word deletion (yes). Figure 11: Word co-occurrence network with NWB. 9.5.3 Bibliographic Coupling Network. NWB can extract the network of bibliographic coupling among references. However, Loet's software can perform bibliographic coupling analysis among authors and journals. Figures 12 and 13 show the networks of bibliographic coupling among authors and journals.
26. Cytoscape is also adopting an architecture based on OSGi, though it will still have a specified internal data model and will not use CIShell in the core. Moving to OSGi will make it possible for the tools to share many algorithms, including adding Cytoscape's visualization capabilities to Network Workbench. Many of these tools are also libraries. Unfortunately, it is often difficult to use multiple libraries, or sometimes any outside library, even in tools that allow the integration of outside code. Network Workbench, however, was built to integrate code from multiple libraries, including multiple versions of the same library. For instance, two different versions of Prefuse are currently in use, and many algorithms use JUNG, the Java Universal Network/Graph Framework. We feel that the ability to adopt new and cutting-edge libraries from diverse sources will help create a vibrant ecology of algorithms. Trends are hard to discern for tools that come from such diverse backgrounds. Still, it is clear that the visualization capabilities of scientometrics tools have become more and more sophisticated over time. In many cases, scientometrics tools have also become more user-friendly, reducing the difficulty of common scientometrics tasks and exposing scientometrics functionality to non-experts. Network Workbench embodies both of these trends, providing an environment for algorithms from a variety of sources to seamlessly interact in
27. Figure 4: Analysis menu and submenus (entries include Single Node In-Out Degree Correlations, Average Shortest Path, Shortest Path Distribution, Node Betweenness Centrality, Arc Reciprocity, Connected Components, Weak Component Clustering, Burst Detection, and Annotate K-Coreness). Sample Datasets and Supported Data Formats. 1 Sample Datasets. The yournwbdirectory/sampledata directory provides sample datasets from the biology, network, scientometrics, and social science research domains (see listing below). The listing covers three areas:
• a biology folder, including a DND subfolder with data used in the DND model (e.g., drosophila and SimpleDNDFunction files);
• network datasets such as humanprot, terror in GraphML and XGMML formats, and a Wikipedia science dataset;
• a scientometrics folder with bibtex, csv, endnote, isi, and nsf subfolders.
The sampledata/scientometrics/properties folder holds property files used to extract networks and merge data, e.g., mergeBibtexAuthors.properties, mergeEndnoteAuthors.properties, and mergeIsiAuthors.properties. 2 Data Formats. In September
28. GUESS supports the repositioning of selected nodes. Multiple nodes can be selected by holding down the Shift key and dragging a box around specific nodes. The final network can be saved via GUESS File > Export Image and opened in a graphic design program to add a title and legend. The image below was created using Photoshop, and label sizes were changed as well. Figure 4: Undirected, weighted co-author network for the FourNetSciResearchers dataset. 6.2.2 Word Co-Occurrence Network. The topic similarity of basic and aggregate units of science can be calculated by analyzing the co-occurrence of words in associated texts. Units that share more words are assumed to have higher topic overlap and are connected via linkages and/or placed in closer proximity. Word co-occurrence networks are weighted and undirected. In the NWB Tool, select the table of 361 unique ISI records from the FourNetSciResearchers dataset in the Data Manager. Run Preprocessing > Normalize Text using parameters. The text normalization uses the StandardAnalyzer provided by Lucene (http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/analysis/standard/StandardAnalyzer.html). It performs four steps: it separates text into word tokens, normalizes word tokens to lower case, removes 's from the end of words, and removes dots from acronyms. It also deletes stop words. Soon the Porter
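To make the normalization steps concrete, here is a rough Python approximation. It is not Lucene's StandardAnalyzer. The tokenizing regular expression and the short stop-word list are assumptions chosen for illustration, and real analyzers handle many more cases.

```python
import re

# Illustrative stop-word list (an assumption); Lucene ships its own, longer list.
STOP_WORDS = {"a", "an", "and", "by", "for", "in", "is", "of", "on", "the", "to", "with"}
# An acronym written with dots, e.g. "u.s.a." or "u.s".
ACRONYM = re.compile(r"(?:[a-z]\.)+[a-z]?\.?")

def normalize(text):
    """Rough stand-in for the normalization steps described above."""
    tokens = []
    for raw in re.findall(r"[A-Za-z0-9.']+", text):
        word = raw.lower()                 # lower-case every token
        if word.endswith("'s"):
            word = word[:-2]               # drop a trailing possessive 's
        if ACRONYM.fullmatch(word):
            word = word.replace(".", "")   # remove dots from acronyms
        word = word.strip(".'")            # drop leftover sentence punctuation
        if word and word not in STOP_WORDS:
            tokens.append(word)            # keep only non-stop words
    return tokens
```

For instance, `normalize("The U.S.A. network's structure and DYNAMICS.")` yields `["usa", "network", "structure", "dynamics"]`.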
29. Network Workbench Tool User Manual 1.0.0 beta. Getting Started; General Tutorial; Domain Specific: Information Science Tutorial; Domain Specific: Social Science Tutorial; Domain Specific: Scientometrics Tutorial. Updated 2009-03-05. Programmers: Bonnie (Weixia) Huang, Micah Linnemeier, Timothy Kelley, Russell J. Duhon. Users, Testers & Writers: Katy Börner, Angela Zoss, Hanning Guo, Ann McCranie, Mark Price. Cyberinfrastructure for Network Science Center, School of Library and Information Science, Indiana University, Bloomington, IN. For comments, questions, or suggestions, please post to the nwb-helpdesk@googlegroups.com mailing list. Table of Contents. Getting Started: 1 Introduction, 2 Download and Install, 3 User Interface, 4 Workflow Design. Sample Datasets and Supported Data Formats: 1 Sample Datasets, 2 Data Formats. Code Library: 1 OSGi Plugins, 2 CIShell Plugins, 3 Converter Plugins, 4 Algorithm Plugins, 5 Supporting Libraries, 6 Integrated Tools. General Tutorial: 1 Load, View, and Save Data, 2 Data Conversion Service, 3 Compute Basic Network Statistics, 4 Tree Visualizations, 5 Graph Visualizations, 6 Saving Visualizations for Publication. Domain Specific: Information Science: 1 Read and Display a File Hierarchy, 2 Error and Attack Tolerance of Networks, 3 Studying Peer-to-Peer Networks. Domain Specific: Social Science: 1 Load Data, 2 Basic Network Properties, 3 Network Analysis, 4 Visualization. Doma
30. ...Paper presented at the Conference on Human Factors in Computing Systems, Portland, OR. New York: ACM Press, pp. 421-430. Herr II, Bruce W., Weixia (Bonnie) Huang, Shashikant Penumarthy & Katy Börner (2007). Designing Highly Flexible and Usable Cyberinfrastructures for Convergence. In William S. Bainbridge & Mihail C. Roco (Eds.), Progress in Convergence: Technologies for Human Wellbeing (Vol. 1093, pp. 161-179). Boston, MA: Annals of the New York Academy of Sciences. Huang, Weixia (Bonnie), Bruce Herr, Russell Duhon & Katy Börner (2007). Network Workbench: Using Service-Oriented Architecture and Component-Based Development to Build a Tool for Network Scientists. Paper presented at the International Workshop and Conference on Network Science. Hummon, N. P. & P. Doreian (1989). Connectivity in a Citation Network: The Development of DNA Theory. Social Networks, 11, 39-63. Ihaka, Ross & Robert Gentleman (1996). R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5(3), 299-314. http://www.amstat.org/publications/jcgs/ (accessed on 7/17/08). Jaro, M. A. (1989). Advances in record linking methodology as applied to the 1985 census of Tampa, Florida. Journal of the American Statistical Society, 64, 1183-1210. Jaro, M. A. (1995). Probabilistic linkage of large public health data files. Statistics in Med
31. Stemmer Algorithm (http://tartarus.org/martin/PorterStemmer/) will become available as well. The result is a derived table in which the text in the abstract column is normalized. Select this table and run Scientometrics > Extract Word Co-Occurrence Network using parameters (Text Delimiter; Aggregate Function File: none). The outcome is a network in which nodes represent words and edges denote their joint appearance in a paper. Word co-occurrence networks are rather large and dense. Running Analysis > Network Analysis Toolkit (NAT) reveals that the network has 2,888 word nodes and 366,009 co-occurrence edges. There are 235 isolate nodes, which can be removed by running Preprocessing > Delete Isolates. Note that when isolates are removed, papers without abstracts are removed along with their keywords. The result is one giant component with 2,653 nodes and 366,009 edges. To visualize this rather large network, run Visualization > DrL (VxOrd) with default values. To keep only the strongest edges, run Preprocessing > Extract Top Edges using parameters and leave the others at their default values. Once edges have been removed, the network can be visualized by running Visualization > GUESS. In GUESS, size and color the nodes with the resizeLinear and colorize commands. Then set the background color to white to re-create the visualization. The result should look something like the one in Fig
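The idea behind Extract Word Co-Occurrence Network and Extract Top Edges can be sketched in a few lines of Python. This is an assumption-laden illustration, not the NWB algorithms. Each paper is a list of already-normalized words, every distinct pair of words in one paper adds 1 to the weight of their edge, and ties among equally heavy edges are broken alphabetically so the result is deterministic.

```python
from collections import Counter
from itertools import combinations

def word_cooccurrence(docs):
    """docs: one list of normalized words per paper. Returns {(w1, w2): weight}."""
    weights = Counter()
    for tokens in docs:
        # Each distinct word pair appearing together in a paper adds 1 to its edge.
        for a, b in combinations(sorted(set(tokens)), 2):
            weights[(a, b)] += 1
    return weights

def top_edges(weights, k):
    """Keep the k heaviest edges; ties are broken alphabetically."""
    return sorted(weights.items(), key=lambda item: (-item[1], item[0]))[:k]
```

Words that never share a paper with another word get no edge at all. Such words are the isolates that Preprocessing > Delete Isolates removes.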
32. To save your visualization, choose File > Export Image. Domain Specific: Scientometrics. 1 Introduction. 1.1 Available Datasets and Algorithms. Scientometrics-specific sample datasets can be found in yournwbdirectory/sampledata/scientometrics. Relevant algorithm plugins are stored in the general plugins directory (see section Code Library). Each algorithm has the package name that best fits its function, e.g., preprocessing, analysis, or visualization. In the NWB Tool, the scientometrics-specific algorithms have been grouped under a special Scientometrics menu item to ease usage (see Figure 1). Figure 1: Scientometrics menu in the Network Workbench Tool. 1.2 General Workflow. The general workflows of scientometric studies, and specifically the mapping of science, were detailed in Börner, Chen & Boyack (2003). The general workflow of a scientometric study is given in Table 1. Major steps are (1) data extraction, (2) defining the unit of analysis, (3) selection of measures, (4) calculation of similarity between units, (5) or
33. ...trackableAlgorithm_1.0.0.jar. 3 Converter Plugins: edu.iu.nwb.converter.edgelist_1.0.0.jar, edu.iu.nwb.converter.jungprefuse_1.0.0.jar, edu.iu.nwb.converter.jungprefusebeta_1.0.0.jar, edu.iu.nwb.converter.nwbgraphml_1.0.0.jar, edu.iu.nwb.converter.nwbpajeknet_1.0.0.jar, edu.iu.nwb.converter.pajekmat_1.0.0.jar, edu.iu.nwb.converter.pajekmatpajeknet_1.0.0.jar, edu.iu.nwb.converter.pajeknet_0.7.0.jar, edu.iu.nwb.converter.prefusebibtex_0.0.1.jar, edu.iu.nwb.converter.prefusecsv_0.7.0.jar, edu.iu.nwb.converter.prefusegraphml_0.7.0.jar, edu.iu.nwb.converter.prefuseisi_0.7.0.jar, edu.iu.nwb.converter.prefusensf_0.0.1.jar, edu.iu.nwb.converter.prefuserefer_0.0.1.jar, edu.iu.nwb.converter.prefusescopus_0.0.1.jar, edu.iu.nwb.converter.prefuseTreeBetaAlpha_1.0.0.jar, edu.iu.nwb.converter.prefusetreeml_1.0.0.jar, edu.iu.nwb.converter.prefusexgmml_3.0.0.jar, edu.iu.nwb.converter.tablegraph_1.0.0.jar, edu.iu.nwb.converter.treegraph_1.0.0.jar. 4 Algorithm Plugins. 4.1 Preprocessing: edu.iu.nwb.composite.extractauthorpapernetwork_0.0.1.jar, edu.iu.nwb.composite.extractcowordfromtable_1.0.0.jar, edu.iu.nwb.composite.extractpapercitationnetwork_0.0.1.jar, edu.iu.nwb.composite.isiloadandclean_0.0.1.jar, edu.iu.nwb.preprocessing.bibcouplingsimilarity_0.9.0.jar, edu.iu.nwb.preprocessing.cocitationsimilarity_1.0.0.jar, edu.iu.nwb.preprocessing.csv_1.0.0.jar, edu.iu.nwb.preprocessing.duplicatenodedetector_1.0.0.jar, edu.iu.nwb.preprocessing.extractnodesandedges_0.0.1.jar, edu.iu.nwb.preprocessing.prefuse.beta.directoryhierarchyreader_1.0.0.jar, edu.iu.nwb.preprocessing.tablefilter_1.0.0
34. a user-friendly interface, as well as providing significant visualization functionality through the integrated GUESS tool. 9.2 HistCite by Eugene Garfield (compiled by Angela Zoss). HistCite was developed by Eugene Garfield and his team to identify the key literature in a research field. As stated on the Web site, HistCite analyzes ISI data retrieved via a keyword-based search or cited-author search. It identifies important papers, the most prolific and most cited authors and journals, other relevant papers, and keywords that can be used to expand the collection. It can also be used to analyze the publication productivity and citation rates of individuals, institutions, and countries. Analyzing the result of an author search yields several outputs: highly cited articles, important co-author relationships, a timeline of the author's publications, and historiographs showing the key papers and timeline of a research field. A trial version of the tool is available at http://www.histcite.com. An interactive version of the FourNetSciResearchers.isi analysis result is at http://ella.slis.indiana.edu/~katy/outgoing/combo. Below, we compare the paper-paper citation networks created by the NWB Tool and HistCite for the FourNetSciResearchers.isi dataset. HistCite identifies 360 nodes in this network, while NWB identifies 361 unique records. The discrepancy results from two records that have identical "Cite Me As" values: ANDERSON CJ 1993 J MATH PSYCHOL V37
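The 361-versus-360 discrepancy comes down to records that share a citation key. A minimal Python sketch of that check follows. The field name `cite_me_as` is a hypothetical stand-in for the ISI-derived "Cite Me As" string. Matching here ignores case and surrounding spaces, which may be looser or stricter than either tool's actual rule.

```python
from collections import defaultdict

def find_duplicate_keys(records, key="cite_me_as"):
    """Return {normalized key: [record positions]} for keys shared by 2+ records.

    'cite_me_as' is a hypothetical field name for the citation string.
    """
    groups = defaultdict(list)
    for position, record in enumerate(records):
        groups[record[key].strip().upper()].append(position)
    return {k: positions for k, positions in groups.items() if len(positions) > 1}
```

If exactly one pair collides, collapsing each group to a single node turns 361 records into 360 nodes.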
35. abeled using GUESS interpreter commands. The commands sort the nodes in descending order of globalcitationcount and then set labelvisible = true for the first 20 nodes (for i in range(0, 20)). Alternatively, run GUESS File > Run Script and select yournwbdirectory/sampledata/scientometrics/isi/paper-citation-nw.py. Figure 3: Giant components of the paper citation network. Compare the result with Figure 2 (right). Note that this network layout algorithm, like most others, is non-deterministic: different runs lead to different layouts (observe the position of the highlighted node in both layouts). However, all layouts aim to place connected nodes close together while avoiding overlaps of unconnected or sparsely connected subnetworks. 6.1.2 Author-Paper (Consumed/Produced) Network. There are active and passive units of science. Active units, e.g., authors, produce and consume passive units, e.g., papers, patents, datasets, and software. The resulting networks have multiple types of nodes, e.g., authors and papers. Directed edges indicate the flow of resources from sources to sinks, e.g., from an author to a written (produced) paper to the author who reads (consumes) the paper. Presently, NWB cannot derive this network type from ISI data. It can, however, simulate such a network using TARL (for more specific information, see the section Modeling Scholarly Networks). 6.2 Co-Occurrence Linkages. 6.2.1 Author Co-O
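Outside GUESS, the "label the 20 most cited papers" step can be sketched in plain Python. The `Node` class below is a hypothetical stand-in for GUESS nodes. Only the attribute names `globalcitationcount` and `labelvisible` come from the text.

```python
from dataclasses import dataclass

@dataclass
class Node:
    # Hypothetical stand-in for a GUESS node; only the fields used below.
    name: str
    globalcitationcount: int
    labelvisible: bool = False

def label_top(nodes, k=20):
    """Make labels visible for the k nodes with the highest citation count."""
    ranked = sorted(nodes, key=lambda n: n.globalcitationcount, reverse=True)
    for node in ranked[:k]:
        node.labelvisible = True
    return ranked[:k]
```

Sorting in descending order and slicing the first k items does the same job as sorting ascending and reversing, which is how such interpreter snippets are often written.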
36. alculated with Analysis > Unweighted & Directed > Dyad Reciprocity. This gives a network-level reciprocity measure: in this network, 17.5 percent of dyads are reciprocal. 4 Visualization. Several visualization options are under development in the NWB Tool. To replicate the one shown in this poster presentation, you will need to use the GUESS package, an implementation of the open-source program developed by Eytan Adar. To learn more about this package, visit http://graphexploration.cond.org/index.html and be sure to look at the manual and wiki site. Highlight the original PSYCHCONSULT file you loaded into NWB (it should be at the top of the Data Manager), then choose Visualization > GUESS. For this relatively small network, loading takes a few seconds. A second window will appear with the network. In this new window, choose Layout > Kamada-Kawai. You may repeat this layout multiple times: the basic shape and layout of the network will remain the same, but the orientation will change. You can also drag and enlarge the window that contains the network. To see specific information about nodes or edges, choose Display > Information Window. As you pass your mouse over the nodes and edges, information about each one will be displayed. To zoom into the network, right-click on the black space around the network and drag to the right. Drag left in or
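The network-level reciprocity figure above can be illustrated with a short Python sketch. It assumes that dyad reciprocity means the share of connected node pairs whose ties run in both directions. NWB's exact definition may differ; arc reciprocity, for example, counts reciprocated arcs instead of dyads.

```python
def dyad_reciprocity(edges):
    """Share of connected dyads whose two nodes point at each other.

    edges: (source, target) pairs; node ids must be sortable. Self loops are ignored.
    """
    arcs = {(s, t) for s, t in edges if s != t}
    dyads = {tuple(sorted(arc)) for arc in arcs}   # unordered connected pairs
    mutual = sum(1 for a, b in dyads if (a, b) in arcs and (b, a) in arcs)
    return mutual / len(dyads) if dyads else 0.0
```

Take arcs 1→2, 2→1, 1→3, and 3→4. These form three connected dyads, of which only {1, 2} is mutual, so the result is 1/3.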
37. all, Vaughan & Björneborn, 2005), communication theory (Monge & Contractor, 2003), sociology of science (Lenoir, 2002), and several other disciplines. Authors, institutions, and countries, as well as words, papers, journals, patents, funding, etc., are represented as nodes, and their complex interrelations as edges. For example, author and paper nodes exist in a delicate ecology of evolving networks. Given a set of papers, diverse networks can be extracted. Typically, three types of linkages are distinguished: direct linkages (e.g., paper citation linkages), co-occurrences (e.g., of words, authors, or references), and co-citations (e.g., of authors or papers). Linkages may be among units of the same type (e.g., co-authorship linkages) or between units of different types (e.g., authors produce papers). Units of the same type can be interlinked via different link types. For example, papers can be linked based on co-word, direct citation, co-citation, or bibliographic coupling analysis. Linkages may be directed and/or weighted. Nodes and their linkages can be represented as an adjacency matrix or an edge list, and visually as a structure plot or graph. Each non-symmetrical occurrence matrix (e.g., paper citations) has two associated symmetrical co-occurrence matrices (e.g., a bibliographic coupling and a co-citation matrix). Figure 1 shows a sample dataset of five papers, A through E, published over three years, together with their authors (x, y, z), references (blue references are pa
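The relationship between a citation matrix and its two symmetric companions can be shown with a small plain-Python sketch (no NumPy assumed). If A[i][j] = 1 when paper i cites paper j, then bibliographic coupling is A·Aᵀ and co-citation is Aᵀ·A.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(x, y):
    """Plain-Python matrix product of x (n x k) and y (k x m)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def coupling_and_cocitation(a):
    """a[i][j] == 1 if paper i cites paper j.

    Returns (coupling, cocitation), where
      coupling[i][j]   = number of references shared by papers i and j  (A @ A^T)
      cocitation[i][j] = number of papers citing both i and j           (A^T @ A)
    """
    at = transpose(a)
    return matmul(a, at), matmul(at, a)
```

On the diagonals, coupling[i][i] is the number of references of paper i and cocitation[j][j] is the number of times paper j is cited. For example, suppose paper 0 cites papers 1 and 2, and paper 1 cites paper 2. Then papers 0 and 1 are coupled through paper 2, and papers 1 and 2 are co-cited by paper 0.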
38. ...ard D. White & N. Nazer (2004). Does Citation Reflect Social Structure? Longitudinal Evidence from the "Globenet" Interdisciplinary Research Group. JASIST, 55, 111-126. White, Howard D. & Katherine W. McCain (1998). Visualizing a Discipline: An Author Co-Citation Analysis of Information Science, 1972-1995. Journal of the American Society for Information Science, 49(4), 327-355. Williams, Thomas & Colin Kelley (2008). gnuplot homepage. http://www.gnuplot.info (accessed on 7/17/08). Wilson, C. S. (2001). Informetrics. In M. E. Williams (Ed.), Annual Review of Information Science and Technology (Vol. 37, pp. 107-286). Medford, NJ: Information Today, Inc./American Society for Information Science and Technology. License. The Network Workbench Tool (NWB Tool) is licensed under the Apache License, Version 2.0 (the "License"). You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
39. ate, run, and validate network models; use different visualizations to interactively explore and understand specific networks; and share datasets and algorithms across scientific boundaries. As of September 2008, the NWB Tool provides access to over 80 algorithms and 30 sample datasets for the study of networks. It supports loading, processing, and saving seven file formats (NWB, GraphML, Pajek .net, Pajek .matrix, XGMML, TreeML, and CSV), plus an automatic conversion service among those formats. Additional algorithms and data formats can be integrated into the NWB Tool using wizard-driven templates. Although CIShell and the NWB Tool are developed in Java, algorithms developed in other programming languages such as FORTRAN, C, and C++ can be easily integrated. Among others, the JUNG (O'Madadhain, Fisher & Nelson, 2008) and Prefuse (Heer, Card & Landay, 2005) libraries have been integrated into NWB as plug-ins. NWB also supplies a plug-in that invokes the Gnuplot application (Williams & Kelley, 2008) for plotting data analysis results, and one for the GUESS tool (Adar, 2007) for rendering network layouts. LaNet-vi (Alvarez-Hamelin, Dall'Asta, Barrat & Vespignani, 2008) uses the k-core decomposition to visualize large-scale complex networks in two dimensions. 2 Download and Install. The Network Workbench Tool is a standalone desktop application that installs and runs on all common operating systems. NWB Tool 0.7.0 and later versions require
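Since the k-core decomposition mentioned above underlies LaNet-vi's layouts, a minimal Python sketch may help. It uses the simple "peel the lowest-degree node" procedure, which is quadratic in the number of nodes. It is only an illustration, not LaNet-vi's or NWB's code.

```python
def core_numbers(adj):
    """Core number of every node in an undirected graph.

    adj: dict mapping node -> set of neighbours (symmetric, no self loops).
    A node's core number is the largest k such that the node belongs to a
    subgraph in which every node has degree >= k.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        # Peel the node with the smallest current degree.
        v = min(remaining, key=lambda u: degree[u])
        k = max(k, degree[v])          # core numbers never decrease while peeling
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core
```

In a triangle with one pendant node attached, the three triangle nodes form the 2-core and the pendant node has core number 1.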
40. (Tool comparison table, continued.) Cytoscape (Cytoscape Consortium, 2008): network analysis tool focusing on biological networks, with particularly nice visualizations. Tulip (Auber, 2003): graph visualization software for networks of over 1,000,000 elements; graphical; all major platforms. iGraph (Csárdi & Nepusz, 2006): library for graph manipulation and cutting-edge network analysis, usable with many programming languages. CiteSpace (Chen, 2006): tool to analyze and visualize scientific literature, particularly co-citation; graphical; all major platforms. HistCite: analysis and visualization tool for data from the Web of Science; graphical; Windows. R (Ihaka & Gentleman, 1996): command-line language that can be used for sophisticated network analyses. Prefuse (Heer, Card & Landay, 2005): visualization; a general visualization framework with many capabilities to support network visualization and analysis. GUESS (Adar, 2007): network visualization; a tool for visual graph exploration that integrates a scripting environment; all major platforms. Graphviz (AT&T Research Group, 2008): network visualization; flexible graph visualization software; graphical. NWB Tool: network analysis and visualization tool conducive to scientometrics ... supporti
41. Figure 1: Directory Hierarchy Reader parameter settings window (options: number of levels to recurse, 1; recurse the entire tree; read directories only, skipping files). 2 Error and Attack Tolerance of Networks. Please see http://iv.slis.indiana.edu/lm/lm-errorattack.html. 3 Studying Peer-to-Peer Networks. Please see http://iv.slis.indiana.edu/lm/lm-p2p-search.html. Domain Specific: Social Science. This tutorial was originally prepared for Sunbelt 2008 by Ann McCranie, Department of Sociology, Indiana University, Bloomington (amccrani@indiana.edu). For this example, we will use PSYCHCONSULT, an extract from the Staff Study of the Indianapolis Network Mental Health Study. This is a file in the Network Workbench (.nwb) format, a basic edge list format that can include node and edge attribute information in a text file. It is a directed, asymmetric, unvalued (unweighted) network that represents the consultation choices among the staff who work in a psychiatric hospital. 1 Load Data. After launching the NWB Tool, load the PSYCHCONSULT data with File > Load. Choose the PSYCHCONSULT data located in the sampledata/networks folder, which is inside your NWB installation directory. Once you have loaded this network, it will appear in the Data Manager window on the right. You may right-click on this network and choose View to look at its contents. You may also open this file separately in a text editor to explore it or make changes. Please note that as y
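To show what "a basic edge list format that can include node and edge attribute information" looks like in practice, here is a minimal Python reader sketch. The layout it expects is an assumption based on typical .nwb files:
• a `*Nodes` line followed by an attribute header such as `id*int label*string`;
• one row per node;
• a `*DirectedEdges` or `*UndirectedEdges` section with its own header and rows.
Treating `#` lines as comments and the type names `float`/`real` are also assumptions. This is not NWB's parser.

```python
import shlex

def _convert(value, typ):
    """Convert one field according to its declared type (assumed type names)."""
    if typ == "int":
        return int(value)
    if typ in ("float", "real"):
        return float(value)
    return value

def parse_nwb(text):
    """Parse a small NWB-style file into (nodes, edges, directed)."""
    nodes, edges, directed = [], [], None
    section, header = None, None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                       # skip blanks and (assumed) comments
        if line.startswith("*"):
            section = line.split()[0].lower()
            header = None                  # the next line declares the attributes
            if section in ("*directededges", "*undirectededges"):
                directed = section == "*directededges"
            continue
        if header is None:
            # 'id*int label*string' -> [('id', 'int'), ('label', 'string')]
            header = [tuple(field.split("*", 1)) for field in line.split()]
            continue
        values = shlex.split(line)         # honours "quoted labels"
        row = {name: _convert(v, typ) for (name, typ), v in zip(header, values)}
        (nodes if section == "*nodes" else edges).append(row)
    return nodes, edges, directed
```

For a directed file such as PSYCHCONSULT, `directed` comes back as `True`, and every node row carries whatever attributes (such as a job-type label) the header declares.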
42. ccording to their BC values using GUESS interpreter commands. The commands apply resizeLinear and colorize to the node BC attribute (colors from (200,200,200) to (0,0,0)), colorize the edges by numberofcoauthoredworks (from (127,193,65) to (0,0,0)), and finally run gemLayout() and binPackLayout(). Eliminating the top 5 nodes with the highest BC values leaves unconnected clusters of nodes (see Figure 3, right). Note how Garfield's network (top right) disintegrates into many smaller subnetworks. In contrast, the joint Barabási-Vespignani network still exhibits a giant connected component even after the removal of the top 4 edges. The Wasserman network (bottom right) is unaffected, since none of its nodes or edges have been removed. Figure 3: Layout of the FourNetSciResearchers dataset with nodes size-coded according to their BC value (left) and with the top 5 nodes with the highest BC values removed (right). 7.3.2 Other Community Detection Algorithms. The NWB team is currently working on integrating additional community detection algorithms into Network Workbench. We hope to include the following algorithms in the near future: • Girvan and Newman, which cuts edges in order of descending betweenness centrality to give clusters (Girvan & Newman, 2002); • Blondel et al. (2008), which agglomerates communities hierarchically based on improvements in modularity (Blondel, Guillaume, Lambiotte & Lefebvre, 2008); • Palla et al. (2005), also called CFinder, which finds communities based on ove
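The "remove the highest-BC nodes and watch the network fall apart" experiment can be reproduced on toy data with the Python sketch below. It implements Brandes' algorithm for unweighted, undirected graphs and does not normalize the scores. It is an illustration under those assumptions, not the NWB betweenness plugin.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes; unweighted, undirected).

    adj: dict node -> set of neighbours; every edge must appear in both sets.
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)      # number of shortest s-v paths
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                       # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):          # accumulate pair dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Undirected graph: every pair was counted once from each endpoint.
    return {v: score / 2 for v, score in bc.items()}

def components(adj):
    """Connected components as lists of nodes."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = [], deque([start])
        while queue:
            v = queue.popleft()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps

def remove_top_bc(adj, k):
    """Drop the k nodes with the highest betweenness, together with their edges."""
    bc = betweenness(adj)
    top = set(sorted(adj, key=lambda v: bc[v], reverse=True)[:k])
    return {v: adj[v] - top for v in adj if v not in top}
```

On the path 1-2-3-4-5, node 3 has the highest betweenness (4.0). Removing it splits the path into two components, {1, 2} and {4, 5}.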
43. ccurrence (Co-Author) Network. Having the names of two authors (or their institutions or countries) listed on one paper, patent, or grant is an empirical manifestation of scholarly collaboration. The more often two authors collaborate, the higher the weight of their joint co-author link. Weighted, undirected co-authorship networks appear to correlate highly with social networks, which are themselves affected by geospatial proximity (Börner, Penumarthy, Meiss & Ke, 2006; Wellman, White & Nazer, 2004). To produce a co-authorship network in the NWB Tool, select the table of all 361 unique ISI records from the FourNetSciResearchers dataset in the Data Manager window. Run Scientometrics > Extract Co-Author Network using the parameter. The result is two derived files in the Data Manager window: the co-authorship network and a table listing the unique authors, also known as the merge table. The merge table can be used to manually unify author names, e.g., "Albet, R" and "Albert, R" (see example below). To examine and, if needed, correct the list of unique authors, open the merge table, e.g., in a spreadsheet program. Sort by author name and identify names that refer to the same person. To merge two names, simply delete the asterisk (*) in the last column of the duplicate node's row. In addition, copy the uniqueIndex of the name that should be kept and paste it into the cell of the name that should be dele
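The effect of editing the merge table can be sketched in Python. The sketch treats the edited table as a list of rows. Rows that share a uniqueIndex are one person, and the row still marked with "*" supplies the name to keep. The column names `label`, `uniqueIndex`, and `combineValues` are assumptions about the table layout. Edge weights count joint papers, as described above.

```python
from collections import Counter
from itertools import combinations

def alias_map(merge_rows):
    """Map every author label to the label of the row marked '*' in its group.

    merge_rows: dicts with 'label', 'uniqueIndex' and 'combineValues' keys
    (a hypothetical in-memory version of the edited merge table).
    """
    keep = {r["uniqueIndex"]: r["label"] for r in merge_rows if r["combineValues"] == "*"}
    return {r["label"]: keep[r["uniqueIndex"]] for r in merge_rows}

def coauthor_weights(papers, alias):
    """Undirected co-author edges weighted by the number of joint papers."""
    weights = Counter()
    for authors in papers:
        names = sorted({alias.get(a, a) for a in authors})   # unify spellings first
        for pair in combinations(names, 2):
            weights[pair] += 1
    return weights
```

Once "Albet R" is mapped to "Albert R", two papers that each pair one spelling with the same co-author collapse into a single edge of weight 2.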
44. Barabási, A.-L. (2002). Linked: The New Science of Networks. Cambridge, UK: Perseus. Batagelj, Vladimir & Andrej Mrvar (1998). Pajek: Program for Large Network Analysis. Connections, 21(2), 47-57. Blondel, Vincent D., Jean-Loup Guillaume, Renaud Lambiotte & Etienne Lefebvre (2008). Fast unfolding of community hierarchies in large networks. http://arxiv.org/abs/0803.0476 (accessed on 7/17/08). Borgatti, S. P., M. G. Everett & L. C. Freeman (2002). Ucinet for Windows: Software for Social Network Analysis. http://www.analytictech.com/ucinet/ucinet_5_description.htm (accessed on 7/15/08). Borgman, C. L. & J. Furner (2002). Scholarly Communication and Bibliometrics. In B. Cronin & R. Shaw (Eds.), Annual Review of Information Science and Technology. Medford, NJ: Information Today, Inc./American Society for Information Science and Technology. Börner, Katy, Chaomei Chen & Kevin W. Boyack (2003). Visualizing Knowledge Domains. In Blaise Cronin (Ed.), Annual Review of Information Science and Technology (Vol. 37). Medford, NJ: American Society for Information Science and Technology. Börner, Katy, Bruce W. Herr II & Jean-Daniel Fekete (2007, July 3). IV07 Software Infrastructures Workshop. Paper presented at IV07, Zurich, Switzerland. Börner, Katy, Shashikant Penumarthy, Mark Meiss & Weimao Ke (2006). Mapping the Diffusion of Information Among Major U.S. Research Institutions. Scientometrics: Dedicated issue on the 10t
45. d Search site. To retrieve all projects funded under the new Science of Science and Innovation Policy (SciSIP) program, select the Program Information tab and do an Element Code Lookup: enter 7626 into the Element Code field and click the Search button. On Sept. 21, 2008, exactly 50 awards were found. Award records can be downloaded in CSV, Excel, or XML format. Save the file in CSV format. A sample CSV file is available in yournwbdirectory/sampledata/scientometrics/nsf/scipolicy.csv. In the NWB Tool, load the file using File > Load File. A table with all records will appear in the Data Manager. Right-click it and view the file in Microsoft Office Excel. To show how to analyze and visualize funding data, we use the active NSF awards data from Indiana University (257 records), Cornell University (501 records), and the University of Michigan, Ann Arbor (619 records), which were downloaded on 11/07/2008. Save the files as .csv but rename them with the .nsf extension, or simply use the files provided in yournwbdirectory/sampledata/scientometrics/nsf. 2.3.1 Extracting Co-PI Networks. Load the NSF data and select the loaded dataset in the Data Manager window. Then run Scientometrics > Extract Co-Occurrence Network using parameters (Text Delimiter; Aggregation Function File: ...\sampledata\scientometrics\properties\nsfCoPI.properties). Two derived fl
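The Co-PI extraction above combines two ideas. First, investigators listed together on an award are linked. Second, an aggregation file decides which award values are summed onto each investigator node. Both can be sketched in Python. The column names `Investigators` and `Amount` and the `|` delimiter are hypothetical placeholders, not the real NSF export headers or the settings in nsfCoPI.properties.

```python
import csv
import io
from collections import Counter
from itertools import combinations

def co_pi_network(csv_text, people_col="Investigators", amount_col="Amount", delimiter="|"):
    """Return (edge weights, awards per person, total amount per person).

    Column names and the delimiter are illustrative assumptions.
    """
    edges, award_count, award_total = Counter(), Counter(), Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        names = sorted({n.strip() for n in row[people_col].split(delimiter) if n.strip()})
        amount = float(row[amount_col] or 0)
        for name in names:                 # node attributes, aggregated by sum
            award_count[name] += 1
            award_total[name] += amount
        for pair in combinations(names, 2):
            edges[pair] += 1               # one co-PI link per shared award
    return edges, award_count, award_total
```

A single-investigator award adds to that person's node totals but creates no edge.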
46. d-line driven interactive data and function plotting utility for UNIX, IBM OS/2, MS Windows, DOS, Macintosh, VMS, Atari, and many other platforms. For more information, see http://www.gnuplot.info. General Tutorial. 1 Load, View, and Save Data. In the NWB Tool, use File > Load File to load one of the sample datasets provided in yournwbdirectory/sampledata or any dataset of your own choosing (see section 2 on supported Data Formats, and Figure 1). Figure 1: Select a file. The result will be listed in the Data Manager window (see Figure 2). Figure 2: Display of loaded network in the Data Manager window. Any file listed in the Data Manager can be saved, viewed, renamed, or discarded by right-clicking it and selecting the appropriate menu option. If File > View With is selected, the user can choose among different application viewers (see Figure 7). Choosing Microsoft Office Excel for a tabular file will open MS Excel with the table loaded. Figure 3: Select Application Viewer for selected file in Data Manager. The NWB Tool can save a network using File > Save, which brings up the Output Data Type window (see Figure 3). Note that some data conversions are lossy, i.e., not all data is preserved (see also the section on Sample Data and Supported Data Formats).
47. der to zoom out. Dragging with the left mouse button moves the entire network. If you lose the network, choose Display > Center. Currently, only a few of GUESS's rich options have menu-based interfaces. For instance, you can change the background color under Display. To use some of GUESS's other options, you can type commands into the console at the bottom of the window (at the >>> prompt). Alternatively, you can create a script file, a simple text file with the extension .py. Either type the following commands into the console, or create a .py file in a text editor. To run the script file, choose File > Run Script and select your newly created script file. The commands include colorize(label), statements such as (label == "psychiatrist").color = red, and clusts = groupBy(area). After you have entered these commands or run this script file, your network might disappear off screen. Choose Display > Center to return to the network. These commands make the labels visible and change the sizes of the nodes and edges. They also color the nodes by label, assigning the same color to nodes with the same label (job type): psychiatrists and the medical director are red, while nurses are blue. The script also creates hulls, or demarcations of areas of the network, based on the area attribute. You can experiment with the random colors assigned to the hulls by rerunning this script or line
48. dination, or the assignment of coordinates to each unit, and (6) use of the resulting visualization for analysis and interpretation. Table 1: General steps involved in a scientometric study (adopted from Börner et al., 2003). Steps 4 and 5 together form the layout; often one code does both the similarity and ordination steps.
(1) Data Extraction: searches (e.g., Scopus, Google Scholar, Medline, patents, grants); broadening (e.g., by terms).
(2) Unit of Analysis, common choices: paper, journal, author, lab/center, department/school, geolocation (e.g., county, country, continent), and topical units (term, keyword, or ontologies including classifications).
(3) Measures: counts/frequencies (attributes, e.g., terms; author citations; co-citations; by year); thresholds (by year, by counts).
(4) Similarity:
• scalar (unit-by-unit matrix): direct linkage (paper-citation, author-paper); co-occurrence (co-author, bibliographic coupling, co-word/co-term, co-classification); co-citation (author co-citation, ACA; document co-citation, DCA); combined linkage;
• vector (unit-by-attribute matrix): vector space model (words/terms); Latent Semantic Analysis (words/terms), including Singular Value Decomposition (SVD).
(5) Ordination:
• dimensionality reduction: eigenvector/eigenvalue solutions; Factor Analysis (FA) and Principal Components Analysis (PCA); multidimensional scaling (MDS); Pathfinder networks (PFNet); self-organizing maps (SOM); topics model;
• cluster analysis;
• scalar: triangulation; force-directed placement (FDP).
(6) Display: interaction (browse, pan, zoom, filter, query, detail on demand); analysis and interpretation.
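The "vector space model" entry in the similarity column can be made concrete with a short Python sketch. It describes each unit by a term-frequency vector and compares two units by the cosine of the angle between their vectors. Raw term counts are an assumption here; real studies often apply weighting such as tf-idf first.

```python
import math
from collections import Counter

def cosine_similarity(terms_a, terms_b):
    """Cosine of the angle between two term-frequency vectors (vector space model)."""
    a, b = Counter(terms_a), Counter(terms_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Units that share no terms score 0. For example, a unit with terms (network, network, science) and one with (network, science, map) score 3/√15 ≈ 0.77.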
49. e bottom right corner to enter search terms, and matching files will be highlighted.
Figure 1: Tree View visualization with the sampledata directory expanded and the florentine.nwb file highlighted.
4.2 Tree Map Visualization
Tracing its ancestry to Venn diagrams, the Treemap algorithm was developed by Ben Shneiderman's group at the HCI Lab at the University of Maryland. It uses a space-filling technique to map a tree structure (e.g., a file directory) into nested rectangles, with each rectangle representing a node. A rectangular area is first allocated to hold the representation of the tree, and this area is then subdivided into a set of rectangles that represent the top level of the tree. This process continues recursively on the resulting rectangles to represent each lower level of the tree, with each level alternating between vertical and horizontal subdivisions. The parent-child relationship is indicated by enclosing the child rectangle in its parent rectangle; that is, all descendants of a node are displayed as rectangles inside its rectangle. Associated with each node is a numeric value (e.g., the size of a directory), and the size of a node's rectangle is proportional to its value. Shneiderman's "Treemaps for space-constrained visualization of hierarchies" webpage (http://www.cs.umd.edu/hcil/treemaps) provides the full story. In the NWB Tool, select a tree dataset, e.g., one generated using the Directory Hierarchy Reader, in the Data Manager
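The recursive subdivision described above is the basic "slice-and-dice" treemap layout. The minimal Python sketch below assumes a hypothetical TreeNode class in which leaves carry a size and inner nodes take the sum of their children's values; cuts alternate between vertical (even depth) and horizontal (odd depth). It is an illustration of the idea, not the NWB Tool's treemap code.

```python
from dataclasses import dataclass, field

@dataclass
class TreeNode:
    name: str
    size: float = 0.0                          # used only by leaves
    children: list = field(default_factory=list)

    def value(self):
        """Leaf: its own size; inner node: the sum of its children's values."""
        if not self.children:
            return self.size
        return sum(child.value() for child in self.children)

def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Map every node name to a rectangle (x, y, width, height).

    Children subdivide their parent's rectangle in proportion to their values,
    alternating between vertical cuts (even depth) and horizontal cuts (odd depth).
    """
    if out is None:
        out = {}
    out[node.name] = (x, y, w, h)
    total = node.value()
    offset = 0.0
    for child in node.children:
        frac = child.value() / total if total else 0.0
        if depth % 2 == 0:
            slice_and_dice(child, x + offset * w, y, w * frac, h, depth + 1, out)
        else:
            slice_and_dice(child, x, y + offset * h, w, h * frac, depth + 1, out)
        offset += frac
    return out

tree = TreeNode("root", children=[
    TreeNode("a", size=2),
    TreeNode("b", children=[TreeNode("b1", size=1), TreeNode("b2", size=1)]),
])
for name, rect in slice_and_dice(tree, 0, 0, 100, 60).items():
    print(name, rect)
```

Because "a" holds half of the total value, it receives the left half of the 100 x 60 area, and "b" splits the right half horizontally between "b1" and "b2".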
50. e developed recently, particularly as the publications with very strong bursts include those by Barabási and Albert on the statistical mechanics of scale-free networks, Watts and Strogatz on small-world networks, and Newman on random networks. It also reflects a change in dynamics, however, as the historically social-science-dominated field of network science now sees major contributions by physicists.
4 Geospatial Analysis
Geospatial analysis has a long history in geography and cartography. It aims to answer the question of where something happens and with what impact on neighboring areas. Geospatial analysis requires spatial attribute values, such as geolocations for authors or their papers extracted from affiliation data, or spatial positions of nodes generated by layout algorithms. Geospatial data can be continuous, i.e., each record has a specific position, or discrete, i.e., a position or area (shape file) exists for sets of records (e.g., number of papers per country). Spatial aggregations (e.g., merging via zip codes, counties, states, countries, or continents) are common. Cartographic generalization refers to the process of abstraction, such as (1) graphic generalization, i.e., the simplification, enlargement, displacement, merging, or selection of entities that does not affect their symbology, and (2) conceptual generalization, i.e., merging, selection, plus symbolization and enhancement of entities, e.g., representing high-density areas by a n
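Discrete geospatial data and spatial aggregation can be illustrated with a per-country paper count that is then merged into continents. The records and the country-to-continent lookup below are hypothetical; real data would come from parsed affiliation fields.

```python
from collections import Counter

# Hypothetical records: (paper id, country of the author's affiliation).
papers = [("p1", "USA"), ("p2", "Italy"), ("p3", "USA"), ("p4", "Japan")]
# Hypothetical lookup table used to merge countries into continents.
continent_of = {"USA": "North America", "Italy": "Europe", "Japan": "Asia"}

def count_by_area(records):
    """Discrete geospatial data: number of papers per area."""
    return Counter(area for _, area in records)

def roll_up(counts, parent_of):
    """Spatial aggregation: add each area's count to its parent area."""
    merged = Counter()
    for area, n in counts.items():
        merged[parent_of[area]] += n
    return merged

per_country = count_by_area(papers)
per_continent = roll_up(per_country, continent_of)
print(dict(per_continent))  # {'North America': 2, 'Europe': 1, 'Asia': 1}
```

The same roll_up step can be applied repeatedly, e.g., from zip codes to counties to states, given one lookup table per level.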
51. e graph button (see the HistCite result in Figure 1).
Figure 1: Paper-citation graph of the FourNetSciResearchers network in HistCite.
As can be seen, this graph includes several isolates, i.e., nodes that have no links to or from other nodes in the network (Local Cited References = 0 and Local Citation Score = 0). According to the textual summary of the dataset given by HistCite, there are 59 such isolates in this network. These nodes can be marked manually and deleted from the network for a cleaner version of this graph.
9.3 CiteSpace by Chaomei Chen (compiled by Hanning Guo)
CiteSpace II is a tool to visualize patterns and trends in scientific literature. The Java-based tool was developed by Dr. Chaomei Chen at Drexel University and can be downloaded from http://cluster.cis.drexel.edu/~cchen/citespace. For comparison, we apply CiteSpace to the FourNetSciResearchers dataset containing all of the ISI records of Garfield, Wasserman, Vespignani, and Barabási. Specifically, we derive the document co-citation, co-authorship, and word co-occurrence networks and run burst detection.
9.3.1 Document Co-Citation Network
The document co-citation network for the FourNetSciResearchers dataset was derived using the NWB Tool and CiteSpace (see Figure 2). In both cases, the size of the nodes stands for betweenness centrality. Node color coding in the NWB Tool was set to reflect betweenness centrality; CiteSpace color codes nodes based on ten 2-year time slices.
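The isolate criterion quoted above (Local Cited References = 0 and Local Citation Score = 0) can be sketched in a few lines of Python. The sketch assumes citations are given as hypothetical (citing, cited) pairs and counts only links whose two ends both lie inside the dataset; it is not HistCite's or the NWB Tool's code.

```python
def find_isolates(papers, citations):
    """Papers with Local Cited References == 0 and Local Citation Score == 0.

    papers: ids in the dataset; citations: (citing, cited) pairs. Pairs that
    point outside the dataset are ignored because they are not local links.
    """
    lcr = {p: 0 for p in papers}  # local references made by each paper
    lcs = {p: 0 for p in papers}  # local citations received by each paper
    for citing, cited in citations:
        if citing in lcr and cited in lcs:
            lcr[citing] += 1
            lcs[cited] += 1
    return [p for p in papers if lcr[p] == 0 and lcs[p] == 0]

papers = ["a", "b", "c", "d"]
citations = [("a", "b"), ("b", "c"), ("d", "outside-paper")]
isolates = find_isolates(papers, citations)
cleaned = [p for p in papers if p not in set(isolates)]
print(isolates)  # ['d']
```

Paper "d" cites only a paper outside the dataset, so locally it has no links and is removed.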
52. e result for the given example is shown in Figure 1.
Figure 1: Visualizing burst results in MS Excel.
Running burst detection on the combined Wasserman, Vespignani, and Barabási ISI file for authors, ISI keywords, and cited references results in Figure 3. To generate the latter two results, select New ISI Keywords or Cited References instead of Authors as the Text Column in the burst parameters. The results reveal many of the trends one would expect to see among these major network science researchers. For instance, the ISI keywords burst with terms related to diffusion and growth in the early 90s and criticality and critical behavior in the late 90s, and finish with small-world networks, complex networks, and metabolic networks, which start bursting in the early 2000s and do not finish by the end of the dataset (as opposed to ending in 2005, the last year of the dataset but for three papers). Another pattern is that almost all of the authors with bursts in the dataset were graduate students of Wasserman, Vespignani, or Barabási during this period. One notable example is Réka Albert, who bursts from 1997 to 2000, corresponding to work on her Ph.D. with Barabási. The result can be visualized as a chart with word and time dimensions; see Figure
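Kleinberg (2002), listed in the references, models a term's frequency as being generated by a hidden automaton that switches between a baseline state and higher-rate burst states. The Python sketch below is a simplified two-state version of the batched variant of that model, under stated assumptions: the function name is hypothetical, s (rate scaling) and gamma (transition cost) take illustrative defaults, and only the start and end batch of each burst is reported, whereas the NWB output also lists length, weight, and strength. It is not the NWB implementation.

```python
import math

def burst_intervals(relevant, totals, s=2.0, gamma=1.0):
    """Simplified two-state burst detection over time batches (after Kleinberg 2002).

    relevant[t]: documents in batch t that contain the term
    totals[t]:   all documents in batch t
    Returns (start, end) batch indices of maximal runs in the burst state.
    """
    n = len(relevant)
    if n == 0 or sum(totals) == 0:
        return []
    p0 = sum(relevant) / sum(totals)           # baseline rate of the term
    if not 0 < p0 < 1:
        return []                              # term never (or always) occurs
    p1 = min(s * p0, 0.9999)                   # elevated rate in the burst state
    up_cost = gamma * math.log(n)              # price of entering the burst state

    def fit_cost(p, r, d):
        # Negative log-likelihood of r hits among d documents at rate p. The
        # binomial coefficient is the same in both states, so it is omitted.
        return -(r * math.log(p) + (d - r) * math.log(1 - p))

    prev = [0.0, math.inf]                     # the automaton starts in state 0
    back = []                                  # best predecessor of each state
    for r, d in zip(relevant, totals):
        pred0 = 0 if prev[0] <= prev[1] else 1            # moving down is free
        pred1 = 0 if prev[0] + up_cost < prev[1] else 1   # moving up costs up_cost
        cost1_in = prev[0] + up_cost if pred1 == 0 else prev[1]
        prev = [prev[pred0] + fit_cost(p0, r, d), cost1_in + fit_cost(p1, r, d)]
        back.append((pred0, pred1))

    # Trace the cheapest state sequence backwards (Viterbi).
    state = 0 if prev[0] <= prev[1] else 1
    states = [0] * n
    for t in range(n - 1, -1, -1):
        states[t] = state
        state = back[t][state]

    # Collect maximal runs of the burst state.
    bursts, start = [], None
    for t, q in enumerate(states + [0]):
        if q == 1 and start is None:
            start = t
        elif q == 0 and start is not None:
            bursts.append((start, t - 1))
            start = None
    return bursts

print(burst_intervals([1, 1, 8, 9, 1, 1], [10] * 6))  # [(2, 3)]
```

Raising gamma makes entering the burst state more expensive, so only stronger or longer bursts survive.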
53. ers word co-citation networks in a more legible way. CiteSpace II also provides spectral clustering and expectation maximization clustering (see Figure 8).
Figure 8: Spectral clustering (left) and expectation maximization clustering (right) of the document co-citation network of FourNetSciResearchers.
Network Extraction: A comparison of the networks extracted by the NWB Tool and CiteSpace II can be found in Table 1. As for word co-occurrence networks, the NWB Tool can process the words from any data field, but only from one field at a time; CiteSpace II can process words occurring in the title, abstract, descriptors, and/or identifiers fields.
Burst Detection: Both tools support burst detection for time-stamped author names, journal names, country names, references, ISI keywords, or terms used in the title and/or abstract of a paper. The NWB Tool can detect different types of bursts according to the needs of the research. CiteSpace II can detect burst phrases from noun phrases or plain text and visualize them. CiteSpace II notes that noun phrases are identified using part-of-speech tagging, while plain-text terms are identified by a sliding-window term selector.
9.4 Pajek by Vladimir Batagelj et al.
A comparison of the NWB Tool with Pajek (http://pajek.imfm.si/doku.php), supporting network analysis and visualization, is in preparation.
9.5 Software by Loet Leydesdorff
Loet Leydesdorff's software is a series of programs with different analysis functions. It contains the analysis of co-authorsh
54. es access to more than one database (see also Bosman, Mourik, Rasch, Sieverts & Verhoeff 2006; de Moya-Anegón et al. 2007; Fingerman 2006; Meho & Yang 2007; Nisonger 2004; Pauly & Stergiou 2005). In the NWB Tool, load yournwbdirectory/sampledata/scientometrics/isi/savedrecs/barabasi.isi and yournwbdirectory/sampledata/scientometrics/scopus/barabasi.scopus. The file barabasi.bib, downloaded from Google Scholar, is also available in the respective subdirectory (bibtex) in yournwbdirectory/sampledata/scientometrics. It is interesting to compare the result sets retrieved from ISI, Scopus, and Google Scholar.
2.2 Personal Bibliographies: EndNote and Bibtex
Personal references collected via reference management software such as EndNote (Thomson Reuters 2008a) or Reference Manager (The Thomson Corporation 2008), or in the Bibtex format (Feder 2006), can also be read. Sample datasets are included in the bibtex and endnote subdirectories of sampledata. Simply load the file, and a .csv file with all unique records will appear in the Data Manager.
2.3 Funding: NSF Data
Funding data provided by the National Science Foundation (NSF) can be retrieved via the Award Search site (http://www.nsf.gov/awardsearch). Search by PI name, institution, and many other fields (see Figure 3).
Figure 3: NSF Awar
55. es will appear in the Data Manager window: the co-PI network and a merge table. In the network, nodes represent investigators and edges denote their co-PI relationships. The merge table can be used to further clean PI names. Choose the Extracted Network on Column All Investigators and run Analysis > Network Analysis Toolkit (NAT), which reveals the number of nodes and edges, and also the number of isolate nodes, which can be removed by running Preprocessing > Delete Isolates. Select Visualization > GUESS to visualize the network, and run the co-PI-nw.py script. Visualizations of the co-PI networks of three universities are given in Figure 4.
Figure 4: Co-PI network of Indiana University (top left), Cornell University (bottom left), and University of Michigan (right).
Select the network after removing isolates and run Analysis > Unweighted and Undirected > Weak Component Clustering, which creates new graphs containing the top connected components; the parameter Number of top clusters sets how many are created. Indiana's largest component has 19 nodes, Cornell's has 67 nodes, and Michigan's has 55 nodes. Visualize Cornell's network in GUESS using the same .py script and save it via File > Export Image as a .jpg file.
Largest component of the Cornell University co-PI network. Node size and color encode the t
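Weak Component Clustering treats every edge as undirected and groups nodes that are connected by some path. A minimal union-find sketch in Python is shown below; the investigator names and edges are hypothetical, and the largest-first ordering only mimics the idea of keeping the "top" clusters rather than reproducing the NWB algorithm.

```python
def weak_components(nodes, edges):
    """Weakly connected components: edge direction is ignored.

    Every edge endpoint must appear in `nodes`. Components are returned as
    lists of nodes, largest first, so the first k entries are the top k clusters.
    """
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values(), key=len, reverse=True)

investigators = ["a", "b", "c", "d", "e", "f"]
co_pi_edges = [("a", "b"), ("c", "b"), ("d", "e")]
components = weak_components(investigators, co_pi_edges)
largest = components[0]
print(len(largest), largest)   # 3 ['a', 'b', 'c']
```

Keeping only components[:k] corresponds to asking for the k largest clusters.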
56. ew city symbol (Kraak & Ormeling 1987). Geometric generalization aims to solve the conflict between the number of visualized features, the size of symbols, and the size of the display surface. Cartographers dealt with this conflict mostly intuitively until researchers like Friedrich Töpfer attempted to find quantifiable expressions for it (Skupin 2000; Tobler 1973; Töpfer 1974; Töpfer & Pillewizer 1966).
5 Topical Analysis
The topic (also called semantic coverage) of a unit of science can be derived from text associated with it. For example, the topic coverage and topical similarity of authors or institutions can be derived from units associated with them, e.g., papers, patents, or grants. Topical aggregations, e.g., over journal volumes, scientific disciplines, or institutions, are common. Topic analysis extracts the set of unique words or word profiles and their frequency from a text corpus. Stop words, such as "the", "of", etc., are removed. Stemming, i.e., the reduction of words such as "scientific" and "science" to "scien", can be applied (Porter 1980). Co-word analysis identifies the number of times two words are used together in the title, keyword set, abstract, and/or full text of, e.g., a paper. The space of co-occurring words can be mapped, providing a unique view of the topic coverage of a dataset. Similarly, units of science can be grouped based on the number of words they share in common (Callon, Courtial, Turner
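A minimal co-word count with stop-word removal might look like the following Python sketch. The stop-word list and titles are hypothetical, stemming is not applied, and each word pair is counted at most once per document; the NWB Tool's text preprocessing may differ.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "of", "and", "a", "an", "in", "on", "for"}

def terms(text):
    """Lowercase alphabetic tokens minus stop words (no stemming applied)."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}

def co_word_counts(documents):
    """Number of documents in which each pair of terms occurs together."""
    counts = Counter()
    for doc in documents:
        for pair in combinations(sorted(terms(doc)), 2):
            counts[pair] += 1
    return counts

titles = [
    "The structure of complex networks",
    "Complex networks and small world",
    "Structure of small world networks",
]
counts = co_word_counts(titles)
print(counts.most_common(3))
```

The resulting pair counts can be used as weighted edges of a word co-occurrence network.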
57. from Google Scholar for specific data and supports export into BibTeX (.bib), CSV (.txt), or EndNote (.enw) files that can be read by the NWB Tool. Personal references collected via reference management software such as EndNote (Thomson Reuters 2008a) or Reference Manager (The Thomson Corporation 2008), or in the Bibtex format (Feder 2006), can also be read, as can funding data downloaded from the National Science Foundation and other scholarly data available in plain comma-separated value files. Examples are given here.
2.1 Publication Data: ISI, Scopus, and Google Scholar
Today, most science studies use either Thomson Scientific's databases or Scopus, as they each constitute a multidisciplinary, objective, internally consistent database. Google Scholar constitutes a third choice. Please see the comparison of the three sources and the discussion of coverage in section 2.1.4.
2.1.1 ISI Data
For the purposes of this tutorial, we downloaded, as an example, publication records of four major network science researchers, three of whom are principal investigators of the Network Workbench project. Their names, ages (retrieved from the Library of Congress), number of citations for their highest-cited paper, h-index (Bornmann 2006), and number of papers and citations over time as calculated by the Web of Science by Thomson Scientific (Thomson Reuters 2008b) are given in Table 2. ISI-formatted files of all papers, including references, were downloaded for all four researchers in December 20
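Table 2 reports each researcher's h-index (Bornmann 2006). Under the standard definition, the h-index is the largest number h such that h of a researcher's papers have each received at least h citations. A short Python sketch of that definition, with hypothetical citation counts, follows.

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citation_counts, reverse=True), start=1):
        if cites < rank:
            break
        h = rank
    return h

# Hypothetical citation counts for one author's papers.
print(h_index([10, 8, 5, 4, 3]))  # 4
```

Sorting in descending order means the loop can stop at the first paper whose citation count falls below its rank.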
58. gs of InfoVis 2000, 91-97.
Small, Henry. 1973. Co-Citation in Scientific Literature: A New Measure of the Relationship Between Publications. JASIS 24: 265-269.
Small, Henry G., E. Greenlee. 1986. Collagen Research in the 1970's. Scientometrics 10: 85-117.
Takatsuka, M., M. Gahegan. 2002. GeoVISTA Studio: A Codeless Visual Programming Environment for Geoscientific Data Analysis and Visualization. The Journal of Computers & Geosciences 28 (10): 1131-1144.
The Thomson Corporation. 2008. Reference Manager (accessed on 7/15/08).
Thelwall, M., L. Vaughan, L. Björneborn. 2005. Webometrics. In Blaise Cronin (Ed.), Annual Review of Information Science and Technology (Vol. 39, pp. 179-255). Medford, NJ: Information Today, Inc./American Society for Information Science and Technology.
Thomson Reuters. 2008a. EndNote. http://www.endnote.com/encopyright.asp (accessed on 7/15/08).
Thomson Reuters. 2008b. Web of Science. http://scientific.thomsonreuters.com/products/wos (accessed on 7/17/08).
Tobler, Waldo R. 1973. A Continuous Transformation Useful for Districting. Science 219: 215-220.
Töpfer, F. 1974. Kartographische Generalisierung. Gotha/Leipzig: VEB Hermann Haack/Geographisch-Kartographische Anstalt.
Töpfer, F., W. Pillewizer. 1966. The Principles of Selection. Cartographic Journal 3: 10-16.
Wasserman, S., K. Faust. 1994. Social Network Analysis: Methods and Applications. New York: Cambridge University Press.
Wellman, B., How
59. gth or success of co-author/inventor/investigator relations, etc. The geospatial and topic distribution of funding input and research output, the structure and evolution of research topics, evolving research areas (e.g., based on young yet highly cited papers), or the diffusion of information, people, or money over geospatial and topic space can be studied.
1.2.4 Ordination
Ordination techniques such as triangulation or force-directed placement take a set of documents, their similarities/distances, and parameters, and generate a (typically 2-dimensional) layout that places similar documents closer together and dissimilar ones further apart. Note that the table covers measures and algorithms commonly used in bibliometrics/scientometrics research, but only few of the new temporal, geospatial, topical, and network-based approaches in existence today.
1.2.5 Display
Analysis results can be communicated via text, tables, charts, or maps that are printed on paper or explored online, to name just a few options. Steps 3-5 will be discussed separately for temporal analyses in section 3, geospatial analyses in section 4, topical analyses in section 5, and network analyses in section 6.
2 Bibliographic Data Acquisition and Preparation
The NWB Tool reads publication data from Thomson Scientific/ISI or Scopus. Google Scholar data can be acquired using 3rd-party tools such as Publish or Perish (Harzing 2008) that retrieves papers
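Force-directed placement, one of the ordination techniques named above, can be sketched as a small spring embedder in the style of Fruchterman and Reingold: all node pairs repel, linked nodes attract, and the maximum step size shrinks over time. The function below is a hypothetical illustration with arbitrary parameter values (iterations, cooling factor, random seed); it is not the layout code used by the NWB Tool or GUESS.

```python
import math
import random

def force_directed(nodes, edges, iterations=200, size=1.0, seed=0):
    """Spring-embedder layout in the style of Fruchterman and Reingold (illustrative)."""
    nodes = list(nodes)
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, size), rng.uniform(0, size)] for n in nodes}
    k = size / math.sqrt(len(nodes))           # preferred distance between nodes
    temp = size / 10                           # largest step allowed per iteration
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        for i, u in enumerate(nodes):          # every pair of nodes repels
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        for u, v in edges:                     # linked nodes attract
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        for n in nodes:                        # move, capped by the temperature
            dx, dy = disp[n]
            d = math.hypot(dx, dy) or 1e-9
            step = min(d, temp)
            pos[n][0] += dx / d * step
            pos[n][1] += dy / d * step
        temp *= 0.95                           # cool down
    return {n: (x, y) for n, (x, y) in pos.items()}

layout = force_directed(["a", "b", "c", "d"], [("a", "b"), ("c", "d")])
for node, (x, y) in layout.items():
    print(node, round(x, 2), round(y, 2))
```

In the result, the linked pairs a-b and c-d end up close together while the two pairs drift apart, which is the "similar closer, dissimilar further apart" behavior described above.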
60. h International Conference of the International Society for Scientometrics and Informetrics, 68 (3): 415-426.
Börner, Katy, Soma Sanyal, Alessandro Vespignani. 2007. Network Science. In Blaise Cronin (Ed.), Annual Review of Information Science & Technology (Vol. 41, pp. 537-607). Medford, NJ: Information Today, Inc./American Society for Information Science and Technology.
Bornmann, Lutz. 2006. H Index: A New Measure to Quantify the Research Output of Individual Scientists. htp wwwforschungsinfo dera aora Index h index asp (accessed on 7/17/08).
Bosman, Jeroen, Ineke van Mourik, Menno Rasch, Eric Sieverts, Huib Verhoeff. 2006. Scopus Reviewed and Compared: The Coverage and Functionality of the Citation Database Scopus, Including Comparisons with Web of Science and Google Scholar. Utrecht University Library.
Brandes, Ulrik. 2001. A Faster Algorithm for Betweenness Centrality. Journal of Mathematical Sociology 25 (2): 163-177.
Brandes, Ulrik, Dorothea Wagner. 2008. Analysis and Visualization of Social Networks. http://visone.info/ (accessed on 7/15/08).
Callon, M., J. P. Courtial, W. Turner, S. Bauin. 1983. From Translations to Problematic Networks: An Introduction to Co-Word Analysis. Social Science Information 22: 191-235.
Callon, M., J. Law, A. Rip (Eds.). 1986. Mapping the Dynamics of Science and Technology. London: Macmillan.
Carrington, P. J., J. Scott, S. Wasserman. 2005. Models and Methods in Social Netwo
61. icine 14: 491-498.
Kessler, Michael M. 1963. Bibliographic Coupling Between Scientific Papers. American Documentation 14 (1): 10-25.
Kleinberg, J. M. 2002. Bursty and Hierarchical Structure in Streams. Paper presented at the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, pp. 91-101.
Kohonen, Teuvo. 1995. Self-Organizing Maps. Berlin: Springer.
Kraak, Menno-Jan, Ferjan Ormeling. 1987. Cartography: Visualization of Spatial Data. Delft, NL: Delft University Press.
Krebs, Valdis. 2008. Orgnet.com: Software for Social Network Analysis and Organizational Network Analysis. http://www.orgnet.com/inflow3.html.
Kruskal, J. B. 1964. Multidimensional Scaling: A Numerical Method. Psychometrika 29: 115-129.
Landauer, T. K., P. W. Foltz, D. Laham. 1998. Introduction to Latent Semantic Analysis. Discourse Processes 25: 259-284.
Landauer, T. K., S. T. Dumais. 1997. A Solution to Plato's Problem: The Latent Semantic Analysis Theory of the Acquisition, Induction, and Representation of Knowledge. Psychological Review 104: 211-240.
Lenoir, Timothy. 2002. Quantitative Foundation for the Sociology of Science: On Linking Blockmodeling with Co-Citation Analysis. In John Scott (Ed.), Social Networks: Critical Concepts in Sociology. New York: Routledge.
Leydesdorff, Loet. 2008. Software and Data of Loet Leydesdorff. http://users.fmg.uva.nl/lleydesdorff/software.htm (accessed on 7/15/2
62. ik Brandes, Mark Gerstein, Stephen North, and Tom Snijders. For the open source libraries and tools used, please see sections 5 Supporting Libraries and 6 Integrated Tools. The Network Workbench is supported in part by the 21st Century Fund and the National Science Foundation under grants IIS-0238261 and IIS-0513650. The Scholarly Database is funded by the School of Library and Information Science and the Cyberinfrastructure for Network Science Center at Indiana University, the National Science Foundation under Grants No. IIS-0238261, IIS-0513650, and SBE-0738111, and a James S. McDonnell Foundation grant in the area Studying Complex Systems. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
References
Adar, Eytan. 2007. GUESS: The Graph Exploration System. http://graphexploration.cond.org (accessed on 4/22/08).
Alvarez-Hamelin, Ignacio, Luca Dall'Asta, Alain Barrat, Alessandro Vespignani. 2008. LaNet-vi. http://xavier.informatics.indiana.edu/lanet-vi/ (accessed on 7/17/07).
Anthonisse, J. M. 1971. The Rush in a Directed Graph. Amsterdam, NL: Stichting Mathematisch Centrum.
AT&T Research Group. 2008. Graphviz: Graph Visualization Software. http://www.graphviz.org/Credits.php (accessed on 7/17/08).
Auber, David (Ed.). 2003. Tulip: A Huge Graph Visualisation Framework. Berlin: Springer-Verlag. ac
63. in Specific Scientometrics
1 Introduction
2 Bibliographic Data Acquisition and Preparation
3 Temporal Analysis
4 Geospatial Analysis
5 Topical Analysis
6 Network Analysis
7 Analyzing and Visualizing Large Networks
9 Comparison with Other Tools
Acknowledgements
References
Getting Started
1 Introduction
The Network Workbench (NWB) Tool (Herr II, Huang, Penumarthy & Börner 2007) is a network analysis, modeling, and visualization toolkit for physics, biomedical, and social science research. It is built on the Cyberinfrastructure Shell (CIShell) (Cyberinfrastructure for Network Science Center 2008), an open source software framework for the easy integration and utilization of datasets, algorithms, tools, and computing resources. CIShell is based on the OSGi R4 Specification and the Equinox implementation (OSGi Alliance 2008). The Network Workbench Community Wiki provides a one-stop online portal for researchers, educators, and practitioners interested in the study of networks. It is a place for users of the NWB Tool, CIShell, or any other CIShell-based program to get, upload, and request algorithms and datasets to be used in their tool, so that it truly meets their needs and the needs of the scientific community at large. Users of the NWB Tool can: Access major network datasets online or load their own networks; Perform network analysis with the most effective algorithms available; Gener
64. increase and decrease at different rates and respond with different latency rates to internal and external events. Temporal analysis aims to identify the nature of phenomena represented by a sequence of observations, such as patterns, trends, seasonality, outliers, and bursts of activity. Data comes as a time series, i.e., a sequence of events or observations ordered in one dimension: time. Time series data can be continuous, i.e., there is an observation at every instant of time (see figure below), or discrete, i.e., observations exist for regularly or irregularly spaced intervals. Temporal aggregations, e.g., over journal volumes, years, or decades, are common. Temporal analysis frequently involves some form of filtering to reduce noise and make patterns more salient. Smoothing (e.g., averaging using a smoothing window of a certain width) and curve approximation might be applied. The number of scholarly records over time is plotted to get a first idea of the temporal distribution of a dataset; it might be shown in total values or in % of total. Sometimes it is interesting to know how long a scholarly entity was active, how old it was in a certain year, what growth, latency-to-peak, or decay rate it has, what correlations with other time series exist, or what trends are observable. Data models such as the least squares model, available in most statistical software packages, are applied to best fit a selected function to a data se
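Both techniques named above can be written in a few lines of Python. The sketch below computes a sliding-window average (one simple form of smoothing) and an ordinary least-squares straight-line fit; the yearly counts are hypothetical, and a straight line is only one of many functions one might fit.

```python
def moving_average(values, window):
    """Average of each run of `window` consecutive values (len(values) - window + 1 results)."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]

def least_squares_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

years = [2000, 2001, 2002, 2003, 2004]
papers_per_year = [2, 4, 6, 8, 10]   # hypothetical counts
print(moving_average(papers_per_year, 3))          # [4.0, 6.0, 8.0]
print(least_squares_line(years, papers_per_year))  # (-3998.0, 2.0)
```

The slope (2.0 papers per year here) summarizes the growth trend; a wider smoothing window suppresses more noise but also blurs short bursts.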
65. ip network, word co-occurrence network, international collaboration network, institute collaboration network, etc. The results can be visualized with Pajek, UCINET, NWB, etc. NWB is the program that specializes in scientometric analysis and visualization. In this case, we simply compare the analysis functions of the two. For parts 1-3, the data we use come from the Network Workbench sample data file that contains all of Eugene Garfield's, Wasserman's, Vespignani's, and Barabási's ISI records; in part 4, we also use the NWB sample data on brain cancer from Scopus.
9.5.1 Co-authorship Network
Figure 9: Co-author network for FourNetSciResearchers with Loet's software.
The figures indicate that the structures of the two networks derived by the two tools are similar. However, the network derived by Leydesdorff's software is directed, whereas, as we have seen in section 6.2.1, NWB's co-authorship network is undirected. In addition, the former has four weakly connected components and the latter has two, which is caused by the linkage between Anderson and three authors in Wasserman's component. See the detailed differences between the two networks in Table 3.
Table 3: Comparison of co-authorship network structure between the two programs
Attribute | Loet's software | NWB
Direction in the graph | Directed | Undirected
Nodes | 21 | 247
Edges | 1705 | [illegible]
Weakly connected components | 7 | 3
Nodes in the largest connected | 19
66. iribution_1.0.0.jar
edu.iu.nwb.analysis.totaldegreesequence_1.0.0.jar
edu.iu.nwb.analysis.transitivity.adjacency_1.0.0.jar
edu.iu.nwb.analysis.undirectedknn_1.0.0.jar
edu.iu.nwb.analysis.weakcomponentclustering_1.0.0.jar
edu.iu.iv.search.p2p.bfs_1.0.0.jar
edu.iu.iv.search.p2p.randomwalk_1.0.0.jar
4.3 Modeling
bert_1.0.0.jar
randomgraph_1.0.0.jar
edu.iu.nwb.modeling.smallworld_1.0.0.jar
jungnetworklayout_1.0.0.jar
radialgraph_1.0.0.jar
4.5 Tools
5 Supporting Libraries
apache commons collections 3.1.0
6 Integrated Tools
6.1 GUESS
GUESS is an exploratory data analysis and visualization tool for graphs and networks. The system contains a domain-specific embedded language called Gython (an extension of Python, or more specifically Jython), which supports the operators and syntactic sugar necessary for working on graph structures in an intuitive manner. An interactive interpreter binds the text that you type in the interpreter to the objects being visualized for more useful integration. GUESS also offers a visualization front end that supports the export of static images and dynamic movies. For more information, see https://nwb.slis.indiana.edu/community/?n=VisualizeData.GUESS.
6.2 Gnuplot
Gnuplot is a portable comman
67. jar
edu.iu.nwb.preprocessing.text.normalization_1.0.0.jar
edu.iu.nwb.preprocessing.timeslice_1.0.0.jar
edu.iu.nwb.preprocessing.trimedges_1.0.0.jar
edu.iu.nwb.shared.isiutil_1.0.0.jar
edu.iu.nwb.tools.mergenodes_1.0.0.jar
4.2 Analysis
edu.iu.nwb.analysis.averageshortestpath_1.0.0.jar
edu.iu.nwb.analysis.clustering_1.0.0.jar
edu.iu.nwb.analysis.clustering_vs_k_1.0.0.jar
edu.iu.nwb.analysis.connectedcomponents_1.0.0.jar
edu.iu.nwb.analysis.extractattractors_0.0.1.jar
edu.iu.nwb.analysis.extractcoauthorship_1.0.0.jar
edu.iu.nwb.analysis.extractdirectednetfromtable_1.0.0.jar
edu.iu.nwb.analysis.extractnetfromtable_1.0.0.jar
edu.iu.nwb.analysis.indegreedistribution_1.0.0.jar
edu.iu.nwb.analysis.indegreesequence_1.0.0.jar
edu.iu.nwb.analysis.isidupremover_0.0.1.jar
edu.iu.nwb.analysis.isolates_1.0.0.jar
edu.iu.nwb.analysis.multipartitoj
edu.iu.nwb.analysis.onepointcorrelations_1.0.0.jar
edu.iu.nwb.analysis.outdegreedistribution_1.0.0.jar
edu.iu.nwb.analysis.outdegreesequence_1.0.0.jar
edu.iu.nwb.analysis.pagerank_1.0.0.jar
edu.iu.nwb.analysis.pathfindergraphnetworkscaling_1.0.0.jar
edu.iu.nwb.analysis.reciprocity.arc_1.0.0.jar
edu.iu.nwb.analysis.reciprocity.dyad_1.0.0.jar
edu.iu.nwb.analysis.sampling_1.0.0.jar
edu.iu.nwb.analysis.selfloops_1.0.0.jar
edu.iu.nwb.analysis.shortestpathdistr_1.0.0.jar
edu.iu.nwb.analysis.totaldegreedis
68. l of the American Society for Information Science 41: 391-407.
Feder, Alexander. 2006. BibTeX.org: Your BibTeX Resource. http://www.bibtex.org (accessed on 7/15/08).
Fekete, Jean-Daniel, Katy Börner. 2004. Workshop on Information Visualization Software Infrastructures. Austin, Texas.
Fingerman, Susan. 2006. Electronic Resources Reviews: Web of Science and Scopus: Current Features and Capabilities. Issues in Science and Technology Librarianship, Fall. bip wwwistor 6 allelectronic html (accessed on 9/23/08).
Freeman, L. C. 1977. A Set of Measures of Centrality Based on Betweenness. Sociometry 40: 35-41.
Garfield, Eugene. 2008. HistCite: Bibliometric Analysis and Visualization Software (Version 8.5.26). Bala Cynwyd, PA: HistCite Software LLC. http://www.histcite.com (accessed on 7/15/08).
Girvan, M., M. E. J. Newman. 2002. Community Structure in Social and Biological Networks. PNAS 99: 7821-7826.
Granovetter, Mark. 1973. The Strength of Weak Ties. American Journal of Sociology 78: 1360-1380.
Griffiths, Thomas L., Mark Steyvers. 2002. A Probabilistic Approach to Semantic Representation. Proceedings of the 24th Annual Conference of the Cognitive Science Society. Fairfax, VA.
Harzing, Anne-Wil. 2008. Publish or Perish: A Citation Analysis Software Program. http://www.harzing.com/resources.htm (accessed on 4/22/08).
Heer, Jeffrey, Stuart K. Card, James A. Landay. 2005. Prefuse: A Toolkit for Interactive Information Visualization
69. le, we show the application of Preprocessing > Extract Edges Above or Below Value to the FourNetSciResearchers dataset for different thresholds in Figure 2. Higher thresholds result in fewer edges and more network components.
Figure 2: Layout of the FourNetSciResearchers dataset with no threshold (left), with three or more co-authorships (middle), and with five or more co-authorships (right).
7.3.1 Betweenness Centrality
Betweenness centrality (BC) refers to the number of times a shortest path from any node in a network to any other node in this network goes through a specific node or edge (Freeman 1977). A node or edge that interconnects two sub-networks has a high BC value and is also called a gatekeeper or weak link (Granovetter 1973). The original algorithm is computationally very expensive and only applicable to networks of up to several hundred nodes. In 2001, Ulrik Brandes proposed a more efficient algorithm for betweenness that exploits the extreme sparseness of typical networks (Brandes 2001). Other shortest-path-based indices, like closeness or radiality, can be computed simultaneously within the same bounds (Anthonisse 1971). In the NWB Tool, applying Analysis > Unweighted & Undirected > Node Betweenness Centrality to the FourNetSciResearchers co-authorship network adds BC values to each node. The top five nodes with the highest BC value are listed below. Figure 3 (left) shows a GUESS visualization with nodes size-coded and color-coded a
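Thresholding edges by weight and computing node betweenness can be sketched together. The Python below follows the structure of Brandes' (2001) algorithm for unweighted, undirected graphs, which runs in O(nm) time: one breadth-first search per source node counts shortest paths, and dependencies are then accumulated in reverse order. The edge data and function names are hypothetical; the NWB Tool's implementation and any normalization it applies may differ.

```python
from collections import deque

def edges_at_least(weighted_edges, threshold):
    """Keep edges whose weight (e.g., number of co-authored papers) is >= threshold."""
    return [(u, v, w) for u, v, w in weighted_edges if w >= threshold]

def adjacency(edges):
    """Undirected adjacency sets built from (u, v, ...) tuples."""
    adj = {}
    for u, v, *_ in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def node_betweenness(adj):
    """Node betweenness for an unweighted, undirected graph, following Brandes (2001)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)          # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:                           # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)        # dependency of s on each node
        for w in reversed(order):              # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: score / 2 for v, score in bc.items()}   # each pair was counted twice

coauthor_edges = [("a", "b", 3), ("b", "c", 5), ("c", "d", 1)]   # hypothetical weights
print(node_betweenness(adjacency(coauthor_edges)))
# {'a': 0.0, 'b': 2.0, 'c': 2.0, 'd': 0.0}
print(node_betweenness(adjacency(edges_at_least(coauthor_edges, 3))))
# {'a': 0.0, 'b': 1.0, 'c': 0.0}
```

Raising the threshold removes the weak c-d link, so c stops acting as a bridge and its BC drops to zero, mirroring the way higher thresholds split the network into more components.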
70. lgorithms available, e.g., Analysis and Visualization (A, V); see also the description column.
Table 1: Network analysis and visualization tools commonly used in scientometrics research
Tool | Type | Description | Interface | Open source | Operating system | Reference
Inflow | A, V | Social network analysis software for organizations | Graphical | No | Windows | Krebs 2008
Pajek | A, V | A network analysis and visualization program with many analysis algorithms, particularly for social network analysis | Graphical | No | Windows | Batagelj & Mrvar 1998
UCINET | A, V | Social network analysis software, particularly useful for exploratory analysis | Graphical | No | Windows | Borgatti, Everett & Freeman 2002
Boost Graph Library | Analysis and graph manipulation | Extremely efficient and flexible C++ library for extremely large networks | Library | Yes | All major | Siek, Lee & Lumsdaine 2002
visone | A, V | Social network analysis tool for research and teaching, with a focus on innovative and advanced visual methods | Graphical | No | [illegible] | Brandes & Wagner 2008
GeoVISTA Studio | [illegible] | Software that can be used to lay out networks on geospatial substrates | Graphical | Yes | All major | Takatsuka & Gahegan 2002
71. m. A query for papers by Albert-László Barabási run on Sept 21, 2008 results in 111 papers that have been cited 14,343 times (see Figure 2).
Figure 2: Publish or Perish interface with query result for Albert-László Barabási.
To save records, select File > Save as Bibtex, File > Save as CSV, or File > Save as EndNote from the menu. All three file formats can be read by the NWB Tool. The result in all three formats, named barabasi, is also available in the respective subdirectories in yournwbdirectory/sampledata/scientometrics and will be used subsequently.
2.1.4 Comparison of ISI, Scopus, and Google Scholar
A number of recent studies have examined and compared the coverage of Thomson Scientific's Web of Science (WoS), Scopus, Ulrich's Directory, and Google Scholar (GS). It has been shown that the databases have a rather small overlap in records: the overlap between WoS and Scopus was only 58.2%, and the overlap between GS and the union of WoS and Scopus was a mere 30.8%. While Scopus covers almost twice as many journals and conferences as WoS, it covers fewer journals in the arts and humanities. A comprehensive analysis requir
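The text does not say how the cited studies defined overlap. Purely as an illustration, the Python sketch below measures overlap as shared records divided by distinct records (intersection over union) after matching records by a crude normalized-title key; the record lists, the matching rule, and the choice of measure are all assumptions.

```python
import re

def record_key(title):
    """Hypothetical matching rule: compare titles by lowercase letters and digits only."""
    return re.sub(r"[^a-z0-9]", "", title.lower())

def overlap(records_a, records_b):
    """Share of distinct records found in both sources (intersection over union)."""
    a = {record_key(t) for t in records_a}
    b = {record_key(t) for t in records_b}
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical result sets from three sources.
wos = ["A study of networks", "Paper X"]
scopus = ["A Study of Networks.", "Paper Y"]
scholar = ["Paper Y", "Paper Z"]
print(round(overlap(wos, scopus), 3))             # 0.333
print(overlap(scholar, wos + scopus))             # 0.25
```

Real comparisons usually match on more robust keys (e.g., DOI plus author and year), since titles differ in punctuation and spelling across databases.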
72. n compared to matching the term breast.
Figure 6: Scholarly Database home and search interface.
Results are displayed in sets of 20 records, ordered by a Solr internal matching score. The first column represents the record source, the second the creators, the third the year, then the title, and finally the matching score. Datasets can be downloaded as a dump for future analysis.
Figure 7: Scholarly Database search results and download interfaces.
To run burst detection (see section 3.2) over Medline abstracts, simply download matching Medline records, load medline/medline_master.csv into NWB, and run Preprocessing > Normalize Text with a space as New Separator, selecting abstract. Then run Analysis > Textual > Burst Detection with the date column set to the published year and a space as the text separator. The result is a table that shows bursting words together with their length, weight, strength, and the start and end of each burst.
3 Temporal Analysis
Science evolves over time. Attribute values of scholarly entities and their diverse aggregations
73. nature and society. Nature 435: 814-818.
Pauly, Daniel, Konstantinos I. Stergiou. 2005. Equivalence of Results from two Citation Analyses: Thomson ISI's Citation Index and Google Scholar's Service. Ethics in Science and Environmental Politics 2005: 33-35.
Persson, Olle. 2008. Bibexcel. Umeå, Sweden: Umeå University. http://www.umu.se/inforsk/Bibexcel (accessed on 7/15/08).
Porter, M. F. 1980. An Algorithm for Suffix Stripping. Program 14(3): 130-137. http://tartarus.org/martin/PorterStemmer/def.txt (accessed on 9/23/08).
Reichardt, Jörg, Stefan Bornholdt. 2004. Detecting Fuzzy Community Structure in Complex Networks with a Potts Model. Physical Review Letters 93(21): 218701.
[...] is of LIS Faculty [...] Information [...]
Salton, Gerard, C. S. Yang. 1973. On the Specification of Term Values in Automatic Indexing. Journal of Documentation 29: 351-372.
Schvaneveldt, R. (Ed.). 1990. Pathfinder Associative Networks: Studies in Knowledge Organization. Norwood, NJ: Ablex Publishing.
Schvaneveldt, R. W., F. T. Durso, D. W. Dearholt. 1985. Pathfinder: Scaling with network structures (MCCS-85-[...]).
Scott, J. P. 2000. Social Network Analysis: A Handbook. London: Sage Publications.
Siek, Jeremy, Lie-Quan Lee, Andrew Lumsdaine. 2002. The Boost Graph Library: User Guide and Reference Manual. New York: Addison-Wesley.
Skupin, André. 2000. From Metaphor to Method: Cartographic Perspectives on Information Visualization. Proceedin
74. nd run Scientometrics > Detect Duplicate Nodes using the same parameters as above, and a merge table for references will be created. To merge identified duplicate nodes, hold down the Ctrl key and select both the merge table and the co-authorship network. Then run Scientometrics > Update Network by Merging Nodes. This produces an updated network as well as a report describing which nodes were merged. The updated co-author network can be visualized using Visualization > GUESS (see the above explanation on GUESS). Figure 4 shows a layout of the combined FourNetSciResearchers dataset after setting the background color to white and using the command lines:
> [...]
> def bynumworks(n1, n2): return cmp(n1.numberOfWorks, n2.numberOfWorks)   # define a function for comparing nodes
> nodesbynumworks.sort(bynumworks)   # sort
> nodesbynumworks.reverse()   # reverse sorting, list starts with highest #
> for i in range(0, 50): nodesbynumworks[i].labelvisible = true   # make labels of most productive authors visible
Alternatively, run GUESS File > Run Script and select yournwbdirectory/sampledata/scientometrics/isi/author-nw.py. That is, author nodes are color- and size-coded by the number of papers per author. Edges are color- and thickness-coded by the number of times two authors wrote a paper together. The remaining commands identify the top 50 authors with the most papers and make their name labels visible.
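Merging nodes with a merge table amounts to redirecting every occurrence of a deleted name to the name that is kept. The sketch below rests on two assumptions. First, the merge table is reduced to a dict from the name to delete to the name to keep; this is a hypothetical format, not NWB's actual table layout. Second, numberOfWorks values are simply summed, which would overcount any paper that lists both spellings. The function name merge_nodes is also hypothetical.

```python
from collections import defaultdict


def merge_nodes(works, edges, merge_map):
    """Apply a merge table to an undirected co-author network.

    works:     {author: numberOfWorks}
    edges:     {(author_a, author_b): number of co-authored works}
    merge_map: {name_to_delete: name_to_keep}  (hypothetical format)
    """
    def keep(name):
        return merge_map.get(name, name)

    merged_works = defaultdict(int)
    for name, count in works.items():
        merged_works[keep(name)] += count      # counts of deleted nodes move to the kept node

    merged_edges = defaultdict(int)
    for (a, b), weight in edges.items():
        a, b = keep(a), keep(b)
        if a != b:                             # skip edges that would become self-loops
            merged_edges[tuple(sorted((a, b)))] += weight
    return dict(merged_works), dict(merged_edges)


works, edges = merge_nodes(
    {"Albert R": 10, "Albet R": 2, "Barabasi AL": 12},
    {("Albert R", "Barabasi AL"): 5, ("Albet R", "Barabasi AL"): 1},
    {"Albet R": "Albert R"},
)
```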
75. oints, quality (reliability or certainty), and strength. Network properties refer to the number of nodes and edges, network density, average path length, clustering coefficient, and distributions from which general properties, such as small-world, scale-free, or hierarchical, can be derived. Identifying major communities via community detection algorithms and calculating the backbone of a network via Pathfinder network scaling or maximum flow algorithms helps to communicate and make sense of large-scale networks.
6.1 Direct Linkages
6.1.1 Paper-Paper Citation Network
Papers cite other papers via references, forming an unweighted, directed paper citation graph. It is beneficial to indicate the direction of information flow, in order of publication, via arrows. References enable a search of the citation graph backwards in time. Citations to a paper support the forward traversal of the graph. Citing and being cited can be seen as roles a paper possesses (Nicolaisen 2007). In the NWB Tool, load the file yournwbdirectory/sampledata/scientometrics/isi/FourNetSciResearchers.isi using File > Load and Clean ISI File. A table of the records and a table of all records with unique ISI ids will appear in the Data Manager. In this file, each original record now has a "Cite Me As" attribute. This attribute is constructed from the first author, PY, J9, VL, and BP fields of the record's ISI entry and is used when matching paper and reference records. To ext
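The sketch below makes the matching step concrete. It assumes each record is a dict with the ISI field tags AU (author list), PY, J9, VL, BP, and CR, where CR holds cited references in the "AUTHOR, YEAR, JOURNAL, Vvol, Ppage" form. The helper names cite_me_as and build_citation_edges are hypothetical, the exact string NWB builds may be formatted differently, and the two records are illustrative example data.

```python
def cite_me_as(record):
    """Build an ISI-style reference string from the first author, PY, J9, VL, and BP fields."""
    parts = [record["AU"][0], str(record["PY"]), record["J9"]]
    if record.get("VL"):
        parts.append("V" + str(record["VL"]))
    if record.get("BP"):
        parts.append("P" + str(record["BP"]))
    return ", ".join(parts)


def build_citation_edges(records):
    """Return (citing_id, cited) pairs; cited is a record id when a reference
    matches some record's key (case-insensitively), otherwise the raw reference string."""
    key_to_id = {cite_me_as(r).upper(): r["id"] for r in records}
    edges = []
    for r in records:
        for ref in r.get("CR", []):
            edges.append((r["id"], key_to_id.get(ref.upper(), ref)))
    return edges


records = [
    {"id": "p1", "AU": ["Barabasi AL"], "PY": 1999, "J9": "SCIENCE", "VL": 286, "BP": 509, "CR": []},
    {"id": "p2", "AU": ["Albert R"], "PY": 2002, "J9": "REV MOD PHYS", "VL": 74, "BP": 47,
     "CR": ["BARABASI AL, 1999, SCIENCE, V286, P509",
            "Erdos P, 1959, PUBL MATH-DEBRECEN, V6, P290"]},
]
print(build_citation_edges(records))
# [('p2', 'p1'), ('p2', 'Erdos P, 1959, PUBL MATH-DEBRECEN, V6, P290')]
```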
76. otal award amount. The top 50 nodes with the highest total award amount are labeled.
2.4 Scholarly Database
Medline and U.S. patent data, as well as funding data provided by the National Science Foundation and the National Institutes of Health, can be downloaded from the Scholarly Database (SDB) at Indiana University. SDB supports keyword-based cross-search of the different data types, and data can be downloaded in bulk. Register to get a free account, or use Email: nwb@indiana.edu and Password: nwb to try out the functionality. Search the four databases separately or in combination for Creators (authors, inventors, investigators) or for terms occurring in Title, Abstract, or All Text, for all or specific years. If multiple terms are entered in a field, they are automatically combined using OR. So, breast cancer matches any record with breast or cancer in that field. You can put AND between terms to combine them with AND. Thus, breast AND cancer would only match records that contain both terms. Double quotation marks can be used to match compound terms. For example, "breast cancer" retrieves records with the exact phrase breast cancer, but not records where breast and cancer are both present without forming that phrase. The importance of a particular term in a query can be increased by putting a ^ and a number after the term. For instance, breast cancer^10 would increase the importance of matching the term cancer by te
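The matching rules for multiple terms, AND, and quoted phrases can be modeled in a few lines. This is a simplified sketch that assumes whole-word matching on lowercase text. It is not how SDB's Solr backend is implemented. It does not rank results and does not parse ^ boosts. The function names are hypothetical.

```python
import re


def _contains(term, words):
    """True if the (possibly multi-word) term occurs as a contiguous run of words."""
    phrase = term.lower().split()
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


def matches(query, field_text):
    """Bare terms are OR-ed; AND separates groups that must all match;
    "double quotes" require the exact phrase."""
    words = re.findall(r"\w+", field_text.lower())
    for group in re.split(r"\s+AND\s+", query.strip()):
        terms = [phrase or word for phrase, word in re.findall(r'"([^"]+)"|(\S+)', group)]
        if not any(_contains(t, words) for t in terms):
            return False
    return True


print(matches("breast cancer", "Lung cancer screening"))      # True: OR semantics
print(matches("breast AND cancer", "Lung cancer screening"))  # False: both terms required
```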
77. ou work, new files will be created in the Data Manager window. You may choose to save these files in various formats. Make sure that you have highlighted the network file that you wish to work on.
2 Basic Network Properties
As a matter of practice, you may want to learn a little about your network and confirm that it was read correctly into NWB. The Graph and Network Analysis Toolkit provides a quick overview of your network: Analysis > Network Analysis Toolkit. If you run this on the PSYCHCONSULT network, you will find that you have a directed network with 113 nodes and no isolated nodes. There are two node attributes present:
- the node label, which in this case is the job title of each employee;
- the area, which in this network is the unit in which the employee generally worked.
You can also see that you have 861 edges, no self-loops or parallel edges, and no edge attributes. A common edge attribute is a weight value, but this network is unweighted. The network is weakly connected: each node is connected to another node in the main component, and there are no isolates. It is not strongly connected, however, because some nodes are unreachable (they send but do not receive ties). You will also see the network's density reported.
[Console output: Network Analysis Toolkit (NAT) was selected. Implementer(s): Timothy K... Integrator(s): Timothy Ke... Reference: Robert Sedgewick. Algorithms in Java, Th... Algorithms. Addison-Wesley, 2002. ISBN 0-201-31863...]
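The properties NAT reports can be recomputed for a small directed network as a sanity check. The sketch assumes the network is given as a node list plus (source, target) pairs. summarize is a hypothetical helper. NAT's own density definition is assumed, not confirmed, to match the directed formula used here.

```python
from collections import deque


def reachable(start, adj):
    """Nodes reachable from start by breadth-first search over adj."""
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def summarize(nodes, edges):
    """Basic properties of a directed network (nodes must be non-empty)."""
    n, m = len(nodes), len(edges)
    out_adj, in_adj, both = {}, {}, {}
    for u, v in edges:
        out_adj.setdefault(u, []).append(v)
        in_adj.setdefault(v, []).append(u)
        both.setdefault(u, []).append(v)
        both.setdefault(v, []).append(u)
    root = nodes[0]
    return {
        "nodes": n,
        "edges": m,
        # directed density: observed edges / possible ordered pairs (no self-loops)
        "density": m / (n * (n - 1)) if n > 1 else 0.0,
        # weakly connected: every node reachable when edge direction is ignored
        "weakly_connected": len(reachable(root, both)) == n,
        # strongly connected: root reaches every node and every node reaches root
        "strongly_connected": len(reachable(root, out_adj)) == n
        and len(reachable(root, in_adj)) == n,
    }
```

Under this directed formula, the PSYCHCONSULT figures give 861 / (113 * 112), which is about 0.068.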
78. pers outside this set, and citations (green) go to papers outside this set, as well as some commonly derived networks. The extraction and analysis of these and other scholarly networks is explained subsequently.
[Figure 1 panels, partially recoverable: Paper-Paper Citation Network (citations link older papers to younger papers); Author-Author Co-Author Network; Reference Co-Occurrence (Bibliographic Coupling) Network; [...]. Local citation counts within the dataset are given in black and global citation counts (ISI times cited) in green above each paper.]
Figure 1: Sample paper network (left) and four different network types derived from it (right)
Diverse algorithms exist to calculate specific node, edge, and network properties (see Börner, Sanyal et al. 2007). Node properties comprise degree centrality, betweenness centrality, or hub and authority scores. Edge properties include durability, reciprocity, intensity (weak or strong), density (how many potential edges in a network actually exist), reachability (how many steps it takes to go from one end of a network to the other), centrality, i.e., whether a network has a center p
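Figure 1 derives several network types from one set of papers. The following hedged sketch shows one such derivation, the co-author network, assuming each paper is given as a list of author names. The function name coauthor_network is hypothetical. It returns per-author paper counts and per-pair co-authorship weights.

```python
from collections import Counter
from itertools import combinations


def coauthor_network(papers):
    """papers: one list of author names per paper.
    Returns ({author: number of papers}, {(author_a, author_b): co-authored papers})."""
    works, edges = Counter(), Counter()
    for authors in papers:
        unique = sorted(set(authors))          # ignore repeated names within one paper
        works.update(unique)
        for a, b in combinations(unique, 2):   # every pair of co-authors gets one more tie
            edges[(a, b)] += 1
    return dict(works), dict(edges)


works, edges = coauthor_network([["A", "B"], ["A", "B", "C"], ["D"]])
```

Single-author papers such as ["D"] add a node without edges, which is how isolates arise in co-author networks.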
79. [Listing of JAR files in the NWB plugins directory, garbled in extraction. Eclipse/Equinox framework plugins (e.g., equinox common, event, launcher, log, metatype, preferences, registry, useradmin, util; help; jface; jface databinding; swt; ui workbench; update configurator, core, and ui) are followed by CIShell plugins (e.g., edu.iu.nwb.gui.brand 1.0.0; reference gui datamanager, guibuilder, log, menumanager, persistence, prefs swt, scheduler, workspace; testing convertertester algorithm and core).]
80. ract the paper citation network, select the "36 Unique ISI Records" table and run Scientometrics > Extract Directed Network using the parameters [parameter dialog]. The result is a directed network of paper citations in the Data Manager. Each paper node has two citation counts. The local citation count (LCC) indicates how often a paper was cited by papers in the set. The global citation count (GCC) equals the times cited (TC) value in the original ISI file. Paper references have no GCC value, except for references that are also ISI records; currently, the NWB Tool sets the GCC of all other references to -1. This is useful to prune the network so that it contains only the original ISI records. To view the complete network, select the network, run Visualization > GUESS, and wait until the network is visible and centered. Lay out the network, e.g., with the GEM (Graph EMbedder) algorithm, via GUESS: Layout > GEM. Pack the network via GUESS: Layout > Bin Pack. To change the background color, use GUESS: Display > Background Color. To size- and color-code nodes, select the Interpreter tab at the bottom left-hand corner of the GUESS window and enter the command lines:
> resizeLinear(globalcitationcount, 1, 50)
Note: The Interpreter tab will have >>> as a prompt for the
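The two citation counts can be sketched as follows. The sketch assumes citations are given as (citing, cited) pairs, although NWB may draw the arrows in the opposite, publication-order direction. It also assumes the original records' TC values are available as a dict. Non-record references receive a negative GCC placeholder of -1 so they cannot be confused with a real times-cited count. The name citation_counts is hypothetical.

```python
def citation_counts(times_cited, citations):
    """times_cited: {record_id: TC value} for the original ISI records.
    citations: (citing, cited) pairs; cited may be a reference that is not a record.
    Returns {node: (LCC, GCC)}."""
    nodes = set(times_cited)
    for citing, cited in citations:
        nodes.update((citing, cited))
    lcc = dict.fromkeys(nodes, 0)
    for citing, cited in citations:
        if citing in times_cited:          # only citations made by papers in the set count
            lcc[cited] += 1
    # GCC comes from the ISI file; non-records get the -1 placeholder
    return {node: (lcc[node], times_cited.get(node, -1)) for node in nodes}


counts = citation_counts({"p1": 10, "p2": 3}, [("p2", "p1"), ("p2", "ref1"), ("p1", "ref1")])
```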
81. raphs is also called a simple, undirected, connected, acyclic graph or, equivalently, a connected forest. A tree with n nodes has n-1 graph edges. All trees are bipartite graphs. Many trees have a root node and are called rooted trees. Trees without a root node are called free trees. Subsequently, we will only consider rooted trees. In rooted trees, all nodes except the root node have exactly one parent node. Nodes that have no children are called leaf nodes. All other nodes are referred to as intermediate nodes. This section introduces different algorithms to visualize tree data using tree views, tree maps, radial tree graph, and balloon graph layouts.
4.1 Tree View Visualization
The tree view layout places the root node on the left of the canvas. First-level nodes are placed on an imaginary vertical line to the right of it. Second-level nodes are placed on an imaginary vertical line to the right of the first-level nodes, and so on. In the NWB tool, select a tree dataset in the Data Manager window (e.g., one generated using the Directory Hierarchy Reader). Then use Visualization > Tree View (prefuse beta), and a window similar to Figure 6 will appear displaying the Tree View visualization. If you press and hold the right or middle mouse button while moving the mouse back and forth, you can zoom in and out on the tree. Clicking a folder name such as sampledata displays all sub-folders and files inside the sampledata folder. Use the search box in th
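The tree view placement rule can be sketched in a few lines, assuming a tree stored as a dict from each node to its list of children. tree_view_layout and its spacing parameters are hypothetical, and prefuse's actual layout differs in details such as animation and node sizing. In this sketch, depth determines the x coordinate, leaves take consecutive rows, and each parent is centered on its children.

```python
def tree_view_layout(tree, root, x_step=100, y_step=20):
    """tree: {node: [children]}. Returns {node: (x, y)} with the root on the left."""
    pos = {}
    next_row = [0]                              # next free row for a leaf

    def place(node, depth):
        children = tree.get(node, [])
        if not children:                        # leaf node: take the next row
            y = next_row[0] * y_step
            next_row[0] += 1
        else:                                   # root or intermediate node
            ys = [place(child, depth + 1) for child in children]
            y = (ys[0] + ys[-1]) / 2            # center between first and last child
        pos[node] = (depth * x_step, y)         # one vertical line per level
        return y

    place(root, 0)
    return pos


tree = {"nwb": ["sampledata", "plugins"], "sampledata": ["isi", "networks"]}
pos = tree_view_layout(tree, "nwb")
```

The example tree has 5 nodes and 4 parent-child pairs, consistent with the n-1 edge count for trees.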
82. [...] Geospatial Maps [...] CiteSpace II colors nodes and edges by time. For example, in the co-citation network, papers are colored by citation year, making it easy to observe the growth of a network over time (see the detailed explanation in Figure 6).
[Figure 6 annotations: time slice; node label [...] 1980 SCIENCE V208 P1095; Times cited; Year of first co-citation; Year of publication]
Figure 6: Sketch of the figure produced by CiteSpace II
CiteSpace II highlights high betweenness centrality (BC) nodes, also called pivotal points, by purple circle borders. This can be replicated in the NWB Tool by coloring all nodes above a certain BC threshold value. However, CiteSpace II can also show, in red, the nodes whose citations increase sharply in a certain time slice. From Figure 15, we can see some remarkable nodes colored with a purple ring and red color. Take BARABASI AL 1999 SCIENCE V286 P509 as an example: it is a node with both a purple ring and red color. Using the Citation History function that CiteSpace II provides, we can see the typical characteristics of this node (see Figure 7). It shows that the node not only has high betweenness but also a sharp increase in citations in certain time slices.
Figure 7: Citation History of Node BARABASI AL 1999 SCIENCE V286 P509
In the given example, co-authorship networks generated using the NWB Tool appear to be easier to read, while CiteSpace II rend
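Replicating CiteSpace II's purple rings this way requires betweenness centrality values. The sketch below implements Brandes' algorithm for unweighted, undirected networks, followed by a threshold filter. The function names and the threshold are hypothetical, and NWB's Node Betweenness Centrality algorithm may normalize values differently.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm, unweighted and undirected. adj: {node: [neighbours]}, symmetric."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)           # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:                            # BFS that counts shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                            # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: c / 2 for v, c in bc.items()}    # each undirected pair was counted twice


def pivotal_points(adj, threshold):
    """Nodes whose betweenness exceeds the (hypothetical) threshold."""
    return sorted(v for v, c in betweenness(adj).items() if c > threshold)


star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
print(pivotal_points(star, 1.0))  # ['hub']
```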
83. rees, such as controlled vocabularies, taxonomies, or classification hierarchies, can be displayed. Users can focus on particular parts of those trees without losing context. In the NWB tool, select a tree dataset (e.g., one generated using the Directory Hierarchy Reader, first three levels, directories only). Run Visualization > Radial Tree Graph (prefuse alpha). A window similar to Figure 4 will appear displaying the Radial Tree Graph visualization. Double-click a node to focus on it and observe how the layout changes. Hovering over a node (e.g., the nwb root directory) colors it red and all its neighbors blue. As in all other Prefuse layouts, hold down the left mouse button to pan and the right button to zoom.
Figure 4: Radial Tree Graph visualization of first three levels of nwb directories
4.5 Radial Tree Graph with Annotation
Highlight the same tree dataset and select Visualization > Radial Tree Graph (prefuse beta) to set data mapping parameters such as node size, node color, node shape, ring color, edge size, and edge color. A legend will be generated automatically.
5 Graph Visualizations
Most visualization plug-ins provided in the NWB Tool are designed to lay out graphs. Examples are the
• JUNG-based Circular layout, Kamada-Kawai, Fruchterman-Reingold, and interactive Spring layout
• prefuse-based Specified, Radial Tree Graph (with or without annotation), Force Directed with
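The radial layout idea can be sketched as follows, assuming a tree given as a parent-to-children dict. In this layout, the root sits at the center and each level lies on a larger ring. Giving each subtree an angular wedge proportional to its leaf count is one common choice, not necessarily what the prefuse implementation does. The function names are hypothetical.

```python
import math


def count_leaves(tree, node):
    children = tree.get(node, [])
    return 1 if not children else sum(count_leaves(tree, c) for c in children)


def radial_layout(tree, root, ring_gap=1.0):
    """Place each node on a ring whose radius is depth * ring_gap; each subtree
    gets an angular wedge proportional to its number of leaves."""
    pos = {}

    def place(node, depth, start, end):
        angle = (start + end) / 2               # node sits in the middle of its wedge
        r = depth * ring_gap
        pos[node] = (r * math.cos(angle), r * math.sin(angle))
        children = tree.get(node, [])
        total = sum(count_leaves(tree, c) for c in children)
        span = end - start                      # fixed before start is advanced below
        for c in children:
            width = span * count_leaves(tree, c) / total
            place(c, depth + 1, start, start + width)
            start += width

    place(root, 0, 0.0, 2 * math.pi)
    return pos


pos = radial_layout({"root": ["a", "b"], "a": ["a1", "a2"]}, "root")
```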
84. ringLength to spread out nodes.
Figure 7: Force Directed with Annotation Prefuse layout (node labels include Strozzi, Medici, Tornabuoni, Albizzi)
5.3 GUESS Visualizations
Here we use the sample dataset yournwbdirectory/sampledata/networks/florentine.nwb to demonstrate how to visualize a network in GUESS. Feel free to compute an additional node attribute, Betweenness Centrality, before you visualize the network: run Analysis > Unweighted & Undirected > Node Betweenness Centrality with default parameters. Select the network to be visualized and run Visualization > GUESS to open GUESS with the file loaded. The network might take some time to load. The initial layout will be random. Wait until the random layout is completed and the network is centered. GUESS has three windows:
1. Information window, to examine node and edge attributes (see Figure 8, left)
2. Visualization window, to view and manipulate the network (see Figure 8, right)
3. Interpreter/Graph Modifier Window, to analyze and change network properties (below the Visualization window in Figure 8)
Figure 8: GUESS Information window, Visualization window, and Interpreter/Graph Modifier window
Network Layout and Interaction
GUESS provides different layout algorithms under the menu item Layout. Apply Layout > GEM to the Florentine network. Use
85. rk Analysis. New York: Cambridge University Press.
Chen, Chaomei. 1999. Visualizing semantic spaces and author co-[...] Processing and Management 35(3): 401-420.
Chen, Chaomei. 2006. CiteSpace II: Detecting and Visualizing Emerging Trends and Transient Patterns in Scientific Literature. JASIST 57(3): 359-377.
Cisco Systems, Inc. 2004. Network Analysis Toolkit. http://www.cisco.com/univercd/cc/td/doc/product/natkit/index.htm (accessed on 7/15/08).
Csárdi, Gábor, Tamás Nepusz. 2006. The igraph software package for complex network research. http://necsi.org/events/iccs6/papers/el602a3c126ba822d0bc429337e.pdf (accessed on 7/17/08).
Cyberinfrastructure for Network Science Center. 2008. Cyberinfrastructure Shell. http://cishell.org (accessed on 7/17/08).
Cytoscape Consortium. 2008. Cytoscape. http://www.cytoscape.org/index.php (accessed on 7/15/08).
Davidson, G. S., B. N. Wylie, K. W. Boyack. 2001. Cluster stability and the use of noise in interpretation of clustering. Paper presented at the Proceedings of IEEE Information Visualization, pp. 23-30.
de Moya-Anegón, Félix, Zaida Chinchilla-Rodríguez, Benjamín Vargas-Quesada, Elena Corera-Álvarez, Francisco José Muñoz-Fernández, Antonio González-Molina, Víctor Herrero-Solana. 2007. Coverage Analysis of Scopus: A Journal Metric Approach. Scientometrics 73(1): 53-78.
Deerwester, S., S. T. Dumais, G. W. Furnas, T. K. Landauer, R. Harshman. 1990. Indexing by Latent Semantic Analysis. Journa
86. rlapping cliques (Palla, Derényi, Farkas & Vicsek 2005), and the algorithm of Reichardt and Bornholdt, which models community structure as identical equilibrium spin states (Reichardt & Bornholdt 2004).
7.4 Large Network Layout
The NWB Tool supports the layout of networks with up to 10 million nodes via DrL, formerly called VxOrd (Davidson, Wylie & Boyack 2001). For example, it is possible to select a co-citation network and run Visualization > DrL (VxOrd) with parameters. The result is a "Laid Out Network" file that contains x, y positions for all nodes. The file can be visualized using Visualization > Pre-defined Positions (prefuse beta) with parameters, or using GUESS. Note that only node positions are generated; color and size coding of nodes and edges has to be done in a separate step.
9 Comparison with Other Tools
9.1 General Comparison
The NWB Tool has a number of unique features. It is open source, highly flexible, easily extendable, and scalable to very large networks. Table 1 provides an overview of existing tools used in scientometrics research (see also Börner, Herr II & Fekete 2007; Fekete & Börner 2004). The tools are sorted by the date of their creation. Domain refers to the field in which they were originally developed, such as social science (SocSci), scientometrics (Scientom), biology (Bio), geography (Geo), and computer science (CS). Coverage aims to capture the general functionality and types of a
87. s.
Figure 14: Co-authorship network with the data format from Scopus
Figure 15: Co-authorship network with converted data format
9.5.5 Other Functionality
In addition to the programs above, Leydesdorff's software also includes:
- Acc2ISI.exe, for the reverse route of turning databases exported from MS Access into the tagged format of the Web of Science;
- IntColl.exe, for the analysis and visualization of international collaboration;
- InstColl.exe, for the analysis and visualization of institutional collaboration;
- GScholar.exe, for organizing Google Scholar files into files for relational database management (MS Access, dBase);
- Google.exe, for organizing Google files into files for relational database management (MS Access, dBase).
Some of these programs are also available for Chinese, Korean, and Dutch data. NWB does not currently possess these functionalities. NWB, however, can extract directed networks, paper citation networks, author-paper networks, and document co-citation networks, detect bursts, analyze time slices, and so on.
Acknowledgements
Network Workbench investigators are Dr. Katy Börner, Dr. Albert-László Barabási, Dr. Santiago Schnell, Dr. Alessandro Vespignani, Dr. Stanley Wasserman, and Dr. Eric A. Wernert. The NWB project benefits from input by advisory board members Craig Alan Stewart, Noshir Contractor, James Hendler, Jason Leigh, Neo Martinez, Michael Macy, Ulr
88. s 241 nodes and 1,508 edges in 12 weakly connected components. This network can be visualized in GUESS (see Figure x and the above explanation). Nodes and edges can be color- and size-coded, and the top 20 most-cited papers can be labeled, by entering the following lines in the GUESS Interpreter:
> resizeLinear(globalcitationcount, 2, 40)
> colorize(globalcitationcount, (200,200,200), (0,0,0))
> resize[...]
> colorize([...], black)
> def bytc(n1, n2): return cmp(n1.globalcitationcount, n2.globalcitationcount)
> [...]
> for i in range(0, 20): toptc[i].labelvisible = true
Alternatively, run GUESS File > Run Script and select yournwbdirectory/sampledata/isi/reference-co-occurence-nw.py.
Figure 6: Reference Co-occurrence Network Layout for FourNetSciResearchers dataset
6.3 Co-Citation Linkages
Two scholarly records are said to be co-cited if they jointly appear in the list of references of a third paper (see Figure 4). The more often two units are co-cited, the higher their presumed similarity.
6.3.1 Document Co-Citation Network (DCA)
DCA was simultaneously and independently introduced by Small and Marshakova in 1973 (Marshakova 1973; Small 1973; Small & Greenlee 1986). It is the logical opposite of bibliographic coupling. The co-citation frequency equals the number of times two papers are cited together, i.e., how often they appear together in one reference list (see Figure 7). In the NWB Tool, select
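Co-citation frequency can be computed directly from reference lists. The sketch assumes each citing paper's references are given as a list of identifiers, and the function name cocitation is hypothetical. Each unordered pair of distinct works gains 1 for every reference list that contains both.

```python
from collections import Counter
from itertools import combinations


def cocitation(reference_lists):
    """reference_lists: one list of cited works per citing paper.
    Returns {(work_a, work_b): number of papers citing both}."""
    weights = Counter()
    for refs in reference_lists:
        for a, b in combinations(sorted(set(refs)), 2):
            weights[(a, b)] += 1
    return dict(weights)


cc = cocitation([["X", "Y", "Z"], ["X", "Y"], ["Y", "Z"]])
```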
89. se commands. It is not necessary to type >>> at the beginning of the line. Type each line individually and press Enter to submit the commands to the Interpreter. For more information, refer to the GUESS tutorial. This way, nodes are linearly size- and color-coded by their GCC, and edges are green, as shown in Figure 2 (left). Any field within the network can be substituted to code the nodes. To view the available fields, open the Information Window (Display > Information Window) and mouse over a node, as in Figure 2 (right). Also note that each ISI paper record in the network has a dandelion-shaped set of references. The GUESS interface supports pan and zoom, node selection, and details on demand (see the GUESS tutorial). For example, the node that connects the Barabási-Vespignani network in the upper left to Garfield's network in the lower left is Price 1986, Little Science, Big Science. The network on the right is centered on Wasserman's works.
Figure 2: Directed, unweighted paper-paper citation network for FourNetSciResearchers dataset, with all papers and references in the GUESS user interface (left), and a pruned paper-paper citation network after removing all references and isolates (right)
The complete network can be reduced to papers that appeared in the original ISI file by deleting all nodes that have a GCC of -1. Run Preprocessing > Extract Nodes Above or Below Value with parameter values
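The two pruning steps can be sketched as follows. First, drop nodes whose GCC marks them as non-record references; then delete isolates. The sketch assumes nodes map to their GCC and that non-record references carry a negative placeholder such as -1. The function name prune is hypothetical.

```python
def prune(gcc, edges, min_gcc=0):
    """gcc: {node: GCC}; edges: list of (u, v) pairs.
    Keep nodes with GCC >= min_gcc, then drop nodes left without any edge."""
    kept = {n for n, value in gcc.items() if value >= min_gcc}
    kept_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    connected = {n for edge in kept_edges for n in edge}   # isolates have no edge
    return sorted(kept & connected), kept_edges


nodes, edges = prune({"p1": 10, "p2": 3, "p3": 5, "ref1": -1},
                     [("p2", "p1"), ("p3", "ref1")])
```

Here p3 survives the GCC filter but becomes an isolate once ref1 is removed, so the second step drops it too.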
90. t and to determine if the trend is significant.
3.1 Charting Trends
Documentation coming soon.
3.2 Burst Detection
A scholarly dataset can be understood as a discrete time series, in other words, a sequence of events or observations ordered in one dimension: time. Observations (here, papers) exist for regularly spaced intervals, e.g., each month, volume, or year. Kleinberg's burst detection algorithm (Kleinberg 2002) identifies sudden increases in the usage frequency of words. These words may be author names, journal names, country names, references, ISI keywords, or terms used in the title and/or abstract of a paper. Rather than using plain frequencies of word occurrences, the algorithm employs a probabilistic automaton whose states correspond to the frequencies of individual words. State transitions correspond to points in time around which the frequency of the word changes significantly. The algorithm generates a ranked list of the word bursts in the document stream, together with the intervals of time in which they occurred. This can serve as a means of identifying topics, terms, or concepts important to the events being studied that increased in usage, were more active for a period of time, and then faded away. In the NWB Tool, the algorithm can be found under Analysis > Textual > Burst Detection. As the algorithm itself is case-sensitive, care must be taken if the user desires KOREA and korea and
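The automaton idea can be illustrated with a simplified two-state version of Kleinberg's batched model. The sketch assumes documents are grouped into time batches (e.g., years) and that, for each batch, we know how many documents contain the word. The parameter names s and gamma follow Kleinberg's paper. The function name burst_states and the default values are illustrative and are not necessarily NWB's settings. The cheapest state sequence is found with the Viterbi algorithm, where moving up into the burst state costs gamma * ln(n).

```python
import math


def burst_states(r, d, s=2.0, gamma=1.0):
    """Two-state burst detection over batches (simplified from Kleinberg 2002).

    r[t]: documents in batch t containing the word; d[t]: all documents in batch t.
    Returns one state per batch: 0 = baseline rate, 1 = elevated (bursting) rate.
    """
    n = len(r)
    total = sum(d)
    p0 = sum(r) / total if total else 0.0     # baseline fraction of documents with the word
    if p0 <= 0 or p0 >= 1:
        return [0] * n                        # nothing can burst
    p = (p0, min(s * p0, 0.9999))             # burst state: s times the baseline rate
    tau = gamma * math.log(n)                 # cost of moving up into the burst state

    def cost(i, t):
        # Negative log-likelihood of r[t] out of d[t] under rate p[i]
        # (the binomial coefficient is the same for both states, so it is dropped).
        return -(r[t] * math.log(p[i]) + (d[t] - r[t]) * math.log(1 - p[i]))

    best = [cost(0, 0), tau + cost(1, 0)]     # the automaton starts in state 0
    back = []
    for t in range(1, n):
        new, ptr = [], []
        for j in (0, 1):
            cands = [best[i] + (tau if j > i else 0.0) for i in (0, 1)]
            i_best = 0 if cands[0] <= cands[1] else 1
            new.append(cands[i_best] + cost(j, t))
            ptr.append(i_best)
        best = new
        back.append(ptr)
    # Trace back the cheapest state sequence.
    state = 0 if best[0] <= best[1] else 1
    states = [state]
    for ptr in reversed(back):
        state = ptr[state]
        states.append(state)
    return states[::-1]


print(burst_states([1, 1, 1, 10, 10, 1, 1], [10] * 7))  # [0, 0, 0, 1, 1, 0, 0]
```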
91. t count until one or more sinks (nodes with no out-links) are reached. In Hummon and Doreian's approach, there is almost always a single main path that starts at the source and follows the highest count outwards at the first step. Rarely, there might be a tie, in which case there could be multiple main paths. Later research prefers the network of all main paths, which is more analogous to Pathfinder network scaling (Verspagen 2007).
7.3 Community Detection
Diverse algorithms exist to identify sub-networks or communities in large-scale networks. The simplest method is to extract individual connected components: maximal sets of nodes in which every node can be reached from every other node when edge direction is ignored. This Weak Component Clustering is a useful technique from graph theory because network algorithms generally work independently on each component, as no edges exist between components. In the NWB Tool, running Analysis > Unweighted & Undirected > Weak Component Clustering on the FourNetSciResearchers co-authorship network results in three unconnected components that can be visualized separately (see Garfield's co-authorship network in Figure 2 and the complete co-authorship network in Figure 3). In weighted networks, e.g., co-occurrence or co-citation networks (see section Co-occurrence Linkages), thresholds can be applied. For example, all edges below a certain weight can be omitted, leading to a less dense, possibly unconnected network. As an examp
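Weak component clustering, optionally applied after dropping low-weight edges, can be sketched with a union-find structure. The sketch assumes edges are given as (u, v, weight) triples and treats every edge as undirected, which is what makes the components weak. weak_components is a hypothetical name.

```python
def weak_components(nodes, edges, min_weight=None):
    """Union-find over direction-ignored edges. edges: (u, v, weight) triples.
    If min_weight is given, edges below it are dropped first.
    Returns components as node lists, largest first."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]       # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v, w in edges:
        if min_weight is not None and w < min_weight:
            continue                            # thresholding removes weak ties
        parent[find(u)] = find(v)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values(), key=len, reverse=True)


nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b", 5), ("b", "c", 1), ("d", "e", 2)]
```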
92. ted. Table 1 shows the result of merging "Albet R" and "Albert R": "Albet R" will be deleted, and all of its linkages and citation counts will be added to "Albert R".
Table 1: Merging of author nodes using the merge table
A merge table can be generated automatically by applying the Jaro distance metric (Jaro 1989, 1995) to identify potential duplicates. The metric is available in the open-source Similarity Measure Library (http://sourceforge.net/projects/simmetrics). In the NWB Tool, select the co-author network and run Scientometrics > Detect Duplicate Nodes using the parameters. The result is a merge table in the same format as Table 1, together with two textual log files. The log files describe, in a more human-readable form, which nodes will or will not be merged. Specifically, the first log file lists the nodes that will be merged (right-click and select view to examine the file). The second log file lists nodes that are similar but will not be merged. Based on this information, the automatically generated merge table can be further modified as needed. In sum, unification of author names can be done manually or automatically, independently or in conjunction. It is recommended to create the initial merge table automatically and to fine-tune it as needed. Note that the same procedure can be used to identify duplicate references: select a paper citation network a
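For reference, the Jaro similarity used to flag potential duplicates can be sketched as below. This is the textbook definition, written from scratch rather than taken from SimMetrics. candidate_duplicates and its 0.9 threshold are hypothetical, and the parameters of NWB's Detect Duplicate Nodes may differ.

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)   # how far apart matching characters may be
    used1, used2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    half_transpositions, k = 0, 0
    for i in range(len1):
        if used1[i]:
            while not used2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def candidate_duplicates(names, threshold=0.9):
    """Pairs of names whose Jaro similarity reaches the (hypothetical) threshold."""
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if jaro(a, b) >= threshold]


print(candidate_duplicates(["Albert R", "Albet R", "Barabasi AL"]))  # [('Albert R', 'Albet R')]
```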
93. ure 5.
Figure 5: Undirected, weighted word co-occurrence network for FourNetSciResearchers dataset
6.2.3 Cited Reference Co-Occurrence (Bibliographic Coupling) Network
Papers, patents, or other scholarly records that share common references are said to be coupled bibliographically (Kessler 1963; see Figure 4). The bibliographic coupling (BC) strength of two scholarly papers can be calculated by counting the number of times that they reference the same third work in their bibliographies. The coupling strength is assumed to reflect topic similarity. Co-occurrence networks are undirected and weighted. In the NWB Tool, a bibliographic coupling network is derived from a directed paper citation network (see section Paper-Paper Citation Networks). Select the paper citation network of the FourNetSciResearchers dataset in the Data Manager. Run Scientometrics > Extract Reference Co-Occurrence (Bibliographic Coupling) Network, and the bibliographic coupling network becomes available in the Data Manager. Running Analysis > Network Analysis Toolkit (NAT) reveals that the network has 5,335 nodes (5,007 of which are isolate nodes) and 6,206 edges. Edges with low weights can be eliminated by running Preprocessing > Extract Edges Above or Below Value with parameter values [parameter dialog]. Isolate nodes can be removed by running Preprocessing > Delete Isolates. The resulting network ha
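Bibliographic coupling strength and the edge-weight threshold can be sketched together. The sketch assumes each paper's references are given as a set of identifiers. The function name and the min_weight parameter are hypothetical stand-ins for the Extract Reference Co-Occurrence and Extract Edges Above or Below Value steps.

```python
from collections import Counter
from itertools import combinations


def bibliographic_coupling(references, min_weight=1):
    """references: {paper: set of cited works}.
    Coupling strength of (a, b) = number of works both papers cite.
    Pairs below min_weight are dropped."""
    citing = {}                                     # invert: work -> papers citing it
    for paper, refs in references.items():
        for ref in set(refs):
            citing.setdefault(ref, []).append(paper)
    strength = Counter()
    for papers in citing.values():
        for a, b in combinations(sorted(papers), 2):
            strength[(a, b)] += 1
    return {pair: w for pair, w in strength.items() if w >= min_weight}


refs = {"P1": {"R1", "R2", "R3"}, "P2": {"R2", "R3"}, "P3": {"R3", "R4"}}
```

Papers that share no references with any other paper never appear in a pair, which corresponds to the isolate nodes that NAT reports.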
94. ve of many data formats.
[Table 1, continued; partially recoverable rows: BibExcel (Persson, 2008, Scientom): transforms bibliographic data into forms usable in Excel, Pajek, NetDraw, and other programs; Graphical; No; Windows. Publish or Perish (2008, Scientom): data collection from Google Scholar and analysis focusing on measures of research impact; Windows, Linux.]
Many of these tools are very specialized and capable. For instance, BibExcel and Publish or Perish are great tools for bibliometric data acquisition and analysis. HistCite and CiteSpace each support very specific insight needs, from studying the history of science to identifying scientific research frontiers. The S&T Dynamics Toolbox provides many algorithms commonly used in scientometrics research, and it provides bridges to more general tools. Pajek and UCINET are very versatile, powerful network analysis tools that are widely used in social network analysis. Cytoscape is excellent for working with biological data and visualizing networks. The Network Workbench Tool has fewer analysis algorithms than Pajek and UCINET and less flexible visualizations than Cytoscape. Network Workbench, however, makes it much easier for researchers and algorithm authors to integrate new and existing algorithms and tools that take in diverse data formats. The OSGi component architecture and the CIShell algorithm architecture built on top of OSGi make this possible.
95. ve the file, then open it using the spreadsheet program of your choice. The table has 6 columns: the bursting word (here, author names), the length of the burst, the burst weight, the burst strength, and the burst start and end year. Note that words can burst multiple times. If they do, the burst weight indicates how much support there is for the burst above the previous bursting level, while the strength indicates how much support there is for the burst over the non-bursting baseline. Since the burst detection algorithm was run with bursting state 1, i.e., it modeled only one burst per word, the burst weight is identical to the burst strength in this output. To generate a visual depiction of the bursts in MS Excel, perform the following steps: (1) Sort the data in ascending order by burst start year. (2) Add column headers for all years, i.e., enter the first start year in G1 and continue until the highest burst end year (here 2004) in cell AF1. (3) In the resulting word-by-burst-year matrix, select the upper-left blank cell G2 and, using Format > Conditional Formatting, color-code the cells according to burst value. To color cells red for years with a burst total power value of 10 or more, and cells with a higher value dark red, use formulas and format patterns such as =AND(G$1>=$E2, G$1<=$F2, $D2>=10). Apply the format to all cells in the word-by-year matrix. Th
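The same word-by-year matrix can also be built programmatically, for example to check the spreadsheet result. The sketch assumes the burst table rows are (word, length, weight, strength, start, end) tuples and that start and end years are inclusive. burst_matrix is a hypothetical name, and a None cell stands for a blank spreadsheet cell.

```python
def burst_matrix(bursts, first_year, last_year, threshold=10):
    """bursts: rows of (word, length, weight, strength, start, end).
    Rows are sorted by start year. A cell holds the burst strength when the year
    lies inside the burst and the strength reaches `threshold`, otherwise None."""
    years = list(range(first_year, last_year + 1))
    rows = []
    for word, _length, _weight, strength, start, end in sorted(bursts, key=lambda b: b[4]):
        cells = [strength if start <= y <= end and strength >= threshold else None
                 for y in years]
        rows.append([word] + cells)
    return ["Word"] + years, rows


header, rows = burst_matrix(
    [("network", 3, 12.5, 12.5, 2001, 2003), ("citation", 2, 8.0, 8.0, 1999, 2000)],
    1999, 2004,
)
```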
96. veal shared schools of thought or methodological approaches, common subjects of study, collaborative and student-mentor relationships, ties of nationality, etc. Some regions of scholarship are densely crowded and interactive. Others are isolated and nearly vacant. The NWB Tool will support this functionality in the near future.
7 Analyzing and Visualizing Large Networks
Most network analysis and visualization algorithms do not scale to millions of nodes. Even if a layout is produced, it is often hard to interpret. Therefore, it is beneficial to employ algorithms that help identify the backbone of a network (i.e., its major connections) or its communities.
7.1 Basic Network Properties
The Analysis > Network Analysis Toolkit (NAT) (Cisco Systems 2004) can be applied to networks of any size to compute basic network properties; see the sample output for the FourNetSciResearchers dataset in Figure 1.
[NAT output excerpt: Did not detect any nonnumeric attributes. Density (disregarding weights): 0.02909]
Figure 1: Network Analysis Toolkit (NAT) result for FourNetSciResearchers dataset
This is especially important if networks are large, as the network properties suggest certain data reduction approaches. For example, if a network is unconnected, it might be beneficial to lay out components separately; existing components can be identified using Analysis > Unweighted & Undirected > Connected Components. If a network is very dense
