Science of Science (Sci2) Tool User Manual
Contents
1. [Cover figure: a map of scientometrics terms (SCIENTISTS, PUBLICATIONS, UNIVERSITIES, INDICATORS, IMPACT FACTOR, COLLABORATION, PUBLICATION OUTPUT, RELATIVE CITATION IMPACT, CITATION ANALYSIS, and others) plotted along a yearly timeline starting in 1990.]
2. Interpreter panel supports Jython, a version of Python that runs on the Java Virtual Machine. Users can write code in the interpreter to modify the layout and its design at a high level of detail. Here we list some exemplary GUESS commands which can be used to modify the layout.

Color all nodes uniformly:
    g.nodes.color = red          # circle filling
    g.nodes.strokecolor = red    # circle ring
    g.nodes.labelcolor = red     # circle label
    colorize(numberofworks, gray, black)
    for n in g.nodes:
        n.strokecolor = n.color

Size code nodes:
    g.nodes.size = 30
    resizeLinear(numberofworks, .25, 8)

Label:
    for i in range(0, 50):       # make labels of most productive authors visible
        nodesbynumworks[i].labelvisible = true

Print:
    for i in range(0, 10):
        print str(nodesbydegree[i].label) + ": " + str(nodesbydegree[i].indegree)

Edges:
    g.edges.width = 10
    g.edges.color = gray

Color and resize nodes based on their betweenness:
    colorize(wealth, white, red)
    resizeLinear(sitebetweenness, 5, 20)

The result is shown in Figure 4.14. Read https://nwb.slis.indiana.edu/community/?n=VisualizeData.GUESS for more information on how to use the interpreter.

[Figure 4.14: GUESS window showing the resulting network, with the field/value panel (color, fixed, height, label, labelcolor, labelsize, labelvisible) open for the node labeled Acciaiuoli.]

Science of Science (Sci2) Tool User Manual, Version Alpha 3
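To make the effect of a linear resize easier to picture, here is a minimal sketch in plain Python. It is not GUESS code: the function name resize_linear and the sample values are hypothetical, and the sketch assumes the mapping is a simple min-max interpolation of an attribute onto a size range.

```python
def resize_linear(values, new_min, new_max):
    """Map each value linearly from [min(values), max(values)] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal: place every item in the middle of the target range.
        return [(new_min + new_max) / 2.0 for _ in values]
    scale = (new_max - new_min) / float(hi - lo)
    return [new_min + (v - lo) * scale for v in values]


# Hypothetical per-node attribute values (e.g. number of works per author).
number_of_works = [1, 3, 5, 9]
sizes = resize_linear(number_of_works, 1, 50)
```

The smallest attribute value receives the smallest size and the largest value the largest size; everything in between is spaced proportionally.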
3. [Dialog screenshot: network extraction parameters with Text Delimiter and Aggregate Function File fields; the file path ends in sampledata/scientometrics/properties/isiPaperCitation.properties.]

The result is a directed network of paper citations in the Data Manager. Each paper node has two citation counts. The local citation count (LCC) indicates how often a paper was cited by papers in the set. The global citation count (GCC) equals the times cited (TC) value in the original ISI file. Paper references have no GCC value, except for references that are also ISI records. Currently, the Sci2 Tool sets the GCC of references to -1, except for references that are also ISI records. This is useful to prune the network to contain only the original ISI records.

To view the complete network, select the network, run Visualization > Networks > GUESS, and wait until the network is visible and centered. Lay out the network, e.g., using the Generalized Expectation Maximization (GEM) algorithm, via GUESS: Layout > GEM. Pack the network via GUESS: Layout > Bin Pack. To change the background color, use GUESS: Display > Background Color. To size and color code nodes, select the Interpreter tab at the bottom left-hand corner of the GUESS window and enter the command lines:

    > resizeLinear(globalcitationcount,1,50)
    > colorize(globalcitationcount,gray,black)
    > for e in g.edges:
          e.color="127,193,65,255"    (enter a tab afte
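The difference between the two citation counts can be sketched in a few lines of Python. This is an illustration only, not the tool's implementation: the paper identifiers, the times-cited values, and the use of -1 as the marker for references that are not ISI records are assumptions made for the example.

```python
from collections import Counter

# Hypothetical (citing paper, cited paper) pairs found in the loaded set.
citations = [("A", "B"), ("C", "B"), ("C", "A"), ("A", "R1")]

# Hypothetical "times cited" (TC) values; R1 appears only as a reference.
times_cited = {"A": 12, "B": 40, "C": 3}

# Local citation count: how often a paper is cited by papers in the set.
local_count = Counter(cited for _, cited in citations)

papers = {p for pair in citations for p in pair}
nodes = {
    p: {"lcc": local_count.get(p, 0), "gcc": times_cited.get(p, -1)}
    for p in papers
}

# Pruning on the marker keeps only the original records.
original_records = sorted(p for p, attrs in nodes.items() if attrs["gcc"] >= 0)
```

Because references without their own record carry the marker value, a single threshold on the global count is enough to separate them from the original records.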
4. Preprocessing, Analysis, and Visualization. Similarly, a geographic map requires only Geospatial algorithms. Find detailed information on each menu item in section 3.1 Sci2 Tool Plugins.

2.2.1.4 Analysis

Once data is loaded, prepared, and processed with whatever features are needed, analysis is possible in each of the four domains: temporal, geospatial, topical, or network. Analysis results can be used on their own or in conjunction with visualizations to gain insight into a dataset. The Sci2 Tool features predominantly network analysis algorithms; however, the tool also supports geocoding of table data and burst analysis for topical or temporal studies (see section 4 Workflow Design). Find detailed information on each menu item in section 3.1 Sci2 Tool Plugins.

2.2.1.5 Modeling

The Sci2 Tool supports the creation of new networks via pre-defined models. Learn more about modeling in section 4.10 Modeling (Why).

2.2.1.6 Visualization

Once all previous data steps are complete, the Sci2 Tool can visualize the results. The most popular choice for visualizing networks is the GUESS toolkit, or DrL for much larger scale networks. Geocoded data can be represented on a map of the United States or a map of the world, and temporal or topical data can be viewed using the horizontal bar graph. Find detailed information on each menu item in section 3.1 Sci2 Tool Plugins.

2.2.1.7 Help

The Help menu leads to online documentation ad
5. 7 Extending the Sci2 Tool

7.1 CIShell Basics

The CIShell Platform has been specifically designed around the idea of the algorithm. It is the central and most important concept. Algorithms are fully defined and self-contained bits of execution. They can do many things: data conversion, data analysis, and even spawning whole external programs if needed. Algorithms are well-defined black boxes which can contain either Java code or any program which can be compiled. Creating new algorithms is the primary method to extend a CIShell tool's functionality or to create new CIShell-based tools.

CIShell is based on OSGi, which is a plugin and service based framework. Practically, this means that OSGi functionality is divided into plugins, or bundles (Java jar files with some additional special files), each of which contains code to create some number of services at runtime. These services are the main actors in the OSGi environment. In CIShell, almost all services are algorithms, which means they conform to a certain interface, allowing algorithms to interoperate with the CIShell environment and each other in a well-defined way. Specifically, algorithms accept Data, user input parameters, and a CIShellContext. They output Data.

[Diagram: user-entered parameters, Data with key/value metadata, and the CIShellContext as inputs to an algorithm.]

Figure 8.1: Input data, parameters, metadata of dataset, and alg
6. [Figure graphic: co-PI network with labeled nodes including Marlon Pierce, Geoffrey Fox, Craig Stewart, David McCaulay, Beth Plale, Yogesh Simmhan, David Leake, Matthew Shepherd, Eric Werner, and Thomas Hacker, plus a legend for edge size and color.]

Figure 5.16: Largest component of Indiana University co-PI network. Node size and color display the total award amount.

5.2.2 Funding Profiles of Three Universities (NSF Data) Using Database

The Sci2 Tool supports the creation of databases for NSF files. Database loading improves the speed and functionality of data preparation and preprocessing. To use this feature, select the Indiana NSF file in the Data Manager and go to File > Load Into Database > Load NSF File Into Database. Cleaning should be performed before any other task, using Data Preparation > Database > NSF > Merge Identical NSF People. To view a breakdown of each investigator from Indiana, run Data Preparation > Database > NSF > Extract Investigators and then right-click on the table in the Data Manager to view it. Next to each investigator will be listed their total number of awards (total, as the PI, and as a Co-PI), the total amount awarded to date, and their earliest award start date and latest award expiration date.

To create Co-PI networks like those from the previous workflows, simply run Data Preparation >
7. Taverna allows users to integrate many different software tools, including over 30,000 web services from many different domains, such as chemistry, music, and social sciences. The myExperiment (http://www.myexperiment.org) social web site supports finding and sharing of workflows and has special support for Taverna workflows (De Roure, Goble et al. 2009). Currently, Taverna uses Raven at its core, but a reimplementation using OSGi is underway.

MAEviz (https://wiki.ncsa.uiuc.edu/display/MAE/Home), managed by Shawn Hampton, NCSA, is an open-source, extensible software platform which supports seismic risk assessment based on the Mid-America Earthquake (MAE) Center research in the Consequence-Based Risk Management (CRM) framework (Elnashai, Spencer et al. 2008). It uses the Eclipse Rich Client Platform (RCP), which includes Equinox, a component framework based on the OSGi standard. The 125 MAEviz plugins consist of 6 core plugins; 7 plugins related to the display of hazard, building, bridge, and lifeline data; 11 network and social science plugins; and 2 report visualization plugins. Bard (previously NCSA GIS) has 11 core plugins, 2 relevant for networks, and 10 for visualization. The analysis framework has 6 core plugins. Ogrescript has 14 core plugins. A total of 54 core Eclipse OSGi plugins are used, such as org.eclipse.core, org.eclipse.equinox, org.eclipse.help, org.eclipse.osgi, org.eclipse.ui, and org.eclipse.update (https://wiki.ncsa.uiuc.edu
8. To use the database support, load yoursci2directory/sampledata/scientometrics/isi/FourNetSciResearchers.isi using File > Load instead of File > Load and Clean ISI File. Now run File > Load Into Database > Load ISI File Into Database. View the database schema by right-clicking on the loaded database in the Data Manager and clicking View.

[Screenshot: Sci2 Tool window with the menu bar (File, Data Preparation, Preprocessing, Analysis, Modeling, Visualization, Help) and the Console, Data Manager, and Scheduler panels. The console text reads: Primary investigators are Katy Börner, Indiana University and Kevin W. Boyack, SciTech Strategies Inc. The Sci2 Tool was developed by Micah W. Linnemeier, Russell J. Duhon, Patrick A. Phillips, Chintan Tank, and Joseph Biberstine. It uses the Cyberinfrastructure Shell (http://cishell.org) developed at the Cyberinfrastructure for Network Science Center (http://cns.slis.indiana.edu) at Indiana University. Many algorithm plugins were derived from the Network Workbench Tool (http://nwb.slis.indiana.edu). A Notepad window shows the start of the schema, e.g. an ADDRESS table with an INTEGER primary key (PK) and VARCHAR columns such as ADDRESS CITY, and an AUTHORS table with INTEGER foreign keys (FK) to DOCUMENT and PERSON.]
9. [Table rows, fields in the order extracted:]
... | 2006 | ... SocSci, Scientom | Network analysis & visualization tool conducive to new algorithms, supportive of many data formats | Graphical | Yes | All Major | Huang 2007
BibExcel | 2006 | Scientom | Transforms bibliographic data into forms usable in Excel, Pajek, NetDraw, and other programs | Graphical | Windows | Persson 2008
Publish or Perish | 2007 | Scientom | Data Collection and Analysis | Harvests and analyzes data from Google Scholar, focusing on measures of research impact | Web-based | No | Windows, Linux | Harzing 2008

Many of these tools are very specialized and capable. For instance, BibExcel and Publish or Perish are great tools for bibliometric data acquisition and analysis. HistCite and CiteSpace each support very specific insight needs, from studying the history of science to the identification of scientific research frontiers. The S&T Dynamics Toolbox provides many algorithms commonly used in scientometrics research, and it provides bridges to more general tools. Pajek and UCINET are very versatile, powerful network analysis tools that are widely used in social network analysis. Cytoscape is excellent for working with biological data and visualizing networks. The Network Workbench Tool has fewer analysis algorithms than Pajek and UCINET, and less flexible visualizations than Cytoscape. Network Workbench, however, makes it much easier for researche
10. By Katy Börner, Luca Dall'Asta, Weimao Ke & Alessandro Vespignani

This article introduces a suite of approaches and measures to study the impact of co-authorship teams based on the number of publications and their citations on a local and global scale. In particular, we present a novel weighted graph representation that encodes coupled author-paper networks as a weighted co-authorship graph. This weighted graph representation is applied to a dataset that captures the emergence of a new field of science and comprises 614 articles published by 1,036 unique authors between 1974 and 2004. To characterize the properties and evolution of this field, we first use four different measures of centrality to identify the impact of authors. A global statistical analysis is performed to characterize the distribution of paper production and paper citations and its correlation with the co-authorship team size. The size of co-authorship clusters over time is examined. Finally, a novel local, author-centered measure based on entropy is applied to determine the global evolution of the field and the identification of the contribution of a single author's impact across all of its co-authorship relations. A visualization of the growth of the weighted co-author network and the results obtained from the statistical analysis indicate a drift toward a more cooperative, global collaboration process as the main drive in the production of scientific knowledge. hne with coll
11. Database > NSF > Extract Co-PI Network on the cleaned database. Delete the isolates by running Preprocessing > Networks > Delete Isolates. As before, to visualize the network, select Visualization > Networks > GUESS and run Layout > GEM followed by Layout > Bin Pack. Run the yoursci2directory/scripts/GUESS/co-PI-nw_database.py script to apply the standard Co-PI network theme.

5.2.3 Mapping CTSA Centers (NIH RePORTER Data)

CTSA2005-2009.xls
Time frame: 2005-2009
Region(s): Miscellaneous
Topical Area(s): Clinical and Translational Science
Analysis Type(s): PI-Institution Network, Co-Authorship Network

A study of all NIH Clinical and Translational Science Awards (CTSA) and resulting publications from 2005-2009 requires advanced data acquisition and manipulation to prepare the required data. Data comes from the union of NIH RePORTER downloads (see Section 4.2.2.2 NIH RePORTER) and NIH ExPORTER data dumps (http://projectreporter.nih.gov/exporter/). CTSA Center grants were identified first and then matched with resulting publications using a project-specific ID. The result file is available as an Excel file in yoursci2directory/sampledata/scientometrics/nih. The file contains two spreadsheets, one with publication data and one with grant data. Save each spreadsheet out as grants.csv and publications.csv. First, load grants.csv in the Sci2 Tool using File > Load as a
12. Here, as an example, a search was conducted for Katy Borner in the Principal Investigator field while keeping the Include CO-PI box checked.

The resulting data is available at yoursci2directory/sampledata/scientometrics/nsf/KatyBorner.nsf. Load the data using File > Load, select the loaded dataset in the Data Manager window, and run Data Preparation > Text Files > Extract Co-Occurrence Network using these parameters:

[Dialog: Extract Network from Table (extracts a network from a delimited table). Column Name: All Investigators; Text Delimiter; Aggregation Function File: a path ending in sampledata/scientometrics/properties/nsfCoPI.properties.]

Select the "Extracted Network on Column All Investigators" network and run Analysis > Networks > Network Analysis Toolkit (NAT) to reveal that there are 13 nodes and 28 edges in the network, without isolates. Select Visualization > Networks > GUESS to visualize the resulting Co-PI network. Select GEM from the layout menu. Load the default Co-PI visualization theme via File > Run Script and load yoursci2directory/scripts/GUESS/co-PI-nw.py. Alternatively, use the Graph Modifier to customize the visualization. The resulting network in Figure 5.2 was modified using
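The idea behind extracting a co-occurrence network from one delimited table column can be sketched as follows. This is a rough Python illustration under stated assumptions, not the tool's algorithm: the function name cooccurrence_edges, the investigator names, and the "|" delimiter are hypothetical choices for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(rows, column, delimiter="|"):
    """Count how often two names appear together in the same cell of `column`."""
    edges = Counter()
    for row in rows:
        # De-duplicate and sort so that each pair is counted once per row
        # and always stored in the same order.
        names = sorted({n.strip() for n in row[column].split(delimiter) if n.strip()})
        for a, b in combinations(names, 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical award records with a delimited investigator column.
awards = [
    {"All Investigators": "Ann Lee|Bo Chen|Cy Diaz"},
    {"All Investigators": "Ann Lee|Bo Chen"},
    {"All Investigators": "Dee Park"},
]
edges = cooccurrence_edges(awards, "All Investigators")
```

Each edge weight is the number of awards two investigators share. An investigator who never shares an award (Dee Park here) produces no edge and would appear as an isolate.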
13. 4.9.1.1.2 Author-Author Citation Network

Authors cite other authors via document references, forming a weighted, directed author-citation graph. Like document-document networks, author-citation networks represent the flow of information over time. Unlike document citations, however, these networks have weighted edges, representing the volume of citations from one author to the next.

4.9.1.1.3 Source-Source Citation Network

For larger scale studies, it is often useful to explore citation patterns between entire journals and other varieties of publications. These networks can reveal both the relative importance of certain publications and the underlying connections between disciplines. These networks are directed and weighted by volume of citations between journals.

4.9.1.1.4 Author-Paper (Consumed/Produced) Network

There are active and passive units of science. Active units, e.g., authors, produce and consume passive units, e.g., papers, patents, datasets, software. The resulting networks have multiple types of nodes, e.g., authors and papers. Directed edges indicate the flow of resources from sources to sinks, e.g., from an author to a written/produced paper to the author who reads/consumes the paper.

4.9.1.2 Co-Occurrence Linkages

4.9.1.2.1 Author Co-Occurrence (Co-Author) Network

Having the names of two authors (or their institutions, countries) listed on one paper, patent, or grant is an empirical manifestation of
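As a small illustration of how a weighted, directed author-citation graph can be derived from paper citations, consider the Python sketch below. The paper identifiers and author names are hypothetical, and counting one citation per (citing author, cited author) pair is an assumption of this example rather than a documented rule of the tool.

```python
from collections import Counter

# Hypothetical papers and their authors.
paper_authors = {
    "P1": ["Ann"],
    "P2": ["Bo", "Cy"],
    "P3": ["Ann", "Bo"],
}

# Hypothetical (citing paper, cited paper) pairs.
paper_citations = [("P2", "P1"), ("P3", "P1"), ("P3", "P2")]

# Every author of the citing paper cites every author of the cited paper.
author_citations = Counter()
for citing, cited in paper_citations:
    for source_author in paper_authors[citing]:
        for target_author in paper_authors[cited]:
            author_citations[(source_author, target_author)] += 1
```

The edge weight is the volume of citations from one author to another. Self-citations show up as self-loops, e.g. ("Ann", "Ann"), and can be filtered out if they are not of interest.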
14. [Screenshot: Scopus result list for the query TITLE-ABS-KEY(watts strogatz clustering coefficient), with output, citation tracker, and download controls.]

Figure 4.4: Scopus search interface

At the output window, select Comma separated file, .csv (e.g. Excel) and Complete format from the drop-down menus a
15. 5.2.2 Funding Profiles of Three Universities (NSF Data) Using Database
5.2.3 Mapping CTSA Centers (NIH RePORTER Data)
5.2.4 Biomedical Funding Profile of NSF (NSF Data)
5.2.5 Mapping Scientometrics (ISI Data)
5.2.6 Burst Detection in Scientometrics (ISI Data)
5.2.7 Mapping the Field of RNAi Research (SDB Data)
5.3 Global Level Studies - Macro
5.3.1 Geo USPTO (SDB Data)
6 Sample Science Studies & Online Services
6.1 Science Dynamics
6.1.1 Mapping Topics and Topic Bursts in PNAS (2004)
6.2 Local Input-Output Studies
6.2.1 Indicator-Assisted Evaluation and Fund
16. > Networks > Unweighted and Directed > Node Indegree, then select "Network with indegree attribute added to node list" in the Data Manager and run Analysis > Networks > Unweighted and Directed > Node Outdegree. Run Visualization > Networks > GUESS followed by Layout > GEM. In the Graph Modifier interface:

1. Select Resize Linear > Node > outdegree > From: 1 > To: 30 > Do Resize Linear.
2. Select Object: "Nodes based on ->" > Property: outdegree > Operator: >= > Value: 1 > Colour > [color] > Show Label.
3. Select Object: "Nodes based on ->" > Property: indegree > Operator: >= > Value: 1 > Colour > [color].

The resulting bimodal network of NSF organizations and programs is given in Figure 5.20.

[Figure graphic: network with node size by outdegree and a legend distinguishing NSF Organization and Program(s) nodes.]

Figure 5.20: Bimodal network of NSF Organization and Program(s) that support Medical and Health projects

5.2.5 Mapping Scientometrics (ISI Data)

Scientometrics.isi
Time frame: 1978-2008
Region(s): Miscellaneous
Topical Area(s): Scientometrics
Analysis Type(s): Document Co-Citation Network

This study aims to increase our understanding of Scientometrics, a discipline which uses statistical and computational techniques to understand the structur
17. o Network Analysis
    o Principal Investigator
    o Co-PI Name(s)
    o Organization
o Temporal Analysis
    o Expiration Date
    o Start Date
o Geospatial Analysis
    o Organization City
    o Organization State
    o Organization Street Address
    o Organization Zip
o Topical Analysis
    o Abstract
    o NSF Organization
    o Title

4.2.2.2 NIH RePORTER

Funding data provided by the National Institutes of Health (NIH), and associated publications and patents, can be retrieved via the NIH RePORTER site (http://projectreporter.nih.gov/reporter.cfm). The database draws from eRA, MEDLINE, PubMed Central, NIH Intramural, and iEdison. Search by location, PI name, category, etc. (see Figure 4.8).

[Screenshot: NIH RePORTER query form.]
18. [Screenshot: ini file contents, including org.eclipse.platform, --launcher.XXMaxPermSize 256m, -vmargs, -Xdock:icon=../Resources/nwb.icns, -XstartOnFirstThread, -Xms and -Xmx heap size settings (values illegible), and -Dorg.eclipse.swt.internal.carbon.smallFonts.]

Figure 3.3: Updated scipolicy.ini file

3.4 Memory Limits

The Sci2 Tool's database functionality greatly improves the volume of data which can be loaded into and analyzed by the tool. Whereas most scientometric tools available as of March 2010 require powerful computing resources to perform large-scale analyses each time a network needs to be extracted, the pre-loaded database runs network extraction algorithms quickly and allows users to run custom queries. This functionality has some front-heavy memory requirements in order to initially load the data into the database, the upper limits of which can be seen in Tables 3.1 and 3.2.

Table 3.1: The number of seconds to perform each action on a dataset of the given size on a computer with 12GB of memory.

[Table 3.1 body (computer with 12GB of RAM): the rotated column headers, which name database actions such as Load, Merge People, Merge Journals, Extract Documents, and Extract Co-Authors, and the timing values could not be realigned from the extracted text.]
19. the background color to white and using the command lines:

    > resizeLinear(numberofworks,1,50)
    > colorize(numberofworks,gray,black)
    > for n in g.nodes:
          n.strokecolor = n.color       # border color same as its inside color
    > resizeLinear(numberofcoauthoredworks,.25,8)
    > colorize(numberofcoauthoredworks,"127,193,65,255",black)
    > nodesbynumworks = g.nodes[:]      # make a copy of the list of all nodes
    > def bynumworks(n1, n2):           # define a function for comparing nodes
          return cmp(n1.numberofworks, n2.numberofworks)
    > nodesbynumworks.sort(bynumworks)  # sort list
    > nodesbynumworks.reverse()         # reverse sorting, list starts with highest #
    > for i in range(0, 50):            # make labels of most productive authors visible
          nodesbynumworks[i].labelvisible = true

Alternatively, run GUESS: File > Run Script and select yoursci2directory/scripts/GUESS/co-author-nw.py.

That is, author nodes are color and size coded by the number of papers per author. Edges are color and thickness coded by the number of times two authors wrote a paper together. The remaining commands identify the top 50 authors with the most papers and make their name labels visible.

GUESS supports the repositioning of selected nodes. Multiple nodes can be selected by holding down the Shift key and dragging a box around specific nodes. The final network can be saved via GUESS: File > Export Image and opened in a graphic design
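The sort-and-label steps above rely on a comparison function and cmp, which are specific to the Jython (Python 2 style) interpreter. As a hedged sketch, the same idea can be written in current Python with a key function; the Node class and the sample data below are hypothetical stand-ins for GUESS node objects and are not part of GUESS.

```python
class Node(object):
    """Hypothetical stand-in for a graph node with a publication count."""

    def __init__(self, label, numberofworks):
        self.label = label
        self.numberofworks = numberofworks
        self.labelvisible = False


nodes = [Node("A", 3), Node("B", 10), Node("C", 1), Node("D", 7)]

# Sort a copy of the node list, most productive authors first.
by_works = sorted(nodes, key=lambda n: n.numberofworks, reverse=True)

# Make the labels of the top authors visible. Slicing avoids an IndexError
# when the network has fewer nodes than the requested number.
top_n = 2
for node in by_works[:top_n]:
    node.labelvisible = True

visible = [n.label for n in nodes if n.labelvisible]
```

Using a slice instead of a fixed range(0, 50) keeps the loop safe for small networks, where indexing past the end of the list would fail.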
20. weak or strong; density (how many potential edges in a network actually exist); reachability (how many steps it takes to go from one end of a network to another); centrality (whether a network has a center point); quality (reliability or certainty); and strength. Network properties refer to the number of nodes and edges, network density, average path length, clustering coefficient, and distributions from which general properties such as small-world, scale-free, or hierarchical can be derived. Identifying major communities via community detection algorithms and calculating the backbone of a network via pathfinder network scaling or maximum flow algorithms helps to communicate and make sense of large-scale networks.

4.9.1 Network Extraction

Networks are extracted using three types of linkages: direct linkages between nodes of the same or different types, co-occurrence linkages, and co-citation linkages, as explained below.

4.9.1.1 Direct Linkages

4.9.1.1.1 Document-Document (Citation) Network

Papers cite other papers via references, forming an unweighted, directed paper-citation graph. It is beneficial to indicate the direction of information flow, in order of publication, via arrows. References enable a search of the citation graph backwards in time. Citations to a paper support the forward traversal of the graph. Citing and being cited can be seen as roles a paper possesses (Nicolaisen 2007).
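Density, as defined above, is the share of potential edges that actually exist. A minimal Python sketch follows; the function name density is hypothetical, and the sketch assumes a simple network without self-loops or parallel edges.

```python
def density(num_nodes, num_edges, directed=False):
    """Share of potential edges that actually exist in a simple network."""
    if num_nodes < 2:
        return 0.0
    # A directed network allows n * (n - 1) edges; an undirected one half of that.
    possible = num_nodes * (num_nodes - 1)
    if not directed:
        possible //= 2
    return num_edges / float(possible)


# Example: an undirected network with 13 nodes and 28 edges.
example = density(13, 28)
```

For 13 nodes there are 78 possible undirected edges, so 28 edges correspond to a density of about 0.36. A fully connected network has a density of 1.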
21. OSGi Alliance (2008). "OSGi Alliance." Retrieved 7/15/08, from http://www.osgi.org/Main/HomePage.

Persson, O. (2008). Bibexcel. Umeå, Sweden, Umeå University.

Shannon, P., A. Markiel, et al. (2002). "Cytoscape: a software environment for integrated models of biomolecular interaction networks." Genome Research 13(11): 2498-2504.

Siek, J., L.-Q. Lee, et al. (2002). The Boost Graph Library: User Guide and Reference Manual. New York, Addison-Wesley.

Small, H. (1973). "Co-Citation in Scientific Literature: A New Measure of the Relationship Between Publications." JASIS 24: 265-269.

Small, H. G. and E. Greenlee (1986). "Collagen Research in the 1970's." Scientometrics 10: 95-117.

Takatsuka, M. and M. Gahegan (2002). "GeoVISTA Studio: A Codeless Visual Programming Environment for Geoscientific Data Analysis and Visualization." The Journal of Computers & Geosciences 28(10): 1131-1144.

Watts, D. J. and S. H. Strogatz (1998). "Collective dynamics of 'small-world' networks." Nature 393: 440.

Wellman, B., H. D. White, et al. (2004). "Does Citation Reflect Social Structure? Longitudinal Evidence from the 'Globenet' Interdisciplinary Research Group." JASIST 55: 111-126.

Wikimedia Foundation, I. (2009). "Poisson Distribution." Wikipedia, The Free Encyclopedia. Retrieved 8/31/2009, from http://en.wikipedia.org/wiki/Poisson_distribution.
22. [Figure graphic: patent citation network with a legend distinguishing citing papers and cited papers.]

Figure 5.28: USPTO patent citation network on RNAi

The SDB also outputs other tables that can be used in additional analyses. An example is yoursci2directory/sampledata/scientometrics/sdb/RNAi/Medline_master_table.csv. This table includes full records of MEDLINE papers and can be used to find bursting terms from MEDLINE abstracts dealing with RNAi. Load the file as a standard csv and run Preprocessing > Topical > Normalize Text with the default separator and the abstract box checked. Run Analysis > Topical > Burst Detection with date_cr_year in the Date Column and abstract in the Text Column, leaving the rest of the values default. Right-click on "Burst detection analysis (date_cr_year, abstract): maximum burst level 1" in the Data Manager and view the file. There are more words than can easily be viewed with the horizontal bar graph, so sort the list by Strength and prune all but the strongest 10 words. Save the file as a new csv and load it into the Sci2 Tool as a standard csv file. Select the new table in the Data Manager and visualize it using Visualization > Temporal > Horizontal Bar Graph with the following parameters:

[Dialog: Horizontal Bar Graph, which takes tabular data and generates PostScript for a horizontal bar graph; parameter values not legible.]
23. 4.4 Summaries and Table Extractions
4.5 Statistical Analysis/Profiling
4.6 Temporal Analysis (When)
4.6.1 Burst Detection
4.6.2 Slice Table by Time
4.7 Geospatial Analysis (Where)
4.8 Topical Analysis (What)
4.8.1 Word Co-Occurrence Network
4.9 Network Analysis (With Whom)
4.9.1 Network Extraction
4.9.2 Compute Basic Network Characteristics
4.9.3 Network Analysis
4.9.4
24. Award Search (see Section 4.2.2.1 NSF Award Search). Load the files GeoffreyFox.nsf, MichaelMcRobbie.nsf, and BethPlale.nsf into the Sci2 Tool using File > Load from yoursci2directory/sampledata/scientometrics/nsf. Once loaded, run Visualization > Temporal > Horizontal Bar Graph for each file with the following parameters:

[Dialog: Horizontal Bar Graph (takes tabular data and generates PostScript for a horizontal bar graph). Label: Title; Start Date: Start Date; End Date: Expiration Date; Size By: Awarded Amount to Date; Date Format: Day-Month-Year Date Format (Europe, e.g. 15-10-2010).]

The resulting horizontal bar graph visualizations are given in Figure 5.5. The horizontal bar coding and labeling is as follows:

[Legend: area size equals numerical value, e.g. award amount; text, e.g. title; start date; end date.]

Note the different time spans over which grants were funded, the volume of grants (area size), and the number of grants.

[Figure 5.5 graphic: horizontal bar graphs labeled with truncated grant titles.]
25. DATASET, GLOBAL CITATION COUNT, LOCAL CITATION COUNT, ADDITIONAL NAME, FAMILY NAME, FIRST INITIAL, FULL NAME, MIDDLE INITIAL, PERSONAL NAME.
Extract Documents: Outputs a table containing one row per document in the database, together with columns for TITLE, TIMES CITED, ABSTRACT TEXT, ARTICLE NUMBER, BEGINNING PAGE, CITED REFERENCE COUNT, CITED YEAR, DIGITAL OBJECT IDENTIFIER, DOCUMENT TYPE, DOCUMENT VOLUME, ENDING PAGE, FIRST AUTHOR FK, FUNDING AGENCY AND GRANT NUMBER, FUNDING TEXT, ISBN, ISI DOCUMENT DELIVERY NUMBER, ISI UNIQUE ARTICLE IDENTIFIER, ISSUE, LANGUAGE, PAGE COUNT, PART NUMBER, PUBLICATION DATE, PUBLICATION YEAR, SOURCE, SPECIAL ISSUE, SUBJECT CATEGORY, SUPPLEMENT.
Extract Keywords: Outputs a table containing one row per keyword in the database, together with columns for KEYWORD, TYPE, OCCURRENCES IN DATASET.
Extract Document Sources: Outputs a table containing one row per document source in the database, together with columns for FULL TITLE, ISO TITLE ABBREVIATION, TWENTY NINE CHARACTER SOURCE TITLE ABBREVIATION, NUM PAPERS CONTAINED FROM DATASET, ISSN, BOOK SERIES TITLE, BOOK SERIES SUBTITLE, CONFERENCE HOST, CONFERENCE LOCATION, CONFERENCE SPONSORS, CONFERENCE TITLE.
Extract Authors by Year: Outputs a table containing the number of publications per author per year, and the author ID.
Extract References by Year: Outputs a table containing the number of re
26. Figure 5.19: Largest connected component of CTSA Center publication co-authorship network.
5.2.4 Biomedical Funding Profile of NSF (NSF Data)
MedicalAndHealth.nsf
Time frame: 2003-2010
Region(s): Miscellaneous
Topical Area(s): Biomedical
Analysis Type(s): NSF Organization-Program Network
What organizations and programs at the National Science Foundation support projects that deal with medical and health related topics? Data was downloaded from the NSF Awards Search site (http://www.nsf.gov/awardsearch) on Nov 23, 2009 using the query "medical AND health" in the title, abstract, and awards field, with "Active awards only" checked (see Section 4.2.2.1 NSF Award Search for data retrieval details).
The 286 awards are available at yoursci2directory/sampledata/scientometrics/nsf/MedicalAndHealth.nsf. Load them as an NSF csv format and run Data Preparation > Text Files > Extract Directed Network. Given a table, this algorithm creates a directed network by placing a directed edge from the values in one column to the values of a different column. Use the parameters:
Source Column: NSF Organization
Target Column: Program(s)
The dialog also offers Text Delimiter and Aggregate Function File fields.
Select "Network with directed edges from NSF Organization to Program(s)" in the Data Manager and run Analysis
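The idea behind Extract Directed Network (one directed edge from each value in the source column to each value in the target column of the same row) can be sketched as follows. This is an illustration only, not the plugin's implementation: the function name, the "|" delimiter default, and the sample rows in the usage note are all assumptions.

```python
from collections import Counter


def extract_directed_network(rows, source_col, target_col, delimiter="|"):
    """Count directed edges (source value -> target value) across table rows.

    rows is a list of dicts, one per table row. A cell may hold several
    values separated by `delimiter` (assumed here to be "|").
    """
    edges = Counter()
    for row in rows:
        sources = [s.strip() for s in row[source_col].split(delimiter) if s.strip()]
        targets = [t.strip() for t in row[target_col].split(delimiter) if t.strip()]
        for source in sources:
            for target in targets:
                edges[(source, target)] += 1
    return edges
```

With two made-up rows, `{"NSF Organization": "IIS", "Program(s)": "PROGRAM A|PROGRAM B"}` and `{"NSF Organization": "IIS", "Program(s)": "PROGRAM B"}`, the result holds the edge `("IIS", "PROGRAM B")` with weight 2 and `("IIS", "PROGRAM A")` with weight 1. Counting repeated pairs gives a natural edge weight for later sizing or coloring.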
27. [Screenshot: Publish or Perish, "Author impact analysis" (perform a citation analysis for one or more authors), with the query author's name Albert-László Barabási; screen dated Sunday, September 21, 2008. Reported results include Papers: 111, Citations: 14342, Years: 36, h-index: 30, g-index: 111. The result list is headed by works published in Science (1999), Reviews of Modern Physics (2002), and American Journal of Physics (2003).]
29. Intellectual Property Resource: http://search.pipra.or
Intellectual Property
• SparkIP (2007). Spark IP: The World's Leading IP Research and Marketplace Platform. http://www.sparkip.com
Funding
• eJacket: https://www.ejacket.nsf.gov
• Environmental Impact Database: http://www.epa.gov/oecaerth/nepa/eisdata.html
• National Science Foundation, Find Funding: http://nsf.gov/funding
• NEH Funded Projects Query: https://securegrants.neh.gov/publicquery/main.aspx
• NIH RePORTER: http://projectreporter.nih.gov/reporter.cfm
• Research.gov: http://www.research.gov
• USAspending.gov: http://www.usaspending.gov
Federal Reports
• EuroStats: http://epp.eurostat.ec.europa.eu/portal/page/portal/eurostat/home
• National Academies Reports: http://www.nationalacademies.org/publications
• OECD Statistics: www.oecd.org/statistics
• Science and Engineering Indicators (2006). Arlington, VA: National Science Foundation.
• SRS S&E: http://www.nsf.gov/statistics
Surveys
• Taulbee Survey of CS Salaries: http://www.cra.org/statistics
Science Databases
• FAO: http://www.fao.org/agris/search/search.do
• NCBI Plant Genomes: http://www.ncbi.nlm.nih.gov/genomes/PLANTS/PlantList.html
• NCBI GenBank: http://www.ncbi.nlm.nih.gov/Genbank
• TAIR: http://www.arabidopsis.org
Other
• Carter, Susan B., Scott Sigmund Gartner, Michael R. Haines, Alan L. Olmstead, Richard
30. Journal of Earthquake Engineering 12(S2): 92-99.
Erdős, P. and A. Rényi (1959). On Random Graphs I. Publicationes Mathematicae Debrecen 6: 290-297.
Fekete, J.-D. and K. Börner (chairs) (Eds.) (2004). Workshop on Information Visualization Software Infrastructures. Austin, Texas.
Garfield, E. (2008). HistCite: Bibliometric Analysis and Visualization Software. Bala Cynwyd, PA: HistCite Software LLC.
Gilbert, E. N. (1959). Random Graphs. Ann. Math. Stat. 30: 1141.
Harzing, A.-W. (2008). Publish or Perish: A citation analysis software program. Retrieved 4/22/08 from http://www.harzing.com/resources.htm
Heer, J., S. K. Card, et al. (2005). Prefuse: A toolkit for interactive information visualization. Conference on Human Factors in Computing Systems, Portland, OR. New York: ACM Press.
Huang, W. B., Bruce Herr, Russell Duhon, Katy Börner (2007). Network Workbench: Using Service-Oriented Architecture and Component-Based Development to Build a Tool for Network Scientists. International Workshop and Conference on Network Science.
Hull, D., K. Wolstencroft, et al. (2006). Taverna: A Tool for Building and Running Workflows of Services. Nucleic Acids Research 34 (Web Server Issue): W729-W732.
Ihaka, R. and R. Gentleman (1996). R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics 5(3): 299-314.
Jaro, M. A. (1989). Advances i
31. Loads an ISI file selected in the Data Manager (but not a cleaned ISI file) into the database. The database schema can be found at https://nwb.slis.indiana.edu/community/?n=Sci2Algorithm.LoadISIFileIntoDatabase and retrieved by right-clicking an ISI database file in the Data Manager and selecting View.
Load NSF File into Database: Loads an NSF file selected in the Data Manager into a database. The database schema can be found at https://nwb.slis.indiana.edu/community/?n=Sci2Algorithm.LoadNSFFileIntoDatabase and retrieved by right-clicking an NSF database file in the Data Manager and selecting View.
Database > ISI
Merge Identical ISI People: Merges identical author names by removing punctuation and capitalization.
Suggest ISI People Merges: Generates a pre-annotated merging table based on a user-selected threshold and string similarity metric.
Merge Journals: Merges journals between sources and cited references based upon known name variants and abbreviations (see the lookup table in yoursci2directory/configuration/JournalGroups.txt).
Match References to Papers: Matches references to papers if they have the same first author, source journal, start page, volume, and year. Matching references is necessary for several types of analyses, e.g., extracting a paper-citation or a co-citation network.
Extract Authors: Outputs a table containing one row per author in the database and columns for PAPERS AUTHORED IN
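The two cleaning rules described in this item (merging names that differ only in punctuation and capitalization, and matching references to papers on first author, source, start page, volume, and year) can be sketched in a few lines. This is an illustration under assumptions, not the database plugin's code: the function names and the record field names (`first_author`, `source`, `start_page`, `volume`, `year`) are hypothetical.

```python
import string


def person_key(name):
    """Key that ignores punctuation and capitalization."""
    stripped = name.translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.lower().split())


def merge_identical_people(names):
    """Group name variants that share the same normalized key."""
    groups = {}
    for name in names:
        groups.setdefault(person_key(name), []).append(name)
    return groups


def reference_key(record):
    """Five-field key used to decide whether a reference and a paper match."""
    return (person_key(record["first_author"]), record["source"].lower(),
            record["start_page"], record["volume"], record["year"])


def match_references(references, papers):
    """Pair each reference with the paper that has an identical key, if any."""
    index = {reference_key(paper): paper for paper in papers}
    return [(ref, index[reference_key(ref)])
            for ref in references if reference_key(ref) in index]
```

Under this sketch "Barabasi, A.L." and "BARABASI AL" collapse to the same key, while "Barabasi, A. L." (with a space between the initials) does not; variants of that kind are where a similarity threshold such as the one in Suggest ISI People Merges becomes useful.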
32. Make sure to wait until each cleaning step is complete before beginning the next one. During cleaning, the console reports messages such as:
Warning: while we use Web of Science's official list of Journal Title Abbreviations, that list does not cover all spellings of cited sources. Additionally, in some cited references, it is not possible to disambiguate between members of a book or conference series and a journal with the same name.
Successfully merged 84 entities into other entities, leaving 2083 entities in the database.
Match References to Papers was selected.
151 references were matched with documents.
Figure 5.16: Cleaned database of FourNetSciResearchers.
Many different tables can be extracted for different views of the data. Run Data Preparation > Database > ISI > Extract Authors and right-click on the resulting table to view all the authors from FourNetSciResearchers.isi. The table also
33. Network Visualization
4.10 Modeling
4.10.1 Random Graph Model
4.10.2 Watts-Strogatz Small World Model
4.10.3 Barabasi-Albert Scale Free Model
[...]
5.1 Individual Level Studies - Micro
5.1.1 Mapping Collaboration, Publication, and Funding Profiles of One Researcher (EndNote and NSF Data)
5.1.2 Time Slicing of Co-Authorship Networks (ISI Data)
5.1.3 Funding Profiles of Three Researchers at Indiana University (NSF Data)
5.1.4 Studying Four Major NetSci Researchers (ISI Data)
5.1.5 Studying Four Major NetSci Researchers (ISI Data) using Database
5.2 Institution Level Studies - Meso
5.2.1 Funding Profiles of Three Universities (NSF Data)
34. [Figure: Presentation of RefMapper analysis results. Panel "Overview": date and input directory; basic counts; overlay of all matched journal references from all PDF files on 554 scientific disciplines (nodes) in the UCSD Map of Science, where circle size denotes the number of references; listing of all references grouped by 13 science areas. Panel "Visual Index": for each PDF file, basic counts and a thumbnail science map, with a maximum of 18 per page. Panel "Details": for each PDF file, an overlay of all matched journal references on 554 scientific fields (nodes) in the UCSD Map of Science; the top n most similar PDF files, identified based on journal names; colors and names of the science areas that are cited; and an alphabetic listing of cited journals with the number of times cited.]
Reference: Klavans, Richard & Kevin W. Boyack (2007). Is There a Convergent Structure to Science? In Daniel Torres-Salinas & Henk F. Moed (Eds.), Proceedings of the 11th International Conference of the International Society for Scientometrics and Informetrics, pp. 437-448. Madrid: CSIC.
NWB Team (2006). Network Workbench Tool. Indiana
35. Plugins: The Sci2 Tool is an empty shell filled with plugins. Some plugins run on the core architecture (OSGi and CIShell). Others convert loaded data into in-memory objects formatted for different algorithms to read. The algorithm plugins themselves are divided into different menus, in this case data processing, preprocessing, analysis, modeling, visualization, and scientometrics. Users are not limited to the pre-packaged plugins; they can create, download, share, and import their own. To use an alternative plugin, copy the jar file you created or downloaded into yoursci2directory/plugins and then look for the plugin's name in the Sci2 Tool menu structure. A step-by-step guide to developing new plugins can be found at http://cishell.org/?n=DevGuide.NewGuide
7.4 Tools That Use OSGi and/or CIShell
Recently, a number of other efforts have adopted OSGi and/or CIShell. Among them are:
Cytoscape (http://www.cytoscape.org), led by Trey Ideker, UCSD, is an open source bioinformatics software platform for visualizing molecular interaction networks and integrating these interactions with gene expression profiles and other state data (Shannon, Markiel et al. 2002).
Taverna Workbench (http://taverna.sourceforge.net), led by Carol Goble, University of Manchester, UK, is a free software tool for designing and executing workflows (Hull, Wolstencroft et al. 2006).
36. PostScript files
  o GnuPlot: Plots data.
• Temporal
  o Horizontal Line Graph: Generates a bar graph whose x-axis is time and whose bars are sized based on a user-specified value. Result is a PostScript file.
• Geospatial
  o Geo Map (Circle Annotations): Generates a map of the US or the world upon which circles of user-defined size and color are projected. Result is a PostScript file.
  o Geo Map (Colored-Region Annotations): Generates a map of the US or the world with regions colored based on a user-defined metric. Result is a PostScript file.
• Networks
  o GUESS: Interactive data analysis and visualization tool.
  o Radial Tree Graph (prefuse alpha): A single node is placed at the center and all others are laid around it in a tree structure.
  o Radial Tree Graph with Annotation (prefuse beta): A single node is placed at the center and all others are laid around it in a tree structure, with labels.
  o Tree View (prefuse beta): Visualizes directory hierarchies in a tree structure.
  o Tree Map (prefuse beta): Visualizes hierarchies using the Treemap algorithm.
  o Force Directed with Annotation (prefuse beta): Sorts randomly placed nodes into a desirable layout that satisfies the aesthetics for visual presentation.
  o Fruchterman-Reingold with Annotation (prefuse beta): Visualization which lays out nodes based on some force between them.
  o DrL (VxOrd): A force-directed graph layout toolbox focused on real-world, large-scale graphs.
  o Specified (pre
37. Sutch, Gavin Wright (Eds.) (2006). The Historical Statistics of the United States, Millennial Edition. Cambridge University Press. http://hsus.cambridge.org/HSUSWeb/HSUSEntryServlet (accessed on 4/29/2009).
• State of Utah (2009). Foreign Labor Certification Data Center, Online Wage Library. http://www.flcdatacenter.com (accessed on 5/1/2009).
• The University of California, Davis (2009). The Public Intellectual Property Resource for Agriculture. http://www.pipra.org (accessed on 4/30/2009).
• United States Central Intelligence Agency (2008). The World Factbook: United States. https://www.cia.gov/library/publications/the-world-factbook/print/us.html (accessed on 9/29/2008).
• World Intellectual Property Organization (WIPO) (2007). WIPO Patent Report: Statistics on Worldwide Patent Activities.
• Census Data: http://factfinder.census.gov
8.2 Network Analysis Tools
Table 7.1 provides an overview of existing tools used in scientometrics research; see also (Fekete and Börner (chairs) 2004). The tools are sorted by the date of their creation. Domain refers to the field in which they were originally developed, such as social science (SocSci), scientometrics (Scientom), biology (Bio), geography (Geo), and computer science (CS). Coverage aims to capture the general functionality and types of algorithms available, e.g., Analysis and Visualization (A+V); see also the description column.
Table 6.1: Network analysis and visualization tools commonly used
38. Text using the parameters New Separator and Abstract (check this box). The text normalization utilizes the StandardAnalyzer provided by Lucene (http://lucene.apache.org). It separates text into word tokens, normalizes word tokens to lower case, removes "'s" from the end of words, removes dots from acronyms, deletes stop words, and then applies the English Snowball stemmer (http://snowball.tartarus.org/algorithms/english/stemmer.html), which is a version of the Porter2 stemmer designed for the English language. The result is a derived table in which the text in the abstract column is normalized. Select this table and run Data Preparation > Text Files > Extract Word Co-Occurrence Network using the parameters:
Node Identifier Column: Cite Me As
Text Source Column: Abstract
Text Delimiter
Aggregate Function File: None
The outcome is a network in which nodes represent words and edges denote their joint appearance in a paper. Word co-occurrence networks are rather large and dense. Running Analysis > Networks > Network Analysis Toolkit (NAT) reveals that the network has 2,888 word nodes and 366,009 co-occurrence edges. There are 235 isolate nodes that can be removed by running Preprocessing > Networks > Delete Isolates. Note that when isolates are removed, papers without abstracts are removed along with the keywords. The result is one giant component with 2,653 nodes and 366,009 edges; the edge count is unchanged because isolates have no edges. To visualize this rather large network, run Visualizati
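The normalization and co-occurrence steps described in this item can be approximated with the standard library alone. The sketch below is not the Lucene StandardAnalyzer: the stop-word list is a small made-up sample, the regular expressions are simplifications, and Snowball stemming is left out entirely. The function names are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Small illustrative stop-word list (an assumption; Lucene ships its own).
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for", "in",
              "is", "it", "of", "on", "or", "the", "to", "with"}


def normalize(text):
    """Lowercase, drop acronym dots and trailing 's, remove stop words."""
    text = text.lower()
    # "u.s.a." -> "usa": strip the dots from runs of single letters.
    text = re.sub(r"\b(?:[a-z]\.){2,}",
                  lambda m: m.group(0).replace(".", ""), text)
    text = re.sub(r"'s\b", "", text)
    tokens = re.findall(r"[a-z0-9]+", text)
    return [t for t in tokens if t not in STOP_WORDS]


def cooccurrence_edges(abstracts):
    """Weight each unordered word pair by the number of abstracts containing both."""
    edges = Counter()
    for abstract in abstracts:
        words = sorted(set(normalize(abstract)))
        edges.update(combinations(words, 2))
    return edges
```

Because every pair of distinct words within one abstract becomes an edge, an abstract with n distinct words contributes n(n-1)/2 edges, which is why such networks grow dense so quickly.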
39. Figure 5.23: Visualization of bursts for authors. The bar area equals a numerical value (e.g., award amount), the text is a label (e.g., title), and each bar runs from the start date to the end date.
To identify and visualize ISI keywords or cited references that experience a sudden increase in usage frequency, follow the same workflow as above, but change the parameters in the Burst Detection window as shown in Figure 5.24.
Figure 5.24: Burst detection parameters for ISI keywords and cited references. Both dialogs ("Perform Burst Detection on time-series textual data") use Date Column: Publication Year. The Text Column is New ISI keywords in one dialog and Cited References in the other. The keyword dialog shows Gamma: 1.0, General Ratio: 2.0, First Ratio: 2.0, and Bursting States: 1.
The results are shown in Figure 5.25. Note the different burst times and strengths. There are 247 records for cited-reference bursts, which can be printed on a wall-size piece of paper but not on a letter-size piece of paper. To
40. W., Talley, Edmund M., Burns, Gully A.P.C., Newman, David & LaRowe, Gavin (2009). The NIH Visual Browser: An Interactive Visualization of Biomedical Research. Proceedings of the 13th International Conference on Information Visualization (IV09), Barcelona, Spain, IEEE Computer Society, pp. 505-509. http://ivl.slis.indiana.edu/km/pub/2009-herr-iv-visual-browser.pdf
6.7.2 Interactive World and Science Map of S&T Jobs (2010)
By Angela Zoss, Michael Conover, Katy Börner
This paper details a methodology for capturing, analyzing, and communicating one specific type of real-time data: advertisements of currently available academic jobs. The work was inspired by the American Recovery and Reinvestment Act of 2009 (ARRA), which provides approximately $100 billion for education, creating a historic opportunity to create and save hundreds of thousands of jobs. Here, we discuss methodological challenges and practical problems when developing interactive visual interfaces to real-time data streams such as job advertisements. Related work is discussed, preliminary solutions are presented, and future work is outlined. The presented approach should be valuable for dealing with the enormous volume and complexity of social and behavioral data that evolve continuously in real time, where analyses need to be communicated to a broad audience of researchers, practitioners, clients, educators, and inter
41. Figure 4.7: NSF Award Search site.
To retrieve all projects funded under the Science of Science and Innovation Policy (SciSIP) program, select the Program Information tab, do an Element Code Lookup, enter 7626 into the Element Code field, and hit the Search button. On Sept 21, 2008, exactly 50 awards were found. Award records can be downloaded in CSV, Excel, or XML format. Save the file in CSV format and rename the file extension from .csv to .nsf. A sample .nsf file is available in yoursci2directory/sampledata/scientometrics/nsf/BethPlale.nsf. In the Sci2 Tool, load the file using File > Load File. A table with all records will appear in the Data Manager. Right-click and view the file in Microsoft Office Excel. Data in NSF files can be used for the following types of analyses
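The renaming step above (save as CSV, then change the extension to .nsf) is easy to script when several downloads are involved. This is a minimal sketch using only the Python standard library; the function name `csv_to_nsf` is hypothetical and the file content is left untouched, since only the extension changes.

```python
from pathlib import Path


def csv_to_nsf(csv_path):
    """Rename an NSF Award Search CSV download so that it ends in .nsf."""
    source = Path(csv_path)
    if source.suffix.lower() != ".csv":
        raise ValueError(f"expected a .csv file, got {source.name}")
    target = source.with_suffix(".nsf")
    if target.exists():
        # Refuse to overwrite an existing .nsf file.
        raise FileExistsError(f"{target} already exists")
    source.rename(target)
    return target
```

Calling `csv_to_nsf("awards.csv")` leaves a file named awards.nsf in the same directory, ready to be loaded with File > Load File.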
42. each graph at different clustering levels to give a measure of structural accuracy for each map. The best co-citation and inter-citation maps, according to local and structural accuracy, were selected and are presented and characterized. These two maps are compared to establish robustness. The inter-citation map is then used to examine linkages between disciplines. Biochemistry appears as the most interdisciplinary discipline in science.
[Figure: Maps of science generated from eight different journal-journal similarity measures. Dots represent journals. Lines represent the edges remaining at the end of the VxOrd runs. Similarity measures corresponding to the various map panels are listed in the middle right panel.]
Reference: Boyack, Kevin W., Klavans, Richard & Börner, Katy (2005). Mapping the Backbone of Science. Scientometrics, Vol. 64(3), 351-374. http://ivl.slis.indiana.edu/km/pub/2005-boyack-mapbckbn.pdf
6.5.2 Toward a Consensus Map of Science (2009)
By Richard Klavans & Kevin Boyack
A consensus map of science is generated from an analysis of 20 existing maps of science. These 20 maps occur in three basic forms: hierarchical, centric, and noncentric (or circular). The consensus map, generated from consensus edges that occur in at least half of the input maps, em
43. Figure 4.11: Scholarly Database 'Download' interface.
Data from the SDB can be used in a great number of ways. The following is an abridged list of suggested uses:
o Statistical Attributes
  o expected total amount
  o times cited
  o citing patents
o Temporal Analysis
  o issue date year date expires project end issue date started project start volume
  o published year
o Geospatial Analysis
  o address street city state country zipcode
  o residence
o Topical Analysis
  o abstract descriptorname nsf_org title article title Title
o Network Analysis
  o name inventor authors cited patents investigators pi title
4.3 Database Loading and Manipulation
Coming soon, but see Sections 5.1.5 and 5.2.2.
4.4 Summaries and Table Extractions
Coming soon.
4.5 Statistical Analysis - Profiling
Coming soon.
4.6 Temporal Analysis (When)
Science evolves over time. Attribute values of scholarly entities and their diverse aggregations increase and decrease at different rates and respond with different laten
44. friendly, reducing the difficulty of common scientometrics tasks as well as allowing scientometrics functionality to be exposed to non-experts. Network Workbench embodies both of these trends, providing an environment for algorithms from a variety of sources to seamlessly interact in a user-friendly interface, as well as providing significant visualization functionality through the integrated GUESS tool.
9 References
Adar, E. (2007). Guess: The Graph Exploration System. Retrieved 4/22/08 from http://graphexploration.cond.org
AT&T Research Group (2008). Graphviz: Graph Visualization Software. Retrieved 7/17/08 from http://www.graphviz.org/Credits.php
Auber, D. (Ed.) (2003). Tulip: A Huge Graph Visualisation Framework. Graph Drawing Softwares, Mathematics and Visualization. Berlin: Springer-Verlag.
Barabasi, A. L. and R. Albert (1999). Emergence of scaling in random networks. Science 286: 509-512.
Barabasi, A. L. and R. Albert (2002). Statistical mechanics of complex networks. Reviews of Modern Physics 74: 47-97.
Batagelj, V. and U. Brandes. Efficient Generation of Large Random Networks. Physical Review E 71: 036113-036118.
Batagelj, V. and A. Mrvar (1998). Pajek: Program for Large Network Analysis. Connections 21(2): 47-57.
Borgatti, S. P., M. G. Everett, et al. (2002). Ucinet for Windows: Software for Social Network Analysis. Retrieved 7/15/08 f
45. has columns with information on how many papers each person in the dataset authored, their Global Citation Count (how many times they have been cited according to ISI), and their Local Citation Count (how many times they were cited in the current dataset).
The queries can also output data specifically tailored for the burst detection algorithm (see Section 4.6.1 Burst Detection). Run Data Preparation > Database > ISI > Extract Authors > Extract References by Year for Burst Detection on the cleaned database, followed by Analysis > Topical > Burst Detection with the following parameters:
Gamma: 1.0
General Ratio: 2.0
First Ratio: 2.0
Bursting States: 1
Date Column: Year
Date Format: yyyy
Text Column: Reference
Text Separator: |
Now visualize the burst analysis with Visualization > Temporal > Horizontal Bar Graph with the following parameters:
Label: Word
Start Date: Start
End Date: End
Size By: Weight
Date Format: Month-Day-Year Date Format (U.S., e.g. 10/31/2010)
The dialog also offers Year Label Font Size and Bar Label Font Size fields.
See Section 2.4 Saving Visualizations for Publications to save and view the graph show
46. in scientometrics research. The table columns include Domain, Coverage, Description, Open Source, Operating System, and References.
S&T Dynamics Toolbox: Scientom; Scientom; tools from Loet Leydesdorff for organization, analysis, and visualization of scholarly data; command line; Windows; Leydesdorff 2008.
InFlow: A+V; social network analysis software for organizations with support for what-if analysis; graphical; not open source; Windows; Krebs 2008.
Pajek: SocSci; A+V; a network analysis and visualization program with many analysis algorithms, particularly for social network analysis; graphical; not open source; Windows; Batagelj and Mrvar 1998.
UCINet: SocSci; social network analysis software, particularly useful for exploratory analysis; graphical; Windows; Borgatti, Everett et al. 2002.
Boost Graph Library: CS; Analysis and Manipulation; extremely efficient and flexible C++ library for extremely large networks; library; All Major; Siek, Lee et al. 2002.
Visone (2001): SocSci; social network analysis tool for research and teaching, with a focus on innovative and advanced visual methods; graphical; not open source; All Major; Brandes and Wagner 2008.
GeoVISTA (2002): GIS software that can be used to lay out networks on geospatial substrates; graphical; open source; All Major; Takatsuka and Gahegan 2002.
(Cytoscape Consortium 2008): Visualization; network visualization and analysis tool; graphical; open source; All Major.
47. integration and utilization of datasets, algorithms, tools, and computing resources. CIShell is based on the OSGi R4 Specification and the Equinox implementation (OSGi Alliance 2008).
The subsequent sections of this tutorial are organized as follows. Section 2 provides a general introduction on how to get started by installing the Sci2 Tool, managing the user interface, reading and writing different data formats, using sample datasets, and saving visualizations for publication. Section 3 discusses different types of algorithm and tool plugins. It also presents the results of scalability tests and provides information on extending memory allocation for larger datasets on different operating systems. Section 4 gives an introduction to the design of meaningful workflows, comprising data acquisition and preparation for different datasets using text files or databases, temporal analysis, geospatial analysis, topical analysis, network analysis, and modeling. Section 5 exemplifies and details specific workflows at the micro (individual), meso (local), and macro (global) levels. Finally, Section 6 reviews sample science studies and online services as inspiration for future research and practice.
2 Getting Started
2.1 Download, Install, Uninstall
The Sci2 Tool is a stand-alone desktop application that installs and runs on all common operating systems. To download the tool, please register
48. is too dense to lay out in GUESS, we run Visualization > DrL (VxOrd), which lays out nodes based on the VxOrd force-directed layout algorithm, with the parameters:
Edge Weight Attribute: weight
New X-Position Attribute Name: xpos
New Y-Position Attribute Name: ypos
Do not cut edges
Edge Cutting Strength: 0.8
Next, select "Laid out with DrL" in the Data Manager and run Visualization > Networks > GUESS. Use the following commands in the GUESS interpreter:
> for n in g.nodes:
      n.x = n.xpos * 10
      n.y = n.ypos * 10
> resizeLinear(localcitationcount, 1, 50)
> colorize(localcitationcount, gray, black)
> resizeLinear(weight, .25, 8)
> colorize(weight, "127,193,65,255", black)
Go to Graph Modifier and choose "Object: nodes based on ->", "Property: localcitationcount", "Operator: >", "Value: 20", and then "Hide Label". See Figure 5.21.
Figure 5.21: Document co-citation network for Scientometric.isi in GUESS without DrL edge cutting (left) and
49. largest component (right) of Cornell University's co-investigator network (67 nodes). Reference: Börner, Katy, Huang, Weixia (Bonnie), Linnemeier, Micah, Duhon, Russell Jackson, Phillips, Patrick, Ma, Nianli, Zoss, Angela, Guo, Hanning & Price, Mark. (2009). Rete-Netzwerk-Red: Analyzing and Visualizing Scholarly Networks Using the Scholarly Database and the Network Workbench Tool. Birger Larsen, Jacqueline Leta (Eds.), Proceedings of ISSI 2009: 12th International Conference on Scientometrics and Informetrics, Rio de Janeiro, Brazil, July 14-17. Vol. 2, Bireme/PAHO/WHO and the Federal University of Rio de Janeiro, pp. 619-630. http://ivl.slis.indiana.edu/km/pub/2009-borner-issi.pdf 6.7 Interactive Online Services 6.7.1 The NIH Visual Browser: An Interactive Visualization of Biomedical Research (2009) By Bruce W. Herr II, Edmund M. Talley, Gully A.P.C. Burns, David Newman & Gavin LaRowe. This paper presents a technical description of the methods used to generate an interactive, two-dimensional visualization of 60,568 grants funded by the National Institutes of Health in 2007. The visualization is made intelligible by providing interactive features for assessing the data in a web-based visual browser, see http://www.nihmaps.org. The key features include deep zooming, selection, full-text querying, overlays, color-coding schemes, and multi-level labeling. Major insights
50. Figure caption: Author-supplied linkage patterns (light gray lines) from grants to publications, with links highlighted as dark lines for grant 01 P50 AG11715 01. Reference: Boyack, Kevin W. & Börner, Katy. (2003). Indicator-Assisted Evaluation and Funding of Research: Visualizing the Influence of Grants on the Number and Citation Counts of Research Papers. Journal of the American Society of Information Science and Technology, Special Topic Issue on Visualizing Scientific Paradigms, Vol. 54(5), 447-461. http://ivl.slis.indiana.edu/km/pub/2003-boyack-indasst.pdf 6.2.2 Mapping Transdisciplinary Tobacco Use Research Centers Publications (forthcoming) By Angela Zoss & Katy Börner. This paper reports the results of a scientometric study aimed at evaluating and comparing investigator-initiated R01 research and Transdisciplinary Tobacco Use Research Centers (TTURC) funded by the National Inst
51. scholarly collaboration. The more often two authors collaborate, the higher the weight of their joint co-author link. Weighted, undirected co-authorship networks appear to have a high correlation with social networks that are themselves impacted by geospatial proximity (Wellman, White et al. 2004; Börner, Penumarthy et al. 2006). 4.9.1.2.2 Document Cited Reference Co-Occurrence (Bibliographic Coupling) Network Papers, patents, or other scholarly records that share common references are said to be coupled bibliographically (Kessler 1963). The bibliographic coupling strength of two scholarly papers can be calculated by counting the number of times that they reference the same third work in their bibliographies. The coupling strength is assumed to reflect topic similarity. Co-occurrence networks are undirected and weighted. 4.9.1.2.3 Author Cited Reference Co-Occurrence (Bibliographic Coupling) Network Authors who cite the same sources are coupled bibliographically. The bibliographic coupling (BC) strength between two authors can be said to be a measure of similarity between them. The resulting network is weighted and undirected. 4.9.1.2.4 Journal Cited Reference Co-Occurrence (Bibliographic Coupling) Network Like document and author bibliographic coupling networks, journal cited reference co-occurrences provide a measurement of similarity between journals. Edge strength between two journals is determined by summing the number of unique references bot
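The document-level coupling strength described above (the number of references two papers share) can be sketched in a few lines of Python. This is an illustration only, not the Sci2 Tool's implementation; the function name `bibliographic_coupling` and the paper and reference identifiers are hypothetical.

```python
from itertools import combinations


def bibliographic_coupling(references_by_paper):
    """Return {(paper_a, paper_b): number_of_shared_references} for coupled pairs."""
    ref_sets = {paper: set(refs) for paper, refs in references_by_paper.items()}
    weights = {}
    for a, b in combinations(sorted(ref_sets), 2):
        shared = len(ref_sets[a] & ref_sets[b])
        if shared > 0:
            # The network is undirected, so each pair is stored once.
            weights[(a, b)] = shared
    return weights


# Hypothetical papers with their cited references.
papers = {
    "P1": ["R1", "R2", "R3"],
    "P2": ["R2", "R3", "R4"],
    "P3": ["R5"],
}
coupling = bibliographic_coupling(papers)
```

Here P1 and P2 share two references and receive an edge of weight 2, while P3 shares none and stays unconnected. The same counting applies to authors or journals once their reference lists are pooled.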
52. standard csv. To view a bimodal network visualizing which main PIs associate with which institution, run Data Preparation > Text Files > Extract Bipartite Network with the following parameters: Source Column: Organization Name; Target Column: the PI name column. (Given a table, this algorithm creates a directed network by placing a directed edge between the values in a given column to the values of a different column.) The resulting network can be visualized in GUESS and laid out using GEM, see Figure 5.17.
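The directed extraction described above (one edge from each value in a source column to the value in a target column of the same row) can be sketched as follows. This is a simplified illustration, not the Sci2 algorithm; the function name and the column name "PI Name" are assumptions, and the rows are made-up sample data.

```python
from collections import Counter


def extract_directed_network(rows, source_column, target_column):
    """Count directed edges (source value -> target value) across table rows."""
    edges = Counter()
    for row in rows:
        source = row[source_column].strip()
        target = row[target_column].strip()
        if source and target:
            edges[(source, target)] += 1
    return edges


# Hypothetical table rows: organizations and principal investigators.
rows = [
    {"Organization Name": "Univ X", "PI Name": "Doe, J"},
    {"Organization Name": "Univ X", "PI Name": "Roe, R"},
    {"Organization Name": "Univ X", "PI Name": "Doe, J"},
]
edges = extract_directed_network(rows, "Organization Name", "PI Name")
```

Because the two columns hold different kinds of entities, the result is bipartite: every edge runs from an organization to a person, and repeated rows raise the edge weight.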
53. that can be removed running Preprocessing > Networks > Delete Isolates. Select Visualization > Networks > GUESS and run Layout > GEM followed by Layout > Bin Pack to visualize the network. Run the yoursci2directory/scripts/GUESS/co-PI-nw.py script. Visualizations of the three universities' co-PI networks are given in Figure 5.15 (node size and color: total award money).
54. the Sci2 Tool starts. 3.3.2 Mac On a Mac, you need to configure the scipolicy.ini file inside the Sci2 Tool application bundle. Open the Sci2 Tool application folder and control-click on the scipolicy icon. A menu should appear; select Show Package Contents. Figure 3.1: Find scipolicy in the sci2 directory. This should bring up another window with a folder labeled Contents. Open the Contents folder and then open the MacOS folder found inside. Open the scipolicy.ini file in TextEdit or another text editor that will leave the contents as plain text. Figure 3.2: Original scipolicy.ini file. To enable the Sci2 Tool to access more memory, increase the number following -Xmx in the scipolicy.ini file. If the value is increased too much, the Sci2 Tool will not function.
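The manual edit described above can also be scripted. The following is a minimal sketch, assuming the ini file keeps the `-Xmx` option on a line of its own (as Eclipse-style launcher ini files usually do); the function name `set_max_heap` and the sample values are hypothetical and are not taken from the original scipolicy.ini.

```python
import re


def set_max_heap(ini_text, megabytes):
    """Return ini_text with the -Xmx line replaced by the given size in megabytes."""
    pattern = re.compile(r"^-Xmx\d+[kKmMgG]?$", re.MULTILINE)
    if not pattern.search(ini_text):
        raise ValueError("no -Xmx line found")
    return pattern.sub("-Xmx%dm" % megabytes, ini_text)


# Hypothetical ini contents; real values will differ.
sample = "-vmargs\n-Xms128m\n-Xmx256m\n"
updated = set_max_heap(sample, 1024)
```

Only the maximum heap line changes; the initial heap (`-Xms`) and all other options are left untouched. Keep a backup copy of the ini file, since too large a value prevents the tool from starting.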
55. the following workflow: 1. Resize Linear > Nodes > totalawardmoney > From: 5 To: 35 > Do Resize Linear. 2. Resize Linear > Edges > coinvestigatedawards > From: 1 To: 2 > Do Resize Linear. 3. Colorize > Nodes > totalawardmoney > choose the From and To colors > Do Colorize. 4. Colorize > Edges > coinvestigatedawards > choose the From and To colors > Do Colorize. 5. Object: all nodes > Show Label. 6. Type in Interpreter:
> for n in g.nodes:
      n.strokecolor = n.color
Figure 5.2: NSF Co-PI network of Katy Börner (node size and color: total award money, 11890 to 3527728; edge size and color: co-investigated awards, 1 to 2). For a summary of the grants themselves, with a visual representation of their award amount, select the NSF csv file in the Data Manager and run Visualization > Temporal > Horizontal Bar Graph, entering the following parameters: Label: Title; Start Date: Start Date; End Date: Expiration Date; Size By: Awarded Amount to Date; Date Format: Month-Day-Year Date Format (U.S., e.g. 10/15/2010). The generated PostScript file can be viewed usin
56. with DrL (VxOrd) (right). 5.2.6 Burst Detection in Scientometrics (ISI Data) Scientometrics.isi Time frame: 1978-2008 Region(s): Miscellaneous Topical Area(s): Scientometrics Analysis Type(s): Scientometrics Next, we want to know what topics drive scientometrics research and which of these topics and author names experienced a sudden increase in usage frequency over the 31 years this dataset covers. This section demonstrates the application of burst detection, described in Section 4.6.1 Burst Detection. Load Scientometrics.isi from yoursci2directory/sampledata/scientometrics/isi using File > Load and Clean ISI. To identify authors that published a large number of papers rather suddenly, perform the following steps. First, normalize the authors using Preprocessing > Topical > Normalize Text, selecting the Authors column in the Normalize Text dialog. Next, run Analysis > Topical > Burst Detection on the normalized table with the following parameters: Gamma: 1.0; General Ratio; First Ratio: 2.0; Bursting States: 1; Date Column: Publication Year; Date Format
57. 2.1 Funding Profiles of Three Universities (NSF Data) Cornell.nsf, Indiana.nsf, Michigan.nsf Time frame: 2000-2009 Region(s): Cornell University, Indiana University, University of Michigan Topical Area(s): Miscellaneous Analysis Type(s): Co-PI Network Load Cornell.nsf, Michigan.nsf, and Indiana.nsf from yoursci2directory/sampledata/scientometrics/nsf and use the following workflow for each of the three nsf files loaded. Select one of the loaded datasets in the Data Manager window and run Data Preparation > Text Files > Extract Co-Occurrence Network using the parameters: Column Name: All Investigators; Text Delimiter: |; Aggregation Function File: C:\Documents and Settings\katy\Desktop\nwb\sampledata\scientometrics\properties\nsfCoPI.properties. Two derived files will appear in the Data Manager window: the co-PI network and a merge table. In the network, nodes represent investigators and edges denote their co-PI relationships. The merge table can be used to further clean PI names (see Section 5.1.4.2 Author Co-Occurrence (Co-Author) Network). Choose the Extracted Network on Column All Investigators and run Analysis > Networks > Network Analysis Toolkit (NAT). This will display the number of nodes and edges, as well as the number of isolate nodes
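The co-occurrence extraction described above links every pair of names that appear together in one delimited cell. A minimal sketch in Python, assuming a "|" delimiter and ignoring the aggregation function file; the function names and the sample rows are hypothetical, and this is not the Sci2 implementation.

```python
from collections import Counter
from itertools import combinations


def extract_cooccurrence_network(rows, column, delimiter="|"):
    """Return (node_counts, edge_weights) for names co-occurring in one column."""
    node_counts = Counter()
    edge_weights = Counter()
    for row in rows:
        parts = row[column].split(delimiter)
        names = sorted({name.strip() for name in parts if name.strip()})
        node_counts.update(names)
        # Every unordered pair of names in the same cell gets one co-occurrence.
        edge_weights.update(combinations(names, 2))
    return node_counts, edge_weights


def isolates(node_counts, edge_weights):
    """Nodes that take part in no edge at all."""
    connected = {name for edge in edge_weights for name in edge}
    return sorted(name for name in node_counts if name not in connected)


# Hypothetical award records with investigator lists.
awards = [
    {"All Investigators": "A|B|C"},
    {"All Investigators": "A|B"},
    {"All Investigators": "D"},
]
nodes, edges = extract_cooccurrence_network(awards, "All Investigators")
lonely = isolates(nodes, edges)
```

In this sample, A and B co-occur twice, so their edge has weight 2, and D appears only on a single-investigator award, which makes it an isolate of the kind that Delete Isolates would remove.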
58. Figure 5.25: Bursts for new ISI keywords (top) and top 50 bursts for cited references (bottom). 5.2.7 Mapping the Field of RNAi Research (SDB Data) RNAi How many papers, patents, and funding awards exist on a specific topic? Here we selected research on RNA interference. RNAi is a system within livin
59. Table 3.2: The number of seconds to perform each action on a dataset of the given size on a computer with 1.2 GB of memory. 4 Workflow Design 4.1 Overview A typical science of science study is given in Figure 4.1. It starts with a NEEDS ANALYSIS of a selected stakeholder group that informs the subsequent workflow design, involving DATA ACQUISITION AND PROCESSING, DATA ANALYSIS, MODELING, AND LAYOUT, and DATA COMMUNICATION (VISUALIZATION LAYERS). All datasets, algorithms, and parameter values used in a study have to be documented in detail in support of replication and interpretation. The resulting VALIDATION AND INTERPRETATION should then proceed in collaboration with domain experts and stakeholders. Insights gained might generate additional insight needs or inspire changes to the workflow. The process is highly incremental, often demanding many cycles of revision and refinement to ensure the best datasets are used, optimal algorithm par
60. A tool for science of science research & practice. Science of Science (Sci2) Tool User Manual, Version Alpha 3, Updated 3/24/2010. Project Investigators: Katy Börner and Kevin W. Boyack, SciTech Strategies Inc. Programmers: Micah W. Linnemeier, Russell J. Duhon, Patrick A. Phillips, Chintan Tank, and Joseph Biberstine. Users, Testers & Tutorial Writers: Scott Weingart, Hanning Guo, Katy Börner. Cyberinfrastructure for Network Science Center, School of Library and Information Science, Indiana University, Bloomington, IN. http://cns.slis.indiana.edu This work is funded by the School of Library and Information Science and the Cyberinfrastructure for Network Science Center at Indiana University, the James S. McDonnell Foundation, and the National Science Foundation under Grants No. IIS-0715303, IIS-0534909, and IIS-0513650. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. Contents 1 Introduction 2 Getting Started 2.1 Download, Install, Uninstall
61. Clean ISI File automatically normalizes author names and merges duplicate records, and is specifically designed for text-based scientometric workflows (algorithms within Data Preparation > Text Files). For database manipulation of ISI or NSF files, use File > Load followed by File > Load into Database and select the appropriate option. The converter graph and directory reader produce a sample graph based on file types supported by the Sci2 Tool and a sample tree based on any directory structure on the hard drive, respectively. 2.2.1.2 Data Preparation After loading a file, use options in the Data Preparation menu to clean the data and create networks or tables, which can be used in the preprocessing, analysis, and visualization steps. The Data Preparation > Database menu is specifically for ISI or NSF data previously loaded into a database. Options in Data Preparation > Text Files are for any table-based datasets, like csv files, and are used to extract networks. Find detailed information on each menu item in Section 3.1 Sci2 Tool Plugins. 2.2.1.3 Preprocessing Use preprocessing algorithms to prune or append networks or tables before analyzing and visualizing them. The menu is separated by domain, and most simple tasks require staying within the same domain. For example, to visualize a co-authorship network, only use algorithms within the Networks domain under
62. (Horizontal Bar Graph parameters: Start Date; End Date; Size By; Date Format: Month-Day-Year Date Format (U.S., e.g. 10/31/2010); Year Label Font Size; Bar Label Font Size.) Right-click the resulting PostScript file in the Data Manager and save it as a PostScript file. View the resulting file following the directions in Section 2.4 Saving Visualizations for Publication. Figure 5.29: Top 10 burst terms from MEDLINE abstracts on RNAi. 5.3 Global Level Studies - Macro 5.3.1 Geo USPTO (SDB Data) usptoInfluenza.csv Time frame: 1865-2008 Region(s): Miscellaneous Topical Area(s): Influenza Analysis Type(s): Geospatial Analysis Warning: Geo Map is currently being redesigned, and some screenshots may not match up with this documentation. The file usptoInfluenza.csv was generated with an SDB search for patents containing the term Influenza and was heavily modified to produce a simple geographic table. Load it using File > Load > sampledata > geo > usptoInfluenza.csv and then select Standard csv format. See the data format in Figure 5.30 (left). Once loaded, select the dataset in the Data Manager and click Visualization > Geo Map (Circle Annotation Style), inputting the parameters shown in Figure 5.30 (right). The tool will output a PostScript visualization, which can be viewed using GhostView
63. Figure 5.1: Co-authorship network of Katy Börner. This is a so-called ego-centric network, i.e., almost complete data is available and shown for exactly one ego. The publication records for all other authors in the network are most likely incomplete. 5.1.1.2 NSF KatyBorner.nsf Time frame: 2003-2008 Region(s): Indiana University Topical Area(s): Network Science, Library and Information Science, Informatics and Computing, Statistics, Cyberinfrastructure, Information Visualization, Cognitive Science, Biocomplexity Analysis Type(s): Co-PI Network, Grant Award Summary Free online services such as NSF's Award Search (see Section 4.2.2.1 NSF Award Search) support the retrieval of ego-centric funding profiles
64. Figure 5.8: Co-PI network of Michael McRobbie at Indiana University. 5.1.4 Studying Four Major NetSci Researchers (ISI Data) FourNetSciResearchers.isi Time frame: 1955-2007 Region(s): Miscellaneous Topical Area(s): Network Science Analysis Type(s): Paper Citation Network, Co-Author Network, Bibliographic Coupling Network, Document Co-Citation Network, Word Co-Occurrence Network 5.1.4.1 Paper-Paper (Citation) Network In the Sci2 Tool, load the file yoursci2directory/sampledata/scientometrics/isi/FourNetSciResearchers.isi using File > Load and Clean ISI File. A table of the records and a table of all records with unique ISI ids will appear in the Data Manager. In this file, each original record now has a Cite Me As attribute that is constructed from the first author, PY, J9, VL, and BP fields of its ISI record and will be used when matching paper and reference records. To extract the paper-citation network, select the 361 Unique ISI Records table and run Data Preparation > Text Files > Extract Directed Network using the parameters: Source Column: Cited References; Target Column: Cite Me As. (Given a table, this algorithm creates a directed network by placing a directed edge between the values in a given column to the values of a different column.)
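To make the matching idea above concrete, here is a minimal sketch of how a Cite Me As style key could be assembled from the first author, PY, J9, VL, and BP fields so that it lines up with the strings found in Cited References. The exact format used by the Sci2 Tool is not given in this manual; the layout below (upper-case author, then year, journal abbreviation, "V" volume, "P" page) is an assumption modeled on typical ISI cited-reference strings, and the record is made up.

```python
def cite_me_as(record):
    """Build a citation key from ISI fields (assumed format, not the Sci2 one)."""
    first_author = record["AU"][0].replace(",", "").upper()
    parts = [
        first_author,
        record["PY"],          # publication year
        record["J9"],          # abbreviated journal title
        "V" + record["VL"],    # volume
        "P" + record["BP"],    # beginning page
    ]
    return ", ".join(parts)


# Hypothetical record using ISI field tags.
record = {"AU": ["Doe, J", "Roe, R"], "PY": "1999", "J9": "J EXAMPLE", "VL": "12", "BP": "345"}
key = cite_me_as(record)

# A paper cites another when one of its cited-reference strings equals that paper's key.
cited_references = ["DOE J, 1999, J EXAMPLE, V12, P345", "SMITH A, 1980, OTHER J, V1, P1"]
edges = [(ref, "citing paper") for ref in cited_references if ref == key]
```

Only exact string matches produce an edge in this sketch, which is why the preceding cleaning and normalization steps matter for the resulting network.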
65. Figure 4.2: ISI Web of Knowledge search interface. Download the first 500 records using the output box at the bottom of the page. Enter records 1 to 500, select Full Record and plus Cited Reference, select Save to Plain Text in the drop-down menu, and then click Save. Wait for the processing to complete and then save the file as EugeneGarfield.isi. Part of the resulting file can be seen in Figure 4.3 (right). A file with 99 records c
66. Isolate nodes can be removed running Preprocessing > Networks > Delete Isolates. The resulting network has 241 nodes and 1,508 edges in 12 weakly connected components. This network can be visualized in GUESS, see Figure 5.12 and the above explanation. Nodes and edges can be color and size coded, and the top 20 most-cited papers can be labeled, by entering the following lines in the GUESS Interpreter:
> resizeLinear(globalcitationcount, 2, 40)
> colorize(globalcitationcount, (200,200,200), (0,0,0))
> resizeLinear(weight, .25, 8)
> colorize(weight, "127,193,65,255", black)
> for n in g.nodes:
      n.strokecolor = n.color
> toptc = g.nodes
> def bytc(n1, n2):
      return cmp(n1.globalcitationcount, n2.globalcitationcount)
> toptc.sort(bytc)
> toptc.reverse()
> toptc
> for i in range(0, 20):
      toptc[i].labelvisible = true
Alternatively, run GUESS File > Run Script and select yoursci2directory/scripts/GUESS/reference-co-occurence-nw.py. Figure 5.12: Reference co-occurrence network layout for FourNetSciResearchers dataset. 5.1.4.4 Document Co-Citation Network (DCA) In the Sci2 Tool, select the paper-citation network (see Section 4.9.1.1 Document-Document Citation Network) and run Data Preparation > Text Files > Extract Document Co-Citation Network. The co-citation network will become available i
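The labeling step above sorts nodes by citation count and makes the top 20 visible. Outside the GUESS interpreter (which runs Jython, where `cmp`-style sorting is available), the same selection can be sketched in current Python with a key function. The dictionaries below are hypothetical stand-ins for GUESS node objects, and `top_cited` is an illustrative name.

```python
def top_cited(nodes, count):
    """Return the `count` nodes with the highest globalcitationcount, highest first."""
    ranked = sorted(nodes, key=lambda n: n["globalcitationcount"], reverse=True)
    return ranked[:count]


# Hypothetical nodes; in GUESS these would be the entries of g.nodes.
nodes = [
    {"label": "paper a", "globalcitationcount": 12, "labelvisible": False},
    {"label": "paper b", "globalcitationcount": 87, "labelvisible": False},
    {"label": "paper c", "globalcitationcount": 40, "labelvisible": False},
]
for node in top_cited(nodes, 2):
    node["labelvisible"] = True

visible = [n["label"] for n in nodes if n["labelvisible"]]
```

Slicing the sorted list also avoids an index error when the network has fewer nodes than the requested number of labels.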
67. Figure 5.15: Viewing the database schema. As before, it is important to clean the database before running any extractions by merging and matching authors, journals, and references. Run Data Preparation > Database > ISI > Merge Identical ISI People, followed by Data Preparation > Database > ISI > Merge Journals and Data Preparation > Database > ISI > Match References to Papers
68. Figure 5.15: Co-PI network of Indiana University (top left), Cornell University (top right), University of Michigan (middle). To see a more detailed view of any of the components in the network, e.g., the largest Indiana component, select the Indiana network with deleted isolates in the Data Manager and run Analysis > Networks > Unweighted and Undirected > Weak Component Clustering with the parameter Number of top clusters: 1. (Weak Component Clustering creates new graphs containing the top connected components.) Indiana's largest component has 19 nodes, Cornell's has 67 nodes, and Michigan's has 55 nodes. Visualize Indiana's network in GUESS using the yoursci2directory/scripts/GUESS/co-PI-nw.py script and save the file as a jpg via File > Export Image.
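Weak component clustering, as used above, groups nodes that are connected when edge direction is ignored and keeps the largest groups. A minimal breadth-first sketch in Python follows; it illustrates the technique rather than the Sci2 implementation, and the function name and sample network are hypothetical.

```python
from collections import defaultdict, deque


def weak_components(nodes, edges):
    """Return connected components (edge direction ignored), largest first."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for other in neighbors[node]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(sorted(component))
    components.sort(key=len, reverse=True)
    return components


# Hypothetical co-PI network with three components.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]
components = weak_components(nodes, edges)
largest = components[:1]  # corresponds to "Number of top clusters: 1"
```

Taking the first `k` entries of the result mirrors the Number of top clusters parameter: with `k = 1` only the largest component is kept.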
69. Figure 4.3: Saving and viewing EugeneGarfield.isi. ISI files are loosely based on the RIS file format, and data in this format is commonly used for the following types of analyses:
o Statistical Attributes
   o NR: Cited Reference Count
   o TC: Times Cited
o Temporal Analysis
   o RC Date: Date Modified
   o PD: Date Published
   o IS: Issu
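The two-letter field tags listed above make ISI plain-text files easy to read programmatically. The sketch below assumes the usual layout of such exports (a two-letter tag at the start of a line, indented continuation lines for multi-valued fields, and `ER` closing each record); it is not the Sci2 loader, and the sample record is invented.

```python
SKIP_TAGS = ("FN", "VR", "EF")  # file header and end-of-file markers


def parse_isi(text):
    """Parse ISI tagged text into a list of {tag: [values]} records."""
    records, current, tag = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        head = line[:2]
        if head == "ER":
            # End of record: store it and start a new one.
            records.append(current)
            current, tag = {}, None
        elif head in SKIP_TAGS:
            tag = None
        elif head.strip():
            tag = head
            current.setdefault(tag, []).append(line[3:].strip())
        elif tag is not None:
            # Indented continuation line belongs to the previous tag.
            current[tag].append(line.strip())
    return records


# Invented sample record for illustration.
sample = (
    "FN ISI Export Format\nVR 1.0\n"
    "PT J\nAU Doe, J\n   Roe, R\nPY 1999\nNR 2\nTC 10\n"
    "CR AUTHOR A, 1950, J ONE, V1, P1\n   AUTHOR B, 1951, J TWO, V2, P2\n"
    "ER\n\nEF\n"
)
records = parse_isi(sample)
times_cited = int(records[0]["TC"][0])
```

Once parsed, attributes such as `NR` and `TC` can be read directly as numbers, and `CR` yields the list of cited references used for the network extractions.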
70. 5.18 Longitudinal study of FourNetSciResearchers. The largest speed increases from the database functionality can be found in the extraction of networks. First, compare the results of a co-authorship extraction with those from Section 5.1.4.2 Author Co-Occurrence (Co-Author) Network. Run Data Preparation > Database > ISI > Extract Authors > Extract Co-Author Network, followed by Analysis > Networks > Network Analysis Toolkit (NAT). Notice that both networks have 247 nodes and 891 edges. Visualize the extracted network in GUESS using Visualization > Networks > GUESS and Layout > GEM. To apply the default co-authorship theme, go to Script > Run Script and find yoursci2directory/scripts/GUESS/co-author-nw_database.py. The resulting network will look like Figure 5.11. The database allows for several network extractions that cannot be achieved with the text-based algorithms. Journal co-citation networks reveal which journals are cited together most frequently. Run Data Preparation > Database > ISI > Extract Authors > Journal Co-Citation Network (core and references) to create a network of co-cited journals, and then prune it using Preprocessing > Networks > Extract Edges Above or Below Value with t
71. University, Northeastern University, University of Michigan. http://nwb.slis.indiana.edu (accessed on 3/10/2009). Cyberinfrastructure for Network Science Center. (2009). Network Workbench Tool: User Manual 1.0.0 beta. http://nwb.slis.indiana.edu/Docs/NWB-manual-1.0.0beta.pdf (accessed on 04/13/2009). 6.6.3 Rete-Netzwerk-Red: Analyzing and Visualizing Scholarly Networks Using the Scholarly Database and the Network Workbench Tool (2009) By Katy Börner, Bonnie (Weixia) Huang, Micah Linnemeier, Russell J. Duhon, Patrick Phillips, Nianli Ma, Angela Zoss, Hanning Guo & Mark A. Price. The enormous increase in digital scholarly data and computing power, combined with recent advances in text mining, linguistics, network science, and scientometrics, makes it possible to scientifically study the structure and evolution of science on a large scale. This paper discusses the challenges of this BIG science of science (also called computational scientometrics) research in terms of data access, algorithm scalability, repeatability, as well as result communication and interpretation. It then introduces two infrastructures: (1) the Scholarly Database (SDB), http://sdb.slis.indiana.edu, which provides free online access to 20 million scholarly records (papers, patents, and funding awards) which can be cross-searched and downloaded as dumps, and (2) Scientometrics-relevant plug-i
72. Vol. 79(1), 45-60. http://ivl.slis.indiana.edu/km/pub/2009-boyack-mapchem.pdf 6.4 Modeling Science 6.4.1 113 Years of Physical Review: Using Flow Maps to Show Temporal and Topical Citation (2008) Bruce W. Herr II, Russell Jackson Duhon, Katy Börner, Elisha F. Hardy & Shashikant Penumarthy. We visualize 113 years of bibliographic data from the American Physical Society. The 389,899 documents are laid out in a two-dimensional time-topic reference system. The citations from 2005 papers are overlaid as flow maps from each topic to the papers referenced by papers in the topic, making intercitation patterns between topic areas visible. Paper locations of Nobel Prize predictions and winners are marked. Finally, though not possible to reproduce here, the visualization was rendered to, and is best viewed on, a 24 x 30 canvas at 300 dots per inch.
73. Weak Component Clustering (creates new graphs containing the top connected components) with the parameter Number of top clusters: 1. Make sure the newly extracted network is selected in the Data Manager and run Visualization > Networks > GUESS followed by Layout > GEM. Use a custom Python script to color and size the network. The resulting network is shown in Figure 5.27. Figure 5.27: The largest component of the MEDLINE co-authorship network about RNAi (node size and color: betweenness centrality). To visualize the citation patterns of patents dealing with RNAi, load yoursci2directory/sampledata/scientometrics/sdb/RNAi/USPTO_citation_table_nwb_format.csv as a standard csv file and run Data Preparation > Text Files > Extract Bipartite Network using the following parameters: First column: cited patents; Second column; Text Delimiter; Aggregate Function File. (Given a table, this algorithm creates a directed network by placing a directed edge between the values in a given column to the values of a different column.) Run Analysis > Networks > Unweig
74. [Dialog residue: Text Column: Authors; Text Separator]
This will produce a table named 'Burst detection analysis (Publication Year, Authors): maximum burst level 1'. Select the table in the Data Manager and run 'Visualization > Temporal > Horizontal Bar Graph' with parameters:
[Dialog: Horizontal Line Graph. Takes tabular data and generates PostScript for a horizontal line graph. Label: word; Start Date: Start; End Date: End; Size By: Strength; Date Format: Month-Day-Year Date Format (U.S., e.g. 10/15/2010); Page Width: 8.5; Page Height: 11.0; Scale Output]
A PostScript file will be produced in the Data Manager.
[Data Manager screenshot: the loaded ISI file, the burst detection table, and the resulting PostScript file]
Right-click the icon in the Data Manager and save it as PostScript, as shown in Figure 5.22.
[Figure 5.22: Saving a PostScript file]
Visualize the file using the directions from Section 2.4 Saving Visualizations for Publication. The result is given in Figure 5.23. The horizontal bar coding and labeling is as follows:
75. aboration weight; line color: year of first co-authorship; node color: number of citations; node size: number of papers. Weighted co-author network for papers published in '74-'04.
Reference: Börner, Katy, Luca Dall'Asta, Weimao Ke & Alessandro Vespignani (2005). Studying the Emerging Global Brain: Analyzing and Visualizing the Impact of Co-Authorship Teams. Complexity, Special Issue on Understanding Complex Systems, Vol. 10(4), 57-67.
6.3.3 Mapping Indiana's Intellectual Space
This project aimed to identify pockets of innovation, pathways that ideas take to make it into products, and existing academia-industry collaborations. Submitted and awarded proposals for 2001-2006 were overlaid on a map of Indiana. Geolocations of academic investigators are given in red; industry collaborators are in yellow. Circle size denotes the total award amount per geolocation. Linkages are color and line coded to distinguish within-academia, within-industry, and academia-industry collaborations. The interactive interface supports the selection of different years, resulting in year-specific data overlays, and clicking on any circle, which brings up a table with all proposals and awards for this geolocation together with their titles, investigators, and dollar amounts.
[Figure: Map of Indiana showing academic and industry geolocations]
76. ach time step. Older nodes with a higher degree have a higher probability of attracting edges from new nodes. The probability of attachment is given by $P(k_i) = k_i / \sum_j k_j$, where $k_i$ is the degree of node $i$. The initial number of nodes in the network must be greater than two, and each of these nodes must have at least one connection. The final structure of the network does not depend on the initial number of nodes in the network. The degree distribution of the generated network is a power law with a scaling coefficient of 3 (Barabási and Albert 1999; Barabási and Albert 2002). Figure 4.17 shows the network on the left and the probability distribution on a log-log scale on the right.
[Figure 4.17: Scale-free graph (left) and its node degree distribution (right)]
This is the simplest known algorithm to generate a scale-free network. It can be applied to model undirected networks such as the collaboration network among scientists, the movie actor network, and other social networks where the connections between the nodes are undirected. However, it cannot be used to generate a directed network. The inputs for the algorithm are the number of time steps, the number of initial nodes, and the number of initial edges for a new node. The algorithm starts with the initial number of nodes, which are fully connected. At each time step, a new node is generated with the initial number of edges. The probability of attaching to an existing node is calculated by dividing the degree of an existing node by the total n
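The growth process described above can be sketched in a few lines of Python. This is a hedged illustration and not the Sci2 plugin code: the function and parameter names are assumptions, and attachment is made proportional to node degree by sampling from a list in which every node appears once per incident edge.

```python
import random
from collections import Counter


def barabasi_albert(time_steps, initial_nodes=3, edges_per_new_node=2, seed=None):
    """Grow an undirected network by preferential attachment.

    Returns a set of edges stored as (smaller_id, larger_id) tuples.
    """
    if initial_nodes <= 2:
        raise ValueError("initial_nodes must be greater than two")
    if not 1 <= edges_per_new_node <= initial_nodes:
        raise ValueError("edges_per_new_node must be between 1 and initial_nodes")
    rng = random.Random(seed)
    # Start from a fully connected seed network.
    edges = {(i, j) for i in range(initial_nodes) for j in range(i + 1, initial_nodes)}
    # Every node appears here once per incident edge, so a uniform draw picks
    # node i with probability degree(i) / (sum of all degrees).
    endpoints = [node for edge in edges for node in edge]
    for step in range(time_steps):
        new_node = initial_nodes + step
        targets = set()
        while len(targets) < edges_per_new_node:  # distinct existing nodes only
            targets.add(rng.choice(endpoints))
        for target in targets:
            edges.add((target, new_node))
            endpoints.extend((target, new_node))
    return edges


def degrees(edges):
    """Return a Counter mapping each node to its degree."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


network = barabasi_albert(time_steps=50, initial_nodes=3, edges_per_new_node=2, seed=1)
degree_of = degrees(network)
```

Plotting a histogram of `degree_of.values()` on a log-log scale for a larger number of time steps is one way to inspect the heavy-tailed degree distribution the text refers to.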
77. [Figure 4.14: Using the GUESS Interpreter. The interpreter pane shows the commands resizeLinear(totalities,5,20) and colorize(wealth,white,red).]
4.9.4.2 DrL (Large Network Layout)
DrL is a force-directed graph layout toolbox for real-world large-scale graphs of up to 2 million nodes (Davidson, Wylie et al. 2001; Martin, Brown et al. in preparation). It includes:
• Standard force-directed layout of graphs using an algorithm based on the popular VxOrd routine (used in the VxInsight program).
• Parallel version of the force-directed layout algorithm.
• Recursive multilevel version for obtaining better layouts of very large graphs.
• Ability to add new vertices to a previously drawn graph.
It is one of the few force-directed layout algorithms that can scale to over 1 million nodes, making it ideal for large graphs. However, small graphs (hundreds of nodes or fewer) do not always end up looking good. The algorithm expects similarity matrices as input. Distance matrices will have to be converted before they can be laid out. The version of DrL included in Sci2 only does the standard forc
78. ameters applied and clearest insight achieved.
[Figure 4.1: Needs-driven workflow design using a modular data acquisition/analysis/modeling/visualization pipeline as well as visualization layers]
Note that the visualization layers interact with other workflow elements such as analysis algorithms (e.g., network analysis algorithms that compute additional node/edge attributes for graphic design), clustering techniques that identify cluster boundaries, or layout algorithms (e.g., network layouts that compute a spatial reference system). Subsequently, we detail major workflow elements and different types of analysis.
4.2 Data Acquisition and Preparation
Typically, about 80 percent of the total project effort is spent on data acquisition and preprocessing, yet well-prepared data is mandatory to arrive at high-quality results. Datasets might be acquired via questionnaires, crawled from the Web, downloaded from a database, or accessed as a continuous data stream. Datasets differ by their coverage and resolution of time (days, months, years), geography (languages and/or countries considered), and topics (disciplines and selected journal set
79. an be found in 'scientometrics/isi/EugeneGarfield.isi'.
[Figure: ISI Web of Knowledge search results and output options]
Harmon G, Garfield E, Paris G, Marchionini G, Fagan J, Toms EG. Bioinformatics in information science education. ASIST 2002: PROCEEDINGS OF THE 65TH ASIST ANNUAL MEETING, VOL 39, 2002 PROCEEDING
80. and login via http://sci.slis.indiana.edu/sci2. Make sure to select your operating system from the pull-down menu, see Figure 2.1.
[Figure 2.1: Downloading the Sci2 Tool]
Save the zip file in a new, empty 'yoursci2directory' directory and extract all files. After the files have been extracted, double-click scipolicy.exe in the 'yoursci2directory' directory to run the program.
[Figure 2.2: Click scipolicy.exe to run the Sci2 Tool. The screenshot lists the extracted directory contents, including the features, sampledata, scripts, licenses, workspace, plugins, logs, configuration, and database folders and the files scipolicy.exe and scipolicy.ini.]
The Sci2 Tool requires Java SE 5 (version 1.5.0) or later to be pre-installed on your local machine. You can check the version of your Java installation by run
81. arliest and most recent co-citations.
Extract Journal Co-Citation Network (core and references): Extracts a weighted, undirected network with journals as nodes and edges between journals which have been cited together by a common document. Edge weight is determined by the number of times two journals are cited together, and edges are appended with the publication years of their earliest and most recent co-citations.
Extract Author Co-Citation Network: Extracts a weighted, undirected network with authors as nodes and edges between authors who have been cited together by a common document. Edge weight is determined by the number of times two authors are cited together, and edges are appended with the publication years of their earliest and most recent co-citations.
Extract Author Bibliographic Coupling Network: Extracts a weighted, undirected network with authors as nodes and edges between authors who cite a common reference. Edge weight is determined by the number of common references between authors.
Extract Document Bibliographic Coupling Network: Extracts a weighted, undirected network with documents as nodes and edges between documents which cite a common reference. Edge weight is determined by the number of common references between documents.
Extract Journal Bibliographic Coupling: Extracts a weighted, undirected network with journals as nodes and edges between journals whose documents cite a common reference. Edge weigh
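The two edge-weight definitions used above (co-citation and bibliographic coupling) can be made concrete with a small Python sketch at the document level. It is only an illustration under stated assumptions: the input format (a dict from citing document to its reference list) and the function names are hypothetical, and the earliest and most recent co-citation years that the tool attaches to edges are not computed here.

```python
from collections import Counter
from itertools import combinations


def co_citation_weights(citations):
    """Weight of (x, y) = number of documents that cite both x and y."""
    weights = Counter()
    for references in citations.values():
        # Each unordered pair of distinct references in one document is co-cited once.
        for pair in combinations(sorted(set(references)), 2):
            weights[pair] += 1
    return weights


def bibliographic_coupling_weights(citations):
    """Weight of (a, b) = number of references that documents a and b share."""
    weights = Counter()
    for (doc_a, refs_a), (doc_b, refs_b) in combinations(sorted(citations.items()), 2):
        shared = len(set(refs_a) & set(refs_b))
        if shared:  # only documents with at least one common reference are linked
            weights[(doc_a, doc_b)] = shared
    return weights


# Hypothetical citing documents A, B, C and their reference lists.
citations = {"A": ["X", "Y", "Z"], "B": ["X", "Y"], "C": ["Z"]}
co_cited = co_citation_weights(citations)
coupled = bibliographic_coupling_weights(citations)
```

The same counting applies to journals or authors once each reference (or citing document) is replaced by its journal or author names.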
82. ata appear interlinked in one map. We start with an overview of related work and a discussion of available techniques. A concrete example, grant and publication data from Behavioral and Social Science Research, one of four extramural research programs at the National Institute on Aging (NIA), is analyzed and visualized using the VxInsight visualization tool. The analysis also illustrates current existing problems related to the quality and existence of data, data analysis, and processing. The article concludes with a list of recommendations on how to improve the quality of grant-publication maps and a discussion of research challenges for indicator-assisted evaluation and funding of research.
83. 7.3 Creating and Sharing New Algorithm Plugins
7.4 Tools That Use OSGi and/or CIShell
8 Relevant Datasets and Tools
8.1 Datasets
8.2 Network Analysis Tools
9 References
1 Introduction
The Science of Science (Sci2) Tool (http://sci.slis.indiana.edu) is a modular toolset specifically designed for the study of science. It supports the temporal, geospatial, topical, and network analysis and visualization of datasets at the micro (individual), meso (local), and macro (global) levels. Tables 1.1 and 1.2 show examples of different types of studies at different levels, several of which can be found in Chapter 6 Sample Science Studies & Online Services.
Table 1.1: Major analysis types and levels of analysis
Analysis Types and Sample Studies | Micro/Individual (1-100 records) | Meso/Local (101-10,000 records) | Macro/Global (10,000 < records)
Statistical Analysis/Profiling | Individual persons and their expertise profiles | Larger labs, centers, universities, research domains, or states | All of NSF, all of US, all of science
Temporal | Funding portfolio o
84. b
Reference: Börner, Katy, Maru, Jeegar & Goldstone, Robert (2004). The Simultaneous Evolution of Author and Paper Networks. Proceedings of the National Academy of Sciences of the United States of America, Vol. 101 (Suppl. 1), 5266-5273. http://ivl.slis.indiana.edu/km/pub/2004-borner-tarl.pdf
6.5 Accuracy Studies
6.5.1 Mapping the Backbone of Science (2005)
By Kevin W. Boyack, Richard Klavans & Katy Börner
This paper presents a new map representing the structure of all of science, based on journal articles, including both the natural and social sciences. Similar to cartographic maps of our world, the map of science provides a bird's-eye view of today's scientific landscape. It can be used to visually identify major areas of science, their size, similarity, and interconnectedness. In order to be useful, the map needs to be accurate on a local and on a global scale. While our recent work has focused on the former aspect [1], this paper summarizes results on how to achieve structural accuracy. Eight alternative measures of journal similarity were applied to a data set of 7,121 journals covering over 1 million documents in the combined Science Citation and Social Science Citation Indexes. For each journal similarity measure we generated two-dimensional spatial layouts using the force-directed graph layout tool VxOrd. Next, mutual information values were calculated for
85. basi is also available in the respective subdirectories in 'yoursci2directory/sampledata/scientometrics' and will be used subsequently. Data from Google Scholar can be used for the following types of analyses:
o Statistical Attributes: Cites
o Temporal Analysis: Year
o Topical Analysis: Source, Title
o Network Analysis: Authors
4.2.2 Datasets: Funding
4.2.2.1 NSF Award Search
Funding data provided by the National Science Foundation (NSF) can be retrieved via the Award Search site (http://www.nsf.gov/awardsearch). Search by PI name, institution, and many other fields, see Figure 4.7.
[Figure 4.7: NSF Award Search form and search results]
86. broader applicability and future directions are discussed.
[Figure: NIH Visual Browser. Cluster selection with results shown in the right-hand column (top) and query for NIH in the title with results shown on the map and in the right-hand column (bottom).]
Reference: Herr II, Bruce
87. [Figure 4.8: NIH RePORTER site]
A sample search of 'Epidemic' in the Public Health Relevance field displays 205 results (as of November 11, 2009). Up to 500 results can be exported into CSV or Excel format using the Export button at the top of the page. Save the file as a csv and load it into the Sci2 Tool using 'File > Load File' to perform temporal or top
88. cs, University of Paris-Sud
Topical Area(s): Informatics, Complex Network Science and System Research, Physics, Statistics, Epidemics
Analysis Type(s): Co-Authorship Network
The Sci2 Tool supports the analysis of evolving networks. For this study, load Alessandro Vespignani's publication history from ISI, which was downloaded from Thomson's Web of Science (see Section 4.2.1 Datasets: Publication) and is available at 'yoursci2directory/sampledata/scientometrics/isi/AlessandroVespignani.isi', using 'File > Load and Clean ISI File'. Slice the data into five-year intervals from 1988-2007 using 'Preprocessing > Temporal > Slice Table by Time' and the following parameters:
[Dialog: Slice Table by Time. Slice a table into groups of rows by time. Fields: Date/Time Column (Publication Year), Date/Time Format, Slice Into (years), How Many, From Time, To Time, Cumulative, Align with Calendar, Week Starts On (Sunday)]
Choose 'Publication Year' in the Date/Time Column field and leave the default Date/Time Format. 'Slice Into' allows the user to slice the table by days, weeks, months, quarters, years, decades, and centuries. There are two additional options for time slicing: cumulative and align with calendar. The former produces cumulative tables containing all data from the beginning of the time rang
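The slicing step above, including the cumulative option, can be sketched as follows. This is an illustrative Python approximation rather than the tool's algorithm: the function name, the record format (a list of dicts), and the handling of years only (not days or weeks) are assumptions.

```python
def slice_by_years(records, start, end, width=5, cumulative=False,
                   year_key="Publication Year"):
    """Group records into consecutive year intervals of the given width.

    Returns a dict keyed by (first_year, last_year). With cumulative=True,
    every slice also contains all records from `start` onwards.
    """
    slices = {}
    for lower in range(start, end + 1, width):
        upper = min(lower + width - 1, end)
        first = start if cumulative else lower
        slices[(lower, upper)] = [r for r in records if first <= r[year_key] <= upper]
    return slices


# Hypothetical records carrying only the publication year.
records = [{"Publication Year": year} for year in (1988, 1990, 1993, 2007)]
plain = slice_by_years(records, 1988, 2007, width=5)
running = slice_by_years(records, 1988, 2007, width=5, cumulative=True)
```

For 1988-2007 and a width of five years this yields four slices (1988-1992, 1993-1997, 1998-2002, 2003-2007), each of which could then be turned into its own co-authorship network.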
89. cy rates to internal and external events. Temporal analysis aims to identify the nature of phenomena represented by a sequence of observations, such as patterns, trends, seasonality, outliers, and bursts of activity. A time series is a sequence of events or observations that are ordered in time. Time series data can be continuous (i.e., there is an observation at every instant of time) or discrete (i.e., observations exist for regularly or irregularly spaced intervals). Temporal aggregations over journal volumes, years, or decades are common. Frequently, some form of filtering is applied to reduce noise and make patterns more salient. Smoothing (i.e., averaging using a smoothing window of a certain width) and curve approximation might be applied. The number of scholarly records is often plotted to get a first idea of the temporal distribution of a dataset. It might be shown in total values or as a percentage of those. One may find out how long a scholarly entity was active, how old it was at a certain point, what growth, latency to peak, or decay rate it has, what correlations with other time series exist, or what trends are observable. Data models such as the least squares model, available in most statistical software packages, are applied to best fit a selected function to a data set and to determine if the trend is significant. Kleinberg's burst detection algorithm (Kleinberg 2002) is commonly applied to identify words that have experienced a sudd
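The smoothing mentioned above (averaging with a window of a certain width) can be illustrated with a short Python sketch. The function name and the choice to shrink the window at both ends of the series are assumptions made for this example, not a description of any Sci2 algorithm.

```python
def moving_average(values, window=3):
    """Smooth a series with a centered moving average.

    The window must be a positive odd number; near the ends of the series
    only the available neighbours are averaged.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        neighbourhood = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(neighbourhood) / len(neighbourhood))
    return smoothed


# Hypothetical yearly record counts.
counts_per_year = [1, 2, 3, 4, 5]
smoothed_counts = moving_average(counts_per_year, window=3)
```

A wider window removes more noise but also flattens short bursts, so the width is usually chosen relative to the time scale of the patterns of interest.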
90. d
8 Relevant Datasets and Tools
8.1 Datasets
Aggregate
• NanoBank, http://www.nanobank.org
• NWB Datasets, https://nwb.slis.indiana.edu/community/?n=Datasets.HomePage
• Scholarly Database, http://sdb.slis.indiana.edu
Publications
• Google Scholar, http://scholar.google.com
• ISI Web of Knowledge/Web of Science, http://apps.isiknowledge.com
• JSTOR, http://www.jstor.org
• Ley, Michael. The DBLP Computer Science Bibliography. Universität Trier.
• National Federation of Abstracting and Information Services (1990). NFAIS abstract dataset, 1957-1990. Available at http://www.nfais.org
• Office of Inspector General, http://www.oig.hhs.gov/fraud/exclusions.asp
• PsychInfo, http://www.apa.org/psycinfo
• PubMed, http://www.ncbi.nlm.nih.gov/pubmed
• PubMed Central, http://www.pubmedcentral.nih.gov
• Research Papers in Economics, http://repec.org
• Scopus, http://www.scopus.com
• Stanford Linear Accelerator Center (2009). SPIRES-HEP Database. http://www.slac.stanford.edu/spires
Patents
• École Polytechnique Fédérale de Lausanne (2009). CEMI's PATSTAT Knowledge Base. http://wiki.epfl.ch/patstat
• EPO, http://www.epo.org/patents/patent-information/subscription/gpi.html
• Patent Lens, Initiative for Open Innovation, http://www.patentlens.net/daisy/patentlens/patentlens.html
• PATSTAT, http://wiki.epfl.ch/patstat/whatis
• Public
91. d source. Citations are via documents within sources. Nodes include all data from the SOURCE table, and edges are weighted by the number of citations between sources.
Extract Document Co-Citation Network (core only): Extracts a weighted, undirected network with documents as nodes and edges between documents which have been cited together. Only those documents with entries in the dataset are included in the network. Edge weight is determined by the number of times two articles are cited together, and edges are appended with the publication years of their earliest and most recent co-citations.
Extract Document Co-Citation Network (core and references): Extracts a weighted, undirected network with documents as nodes and edges between documents which have been cited together. Edge weight is determined by the number of times two articles are cited together, and edges are appended with the publication years of their earliest and most recent co-citations.
Extract Journal Co-Citation Network (core only): Extracts a weighted, undirected network with journals as nodes and edges between journals which have been cited together by a common document. Only those journals containing documents with entries in the dataset are included in the network. Edge weight is determined by the number of times two journals are cited together, and edges are appended with the publication years of their e
92. [Figure 5.6: Co-PI network of Geoffrey Fox in Indiana University. Edge size and color: co-investigated awards.]
[Figure 5.7: Co-PI network of Beth Plale in Indiana University. Node size and color: total award money. Edge size and color: co-investigated awards.]
[Figure: Michael McRobbie's Co-PI network. Node size and color: total award money. Edge size and color: co-investigated awards.]
93. display.
TEXTrend (http://www.textrend.org), led by George Kampis, Eötvös University, Hungary, develops a framework for the easy and flexible integration, configuration, and extension of plugin-based components in support of natural language processing (NLP), classification/mining, and graph algorithms for the analysis of business and governmental text corpuses with an inherently temporal component (Kampis, Gulyas et al. 2009). TEXTrend recently adopted OSGi/CIShell for the core architecture, and the first seven plugins are IBM's Unstructured Information Management Architecture (UIMA) (http://incubator.apache.org/uima), the data mining, machine learning, classification, and visualization toolset WEKA (http://www.cs.waikato.ac.nz/ml/weka), Cytoscape, Arff2xgmml converter, R (http://www.r-project.org) via igraph and scripts (http://igraph.sourceforge.net), and yEd. Upcoming work will focus on integrating the CFinder clique percolation analysis and visualization tool (http://www.cfinder.org), workflow support, and web services.
Note that the Sci2 Tool uses plugins from several other efforts/tools, such as the Information Visualization cyberinfrastructure (http://iv.slis.indiana.edu), the Network Workbench (http://nwb.slis.indiana.edu), and TEXTrend. As the functionality of OSGi/CIShell-based software frameworks improves and the number and diversity of dataset and algorithm plugins increases, the capabilities of custom tools will expan
94. ds with the phrase 'breast cancer', and not records where 'breast' and 'cancer' are both present but not the exact phrase. The importance of a particular term in a query can be increased by putting a ^ and a number after the term. For instance, 'breast cancer^10' would increase the importance of matching the term 'cancer' by ten compared to matching the term 'breast'.
[Figure 4.9: Scholarly Database Home page and data holdings in March 2010. The chart plots the number of records per publication year (1890-2010) for datasets including Medline, USPTO, and NIH.]
95. e, CY Meeting Date, VL Volume, PY Year
o Geospatial Analysis: AD Address, C1 Author Address, CL Meeting Location, PA Publisher Address, PI Publisher City, RP Reprint Address
o Topical Analysis: AB Abstract, BS Book Series Subtitle, SE Book Series Title, CT Conference Title, ID Index Keywords, CT Meeting Title, MH MeSH Terms, A2 Other Abstract, SO Source, TI Title, FT Vernacular Title
o Network Analysis: AU Author, CR References, IV Investigators, AN PubMed ID
4.2.1.4 Scopus
Elsevier's Scopus, like Thomson Reuters' Web of Science, has an extensive catalog of citations and abstracts from journals and conferences. Subscribers to Scopus can access the service via http://www.scopus.com. To find all articles whose abstract, title, or keywords include the terms 'Watts Strogatz Clustering Coefficient', simply enter those terms in the 'Article Abstract Keywords' field. Twenty-five results were found as of November 11, 2009. Download up to 2,000 references by checking the 'Select All' box and clicking 'Output'.
[Figure: Scopus Basic Search screen]
96. e and dynamics of science. We also demonstrate the application of large-scale network layout using DrL. All papers published in Scientometrics between its first appearance in 1978 and the end of 2008 were downloaded from the ISI database (see Section 4.2.1 Datasets: Publication). The data is available at 'yoursci2directory/sampledata/scientometrics/isi'. Load the data using 'File > Load and Clean ISI'. Select the loaded dataset in the Data Manager window and run 'Data Preparation > Text Files > Extract Paper Citation Network'. Two files will appear in the Data Manager window: the paper citation network and the paper information table. Select the 'Extracted paper-citation network' and run 'Preprocessing > Networks > Extract Nodes Above or Below Value' with the following parameters:
[Dialog: Extract Nodes Above or Below Value. Extract all nodes with an attribute above a certain number. Fields: Extract From this number, Below?, Numeric Attribute (globalCitationCount)]
The produced network contains only the original ISI records. Select this file and run 'Data Preparation > Text Files > Extract Document Co-Citation Network'. Examining the result file with 'Analysis > Networks > Network Analysis Toolkit (NAT)' shows that there are 2056 nodes, 26070 edges, and 775 isolates in the network. Run 'Preprocessing > Delete Isolates' to remove all the isolates. Because this network
97. e degree distribution (right).
Small-world properties are usually studied to explore networks with tunable values for the average shortest path between pairs of nodes and a high clustering coefficient. Networks with small values for the average shortest path and large values for the clustering coefficient can be used to simulate social networks, unlike ER random graphs, which have small average shortest path lengths but low clustering coefficients. The algorithm requires three inputs: the number n of nodes of the network, the number k of initial neighbors of each node (the initial configuration is a ring of nodes), and the probability of rewiring the edges, which is a real number between 0 and 1. The network is built following the original prescription of Watts and Strogatz, i.e., by starting from a ring of nodes, each connected to its k nearest neighbors, and by rewiring each edge with the specified probability. The algorithm run time is O(kn).
Run with 'Modeling > Watts-Strogatz Small World' and input 1000 nodes, 10 initial neighbors, and a rewiring probability of 0.01, then compute the average shortest path and the clustering coefficient and verify that the former is small and the latter is relatively large.
4.10.3 Barabási-Albert Scale-Free Model
The Barabási-Albert (BA) model is an algorithm which generates a scale-free network by incorporating growth and preferential attachment. Starting with an initial network of a few nodes, a new node is added at e
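The Watts-Strogatz construction described above (ring of n nodes, k initial neighbors, rewiring probability p) can be sketched in Python as follows. This is a minimal illustration, not the code behind the 'Modeling' menu: the function names are assumptions, k is assumed to be even (k/2 neighbors on each side), and only the clustering coefficient is computed, not the average shortest path.

```python
import random


def watts_strogatz(n, k, p, seed=None):
    """Build a small-world network as a dict mapping node -> set of neighbors."""
    if k % 2 or not 0 < k < n:
        raise ValueError("k must be even and satisfy 0 < k < n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    rng = random.Random(seed)
    neighbors = {i: set() for i in range(n)}
    # Ring lattice: connect every node to its k/2 nearest neighbors on each side.
    for i in range(n):
        for offset in range(1, k // 2 + 1):
            j = (i + offset) % n
            neighbors[i].add(j)
            neighbors[j].add(i)
    # Visit every lattice edge once and rewire its far end with probability p.
    for offset in range(1, k // 2 + 1):
        for i in range(n):
            j = (i + offset) % n
            if rng.random() < p:
                candidates = [c for c in range(n) if c != i and c not in neighbors[i]]
                if candidates:  # skip when node i is already linked to everyone
                    new_j = rng.choice(candidates)
                    neighbors[i].discard(j)
                    neighbors[j].discard(i)
                    neighbors[i].add(new_j)
                    neighbors[new_j].add(i)
    return neighbors


def average_clustering(neighbors):
    """Mean fraction of linked neighbor pairs per node (degree < 2 counts as 0)."""
    total = 0.0
    for nbrs in neighbors.values():
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in neighbors[a])
        total += 2.0 * links / (d * (d - 1))
    return total / len(neighbors)


lattice = watts_strogatz(20, 4, 0.0, seed=0)        # no rewiring: pure ring lattice
small_world = watts_strogatz(1000, 10, 0.01, seed=42)
```

Comparing `average_clustering(small_world)` with the value for a random graph of the same size and density is one way to check the "relatively large" clustering the text asks the reader to verify.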
98. e directed layout (no recursive or parallel computation). DrL expects the edges to be weighted and directed, where the non-zero weight denotes how similar the two nodes are (higher is more similar). The Sci2 version has several parameters. The edge cutting parameter expresses how much automatic edge cutting should be done: 0 means as little as possible, 1 as much as possible. Around 0.8 is a good value to use. The weight attribute parameter lets you choose which edge attribute in the network corresponds to the similarity weight. The X and Y parameters let you choose the attribute names to be used in the returned network, which correspond to the X and Y coordinates computed by the layout algorithm for the nodes.
DrL is commonly used to lay out large networks, e.g., those derived in co-citation and co-word analyses. In the Sci2 Tool, the results can be viewed in either GUESS or Visualization > Specified (prefuse alpha). For more information see https://nwb.slis.indiana.edu/community/?n=VisualizeData.DrL
4.10 Modeling (Why?)
Data models are grouped into two major types: descriptive models and process models. Descriptive models aim to illustrate the major features of a (typically static) data set, such as statistical patterns of article citation counts, networks of citations, individual differences in citation practice, the composition of knowledge domains, or the identification of res
99. [Screenshot: Sci2 Tool main window with menus, Console, Scheduler, and Data Manager]
Figure 2.3: Sci2 Tool interface components
2.2.1 Menus
The Sci2 Tool top menu structure reflects a typical workflow. The File menu on the left allows a user to load data in many different formats. The data files can then be prepared (Data Preparation), preprocessed (Preprocessing), analyzed (Data Analysis), and finally visualized (Visualization). Users also have the option of Modeling new networks. The Help menu leads to documentation and information about the tool. All seven menus are explained below. Data manipulation menus are further organized by the different types of analysis, such as General, Temporal, Geospatial, Topical, and Networks.
2.2.1.1 File
The File menu functionality includes loading multiple data formats (see section 2.3 Data Formats for details), loading ISI and NSF data into a database, saving and viewing results, and merging or splitting node and edge files. Load and
100. e to the end of each table's time interval, which can be seen in the Data Manager and below:
101 Unique ISI Records
o slice from beginning of 1988 to end of 2007 (101 records)
o slice from beginning of 1988 to end of 2002 (72 records)
o slice from beginning of 1988 to end of 1997 (33 records)
o slice from beginning of 1988 to end of 1992 (b records)
The latter option aligns the output tables according to calendar intervals:
99 Unique ISI Records
o slice from beginning of 2003 to end of 2007 (7 records)
o slice from beginning of 1998 to end of 2002 (12 records)
o slice from beginning of 1993 to end of 1997 (4 records)
o slice from beginning of 1988 to end of 1992 (17 records)
Choosing Years under Slice Into creates multiple tables beginning from January 1 of the first year. If Months is chosen, it will start from the first day of the earliest month in the chosen time interval.
To see the evolution of Vespignani's co-authorship network over time, check cumulative. Then extract co-authorship networks one at a time for each sliced time table using Data Preparation > Text Files > Extract Co-Author Network, making sure to select ISI from the drop-down menu during the extraction. Visualize the evolving network using GUESS as shown in Figure 5.4.
101. Figure 4.12: GUESS Information Visualization and Graph Modifier windows
4.9.4.1.1 Network Layout and Interaction
GUESS provides different network layout algorithms under the menu item Layout. Apply Layout > GEM to the Florentine network. Use Layout > Bin Pack to compact and center the network layout. Using the mouse pointer, hover over a node or edge to see its properties in the Information window. Right-clicking on a node gives the options to Center on, Color, Toggle Label, Remove, Add/Modify Field, and Copy as Variable (see Figure 4.12).
GUESS supports different types of interaction:
• Pan: simply grab the background by clicking and holding down the left mouse button, and move it using the mouse.
• Zoom: use the scroll wheel on the mouse, OR press the + and - buttons in the upper left-hand corner, OR right-click and move the mouse left or right. Center the graph by selecting View > Center.
• Click to select or move single nodes. Hold down Shift to select multiple.
• Right-click a node to modify Color, etc.
Use the Graph Modifier panel to change node attributes, e.g.:
• Select all nodes in the Object drop-down menu and click the Show Label button.
• Select Resize Linear Nod
102. earch fronts as indicated by new yet highly cited papers. Process models, or predictive models, aim to simulate, statistically describe, or formally reproduce statistical and dynamic characteristics of interest.
4.10.1 Random Graph Model
The random graph model generates a graph that has a fixed number of nodes which are connected randomly by undirected edges (see Figure 4.15, left). The number of edges depends on a specified probability. The edge probability is chosen based on the number of nodes in the graph. The model most commonly used for this purpose was introduced by Gilbert (Gilbert 1959). This is known as the G(n, p) model, with n being the number of vertices and p the linking probability. The number of edges created according to this model is not known in advance. Erdős and Rényi introduced a similar model where all the graphs with m edges are equally probable and m varies between 0 and n(n-1)/2 (Erdős and Rényi 1959). This is known as the G(n, m) model. The degree distribution for this network is Poissonian (see Figure 4.15, right).
Figure 4.15: Random graph and its Poissonian node degree distribution
Very few real-world networks are random. However, random networks are a theoretical construct that is well understood, and their properties can be exactly solved. They are commonly used as a reference, e.g., in tests of network robustness and epidemic spreading (Batagelj and Brandes). In the Sci2 Tool, the random graph generator impl
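As an illustration of the G(n, p) model described above, the following minimal sketch links each unordered pair of nodes independently with probability p. The function name `gnp_random_graph` is hypothetical, and the code is not the Sci2 generator; it only shows why the edge count is random and bounded by n(n-1)/2.

```python
import random
from itertools import combinations


def gnp_random_graph(n, p, seed=None):
    """Return the undirected edge list of a G(n, p) random graph.

    Each of the n(n-1)/2 possible node pairs becomes an edge
    independently with probability p.
    """
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2)
            if rng.random() < p]


# The edge count varies between runs; its expected value is p * n(n-1)/2.
edges = gnp_random_graph(100, 0.05, seed=7)
```

With p = 0 the result has no edges, and with p = 1 every one of the n(n-1)/2 pairs is linked, which gives the two boundary cases of the model.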
103. east cancer" and not records where breast and cancer are both present but the exact phrase. The importance of a particular term in a query can be increased by putting a ^ and a number after the term. For instance, breast cancer^10 would increase the importance of matching the term cancer by ten compared to matching the term breast.
[Screenshot: Scholarly Database download results, listing Medline, NIH, NSF, and USPTO tables (e.g., author, master, co-author, and MeSH heading tables) as csv files bundled in a zip archive]
104. eduction techniques are commonly used to project high-dimensional information spaces, i.e., the matrix of all unique papers multiplied by their unique terms, into a low, typically two-dimensional space.
4.8.1 Word Co-Occurrence Network
The topic similarity of basic and aggregate units of science can be calculated via an analysis of the co-occurrence of words in associated texts. Units that share more words in common are assumed to have higher topical overlap and are connected via linkages and/or placed in closer proximity. Word co-occurrence networks are weighted and undirected.
4.9 Network Analysis (With Whom?)
The study of networks aims to increase our understanding of natural and manmade networks. It builds on social network analysis, physics, information science, bibliometrics, scientometrics, econometrics, informetrics, webometrics, communication theory, sociology of science, and several other disciplines. Authors, institutions, and countries, as well as words, papers, journals, patents, and funding, are represented as nodes and their complex interrelations as edges. Nodes and edges can have time-stamped attributes.
Figure 4.12 shows a sample dataset of five papers, A through E, published over three years together with their authors (named x, y, z), references (blue references cite papers outside this set), and citations (green citation links are made by future papers), as well as
105. egree
    - Strength Distribution
    - Weight Distribution
    - Randomize Weights
    - Blondel Community Detection: Extracts a hierarchical community structure of a large network.
    - HITS: Computes authority and hub score for every node.
o Unweighted & Directed
    - Node Indegree: Appends the number of incoming edges to each node.
    - Node Outdegree: Appends the number of outgoing edges to each node.
    - Indegree Distribution: Builds a histogram of the values of the indegree of all nodes.
    - Outdegree Distribution: Builds a histogram of the values of the outdegree of all nodes.
    - K-Nearest Neighbor: Calculates the correlation between the degree of a node and that of its neighbors, and then appends that value to each node.
    - Single Node In-Out Degree Correlations: Calculates the correlations between indegree and outdegree of a node.
    - Dyad Reciprocity: The ratio of dyads with a reciprocated tie to dyads with any tie.
    - Arc Reciprocity: The ratio of reciprocal edges to total edges.
    - Adjacency Transitivity: The ratio of transitive triads to intransitive triads (triads missing one edge).
    - Weak Component Clustering: Extracts the N largest weakly connected components of a network.
    - Strong Component Clustering: Extracts the N largest strongly connected components of a network.
    - Extract K-Core: Extracts the k-core from a graph. The k-core is what remains of the graph after every node with fewer than k edges connected to it is removed from the graph recursive
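The recursive removal behind Extract K-Core can be illustrated with a short sketch. It assumes an undirected graph given as a symmetric adjacency dict; the function name `k_core` and the toy graph are hypothetical, and this is not the plugin's own code.

```python
def k_core(adjacency, k):
    """Return the set of nodes in the k-core of an undirected graph.

    `adjacency` maps each node to an iterable of its neighbors and is
    assumed to be symmetric. Nodes with fewer than k remaining edges
    are removed repeatedly until none are left.
    """
    neighbors = {v: set(nb) for v, nb in adjacency.items()}
    queue = [v for v, nb in neighbors.items() if len(nb) < k]
    removed = set()
    while queue:
        v = queue.pop()
        if v in removed:
            continue
        removed.add(v)
        # Removing v lowers the degree of each remaining neighbor,
        # which may push that neighbor below k as well.
        for u in list(neighbors[v]):
            neighbors[u].discard(v)
            if u not in removed and len(neighbors[u]) < k:
                queue.append(u)
        neighbors[v] = set()
    return set(neighbors) - removed


# A triangle (a, b, c) with a tail c - d - e attached.
toy = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
core = k_core(toy, 2)
```

In the toy graph, removing e drops d below two edges, so d is removed as well and only the triangle remains; this cascading effect is why the removal has to be applied recursively.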
106. [Figure: horizontal bar graphs of NSF awards over time, 1997-2010]
Figure 5.5: Funding profiles over time of Geoffrey Fox (top), Beth Plale (middle), and Michael McRobbie (bottom) at Indiana University
Next, we compare the Co-PI networks. Select each dataset in the Data Manager window and run Data Preparation > Text Files > Extract Co-Occurrence Network using these parameters:
[Dialog: Extract Network from Table (Extracts a network from a delimited table). Column Name: [...] Investigators; Text Delimiter: [...]; Aggregation Function File: the nsfCoPI.properties file under sampledata/scientometrics/properties]
Run Visualization > Networks > GUESS on each generated network to visualize the resulting Co-PI relationships. Select GEM from the layout menu to organize the nodes and edges. To color and size the nodes and edges using the default Co-PI visualization theme, run yoursci2directory/scripts/GUESS/co-PI-nw.py from File > Run Script.
[Figure: Geoffrey Fox's Co-PI network; node size and color encode total award money]
107. [Figure: 113 years of bibliographic data from the American Physical Society; the legend lists Physical Review journals and PACS categories]
Reference: Herr II, Bruce W., Duhon, Russell Jackson, Borner, Katy, Hardy, Elisha F., & Penumarthy, Shashikant (2008). 113 Years of Physical Review: Using Flow Maps to Show Temporal and Topical Citation Patterns. Proceedings of the 12th Information Visualization Conference (IV 2008), London, UK, July 9-11, IEEE Computer Society Conference Publishing Services, pp. 421-426. http://ivl.slis.indiana.edu/km/pub/2008-herr-phys-rev.pdf
6.4.2 The Simultaneous Evolution of Author and Paper Networks (2004)
By Katy Borner, Jeegar Maru, & Robert Goldstone
There has been a long history of research into the structure and evolution of mankind's scientific endeavor. However, recent progress in applying the tools of science to understand science i
108. ements the G(n, p) model by Gilbert. Run with Modeling > Random Graph and input the total number of nodes in the network and their wiring probability. The output is a network in which each pair of nodes is connected by an undirected edge with the probability specified in the input. A wiring probability of 0 would generate a network without any edges, and a wiring probability of 1 with n nodes will generate a network in which every node has n-1 edges, i.e., n(n-1)/2 edges in total. The wiring probability should be chosen dependent on the number of vertices. For a large number of vertices, the wiring probability should be smaller.
4.10.2 Watts-Strogatz Small World
A small-world network is one in which most nodes are not directly connected to one another, but any node can still be reached from any other via very few edges. The model can be used to generate networks of any size. The degree distribution is almost Poissonian for any value of the rewiring probability, except in the extreme case of rewiring probability zero, for which all nodes have equal degree. The clustering coefficient is high until the rewiring probability (beta) is close to 1, and as beta approaches one, the distribution becomes Poissonian. This is because the graph becomes increasingly similar to an Erdős-Rényi random graph (see Figure 4.16) (Watts and Strogatz 1998; Wikimedia Foundation 2009).
Figure 4.16: Small-world graph (left) and its nod
109. en change in frequency of occurrence.
4.6.1 Burst Detection
A scholarly dataset can be understood as a discrete time series, i.e., a sequence of events or observations which are ordered in one dimension: time. Observations (e.g., papers) come into existence for regularly spaced intervals, e.g., each month (volume) or year.
Kleinberg's burst detection algorithm identifies sudden increases in the usage frequency of words. These words may connect to author names, journal names, country names, references, ISI keywords, or terms used in the title and/or abstract of a paper. Rather than using plain frequencies of the occurrences of words, the algorithm employs a probabilistic automaton whose states correspond to the frequencies of individual words. State transitions correspond to points in time around which the frequency of the word changes significantly. The algorithm generates a ranked list of the word bursts in the document stream, together with the intervals of time in which they occurred. This can serve as a means of identifying topics, terms, or concepts important to the events being studied that increased in usage, were more active for a period of time, and then faded away.
In the Sci2 Tool, the algorithm can be found under Analysis > Textual > Burst Detection. As the algorithm itself is case-sensitive, care must be taken if the user desires KOREA and korea and Korea to be identi
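Because burst detection is case-sensitive, word forms are typically normalized before the algorithm is run. The sketch below shows one way to do this while building the plain per-year word frequencies that Kleinberg's algorithm takes as its starting point. It is not the burst detection algorithm itself, and the function name `yearly_word_counts` and the `(year, text)` input format are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def yearly_word_counts(records):
    """Return {year: Counter(word -> frequency)} with lowercased words.

    `records` is an iterable of (year, text) pairs (an assumed format).
    Lowercasing makes KOREA, Korea, and korea count as one word.
    """
    counts = defaultdict(Counter)
    for year, text in records:
        for word in text.lower().split():
            counts[year][word] += 1
    return counts


sample = [(2001, "KOREA Korea networks"), (2002, "korea citation networks")]
frequencies = yearly_word_counts(sample)
```

Splitting on whitespace is a deliberate simplification; real titles and abstracts would also need punctuation handling, and possibly stemming or stop word removal.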
110. [Screenshot: Scholarly Database download page and the extracted NSF and USPTO csv tables]
Figure 5.26: Downloading and saving RNAi data from the Scholarly Database
To view the co-authorship network of MEDLINE's RNAi records, go to File > Load and open yoursci2directory/sampledata/scientometrics/sdb/RNAi/Medline_co-author_table_(nwb_format).csv as a standard csv file. SDB tables are already pre-normalized, so now simply run Data Preparation > Text Files > Extract Co-Occurrence Network using the default parameters.
According to Analysis > Networks > Network Analysis Toolkit (NAT), the output network has 21,578 nodes with 131 isolates, and 77,739 edges. Visualizing such a large network is memory-intensive, so extract only the largest connected component by running Analysis > Networks > Unweighted and Undirected > Weak Component Clustering with the following parameters:
111. erges in a circular form. The ordering of areas is as follows: mathematics is (arbitrarily) placed at the top of the circle, and is followed clockwise by physics, physical chemistry, engineering, chemistry, earth sciences, biology, biochemistry, infectious diseases, medicine, health services, brain research, psychology, humanities, social sciences, and computer science. The link between computer science and mathematics completes the circle. If the lowest weighted edges are pruned from this consensus circular map, a hierarchical map stretching from mathematics to social sciences results. The circular map of science is found to have a high level of correspondence with the 20 existing maps, and has a variety of advantages over hierarchical and centric forms. A one-dimensional Riemannian version of the consensus map is also proposed.
[Figure: Images of the 20 maps of science that were used in this study, along with their codings. The 20 maps are shown in the same order in which they are listed in Table 1, from upper left to lower right.]
Reference: Klavans, R., & Boyack, K. W. (2009). Toward a consensus map of science. Journal of the American Society for Information Science and Technology, 60(3), 455-476. http://sci.slis.indiana.edu/klavans_2009_JASIST_60_455.pdf
6.6 Databases and Tools
6.6.1 The Scholarly Database and Its Utility for Scientometrics Research (2009)
By Gavin LaRowe, Sumeet A
112. es > totalities drop-down menu, then type 5 and 20 into the From and To Value boxes separately. Then select Do Resize Linear.
• Select Colorize > Nodes > totalities, then select white and enter 204,0,51 in the pop-up color boxes on the From and To buttons.
• Select Format Node Labels, replace the default text (original label) with your own label in the pop-up box Enter a formatting string for node labels.
This will create the labels shown in Figure 4.13.
[Screenshot: GUESS window showing the Florentine network, the Information window for the node Pucci (fields such as color, label, priorates, totalities, wealth), and the Graph Modifier panel]
Figure 4.13: Using the GUESS Graph Modifier
4.9.4.1.2 Interpreter
The
113. [Table of contents fragment; only the legible entries are kept]
2.4 Saving Visualizations for Publication
3 Algorithm and Tool Plugins
3.1 Sci2 Tool Plugins
3.3.1 Windows and Linux
4 Workflow Design
4.2 Data Acquisition and Preparation
4.2.1 Datasets: Publications
4.2.3 Datasets: Scholarly Database
4.3 Database Loading and Manipulation
114. ested policymakers, as originally suggested by Hemmings and Wilkinson.
Visualization of Job Postings: Map of Science
Scientific domains are highly interconnected. The boundaries between different domains are often fuzzy. One way of thinking about the relationships between domains is to conceptualize all scientific domains as existing within a large network of research. Creating a network of scientific research can be accomplished by looking at scientific journals and their articles. The UCSD Map of Science used here is the product of a large study by researchers at the University of California San Diego using 7.2 million papers and over 15,000 separate journals, proceedings, and [...] Thomson [...] and Scopus [...] over the five-year period from 2001 to 2005. The researchers used citations between the papers and journals to cluster journals into small groups of highly related journals. Those clusters are represented by 554 individual nodes in the network. The links between the clusters show that some clusters are related to other clus
[Figure: UCSD Map of Science; legible area labels include Biotechnology, Brain Research, Infectious Diseases, and Chemical, Mechanical and Civil Engineering]
115. etwork Analysis
In the analysis menu, certain algorithms append values to each node or delete groups of nodes and edges entirely. When several algorithms are applied simultaneously, their results can be compared by viewing them in GUESS. Figure 4.23 below compares Weak Component Clustering, Node Degree, Node Betweenness Centrality, and Pathfinder Network Scaling run on the FourNetSciResearchers dataset. Weak Component Clustering extracts the N largest weakly connected components of a network, Node Degree calculates the number of edges adjacent to a node, and Pathfinder Network Scaling prunes a network to find its underlying structure. Node Betweenness Centrality appends a value to each node which correlates to the number of shortest paths that node resides on. The more shortest paths between node pairs a certain node resides on, the higher its betweenness centrality. To learn about each algorithm, see Section 3.1 Sci2 Tool Plugins or visit https://nwb.slis.indiana.edu/community for details.
4.9.4 Network Visualization
4.9.4.1 GUESS Visualizations
Load the sample dataset yoursci2directory/sampledata/networks/florentine.nwb and calculate an additional node attribute, Betweenness Centrality, by running Analysis > Networks > Unweighted and Undirected > Node Betweenness Centrality with default parameters. Then select the network and run Vi
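To make the notion of betweenness centrality above concrete, here is a minimal sketch for unweighted, undirected networks. It uses Brandes' algorithm, which is an assumption on our part: the manual does not say which algorithm the Node Betweenness Centrality plugin uses. The function name and the toy graph are hypothetical, and the values are left unnormalized.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Unnormalized betweenness for an unweighted, undirected graph.

    `adjacency` maps each node to an iterable of neighbors (symmetric).
    For every source, a breadth-first search counts shortest paths, and
    a backward pass accumulates each node's share of those paths.
    """
    centrality = {v: 0.0 for v in adjacency}
    for source in adjacency:
        order = []                               # nodes by increasing distance
        predecessors = {v: [] for v in adjacency}
        paths = {v: 0 for v in adjacency}        # shortest-path counts
        distance = {v: -1 for v in adjacency}
        paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:              # first visit
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    paths[w] += paths[v]
                    predecessors[w].append(v)
        dependency = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Each unordered node pair was counted once from each endpoint.
    return {v: c / 2 for v, c in centrality.items()}


# A star: every shortest path between two leaves passes through "hub".
star = {"hub": ["x", "y", "z"], "x": ["hub"], "y": ["hub"], "z": ["hub"]}
scores = betweenness_centrality(star)
```

In the star example the hub lies on the single shortest path of each of the three leaf pairs, so it scores 3 while the leaves score 0.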
116. Temporal Analysis (When): [...]f one individual | Mapping topic bursts in 20 years of PNAS | 113 years of physics research
Geospatial Analysis (Where): Career trajectory of one individual | Mapping a state's intellectual landscape | PNAS publications
Topical Analysis (What): Base knowledge from which one grant draws | Knowledge flows in Chemistry research | Topic maps of NIH funding
Network Analysis (With Whom): NSF Co-PI network of one individual | Co-author network | NSF's core competency
Table 5.2: Screenshots of major analysis types and levels of analysis
Columns: Micro/Individual (1-100 records) | Meso/Local (101-10,000 records) | Macro/Global (more than 10,000 records)
Rows: Statistical Analysis/Profiling; Temporal Analysis (When); Geospatial Analysis (Where); Topical Analysis (What); Network Analysis (With Whom)
[Screenshots omitted]
Users of the tool can:
• Access science datasets online or load their own.
• Perform different types of analysis with some of the most effective algorithms available.
• Use different visualizations to interactively explore and understand specific datasets.
• Share datasets and algorithms across scientific boundaries.
The Sci2 Tool is built on the Cyberinfrastructure Shell (CIShell) (Cyberinfrastructure for Network Science Center 2008), an open source software framework for the easy
117. > toptc
> for i in range(0, 20): toptc[i].labelvisible = true
Alternatively, run GUESS File > Run Script and select yoursci2directory/scripts/GUESS/paper-citation-nw.py.
Figure 5.10: Giant components of the paper citation network
Compare the result with Figure 5.10 and note that this network layout algorithm, and most others, are non-deterministic. That is, different runs lead to different layouts (observe the position of the highlighted node in both layouts). However, all layouts aim to group connected nodes into spatial proximity while avoiding overlaps of unconnected or sparsely connected subnetworks.
5.1.4.2 Author Co-Occurrence (Co-Author) Network
To produce a co-authorship network in the Sci2 Tool, select the table of all 361 unique ISI records from the FourNetSciResearchers dataset in the Data Manager window. Run Data Preparation > Text Files > Extract Co-Author Network using the parameter File Format: isi. The result is two derived files in the Data Manager window: the co-authorship network and a table with a listing of unique authors, also known as the merge table. The merge table can be used to manually unify author names, e.g., Albet R and Albe
118. ferences to a publication per year and a reference ID.
    - Extract Original Author Keywords by Year: Outputs a table containing the number of original author keywords per year.
    - Extract New ISI Keywords by Year: Outputs a table containing one row per new ISI keyword per year.
    - Extract Authors by Year for Burst Detection: Outputs a table containing two columns: author name concatenated with its author ID, and year of publication. Used for author burst detection.
    - Extract Documents by Year for Burst Detection: Used for word-occurrence-based burst detection.
    - Extract Original Author Keywords by Year for Burst Detection: Used for keyword burst detection.
    - Extract New ISI Keywords by Year for Burst Detection: Used for keyword burst detection.
    - Extract References by Year for Burst Detection: Used to detect bursting references to publications.
    - Extract Longitudinal Summary: Outputs a table with the total number of documents published, references published, references made, distinct authors, distinct sources, distinct author keywords, distinct ISI keywords, and distinct other keywords by year.
    - Extract Co-Author Network: Extracts a weighted, undirected network with authors as nodes and edges between authors who co-wrote papers. The extraction appends to nodes the number of authored documents, ISI's times-cited count, the publication year of the earliest document, and the publication year of the most recent document. The extraction appends to edge
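The idea behind Extract Co-Author Network (authors as nodes, weighted undirected edges between authors who co-wrote papers) can be sketched in a few lines. The input format, a list of author lists, and the function name `coauthor_network` are assumptions for illustration; the sketch tracks only the number of authored documents per node and the number of joint papers per edge.

```python
from collections import Counter
from itertools import combinations


def coauthor_network(papers):
    """Build a weighted, undirected co-author network.

    `papers` is a list of author-name lists, one per paper (assumed format).
    Returns (works, coauthored): documents per author, and the number of
    joint papers per unordered author pair.
    """
    works = Counter()
    coauthored = Counter()
    for authors in papers:
        unique = sorted(set(authors))    # ignore duplicate names on one paper
        works.update(unique)
        coauthored.update(combinations(unique, 2))
    return works, coauthored


papers = [["x", "y"], ["x", "y", "z"], ["z"]]
works, coauthored = coauthor_network(papers)
```

Sorting the names gives each author pair a single canonical key, so ("x", "y") and ("y", "x") are never counted as separate edges. Variant spellings of the same author would still appear as separate nodes, which is the problem name unification addresses.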
119. fied as the same word.
4.6.2 Slice Table by Time
Slicing a table allows the user to see the evolution of a network over time. Time slices can be cumulative, i.e., later tables include information from all previous intervals, or fully sliced, i.e., each table only includes data from its own time interval. Cumulative slices can be useful for seeing growth over time, whereas fully sliced tables should be used for displaying changing structure over time.
4.7 Geospatial Analysis (Where?)
Geospatial analysis has a long history in geography and cartography. Geospatial analysis aims to answer the question of where something happens and with what impact on neighboring areas. Geospatial analysis requires spatial attribute values or geolocations for authors and their papers, extracted from affiliation data or spatial positions of nodes generated from layout algorithms. Geospatial data can be continuous, i.e., each record has a specific position, or discrete, i.e., each set of keywords has a position or area (shape file), e.g., number of papers per country. Spatial aggregations, e.g., merging via ZIP codes, counties, states, countries, and continents, are common.
Cartographic generalization refers to the process of abstraction, such as (1) graphic generalization: the simplification, enlargement, displacement, merging, or selection of entities without enhancement or effect to their symbology; and (2) conceptual symbolization: the merging, selection, and symb
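The difference between cumulative and fully sliced tables in Section 4.6.2 can be shown with a small sketch. It assumes records are (year, item) pairs and slices by whole years; the function name `slice_by_years` and its parameters are hypothetical and do not mirror the Sci2 dialog.

```python
def slice_by_years(records, first_year, last_year, width, cumulative=False):
    """Split (year, item) records into time slices of `width` years.

    Returns a list of ((start, end), items). With cumulative=True every
    slice starts at first_year, so later slices contain earlier data.
    """
    slices = []
    for start in range(first_year, last_year + 1, width):
        end = min(start + width - 1, last_year)
        lower = first_year if cumulative else start
        items = [item for year, item in records if lower <= year <= end]
        slices.append(((lower, end), items))
    return slices


records = [(1988, "a"), (1993, "b"), (1999, "c"), (2004, "d")]
full = slice_by_years(records, 1988, 2007, 5)
growing = slice_by_years(records, 1988, 2007, 5, cumulative=True)
```

With these toy records, the fully sliced tables each hold one item, while the cumulative tables grow from one to four items, which is why cumulative slices suit growth and full slices suit changing structure.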
120. focusing on biological networks with particularly nice visualizations | Cytoscape | 2002
Tulip | 2003 | Visualization | Graph visualization software for networks over 1,000,000 elements | Graphical | Yes | All Major | Auber 2003
iGraph | 2003 | Analysis and Manipulation | A library for classic and cutting-edge network analysis usable with many programming languages | Library | Yes | All Major | Csardi and Nepusz 2006
CiteSpace | 2004 | Scientometrics | A tool to analyze and visualize scientific literature, particularly co-citation structures | Graphical | Yes | All Major | Chen 2006
HistCite | 2004 | Scientometrics | Analysis and visualization tool for data from the Web of Science | Graphical | No | Windows | Garfield 2008
R | 2004 | Statistics | A statistical computing language with many libraries for sophisticated network analyses | Command line | Yes | All Major | Ihaka and Gentleman 1996
Prefuse | 2005 | Visualization | A general visualization framework with many capabilities to support network visualization and analysis | Library | Yes | All Major | Heer, Card et al. 2005
GUESS | 2007 | Visualization | A tool for visual graph exploration that integrates a scripting environment | Graphical | Yes | All Major | Adar 2007
GraphViz | 2004 | Network Visualization | Flexible graph visualization software | Graphical | Yes | All Major | AT&T Research Group 2008
NWB Tool
121. [Screenshot: Scholarly Database search results across Medline, NSF, and USPTO, listing source, creators, year, title, and matching score for each record]
Figure 4.10: Scholarly Database Search and Browse interface
Results are displayed in sets of 20 records, ordered by a Solr internal matching score. The first column represents the record source, the second the creators, third comes the year, then the title, and finally the matching score. Datasets can be downloaded in different subsets and formats for future analysis.
[Screenshot: Scholarly Database download page in Mozilla Firefox]
122. ful analysis requires carefully collected data. The Sci2 Tool can import many varieties of scientometric data, outlined in section 3.2 Data Acquisition and Preparation, but the tool also comes bundled with several sample datasets in the 'sampledata' directory. This sample data will be used throughout sections 4 Workflow Design and 5 Sample Workflows and includes, among others:
• EndNote
  o scientometrics/endnote/KatyBorner.enw: 146 publications authored or co-authored by Katy Borner from 1992-2010.
• Scholarly Database
  o geo/usptoInfluenza.csv: Heavily pre-processed and geocoded data covering USPTO patents containing the keyword 'Influenza'.
• NSF Award Search
  o scientometrics/nsf/MedicalAndHealth.nsf: 288 grants awarded by the NSF containing the words 'Medical' or 'Health' from 2003-2010, totaling $152,015,288.
  o scientometrics/nsf/KatyBorner.nsf: 13 grants awarded by the NSF to Katy Borner as PI or Co-PI from 2003-2008, totaling $3,527,728.
  o scientometrics/nsf/BethPlale.nsf, GeoffreyFox.nsf, MichaelMcRobbie.nsf: 45 grants between three Indiana University researchers, totaling $39,031,960 from 1978-2008.
  o scientometrics/nsf/Michigan.nsf, Indiana.nsf, Cornell.nsf: Three universities' grant profiles, totaling $951,478,510 from 2000-2009.
• NIH Award Search
  o scientometrics/nih/CTSA2005-2009.xls: 2,546 papers and 534 grants for Clinical and Translational Science research from 2005-2009.
• Thomso
123. fuse beta: Visualization tool for use with graphs having pre-specified node coordinates.
  o Circular Hierarchy: Generates a circular visualization of the output produced by a multi-level aggregation method such as Blondel Community Detection. Result is a PostScript file.
  o Science Map Circle Annotation: Projects circles of user-defined size and color onto UCSD's Map of Science. Result is a PostScript file. Please contact William Decker at UCSD (wjdecker@ucsd.edu) for permission to use the UCSD Map of Science.
Cytoscape
• Cytoscape: Analyzing and Visualizing Networks Data tool, see http://www.cytoscape.org

3.2 Load, View, and Save Data
In the Sci2 Tool, use 'File > Load' to load one of the provided sample datasets in 'yoursci2directory/sampledata' or any dataset of your own choosing. Any file listed in the Data Manager can be saved, viewed, renamed, or discarded by right-clicking it and selecting the appropriate menu option. If 'File > View With' was selected, the user can choose among different application viewers. Choosing Microsoft Office Excel for a tabular file will open MS Excel with the table loaded. The Sci2 Tool can save a network using 'File > Save', which brings up the Save window. Note that some data conversions are lossy, i.e., not all data is preserved.

3.3 Memory Allocation
Due to the constraints of the Java virt
124. g Adobe Distiller or GhostViewer (see Section 2.4 Saving Visualizations for Publication) and is shown in Figure 5.3.

[Figure residue: horizontal bar graph with one bar per NSF award, labeled by award title, on a time axis from 2003 to 2010]
Figure 5.3: Horizontal Bar Graph of KatyBorner NSF

Note that co-PIs from so-called collaborative awards are not shown in the network. Senior personnel who might be key to the success of a project are not part of this dataset either. That is, awards in which Borner served as senior personnel, as well as her senior personnel collaborators, are not shown.

5.1.2 Time Slicing of Co-Authorship Networks (ISI Data)
AlessandroVespignani.isi
Time frame: 1988-2006
Region(s): Indiana University, University of Rome, Yale University, Leiden University, International Center for Theoretical Physi
125. g cells that helps to control which genes are active and how active they are.

The data for this analysis comes from a search of the Scholarly Database (SDB) (http://sdb.slis.indiana.edu) for 'RNAi' in 'All Text' from MEDLINE, NSF, NIH, and USPTO. A copy of this data is available in 'yoursci2directory/sampledata/scientometrics/sdb/RNAi'. The default export format is .csv, which can be loaded in the Sci2 Tool directly.

[Figure residue: Scholarly Database search page (screenshot) showing the Creators, Title, Abstract, and All Text fields, first and last year, the database checkboxes, and the on-page help text on combining terms with OR, AND, and double quotation marks]
126. h Tool. Birger Larsen, Jacqueline Leta (Eds.), Proceedings of ISSI 2009: 12th International Conference on Scientometrics and Informetrics, Rio de Janeiro, Brazil, July 14-17, Vol. 2, Bireme/PAHO/WHO and the Federal University of Rio de Janeiro, pp. 619-630. http://ivl.slis.indiana.edu/km/pub/2009-borner-issi.pdf

6.6.2 Reference Mapper
By Russell J. Duhon & Katy Borner

The RefMapper tool supports the automatic detection, mapping, and clustering of grant awards and proposals based on citation references. It might be used to group proposals for review or to communicate the topic coverage of a proposal or funding portfolio. The tool uses a master list of 18,351 journal names that are indexed by Scopus and Reuters/Thomson Scientific (ISI SCI, SSCI, and A&H Indexes) and a lookup table of 57,860 different abbreviations for those journal names. It science-locates identified journals on the 554 scientific areas of the UCSD Map of Science (Klavans & Boyack 2007). Each of the 13 main scientific disciplines is labeled and color-coded in a metaphorical way, e.g., Medicine is blood red and Earth Sciences are brown as soil. The RefMapper also identifies clusters based on reference co-occurrence similarity.

The RefMapper tool was made available as a plugin to the Network Workbench (NWB Team 2006; Cyberinfrastructure for Network Science Center 2009). It can be downloaded for Windows and for Mac a
127. h journals cite.

4.9.1.3 Co-Citation Linkages
Two scholarly records are said to be co-cited if they jointly appear in the list of references of a third paper. The more often two units are co-cited, the higher their presumed similarity.

4.9.1.3.1 Document Co-Citation Network (DCA)
DCA was simultaneously and independently introduced by Small and Marshakova in 1973 (Small 1973; Marshakova 1973; Small and Greenlee 1986). It is the logical opposite of bibliographic coupling. The co-citation frequency equals the number of times two papers are cited together, i.e., they appear together in one reference list.

4.9.1.3.2 Author Co-Citation Network (ACA)
Authors of works that are repeatedly juxtaposed in references-cited lists are assumed to be related. Clusters in ACA networks often reveal shared schools of thought or methodological approach, common subjects of study, collaborative and student-mentor relationships, ties of nationality, etc. Some regions of scholarship are densely crowded and interactive. Others are isolated and nearly vacant.

4.9.1.3.3 Journal Co-Citation Network (JCA)
JCA networks offer wide-angle views of scholarly disciplines. Slicing these networks by time can reveal the evolution of disciplinary similarity. Like author and document co-citation networks, these are undirected and weighted.

4.9.2 Compute Basic Network Characteristics
It is often advantageous to kno
128. he parameter
[Dialog residue: 'Extract Edges Above or Below Value' (extract all edges with an attribute above a certain number). Extract from this number: 5; Numeric Attribute: TIMES_COCITED]

Now remove isolates ('Preprocessing > Networks > Delete Isolates') and append node degree attributes to the network ('Analysis > Networks > Unweighted & Undirected > Node Degree'). View the network in GUESS using 'Visualization > Networks > GUESS' and 'Layout > GEM'.

Resize and color the edges to display the strongest and earliest co-citation links using the following parameters:
[Screenshot residue: GUESS graph modifier settings for edges]

Resize, color, and label the nodes to display their degree using the following parameters:
[Screenshot residue: GUESS graph modifier settings. Object: nodes based on ->; Property: totaldegree; Value: 40]

The resulting Journal Co-Citation Analysis (JCA) network is given in Figure 5.20.

[Figure residue: network visualization]
Figure 5.20: Journal co-citation analysis of FourNetSciResearchers

5.2 Institution Level Studies (Meso) 5
129. [Figure residue: explanatory text and legend from a science map of Physical Review citation data; most of the text is not recoverable. The recoverable parts follow.]

Each vertical bar is subdivided vertically into the journals that appear in it, with height proportional to the number of papers, and each journal is subdivided horizontally into the volumes of the journal appearing in the column.

Nobel Prizes in Physical Review (year of Nobel Prize, winners):
2005: Roy Glauber, John L. Hall, and Theodor W. Hänsch
2003: Anthony Leggett
2002: Raymond Davis Jr., Masatoshi Koshiba, and Riccardo Giacconi
2001: Eric A. Cornell, Wolfgang Ketterle, and Carl E. Wieman
1998: Robert B. Laughlin
1997: Steven Chu and William D. Phillips
1996: David M. Lee, Douglas D. Osheroff, and Robert C. Richardson
1995: Martin L. Perl
1994: Bertram N. Brockhouse and Clifford G. Shull
1990: Jerome I. Friedman, Henry W. Kendall, and Richard E. Taylor

Legend: Bar Graph Lines; Physical Review; PACS 0 General; PACS 8 Interdisciplinary Physics and Related Areas of Science and Technology; Physical Review Series; PACS The Physics of Elementary Particles and fi
130. hted & Directed > Node Indegree' to append indegree attributes to each node, and then visualize the network using 'Visualization > Networks > GUESS' followed by 'Layout > GEM'.

In the graph modifier pane, click 'Resize Linear' and size nodes by the 'indegree' attribute from 7 to 30, as below. Click 'Do Resize Linear'.
[Screenshot residue: GUESS graph modifier pane]

Once again in the modifier pane, select 'nodes based on ->' in the Object drop-down box, 'bipartitetype' in the Property drop-down box, '==' in the Operator drop-down box, and 'cited patents' in the Value drop-down box. Press 'Colour' and click on blue, below.
[Screenshot residue: GUESS graph modifier pane with Object, Property, Operator, and Value boxes]

Repeat the previous steps, but change the Value to 'citing patent' and select the color red. Now press 'Show Label'. The resulting graph should look like Figure 5.28.

[Figure residue: network of nodes labeled with patent numbers; legend: Node Size By Indegree]
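The 'Resize Linear' step above maps the indegree attribute onto node sizes between 7 and 30. The following is a minimal Python sketch of such a linear rescaling, not GUESS code: it assumes the smallest attribute value maps to the minimum size and the largest to the maximum size, and the function name and the sample indegree values are hypothetical.

```python
def resize_linear(values, min_size=7.0, max_size=30.0):
    """Linearly rescale attribute values to node sizes in [min_size, max_size]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Assumption: if all nodes share one value, give every node the minimum size.
        return [min_size for _ in values]
    scale = (max_size - min_size) / (highest - lowest)
    return [min_size + (value - lowest) * scale for value in values]


# Hypothetical indegree values for five patent nodes.
indegrees = [0, 1, 2, 5, 10]
sizes = resize_linear(indegrees)
for indegree, size in zip(indegrees, sizes):
    print(indegree, round(size, 2))
```

A patent that is never cited gets the smallest node, the most-cited patent gets the largest, and everything in between is spaced proportionally to its indegree.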
131. ical analyses.

Data in NIH files can be used for the following types of analyses:
• Statistical
  o Attributes
  o Type
• Temporal Analysis
  o Year of award
• Topical Analysis
  o Abstract
  o Project Title
• Network Analysis
  o Principal Investigator
  o Organization
  o Project Number

4.2.3 Datasets: Scholarly Database
The Scholarly Database (SDB) at Indiana University provides easy access to more than 23,000,000 records from MEDLINE and U.S. Patents, as well as awards by the National Science Foundation and the National Institutes of Health (see Figure 4.9, right, for the number of records per year). Anybody can register at http://sdb.slis.indiana.edu, cross-search all four databases, and download large amounts of data as dump and in precompiled formats (see Figures 4.9 to 4.11 for interface snapshots).

Search the four databases separately or in combination for 'Creators' (authors, inventors, investigators) or terms occurring in 'Title', 'Abstract', or 'All Text', for all or specific years. If multiple terms are entered in a field, they are automatically combined using 'OR'. So breast cancer matches any record with "breast" or "cancer" in that field. You can put AND between terms to combine them with 'AND'. Thus breast AND cancer would only match records that contain both terms. Double quotation marks can be used to match compound terms, e.g., "breast cancer" retrieves recor
132. [Figure residue: excerpt of the exported Scopus records, listing authors and truncated titles]
Figure 4.5: Saving and viewing WattsStrogatz.scopus

Data in Scopus files is commonly used for the following types of analyses:
• Temporal Analysis
  o Issue
  o Volume
  o Year
• Geospatial Analysis
  o Correspondence Address
• Topical Analysis
  o Abstract
  o Author Keywords
  o Conference Name
  o Index Keywords
  o Source Title
• Network Analysis
  o Authors
  o References

4.2.1.5 Google Scholar
Google Scholar data can be acquired using Publish or Perish (Harzing 2008), which can be freely downloaded from http://www.harzing.com/pop.htm. A query for papers by Albert-László Barabási run on Sept. 21, 2008 results in 111 papers that have been cited 14,343 times, see Figure 4.6.

[Figure residue: Harzing's Publish or Perish window (screenshot)]
133. ing of Research: Visualizing the Influence of Grants on the Number and Citation Counts of Research Papers (2003)
6.2.2 Mapping Transdisciplinary Tobacco Use Research Centers Publications (forthcoming)
6.3 Local and Global Science Studies
6.3.1 Mapping the Evolution of Co-Authorship Networks (2004)
6.3.2 Studying the Emerging Global Brain: Analyzing and Visualizing the Impact of Co-Authorship Teams (2005)
6.3.3 Mapping Indiana's Intellectual Space
6.3.4 Mapping the Diffusion of Information Among Major U.S. Research Institutions (2006)
6.3.5 Research Collaborations by the Chinese Academy of Sciences (2009)
6.3.6 Mapping the Structure and Evolution of Chemistry Research (2009)
6.3.7 Science Map Applications: Identifying Core Competency (2007)
6.4 Modeling Science
6.4.1 113 Years of Physical Review: Using Flow Maps to Show Temporal and Topical Citation (2008)
6.4.2 The Simultaneous Evoluti
134. [Figure residue: co-word map with word labels such as protein, comparative study, cell line, nucleic acid hybridization, inbred strains, plasmids, mutagenesis, molecular sequence data, apoptosis, growth factor, base sequence, amino acid sequence, in vitro, and gene expression]
Co-word space of the top 50 highly frequent and bursty words used in the top 10 most highly cited PNAS publications in 1982-2001.
Color code: circle size = burst weight; circle color = burst onset; ring color = year of max word count; years of 2nd and 3rd burst are given in color.

Reference: Mane, Ketan K. & Börner, Katy (2004). Mapping Topics and Topic Bursts in PNAS. Proceedings of the National Academy of Sciences of the United States of America, Vol. 101 (Suppl. 1), 5287-5290. http://ivl.slis.indiana.edu/km/pub/2004-mane-burstpnas.pdf

6.2 Local Impact Output ROI Studies

6.2.1 Indicator-Assisted Evaluation and Funding of Research: Visualizing the Influence of Grants on the Number and Citation Counts of Research Papers (2003)
By Kevin W. Boyack & Katy Börner

This article reports research on analyzing and visualizing the impact of governmental funding on the amount and citation counts of research publications. For the first time, grant and publication d
135. itutes of Health (NIH) between 1999 and 2009. Specifically, the study shows the results of a geospatial and topical analysis, a network analysis of collaboration networks, a funding input vs. publication output analysis, and a temporal analysis of data trends and coverage. The results were interpreted by tobacco domain experts, providing insight into the overall structure and evolution of tobacco research collaborations, interdisciplinary integration, and impact on science as a whole. The study complements efforts that use the very same controlled R01/TTURC dataset in two ways. First, it shows major differences in collaboration patterns and topical coverage of TTURC-funded projects. TTURC co-author networks have small-world characteristics, making them robust to the frequent movement of people in academia while supporting efficient diffusion of information and expertise. Second, TTURC projects by design have a larger topic coverage and a wider spectrum of basic to applied research and practice. This is reflected in their topic coverage, but also in the ratio of funding input versus paper output. TTURC output goes beyond simply producing papers. We conclude with recommendations on how to improve future evaluations of transdisciplinary research centers.

[Figure residue: panel titles for TTURC and longitudinal R01 co-authorship networks]
Compare R01 investigator-based funding with TTURC Center awards in terms of number of publications and evolving co-author netw
136. itutions PI
Figure 5.17: Bimodal institution-PI network for CTSA Centers

Now load 'publications.csv' as a standard csv and create a co-authorship network by running 'Data Preparation > Text Files > Extract Co-Occurrence Network' with the text delimiter set to the character that separates author names in the file. The resulting co-authorship network has 8,680 nodes, 27 isolates, and 50,160 edges, see Figure 5.18; its giant component is shown enlarged in Figure 5.19.

[Figure residue: network visualization]
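The co-occurrence extraction described above can be pictured as follows: every record contributes one node per name in its delimited author field, and every pair of names that share a record gets an edge whose weight counts the shared records. This is a minimal Python sketch of that idea, not the Sci2 plugin itself; the '|' delimiter, the function name, and the sample rows are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def co_occurrence_network(cells, delimiter="|"):
    """Build a weighted, undirected co-occurrence network from delimited cells.

    Returns (nodes, edges): nodes maps each name to the number of records it
    appears in; edges maps a sorted pair of names to the number of records
    in which both names appear.
    """
    nodes, edges = Counter(), Counter()
    for cell in cells:
        # Split, trim, and de-duplicate the names within one record.
        names = sorted({name.strip() for name in cell.split(delimiter) if name.strip()})
        nodes.update(names)
        edges.update(combinations(names, 2))
    return nodes, edges


# Hypothetical author cells; the "|" delimiter is an assumption.
rows = ["Smith, J|Lee, K|Chen, W", "Lee, K|Chen, W", "Garcia, M"]
nodes, edges = co_occurrence_network(rows)

# Isolates are names that never share a record with anyone else.
isolates = [name for name in nodes if all(name not in pair for pair in edges)]
print(len(nodes), "nodes,", len(isolates), "isolates,", len(edges), "edges")
```

Single-author records produce isolates, which is why the node, isolate, and edge counts are reported together in the text.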
137. [Figure residue: Publish or Perish result table (screenshot) listing cites, authors, title, year, and publication for papers by A.-L. Barabási and co-authors, followed by the window's buttons]
Figure 4.6: Publish or Perish interface with query result for Albert-László Barabási

To save records, select from the menu 'File > Save as Bibtex', 'File > Save as CSV', or 'File > Save as EndNote'. All three file formats can be read by the Sci2 Tool. The result in all three formats, named LaszloBara
138. [Figure residue: map of Indiana with Academic and Industry markers; city labels include Indianapolis, Bloomington, New Albany, and Evansville]
Submitted and Awarded Proposals in Indiana, 2001-2006

Reference: Unpublished

6.3.4 Mapping the Diffusion of Information Among Major U.S. Research Institutions (2006)
By Katy Borner, Shashikant Penumarthy, Mark Meiss & Weimao Ke

This paper reports the results of a large-scale data analysis that aims to identify the information production and consumption among top research institutions in the United States. A 20-year publication data set was analyzed to identify the 500 most cited research institutions and spatio-temporal changes in their inter-citation patterns. A novel approach to analyzing the dual role of institutions as information producers and consumers, and to studying the diffusion of information among them, is introduced. A geographic visualization metaphor is used to visually depict the production and consumption of knowledge. The highest producers and their consumers, as well as the highest consumers and their producers, are identified and mapped. Surprisingly, the introduction of the Internet does not seem to affect the distance over which information diffuses, as manifested by citation links. The citation linkages between institutions fall off with the distance between them, and there is a strong linear
139. ly Annotate K-Coreness: Appends to each node the K-Core that node belongs to.
    HITS: Computes authority and hub score for every node.
    PageRank: Ranks the importance of a node by how many other important nodes point to it.
  o Weighted & Directed
    HITS: Computes authority and hub score for every node.
    Weighted PageRank: Ranks the importance of a node by how many other important nodes point to it, taking into account edge weights.

Modeling
• Random Graph: Generates a graph with a fixed number of nodes connected randomly by undirected edges.
• Watts-Strogatz Small World: Generates a graph in which most nodes are not directly connected to one another but are still connected to one another via relatively few edges.
• Barabási-Albert Scale-Free: Generates a scale-free network by incorporating growth and preferential attachment.
• TARL: The Topics, Aging, and Recursive Linking process model simulates the simultaneous evolution of author and paper networks. The model attempts to capture the roles of authors and papers in the production, storage, and dissemination of knowledge. Information diffusion is assumed to occur directly via co-authorship and indirectly via the consumption of other authors' papers. The model generates a bipartite evolving network, which also incorporates aging in the paper citation network.

Visualization
• General
  o Image Viewer: Views
140. mbre, John Burgoon, Weimao Ke & Katy Borner

The Scholarly Database aims to serve researchers and practitioners interested in the analysis, modelling, and visualization of large-scale data sets. A specific focus of this database is to support macro-evolutionary studies of science and to communicate findings via knowledge-domain visualizations. Currently, the database provides access to about 18 million publications, patents, and grants. About 90% of the publications are available in full text. Except for some datasets with restricted access conditions, the data can be retrieved in raw or pre-processed formats using either a web-based or a relational database client. This paper motivates the need for the database from the perspective of bibliometric/scientometric research. It explains the database design, setup, etc., and reports the temporal, geographical, and topic coverage of the data sets currently served via the database. Planned work and the potential for this database to become a global testbed for information science research are discussed at the end of the paper.

[Figure residue: map labels]
Map of NIH Grants (top) and MEDLINE Publications (bottom)

Reference: Börner, Katy, Huang, Weixia (Bonnie), Linnemeier, Micah, Duhon, Russell Jackson, Phillips, Patrick, Ma, Nianli, Zoss, Angela, Guo, Hanning & Price, Mark (2009). Rete-Netzwerk-Red: Analyzing and Visualizing Scholarly Networks Using the Scholarly Database and the Network Workbenc
141. n (2009). Understanding Outside Collaborations of the Chinese Academy of Sciences Using Jensen-Shannon Divergence. Proceedings of SPIE-IS&T Visualization and Data Analysis Conference, San Jose, Vol. 7243, pp. 72430C. http://ivl.slis.indiana.edu/km/pub/2009-duhon-cas.pdf

6.3.6 Mapping the Structure and Evolution of Chemistry Research (2009)
By Kevin W. Boyack, Katy Borner & Richard Klavans

How does our collective scholarly knowledge grow over time? What major areas of science exist, and how are they interlinked? Which areas are major knowledge producers, and which ones are consumers? Computational scientometrics (the application of bibliometric/scientometric methods to large-scale scholarly datasets) and the communication of results via maps of science might help us answer these questions. This paper presents the results of a prototype study that aims to map the structure and evolution of chemistry research over a 30-year time frame. Information from the combined Science (SCIE) and Social Science (SSCI) Citation Indexes from 2002 was used to generate a disciplinary map of 7,227 journals and 671 journal clusters. Clusters relevant to studying the structure and evolution of chemistry were identified using JCR categories and were further clustered into 14 disciplines. The changing scientific composition of these 14 disciplines and their knowledge exchange via citation linkages
142. n Reuters' Web of Science
  o scientometrics/isi/AlessandroVespignani.isi: 101 publications authored or co-authored by Alessandro Vespignani from 1990-2006.
  o scientometrics/isi/EugeneGarfield.isi: 99 publications retrieved on November 11, 2009.
  o scientometrics/isi/FourNetSciResearchers.isi: 361 publications spanning 52 years and four network scientists: Albert-László Barabási, Eugene Garfield, Alessandro Vespignani & Stanley Wasserman.
  o scientometrics/isi/Scientometrics.isi: all 2,126 articles published in the journal Scientometrics from 1978-2008.

3 Algorithm and Tool Plugins
The Sci2 Tool menu provides easy access to diverse preprocessing, modeling, analysis, visualization, and scientometrics algorithms, which are listed here. Note that the 'Analysis > Networks' algorithms are grouped by data type, i.e., (un)weighted vs. (un)directed. Please see the online documentation (https://nwb.slis.indiana.edu/community/?n=Algorithms.HomePage) for additional details.

3.1 Load, Data Preparation Sci2 Tool Plugins
Load
• Load: Load a file.
• Load and Clean ISI File: Load an ISI file and reduce the set to those records that have unique ISI identifiers. The record with the highest value of citations (TC field) is kept.
• Load into Database
  o Load ISI File into Database
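The cleaning rule stated for 'Load and Clean ISI File' (one record per ISI identifier, keeping the record with the highest TC value) can be sketched in a few lines of Python. This is an illustration of the rule, not the plugin's implementation: records are modeled as dictionaries, the use of 'UT' as the identifier field is an assumption, and the sample records are invented.

```python
def keep_most_cited(records, id_field="UT", cites_field="TC"):
    """Keep one record per identifier: the one with the highest citation count."""
    best = {}
    for record in records:
        key = record[id_field]
        # Replace the stored record only if this one has strictly more citations.
        if key not in best or int(record[cites_field]) > int(best[key][cites_field]):
            best[key] = record
    return list(best.values())


# Hypothetical records; "UT" as the identifier field is an assumption.
records = [
    {"UT": "ISI:0001", "TI": "Paper A", "TC": "12"},
    {"UT": "ISI:0002", "TI": "Paper B", "TC": "3"},
    {"UT": "ISI:0001", "TI": "Paper A", "TC": "15"},
]
cleaned = keep_most_cited(records)
print([(r["UT"], r["TC"]) for r in cleaned])
```

Duplicates typically arise when the same paper is retrieved by several queries at different times, so the copy with the highest citation count is usually the most recent one.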
143. n in Figure 5.17.

[Figure residue: horizontal burst bars labeled with cited references (e.g., Wasserman S 1986, Fienberg SE 1985, Holland PW 1981, White HC 1976, Garfield E 1977) on a time axis from 1972 to 1986]
Figure 5.17: Top reference bursts in the FourNetSciResearchers dataset

For temporal studies, it can be useful to aggregate data by year rather than by author, reference, etc. Running 'Data Preparation > Database > ISI > Extract Authors > Extract Longitudinal Study' will output a table that lists metrics for every year mentioned in the dataset. The longitudinal study table contains the volume of documents and references published per year, as well as the total number of references made and the number of distinct references, distinct authors, distinct sources, and distinct keywords per year. The results are graphed in Figure 5.18.

[Figure residue: legend entries DOCUMENTS PUBLISHED, REFERENCES PUBLISHED, TOTAL REFERENCES MADE, DISTINCT REFERENCES MADE, DISTINCT AU
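The per-year aggregation behind such a longitudinal table can be sketched as below. This is a simplified Python illustration covering only four of the metrics named above; it is not the Sci2 database query, and the record layout, the column names, and the sample documents are assumptions.

```python
from collections import defaultdict


def longitudinal_study(documents):
    """Aggregate simple per-year metrics from a list of document records."""
    per_year = defaultdict(
        lambda: {"documents": 0, "total_refs": 0, "refs": set(), "authors": set()}
    )
    for doc in documents:
        row = per_year[doc["year"]]
        row["documents"] += 1
        row["total_refs"] += len(doc["references"])
        row["refs"].update(doc["references"])
        row["authors"].update(doc["authors"])
    # One output row per year, in chronological order.
    return {
        year: {
            "documents_published": row["documents"],
            "total_references_made": row["total_refs"],
            "distinct_references_made": len(row["refs"]),
            "distinct_authors": len(row["authors"]),
        }
        for year, row in sorted(per_year.items())
    }


# Hypothetical documents, invented for illustration.
documents = [
    {"year": 1999, "authors": ["Barabasi, AL", "Albert, R"],
     "references": ["Erdos P 1960", "Watts DJ 1998"]},
    {"year": 1999, "authors": ["Albert, R"], "references": ["Watts DJ 1998"]},
    {"year": 2001, "authors": ["Vespignani, A"], "references": []},
]
table = longitudinal_study(documents)
for year, metrics in table.items():
    print(year, metrics)
```

The difference between total and distinct references per year shows how often the same works are cited repeatedly within a year.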
144. n record linking methodology as applied to the 1985 census of Tampa, Florida. Journal of the American Statistical Society 64: 1183-1210.
Jaro, M. A. (1995). Probabilistic linkage of large public health data file. Statistics in Medicine 14: 491-498.
Kampis, G., L. Gulyas, et al. (2009). Dynamic Social Networks and the TEXTrend/CIShell Framework. Applications of Social Network Analysis. University of Zurich and ETH Zurich.
Kessler, M. M. (1963). Bibliographic coupling between scientific papers. American Documentation 14(1): 10-25.
Kleinberg, J. M. (2002). Bursty and Hierarchical Structure in Streams. 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press.
Krebs, V. (2008). Orgnet.com: Software for Social Network Analysis and Organizational Network Analysis, from http://www.orgnet.com/inflow3.html.
Leydesdorff, L. (2008). Software and Data of Loet Leydesdorff. Retrieved 7/15/2008, from http://users.fmg.uva.nl/lleydesdorff/software.htm.
Marshakova, I. V. (1973). Co-Citation in Scientific Literature: A New Measure of the Relationship Between Publications. Scientific and Technical Information Serial of VINITI 6: 3-8.
Martin, S., W. M. Brown, et al. (in preparation). DrL: Distributed Recursive (Graph) Layout. Journal of Graph Algorithms and Applications.
Nicolaisen, J. (2007). Citation Analysis. Annual Review of Information Science and Technology. B. Cronin. Medford, NJ, Information Today, Inc. 41: 609
145. n the Data Manager. It has 5,335 nodes (213 of which are isolates) and 193,039 edges. Isolates can be removed by running 'Preprocessing > Networks > Delete Isolates'. The resulting network has 5,122 nodes and 193,039 edges and is too dense for display in GUESS. Edges with low weights can be eliminated by running 'Preprocessing > Networks > Extract Edges Above or Below Value' with these parameter values:
Extract from this number: 4
Below: leave unchecked
Numeric Attribute: weight

Here, only edges with a local co-citation count of five or higher are kept. The giant component in the resulting network has 265 nodes and 1,607 edges. All other components have only one or two nodes.

The giant component can be visualized in GUESS, see Figure 5.13 (right). See the above explanation and use the same size and color coding and labeling as for the bibliographic coupling network. Simply run 'GUESS: File > Run Script' and select 'yoursci2directory/scripts/GUESS/reference-co-occurence-nw.py'.

Figure 5.13: Undirected, weighted bibliographic coupling network (left) and undirected, weighted co-citation network (right) of the FourNetSciResearchers dataset, with isolate nodes removed

5.1.4.5 Word Co-Occurrence Network
In the Sci2 Tool, select the table of 361 unique ISI records from the FourNetSciResearchers dataset in the Data Manager. Run 'Preprocessing > Topical > Normalize
146. nd hit Export. Save the file as WattsStrogatz.scopus. Part of the resulting file can be seen in Figure 4.5 (right).
[Screenshots: Scopus "Output: Export, Print, E-mail or Create a Bibliography" page with Export selected, and part of the exported reference list]
147. [Figure: four co-authorship network snapshots for 1988-1992, 1988-1997, 1988-2002, and 1988-2007]
Figure 5.4: Evolving co-authorship network of Vespignani from 1988-2007.
The four networks reveal that from 1988-1992 Alessandro Vespignani had one primary co-author and four secondary co-authors. His network expanded considerably over time, comprising 221 co-authors in 2007.
5.1.3 Funding Profiles of Three Researchers at Indiana University (NSF Data)
GeoffreyFox.nsf, BethPlale.nsf, MichaelMcRobbie.nsf
Time frame: 1978-2010
Region(s): Indiana University
Topical Area(s): Informatics, Miscellaneous
Analysis Type(s): Co-PI Network, Grant Award Summary
It is often useful to compare the profiles of multiple researchers within similar disciplinary or institutional domains. For this comparison, we use the complete funding profiles of three Indiana University researchers as retrieved via NSF's
148. network
Diameter: Calculates the length of the longest shortest path between pairs of nodes in a network.
Average Shortest Path: Calculates the average length of the shortest path between pairs of nodes in a network.
Shortest Path Distribution: Builds a histogram of the lengths of shortest paths between pairs of nodes in a network.
Node Betweenness Centrality: Appends a value to each node that reflects the number of shortest paths the node lies on. The more shortest paths between node pairs a node lies on, the higher its betweenness centrality.
Weak Component Clustering: Extracts the N largest weakly connected components of a network.
Global Connected Components: Calculates the number of connected components, i.e., subgraphs with a path between each pair of nodes.
Extract K-Core: Extracts the kth K-Core from a graph. The kth K-Core is what remains of the graph after every node with fewer than k edges connected to it is removed from the graph, recursively.
Annotate K-Coreness: Appends to each node the K-Core that node belongs to.
HITS: Computes authority and hub score for every node.
o Weighted & Undirected
Clustering Coefficient: Calculates the degree to which nodes tend to cluster together and then appends that value to each node.
Nearest Neighbor Degree
Strength vs Degree
Degree & Strength
Average Weight vs End-point D
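The recursive removal that defines a K-Core, as described in the Extract K-Core entry above, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Sci2 Tool's implementation: the function name k_core and the adjacency-dictionary input format are hypothetical, and the graph is assumed to be undirected with symmetric neighbor sets.

```python
def k_core(adjacency, k):
    """Return the k-core of an undirected graph.

    adjacency: dict mapping each node to the set of its neighbors
    (assumed symmetric). Nodes with fewer than k neighbors are removed
    repeatedly until every remaining node has at least k neighbors.
    """
    # Work on a copy so the caller's graph is left untouched.
    adj = {node: set(neighbors) for node, neighbors in adjacency.items()}
    changed = True
    while changed:
        low = [node for node, neighbors in adj.items() if len(neighbors) < k]
        changed = bool(low)
        for node in low:
            for neighbor in adj.pop(node):
                if neighbor in adj:
                    adj[neighbor].discard(node)
    return adj


# Hypothetical example: a triangle a-b-c with a chain c-d-e attached.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
core = k_core(graph, 2)
print(sorted(core))  # ['a', 'b', 'c']
```

Removing e lowers the degree of d below 2, so d is removed in the next pass. This is why the removal has to be applied recursively rather than once.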
149. [Screenshot: Scholarly Database "Download Results" page (nfrastructure for Network Science Center, SLIS, Indiana University, Bloomington) next to a folder view of the downloaded csv files. Tables shown: Medline master, author, co-author (nwb format), MeSH heading, and MeSH qualifier tables; NIH master table; NSF master and co-investigator (nwb format) tables; USPTO agent, assignee, citation (nwb format), claims, co-inventor (nwb format), and Patent Cooperation Treaty tables.]
150. ning the command line: java -version. If not already installed on your computer, download and install Java SE 5 or 6 from http://www.java.com/en/download/index.jsp.
To uninstall the Sci2 Tool, simply delete yoursci2directory. This will delete all sub-directories as well, so make sure to back up all files you want to save.
Please cite the tool as: Sci2 Team. (2009). Science of Science (Sci2) Tool. Indiana University and SciTech Strategies, http://sci.slis.indiana.edu.
2.2 User Interface
The general Sci2 Tool user interface is shown in Figure 2.3. It consists of a Menu on top, Console below, Data Manager right, and Scheduler lower left, explained subsequently.
[Screenshot: Sci2 Tool window with the menus File, Data Preparation, Preprocessing, Analysis, Modeling, Visualization, and Help, the Console, and the Data Manager]
151. ns of the open source Network Workbench (NWB) Tool, http://nwb.slis.indiana.edu. The utility of these infrastructures is then demonstrated by example in three studies: a comparison of the funding portfolios and co-investigator networks of different universities, an examination of paper-citation and co-author networks of major network science researchers, and an analysis of topic bursts in streams of text. The paper concludes with a discussion of related work that aims to provide practically useful and theoretically grounded cyberinfrastructure in support of computational scientometrics research, practice, and education.
[Figure: network visualization with investigator name labels] Complete network left and
152. [Figure: co-authorship network snapshots from "Mapping the Evolution of Co-Authorship Networks" by Weimao Ke, Lalitha Viswanath & Katy Börner, InfoVis Lab, Indiana University. Legend: nodes are authors; node size shows the number of papers published; node color shows the number of citations; edges are co-authorship relations; edge color shows the year of first co-authorship.]
Mapping the Evolution of Co-Authorship Networks
Reference: Ke, Weimao, Börner, Katy, and Viswanath, Lalitha (2004). Analysis and Visualization of the IV 2004 Contest Dataset. Poster Compendium, IEEE Information Visualization Conference, pp. 49-50, 2004.
Data and detailed workflows are at http://iv.slis.indiana.edu/ref/iv04contest. Animated gif is at http://iv.slis.indiana.edu/ref/iv04contest/Ke-Borner-Viswanath.gif.
6.3.2 Studying the Emerging Global Brain: Analyzing and Visualizing the Impact of Co-Authorship Teams (2005)
153. olization of entities, including enhancement such as representing high-density areas with a new city symbol. Geometric generalization aims to solve the conflict between the number of visualized features, the size of symbols, and the size of the display surface. Cartographers dealt with this conflict intuitively, in part, until researchers like Friedrich Töpfer attempted to solve it with quantifiable expressions.
4.8 Topical Analysis ("What")
The topic or semantic coverage of a unit of science can be derived from the text associated with it. Topical aggregations (e.g., over journal volumes, scientific disciplines, or institutions) are common. Topic analysis extracts the set of unique words or word profiles and their frequency from a text corpus. Stop words, such as "the" and "of", are removed. Stemming can be applied. Co-word analysis identifies the number of times two words are used together in the title, keyword set, abstract, and/or full text of a paper. The space of co-occurring words can be mapped, providing a unique view of the topic coverage of a dataset. Similarly, units of science can be grouped according to the number of words they have in common. Salton's term frequency inverse document frequency (TFIDF) is a statistical measure used to evaluate the importance of a word to a paper within a corpus. The importance increases proportionally to the number of times a word appears in the paper but is offset by the frequency of the word in the corpus. Dimensionality r
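The TFIDF weighting described above can be illustrated with a short Python sketch. It assumes one common variant (term frequency as the share of a paper's words, inverse document frequency as the natural log of corpus size over the number of papers containing the word). The manual does not state which variant is meant, and the function name tfidf and the sample papers are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {word: weight} dict per tokenized document.

    Assumed variant: tf = count / document length,
    idf = ln(number of documents / documents containing the word).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each word once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {
                word: (count / total) * math.log(n_docs / doc_freq[word])
                for word, count in counts.items()
            }
        )
    return weights


# Hypothetical mini-corpus of three already-tokenized paper titles.
papers = [
    "maps science science network".split(),
    "network analysis author network".split(),
    "network burst detection text".split(),
]
weights = tfidf(papers)
print(weights[0])
```

In this corpus "network" occurs in every paper, so its weight is zero everywhere, while "science" occurs twice in the first paper and nowhere else, so it receives the highest weight there.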
154. on > Networks > DrL (VxOrd) with default values. To keep only the strongest edges, run Preprocessing > Networks > Extract Top Edges using the parameter Top Edges: 1000 and leave the others at their default values. Once edges have been removed, the network can be visualized by running Visualization > Networks > GUESS. In GUESS, run the following commands:
> for node in g.nodes:  # to position the nodes at the DrL calculated place
      node.x = node.xpos * 40
      node.y = node.ypos * 40
> resizeLinear(references, 2, 40)
> colorize(references, [200,200,200], [0,0,0])
> resizeLinear(weight, 1, 2)
> g.edges.color = "127,193,65,255"
and set the background color to white to re-create the visualization. The result should look something like the one in Figure 5.14.
Figure 5.14: Undirected, weighted word co-occurrence network for FourNetSciResearchers dataset
5.1.5 Studying Four Major NetSci Researchers (ISI Data) using Database
New versions of the Sci2 Tool include the ability to load ISI files into a database. While the initial loading can take quite some time for larger datasets (see Sections 3.3 Memory Allocation and 3.4 Memory Limits), it results in vastly faster and more powerful data processing and extraction. The database functionality also allows users to compose and execute custom SQL queries, which will be documented in later versions of this documentation.
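As a rough illustration of what a linear resize such as resizeLinear(references, 2, 40) in the GUESS commands above is assumed to do, the following Python sketch maps an attribute's value range onto a target size range. It is not GUESS code; the function name linear_rescale and the sample values are hypothetical.

```python
def linear_rescale(values, new_min, new_max):
    """Map values linearly so the smallest becomes new_min and the largest new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: there is no range to spread, so use the minimum size.
        return [float(new_min)] * len(values)
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (value - lo) * scale for value in values]


# Hypothetical reference counts for three nodes, mapped to sizes 2..40.
sizes = linear_rescale([0, 5, 10], 2, 40)
print(sizes)
```

The node with the fewest references gets size 2, the one with the most gets size 40, and all others fall proportionally in between.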
155. on of Author and Paper Networks (2004)
6.5 Accuracy Studies
6.5.1 Mapping the Backbone of Science (2005)
6.5.2 Toward a Consensus Map of Science (2009)
6.6 Databases and More
6.6.1 The Scholarly Database and Its Utility for Scientometrics Research (2009)
6.6.2 Reference Mapper
6.6.3 Rete-Netzwerk-Red: Analyzing and Visualizing Scholarly Networks Using the Scholarly Database and the Network Workbench Tool (2009)
6.7 Interactive Online Services
6.7.1 The NIH Visual Browser: An Interactive Visualization of Biomedical Research (2009)
6.7.2 Interactive World and Science Map of S&T Jobs (2010)
7 Extending the Sci2 Tool
7.1 CIShell Basics
7.2 Read New D
156. [Screenshot: Geo Map dialog with the options Author Name, Region Name, Color By, Color Scaling, and Color Range, and the resulting US maps with area color coding and circle coding]
Figure 5.34: US map with area color coding and circle coding for aggregated data over states.
There are two available size scaling options: Linear and Logarithmic. We recommend using logarithmic scaling for larger datasets.
[Figure: Geo Map (Circle Annotation Style), Albers Equal-Area Conic Projection, with logarithmic and linear circle area scaling]
Figure 5.35: US geospatial map of state-level data with logarithmic circle size scaling (left) and linear circle size scaling (right).
6 Sample Science Studies am
157. onnie Huang, Russell J. Duhon, Elisha F. Hardy & Katy Börner
This map highlights the research co-authorship collaborations of the Chinese Academy of Sciences (CAS) with locations in China and countries around the world. The large geographic map shows the research collaborations of all CAS institutes. Each smaller geographic map shows the research collaborations by the CAS researchers in one province-level administrative division. Collaborations between CAS researchers are not included in the data. On each map, locations are colored on a logarithmic scale by the number of collaborations, from red to yellow. The darkest red is 3,395 collaborations by all of CAS with researchers in Beijing. Also, flow lines are drawn from the location of focus to all locations collaborated with. The width of each flow line is linearly proportional to the number of collaborations with the location it goes to, with the smallest flow lines representing one collaboration and the largest representing differing amounts on each geographic map.
[Figure: Collaboration and knowledge diffusion via co-author networks]
Reference: Duhon, Russell Jackso
158. orithm services, CIShell provides an environment which makes it easy for users to interact with a set of algorithms in the form of an executable tool. This environment includes a Menu Manager, which allows users to invoke algorithms; a Data Manager, which serves as a workspace to hold data while users run a series of algorithms on it; a Scheduler, which monitors algorithms as they run; a conversion service, which converts data between various types so algorithms can operate on data in the format of their choice; and several other services. Since all of this is provided by default, developers can maintain focus on the algorithms they wish to implement without having to reinvent the entire supporting infrastructure. Currently, the CIShell environment is implemented as an Eclipse-based desktop tool, but the CIShell interfaces are defined in such a way that the environment could be implemented in a variety of ways, e.g., as a web-based service. Since CIShell is so closely tied to OSGi, many references will be made to OSGi in the developer documentation. To fully understand the details of how CIShell works, it is often necessary to understand certain aspects of OSGi; however, most developers should be able to begin working with CIShell without understanding OSGi.
7.2 Read New Data
Data formats are documented at https://nwb.slis.indiana.edu/community/?n=DataFormats.HomePage and their relationships can be seen in Figure 2.4.
7.3 Creating and Sharing New Algorithm
159. orks
Reference: Zoss, Angela & Börner, Katy (forthcoming). Mapping Transdisciplinary Tobacco Use Research Centers Publications. American Journal of Public Health, special issue on Modeling in Tobacco Control.
6.3 Local and Global Science Studies
6.3.1 Mapping the Evolution of Co-Authorship Networks (2004)
By Weimao Ke, Katy Börner & Lalitha Viswanath
The presented work aims to identify major papers and their interrelations, topic trends over time, as well as major authors and their evolving co-authorship networks in the IV Contest 2004 data set. Paper-citation, co-citation, word co-occurrence, burst, and co-author analyses were used to analyze the data set. The results are visually presented as graphs, static Pajek visualizations, and animated network layouts.
[Figure: co-authorship network with author name labels; edges are co-authorship relations, edge color shows the year of first co-authorship]
160. p Online Services
6.1 Science Dynamics
6.1.1 Mapping Topics and Topic Bursts in PNAS (2004)
By Ketan K. Mane & Katy Börner
Scientific research is highly dynamic. New areas of science continually evolve; others gain or lose importance, merge, or split. Due to the steady increase in the number of scientific publications, it is hard to keep an overview of the structure and dynamic development of one's own field of science, much less all scientific domains. However, knowledge of hot topics, emergent research frontiers, or change of focus in certain areas is a critical component of resource allocation decisions in research laboratories, governmental institutions, and corporations. This paper demonstrates the utilization of Kleinberg's burst detection algorithm, co-word occurrence analysis, and graph layout techniques to generate maps that support the identification of major research topics and trends. The approach was applied to analyze and map the complete set of papers published in PNAS in the years 1982-2001. Six domain experts examined and commented on the resulting maps in an attempt to reconstruct the evolution of major research areas covered by PNAS.
[Figure: co-word map with term labels such as expression, regulation, molecular weight, kinetics, inhibition, antigen, transfection, homology, and antibodies]
161. [Figure: graph of data formats and converters]
Figure 2.4: Visualization of compatible data formats
2.4 Saving Visualizations for Publication
The Sci2 Tool supports various image output formats. To save image files created by visualizations such as Horizontal Bar Graph, Geo Map, or Circular Hierarchy, right-click on the PostScript file in the Data Manager and then click Save. Select PostScript and then save the file to your desired directory.
[Screenshot: Data Manager context menu with Save, View, View With, Rename, and Discard, and the format selection dialog]
Figure 2.5: Saving a PostScript file
Adobe PostScript files require a special interpreter in order to be viewed. One such interpreter is GSview, which requires the Ghostscript software, available online:
Ghostscript 8.64: http://pages.cs.wisc.edu/~ghost/doc/GPL/gpl864.htm
GSview 4.9: http://pages.cs.wisc.edu/~ghost/gsview/get49.htm
When in GUESS, use File > Export Image to export the current view or the complete network in diverse file formats such as jpg, png, raw, pdf, gif, etc.
2.5 Sample Datasets
Meaning
162. program to add a title and legend. The image below was created using Photoshop, and label sizes were changed as well.
[Figure: Joint Co-Authorship Network. Legend: node size and color show the number of papers; edge size and color show the number of times co-authored]
Figure 5.11: Undirected, weighted co-author network for FourNetSciResearchers dataset
5.1.4.3 Cited Reference Co-Occurrence (Bibliographic Coupling) Network
In the Sci2 Tool, a bibliographic coupling network is derived from a directed paper-citation network (see section 4.9.1.1 Document-Document (Citation) Network). Select the paper-citation network of the FourNetSciResearchers dataset in the Data Manager. Run Data Preparation > Text Files > Extract Reference Co-Occurrence (Bibliographic Coupling) Network and the bibliographic coupling network becomes available in the Data Manager. Running Analysis > Networks > Network Analysis Toolkit (NAT) reveals that the network has 5,335 nodes (5,007 of which are isolate nodes) and 6,206 edges. Edges with low weights can be eliminated by running Preprocessing > Networks > Extract Edges Above or Below Value with parameter values:
[Dialog: Extract Edges Above or Below Value. "Extract all edges with an attribute above a certain number." Extract from this number: 4.0; Below; Numeric Attribute: weight]
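The two preprocessing steps used here, keeping only edges above a weight threshold and dropping isolate nodes, can be sketched in plain Python. This is an illustration only, not the Sci2 Tool's code: the function names and the (source, target, weight) edge tuples are hypothetical, and "above" is assumed to mean strictly greater than the threshold, so a threshold of 4 keeps integer weights of five or higher.

```python
def extract_edges_above(edges, threshold):
    """Keep edges whose weight is strictly greater than threshold."""
    return [(u, v, w) for (u, v, w) in edges if w > threshold]


def delete_isolates(nodes, edges):
    """Keep only nodes that appear as an endpoint of at least one edge."""
    connected = set()
    for u, v, _weight in edges:
        connected.add(u)
        connected.add(v)
    return [node for node in nodes if node in connected]


# Hypothetical toy network: five papers, three weighted edges.
nodes = ["p1", "p2", "p3", "p4", "p5"]
edges = [("p1", "p2", 5), ("p2", "p3", 4), ("p3", "p4", 1)]

strong_edges = extract_edges_above(edges, 4)
remaining_nodes = delete_isolates(nodes, strong_edges)
print(strong_edges)      # [('p1', 'p2', 5)]
print(remaining_nodes)   # ['p1', 'p2']
```

Thresholding can create new isolates (p3 and p4 in this toy example), so isolate removal is typically worth repeating after edges have been dropped.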
163. progress of running algorithms.
2.3 Data Formats
As of March 2010, the Sci2 Tool supports loading the following input file formats:
• GraphML (.xml or .graphml)
• XGMML (.xml)
• Pajek NET (.net)
• Pajek Matrix (.mat)
• NWB (.nwb)
• TreeML (.xml)
• Edgelist (.edge)
• Scopus csv (.scopus)
• NSF csv (.nsf)
• CSV (.csv)
• ISI (.isi)
• Bibtex (.bib)
• Endnote Export Format (.enw)
and the following network file output formats:
• GraphML (.xml or .graphml)
• Pajek MAT (.mat)
• Pajek NET (.net)
• NWB (.nwb)
• XGMML (.xml)
These formats are documented at https://nwb.slis.indiana.edu/community/?n=DataFormats.HomePage. In total, there are 26 external and internal data formats and 35 converters; their relationships can be derived by running File > Converter Graph and plotted as shown in Figure 2.4. Note that some conversions are symmetrical (double arrow) while others are one-directional (arrow).
[Figure: graph of data formats and converters, with node labels such as file:text/nwb, file:text/graphml+xml, prefuse.data.Graph, and edu.uci.ics.jung.graph.Graph]
164. puter science near the top of the map, which has strong linkages to mathematics and engineering. Just as a map of the world can be used to communicate the location of minerals, soil types, political boundaries, population densities, etc., a map of science can be used to locate the position of scholarly activity. The profiles for the U.S. NIH (National Institutes of Health) and NSF (National Science Foundation) are shown below and were calculated by matching the principal investigators and their institutions from grants funded in 1999 to first authors and institutions of papers indexed in 2002. This type of paper-to-grant matching will produce some false positives. On the whole, however, it is a conservative approach in that it only considers a single time lag between funding and publication (3 years in this case), and it does not match on secondary authors. The 14,367 NIH matches and 10,054 NSF matches are large samples, ensuring that the aggregated profiles are representative of the actual funding profiles of the agencies. This serves as a good example of how journal-level or disciplinary maps can be used to display aggregated information obtained from paper-level analysis.
[Figure: two maps of science with discipline labels]
Funding Patterns of the National Science Foundation (left) and the National Institutes of Health (right)
Reference: Boyack, Kevin W., Börner, Katy, & Klavans, Richard (2009). Mapping the Structure and Evolution of Chemistry Research. Scientometrics
165. r the three dots, hit Enter again.
Note: The Interpreter tab will have >>> as a prompt for these commands. It is not necessary to type > at the beginning of the line. You should type each line individually and hit Enter to submit the commands to the Interpreter. For more information, refer to the GUESS tutorial at http://nwb.slis.indiana.edu/Docs/GettingStartedGUESSNWB.pdf.
This way, nodes are linearly size- and color-coded by their GCC, and edges are green, as shown in Figure 4.15 (left). Any field within the network can be substituted to code the nodes. To view the available fields, open the Information Window (Display > Information Window) and mouse over a node. Also note that each ISI paper record in the network has a dandelion-shaped set of references.
The GUESS interface supports pan and zoom, node selection, and details on demand (see GUESS tutorial). For example, the node that connects the Barabasi-Vespignani network in the upper left to Garfield's network in the lower left is Price, 1986, Little Science, Big Science. The network on the right is centered on Wasserman's works.
[Screenshot: two GUESS visualization windows with the Information Window open]
166. reduce the number of bursts, select the burst table in the Data Manager ("Burst detection analysis (Publication Year, Cited References): maximum burst level 1"). Right-click it and choose View. The table will open as a csv file. If it does not open in Excel already, open the file in Excel and sort the table according to burst strength. Delete all but the top 50 bursts, save the file as csv, and re-load it in the Sci2 Tool. Load the new csv burst file, choosing Standard csv format in the pop-up box.
[Dialog: "The file you have selected can be loaded using one or more of the following formats. Please select the format you would like to try." Options: Standard csv Format, NSF csv Format, Scopus csv Format]
Select the csv file in the Data Manager and visualize it with Visualization > Temporal > Horizontal Bar Graph, using the workflow mentioned above.
[Figure: horizontal bar graph of burst terms, including CHINA, OUTPUT, COLLABORATION, and CITATIONS]
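The manual trimming step above (sort by burst strength, keep the top 50 rows, save as csv) could also be scripted. The sketch below uses only Python's standard csv module; the column names "Word" and "Strength" and the sample rows are assumptions for illustration, so check the header of the actual burst table before adapting it.

```python
import csv
import io


def top_bursts(csv_text, n=50, strength_column="Strength"):
    """Return the n rows with the highest burst strength as a list of dicts."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: float(row[strength_column]), reverse=True)
    return rows[:n]


def to_csv(rows):
    """Serialize rows back to csv text, keeping the original column order."""
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical burst table with made-up values.
sample = "Word,Strength\nalpha,1.5\nbeta,7.25\ngamma,3.0\n"
kept = top_bursts(sample, n=2)
print([row["Word"] for row in kept])  # ['beta', 'gamma']
print(to_csv(kept))
```

The resulting text can be written to a file and loaded into the tool as a Standard csv file, as described above.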
167. relationship between the log of the citation counts and the log of the distance. The paper concludes with a discussion of these results and an outlook for future work.
[Figure, top: map of the top 500 institutions, with labels including Harvard U, Stanford U, U Calif SF, MIT, and Johns Hopkins U. Bottom: log-log plot; x-axis: log of geographic distance; y-axis: log of number of institutions citing each other; one fitted line per time slice (1982-1986, 1987-1991, 1992-1996, 1997-2001)]
Geographic location and number of received citations for the top 500 institutions (top) and log-log plot showing the variation of the number of institutions that cite each other over geographic distance among them for each of the four time slices (bottom). The distance was calculated by applying the Euclidean distance formula to xy coordinates obtained using the Albers projection; 1.5 units of geographic distance equal approximately 100 miles.
Reference: Börner, Katy, Penumarthy, Shashikant, Meiss, Mark & Ke, Weimao (2006). Mapping the Diffusion of Information among Major U.S. Research Institutions. Scientometrics, Vol. 68(3), 415-426. http://ivl.slis.indiana.edu/km/pub/2006-borner-mapdiff.pdf
6.3.5 Research Collaborations by the Chinese Academy of Sciences (2009)
By Weixia B
168. ric studies, as well as combination and modeling studies. The following chapter describes the workflows to conduct scientometric studies of each type and at each scale. Tables 1.1 and 1.2 in Section 1 Introduction show examples of studies in each category, several of which can be found in Chapter 6 Sample Science Studies & Online Services.
5.1 Individual Level Studies - Micro
5.1.1 Mapping Collaboration, Publication, and Funding Profiles of One Researcher (EndNote and NSF Data)
5.1.1.1 Endnote
KatyBorner.enw
Time frame: 1992-2010
Region(s): Indiana University, University of Technology in Leipzig, University of Freiburg, University of Bielefeld
Topical Area(s): Network Science, Library and Information Science, Informatics and Computing, Statistics, Cyberinfrastructure, Information Visualization, Cognitive Science, Biocomplexity
Analysis Type(s): Co-Authorship Network
Many researchers, tools, and online services use EndNote to organize their bibliographies. To analyze an individual researcher's collaboration and publication profile, load an EndNote file that includes their entire CV into the Sci2 Tool. To generate a research profile for Katy Börner, load Katy Börner's EndNote file at yoursci2directory/sampledata/scientometrics/endnote/KatyBorner.enw and run Data Preparation > Text Files > Extract Co-Author Network using the parameter:
[Dialog: Extract Co-Author Network]
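Conceptually, extracting a co-author network from a bibliography means counting how many records each author appears on and how often each pair of authors shares a record. The Python sketch below shows that idea under simple assumptions; it is not the Sci2 Tool's algorithm, the function name co_author_network is hypothetical, and the author names are placeholders.

```python
from collections import Counter
from itertools import combinations


def co_author_network(papers):
    """Build node and edge weights from a list of per-paper author lists.

    Returns (number_of_works, times_co_authored):
    - number_of_works[author] is the number of papers the author appears on
    - times_co_authored[(a, b)] is the number of papers a and b share,
      with each pair stored in sorted order so the network is undirected
    """
    number_of_works = Counter()
    times_co_authored = Counter()
    for authors in papers:
        unique_authors = sorted(set(authors))
        number_of_works.update(unique_authors)
        times_co_authored.update(combinations(unique_authors, 2))
    return number_of_works, times_co_authored


# Placeholder records: each inner list is the author field of one paper.
papers = [
    ["Author A", "Author B", "Author C"],
    ["Author A", "Author B"],
    ["Author A"],
]
works, edges = co_author_network(papers)
print(works["Author A"])                 # 3
print(edges[("Author A", "Author B")])   # 2
```

Single-author papers add to an author's number of works but create no edges, which is one way isolate nodes can arise in such a network.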
169. rom http://www.analytictech.com/ucinet/ucinet_5_description.htm.
Börner, K., S. Penumarthy, et al. (2006). Mapping the Diffusion of Information Among Major U.S. Research Institutions. Scientometrics, Dedicated issue on the 10th International Conference of the International Society for Scientometrics and Informetrics 68(3): 415-426.
Brandes, U. and D. Wagner (2008). Analysis and Visualization of Social Networks. Retrieved 7/15/08, from http://visone.info.
Chen, C. (2006). CiteSpace II: Detecting and Visualizing Emerging Trends and Transient Patterns in Scientific Literature. JASIST 54(5): 359-377.
Csárdi, G. and T. Nepusz (2006). The igraph software package for complex network research. Retrieved 7/17/08, from http://necsi.org/events/iccs6/papers/c1602a3c126ba822d0bc4293371c.pdf.
Cyberinfrastructure for Network Science Center (2008). Cyberinfrastructure Shell.
Cytoscape Consortium (2008). Cytoscape. Retrieved 7/15/08, from http://www.cytoscape.org/index.php.
Davidson, G. S., B. N. Wylie, et al. (2001). Cluster Stability and the Use of Noise in Interpretation of Clustering. IEEE Information Visualization, San Diego, CA, IEEE Computer Society: 23-30.
De Roure, D., C. Goble, et al. (2009). The Design and Realisation of the myExperiment Virtual Research Environment for Social Sharing of Workflows. Future Generation Computer Systems 25: 561-567.
Elnashai, A., B. Spencer, et al. (2008). Architectural Overview of MAEviz HAZTURK
170. rs and algorithm authors to integrate new and existing algorithms and tools that take in diverse data formats. The OSGi (http://www.osgi.org) component architecture and the CIShell algorithm architecture (http://cishell.org), built on top of OSGi, make this possible. Cytoscape is also adopting an architecture based on OSGi, though it will still have a specified internal data model and will not use CIShell in the core. Moving to OSGi will make it possible for the tools to share many algorithms, including adding Cytoscape's visualization capabilities to Network Workbench.

Several of the tools listed in the table above are also libraries. Unfortunately, it is often difficult to use multiple libraries, or sometimes any outside library, even in tools that allow the integration of outside code. Network Workbench, however, was built to integrate code from multiple libraries, including multiple versions of the same library. For instance, two different versions of Prefuse are currently in use, and many algorithms use JUNG (the Java Universal Network/Graph Framework). We feel that the ability to adopt new and cutting-edge libraries from diverse sources will help create a vibrant ecology of algorithms.

Although it is hard to discern trends for tools which come from such diverse backgrounds, it is clear that over time the visualization capabilities of scientometrics tools have become more and more sophisticated. Scientometrics tools have also in many cases become more user
171. rt R (see example below). In order to manually examine and, if needed, correct the list of unique authors, open the merge table, e.g., in a spreadsheet program. Sort by author names and identify names that refer to the same person. In order to merge two names, simply delete the asterisk in the last column of the duplicate node's row. In addition, copy the uniqueIndex of the name that should be kept and paste it into the cell of the name that should be deleted. Table 5.1 shows the result for merging "Albet R" and "Albert R", where "Albet R" will be deleted, yet all of the node's linkages and citation counts will be added to "Albert R".

Table 5.1: Merging of author nodes using the merge table
[Table residue: the columns include label, timesCited, and numberOfWorks; the row values for the two author names are not recoverable.]

A merge table can be automatically generated by applying the Jaro distance metric (Jaro, 1989; Jaro, 1995), available in the open source Similarity Measure Library (http://sourceforge.net/projects/simmetrics), to identify potential duplicates. In the Sci2 Tool, simply select the co-author network and run Scientometrics > Detect Duplicate Nodes using the parameters:
Attribute to compare on: label
Merge when this similar: 0.95
Create notice when this similar: 0.85
Number of shared first letters: 2
The result is a merge table that has the ver
172. s: Returns the top N rows of a table by some sorting criteria.
  o Aggregate Data: Aggregates/summarizes the input table based on values in a "Grouped On" column provided by the user.
• Temporal
  o Slice Table by Time: Slices a table into groups of rows by time.
• Geospatial
  o Extract ZIP Code: Extracts a ZIP code from a given address.
• Topical
  o Normalize Text: Replaces spaces and punctuation in a field with a standard delimiter of the user's choosing.
• Networks
  o Extract Top Nodes: Extracts the top N nodes from a graph based on a given attribute.
Analysis
  o Extract Nodes Above or Below Value: Extracts nodes with an attribute above or below a certain value.
  o Delete Isolates: Removes nodes which are not connected to any other node in the graph.
  o Extract Top Edges: Extracts the top N edges from a graph based on a given attribute.
  o Extract Edges Above or Below Value: Extracts all edges with an attribute above or below a certain number from a graph.
  o Remove Self Loops: Removes edges whose source and target nodes are identical from a graph.
  o Trim by Degree: Deletes edges at random until each node has at most N edges.
  o MST Pathfinder Network Scaling: Prunes a network using the MST Pathfinder algorithm.
  o FastPathfinder Network Scaling: Prunes a network using the Fast Pathfinder algorithm.
  o Snowball Sampling (N nodes): Picks a random node and traverses it
173. s. Their size ranges from several bytes to terabytes (trillions of bytes) of data. They might be high-quality materials curated by domain experts or content retrieved from the Web. Based on a detailed needs analysis and deep knowledge about existing databases, the best suited yet affordable datasets have to be selected, filtered, integrated, and augmented. It may also be necessary for networks to be extracted (see Section 4.7 Network Analysis for details). The Sci2 Tool supports the loading and pre-processing of different types of data, such as publication, funding, and patent datasets, given in different formats, as discussed next.

4.2.1 Datasets: Publications
This section discusses different input formats for publication data. In each data format type we list and color code data elements that are commonly used in statistical, temporal, geospatial, topical, and network analyses.

4.2.1.1 Refer/BibIX (.enw)
Refer was one of the first digital reference managers, developed by Bell Labs in 1978. Refer's file output format has since been adopted by many tools and web services, including BibIX for UNIX, early versions of EndNote, CiteSeerX, and Zotero. Data in Refer-formatted files is commonly used for the following types of analyses:
  o Statistical Attributes: %1 Times Cited
  o Temporal Analysis: %8 Date, %V Volume, %D Year Published
  o Geospatial Analysis: %+ Author Address,
174. %C Place Published
  o Topical Analysis: %X Abstract, %J Journal, %K Keywords, %F Label, %! Short Title, %T Title
  o Network Analysis: %A Author

4.2.1.2 BibTeX
Like Refer, BibTeX provides a standard reference file format used by many tools and web services, including CiteSeerX, citeulike, Bibsonomy, and Google Scholar. Data in BibTeX files is commonly used for the following types of analyses:
  o Temporal Analysis: date, bibdate, date-added, date-modified, issue, month, timestamp, volume, year
  o Geospatial Analysis: address, location
  o Topical Analysis: abstract, booktitle, conference, description, journal, keywords
  o Network Analysis: author, organization

4.2.1.3 ISI Web of Science
Thomson Reuters' Web of Knowledge (WoS) is a leading citation database cataloging over 10,000 journals and over 120,000 conferences. Access it via the "Web of Science" tab at http://www.isiknowledge.com (note: access to this database requires a paid subscription). Along with Scopus, WoS provides some of the most comprehensive datasets for scientometric analysis. To find all publications by an author, search for the last name and the first initial followed by an asterisk in the author field. For example, to find papers by Eugene Garfield, enter "Garfield E*" in the author field. The search yielded 1,529 re
175. [Screenshot residue: Scopus basic search form (search within Article Title, Abstract, Keywords; date range, subject areas, and search history panels) and the results page for a TITLE-ABS-KEY query on Watts Strogatz Clustering Coefficient. One listed result is "Approximating clustering coefficient and transitivity" by Schank, T. and Wagner, D. (2005), Journal of Graph Algorithms and Applications 9(2).]
176. s edges iteratively until N nodes are extracted.
  o Node Sampling: Extracts N random nodes and their intervening edges and then deletes isolates.
  o Edge Sampling: Extracts N random edges and their target and source nodes.
  o Symmetrize: Turns a directed network into an undirected network.
  o Dichotomize: Trims edges above, equal to, or below a certain value.
  o Multipartite Joining: Joins a multipartite graph for one node type across another node type.
• Temporal
  o Burst Detection: Determines periods of increased activity in a table with dates/timestamps.
• Geospatial
  o Geocoder: Converts place names to latitudes and longitudes.
• Topical
  o Burst Detection: Determines periods of increased activity in a table with dates/timestamps.
• Networks
  o Network Analysis Toolkit (NAT): Calculates basic network statistics.
  o Unweighted & Undirected
    - Node Degree: Calculates the number of edges adjacent to a node and then appends that value to each node.
    - Degree Distribution: Builds a histogram of the degree values of all nodes.
    - K-Nearest Neighbor (Java): Calculates the correlation between the degree of a node and that of its neighbors and then appends that value to each node.
    - Watts-Strogatz Clustering Coefficient: Calculates the degree to which nodes tend to cluster together and then appends that value to each node.
    - Watts-Strogatz Clustering Coefficient over K: Correlates the clustering coefficient and the degree of the nodes of a
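To make the Node Degree and Watts-Strogatz Clustering Coefficient entries above more concrete, here is a small Python sketch of the usual textbook definitions for an undirected network without self-loops. It is not the Sci2 implementation; the function name and the convention of assigning 0.0 to nodes with fewer than two neighbors are assumptions made for this illustration.

```python
def degree_and_clustering(adjacency):
    """adjacency maps each node to the set of its neighbors (undirected).

    Returns {node: (degree, local clustering coefficient)}.
    """
    result = {}
    for node, neighbors in adjacency.items():
        k = len(neighbors)
        if k < 2:
            # Assumed convention: no neighbor pairs means coefficient 0.0.
            result[node] = (k, 0.0)
            continue
        # Count edges that exist among the neighbors of this node.
        links = sum(1 for u in neighbors for v in neighbors
                    if u < v and v in adjacency[u])
        result[node] = (k, 2.0 * links / (k * (k - 1)))
    return result


# Toy network: a triangle a-b-c with one extra node d attached to a.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
for name, (deg, cc) in sorted(degree_and_clustering(toy).items()):
    print(name, deg, round(cc, 3))
```

In the toy network, node a has three neighbors of which only one pair (b, c) is connected, so its coefficient is 1/3, while b and c sit in a closed triangle and get 1.0.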
177. s weights for the number of co-written papers and the publication years of the earliest and most recent collaboration.
Extract Author Citation Network: Extracts a weighted, directed network with authors as nodes and edges from a citing author to a cited author. Nodes include all data from the PERSON table and the number of documents authored in the current dataset.
Extract Document Citation Network (core only): Extracts an unweighted, directed network with documents as nodes and edges from a citing paper to a cited paper. Only those documents with full entries in the dataset are included in the network. Nodes include all data from the DOCUMENT table.
Extract Document Citation Network (core and references): Extracts an unweighted, directed network with documents as nodes and edges from a citing paper to a cited paper. Nodes include all data from the DOCUMENT table.
Extract Source Citation Network (core only): Extracts a weighted, directed network with sources (journals) as nodes and edges from a citing source to a cited source. Citations are via documents within sources, and only those sources represented by documents within the dataset are included in the network. Nodes include all data from the SOURCE table, and edges are weighted by the number of citations between sources.
Extract Source Citation Network (core and references): Extracts a weighted, directed network with sources (journals) as nodes and edges from a citing source to a cite
178. see Section 2.4 Saving Visualizations for Publication, and Figure 5.31.

Figure 5.30: Geospatial workflow with usptoinfluenza.csv data (left) and Geo Map parameters (right).

[Console residue] Geo Map (Circle Annotation Style) was selected. Author(s): Joseph R. Biberstine. Implementer(s): Joseph R. Biberstine. Integrator(s): Joseph R. Biberstine.

[Dialog residue] Creates a map with circle annotations. Circles are positioned, sized, and colored (inside and outside) according to columns in the input table. Either or both kinds of coloring can also be disabled. The table data for each dimension can be log-scaled before processing. Parameters shown: Map; Author Name (K. Borner); Latitude; Longitude; Size Circles By; Size Scaling; Color Circle Exteriors By; Exterior Color Range (Green to Red); Color Circle Interiors By (None, no inner color). The input table has the columns Latitude, Longitude, Patents, and Times Cited; its values are not recoverable. Interior Color Range: Yellow
179. several supported file types.

After generating Dr. Börner's co-authorship network, run Analysis > Networks > Unweighted & Undirected > Node Degree to append degree information to each node. To visualize the network, run Visualization > Networks > GUESS and select GEM in the Layout menu once the graph is fully loaded. The resulting network in Figure 5.1 was modified using the following workflow:
1. Resize Linear > Nodes > totaldegree > From: 5 To: 30 > Do Resize Linear. (Note: total degree is the number of papers.)
2. Resize Linear > Edges > weight > From: 1 To: 10 > Do Resize Linear. (Note: weight is the number of co-authored papers.)
3. Colorize > Nodes > totaldegree > From: [color] To: [color] > Do Colorize
4. Colorize > Edges > weight > From: [color] To: [color] > Do Colorize
5. Object: nodes based on > Property: totaldegree > Operator: >= > Value: 10 > Show Label
6. Type in Interpreter: > for n in g.nodes: n.strokecolor = n.color

The largest cluster in the network is outlined in black and represents one single paper with many authors.

[Figure 5.1 residue: legend (Node Size & Color: Total Degree; Edge Size & Color: Number of Times Co-Authored) and node labels such as "Huang, Weixia (Bonnie)" and "Ursyn, Anna".]
180. sing > Networks > Delete Isolates. Deleting isolates is a memory-intensive procedure. If you experience problems at this step, refer to Section 3.3 Memory Allocation.

The FourNetSciResearchers dataset has exactly 65 isolates. Removing those leaves 12 networks, shown in Figure 6 (right), using the same color and size coding as in Figure 5 (left). Using GUESS, Display > Information Window reveals detailed information for any node or edge. Here the node with the highest GCC value was selected.

Alternatively, nodes could have been color and/or size coded by their degree using, e.g.:
> g.computeDegrees()
> colorize(outdegree, gray, black)
Note that the outdegree corresponds to the LCC within the given network, while the indegree reflects the number of references, helping to visually identify review papers.

The complete paper-paper citation network can be split into its subnetworks using Analysis > Networks > Unweighted & Directed > Weak Component Clustering with the default values. The largest component has 163 nodes, the second largest 45, the third 24, and the fourth and fifth have 12 and 11 nodes, respectively. The largest component, also called the giant component, is shown in Figure 4.16. The top 20 papers by times cited in ISI have been labeled using:
> toptc = g.nodes
> def bytc(n1, n2): return cmp(n1.globalcitationcount, n2.globalcitationcount)
> toptc.sort(bytc)
> toptc.reverse()
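As a rough illustration of what the Weak Component Clustering step mentioned above computes, the following Python sketch finds weakly connected components by ignoring edge direction and running a breadth-first search. It is an assumption-laden sketch, not the Sci2 algorithm; the function name and the toy data are hypothetical.

```python
from collections import defaultdict, deque


def weak_components(nodes, directed_edges):
    """Return the weakly connected components, largest first.

    Edge direction is ignored: two nodes belong to the same component
    if they are linked by a path of edges in either direction.
    """
    neighbors = defaultdict(set)
    for source, target in directed_edges:
        neighbors[source].add(target)
        neighbors[target].add(source)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            current = queue.popleft()
            component.append(current)
            for other in neighbors[current]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(component)
    components.sort(key=len, reverse=True)
    return components


# Toy citation data: two small components plus two isolates (f and g).
papers = ["a", "b", "c", "d", "e", "f", "g"]
citations = [("a", "b"), ("c", "b"), ("d", "e")]
parts = weak_components(papers, citations)
print([len(p) for p in parts])
isolates = [p[0] for p in parts if len(p) == 1]
print(isolates)
```

Components of size one are the isolates that Delete Isolates would remove, and the first entry of the sorted list plays the role of the giant component.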
181. some commonly derived networks. Note that the citation links are directed from old papers to current and future papers, denoting the flow of knowledge. Four commonly studied network types are listed and exemplified on the right. The extraction of these and other scholarly networks is explained below.

[Figure text] Papers A-E written by authors x, y, z over 3 years (2000, 2001, 2002). Each paper happens to have 4 references.
Paper-Paper (Citation) Network: Papers are connected via direct citation links. Arrows represent information flow from older papers to younger papers (citation).
Author-Author (Co-Author) Network: x and y co-author papers A and E together; y and z co-author papers A and E.
Document Co-Citation (DCA) Network: A and B are co-cited by C and D. A and D are co-cited by E.
Reference Co-Occurrence (Bibliographic Coupling) Network: C and D are bibliographically coupled as they both cite references A and B.
Local citation counts within this dataset are given in black, and global citation counts (ISI times cited) are given in green above each paper.

Figure 4.12: Sample paper network (left) and four different network types derived from it (right)

Diverse algorithms exist to calculate specific node, edge, and network properties. Node properties comprise degree centrality, betweenness centrality, or hub and authority scores. Edge properties include durability, reciprocity, intensity
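The co-citation and bibliographic coupling relations described for the figure can be derived from a plain list of citations. The Python sketch below is a minimal illustration, not the Sci2 extraction code: the function names are hypothetical, and the toy citation data only encodes the within-dataset links stated in the figure text (C and D cite A and B; E cites A and D).

```python
from collections import Counter
from itertools import combinations


def co_citation(cites):
    """Weight for a pair of cited papers = number of papers citing both."""
    weights = Counter()
    for references in cites.values():
        for pair in combinations(sorted(set(references)), 2):
            weights[pair] += 1
    return weights


def bibliographic_coupling(cites):
    """Weight for a pair of citing papers = number of shared references."""
    weights = Counter()
    for first, second in combinations(sorted(cites), 2):
        shared = len(set(cites[first]) & set(cites[second]))
        if shared:
            weights[(first, second)] = shared
    return weights


# Assumed toy data: citing paper -> papers it cites within the dataset.
cites = {"C": ["A", "B"], "D": ["A", "B"], "E": ["A", "D"]}

print(dict(co_citation(cites)))
print(dict(bibliographic_coupling(cites)))
```

On this toy data, A and B are co-cited twice (by C and by D) and A and D once (by E), while C and D are coupled with weight 2 because they share the references A and B.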
182. sualization > Networks > GUESS to open GUESS with the file loaded. It might take some time for the network to load. The initial layout will be random. Wait until the random layout is completed and the network is centered before proceeding. The GUESS window is divided into three parts:
1. Information Window: Examine node and edge attributes (see Figure 4.12, left).
2. Visualization Window: View and manipulate the network (see Figure 4.12, top right).
3. Interpreter/Graph Modifier panels: Analyze/change network properties (see Figure 4.12, bottom right).

[Screenshot residue: GUESS window showing the Information Window for the node "Medici" (e.g., label Medici, priorates 53, totalities 54, wealth 103) and the Graph Modifier controls (Object, Property, Operator, Value, Colour, Show, Hide, Size, Show Label, Hide Label, Change Label, Format Node Labels, Format Edge Labels, Node Shape, Center, Resize Linear, Colorize).]
183. sults in November 2009, 500 of which can be downloaded at a time (see Figure 4.2).

[Screenshot residue: ISI Web of Knowledge, Web of Science author search form and results page (1,529 results) displayed in a web browser.]
184. t is determined by the number of common references between documents.
  o NSF
    - Merge Identical NSF People: Merges person names by removing punctuation and capitalization.
    - Extract Investigators: Extracts a table containing one row per investigator from an NSF database.
    - Extract Awards: Extracts a table containing one row per award from an NSF database.
    - Extract Organizations: Extracts a table containing one row per organization from an NSF database.
    - Extract Co-PI Network: Extracts a weighted, undirected network with principal investigators as nodes and edges between them if they co-investigated an award in the database. Nodes are appended with the number of awards investigated, the total amount across each investigated award, the start date of the earliest award, and the expiration date of the most recent award. Edges are appended with the number of awards co-investigated by the two investigators and the joint award total between the investigators.
  o General
    - Create Merging Tables
    - Merge Entities
    - Custom Table Query
    - Custom Graph Query
    - Extract Raw Tables From Database
• Text Files
  o Remove ISI Duplicate Records: Removes duplicate publications from ISI records based on the ISI Unique ID attribute.
  o Remove Rows with Multitudinous Fields: Removes rows having at least N entries within a given field.
  o Extract Directed Network: General network extraction.
  o E
185. [Screenshot residue: Scholarly Database web interface (Cyberinfrastructure for Network Science Center, SLIS, Indiana University, Bloomington), showing the search form (Creators, Title, Abstract, All Text, year range) next to a results page for the query "sustainability" ("Your search returned 2,790 results in 0.450 seconds") with hits from NIH, Medline, USPTO, and NSF, e.g., "Financial sustainability" (2001) and "Chemistry journals and sustainability" (2007).]

Search help text recoverable from the screenshot: If multiple terms are entered in a field, they are automatically combined using OR, so "breast cancer" matches any record with "breast" or "cancer". You can put AND between terms to combine them with AND; thus "breast AND cancer" would only match records that contain both terms. Double quotation marks can be used to match compound terms; e.g., "breast cancer" in quotation marks retrieves records with the phrase "breast cancer", and not records where "breast" and "cancer" are both present but not the phrase.
186. ters but are not as tightly connected as the journals that make up each cluster. Then the clusters are labeled both by the content area shared by the journals in the cluster and by the overarching scientific domain for that cluster (represented by one of 13 colors).

Maps of science like this one can be used to understand many different data sets and how they can be represented by topic. Here we are looking at the topics that appear in job postings from large job

[Screenshot residue: job search panel with a "Biotechnology" query; Copyright 2008 The Regents of the University of California.]

High-level view of the Map of Science visualization. The map is circular, so areas of the map are repeated side to side as users scroll back and forth. Postings are clustered by 13 main scientific domains at the high zoom level and the 554 subdisciplines at the lower zoom level.

Reference: Zoss, Angela, Michael Conover & Katy Börner. Where Are the Academic Jobs? Interactive Exploration of Job Advertisements in Geospatial and Topical Space. Sun-Ki Chai, John Salerno (Eds.), Proceedings of the 2010 International Conference on Social Computing, Behavioral Modeling and Prediction (SBP10). Springer. http://ivl.slis.indiana.edu/km/pub/zoss-et-al-jobmaps.pdf
Hemmings, J., Wilkinson, J. What is a public health observatory? Journal of Epidemiology and Community Health 57 (2003) 324-326.
187. [Screenshot residue: GUESS Information Window for the node "Barabasi, AL" (timescited 2218, indegree 2, localtimescited 55, outdegree 54, totaldegree 56) and Interpreter commands including resizeLinear and colorize on globaltimescited, g.computeDegrees(), and colorize on indegree.]

Figure 5.9: Directed, unweighted paper-paper citation network for the FourNetSciResearchers dataset with all papers and references in the GUESS user interface (left) and a pruned paper-paper citation network after removing all references and isolates (right)

The complete network can be reduced to papers that appeared in the original ISI file by deleting all nodes that have a GCC of -1. Simply run Preprocessing > Networks > Extract Nodes Above or Below Value with parameter values:
Extract from this number: -1
Below?: leave unchecked
Numeric Attribute: globalCitationCount
The resulting network is unconnected, i.e., it has many subnetworks, many of which have only one node. These single unconnected nodes, also called isolates, can be removed using Preproces
188. to Blue. [Remaining Geo Map dialog and input table residue; values not recoverable.]

Figure 5.31: Geospatial map (circle) of USPTO patent influenza data

To create a geospatial map with region coding, select usptoinfluenza.csv once again and then click Visualization > Geo Map (Colored-Region Annotation Style). Use the following parameters:

[Dialog residue: Geo Maps (region coloring). Creates a map with colored-region annotations. Regions are identified and colored according to columns in the input table. The table data can be log-scaled before processing. Parameters shown: Author Name (Katy Borner), Region Name, Region Color.]

Figure 5.32: Geospatial map (region colored) of USPTO patent influenza data

One can also create a US Geo Map with customized data by running the same workflow but selecting "US States" in Map (see below).

[Dialog residue: Geo Maps (region coloring), with the same description as above.] Projection: Lambert Conformal C
189. tself has been unprecedented, because only recently has there been access to high-volume and high-quality data sets of scientific output (e.g., publications, patents, grants) and computers and algorithms capable of handling this enormous stream of data. This article reviews major work on models that aim to capture and recreate the structure and dynamics of scientific evolution. We then introduce a general process model that simultaneously grows coauthor and paper citation networks. The statistical and dynamic properties of the networks generated by this model are validated against a 20-year data set of articles published in PNAS. Systematic deviations from a power law distribution of citations to papers are well fit by a model that incorporates a partitioning of authors and papers into topics, a bias for authors to cite recent papers, and a tendency for authors to cite papers cited by papers that they have read. In this TARL model (for topics, aging, and recursive linking), the number of topics is linearly related to the clustering coefficient of the simulated paper citation network.

[Plot residue: two panels with Year on the horizontal axis (1981 to 2001), comparing PNAS and simulated (SIM) counts.]

Total number of actual and simulated papers (p) and authors (a) (a) and received citations (c) win
190. ual machine, the amount of memory available to a Java application must be specified before the application starts. The tool's default allotment of 350 megabytes balances providing enough memory for most uses of the tool against causing the Sci2 Tool to crash on machines with too little memory. For most analyses, this amount should suffice. For larger-scale operations, this amount needs to be increased to make full use of your system's available memory, as discussed here.

3.3.1 Windows and Linux
Open the file scipolicy.ini in your sci2 directory using a simple text editor such as Notepad. The file should contain the following three lines (if not, add them):
-vmargs
-Xms15m
-Xmx350m
The first number (15 here) represents how much memory is allocated to the Sci2 Tool when it first starts up. This number isn't particularly important, but it should not be set to anything below 10m. The second and more important number (350 here) represents the maximum amount of memory that can be allocated to the Sci2 Tool. This can be up to roughly 3/4 of the total available memory on your machine, but it should not be set any higher or the Sci2 Tool will fail to start. Make sure that the formatting is exactly as displayed above, as the .ini file can be finicky about extra spaces in the file or multiple arguments on a single line. After changing these two numbers, save the scipolicy.ini file, and your new memory settings should be used the next time
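The arithmetic behind the memory setting above can be sketched as a small helper. This is only an illustration: the function name is hypothetical, the 3/4 rule is taken from the text as an upper bound rather than a recommendation, and other limits of a particular Java installation are not modeled.

```python
def scipolicy_vmargs(total_memory_mb: int, initial_mb: int = 15) -> str:
    """Build the three scipolicy.ini lines, one argument per line.

    The maximum heap is capped at roughly 3/4 of the machine's memory,
    following the rule of thumb given in the text.
    """
    if initial_mb < 10:
        raise ValueError("the initial allocation should not be below 10m")
    max_mb = (total_memory_mb * 3) // 4
    if max_mb < initial_mb:
        raise ValueError("the maximum must not be smaller than the initial allocation")
    return "\n".join(["-vmargs", f"-Xms{initial_mb}m", f"-Xmx{max_mb}m"])


# Example: a machine with 4096 MB of memory.
print(scipolicy_vmargs(4096))
```

For a machine with 4096 MB of memory, this yields a maximum of 3072 MB; any smaller value above the 350 MB default would also be a valid choice.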
191. umber of edges. If this probability is greater than zero and greater than the random number obtained from a random number generator, then an edge is attached between the two nodes. This is repeated in each time step.

Run with Modeling > Barabási-Albert Scale-Free Model and a time step of around 1000, initial number of nodes 2, and number of edges 7 in the input. Lay out the network and determine the number and degree of highly connected nodes via Analysis > Unweighted and Undirected > Degree Distribution using the default value. Plot the node degree distribution using Gnuplot.

5 Sample Workflows
Scientometric studies cover a wide array of datasets, methodologies, and results. Analysis can lead to several types of insights, particularly those answering the questions what, where, when, and with whom (topical, geospatial, temporal, and network analysis, respectively; see Section 1 Introduction). Many studies also cover statistical surveys of scientometric datasets. For detailed descriptions of the types of scientometric analyses, see Sections 4.5 Statistical Analysis through 4.9 Network Analysis.

Each of these analysis types can be performed at one of three major scales: micro (individual), meso (local), and macro (global). The Sci2 Tool supports workflows in all fifteen varieties of scientomet
192. vanced tool configuration and detailed development information.

2.2.2 Console
All operations, such as loading, viewing, or saving datasets, running various algorithms, and algorithm parameters, are logged sequentially in the Console window as well as in log files stored in the yoursci2directory/logs directory. The Console window also displays the acknowledgement information about the original authors of the algorithm, the developers, the integrators, a reference paper, and the URL to the reference if available, together with a URL to the algorithm description in the NWB/Sci2 community wiki (https://nwb.slis.indiana.edu/community).

2.2.3 Data Manager
The Data Manager window displays all currently loaded and available datasets. The type of a loaded file is indicated by its icon:
Text: text file
Table: tabular data (csv file)
Matrix: data (Pajek .mat)
Plot: plain text file that can be plotted using Gnuplot
Database: in-memory database
Tree: tree data (TreeML)
Network: network data (in-memory graph/network object or network files saved as GraphML, XGMML, NWB, Pajek .net, or edge list format)
Derived datasets are indented under their parent datasets. That is, the children datasets are the results of applying certain algorithms to the parent dataset.

2.2.4 Scheduler
The Scheduler lets users keep track of the
193. w for a network:
  o Whether it is directed or undirected
  o Number of nodes
  o Number of isolated nodes
  o A list of node attributes
  o Number of edges
  o Whether the network has self loops; if so, lists all self loops
  o Whether the network has parallel edges; if so, lists all parallel edges
  o A list of edge attributes
  o Average degree
  o Whether the graph is weakly connected
  o Number of weakly connected components
  o Number of nodes in the largest connected component
  o Strong connectedness for directed networks
  o Graph density

In the Sci2 Tool, use Analysis > Network Analysis Toolkit (NAT) to get basic properties, e.g., for the network of Florentine families available in yoursci2directory/sampledata/network/florentine.nwb. The result for this dataset is:

This graph claims to be undirected.
Nodes: 16
Isolated nodes: 1
Node attributes present: label, wealth, totalities, priorates
Edges: 27
No self loops were discovered.
No parallel edges were discovered.
Edge attributes:
  Nonnumeric attributes:
    Example value
    marriag... T
    busines... F
  Did not detect any numeric attributes.
  This network does not seem to be a valued network.
Average degree: 3.375
This graph is not weakly connected.
There are 2 weakly connected components. (1 isolates)
The largest connected component consists of 15 nodes.
Did not calculate strong connectedness because this graph was not directed.
Density (disregarding weights): 0.225

4.9.3 N
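Two of the values in the NAT report above can be checked by hand from the node and edge counts. The short Python sketch below assumes the usual definitions for a simple undirected graph (average degree 2m/n, density 2m/(n(n-1))); whether NAT uses exactly these formulas is an assumption, but they reproduce the reported numbers for the Florentine families network. The function names are hypothetical.

```python
def average_degree(num_nodes: int, num_edges: int) -> float:
    """Each undirected edge contributes to the degree of two nodes."""
    return 2.0 * num_edges / num_nodes


def undirected_density(num_nodes: int, num_edges: int) -> float:
    """Share of the n*(n-1)/2 possible node pairs that are connected."""
    return 2.0 * num_edges / (num_nodes * (num_nodes - 1))


# Counts taken from the NAT report: 16 nodes and 27 edges.
print(average_degree(16, 27))
print(round(undirected_density(16, 27), 3))
```

With 16 nodes and 27 edges, the average degree is 54/16 = 3.375 and the density is 54/240 = 0.225, matching the report.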
194. was computed. Major changes on the dominance, influence, and role of Chemistry, Biology, Biochemistry, and Bioengineering over these 30 years are discussed. The paper concludes with suggestions for future work.

Map of the 14 disciplines, fractions of papers by field for each discipline, and knowledge flows between disciplines for 1974 (left) and 2004 (right). These 14 disciplines are further aggregated into six groups, represented by the 6 colors shown in the legend.

Reference: Boyack, Kevin W., Börner, Katy, & Klavans, Richard (2009). Mapping the Structure and Evolution of Chemistry Research. Scientometrics, Vol. 79(1), 45-60. http://ivl.slis.indiana.edu/km/pub/2009-boyack-mapchem.pdf

6.3.7 Science Map Applications: Identifying Core Competency (2007)
By Kevin W. Boyack, Katy Börner & Richard Klavans
The 2002 base map represents journal cluster interrelations but is invariant to rotation and mirroring. The map was oriented to place mathematics at the top and the physical sciences on the right. The ordering of disciplines is similar to what has been shown in other maps of science: as one progresses clockwise around the map, one progresses from mathematics through the physical sciences (Engineering, Physics, Chemistry) to the earth sciences, life sciences, medical sciences, and social sciences. The social sciences link back to com
195. xtract Bipartite Network: Extracts an unweighted bipartite network.
o Extract Paper Citation Network: Extracts an unweighted, directed network from papers to their citations.
o Extract Author Paper Network: Extracts an unweighted, directed network from authors to their papers.
o Extract Co-Occurrence Network: General network extraction.
o Extract Word Co-Occurrence Network: Extracts a weighted network showing which words appear with each other most frequently.
o Extract Co-Author Network: Extracts a weighted network with authors as nodes and edge weights as the number of times those authors co-wrote a paper.
o Extract Reference Co-Occurrence (Bibliographic Coupling) Network: Extracts a weighted network from a Paper Citation network with papers as nodes and edge weights as the number of citations two papers share.
o Extract Document Co-Citation Network: Extracts a weighted network from a Paper Citation network with papers as nodes and edge weights as the number of times two papers are cited together.
o Detect Duplicate Nodes: Cleans graph data by detecting and preparing to merge nodes that are likely to represent the same entity.
o Update Network by Merging Nodes: Creates a new network by running the algorithm with both the Merge Table from Detect Duplicate Nodes and the original network selected.

Preprocessing
• General
o Extract Top N Records: Returns the top N rows of a table by some sorting criteria.
o Extract Top N Record
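The description of Extract Co-Author Network above (authors as nodes, edge weights as the number of co-written papers) can be illustrated with a small Python sketch. This is not the Sci2 implementation; the function name and the toy input are hypothetical, and the input is assumed to be one list of author names per paper.

```python
from collections import Counter
from itertools import combinations


def co_author_edges(papers):
    """Count, for every pair of authors, how many papers they wrote together."""
    weights = Counter()
    for authors in papers:
        # Sort and de-duplicate so each pair has one canonical key per paper.
        for pair in combinations(sorted(set(authors)), 2):
            weights[pair] += 1
    return weights


# Hypothetical toy input: one author list per paper.
papers = [
    ["Author A", "Author B", "Author C"],
    ["Author A", "Author B"],
    ["Author C"],
]
edges = co_author_edges(papers)
print(edges[("Author A", "Author B")])  # 2
```

Single-author papers contribute no edges here; whether such authors should still appear as isolated nodes is a design choice this sketch leaves out.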
196. Figure 5.18: Co-authorship network of CTSA Center publications
197. y same format as Table 5.1, together with two textual log files. The log files describe, in a more human-readable form, which nodes will be merged or not merged. Specifically, the first log file provides information on which nodes will be merged (right-click and select "View" to examine the file), while the second log file lists nodes which will not be merged but are similar. Based on this information, the automatically generated merge table can be further modified as needed.

In sum, unification of author names can be done manually or automatically, independently or in conjunction. It is recommended to create the initial merge table automatically and to fine-tune it as needed. Note that the same procedure can be used to identify duplicate references: simply select a paper-citation network and run Data Preparation > Text Files > Detect Duplicate Nodes using the same parameters as above, and a merge table for references will be created.

To merge identified duplicate nodes, select the merge table and the co-authorship network while holding down the Ctrl key. Run Data Preparation > Text Files > Update Network by Merging Nodes. This will produce an updated network as well as a report describing which nodes were merged.

The updated co-author network can be visualized using Visualization > Networks > GUESS (see the above explanation on GUESS). Figure 5.11 shows a layout of the combined FourNetSciResearchers dataset after setting
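The effect of merging duplicate nodes on a weighted co-author network can be illustrated as follows. This is a minimal Python sketch under stated assumptions, not the Sci2 implementation: the merge table is simplified to a hypothetical dictionary that maps each duplicate name to the name that should be kept, the weights of edges that collapse together are summed, and edges that would become self loops are dropped.

```python
from collections import Counter


def merge_nodes(edges, merge_map):
    """Relabel duplicate nodes and sum the weights of edges that collapse together."""
    merged = Counter()
    for (a, b), weight in edges.items():
        a = merge_map.get(a, a)
        b = merge_map.get(b, b)
        if a == b:
            # Both endpoints now refer to the same node; skip the self loop (assumption).
            continue
        merged[tuple(sorted((a, b)))] += weight
    return merged


# Hypothetical toy data: two spellings of the same author name.
edges = {
    ("Smith, J", "Lee, K"): 2,
    ("Smith, John", "Lee, K"): 1,
    ("Smith, J", "Smith, John"): 1,
}
merge_map = {"Smith, J": "Smith, John"}
merged = merge_nodes(edges, merge_map)
print(dict(merged))
```

In this toy case the two spellings collapse into one node, so the merged network keeps a single edge to "Lee, K" with the combined weight.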