Educational Data Mining Workbench User Manual V4.00
Contents
1. Contents (entries recoverable from the extraction): Chapter 2 System Manual; Per Value Change as Clip Type; Random Sampling; Stratified Sampling; Add Process; Add Feature; Add Feature Buttons; Load; Cancel Button; Add Feature Parameters. Ateneo Laboratory for the Learning Sciences, F206, AdMU.
2. Figure 24: Clip submission.
Sampling
The data sampling feature of the Workbench allows the user to specify how clips are sampled from the data set. It can also be used to sample at the action/transaction level. The user can specify the sample size and whether the Workbench will randomly take the sample across the entire population or stratify the sampling based on one or more variables. Note that the Workbench allows the user to sample the data at any point of the process (after importing, after clipping, or after labelling), depending on the user's analytical goals.
To start sampling the dataset, click the Sampling button, located in either the Function menu (Figure 7) or the Toolbar (Figure 9). Sampling functionalities involve creating subsets from the dataset using automatic select and grouping options. A user may take samples or a subset from the loaded dataset and s
3. Educational Data Mining Workbench User Manual V4.00. Content (entries recoverable from the extraction): Revision History; Definition of Terms; Overall Description; Chapter 1 System Overview; Function Menu; Load Button; Import Button; Export Button; Add Process Button; Clip Button; Sampling Button; Labelling Button; Data Gd
4. o Load Button: The Load button allows the user to choose a previously saved sampling template from a list and apply it to the current dataset (Figure 27: Load Prompt).
o Submit Button: The Submit button closes the Sampling Form, implements the sampling process, and then displays the result in a new tab.
Add Process
This allows the user to create a script composed of multiple processes and run them in a single thread.
Figure 28: Feature selection window (shows the Required Columns list and the buttons Check All Processes, Uncheck All Processes, Invert Checked Processes, Edit Process, Delete Process and Run Processes).
o Add Feature: This function allows users to add features to the dataset through the application of predefined operations.
Figure 29: Load Function Dialogue (shows Function Name, Enabled, and the Add, Add All, Remove, Remove All and Swap Contents buttons). Input Column Names: choose column name(s) to be used
5. Figure 47: Default RunningPrevCount window.
Parameters Needed
Enabled: indicates whether the selected feature will be used in the process.
Sort Column: used for sorting the rows within the same group.
Group Column: used for grouping rows with the same values for the selected column.
Range Column: range of values used for computation.
• Default StDev
Figure 48: Default StDev function window (fields: Function Name, Sort Columns, Group Columns, Output Column Names).
Parameters Needed
Enabled: indicates whether the selected feature will be used in the process.
Sort Column: used for sorting the rows within the same group.
Group Column: used for grouping rows with the same values for the selected column.
Range Column: range of values used for computation.
• Default Sum
6. Detection and Analysis of Off-Task Gaming Behavior in Intelligent Tutoring Systems. In Ikeda, Ashley & Chan (Eds.), Proceedings of the 8th International Conference on Intelligent Tutoring Systems. Springer-Verlag: Berlin, pp. 382-391.
[9] Witten, I. H. & Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques, Second Edition. Morgan Kaufmann.
7. KEEL: A software tool to assess evolutionary algorithms for data mining problems. Soft Computing: A Fusion of Foundations, Methodologies and Applications, 13(3), 307-318.
[2] Baker, R. S. J. d. (2007). Modeling and Understanding Students' Off-Task Behavior in Intelligent Tutoring Systems. Proceedings of ACM CHI 2007: Computer-Human Interaction, 1059-1068.
[3] Baker, R. S. J. d. & de Carvalho (2008). Labeling Student Behavior Faster and More Precisely with Text Replays. 1st International Conference on Educational Data Mining, 38-47.
[4] Corbett, A. T. & Anderson, J. R. (1995). Knowledge Tracing: Modeling the Acquisition of Procedural Knowledge. User Modeling and User-Adapted Interaction, 4, 253-278.
[5] de Vicente, A. & Pain, H. (2002). Informing the detection of the students' motivational state: an empirical study. Proceedings of the 6th International Conference on Intelligent Tutoring Systems, 933-943.
[6] McLaren, B. M., Scheuer, O. & Mikšátko, J. (2010). Supporting collaborative learning and e-Discussions using artificial intelligence techniques. International Journal of Artificial Intelligence in Education (IJAIED), 20(1), 1-46.
[7] Mierswa, I., Wurst, M., Klinkenberg, R., Scholz, M. & Euler, T. (2006). YALE: Rapid Prototyping for Complex Data Mining Tasks. In Proc. of the 12th ACM SIGKDD Int'l Conference on Knowledge Discovery and Data Mining (KDD 2006), pp. 935-940. ACM Press.
[8] Walonoski, J. & Heffernan, N. T. (2006).
8. Afterwards, use pKnowDirect with the pKnow value.
N (Numbers Only): if more elements are found in a group, only the last N items are kept for processing (start count every N rows).
Range Column: range of values used for computation.
Group Column: used for grouping rows with the same values for the selected columns.
Sort Column: used for sorting the rows within the same group.
Problem Column: name of the column corresponding to the problem.
Skill Column: name of the column specifying the skill.
Outcome Column: name of the column used by certain features.
Error Values: specifies which values constitute an error, for use by PercentError.
L0 (Numbers Only): probability that the skill is already known before the first opportunity to use the skill in problem solving.
S (Numbers Only): probability that the student will make an error even though the skill is already known.
G (Numbers Only): probability that the student will give the correct answer even though the skill is not known.
T (Numbers Only): probability that the skill will be learned at each opportunity to use the skill, regardless of whether the answer is correct or incorrect.
Attempt Column: either of the two, depending on how it was used:
9. Is this the first attempt of the student to answer or get help on the problem step? Or: How many attempts did they make to answer or ask for help on the problem step?
Pre-defined functions
The system has 23 default operations available. Four parameters are common to all operations: Input Column Names, Output Column Names, Feature Name, and Enabled. Listed below are the current operations, their descriptions, and the parameters needed aside from the previously mentioned parameters.
1 And. Description: Executes a logical AND operation on the selection and returns the corresponding Boolean results. Other parameters needed: True Value, False Value.
2 Compare. Description: Compares if two values are identical. Compares 1 selected Input Column Name with Check Values; its output is based on the Operation Type used. Other parameters needed: Check Values, All Strings, Operation Type.
3 Copy. Description: Copies the values from a column (from the Selected Input Column Name). Other parameters needed: None.
4 CountIfLastN. Description: Counts how many in the last n entries, including the current cell, are equal to a given value or values. Other parameters needed: Sort Columns, Group Columns, Range Columns, N (Numbers Only), Check Values.
5 CountLastN. Description: Counts how many in the last n entries, including the current cell, are equal to the current cell. Other parameters needed: Sort Colu
10. assessments of the probability that the student knew the cognitive skills used in the current problem step. This information can be distilled and/or calculated by processing data across an entire log file corpus, but there are currently no standard tools to accomplish this. Feature distillation is time-consuming, and many times a research group re-uses the same feature set and feature distillation software across several projects; the second author, for instance, has been using variants of the same feature set within Cognitive Tutors for nine years. Developing appropriate features can be a major challenge to new entrants in this research area. To address this data labelling bottleneck and the difficulty in distilling relevant features for machine learning, we are developing an Educational Data Mining (EDM) Workbench. A beta version of this Workbench, now available online at http penoy admu edu ph alls downloads, is described in this user manual. The Workbench currently allows learning scientists to:
1. Label previously collected educational log data with behaviour categories of interest (e.g. gaming the system, help avoidance) considerably faster than is possible through previous live observation or existing data labelling methods.
2. Collaborate with others in labelling data.
3. Automatically distil additional information from lo
11. bad changes depending on the data set but typically indicate cases that are unfit for the user's purposes. Unsure clips can be separated for further analysis by other labellers.
Figure 62: Labelling Window (shows the Select column name list, Add Label, Edit Label, Labeler's Name, Load Template, Save Template, Submit and Cancel).
Figure 63: A sample Labelling window (Set Up Labelling parameters; Labels separated by commas: Good, Bad, Unsure).
1. Label Name: Select Add Label in the Labelling window to add user-defined labels. The label name distinguishes one label set from another.
2. Labels (separated by commas): Here the user creates the labels for the data set, separated by commas.
o Use Template: The template area specifies a pretty-print of the text replay. The user supplies descriptive text and indicates where the fields should be inserted.
12. will be learned at each opportunity to use the skill, regardless of whether the answer is correct or incorrect.
• Default RunningCountIf
Figure 46: Default RunningCountIf function window (fields: Sort Columns, Group Columns, Output Column Names, Check Values).
Parameters Needed
Enabled: indicates whether the selected feature will be used in the process.
Sort Column: used for sorting the rows within the same group.
Group Column: used for grouping rows with the same values for the selected column.
Check Value: the value to be compared against the Selected Input Column Names. This value can be either a string or an integer, depending on the feature used.
• Default RunningPrevCount
13. Figure 34: Default CountLastN function window.
Parameters Needed
Enabled: indicates whether the selected feature will be used in the process.
True Value: assigned to the result in the Output Column Name if the operation returns true.
False Value: assigned to the result in the Output Column Name if the operation returns false.
Range Column: range of values used for computation.
Sort Column: used for sorting the rows within the same group.
Group Column: used for grouping rows with the same values for the selected columns.
N (Numbers Only): if more elements are found in a group, only the last N items are kept for processing (start count every N rows).
• Default Copy
Figure 35: Default Copy function window (fields: Input Column Names, "choose column name(s) to be used in this feature"; Add, Add All, Remove All and Swap Contents buttons; Output Column Names).
Parameters Needed
Enabled: indicates whether the selected feature will be used in the process.
14. Parameters Needed
Enabled: indicates whether the selected feature will be used in the process.
Sort Column: used for sorting the rows within the same group.
Group Column: used for grouping rows with the same values for the selected columns.
True Value: assigned to the result in the Output Column Name if the operation returns true.
False Value: assigned to the result in the Output Column Name if the operation returns false.
Date Column: values should be in the Date (Year Month Date) format.
Time Column: values should be in the Time (Hour Minute Second) format.
Date Time Column: values should be in the Date and Time (Year Month Date Hour Minute Second) format.
• Default Inverse
Figure 37: Default Inverse function window (fields: Function Name, Enabled, Input Column Names, Output Column Names).
Parameters Needed
Enabled: indicates whether the selected feature will be used in the proc
15. Figure 55: Add Feature Window with updated column.
Check Value: the value to be compared against the Selected Input Column Names. This value can be either a string or an integer, depending on the feature used.
Operation Type: contains values from 1 to 6 that correspond to different operations. Strings or integers can be compared in this feature.
Example: Compare was the selected feature. The Check Value will be compared to the Selected Column Name, and the output will depend on which operation is selected below:
1. Greater than
2. Greater than or equal to
3. Less than
4. Less than or equal to
5. Equal to
6. Starts with
Date Column: values should be in the Date (Year Month Date) format.
Time Column: values should be in the Time (Hour Minute Second) format.
Date Time Column: values should be in the Date and Time (Year Month Date Hour Minute Second) format.
Figure 56: Time in YYYY MM DD HH MM SS (sample value shown: 2005 10 15 02 08 56 0).
All String: checks if all the column values are strings, not numbers or any other type.
pKnow Column: values should come from the pKnow column. Calculate the pKnow value first using the pKnow operation.
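As a rough illustration of the six Operation Types listed above, the following Python sketch maps each type number to a comparison and returns a True Value or False Value. It is not the Workbench's own code: the name `compare`, the default True/False Values, and the choice to treat the column value as the left-hand operand are assumptions made for this example only.

```python
import operator

# Operation Type numbers as listed in the manual (1 to 6).
# Assumption: the column value is the left operand, the Check Value the right.
OPERATIONS = {
    1: operator.gt,  # greater than
    2: operator.ge,  # greater than or equal to
    3: operator.lt,  # less than
    4: operator.le,  # less than or equal to
    5: operator.eq,  # equal to
    6: lambda cell, check: str(cell).startswith(str(check)),  # starts with
}


def compare(cell, check_value, operation_type, true_value="1", false_value="0"):
    """Return true_value if the comparison holds, otherwise false_value.

    Hypothetical helper: mixing strings and numbers with types 1 to 4
    raises TypeError in Python, so callers should convert values first.
    """
    op = OPERATIONS[operation_type]
    return true_value if op(cell, check_value) else false_value
```

For instance, `compare("hint_request", "hint", 6)` yields the True Value because the cell starts with the Check Value.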
16. • Default pKnow
Figure 45: Default pKnow function window (fields: Function Name, Enabled, Sort Columns, Group Columns, Output Column Names, Check Values, L0 (Numbers Only)).
Parameters Needed
Enabled: indicates whether the selected feature will be used in the process.
Sort Column: used for sorting the rows within the same group.
Group Column: used for grouping rows with the same values for the selected column.
Check Value: the value to be compared against the Selected Input Column Names. This value can be either a string or an integer, depending on the feature used.
L0 (Numbers Only): probability that the skill is already known before the first opportunity to use the skill in problem solving.
S (Numbers Only): probability that the student will make an error even though the skill is already known.
G (Numbers Only): probability that the student will give the correct answer even though the skill is not known.
T (Numbers Only): probability that the skill
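The four numeric parameters of pKnow (initial knowledge, S, G and T) are those of Bayesian Knowledge Tracing (Corbett & Anderson, 1995). The Python sketch below shows the standard Knowledge Tracing update for one student and one skill. It is offered only as an illustration, not as the Workbench's implementation: the function names are hypothetical, and this manual does not state whether the Workbench reports the estimate before or after each opportunity (the sketch reports it after).

```python
def bkt_update(p_known, correct, slip, guess, transit):
    """One Bayesian Knowledge Tracing step.

    First condition P(known) on the observed outcome, then apply the
    learning (transit) probability for this opportunity.
    """
    if correct:
        numerator = p_known * (1 - slip)
        denominator = numerator + (1 - p_known) * guess
    else:
        numerator = p_known * slip
        denominator = numerator + (1 - p_known) * (1 - guess)
    # Guard against a zero denominator (e.g. guess == 0 and p_known == 0).
    posterior = numerator / denominator if denominator > 0 else p_known
    return posterior + (1 - posterior) * transit


def pknow_sequence(outcomes, l0, slip, guess, transit):
    """Return the estimate after each outcome (True = correct answer)."""
    p_known = l0
    estimates = []
    for correct in outcomes:
        p_known = bkt_update(p_known, correct, slip, guess, transit)
        estimates.append(p_known)
    return estimates
```

Calling `pknow_sequence([True, False, True], l0=0.5, slip=0.1, guess=0.2, transit=0.1)` gives one estimate per logged attempt, in order.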
17. 15 PercentError. Description: problems where errors were made on a skill. Other parameters needed: Group Column, Problem Column, Skill Column, Outcome Column, Error Values.
16 pKnow. Description: Computes the probability that the student knows the skill involved in an action. Other parameters needed: Sort Columns, Group Columns, Check Values, L0 (Numbers Only), S (Numbers Only), G (Numbers Only), T (Numbers Only).
17 pKnowDirect. Description: Checks if the current action is the student's first attempt on this problem step. If true, pknow direct is equal to pknow; otherwise pknow direct is [...]. Other parameters needed: Attempt Column, pKnow Column, Check Value.
18 RunningCountIf. Description: Computes the number of entries that are equal to a given value or values up to the current cell, including the current cell. Other parameters needed: Sort Columns, Group Columns, Range Column, Check Value.
19 RunningPrevCount. Description: Computes the number of entries that are equal to the current cell, up to the cell before the current cell. Other parameters needed: Sort Columns, Group Columns, Range Column.
20 StDev. Description: Computes the standard deviation of a specified column. Other parameters needed: Sort Columns, Group Columns, Range Column.
21 SumLastN. Description: Computes the sum of the last n numbers in the selection specified. Other parameters needed: Sort Columns, Group Columns, Range Column, N (Numbers Only).
22 TimeSD. Description: Computes time taken in terms of number of standard deviations from mean time. Other parameters needed: Sort Columns, Group Columns, Range Column.
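To make the grouped, ordered semantics of these operations more concrete, here is a small Python sketch of RunningCountIf and SumLastN as described in the table. It is an illustration under stated assumptions rather than the Workbench's code: rows are plain dictionaries, the function and argument names are hypothetical, a single Sort Column and Group Column are used, and SumLastN is assumed to include the current cell (the manual says so explicitly only for the Count operations).

```python
from collections import defaultdict, deque


def _sorted_indices(rows, sort_col):
    # Visit rows in Sort Column order while remembering original positions.
    return sorted(range(len(rows)), key=lambda i: rows[i][sort_col])


def running_count_if(rows, group_col, sort_col, range_col, check_values):
    """Per group, count entries equal to any Check Value up to and
    including the current row. Results are in the original row order."""
    counts = defaultdict(int)
    result = [0] * len(rows)
    for i in _sorted_indices(rows, sort_col):
        group = rows[i][group_col]
        if rows[i][range_col] in check_values:
            counts[group] += 1
        result[i] = counts[group]
    return result


def sum_last_n(rows, group_col, sort_col, range_col, n):
    """Per group, sum the last n values of the Range Column,
    including the current row (an assumption of this sketch)."""
    windows = defaultdict(lambda: deque(maxlen=n))
    result = [0] * len(rows)
    for i in _sorted_indices(rows, sort_col):
        window = windows[rows[i][group_col]]
        window.append(rows[i][range_col])  # deque drops values older than n
        result[i] = sum(window)
    return result
```

Grouping by a student identifier and sorting by a timestamp column would, for example, give each student's running error count over time.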
18. Chapter 2 System Manual
Import
The EDM Workbench allows users to import logs in DataShop text format and CSV. The data is assumed to be stored in a flat file organized in rows and columns. The first row of the import file is assumed to contain each column's name. Each succeeding row represents one logged transaction, usually between the student and tutor but possibly between two or more students, as in the case of collaborative learning scenarios. The successfully imported logs may be saved in the Workbench's format for work files: a compressed file containing the data in CSV format plus metadata specific to the EDM Workbench.
Import a log file by clicking the Import button, located in either the File menu (Figure 6) or the Toolbar (Figure 9). The system will then pop up a dialog box asking what type of logs you want to import: CSV or DataShop text file (Figure 13). Click the Select button after selecting the type of log.
Figure 13: Log Selection.
Another dialog box will ask for the location of the log file.
19. Note: In the above example, the user can press the number keys 1 and 2 as shortcut keys for the buttons Confused and Not Confused, respectively. Press Enter to choose Next and go to the next row.
Labelling Time Elapsed
The GUI now displays how much time each labelling action took.
Figure 659: Time Elapsed Column for Labels (columns shown: Labels, Labeler, TimeStamp, Time Elapsed).
Labelling Output
As Figure 70 shows, the labels are displayed with their corresponding timestamps and labeller. These column names are present for data organization.
Figure 70: Sample labelling output (columns shown: Labeler, TimeStamp, Time Elapsed, Quality, Readiness).
Save
Saves th
20. Figure 21: Window showing Time as the Clip Type.
o Cancel Button: This cancels clipping.
o Save Button: The Save button saves the properties set in the Clipping Form. The user supplies a file name and clicks OK (Figure 22: Save Dialogue).
o Load Button: Allows the user to select and load a previously saved file from a drop-down list (see Figure 23).
Figure 23: Load Window. Note: From the list of clipping xml files, the selected template is Clipping Sample Time clipping xml.
o Submit Button: This closes the Clipping Form, clips the dataset from the current tab, and displays it, with its properties set, in a new tab. Double-click a row to view the logs within it.
21. Figure 64: Parameter Addition (shows the Select column name list, Add Label, Edit Label, Remove Label, Labeler's Name, Use Template and Add Parameter).
Note: The system will automatically select the parameter in the Select Column Name list from the textbox.
Multiple Labels
• Users can now (as of version 4) put multiple labels on a data set.
Figure 645: Multiple Labels (example shown: Quality: Unsure; Readiness: Unsure).
Labeller Name
• Users can keep track of labellers by identifying their names via the Labeller Name field. This is useful for maintaining quality and standards when labelling datasets.
Labelling Button
• Add Parameter Button: In constructing sentences, users can manually input the parameters by enclosing it in a bracket and wi
22. Revision History
Name | Date | Reason for Change | Version
John Paul Contillo | 20111121 | First draft | V1.00
Alipio Gabriel | 20111122 | Edit the context of the draft | V1.00
Alipio Gabriel | 20111123 | Add and edit the content | V1.00
J Contillo | 20120221 | User manual for version 2 | V2.00
Gamaliel dela Cruz | 20120526 | Edit content | V3.00
Francis Bautista | 20120607 | Formatting and editing | V3.00
John Paul Contillo | 20111121 | Content Addition | V3.10
Francis Bautista | 20120728 | Formatting and editing | V3.20
Nadia Leetian | 20120814 | Edit content | V3.50
Dominique Isidro | 20120821 | Edit content | V3.51
Francis Bautista | 20121013 | Addition of content | V3.52
Francis Bautista | 20121103 | Addition of content | V3.53
Francis Bautista | 20130214 | Addition of content | V4.00
Introduction
In recent years, educational data mining methods have afforded the development of detectors of a range of constructs of educational importance, from gaming the system [3] to off-task behaviour [2] to motivation [5] to collaboration and argumentation moves [6]. The development of these detectors has been supported by the availability of machine learning packages such as RapidMiner [7], WEKA [9], an
23. Figure 53: Sample add feature window.
Output Column Names: columns added to the DataGrid after the user-selected values have been processed. These columns will also be included in the Required Columns in the Add Process Window (Figure 54).
Figure 54: Selection of column names (example output column shown: TimeOnTask).
Feature Name: the name to be displayed in the Process List (see Figure 53).
Enabled: indicates whether the selected feature will be used in the process. In Figure 31, the Enabled option was set to true. After submission, the feature is checked in the process list (see Figure 53).
True Value: assigned to the result in the Output Column Name if the operation returns true (see Figure 53).
False Value: assigned to the result in the Output Column Name if the operation returns false (see Figure 53).
24. ave as a new dataset. Sampling can be stratified or random.
o Random Sampling: To randomly select samples from a selected dataset:
Select Sampling Method > Random.
Indicate the number of samples in the Sample Size textbox.
Figure 25: Sampling method selection.
Note: The size entered in the textbox should not exceed the indicated maximum sample size. If the user specifies a number greater than the maximum, the operation returns all the rows in the dataset.
o Stratified Sampling: Stratified sampling randomly selects data from within specified subgroups to produce a stratified sample.
Select Sampling Method > Stratified.
Set the number of samples in the Sample Size textbox.
In the Strata list, click the column names that define the groupings (Figure 25). Hold Ctrl and click to select multiple strata.
Figure 26: Strata selection.
o Save Button: The Save button saves the properties as a template
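The two sampling methods described above can be sketched in Python as follows. This is a minimal illustration, not the Workbench's implementation: rows are assumed to be dictionaries keyed by column name, the function names are hypothetical, and the stratified version assumes that the Sample Size applies to each stratum separately, which this manual does not specify.

```python
import random
from collections import defaultdict


def random_sample(rows, sample_size, seed=None):
    """Draw sample_size rows without replacement.

    As the manual notes, asking for more rows than exist returns all rows.
    """
    rng = random.Random(seed)
    if sample_size >= len(rows):
        return list(rows)
    return rng.sample(rows, sample_size)


def stratified_sample(rows, strata_columns, sample_size, seed=None):
    """Randomly draw up to sample_size rows from within each stratum.

    A stratum is the set of rows sharing the same values in strata_columns.
    Assumption of this sketch: sample_size is applied per stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in strata_columns)
        strata[key].append(row)
    sample = []
    for key in sorted(strata):  # fixed order keeps seeded runs repeatable
        members = strata[key]
        sample.extend(rng.sample(members, min(sample_size, len(members))))
    return sample
```

Passing a `seed` makes a run repeatable, which can help when several labellers need to work on the same sample.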
25. • Default Duration
Figure 36: Default Duration function window (fields: Function Name, Enabled, Date Column (Year Month Date), Time Column (Hour Minute Second), Date Time Column (Year Month Date Hour Minute Second), Sort Columns, Group Columns, Output Column Names).
Parameters Needed
Enabled: indicates whether the selected feature will be used in the process.
Date Column: values should be in the Date (Year Month Date) format.
Time Column: values should be in the Time (Hour Minute Second) format.
Sort Column: used for sorting the rows within the same group.
Group Column: used for grouping rows with the same values for the selected columns.
• Default FirstAttempt
26. d KEEL [1]. These packages provide large numbers of algorithms of general use, reducing the need for implementing algorithms locally; however, they do not provide algorithms specialized for educational data mining, such as the widely used Bayesian Knowledge Tracing [4]. Furthermore, effective use of these packages by the educational research and practice communities presumes that key steps in the educational data mining process have already been completed. For example, many of these detectors have been developed using supervised learning methods, which require that labelled instances indicative of the categories of interest be provided. Typically, many labelled instances (on the order of hundreds, if not thousands) are required to create a reliable behaviour detector. Labelling data is a time-consuming and laborious task, made even more difficult by the lack of tools available to support it. A second challenge is the engineering and distillation of relevant and appropriate data features for use in detector development [9]. The data that is directly available from log files typically lacks key information needed for optimal machine-learned models. For instance, the gaming detectors of both [3] and [8] rely upon assessments of how much faster or slower a specific action is than the average across all students on a problem step, as well as
27. e dataset in the current tab by clicking the Save button, located either in the File menu (Figure 6) or the Toolbar (Figure 9). The system will ask for the directory and then save the dataset in zip format.
Note: Saving files will take time, depending on the size of the dataset and the speed of the computer.
Load: Loads EDM files when the user clicks the Load button, located either in the File menu (Figure 6) or the Toolbar (Figure 9). Error dialogues will be displayed if any error is found with the specified directory or file.
Note: The action buttons will be enabled depending on the file loaded.
Export: When the user clicks the Export button, located either in the File menu (Figure 6) or the Toolbar (Figure 9), the system saves the current active tab into a CSV file or into another specified format. Users must specify the directory in which the file will be saved.
Note: Exporting a file will take time, depending on the dataset's size.
Note: In this version, we replaced the erroneous term "feature" with the more correct term "operation". We apologize for the confusion this has caused and are undertaking measures to correct these in the next version.
References
[1] Alcala-Fdez, J., Sanchez, L., Garcia, S., de Jesus, M. J., Ventura, S., Garrell, J. M., Otero, J., Romero, C., Bacardit, J., & Rivas, V. M. (2009)
28. [Figure 14: Selection of data file to be imported]
Case 1: Importing a single log file. If a user imports a single log file, then after the user locates and chooses the log file, the Workbench displays the file in the DataGrid (Figure 10).
Case 2: Importing batches of log files. The Workbench can also import nested folders of data, where each folder level represents a meaningful subset of the data. For example, if data from a section of students is collected several times over a school year, the researcher may have one folder for the school year, one subfolder for each section within the school year, one subfolder for a session within each section, and finally one file or folder for each student within a session. The Workbench allows users to label each level of subfolder, creating new columns for these labels and appending them to the data tables during the importation process.
After the user locates and chooses the batch of log files, another dialog box will appear asking for a label describing the log files imported (e.g. Class, Figure 14). Clicking Submit aggregates all the logs and displays them in the DataGrid.
Column Header 1 Name sample values D E Class Column Header 2 Name sample values Lab1_20100702 Lab1 5_201 Section Column Header 3 Name sample values F227_12_C
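The batch import described above can be pictured with a short sketch. This is not the Workbench's own code: the function name `import_batch`, the restriction to `.csv` files and the way labels are paired with folder levels are all assumptions made for illustration.

```python
import csv
from pathlib import Path


def import_batch(root, level_labels):
    """Read every CSV file below `root` and tag each row with folder labels.

    `level_labels` names each level below `root`, outermost first
    (hypothetical example: ["Class", "Section", "Lab"]). The last path
    component is the file name, so it can be labelled as a level too.
    """
    root = Path(root)
    rows = []
    for path in sorted(root.rglob("*.csv")):
        # Folder names (and finally the file name) relative to the root.
        parts = path.relative_to(root).parts
        # Pair each label with the folder/file name at that depth.
        labels = dict(zip(level_labels, parts))
        with path.open(newline="") as handle:
            for record in csv.DictReader(handle):
                # Label columns are added alongside the logged columns.
                rows.append({**labels, **record})
    return rows
```

Each returned row carries one extra column per labelled folder level, which mirrors the new label columns the manual says are appended during importation.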
29. each columns
[Figure 49: Default Sum function window]
Parameters needed:
Enabled: indicates whether the selected feature will be used in the process.
Sort Column: used for sorting the rows within the same group.
Group Column: used for grouping rows that have the same values in the selected column.
Range Column: range of values used for computation.
• Default SumLastN
[Figure 50: Default SumLastN function window, with fields Function Name, Sort Columns, Group Columns, Output Column Names and N (Numbers Only)]
Parameters needed:
Enabled: indicates whether the selected feature will be used in the process.
Sort Column: used for sorting the rows within the same group.
Group Column: used for grouping rows that have the same values in the selected column.
Range Column: range of values used for computation.
N (Numbers Only): if a group contains more than N elements, only the last N items are kept for processing (start count every N rows).
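The interaction of the Sort, Group, Range and N parameters can be sketched as follows. The manual's wording for N is ambiguous, so this is only one possible reading (one total per group, over the last N rows after sorting); the function name `sum_last_n` and the dictionary-based rows are assumptions, not the Workbench's implementation.

```python
from collections import defaultdict


def sum_last_n(rows, group_columns, sort_column, range_column, n):
    """Sum `range_column` over the last `n` rows of each group.

    Rows are grouped by the values of `group_columns`, sorted within each
    group by `sort_column`, and only the final `n` rows are summed.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[column] for column in group_columns)
        groups[key].append(row)

    totals = {}
    for key, members in groups.items():
        members.sort(key=lambda row: row[sort_column])
        # max(0, ...) keeps n == 0 from selecting the whole group.
        tail = members[max(0, len(members) - n):]
        totals[key] = sum(row[range_column] for row in tail)
    return totals
```

A per-row rolling variant (one output value per row rather than per group) would be an equally plausible reading of the parameter description.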
30. process XML files available in the process directory upon clicking the Load button.
Run Process Button: The system runs all checked processes in the Processes List. It displays feedback in the Status Bar about the process it is currently running, and shows an error dialogue when it encounters an error.
[Figure 58: Sample System Process List, showing the Required Columns and Processes List panes with the Uncheck All and Invert Checked Processes options]
[Figure 59: Sample Clipping display (Row Count: 3202)]
[Status log listing, per process, timestamped "started" and "done" messages]
31. ess or not.
True: value assigned to the result in the Output Column Name if the operation returns true.
False: value assigned to the result in the Output Column Name if the operation returns false.
• Default ListUniques
[Figure 38: Default ListUniques function window, with fields Function Name, Enabled, Input Column Names and Output Column Names]
Parameters needed:
Enabled: indicates whether the selected feature will be used in the process.
• Default Maximum
[Figure 39: Default Maximum function window, with fields Sort Columns, Group Columns and Output Column Names]
Parameters needed:
Enabled: indicates whether the selected feature will be used in the proces
32. [Figure 60: Clipping feedback, listing "started" and "done" messages for each process step]
[Figure 61: Sample distil features, showing the columns KC, Unique KC, KC Catego, School, Class and New Time]
Labelling
Labelling is an operation that is usually performed after clipping and sampling. During labelling, the user assigns ground truth labels to clips of data. The user first specifies a subset of the clip columns that should be displayed. The user also specifies the labels that the observer or expert will use to characterize each clip. The expert or observer will have to select between three labels: Good, Not Bad, or Unsure. The circumstances under which an expert or observer labels a clip as
33. [Figure 19: EDM Clipping Window, with the list "Select Column Name(s) to compare" and the "Complete clips only" option]
Custom Sort Button: This allows the user to set how the transactions within a clip are ordered, by sorting them according to criteria. The Add Level button adds another sorting criterion, while Delete Level deletes the selected row. Clicking the Submit button applies the selected sorting properties.
[Figure 20: EDM Custom Sort]
o Time as Clip Type: By choosing Time as the Clip Type, the user specifies a time period per clip (e.g. 1 clip = 5-minute interval). The name of a column with a time element, measured in seconds, must be specified. When done, click the Submit button, then double-click the clips to view the logs they include.
o Per Value Change as Clip Type: Per Value Change creates a new clip every time the value within the specified column changes.
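The Time and Per Value Change clip types described above can be illustrated with a small sketch. The function names, the use of dictionaries for rows, and the choice to measure time windows from the first row are assumptions for illustration only; the manual does not say how the Workbench anchors its time windows.

```python
from itertools import groupby


def clip_by_value_change(rows, column):
    """Start a new clip every time the value in `column` changes."""
    return [list(group) for _, group in groupby(rows, key=lambda r: r[column])]


def clip_by_time(rows, time_column, interval_seconds):
    """Group rows into fixed-length time windows.

    `time_column` holds a time in seconds. Windows are counted from the
    first row's time (an assumption), and empty windows are skipped.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if not rows:
        return []
    start = rows[0][time_column]
    buckets = {}
    for row in rows:
        index = int((row[time_column] - start) // interval_seconds)
        buckets.setdefault(index, []).append(row)
    return [buckets[index] for index in sorted(buckets)]
```

For a 5-minute clip, `interval_seconds` would be 300. Rows are expected to be ordered already, for example through the Custom Sort settings.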
34. g files for use in machine learning, such as estimates of student knowledge and context about student response time (i.e. how much faster or slower was the student's action than the average for that problem step). Through the use of this tool, we hope that the process of developing a detector of relevant metacognitive, motivational, engagement, or collaborative behaviours can eventually be sped up. Just the use of text replays on previously collected log data has been shown to speed a key phase of detector development by about 40 times, with no reduction in detector goodness [3].
This user manual is intended as a guide to the functions and features of the EDM Workbench. Please send comments and suggestions to mrodrigo@ateneo.edu.
Definition of Terms
Batch: A group of log files. The criteria for grouping are determined by the user. Examples of the criteria for grouping include source and timing.
Clip: A subset of logs from a given batch.
Column: A single attribute within the dataset.
Dataset: The data from the imported files.
DataGrid: The central area where all the datasets are displayed.
EDM: Educational Data Mining.
Log: A record of a single action.
Log File: A file that contains a collection of logs.
Model: A detector of meta-cognitive and motivational behaviour.
Row: A set of attributes in the da
35. ile contains logs that may have been previously processed, clipped, sampled, or labelled by the user, together with some Workbench-specific information. Note that because of the additional information, the zip file may not open in archiving software such as WinZip or WinRar. Once the file is loaded, the user may make further changes to it.
Save Button: Saves the logs from the active tab in the DataGrid, and all their properties such as clipped formats and labels, into EDM format.
Import Button: Allows the user to import logs or batches of logs, such as DataShop or comma-separated value (csv) files, to be processed, clipped, sampled, or labelled by the user.
Export Button: Exports the final output from the active tab in the DataGrid as a CSV file or in other specified file formats.
Append Button: Appends a dataset (csv, txt) to the current dataset as displayed in the DataGrid. The data sets must have the same column names for this function to work.
Kappa Button: Compares the level of agreement between two separate data sets of the same file type. The operation returns 1 if the data sets agree with each other perfectly and 0 if they do not match at all. A decimal value indicates partial agreement between the data sets; a value closer to one indicates stronger agreement than a value closer to zero.
Add Proce
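The manual does not state which agreement statistic the Kappa button computes. As a hedged illustration only, the sketch below assumes Cohen's kappa for two labellers who labelled the same clips in the same order; the function name `cohens_kappa` is hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both label lists must be non-empty and equally long")
    n = len(labels_a)
    # Proportion of clips on which the two labellers agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each labeller's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both labellers used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)
```

Under Cohen's definition, 1 means perfect agreement, 0 means agreement no better than chance, and negative values are possible when the labellers agree less often than chance would predict.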
36. [Figure 30: Modified function window with the feature And selected]
Add Feature Operations
• Default And
[Figure 31: Default And function window, with fields Function Name, Enabled, Input Column Names and Output Column Names]
Parameters needed:
Enabled: indicates whether the selected feature will be used in the process.
True: value assigned to the result in the Output Column Name if the operation returns true.
False: value assigned to the result in the Output Column Name if the operation returns false.
• Default Compare
[Default Compare function window, with fields Function Name, Enabled and Input Column Names]
37. • Default TimeElapsed
[Figure 51: Default TimeElapsed function window, with fields Function Name, Enabled, Output Column Names, Date Column and Date Format]
Parameters needed:
Enabled: indicates whether the selected feature will be used in the process.
Date Column: the date when the actions were taken (time stamp).
Date Format: the format of the Date Column, where M = month, H = hour, d = day, m = minutes, y = year, s = seconds; e.g. 31 12 12 11 59 (dd MM yy HH mm), 12 31 2012 11 59 59 (MM dd yyyy HH mm ss).
• Default TimeSD
[Figure 52: Default TimeSD function window, with fields Function Name, Sort Columns, Group Columns and Output Column Names]
Parameters needed:
Enabled: indicates whether the selected feature will be used in the process.
Sort Column: used for sorting the rows within the same group.
Group Column: Used for gr
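To make the TimeElapsed parameters concrete, here is a minimal sketch of computing seconds elapsed since the first row. It is an illustration under stated assumptions, not the Workbench's code: the function name `time_elapsed` is hypothetical, and because the separators in the manual's example formats were lost in extraction, the default pattern below (slashes and colons, written with Python's `strptime` codes) is an assumed equivalent of "MM dd yyyy HH mm ss".

```python
from datetime import datetime


def time_elapsed(timestamps, date_format="%m/%d/%Y %H:%M:%S"):
    """Return, for each timestamp, the seconds elapsed since the first one.

    `date_format` uses Python strptime codes; the default is an assumed
    rendering of the manual's month-day-year hour-minute-second example.
    """
    parsed = [datetime.strptime(stamp, date_format) for stamp in timestamps]
    if not parsed:
        return []
    first = parsed[0]
    return [(moment - first).total_seconds() for moment in parsed]
```

For example, `time_elapsed(["12/31/2012 11:59:59", "12/31/2012 12:00:09"])` returns `[0.0, 10.0]`.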
38. mns Group Columns Range Columns N Numbers Only
6. Duration: Computes how many seconds the action took. Parameters: Sort Columns, Group Columns, Date Column, Time Column, Date Time Column.
7. First Attempt: Determines if it is the first attempt. Parameters: True Value, False Value, Group Columns, Date Column, Time Column, Date Time Column.
8. Inverse: Returns the inverse of a Boolean. If the column values equal the true value, return the false value instead, and vice versa. Parameters: True Value, False Value.
9. ListUnique: Creates a new column with all the unique data from the selection. Parameters: None.
10. Maximum: Determines the maximum value in the selection provided. Parameters: Sort Columns, Group Columns, Range Column.
11. Mean: Computes the arithmetic mean of all the values in the selection. Parameters: Sort Columns, Group Columns, Range Column.
12. MeanCountIf: Computes the average number of entries that are equal to a given value or values over all entries. Parameters: Sort Columns, Group Columns, Range Column, Check Value.
13. Minimum: Determines the minimum value in the selection provided. Parameters: Sort Columns, Group Columns, Range Column.
14. Or: Executes a logical OR operation and returns the corresponding Boolean results. Parameters: True Value, False Value.
Computes the percentage of past Sort Column
39. 23. timeElapsed: Computes the time interval per action in seconds (date of the current row minus the date of the first row). Parameters: Output Column, Date Column, Date Format.
[Figure 57: Function List]
Submit Button: includes the user-selected feature in the Process List.
Load Button: loads available features.
Save Button: saves the user-selected feature and adds it to the directory of features for later use.
Add Features in the Clip Level
At the clip level there are 5 features that can be applied to the clips: mean, max, min, stdev, and listUnique. These features work in the same way as the ones above. A clipped dataset is composed of a parent container and a dataset representing each clip. Non-clip-level operations append output columns to each of the enclosed clips; a clip-level operation appends output columns only to the parent container.
Add Clipping: Allows the user to set the desired clipping properties. The form applies the properties selected in the clipping form.
Add Sampling: Allows the user to set the desired sampling properties. The form applies the properties set in the sampling form.
Cancel Button: Cancels and closes the Add Process form.
Save Button: The system saves all the properties set in the Processes List that are checked into a process XML file.
Load Button: The system will load all the configured process lists
40. ... 71
o Add Features in the Clip Level 75
Add Clipping 75
Add Sampling 75
Cancel Button 75
o Save Button 75
Load Button 75
o Run Process Button 76
Labelling 78
Set Up Labelling Parameters 79
Use Template 79
Set Up Labelling Parameters 80
Label Text Box 80
Labeller's Name (User Name) 80
Parameter and sentence textbox 80
... 80
Add Parameter Button 82
Save Template 82
Load Template 82
Labelling 83
Labelling
41. [Figure 15: Label Column with sample parameters]
Once the logs are loaded, the DataGrid should be populated (Figure 16). All action buttons except the Labelling button should be enabled at this point.
[Figure 16: EDM sample Data Set, showing the columns id, revision, TIMESTAMP, DELTA VERSION, BJ EXT VERSION and SYSUSER (Row Count: 25)]
[Figure 17: EDM Workbench DataShop tab, titled "Logs of Students in Section A E"]
[Figure 18: Status bar with timestamp and file directory, reading "Mon Feb 20 09 46 48 GMT 08 00 2012 INFO Imported C Users Paul Documents DataShop"]
The Status bar displays information about the imported file together with its location (C:\Users\Paul\Documents\DataShop) and the current time (Monday, February 20, 9:46:48 AM).
Clipping
The EDM Workbench allows the user to define the set of features by which the data should be grouped, so that clips do not contain rows from different g
42. [Figure 1: EDM Workbench Entity Diagram]
Overall Use Cases
[Figure 2: EDM System Process Map (use cases include Import Log File, Load DataShop File, Save File, Clip by Size, Clip by Time, Add Sampling, Add Clipping, Load Template, Save Template, Load Features and Save Features)]
Chapter 1: System Overview
This section discusses the interface of the system from top to bottom, including its features, buttons, and functions.
[Figure 3: EDM Workbench upon system launch]
Title Bar
[Figure 4: System Title Bar]
The name of the system may change in later versions (e.g. "EDM Workbench version 4.0" is displayed here).
43. olumns
[Figure 43: Default Or function window]
Parameters needed:
Enabled: indicates whether the selected feature will be used in the process.
True: value assigned to the result in the Output Column Name if the operation returns true.
False: value assigned to the result in the Output Column Name if the operation returns false.
• Default PercentError
[Figure 44: Default PercentError function window, with fields Function Name, Enabled, Sort Columns, Group Columns, Output Column Names and Error Values]
Parameters needed:
Enabled: indicates whether the selected feature will be used in the process.
Sort Column: used for sorting the rows within the same group.
Group Column: used for grouping rows that have the same values in the selected column.
Problem Column: name of the column corresponding to the problem.
Skill Column: name of the column specifying the skill.
Error Values: used to specify which values constitute an error, for use by percent
44. Menu Bar
[Figure 5: EDM Menu Bar]
The Menu Bar is composed of 3 menu options (File, Functions, and Help), each consisting of action buttons.
o File Menu: The File Menu is composed of 5 actions (Load, Save, Import, Export, and Exit) that handle the files and logs to be displayed and/or saved in the DataGrid.
[Figure 6: File Menu Dropdown]
o Function Menu: The Function Menu consists of 4 log processing actions (Clipping, Sampling, Labeling, and Add Process) that will be either enabled or disabled depending on the state of the system.
[Figure 7: EDM Function menu Dropdown]
o Help Menu: The Help Menu contains the About action, which displays the system description and the current product version (e.g. 20120227).
[Figure 8: EDM Help Menu showing the About button]
Tool Bar
[Figure 9: EDM Toolbar with activated buttons]
The Tool Bar is composed of action buttons that are also found in the menu bar, for ease of use.
Load Button: Loads log files which were previously saved using the EDM Workbench and stored in an EDM Workbench-specific zip file. The f
45. ouping rows with the same values for selected column.
Range Column: range of values used for computation.
Add Feature Buttons
• Submit Button: The Submit button executes the feature set by the user.
• Save Button: The Save button saves the user-selected properties to a file so that the same values can be used again later.
• Load Button: The Load button allows the user to reload a template.
• Cancel Button: This cancels the selected feature and removes it from the process list.
Add Feature Parameters
To add a new feature, the user has to set several parameters. Depending on the operation to be performed, the user supplies a subset of the parameters listed below.
Input Column Names: lists the selected values. The user can remove and/or add values to the columns. Click one or multiple items and click < Add < to add the value(s), or click << Add All << to add all column names. Click > Remove > to delete one or multiple input column names, or >> Remove All >> to remove all input column names.
[Add Process window with the Default Compare feature selected, with fields Feature Name, Enabled and Input Column Names]
46. roups. For example, if the data should be grouped by student, a single clip will contain data from only one student and not multiple students. The Workbench also lets the user specify the clip size, either by time or by number of transactions. Delineation of clips by beginning and ending events is not yet possible, but is a feature planned for future implementation. The Workbench then generates the clips for analysis according to a sampling scheme, discussed in the next section.
To clip the dataset, click the Clip button, located either in the Function menu (Figure 7) or the Toolbar (Figure 9). The system will then display a form listing the column names that can serve as the basis for grouping (e.g. group data with the same Logs of Student in Section A E, with the same Anon Student Id, with the same Time, and so on). Clips can be divided by Size, Time, or Per Value Change.
o Size as Clip Type: By choosing Size as the Clip Type, the user needs to specify the desired number of transactions in a clip.
Complete Clips Only: when checked, the system will only select clips where the number of logs is equal to the inputted clip size.
Allow Overlap: when checked, the system will produce clips with overlapping logs. Given logs 1, 2, 3, 4, 5 and a clip size of 3, three clips will be produced: (1, 2, 3), (2, 3, 4), and (3, 4, 5).
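The Size clip type and its two options can be sketched in a few lines. This is an illustration rather than the Workbench's own code: the function name `clip_by_size` is hypothetical, overlapping windows are assumed to advance one log at a time and to be full-sized (which matches the manual's five-log example), and grouping is assumed to have been applied beforehand so that each call sees the logs of a single group.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def clip_by_size(logs: Sequence[T], size: int,
                 allow_overlap: bool = False,
                 complete_only: bool = False) -> List[List[T]]:
    """Split one group's logs into clips of `size` transactions."""
    if size <= 0:
        raise ValueError("size must be positive")
    if allow_overlap:
        # Sliding window: each clip starts one log after the previous one.
        return [list(logs[i:i + size]) for i in range(len(logs) - size + 1)]
    # Consecutive, non-overlapping chunks; the last one may be short.
    clips = [list(logs[i:i + size]) for i in range(0, len(logs), size)]
    if complete_only:
        clips = [clip for clip in clips if len(clip) == size]
    return clips
```

With logs 1 to 5 and a clip size of 3, `allow_overlap=True` gives the three clips from the manual's example, the default gives `[[1, 2, 3], [4, 5]]`, and `complete_only=True` drops the short final clip.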
47. s or not.
Sort Column: used for sorting the rows within the same group.
Group Column: used for grouping rows that have the same values in the selected columns.
• Default Mean
[Figure 40: Default Mean function window, with fields Sort Columns, Group Columns and Output Column Names]
Parameters needed:
Enabled: indicates whether the selected feature will be used in the process.
Sort Column: used for sorting the rows within the same group.
Group Column: used for grouping rows that have the same values in the selected columns.
• Default MeanCountIf
[Figure 41: Default MeanCountIf function window, with fields Sort Columns, Group Columns, Output Column Names and Check Values]
Parameters needed:
Enabled: indicates whether the selected feature will be used in the process.
Sort Column: used for sorting the rows within the same group.
48. [Figure 33: Default CountIfLastN function window, with fields N (Numbers Only) and Check Values]
Parameters needed:
Enabled: indicates whether the selected feature will be used in the process.
True: value assigned to the result in the Output Column Name if the operation returns true.
False: value assigned to the result in the Output Column Name if the operation returns false.
Range Column: range of values used for computation.
Sort Column: used for sorting the rows within the same group.
Group Column: used for grouping rows that have the same values in the selected columns.
N (Numbers Only): if a group contains more than N elements, only the last N items are kept for processing (start count every N rows).
Check Value: the value to be compared against the selected Input Column Names. This value can be either a string or an integer, depending on the feature used.
• Default CountLastN
[Default CountLastN function window, with fields Function Name, Sort Columns, Group Columns, Output Column Names and N (Numbers Only)]
49. ss Button: Allows the user to add, and possibly save, an action to a sequence of actions.
Clip Button: Groups logs from a given batch based on user-specified parameters.
Sampling Button: Selects rows from the dataset based on user parameters.
10. Labelling Button: Allows the user to supply ground truth labels for clips.
11. Add Feature: Allows the user to tailor functions to their specification.
DataGrid
[DataGrid screenshot of distiltest.txt, showing the columns row, lesson, name, outcome, prod, type and skill]
50. [Figure 10: EDM DataGrid (Row Count: 36395)]
The DataGrid displays the logs that are active and are to be processed. The down arrow button hides the data grid. Row Count controls the number of rows shown in the active tab.
Status Box
[Figure 11: System Status Box, showing timestamped "INFO Imported" messages with the path of each imported file]
The Status Bar displays feedback information such as status, error messages, time elapsed, and others.
Loading Animation
A loading animation has been added to the export, import, load, and save functions so that the user can easily tell whether the program has hung or is still working.
[Figure 12: Loading Animation]
51. taset that usually refers to 1 log.
Interface: Refers to the system's graphical user interface.
Overall Description
The EDM Workbench is a tool that helps researchers process data from various sources for developing meta-cognitive and behavioural models. The concept diagram in Figure 1 illustrates the system's functionalities and the entities interacting with it. The EDM Workbench's functions allow users to:
• Define and modify behaviour categories of interest
• Label previously collected educational data with the categories of interest, considerably faster than current methods
• Collaborate with others in labelling data by providing ways to communicate and document labelling guidelines and standards
• Validate inter-rater reliability between multiple labellers of the same educational log data corpus
• Automatically distil additional information from log files for use in machine learning
• Export student behaviour data to tools which enable sophisticated secondary analysis
[Concept diagram residue: statistical packages and external tools (Tag Helper, Sequential Analyzer, SAS, SPSS, R, PSLC DataShop), secondary analysis, export and import of data, data definition, sampling techniques, distilling, collaborative definition, learning system]
52. th the correct spelling, or by selecting a parameter from the dropdown list and then clicking the Add Parameter button to insert the selected parameter.

• Save Template
The system allows the user to save the selected labelling properties. A dialogue pops up and asks for a template name. The file is saved as a Labelling XML file.

[Figure 6: File Name input window, with a Template name field]

• Load Template
The user may select a template from the list of labelling templates displayed by the system. The system then loads the properties of the selected template into the labelling form.

[Figure 67: Labelling template loading window]

Labelling the dataset

The Workbench then displays text replays of the clips together with the labelling options (Figure 3). A coder reads through the text replay and selects the label that best describes the clip. The labels are saved under a new column in the data set.

NOTE: Because a coder may have to label tens of thousands of clips [5], the coder may save his or her work and continue the labelling process in a later session.

[Figure 68: Dataset labelling window]
53. Group Column: Used for grouping rows with the same values for the selected columns.
Check Value: The value to be compared against the Selected Input Column Names. This value can be either a string or an integer, depending on the feature used.

• Default Minimum

[Figure 42: Default Minimum function window, with fields for Sort Columns, Group Columns, and Output Column Names]

Parameters Needed
Enabled: Indicates whether the selected feature will be used in the process.
Sort Column: Used for sorting the rows within the same group.
Group Column: Used for grouping rows with the same values for the selected column.

• Default Or

[Default Or function window. Visible fields: Function Name, Enabled (True/False), Input Column Names, Output Column Names]
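The Group Column and Sort Column parameters described for Default Minimum can be sketched as follows. This is only an illustration under stated assumptions: the excerpt does not say which values the feature minimizes, so the `input_col` argument, the function name `minimum_per_group`, and the sample rows are all hypothetical.

```python
from itertools import groupby


def minimum_per_group(rows, group_cols, sort_cols, input_col, output_col):
    """Group rows, sort them within each group, and attach the group minimum.

    Hypothetical helper; `input_col` is an assumed parameter that the
    manual excerpt does not mention.
    """
    def group_key(row):
        return [row[c] for c in group_cols]

    def sort_key(row):
        return [row[c] for c in sort_cols]

    # Order by group first, then by the sort columns inside each group.
    ordered = sorted(rows, key=lambda r: (group_key(r), sort_key(r)))
    result = []
    for _, members in groupby(ordered, key=group_key):
        members = list(members)
        lowest = min(r[input_col] for r in members)
        for r in members:
            # Copy the row so the caller's data is left untouched.
            result.append({**r, output_col: lowest})
    return result


# Illustrative rows only.
rows = [
    {"student": "s1", "time": 2, "duration": 7},
    {"student": "s2", "time": 1, "duration": 4},
    {"student": "s1", "time": 1, "duration": 3},
]
result = minimum_per_group(rows, ["student"], ["time"], "duration", "min_duration")
for row in result:
    print(row)
```

Sorting once on the combined key keeps rows of the same group adjacent, which is what `itertools.groupby` requires.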
54. [Figure 32: Default Compare window. Visible fields: Output Column Names, Check Values, All Strings (True/False), Operation Type (Numbers Only)]

Parameters Needed
Enabled: Indicates whether the selected feature will be used in the process.
True: Value assigned to the result in the Output Column Name if the operation returns true.
False: Value assigned to the result in the Output Column Name if the operation returns false.
Check Value: The value to be compared against the Selected Input Column Names. This value can be either a string or an integer, depending on the feature used.
All String: Checks whether all the column values are strings rather than numbers or any other type.
Operation Type: Takes a value from 1 to 6, each corresponding to a different operation. Strings or integers can be compared in this feature.

• Default CountIfLastN

[Default CountIfLastN function window. Visible fields: Function Name, Sort Columns, Group Columns, Output Column Name]
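The Default Compare parameters described above (Check Value, True, False, Operation Type) can be sketched as below. The excerpt only says that Operation Type takes values 1 to 6; the mapping of those numbers to comparison operators, the function name `default_compare`, and the sample rows are assumptions made for illustration, not the Workbench's documented behaviour.

```python
import operator

# Hypothetical mapping: the manual excerpt does not say which operation
# each Operation Type value (1 to 6) stands for.
OPERATIONS = {
    1: operator.eq,
    2: operator.ne,
    3: operator.lt,
    4: operator.le,
    5: operator.gt,
    6: operator.ge,
}


def default_compare(rows, input_col, check_value, operation_type,
                    output_col, true_value="1", false_value="0"):
    """Compare one column against a check value and record the outcome."""
    compare = OPERATIONS[operation_type]
    return [
        {**row, output_col: true_value if compare(row[input_col], check_value)
         else false_value}
        for row in rows
    ]


# Illustrative rows only.
log_rows = [{"outcome": "WRONG"}, {"outcome": "HELP"}, {"outcome": "WRONG"}]
compared = default_compare(log_rows, "outcome", "WRONG", 1, "is_wrong")
print([row["is_wrong"] for row in compared])
```

Keeping the operations in a lookup table means a different numbering can be substituted without changing the comparison logic.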