Hadoop Deployment Manual - Support
Contents
1. [Screenshot residue: the advanced DataNode options shown include the data directories (/var/lib/hadoop/Apache121), the HDFS DataNode HTTP port (50075), the service handler count (3), the failed volume tolerance (0), and the bandwidth for the balancer (1048576 bytes/second).]
Figure 4.2: Advanced Configuration Options For The Data Node Service In cmgui

Typically, care must be taken to use non-conflicting ports and directories for each Hadoop instance; a configuration sketch follows at the end of this excerpt.

4.1.2 Monitoring And Modifying The Running Of Hadoop Services From cmgui
The status of Hadoop services can be viewed and changed in the following locations:

• The Services tab of the node instance: Within the Nodes or Head Nodes resource, for a head or regular node instance, the Services tab can be selected. The Hadoop services can then be viewed and altered from within the display pane (figure 4.3). [Screenshot residue: the pane lists services such as hadoop-Apache121-datanode, hadoop-Apache121-hbase-regionserver, hadoop-Apache121-tasktracker, and hadoop-Apache121-zookeeper, each with status Up, Monitored, and Autostart flags, and Stop/Restart/Reload/Reset/Refresh status buttons.]
Figure 4.3: Services In The Services Tab Of A Node In cmgui
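The ports in figure 4.2 correspond to standard Hadoop DataNode properties. As a minimal sketch of giving a second instance non-conflicting ports, the following could be placed in that instance's hdfs-site.xml. The property names are stock Hadoop; the shifted port values, and the idea of editing the file directly rather than through cmgui, are illustrative assumptions only:

Example

<!-- hdfs-site.xml for a hypothetical second instance -->
<property>
  <!-- transceiver port; the first instance uses the default, 50010 -->
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:50110</value>
</property>
<property>
  <!-- IPC port; default 50020 -->
  <name>dfs.datanode.ipc.address</name>
  <value>0.0.0.0:50120</value>
</property>
<property>
  <!-- HTTP port; default 50075 -->
  <name>dfs.datanode.http.address</name>
  <value>0.0.0.0:50175</value>
</property>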
2. [Screenshot residue: the Settings tab for instance Horton120 shows the configuration directory (/etc/hadoop/Horton120), temporary directory (/tmp/hadoop/Horton120), log directory (/var/log/hadoop/Horton120), topology (Switch), the default and maximum (512) replication factors, balancing policy (dataNode), balancer period (2 hours), and balancer threshold (10).]
Figure 3.3: Settings Tab View For A Hadoop Instance In cmgui

It allows the following Hadoop-related settings to be changed:

• WebHDFS: allows HDFS to be modified via the web
• Description: an arbitrary, user-defined description
• Directory settings for the Hadoop instance: directories for configuration, temporary purposes, and logging
• Topology awareness setting: optimizes Hadoop for racks, switches, or nothing, when it comes to network topology
• Replication factors: set the default and maximum replication allowed
• Balancing settings: spread loads evenly across the instance

3.1.3 The HDFS Instance Tasks Tab
The Tasks tab (figure 3.4) displays Hadoop-related tasks that the administrator can execute.
3. org.apache.hadoop.examples.terasort.TeraGen$Counters
14/03/24 15:07:03 INFO terasort.TeraSort: starting
14/03/24 15:09:12 INFO terasort.TeraSort: done
gen_test done
start doing PI test...
Working Directory = /user/root/bbp

During the run, the Overview tab in cmgui (introduced in section 3.1.1) for the Hadoop instance should show activity as it refreshes its overview every three minutes (figure 5.1). [Screenshot residue: total capacity 23.62 GiB, used capacity 208 MiB, remaining capacity 19.44 GiB, total files 90, total blocks 32, heap memory 1.33 GiB out of 2.78 GiB, non-heap memory 297.95 MiB out of 311.81 MiB, apps submitted 4, average heartbeat send time 150 ms, average block report time 4750 ms.]
Figure 5.1: Overview Of Activity Seen For A Hadoop Instance In cmgui

In cmsh, the overview command shows the most recent values that can be retrieved when the command is run:

[mk-hadoop-centos6->hadoop]% overview apache220
Parameter                      Value
------------------------------ ------------------------
Name                           Apache220
Capacity total                 27.56GB
4. [Screenshot residue: the Tasks tab for instance Apache220, with buttons for the tasks listed below, including a Manual failover selector.]
Figure 3.4: Tasks Tab View For A Hadoop Instance In cmgui

The Tasks tab allows the administrator to:
• Restart, start, or stop all the Hadoop daemons
• Start/Stop the Hadoop balancer
• Format the Hadoop filesystem. A warning is given before formatting.
• Carry out a manual failover for the NameNode. This feature means making the other NameNode the active one, and is available only for Hadoop instances that support NameNode failover.

3.1.4 The HDFS Instance Notes Tab
The Notes tab is like a jotting pad, and can be used by the administrator to write notes for the associated Hadoop instance.

3.2 cmsh And hadoop Mode
The cmsh front end can be used instead of cmgui to display information on Hadoop-related values, and to carry out Hadoop-related tasks.

3.2.1 The show And overview Commands
The show Command
Within hadoop mode, the show command displays parameters that correspond mostly to cmgui's Settings tab in the Hadoop resource (section 3.1.2):

Example

[root@bright70 conf]# cmsh
[bright70]% hadoop
[bright70->hadoop]% list
Name (key)  Hadoop version  Hadoop distribution  Configuration directory
----------- --------------- -------------------- ------------------------
Apache121   1.2.1           Apache               /etc/hadoop/Apache121
[bright70->hadoop]% show apache121
Parameter                                 Value
----------------------------------------- ------------------------
5. Installation successfully completed.
Finished.

6.3.2 Kafka Removal With cm-kafka-setup
cm-kafka-setup should also be used to remove the Kafka instance:

Example

[root@bright70 ~]# cm-kafka-setup -u hdfs1
Requested removal of Kafka for Hadoop instance 'hdfs1'.
Stopping/removing services... done.
Removing module file... done.
Removing additional Kafka directories... done.
Updating images... done.
Removal successfully completed.
Finished.

6.4 Pig
Apache Pig is a platform for analyzing large data sets. Pig consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. Pig programs are intended, by language design, to fit well with embarrassingly parallel problems that deal with large data sets. The Apache Pig tarball should be downloaded from one of the locations specified in section 1.2, depending on the chosen distribution.

6.4.1 Pig Installation With cm-pig-setup
Bright Cluster Manager provides cm-pig-setup to carry out Pig installation.

Prerequisites For Pig Installation And What Pig Installation Does
The following applies to using cm-pig-setup:
• A Hadoop instance must already be installed.
• cm-pig-setup installs Pig only on the active head node.
• The script assigns no roles to nodes.
• Pig is copied by the script to a subdirectory under /cm/shared/hadoop/.
• Pig configuration files are copied by the script to under /etc/hadoop/.
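To give an idea of the high-level language just described, a minimal word-count script might look as follows. This is a sketch: the input and output HDFS paths are hypothetical, only standard Pig Latin built-ins (LOAD, TOKENIZE, FLATTEN, GROUP, COUNT, STORE) are used, and the module name follows the pattern used in section 6.4.3:

Example

[root@bright70 ~]# cat > /tmp/wordcount.pig <<'EOF'
-- count word occurrences in a hypothetical HDFS input directory
lines  = LOAD '/user/root/input' AS (line:chararray);
words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grpd   = GROUP words BY word;
counts = FOREACH grpd GENERATE group, COUNT(words);
STORE counts INTO '/user/root/wordcount_out';
EOF
[root@bright70 ~]# module load pig/hdfs1
[root@bright70 ~]# pig -f /tmp/wordcount.pig

Run in MapReduce mode (section 6.4.3), the counting is distributed over the DataNodes of the instance.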
6. Removing module file... done.
Removing additional Storm directories... done.
Updating images... done.
Removal successfully completed.
Finished.

6.7.3 Using Storm
The following example shows how to submit a topology, and then verify that it has been submitted successfully (some lines elided):

Example

[root@bright70 ~]# module load storm/hdfs1
[root@bright70 ~]# storm jar /cm/shared/apps/hadoop/Apache/apache-storm-0.9.3/examples/storm-starter/storm-starter-topologies-0.9.3.jar storm.starter.WordCountTopology WordCount2

470 [main] INFO backtype.storm.StormSubmitter - Jar not uploaded to master yet. Submitting jar...
476 [main] INFO backtype.storm.StormSubmitter - Uploading topology jar /cm/shared/apps/hadoop/Apache/apache-storm-0.9.3/examples/storm-starter/storm-starter-topologies-0.9.3.jar to assigned location: /tmp/storm-hdfs1-local/nimbus/inbox/stormjar-bf1abdd0-f31a-41ff-b808-4daad1dfdaa3.jar
Start uploading file '/cm/shared/apps/hadoop/Apache/apache-storm-0.9.3/examples/storm-starter/storm-starter-topologies-0.9.3.jar' to '/tmp/storm-hdfs1-local/nimbus/inbox/stormjar-bf1abdd0-f31a-41ff-b808-4daad1dfdaa3.jar' (3248859 bytes)
3248859 / 3248859
File '/cm/shared/apps/hadoop/Apache/apache-storm-0.9.3/examples/storm-starter/storm-starter-topologies-0.9.3.jar' uploaded to '/tmp/storm-hdfs1-local/nimbus/inbox/stormjar-bf1abdd0-f31a-41ff-b808-4daad1dfdaa3.jar' (3248859 bytes)
7. Bright Cluster Manager 7.0
Hadoop Deployment Manual
Revision: 6197
Date: Fri, 15 May 2015

©2015 Bright Computing, Inc. All Rights Reserved. This manual or parts thereof may not be reproduced in any form unless permitted by contract or by written permission of Bright Computing, Inc.

Trademarks
Linux is a registered trademark of Linus Torvalds. PathScale is a registered trademark of Cray, Inc. Red Hat and all Red Hat-based trademarks are trademarks or registered trademarks of Red Hat, Inc. SUSE is a registered trademark of Novell, Inc. PGI is a registered trademark of The Portland Group Compiler Technology, STMicroelectronics, Inc. SGE is a trademark of Sun Microsystems, Inc. FLEXlm is a registered trademark of Globetrotter Software, Inc. Maui Cluster Scheduler is a trademark of Adaptive Computing, Inc. ScaleMP is a registered trademark of ScaleMP, Inc. All other trademarks are the property of their respective owners.

Rights and Restrictions
All statements, specifications, recommendations, and technical information contained herein are current or planned as of the date of publication of this document. They are reliable as of the time of this writing, and are presented without warranty of any kind, expressed or implied. Bright Computing, Inc. shall not be liable for technical or editorial errors or omissions which may occur in this document. Bright Computing, Inc. shall not be liable for any damages resulting from the use of this document.
8. [Screenshot residue: the resource tree and the Overview sub-panes of a Hadoop instance, with counters for apps running/pending/completed/failed/submitted, capacities, files, and blocks.]
Figure 3.1: Tabs For A Hadoop Instance In cmgui

The possible tabs are: Overview (section 3.1.1), Settings (section 3.1.2), Tasks (section 3.1.3), and Notes (section 3.1.4). Various sub-panes are displayed in the possible tabs.

3.1.1 The HDFS Instance Overview Tab
The Overview tab pane (figure 3.2) arranges Hadoop-related items in sub-panes for convenient viewing. [Screenshot residue: for instance Horton120, the counters (total/used/remaining capacity, total files, total blocks, pending replication blocks, under-replicated blocks, missing blocks, apps running/pending/completed/failed/submitted, average heartbeat send time, and average block report time) all read 0 on a freshly installed instance.]
9. In the hadoop2clusterextensionconf.xml file, the cloud director that is to be used with the Hadoop cloud nodes must be specified by the administrator with the <edge></edge> tag pair:

Example

<edge>
  <hosts>eu-west-1-director</hosts>
</edge>

Maintenance operations, such as a format, will automatically and transparently be carried out by cmdaemon running on the cloud director, and not on the head node.

There are some shortcomings as a result of relying upon the cloud director:
• Cloud nodes depend on the same cloud director.
• While Hadoop installation (cm-hadoop-setup) is run on the head node, users must run Hadoop commands (job submissions, and so on) from the director, not from the head node.
• It is not possible to mix cloud and non-cloud nodes for the same Hadoop instance. That is, a local Hadoop instance cannot be extended by adding cloud nodes.

Viewing And Managing HDFS From CMDaemon

3.1 cmgui And The HDFS Resource
In cmgui, clicking on the resource folder for Hadoop in the resource tree opens up an Overview tab that displays a list of Hadoop HDFS instances, as in figure 2.1. Clicking on an instance brings up a tab from a series of tabs associated with that instance, as in figure 3.1.
10. Table of Contents ... i
0.1 About This Manual ... iii
0.2 About The Manuals In General ... iii
0.3 Getting Administrator-Level Support ... iv
1 Introduction ... 1
1.1 What Is Hadoop About? ... 1
1.2 Available Hadoop Implementations ... 2
1.3 Further Documentation ... 3
2 Installing Hadoop ... 5
2.1 Command-line Installation Of Hadoop Using cm-hadoop-setup -c <filename> ... 5
2.1.1 Usage ... 5
2.1.2 An Install Run ... 6
2.2 Ncurses Installation Of Hadoop Using cm-hadoop-setup ... 8
2.3 Avoiding Misconfigurations During Hadoop Installation ... 9
2.3.1 NameNode Configuration Choices ... 9
2.4 Installing Hadoop With Lustre ... 10
2.4.1 Lustre Internal Server Installation ... 10
2.4.2 Lustre External Server Installation ... 11
2.4.3 Lustre Client Installation ... 11
2.4.4 Lustre Hadoop Configuration ... 11
2.5 Hadoop Installation In A Cloud ... 13
3 Viewing And Managing HDFS From CMDaemon ... 15
3.1 cmgui And The HDFS Resource ... 15
3.1.1 The HDFS Instance Overview Tab ... 16
3.1.2 The HDFS Instance Settings Tab ... 16
3.1.3 The HDFS Instance Tasks Tab ... 17
3.1.4 The HDFS Instance Notes Tab ... 18
11. [root@bright70 ~]# cm-hadoop-setup -c /tmp/hadoop1conf.xml
Reading config from file '/tmp/hadoop1conf.xml'... done.
Hadoop flavor 'Apache', release '1.2.1'
Will now install Hadoop in /cm/shared/apps/hadoop/Apache/1.2.1 and configure instance 'Myhadoop'.
Hadoop distro being installed... done.
Zookeeper being installed... done.
HBase being installed... done.
Creating module file... done.
Configuring Hadoop instance on local filesystem and images... done.
Updating images:
starting imageupdate for node 'node003'... started.
starting imageupdate for node 'node002'... started.
starting imageupdate for node 'node001'... started.
starting imageupdate for node 'node004'... started.
Waiting for imageupdate to finish... done.
Creating Hadoop instance in cmdaemon... done.
Formatting HDFS... done.
Waiting for datanodes to come up... done.
Setting up HDFS... done.

The Hadoop instance should now be running. The name defined for it in the XML file should show up within cmgui, in the Hadoop HDFS resource tree folder (figure 2.1).
[Screenshot residue: the Hadoop HDFS folder lists the instance Myhadoop, with its distribution and version, and Clone/Remove buttons.]
Figure 2.1: A Hadoop Instance In cmgui
12. Automatic failover                      Disabled
Balancer Threshold                          10
Balancer period                             2
Balancing policy                            dataNode
Cluster ID
Configuration directory                     /etc/hadoop/Apache121
Configuration directory for HBase           /etc/hadoop/Apache121/hbase
Configuration directory for ZooKeeper       /etc/hadoop/Apache121/zookeeper
Creation time                               Fri, 07 Feb 2014 17:03:01 CET
Default replication factor                  3
Enable HA for YARN                          no
Enable NFS gateway                          no
Enable Web HDFS                             no
HA enabled                                  no
HA name service
HDFS Block size                             67108864
HDFS Permission                             no
HDFS Umask                                  077
HDFS log directory                          /var/log/hadoop/Apache121
Hadoop distribution                         Apache
Hadoop root
Hadoop version                              1.2.1
Installation directory for HBase            /cm/shared/apps/hadoop/Apache/hbase-0...
Installation directory for ZooKeeper        /cm/shared/apps/hadoop/Apache/zookeepe...
Installation directory                      /cm/shared/apps/hadoop/Apache/1.2.1
Maximum replication factor                  512
Network
Revision
Root directory for data                     /var/lib/
Temporary directory                         /tmp/hadoop/Apache121
Topology                                    Switch
Use HTTPS                                   no
Use Lustre                                  no
Use federation                              no
Use only HTTPS                              no
YARN automatic failover
description                                 Hadoop installed from /tmp/hadoop-1.2.1.tar...
name                                        Apache121
notes

The overview Command
Similarly, the overview command corresponds somewhat to cmgui's Overview tab in the Hadoop resource (section 3.1.1), and provides Hadoop-related information on the system resources that are used:
13. Some explanations of the items being configured are given along the way. In addition, some minor validation checks are done, and some options are restricted. The suggested default values will work. Other values can be chosen instead of the defaults, but some care in selection is usually a good idea. This is because Hadoop is complex software, which means that values other than the defaults can sometimes lead to unworkable configurations (section 2.3).

The ncurses installation results in an XML configuration file. This file can be used with the -c option of cm-hadoop-setup to carry out the installation.

2.3 Avoiding Misconfigurations During Hadoop Installation
A misconfiguration can be defined as a configuration that works badly or not at all.

For Hadoop to work well, the following common misconfigurations should normally be avoided. Some of these result in warnings from Bright Cluster Manager validation checks during configuration, but can be overridden. An override is useful for cases where the administrator would just like to, for example, test some issues not related to scale or performance.

2.3.1 NameNode Configuration Choices
One of the following NameNode configuration options must be chosen when Hadoop is installed. The choice should be made with care, because changing between the options after installation is not possible.

Hadoop 1.x NameNode Configuration Choices
• NameNode can optionally have a SecondaryNameNode.
14. SecondaryNameNode offloads metadata operations from NameNode, and also stores the metadata offline to some extent. It is not by any means a high-availability solution. While recovery from a failed head node is possible from SecondaryNameNode, it is not easy, and it is not recommended or supported by Bright Cluster Manager.

Hadoop 2.x NameNode Configuration Choices
• NameNode and SecondaryNameNode can run as in Hadoop 1.x. However, the following configurations are also possible:
• NameNode HA with manual failover: In this configuration, Hadoop has NameNode1 and NameNode2 up at the same time, with one active and one on standby. Which NameNode is active and which is on standby is set manually by the administrator. If one NameNode fails, then failover must be executed manually. Metadata changes are managed by ZooKeeper, which relies on a quorum of JournalNodes. The number of JournalNodes is therefore set to 3, 5, 7, and so on.
• NameNode HA with automatic failover: As for the manual case, except that in this case ZooKeeper manages failover too. Which NameNode is active and which is on standby is therefore decided automatically.
• NameNode Federation: In NameNode Federation, the storage of metadata is split among several NameNodes, each of which has a corresponding SecondaryNameNode. Each pair takes care of a part of HDFS. In Bright Cluster Manager there are 4 NameNodes in a default NameNode federation: /user, /tmp, /staging, /hbase.
15. The startbalancer And stopbalancer Commands For Hadoop
For efficient access to HDFS, file block level usage across nodes should be reasonably balanced. Hadoop can start and stop balancing across the instance with the following commands:
• startbalancer
• stopbalancer

The balancer policy, threshold, and period can be retrieved or set for the instance using the parameters:
• balancerperiod
• balancerthreshold
• balancingpolicy

Example

[bright70->hadoop]% get apache121 balancerperiod
2
[bright70->hadoop]% set apache121 balancerperiod 3
[bright70->hadoop*]% commit
[bright70->hadoop]% startbalancer apache121
Code: 0
Starting Hadoop balancer daemon (hadoop-apache121-balancer): starting balancer, logging to /var/log/hadoop/apache121/hdfs/hadoop-hdfs-balancer-bright70.out
Time Stamp  Iteration  Bytes Moved  Bytes To Move  Bytes Being Moved
The cluster is balanced. Exiting...
[bright70->hadoop]%
Thu Mar 20 15:27:02 2014 [notice] bright70: Started balancer for apache121
For details type: events details 152727

The formathdfs Command
Usage: formathdfs <HDFS>
The formathdfs command formats an instance so that it can be reused. Existing Hadoop services for the instance are stopped first before formatting HDFS, and started again after formatting is complete.

Example

[bright70->hadoop]% formathdfs apache121
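The other balancer parameters follow the same get/set/commit pattern. A sketch (the threshold value here is arbitrary, and the balancingpolicy output shown is simply the default value seen in the show listing earlier; the set of accepted policy values depends on the Hadoop version):

Example

[bright70->hadoop]% get apache121 balancingpolicy
dataNode
[bright70->hadoop]% set apache121 balancerthreshold 15
[bright70->hadoop*]% commit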
16. 508 [main] INFO backtype.storm.StormSubmitter - Successfully uploaded topology jar to assigned location: /tmp/storm-hdfs1-local/nimbus/inbox/stormjar-bf1abdd0-f31a-41ff-b808-4daad1dfdaa3.jar
508 [main] INFO backtype.storm.StormSubmitter - Submitting topology WordCount2 in distributed mode with conf {"topology.workers":3,"topology.debug":true}
687 [main] INFO backtype.storm.StormSubmitter - Finished submitting topology: WordCount2

[root@hadoopdev ~]# storm list
...
Topology_name  Status  Num_tasks  Num_workers  Uptime_secs
-----------------------------------------------------------
WordCount2     ACTIVE  28         3            15
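A topology listed as ACTIVE can also be taken down from the same command line, rather than from the Storm UI mentioned in section 6.7.1, using the topology name shown by storm list:

Example

[root@hadoopdev ~]# storm kill WordCount2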
17.   testing hive client...
  testing beeline client...
Hive setup validation... done.
Installation successfully completed.
Finished.

6.2.2 Hive Removal With cm-hive-setup
cm-hive-setup should also be used to remove the Hive instance. Data and metadata will not be removed.

Example

[root@bright70 ~]# cm-hive-setup -u hdfs1
Requested removal of Hive for Hadoop instance 'hdfs1'.
Stopping/removing services... done.
Removing module file... done.
Removing additional Hive directories... done.
Updating images... done.
Removal successfully completed.
Finished.

6.2.3 Beeline
The latest Hive releases include HiveServer2, which supports the Beeline command shell. Beeline is a JDBC client based on the SQLLine CLI (http://sqlline.sourceforge.net/). In the following example, Beeline connects to HiveServer2:

Example

[root@bright70 ~]# beeline -u jdbc:hive2://node005.cm.cluster:10000 \
-d org.apache.hive.jdbc.HiveDriver -e 'SHOW TABLES;'
Connecting to jdbc:hive2://node005.cm.cluster:10000
Connected to: Apache Hive (version 1.1.0)
Driver: Hive JDBC (version 1.1.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
+-----------+
| tab_name  |
+-----------+
| test      |
| test2     |
+-----------+
2 rows selected (0.243 seconds)
Beeline version 1.1.0 by Apache Hive
Closing: 0: jdbc:hive2://node005.cm.cluster:10000

6.3 Kafka
Apache Kafka is a distributed publish-subscribe messaging system.
18. [Screenshot residue: the roles sub-pane, "Nodes configured for this Hadoop HDFS", lists:]

Roles                                                             Nodes
ZooKeeper, DataNode, HBaseServer, TaskTracker                     hadoop2
ZooKeeper, NameNode, HBaseClient, DataNode, TaskTracker           node001
ZooKeeper, SecondaryNameNode, HBaseClient, DataNode, TaskTracker  node002
ZooKeeper, HBaseClient, JobTracker, DataNode, TaskTracker         node003
ZooKeeper, HBaseClient, DataNode, TaskTracker                     node004

[Screenshot residue: a memory sub-pane with heap and non-heap memory usage and a DataNode critical events counter.]
Figure 3.2: Overview Tab View For A Hadoop Instance In cmgui

The following items can be viewed:
• Statistics: In the first sub-pane row are statistics associated with the Hadoop instance. These include node data on the number of applications that are running, as well as memory and capacity usage.
• Metrics: In the second sub-pane row, a metric can be selected for display as a graph.
• Roles: In the third sub-pane row, the roles associated with a node are displayed.

3.1.2 The HDFS Instance Settings Tab
The Settings tab pane (figure 3.3) displays the configuration of the Hadoop instance.
19. 3.2 cmsh And hadoop Mode ... 18
3.2.1 The show And overview Commands ... 18
3.2.2 The Tasks Commands ... 20
4 Hadoop Services ... 23
4.1 Hadoop Services From cmgui ... 23
4.1.1 Configuring Hadoop Services From cmgui ... 23
4.1.2 Monitoring And Modifying The Running Of Hadoop Services From cmgui ... 24
5 Running Hadoop Jobs
5.1 Shakedown Runs
5.2 Example End User Job Run
6 Hadoop-related Projects
6.1 Accumulo
6.1.1 Accumulo Installation With cm-accumulo-setup
6.1.2 Accumulo Removal With cm-accumulo-setup
6.1.3 Accumulo MapReduce Example
6.2 Hive
6.2.1 Hive Installation With cm-hive-setup
6.2.2 Hive Removal With cm-hive-setup
6.2.3 Beeline
6.3 Kafka
6.3.1 Kafka Installation With cm-kafka-setup
6.3.2 Kafka Removal With cm-kafka-setup
6.4 Pig
6.4.1 Pig Installation With cm-pig-setup
6.4.2 Pig Removal With cm-pig-setup
6.4.3 Using Pig
6.5 Spark
6.5.1 Spark Installation With cm-spark-setup
6.5.2 Spark Removal With cm-spark-setup
6.5.3 Using Spark
6.6 Sqoop
6.6.1 Sqoop Installation With cm-sqoop-setup
6.6.2 Sqoop Removal With cm-sqoop-setup
6.7 Storm
6.7.1 Storm Installation With cm-storm-setup
20. …IDH and Cloudera can both run with Lustre under Bright Cluster Manager. The configuration for these can be done as follows:

• IDH: A subdirectory of /mnt/lustre must be specified in the hadoop2lustreconf.xml file, within the <afs></afs> tag pair:

Example

<afs>
  <fstype>lustre</fstype>
  <fsroot>/mnt/lustre/hadoop</fsroot>
</afs>

• Cloudera: A subdirectory of /mnt/lustre must be specified in the hadoop2lustreconf.xml file, within the <afs></afs> tag pair. In addition, an <fsjar></fsjar> tag pair must be specified manually, for the jar that the Intel IEEL 2.x distribution provides:

Example

<afs>
  <fstype>lustre</fstype>
  <fsroot>/mnt/lustre/hadoop</fsroot>
  <fsjar>/root/lustre/hadoop-lustre-plugin-2.3.0.jar</fsjar>
</afs>

The installation of the Lustre plugin is automatic if this jar name is set to the right name, when the cm-hadoop-setup script is run.

Lustre Hadoop Installation With cm-hadoop-setup
The XML configuration file specifies how Lustre should be integrated in Hadoop. If the configuration file is at <root>/hadooplustreconfig.xml, then it can be run as:

Example

cm-hadoop-setup -c <root>/hadooplustreconfig.xml

As part of configuring Hadoop to use Lustre, the execution will:
• Set the ACLs on the directory specified within the <fsroot></fsroot> tag pair.
21. [Screenshot residue: the Overview tab for instance hdfs1 shows "Lustre for Hadoop: Yes, /mnt/lustre/hadoop", together with the usual capacity and application counters, and a graph of the hdfs1_hadoop_lustrefs_Used metric.]
Figure 2.3: A Hadoop Instance With Lustre In cmgui

2.5 Hadoop Installation In A Cloud
Hadoop can make use of cloud services so that it runs as a Cluster On Demand configuration (chapter 2 of the Cloudbursting Manual), or a Cluster Extension configuration (chapter 3 of the Cloudbursting Manual). In both cases, the cloud nodes used should be at least m1.medium.

• For Cluster On Demand, the following considerations apply: There are no specific issues. After a stop/start cycle, Hadoop recognizes the new IP addresses, and refreshes the list of nodes accordingly (section 2.4.1 of the Cloudbursting Manual).

• For Cluster Extension, the following considerations apply: To install Hadoop on cloud nodes, the XML configuration
/cm/local/apps/cluster-tools/hadoop/conf/hadoop2clusterextensionconf.xml
can be used as a guide.
22. Limitation of Liability and Damages Pertaining to Bright Computing, Inc.
The Bright Cluster Manager product principally consists of free software, that is licensed by the Linux authors free of charge. Bright Computing, Inc. shall have no liability, nor will Bright Computing, Inc. provide any warranty for the Bright Cluster Manager, to the extent that is permitted by law. Unless confirmed in writing, the Linux authors and/or third parties provide the program as is, without any warranty, either expressed or implied, including, but not limited to, marketability or suitability for a specific purpose. The user of the Bright Cluster Manager product shall accept the full risk for the quality or performance of the product. Should the product malfunction, the costs for repair, service, or correction will be borne by the user of the Bright Cluster Manager product. No copyright owner or third party who has modified or distributed the program as permitted in this license shall be held liable for damages, including general or specific damages, damages caused by side effects or consequential damages, resulting from the use of the program or the unusability of the program (including, but not limited to, loss of data, incorrect processing of data, losses that must be borne by you or others, or the inability of the program to work together with any other program), even if a copyright owner or third party had been advised about the possibility of such damages, unless such copyright owner or third party has signed a writing to the contrary.
23. …stored in regular relational databases. The analysis of such data, called data mining, can be done better with Hadoop than with relational databases, for certain types of parallelizable problems. Hadoop has the following characteristics in comparison with relational databases:

1. Less structured input: Key-value pairs are used as records for the data sets, instead of a database.

2. Scale-out rather than scale-up design: For large data sets, if the size of a parallelizable problem increases linearly, the corresponding cost of scaling up a single machine to solve it tends to grow exponentially, simply because the hardware requirements tend to get exponentially expensive. If, however, the system that solves it is a cluster, then the corresponding cost tends to grow linearly, because it can be solved by scaling out the cluster with a linear increase in the number of processing nodes. Scaling out can be done, with some effort, for database problems, using a parallel relational database implementation. However, scale-out is inherent in Hadoop, and therefore often easier to implement with Hadoop. The Hadoop scale-out approach is based on the following design:

• Clustered storage: Instead of a single node with a special, large, storage device, a distributed filesystem (HDFS) using commodity hardware devices across many nodes stores the data.

• Clustered processing: Instead of using a single node with many processors, the parallel processing needs of the problem are distributed out over many nodes.
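In practice, such a clustered computation is launched from the example jar files shipped with Hadoop, in the same way as the pi estimator shown in section 5.2. A quick sketch of a word count, the canonical MapReduce example (the input and output HDFS directories are hypothetical, and the jar path assumes the Apache 2.2.0 layout used elsewhere in this manual):

Example

[fred@bright70 ~]$ module add hadoop/Apache220/Apache/2.2.0
[fred@bright70 ~]$ hdfs dfs -mkdir -p /input
[fred@bright70 ~]$ hdfs dfs -put /etc/services /input/services.txt
[fred@bright70 ~]$ hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /input /output
[fred@bright70 ~]$ hdfs dfs -cat /output/part-r-00000 | head

Each map task tokenizes a block of the input where it is stored, and the reduce step aggregates the per-word counts back into one result.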
24. • By default, Accumulo Tablet Servers are set to use 1GB of memory. A different value can be set via cm-accumulo-setup.
• The secret string for the instance is a random string created by cm-accumulo-setup.
• A password for the root user must be specified.
• The Tracer service will use the Accumulo user root to connect to Accumulo.
• The services for Garbage Collector, Master, Tracer, and Monitor are, by default, installed and run on the head node. They can be installed and run on another node instead, as shown in the next example, using the --master option.
• A Tablet Server will be started on each DataNode.
• cm-accumulo-setup tries to build the native map library.
• Validation tests are carried out by the script.

The options for cm-accumulo-setup are listed on running cm-accumulo-setup -h.

An Example Run With cm-accumulo-setup
The option -p <rootpass> is mandatory. The specified password will also be used by the Tracer service to connect to Accumulo. The password will be stored in accumulo-site.xml, with read and write permissions assigned to root only.

The option -s <heapsize> is not mandatory. If not set, a default value of 1GB is used.

The option --master <nodename> is not mandatory. It is used to set the node on which the Garbage Collector, Master, Tracer, and Monitor services run. If not set, then these services are run on the head node.
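Putting these options together, a run might look as follows. This is a sketch: -p, -s, and --master are the options documented above, while -i (instance), -j (JRE path), and -t (tarball) are assumed here to follow the same convention as the other cm-*-setup scripts in this chapter, and the tarball name is illustrative:

Example

[root@bright70 ~]# cm-accumulo-setup -i hdfs1 \
-j /usr/lib/jvm/jre-1.7.0-openjdk.x86_64 \
-p <rootpass> -s <heapsize> --master node005 \
-t /tmp/accumulo-1.6.2-bin.tar.gz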
25. [root@bright70 ~]# module load pig/hdfs1
[root@bright70 ~]# pig
14/08/26 11:57:41 INFO pig.ExecTypeProvider: Trying ExecType: LOCAL
14/08/26 11:57:41 INFO pig.ExecTypeProvider: Trying ExecType: MAPREDUCE
14/08/26 11:57:41 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
grunt>

or in batch mode, using a Pig Latin script:

[root@bright70 ~]# module load hadoop/hdfs1
[root@bright70 ~]# module load pig/hdfs1
[root@bright70 ~]# pig -v -f /tmp/smoke.pig

In both cases, Pig runs in MapReduce mode, thus working on the corresponding HDFS instance.

6.5 Spark
Apache Spark is an engine for processing Hadoop data. It can carry out general data processing, similar to MapReduce, but typically faster. Spark can also carry out the following, with the associated high-level tools:
• stream feed processing, with Spark Streaming
• SQL queries on structured distributed data, with Spark SQL
• processing with machine learning algorithms, using MLlib
• graph computation for arbitrarily connected networks, with GraphX

The Apache Spark tarball should be downloaded from http://spark.apache.org/, where different pre-built tarballs are available: for Hadoop 1.x, for CDH 4, and for Hadoop 2.x.

6.5.1 Spark Installation With cm-spark-setup
Bright Cluster Manager provides cm-spark-setup to carry out Spark installation.

Prerequisites For Spark Installation And What Spark Installation Does
26. …This is done both on the active head node and on the necessary image(s).
• Validation tests are carried out by the script.

An Example Run With cm-storm-setup

Example

[root@bright70 ~]# cm-storm-setup -i hdfs1 \
-j /usr/lib/jvm/jre-1.7.0-openjdk.x86_64 \
-t apache-storm-0.9.4.tar.gz --nimbus node005
Storm release '0.9.4'
Storm Nimbus and UI services will be run on node node005
Found Hadoop instance 'hdfs1', release: 2.2.0
Storm being installed... done.
Creating directories for Storm... done.
Creating module file for Storm... done.
Creating configuration files for Storm... done.
Updating images... done.
Initializing worker services for Storm (on DataNodes)... done.
Initializing Nimbus services for Storm... done.
Executing validation test... done.
Installation successfully completed.
Finished.

The cm-storm-setup installation script submits a validation topology (topology in the Storm sense) called WordCount. After a successful installation, users can connect to the Storm UI on the host <nimbus>, the Nimbus server, at http://<nimbus>:10080/. There they can check the status of WordCount, and can kill it.

6.7.2 Storm Removal With cm-storm-setup
The cm-storm-setup script should also be used to remove the Storm instance:

Example

[root@bright70 ~]# cm-storm-setup -u hdfs1
Requested removal of Storm for Hadoop instance 'hdfs1'.
Stopping/removing services... done.
27. The procedure is called the MapReduce algorithm, and is based on the following approach:
- The distribution process maps the initial state of the problem into processes out to the nodes, ready to be handled in parallel.
- Processing tasks are carried out on the data at the nodes themselves.
- The results are reduced back to one result.

3. Automated failure handling at application level for data: Replication of the data takes place across the DataNodes, which are the nodes holding the data. If a DataNode has failed, then another node, which has the replicated data on it, is used instead automatically. Hadoop switches over quickly in comparison to replicated database clusters, due to not having to check database table consistency.

1.2 Available Hadoop Implementations
Bright Cluster Manager supports the Hadoop implementations provided by the following organizations:

1. Apache (http://apache.org): This is the upstream source for the Hadoop core and some related components, which all the other implementations use.

2. Cloudera (http://www.cloudera.com): Cloudera provides some extra premium functionality and components on top of a Hadoop suite. One of the extra components that Cloudera provides is the Cloudera Management Suite, a major proprietary management layer, with some premium features.

3. Hortonworks (http://hortonworks.com): …
28. 6.5.2 Spark Removal With cm-spark-setup
cm-spark-setup should also be used to remove the Spark instance:

Example

[root@bright70 ~]# cm-spark-setup -u hdfs1
Requested removal of Spark for Hadoop instance 'hdfs1'.
Stopping/removing services... done.
Removing module file... done.
Removing additional Spark directories... done.
Removal successfully completed.
Finished.

6.5.3 Using Spark
Spark supports two deploy modes to launch Spark applications on YARN. Considering the SparkPi example provided with Spark:

• In yarn-client mode, the Spark driver runs in the client process, and the SparkPi application is run as a child thread of Application Master.

[root@bright70 ~]# module load spark/hdfs1
[root@bright70 ~]# spark-submit --master yarn-client \
--class org.apache.spark.examples.SparkPi \
$SPARK_PREFIX/lib/spark-examples-*.jar

• In yarn-cluster mode, the Spark driver runs inside an Application Master process, which is managed by YARN on the cluster.

[root@bright70 ~]# module load spark/hdfs1
[root@bright70 ~]# spark-submit --master yarn-cluster \
--class org.apache.spark.examples.SparkPi \
$SPARK_PREFIX/lib/spark-examples-*.jar

6.6 Sqoop
Apache Sqoop is a tool designed to transfer bulk data between Hadoop and an RDBMS. Sqoop uses MapReduce to import and export data. Bright Cluster Manager supports transfers between Sqoop and MySQL. RHEL7 and SLES12 use MariaDB, and are not yet supported by the available versions of Sqoop at the time of writing (April 2015). At present…
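As an illustration of such a transfer, a minimal Sqoop import from MySQL into HDFS might look as follows. This is a sketch using standard upstream Sqoop options; the host, database, table, and credentials are hypothetical, and the environment module name is assumed to follow the same pattern as the other tools in this chapter:

Example

[root@bright70 ~]# module load sqoop/hdfs1
[root@bright70 ~]# sqoop import \
--connect jdbc:mysql://node005.cm.cluster/testdb \
--username sqoopuser -P \
--table sales --target-dir /user/root/sales -m 1

The -m 1 option runs the import as a single map task; larger tables would typically use more.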
29. • Tablet Server, which manages subsets of all tables
• Garbage Collector, to delete files no longer needed
• Master, responsible for coordination
• Tracer, collecting traces about Accumulo operations
• Monitor, a web application showing information about the instance

Also a part of the instance is a client library linked to Accumulo applications.

The Apache Accumulo tarball can be downloaded from http://accumulo.apache.org/. For Hortonworks HDP 2.1.x, the Accumulo tarball can be downloaded from the Hortonworks website (section 1.2).

6.1.1 Accumulo Installation With cm-accumulo-setup
Bright Cluster Manager provides cm-accumulo-setup to carry out the installation of Accumulo.

Prerequisites For Accumulo Installation And What Accumulo Installation Does
The following applies to using cm-accumulo-setup:
• A Hadoop instance, with ZooKeeper, must already be installed.
• Hadoop can be configured with a single NameNode or NameNode HA, but not with NameNode federation.
• The cm-accumulo-setup script only installs Accumulo on the active head node and on the DataNodes of the chosen Hadoop instance.
• The script assigns no roles to nodes.
• Accumulo executables are copied by the script to a subdirectory under /cm/shared/hadoop/.
• Accumulo configuration files are copied by the script to under /etc/hadoop/. This is done both on the active head node and on the necessary image(s).
30. The Bright Cluster Manager With Hadoop installation ISO includes the cm-apache-hadoop package, which contains tarballs from the Apache Hadoop project suitable for cm-hadoop-setup.

2.1 Command-line Installation Of Hadoop Using cm-hadoop-setup -c <filename>

2.1.1 Usage

[root@bright70 ~]# cm-hadoop-setup -h

USAGE: /cm/local/apps/cluster-tools/bin/cm-hadoop-setup [-c <filename> | -u <name> | -h]

OPTIONS:
  -c <filename>    Hadoop config file to use
  -u <name>        uninstall Hadoop instance
  -h               show usage

EXAMPLES:
cm-hadoop-setup -c /tmp/config.xml
cm-hadoop-setup -u foo
cm-hadoop-setup (no options: a GUI will be started)

Some sample configuration files are provided in the directory /cm/local/apps/cluster-tools/hadoop/conf/:
hadoop1conf.xml        (for Hadoop 1.x)
hadoop2conf.xml        (for Hadoop 2.x)
hadoop2haconf.xml      (for Hadoop 2.x with High Availability)
hadoop2fedconf.xml     (for Hadoop 2.x with NameNode federation)
hadoop2lustreconf.xml  (for Hadoop 2.x with Lustre support)
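For instance, a default Hadoop 2.x instance could be installed by copying and editing the matching sample file, and then passing it with -c. The /root destination for the edited copy is arbitrary:

Example

[root@bright70 ~]# cp /cm/local/apps/cluster-tools/hadoop/conf/hadoop2conf.xml /root/
[root@bright70 ~]# vi /root/hadoop2conf.xml
[root@bright70 ~]# cm-hadoop-setup -c /root/hadoop2conf.xml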
31. Spark release '1.3.0-bin-hadoop2.4'
Found Hadoop instance 'hdfs1', release: 2.6.0
Spark will be installed in YARN (client/cluster) mode.
Spark being installed... done.
Creating directories for Spark... done.
Creating module file for Spark... done.
Creating configuration files for Spark... done.
Waiting for NameNode to be ready... done.
Copying Spark assembly jar to HDFS... done.
Waiting for NameNode to be ready... done.
Validating Spark setup... done.
Installation successfully completed.
Finished.

An Example Run With cm-spark-setup in standalone mode

Example

[root@bright70 ~]# cm-spark-setup -i hdfs1 \
-j /usr/lib/jvm/jre-1.7.0-openjdk.x86_64 \
-t /tmp/spark-1.3.0-bin-hadoop2.4.tgz \
--standalone --master node005
Spark release '1.3.0-bin-hadoop2.4'
Found Hadoop instance 'hdfs1', release: 2.6.0
Spark will be installed to work in Standalone mode.
Spark Master service will be run on node node005.
Spark will use all DataNodes as WorkerNodes.
Spark being installed... done.
Creating directories for Spark... done.
Creating module file for Spark... done.
Creating configuration files for Spark... done.
Updating images... done.
Initializing Spark Master service... done.
Initializing Spark Worker service... done.
Validating Spark setup... done.
Installation successfully completed.
Finished.
32. • Hive executables are copied by the script to a subdirectory under /cm/shared/hadoop/.
• Hive configuration files are copied by the script to under /etc/hadoop/.
• The instance of MySQL on the head node is initialized as the Metastore database for the Bright Cluster Manager by the script. A different MySQL server can be specified by using the options --mysqlserver and --mysqlport.
• The data warehouse is created by the script in HDFS, in /user/hive/warehouse.
• The Metastore and HiveServer2 services are started up by the script.
• Validation tests are carried out by the script, using hive and beeline.

An Example Run With cm-hive-setup

Example

[root@bright70 ~]# cm-hive-setup -i hdfs1 \
-j /usr/lib/jvm/jre-1.7.0-openjdk.x86_64 -p <hivepass> \
--metastoredb <metastoredb> -t /tmp/apache-hive-1.1.0-bin.tar.gz \
--master node005
Hive release '1.1.0-bin'
Using MySQL server on active headnode.
Hive service will be run on node node005.
Using MySQL Connector-J installed in /usr/share/java/
Hive being installed... done.
Creating directories for Hive... done.
Creating module file for Hive... done.
Creating configuration files for Hive... done.
Initializing database 'metastore_hdfs1' in MySQL... done.
Waiting for NameNode to be ready... done.
Creating HDFS directories for Hive... done.
Updating images... done.
Waiting for NameNode to be ready... done.
Hive setup validation...
33. User applications do not have to know this mapping. This is because ViewFS, on the client side, maps the selected path to the corresponding NameNode. Thus, for example, hdfs dfs -ls /tmp/example does not need to know that /tmp is managed by another NameNode.

Cloudera advise against using NameNode Federation for production purposes at present, due to its development status.

2.4 Installing Hadoop With Lustre
The Lustre filesystem has a client-server configuration.

2.4.1 Lustre Internal Server Installation
The procedure for installing a Lustre server varies. The Bright Cluster Manager Knowledge Base article describes a procedure that uses Lustre sources from Whamcloud. The article is at http://kb.brightcomputing.com/faq/index.php?action=artikel&cat=9&id=176, and may be used as guidance.
34. The following applies to using cm-spark-setup:
• A Hadoop instance must already be installed.
• Spark can be installed in two different deployment modes: Standalone or YARN.
  - Standalone mode: This is the default for Apache Hadoop 1.x, Cloudera CDH 4.x, and Hortonworks HDP 1.3.x. It is possible to force the Standalone mode deployment by using the additional option --standalone.
    When installing in Standalone mode, the script installs Spark on the active head node and on the DataNodes of the chosen Hadoop instance. The Spark Master service runs on the active head node by default, but can be specified to run on another node by using the option --master. Spark Worker services run on all DataNodes.
  - YARN mode: This is the default for Apache Hadoop 2.x, Cloudera CDH 5.x, Hortonworks 2.x, and Pivotal 2.x. The default can be overridden by using the --standalone option.
    When installing in YARN mode, the script installs Spark only on the active head node.
• The script assigns no roles to nodes.
• Spark is copied by the script to a subdirectory under /cm/shared/hadoop/.
• Spark configuration files are copied by the script to under /etc/hadoop/.

An Example Run With cm-spark-setup in YARN mode

Example

[root@bright70 ~]# cm-spark-setup -i hdfs1 \
-j /usr/lib/jvm/jre-1.7.0-openjdk.x86_64 \
-t /tmp/spark-1.3.0-bin-hadoop2.4.tgz
35. The user fred can then submit a run from a pi-value estimator, from the example jar file, as follows (some output elided):

Example

[fred@bright70 ~]$ module add hadoop/Apache220/Apache/2.2.0
[fred@bright70 ~]$ hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 1 5
...
Job Finished in 19.732 seconds
Estimated value of Pi is 4.00000000000000000000

The module add line is not needed if the user has the module loaded by default (section 2.2.3 of the Administrator Manual).

The input takes the number of maps and the number of samples as options: 1 and 5 in the example. The result can be improved with greater values for both.

Hadoop-related Projects
Several projects use the Hadoop framework. These projects may be focused on data warehousing, data-flow programming, or other data-processing tasks which Hadoop can handle well. Bright Cluster Manager provides tools to help install the following projects:
• Accumulo (section 6.1)
• Hive (section 6.2)
• Kafka (section 6.3)
• Pig (section 6.4)
• Spark (section 6.5)
• Sqoop (section 6.6)
• Storm (section 6.7)

6.1 Accumulo
Apache Accumulo is a highly-scalable, structured, distributed, key-value store based on Google's BigTable. Accumulo works on top of Hadoop and ZooKeeper. Accumulo stores data in HDFS, and uses a richer model than regular key-value stores. Keys in Accumulo consist of several elements. An Accumulo instance includes the following main components:
36. Hive works with releases 5.1.18 or earlier of this package. If mysql-connector-java provides a newer release, then the following must be done to ensure that Hive setup works:
  - a suitable 5.1.18 or earlier release of Connector/J is downloaded from http://dev.mysql.com/downloads/connector/j/
  - cm-hive-setup is run with the --conn option, to specify the connector version to use:

Example

--conn /tmp/mysql-connector-java-5.1.18-bin.jar

• Before running the script, the following statements must be executed explicitly by the administrator, using a MySQL client:

GRANT ALL PRIVILEGES ON <metastoredb>.* TO 'hive'@'%' IDENTIFIED BY '<hivepass>';
FLUSH PRIVILEGES;
DROP DATABASE IF EXISTS <metastoredb>;

In the preceding statements:
  - <metastoredb> is the name of the Metastore database to be used by Hive. The same name is used later by cm-hive-setup.
  - <hivepass> is the password for the hive user. The same password is used later by cm-hive-setup.
  - The DROP line is needed only if a database with that name already exists.

• The cm-hive-setup script installs Hive by default on the active head node. It can be installed on another node instead, as shown in the next example, with the use of the --master option. In that case, Connector/J should be installed in the software image of the node.
• The script assigns no roles to nodes.
37. Capacity used                  7.246MB
Capacity remaining             16.41GB
Heap memory total              280.7MB
Heap memory used               152.1MB
Heap memory remaining          128.7MB
Non-heap memory total          258.1MB
Non-heap memory used           251.9MB
Non-heap memory remaining      6.155MB
Nodes available                3
Nodes dead                     0
Nodes decommissioned           0
Nodes decommission in progress 0
Total files                    72
Total blocks                   31
Missing blocks                 0
Under-replicated blocks        2
Scheduled replication blocks   0
Pending replication blocks     0
Block report average Time      5
Applications running           1
Applications pending           0
Applications submitted         7
Applications completed         6
Applications failed            0
High availability              Yes (automatic failover disabled)
Federation setup               no

Role                                                            Node
DataNode, Journal, NameNode, YARNClient, YARNServer, ZooKeeper  node001
DataNode, Journal, NameNode, YARNClient, ZooKeeper              node002

5.2 Example End User Job Run
Running a job from a jar file individually can be done by an end user. An end user fred can be created and issued a password by the administrator (chapter 6 of the Administrator Manual). The user must then be granted HDFS access for the Hadoop instance by the administrator:

Example

[bright70->user[fred]]% set hadoophdfsaccess apache220; commit

The possible instance options are shown as tab-completion suggestions. The access can be unset by leaving a blank for the instance option.
38. [Screenshot residue: the Hadoop tab of a node, with per-instance service checkboxes and Advanced buttons.]
Figure 4.1: Services In The Hadoop Tab Of A Node In cmgui

The services displayed are:
• Name Node, Secondary Name Node
• Data Node, ZooKeeper
• Job Tracker, Task Tracker
• YARN Server, YARN Client
• HBase Server, HBase Client
• Journal

For each Hadoop service, instance values can be edited via actions in the associated pane:

• Selecting a Hadoop instance: The name of the possible Hadoop instances can be selected via a drop-down menu.
• Configuring multiple Hadoop instances: More than one Hadoop instance can run on the cluster. Using the + button adds an instance.
• Advanced options for instances: The Advanced button allows many configuration options to be edited for each node instance, for each service. This is illustrated by the example in figure 4.2, where the advanced configuration options of the Data Node service of the node001 instance are shown. [Screenshot residue: the dialog lists, among others, the IPC port (50020), HTTPS port (50475), and transceiver port (50010) of the Data Node service.]
39. …run as the accumulo system user:

Example

[root@bright70 ~]# su accumulo
bash-4.1$ module load accumulo/hdfs1
bash-4.1$ cd $ACCUMULO_HOME
bash-4.1$ bin/tool.sh lib/accumulo-examples-simple.jar \
org.apache.accumulo.examples.simple.mapreduce.TeraSortIngest \
-i hdfs1 -z $ACCUMULO_ZOOKEEPERS -u root -p secret \
--count 10 --minKeySize 10 --maxKeySize 10 \
--minValueSize 78 --maxValueSize 78 --table sort --splits 10

6.2 Hive
Apache Hive is a data warehouse software. It stores its data using HDFS, and can query it via the SQL-like HiveQL language. Metadata values for its tables and partitions are kept in the Hive Metastore, which is an SQL database, typically MySQL or Postgres. Data can be exposed to clients using the following client-server mechanisms:
• Metastore, accessed with the hive client
• HiveServer2, accessed with the beeline client

The Apache Hive tarball should be downloaded from one of the locations specified in section 1.2, depending on the chosen distribution.

6.2.1 Hive Installation With cm-hive-setup
Bright Cluster Manager provides cm-hive-setup to carry out Hive installation.

Prerequisites For Hive Installation And What Hive Installation Does
The following applies to using cm-hive-setup:
• A Hadoop instance must already be installed.
• Before running the script, the version of the mysql-connector-java package should be checked.
40. 6.7.2 Storm Removal With cm-storm-setup
6.7.3 Using Storm

Preface
Welcome to the Hadoop Deployment Manual for Bright Cluster Manager 7.0.

0.1 About This Manual
This manual is aimed at helping cluster administrators install, understand, configure, and manage the Hadoop capabilities of Bright Cluster Manager. The administrator is expected to be reasonably familiar with the Bright Cluster Manager Administrator Manual.

0.2 About The Manuals In General
Regularly updated versions of the Bright Cluster Manager 7.0 manuals are available on updated clusters by default at /cm/shared/docs/cm. The latest updates are always online at http://support.brightcomputing.com/manuals.
• The Installation Manual describes installation procedures for a basic cluster.
• The Administrator Manual describes the general management of the cluster.
• The User Manual describes the user environment and how to submit jobs for the end user.
• The Cloudbursting Manual describes how to deploy the cloud capabilities of the cluster.
• The Developer Manual has useful information for developers who would like to program with Bright Cluster Manager.
• The OpenStack Deployment Manual describes how to deploy OpenStack with Bright Cluster Manager.
• The Hadoop Deployment Manual describes how to deploy Hadoop with Bright Cluster Manager.
• The UCS Deployment Manual describes how to deploy Cisco UCS servers with Bright Cluster Manager.
41. Example

[bright70->hadoop]% overview apache121
Parameter                      Value
------------------------------ ---------------------
Name                           apache121
Capacity total                 0B
Capacity used                  0B
Capacity remaining             0B
Heap memory total              0B
Heap memory used               0B
Heap memory remaining          0B
Non-heap memory total          0B
Non-heap memory used           0B
Non-heap memory remaining      0B
Nodes available                0
Nodes dead                     0
Nodes decommissioned           0
Nodes decommission in progress 0
Total files                    0
Total blocks                   0
Missing blocks                 0
Under-replicated blocks        0
Scheduled replication blocks   0
Pending replication blocks     0
Block report average Time      0
Applications running           0
Applications pending           0
Applications submitted         0
Applications completed         0
Applications failed            0
Federation setup               no

Role                 Node
DataNode, ZooKeeper  bright70, node001, node002

3.2.2 The Tasks Commands
Within hadoop mode, the following commands run tasks that correspond mostly to the Tasks tab (section 3.1.3) in the cmgui Hadoop resource.

The *services Commands For Hadoop Services
Hadoop services can be started, stopped, and restarted with:
• restartallservices
• startallservices
• stopallservices

Example

[bright70->hadoop]% restartallservices apache121
Will now stop all Hadoop services for instance 'apache121'... done.
Will now start all Hadoop services for instance 'apache121'... done.
[bright70->hadoop]%
42. Among other usages, Kafka is used as a replacement for a message broker, for website activity tracking, and for log aggregation. The Apache Kafka tarball should be downloaded from http://kafka.apache.org/, where different pre-built tarballs are available, depending on the preferred Scala version.

6.3.1 Kafka Installation With cm-kafka-setup
Bright Cluster Manager provides cm-kafka-setup to carry out Kafka installation.

Prerequisites For Kafka Installation And What Kafka Installation Does
The following applies to using cm-kafka-setup:
• A Hadoop instance, with ZooKeeper, must already be installed.
• cm-kafka-setup installs Kafka only on the ZooKeeper nodes.
• The script assigns no roles to nodes.
• Kafka is copied by the script to a subdirectory under /cm/shared/hadoop/.
• Kafka configuration files are copied by the script to under /etc/hadoop/.

An Example Run With cm-kafka-setup

Example

[root@bright70 ~]# cm-kafka-setup -i hdfs1 \
-j /usr/lib/jvm/jre-1.7.0-openjdk.x86_64 -t /tmp/kafka_2.11-0.8.2.1.tgz
Kafka release '0.8.2.1' for Scala '2.11'
Found Hadoop instance 'hdfs1', release: 1.2.1
Kafka being installed... done.
Creating directories for Kafka... done.
Creating module file for Kafka... done.
Creating configuration files for Kafka... done.
Updating images... done.
Initializing services for Kafka (on ZooKeeper nodes)... done.
Executing validation test... done.
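After a run like this completes, a quick functional check can be made with the console tools shipped in the Kafka 0.8.x tarball. This is a sketch: the topic name is arbitrary, the broker and ZooKeeper addresses are hypothetical, and a kafka environment module following the same naming pattern as the other tools in this chapter is assumed:

Example

[root@bright70 ~]# module load kafka/hdfs1
[root@bright70 ~]# kafka-topics.sh --create --zookeeper node001:2181 \
--replication-factor 1 --partitions 1 --topic test
[root@bright70 ~]# echo "hello" | kafka-console-producer.sh \
--broker-list node001:9092 --topic test
[root@bright70 ~]# kafka-console-consumer.sh --zookeeper node001:2181 \
--topic test --from-beginning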
43. …Pivotal HD is based on Apache Hadoop 2.2.0. It runs fully integrated on Bright Cluster Manager, but is not available as a package in the Bright Cluster Manager repositories. This is because Pivotal prefers to distribute the software itself. The user can download the software from https://network.pivotal.io/products/pivotal-hd, after registering with Pivotal.

1.3 Further Documentation
Further documentation is provided in the installed tarballs of the Hadoop version, after the Bright Cluster Manager installation (chapter 2) has been carried out. A starting point is listed in the table below:

Hadoop version       Path under /cm/shared/apps/hadoop
-------------------- ------------------------------------------
Apache 1.2.1         Apache/1.2.1/docs/index.html
Apache 2.2.0         Apache/2.2.0/share/doc/hadoop/index.html
Apache 2.6.0         Apache/2.6.0/share/doc/hadoop/index.html
Cloudera CDH 4.7.1   Cloudera/2.0.0-cdh4.7.1/share/doc/index.html
Cloudera CDH 5.3.2   Cloudera/2.5.0-cdh5.3.2/share/doc/index.html
Hortonworks HDP      (online documentation is available at http://docs.hortonworks.com)

Installing Hadoop
In Bright Cluster Manager, a Hadoop instance can be configured and run either via the command line (section 2.1) or via an ncurses GUI (section 2.2). Both options can be carried out with the cm-hadoop-setup script, which is run from a head node. The script is a part of the cluster-tools package, and uses tarballs from the Apache Hadoop project.
…instance name is also displayed within cmsh when the list command is run in hadoop mode:

Example

[root@bright70 ~]# cmsh
[bright70]% hadoop
[bright70->hadoop]% list
Name (key)  Hadoop version  Hadoop distribution  Configuration directory
----------- --------------- -------------------- ------------------------
Myhadoop    1.2.1           Apache               /etc/hadoop/Myhadoop

The instance can be removed as follows:

Example

[root@bright70 ~]# cm-hadoop-setup -u Myhadoop
Requested uninstall of Hadoop instance 'Myhadoop'.
Uninstalling Hadoop instance 'Myhadoop'. Removing:
/etc/hadoop/Myhadoop, /var/lib/hadoop/Myhadoop,
/var/log/hadoop/Myhadoop, /var/run/hadoop/Myhadoop,
/tmp/hadoop/Myhadoop, /etc/hadoop/Myhadoop/zookeeper,
/var/lib/zookeeper/Myhadoop, /var/log/zookeeper/Myhadoop,
/var/run/zookeeper/Myhadoop, /etc/hadoop/Myhadoop/hbase,
/var/log/hbase/Myhadoop, /var/run/hbase/Myhadoop,
/etc/init.d/hadoop-Myhadoop
Module file(s) deleted.
Uninstall successfully completed.

2.2 Ncurses Installation Of Hadoop Using cm-hadoop-setup
Running cm-hadoop-setup without any options starts up an ncurses GUI (figure 2.2).

[Figure 2.2: The cm-hadoop-setup Welcome Screen — menu options: Add Hadoop instance, Remove Hadoop instance, Help (cm-hadoop-setup --help), Quit]

This provides an interactive way to add and remove Hadoop instances, along with…
…buttons act just like for any other service (section 3.11 of the Administrator Manual), which means it includes the possibility of acting on multiple service selections.

• The Tasks tab of the Hadoop instance: Within the Hadoop HDFS resource, for a specific Hadoop instance, the Tasks tab (section 3.1.3) conveniently allows all Hadoop daemons to be restarted and stopped directly via buttons.

Running Hadoop Jobs

5.1 Shakedown Runs
The cm-hadoop-tests.sh script is provided as part of Bright Cluster Manager's cluster-tools package. The administrator can use the script to conveniently submit example jar files in the Hadoop installation to a job client of a Hadoop instance:

[root@bright70 ~]# cd /cm/local/apps/cluster-tools/hadoop/
[root@bright70 hadoop]# ./cm-hadoop-tests.sh <instance>

The script runs endlessly, and runs several Hadoop test scripts. If most lines in the run output are elided for brevity, then the structure of the truncated output looks something like this in overview:

Example

[root@bright70 hadoop]# ./cm-hadoop-tests.sh apache220
Press CTRL-C to stop
start cleaning directories...
clean directories... done
start doing gen_test...
14/03/24 15:05:37 INFO terasort.TeraSort: Generating 10000 using 2
14/03/24 15:05:38 INFO mapreduce.JobSubmitter: number of splits:2
...
Job Counters
...
Map-Reduce Framework
...
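The example jar files that the script submits can also be run by hand. The following is a minimal sketch for the pi example; the module name hadoop/apache220, and the use of HADOOP_PREFIX to locate the examples jar, are assumptions that depend on the Hadoop version installed and on what the module file sets:

[root@bright70 ~]# module load hadoop/apache220
[root@bright70 ~]# hadoop jar \
  $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10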
…<fsroot></fsroot> tag pair. This was set to /mnt/lustre/hadoop earlier on, as an example.

• Copy the Lustre plugin from its jar path, as specified in the XML file, to the correct place on the client nodes. Specifically, the subdirectory share/hadoop/common/lib is copied into a directory relative to the Hadoop installation directory. For example, the Cloudera version of Hadoop, version 2.3.0-cdh5.1.2, has the Hadoop installation directory /cm/shared/apps/hadoop/Cloudera/2.3.0-cdh5.1.2. The copy is therefore carried out, in this case, from /root/lustre/hadoop-lustre-plugin-2.3.0.jar to /cm/shared/apps/hadoop/Cloudera/2.3.0-cdh5.1.2/share/hadoop/common/lib. A sketch of this copy command is given after figure 2.3 below.

Lustre Hadoop Integration In cmsh And cmgui
In cmsh, Lustre integration is indicated in hadoop mode:

Example

[hadoop2->hadoop]% show hdfs1 | grep -i lustre
Hadoop root for Lustre          /mnt/lustre/hadoop
Use Lustre                      yes

In cmgui, the Overview tab in the items for the Hadoop HDFS resource indicates if Lustre is running, along with its mount point (figure 2.3).

[Figure 2.3: Overview tab for a Hadoop instance in cmgui, indicating Lustre integration and its mount point]
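As a concrete sketch of the copy step described above, using the paths from the Cloudera example (and relying on /cm/shared being shared to the nodes, which is an assumption about the local setup):

[root@bright70 ~]# cp /root/lustre/hadoop-lustre-plugin-2.3.0.jar \
  /cm/shared/apps/hadoop/Cloudera/2.3.0-cdh5.1.2/share/hadoop/common/lib/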
…apache121
Will now format and set up HDFS for instance 'apache121'.
Stopping datanodes... done.
Stopping namenodes... done.
Formatting HDFS... done.
Starting namenode (host 'bright70')... done.
Starting datanodes... done.
Waiting for datanodes to come up... done.
Setting up HDFS... done.
[bright70->hadoop]%

The manualfailover Command
Usage: manualfailover [-f|--from <NameNode>] [-t|--to <other NameNode>] <HDFS>

The manualfailover command allows the active status of a NameNode to be moved to another NameNode in the Hadoop instance. This is only available for Hadoop instances that support NameNode failover. The active status can be moved from, and/or to, a NameNode. A usage sketch is given after figure 4.1 below.

Hadoop Services

4.1 Hadoop Services From cmgui
4.1.1 Configuring Hadoop Services From cmgui
Hadoop services can be configured on head or regular nodes. Configuration in cmgui can be carried out by selecting the node instance on which the service is to run, and selecting the Hadoop tab of that node. This then displays a list of possible Hadoop services within subpanes (figure 4.1).

[Figure 4.1: Hadoop services in the Hadoop tab of a node in cmgui — subpanes include Data Node, ZooKeeper, Job Tracker, and Task Tracker]
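As a usage sketch for manualfailover: assuming a hypothetical HA-enabled instance hdfs1, with NameNodes running on node001 and node002 (all names here are illustrative, not from the original text):

[bright70->hadoop]% manualfailover -f node001 -t node002 hdfs1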
…describes how to deploy the Cisco UCS server with Bright Cluster Manager.

If the manuals are downloaded and kept in one local directory, then in most pdf viewers, clicking on a cross-reference in one manual that refers to a section in another manual opens and displays that section in the second manual. Navigating back and forth between documents is usually possible with keystrokes or mouse clicks. For example, <Alt>-<Backarrow> in Acrobat Reader, or clicking on the bottom leftmost navigation button of xpdf, both navigate back to the previous document.

The manuals constantly evolve to keep up with the development of the Bright Cluster Manager environment and the addition of new hardware and/or applications. The manuals also regularly incorporate customer feedback. Administrator and user input is greatly valued at Bright Computing. So any comments, suggestions, or corrections will be very gratefully accepted at manuals@brightcomputing.com.

0.3 Getting Administrator-Level Support
Unless the Bright Cluster Manager reseller offers support, support is provided by Bright Computing over e-mail via support@brightcomputing.com. Section 10.2 of the Administrator Manual has more details on working with support.

Introduction

1.1 What Is Hadoop About?
Hadoop is the core implementation of a technology mostly used in the analysis of data sets that are large and unstructured in comparison with the data store…
…services are run on the head node by default.

Example

[root@bright70 ~]# cm-accumulo-setup -i hdfs1 -j /usr/lib/jvm/jre-1.7.0-openjdk.x86_64 -p <rootpass> -s <heapsize> -t /tmp/accumulo-1.6.2-bin.tar.gz --master node005
Accumulo release '1.6.2'
Accumulo GC, Master, Monitor, and Tracer services will be run on node node005.
Found Hadoop instance 'hdfs1', release: 2.6.0
Accumulo being installed... done.
Creating directories for Accumulo... done.
Creating module file for Accumulo... done.
Creating configuration files for Accumulo... done.
Updating images... done.
Setting up Accumulo directories in HDFS... done.
Executing 'accumulo init'... done.
Initializing services for Accumulo (on DataNodes)... done.
Initializing master services for Accumulo... done.
Waiting for NameNode to be ready... done.
Executing validation test... done.
Installation successfully completed.
Finished.

6.1.2 Accumulo Removal With cm-accumulo-setup
cm-accumulo-setup should also be used to remove the Accumulo instance. Data and metadata will not be removed.

Example

[root@bright70 ~]# cm-accumulo-setup -u hdfs1
Requested removal of Accumulo for Hadoop instance 'hdfs1'.
Stopping/removing services... done.
Removing module file... done.
Removing additional Accumulo directories... done.
Updating images... done.
Removal successfully completed.
Finished.

6.1.3 Accumulo MapReduce Example
Accumulo jobs must be run using accumulo…
…files are copied by the script to under /etc/hadoop/.

An Example Run With cm-pig-setup

Example

[root@bright70 ~]# cm-pig-setup -i hdfs1 -j /usr/lib/jvm/jre-1.7.0-openjdk.x86_64 -t /tmp/pig-0.14.0.tar.gz
Pig release '0.14.0'
Pig being installed... done.
Creating directories for Pig... done.
Creating module file for Pig... done.
Creating configuration files for Pig... done.
Waiting for NameNode to be ready... done.
Validating Pig setup... done.
Installation successfully completed.
Finished.

6.4.2 Pig Removal With cm-pig-setup
cm-pig-setup should also be used to remove the Pig instance.

Example

[root@bright70 ~]# cm-pig-setup -u hdfs1
Requested removal of Pig for Hadoop instance 'hdfs1'.
Stopping/removing services... done.
Removing module file... done.
Removing additional Pig directories... done.
Updating images... done.
Removal successfully completed.
Finished.

6.4.3 Using Pig
Pig consists of an executable, pig, that can be run after the user loads the corresponding module. Pig runs by default in MapReduce mode, that is, it uses the corresponding HDFS installation to store and deal with the elaborate processing of data. More thorough documentation for Pig can be found at http://pig.apache.org/docs/r0.14.0/start.html.

Pig can be used in interactive mode, using the Grunt shell:

[root@bright70 ~]# module load hadoop/hdfs1
[root@bright70 ~]# mo…
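Within the Grunt shell, Pig Latin statements are executed as MapReduce jobs on the HDFS instance. The following is a minimal sketch, with an invented input file and field layout, purely for illustration:

grunt> A = LOAD '/user/root/passwd' USING PigStorage(':') AS (user:chararray);
grunt> B = FOREACH A GENERATE user;
grunt> DUMP B;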
…hortonworks.com. Hortonworks Data Platform (HDP) is a fully open source Hadoop suite.

4. Pivotal HD: Pivotal Hadoop Distribution is a completely Apache-compliant distribution, with extensive analytic toolsets.

The ISO image for Bright Cluster Manager, available at http://www.brightcomputing.com/Download, can include Hadoop for all 4 implementations. During installation from the ISO, the administrator can choose which implementation to install (section 3.3.14 of the Installation Manual).

The contents and versions of the Hadoop implementations provided by Bright Computing at the time of writing (April 2015) are as follows:

• Apache, packaged as cm-apache-hadoop by Bright Computing. This provides:
  – Hadoop versions 1.2.1, 2.2.0, and 2.6.0
  – HBase version 0.98.11 for Hadoop 1.2.1
  – HBase version 1.0.0 for Hadoop 2.2.0 and Hadoop 2.6.0
  – ZooKeeper version 3.4.6

• Cloudera, packaged as cm-cloudera-hadoop by Bright Computing. This provides:
  – CDH 4.7.1, based on Apache Hadoop 2.0, HBase 0.94.15, and ZooKeeper 3.4.5
  – CDH 5.3.2, based on Apache Hadoop 2.5.0, HBase 0.98.6, and ZooKeeper 3.4.5

• Hortonworks, packaged as cm-hortonworks-hadoop by Bright Computing. This provides:
  – HDP 1.3.9, based on Apache Hadoop 1.2.0, HBase 0.94.6, and ZooKeeper 3.4.5
  – HDP 2.2.0, based on Apache Hadoop 2.6.0, HBase 0.98.4, and ZooKeeper 3.4.6

• Pivotal: Pivotal HD version 2.1.0 is based on…
…HBase tarball: <archive></archive>
• ZooKeeper tarball: <archive></archive>

The tarball components can be picked up from URLs as listed in section 1.2. The paths of the tarball component files that are to be used should be set up as needed before running cm-hadoop-setup.

The downloaded tarball components should be placed in the /tmp directory, if the default definitions in the default XML files are used:

Example

[root@bright70 ~]# cd /cm/local/apps/cluster-tools/hadoop/conf
[root@bright70 conf]# grep 'archive>' hadoop1conf.xml | grep -o '/.*gz'
/tmp/hadoop-1.2.1.tar.gz
/tmp/zookeeper-3.4.6.tar.gz
/tmp/hbase-0.98.11-hadoop1-bin.tar.gz

Files under /tmp are not intended to stay around permanently. The administrator may therefore wish to place the tarball components in a more permanent part of the filesystem instead, and change the XML definitions accordingly.

A Hadoop instance name, for example Myhadoop, can also be defined in the XML file, within the <name></name> tag pair.

Hadoop NameNodes and SecondaryNameNodes handle HDFS metadata, while DataNodes manage HDFS data. The data must be stored in the filesystem of the nodes. The default path for where the data is stored can be specified within the <dataroot></dataroot> tag pair. Multiple paths can also be set, using comma-separated paths. NameNodes, SecondaryNameNodes, and DataNodes each use the value or values set within the <dataroot></dataroot>
…present, the latest Sqoop stable release is 1.4.5, while the latest Sqoop2 version is 1.99.5. Sqoop2 is incompatible with Sqoop, it is not feature-complete, and it is not yet intended for production use. The Bright Computing utility cm-sqoop-setup does not as yet support Sqoop2.

6.6.1 Sqoop Installation With cm-sqoop-setup
Bright Cluster Manager provides cm-sqoop-setup to carry out Sqoop installation.

Prerequisites For Sqoop Installation, And What Sqoop Installation Does
The following requirements and conditions apply to running the cm-sqoop-setup script:

• A Hadoop instance must already be installed.

• Before running the script, the version of the mysql-connector-java package should be checked. Sqoop works with releases 5.1.34 or later of this package. If mysql-connector-java provides a newer release, then the following must be done to ensure that Sqoop setup works:
  – a suitable 5.1.34 or later release of Connector/J is downloaded from http://dev.mysql.com/downloads/connector/j/
  – cm-sqoop-setup is run with the --conn option, in order to specify the connector version to be used.

  Example

  --conn /tmp/mysql-connector-java-5.1.34-bin.jar

• The cm-sqoop-setup script installs Sqoop only on the active head node. A different node can be specified by using the option --master.

• The script assigns no roles to nodes.

• Sqoop executables are copied by the script
…Storm can process streams of data.

Other parallels between Hadoop and Storm:

• users run "jobs" in Hadoop and "topologies" in Storm
• the master node for Hadoop jobs runs the JobTracker or ResourceManager daemons to deal with resource management and scheduling, while the master node for Storm runs an analogous daemon called Nimbus
• each worker node for Hadoop runs daemons called TaskTracker or NodeManager, while the worker nodes for Storm run an analogous daemon called Supervisor
• both Hadoop, in the case of NameNode HA, and Storm leverage ZooKeeper for coordination

6.7.1 Storm Installation With cm-storm-setup
Bright Cluster Manager provides cm-storm-setup to carry out Storm installation.

Prerequisites For Storm Installation, And What Storm Installation Does
The following applies to using cm-storm-setup:

• A Hadoop instance with ZooKeeper must already be installed.

• The cm-storm-setup script only installs Storm on the active head node and on the DataNodes of the chosen Hadoop instance, by default. A node other than master can be specified by using the option --master, or its alias for this setup script, --nimbus.

• The script assigns no roles to nodes.

• Storm executables are copied by the script to a subdirectory under /cm/shared/hadoop/.

• Storm configuration files are copied by the script to under /etc/hadoop/. This is done both on the active headnode…
…t Cluster Manager has the following limitations during installation and maintenance:

• the head node cannot be used to run Hadoop services
• end users cannot perform Hadoop operations, such as job submission, on the head node. Operations such as those should instead be carried out while logged in to one of the Hadoop nodes.

In the remainder of this section, a Lustre mount point of /mnt/lustre is assumed, but it can be set to any convenient directory mount point.

The user IDs and group IDs of the Lustre server and clients should be consistent. It is quite likely that they differ when first set up. The IDs should be checked at least for the following users and groups:

• users: hdfs, mapred, yarn, hbase, zookeeper
• groups: hadoop, zookeeper, hbase

If they do not match on the server and clients, then they must be made consistent manually, so that the UID and GID of the Lustre server users are changed to match the UID and GID of the Bright Cluster Manager users.

Once consistency has been checked, and read/write access is working to LustreFS, the Hadoop integration can be configured.

2.4.4 Lustre Hadoop Configuration
Lustre Hadoop XML Configuration File Setup
Hadoop integration can be configured by using the file /cm/local/apps/cluster-tools/hadoop/conf/hadoop2lustreconf.xml as a starting point for the configuration. It can be copied over to, for example, /root/hadooplustreconfig.xml.

The Intel Distribution for Hadoop (IDH) and Cloudera…
…tag pair for their root directories. If needed, more specific tags can be used for each node type. This is useful in the case where hardware differs for the various node types. For example:

• a NameNode with 2 disk drives for Hadoop use
• a DataNode with 4 disk drives for Hadoop use

The XML file used by cm-hadoop-setup can in this case use the tag pairs:

• <namenodedatadirs></namenodedatadirs>
• <datanodedatadirs></datanodedatadirs>

If these are not specified, then the value within the <dataroot></dataroot> tag pair is used.

Example

• <namenodedatadirs>/data1,/data2</namenodedatadirs>
• <datanodedatadirs>/data1,/data2,/data3,/data4</datanodedatadirs>

Hadoop should then have the following dfs.*.dir properties added to it via the hdfs-site.xml configuration file. For the preceding tag pairs, the property values should be set as follows (a sketch of the corresponding hdfs-site.xml entries is given after this example):

Example

• dfs.namenode.name.dir, with values:
  /data1/hadoop/hdfs/namenode,/data2/hadoop/hdfs/namenode
• dfs.datanode.data.dir, with values:
  /data1/hadoop/hdfs/datanode,/data2/hadoop/hdfs/datanode,/data3/hadoop/hdfs/datanode,/data4/hadoop/hdfs/datanode

An install run then displays output like the following:

Example

-rw-r--r-- 1 root root 63851630 Feb  4 15:13 hadoop-1.2.1.tar.gz
[root@bright70 …
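For reference, the resulting hdfs-site.xml entries for the preceding values would look something like the following sketch. This illustrates standard Hadoop 2.x property syntax; it is not text taken from the original configuration files:

<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data1/hadoop/hdfs/namenode,/data2/hadoop/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data1/hadoop/hdfs/datanode,/data2/hadoop/hdfs/datanode,/data3/hadoop/hdfs/datanode,/data4/hadoop/hdfs/datanode</value>
</property>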
…to a subdirectory under /cm/shared/hadoop/.

• Sqoop configuration files are copied by the script and placed under /etc/hadoop/.

• The Metastore service is started up by the script.

An Example Run With cm-sqoop-setup

Example

[root@bright70 ~]# cm-sqoop-setup -i hdfs1 -j /usr/lib/jvm/jre-1.7.0-openjdk.x86_64 -t /tmp/sqoop-1.4.5.bin__hadoop-2.0.4-alpha.tar.gz --conn /tmp/mysql-connector-java-5.1.34-bin.jar --master node005
Using MySQL Connector/J from /tmp/mysql-connector-java-5.1.34-bin.jar
Sqoop release '1.4.5-bin__hadoop-2.0.4-alpha'
Sqoop service will be run on node node005.
Found Hadoop instance 'hdfs1', release: 2.2.0
Sqoop being installed... done.
Creating directories for Sqoop... done.
Creating module file for Sqoop... done.
Creating configuration files for Sqoop... done.
Updating images... done.
Installation successfully completed.
Finished.

6.6.2 Sqoop Removal With cm-sqoop-setup
cm-sqoop-setup should be used to remove the Sqoop instance.

Example

[root@bright70 ~]# cm-sqoop-setup -u hdfs1
Requested removal of Sqoop for Hadoop instance 'hdfs1'.
Stopping/removing services... done.
Removing module file... done.
Removing additional Sqoop directories... done.
Updating images... done.
Removal successfully completed.
Finished.

6.7 Storm
Apache Storm is a distributed realtime computation system. While Hadoop is focused on batch processing…