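The cluster consists of three machines, configured as follows.
IP address and hostname of each machine:
Hostname  OS                 Role                                                                   IP address       Login
Master    CentOS 6.5 64-bit  Hadoop NameNode and JobTracker (ResourceManager under YARN in 2.x)     192.168.133.150  user: hadoop passwd: 151warden
Slave1    CentOS 6.5 64-bit  Hadoop DataNode and TaskTracker (NodeManager under YARN in 2.x)        192.168.133.151
Slave2    CentOS 6.5 64-bit  Hadoop DataNode and TaskTracker (NodeManager under YARN in 2.x)        192.168.133.152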
Install vim and openssh-server
yum -y install vim openssh-server
Log in to each of the three servers and set its hostname
192.168.133.150: vim /etc/sysconfig/network and set HOSTNAME=Master.Hadoop
192.168.133.151: HOSTNAME=Slave1.Hadoop
192.168.133.152: HOSTNAME=Slave2.Hadoop
At this point every server answers a ping by IP address but not by hostname, so add the names to the hosts file
vim /etc/hosts # append the following at the end
# Hadoop IP Config
192.168.133.150 Master.Hadoop
192.168.133.151 Slave1.Hadoop
192.168.133.152 Slave2.Hadoop
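The same three entries have to end up in /etc/hosts on every node. The following is only a sketch of how the append could be scripted so that running it twice does not duplicate lines; the helper name add_host_entry is an assumption and not part of the original steps, which edit the file by hand.

```shell
# add_host_entry FILE IP NAME
# Appends the line "IP NAME" to FILE unless exactly that line is already present.
# (Hypothetical helper, shown for illustration.)
add_host_entry() {
    file="$1"; ip="$2"; name="$3"
    # -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF "$ip $name" "$file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}

# Example (as root, on each of the three machines):
#   add_host_entry /etc/hosts 192.168.133.150 Master.Hadoop
#   add_host_entry /etc/hosts 192.168.133.151 Slave1.Hadoop
#   add_host_entry /etc/hosts 192.168.133.152 Slave2.Hadoop
```

Afterwards, ping Slave1.Hadoop from the Master should resolve to 192.168.133.151.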
Install wget on all three machines
yum -y install wget
Configure passwordless SSH key login between the nodes: from the Master to each Slave, and from each Slave to the Master
Configure automatic login
Generate the SSH key
[root@localhost .ssh]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
84:11:8c:d4:f1:e6:37:ed:fd:eb:7b:0a:55:39:f0:96 root@localhost
The key's randomart image is:
+--[ RSA 2048]----+
| ..++o . |
| . o+ o o|
| . + E.|
| + . ...|
| S o . . |
| . o o |
| o . |
| . ..|
| o==|
+-----------------+
The generated files are:
[root@localhost .ssh]# ls
id_rsa id_rsa.pub
Append the public key to authorized_keys
[root@localhost .ssh]# cat id_rsa.pub >> authorized_keys
[root@localhost .ssh]# ls
authorized_keys id_rsa id_rsa.pub
Distribute it to each Slave node
[root@localhost .ssh]# scp authorized_keys root@Slave1.Hadoop:/root/.ssh/
The authenticity of host 'slave1.hadoop (192.168.133.151)' can't be established.
RSA key fingerprint is e2:a1:50:be:e2:54:62:88:da:6d:2c:1d:08:05:bb:5a.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'slave1.hadoop,192.168.133.151' (RSA) to the list of known hosts.
root@slave1.hadoop's password:
authorized_keys 100% 396 0.4KB/s 00:00
Verify passwordless login
[root@localhost .ssh]# ssh root@Slave1.Hadoop
Last login: Sat Mar 21 07:26:16 2015 from 192.168.133.1
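SSH key setup from the Slaves to the Master
Generate an SSH key on each Slave node
ssh-keygen -t rsa
Upload the generated id_rsa.pub to the Master and append it to the Master's authorized_keys
scp id_rsa.pub root@Master.Hadoop:~ # run on each Slave
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys # run on the Master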
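Key-based login can still fall back to a password prompt when ~/.ssh or authorized_keys is writable by others, because sshd checks those permissions when StrictModes is enabled (its default). As a rough sketch, the append step can be wrapped so that it also fixes the permissions and skips keys that are already installed. The function name install_pubkey and its arguments are assumptions made for illustration.

```shell
# install_pubkey PUBKEY_FILE [SSH_DIR]
# Appends the key in PUBKEY_FILE to SSH_DIR/authorized_keys (default: ~/.ssh)
# unless that exact line is already there, then tightens the permissions.
# (Hypothetical helper, not part of the original steps.)
install_pubkey() {
    pubkey_file="$1"
    ssh_dir="${2:-$HOME/.ssh}"
    mkdir -p "$ssh_dir"
    touch "$ssh_dir/authorized_keys"
    # -F: fixed strings, -x: whole-line match, -f: read the pattern from the key file
    if ! grep -qxFf "$pubkey_file" "$ssh_dir/authorized_keys"; then
        cat "$pubkey_file" >> "$ssh_dir/authorized_keys"
    fi
    chmod 700 "$ssh_dir"
    chmod 600 "$ssh_dir/authorized_keys"
}

# Example (on the Master, after a Slave has uploaded its key to root's home):
#   install_pubkey ~/id_rsa.pub
```

Where the ssh-copy-id tool is available, running ssh-copy-id root@Master.Hadoop on each Slave should achieve the same result in one step.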
Download Hadoop and the JDK and upload them to every server
mkdir -p hadoop/hadoop-application
mkdir -p hadoop/workspace
mv hadoop-2.5.2.tar.gz hadoop/hadoop-application/
mv jdk-7u51-linux-x64.gz hadoop/hadoop-application/
Install Hadoop and the JDK, both under /opt/hadoop/
cd hadoop/hadoop-application/
tar -zxvf hadoop-2.5.2.tar.gz
tar -zxvf jdk-7u51-linux-x64.gz
mkdir -p /opt/hadoop/
mv hadoop-2.5.2 /opt/hadoop/
mv jdk1.7.0_51/ /opt/hadoop/
Configure the JDK and Hadoop environment variables
vim /etc/profile
#set java environment
export JAVA_HOME=/opt/hadoop/jdk1.7.0_51/
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
# set Hadoop bin path
export HADOOP_INSTALL=/opt/hadoop/hadoop-2.5.2/
export PATH=$PATH:$HADOOP_INSTALL/bin
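Since the same variables are needed on all three machines, one possible variation is to keep them in a separate file that can be copied between nodes with scp instead of editing /etc/profile on each one. The sketch below writes the exports from this section to a file of your choice; the function name write_hadoop_profile and the target /etc/profile.d/hadoop.sh are assumptions (on CentOS 6, /etc/profile sources the *.sh files in /etc/profile.d at login).

```shell
# write_hadoop_profile FILE
# Writes the Java and Hadoop exports used in this guide to FILE.
# (Hypothetical helper; the original steps put these lines into /etc/profile.)
write_hadoop_profile() {
    cat > "$1" <<'EOF'
# set java environment
export JAVA_HOME=/opt/hadoop/jdk1.7.0_51/
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
# set Hadoop bin path
export HADOOP_INSTALL=/opt/hadoop/hadoop-2.5.2/
export PATH=$PATH:$HADOOP_INSTALL/bin
EOF
}

# Example (as root):
#   write_hadoop_profile /etc/profile.d/hadoop.sh
#   source /etc/profile.d/hadoop.sh
```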
Verify the configuration
Reload the environment variables
[root@localhost hadoop-application]# source /etc/profile
Check the version with java -version
[root@localhost hadoop-application]# java -version
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
Check javap
[root@localhost hadoop-application]# javap java.lang.Object
Compiled from "Object.java"
public class java.lang.Object {
public java.lang.Object();
public final native java.lang.Class<?> getClass();
public native int hashCode();
public boolean equals(java.lang.Object);
protected native java.lang.Object clone() throws java.lang.CloneNotSupportedException;
public java.lang.String toString();
public final native void notify();
public final native void notifyAll();
public final native void wait(long) throws java.lang.InterruptedException;
public final void wait(long, int) throws java.lang.InterruptedException;
public final void wait() throws java.lang.InterruptedException;
protected void finalize() throws java.lang.Throwable;
static {};
}
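Check javac
[root@localhost hadoop-application]# javac
Usage: javac
where possible options include:
-g Generate all debugging info
-g:none Generate no debugging info
-g:{lines,vars,source} Generate only some debugging info
-nowarn Generate no warnings
-verbose Output messages about what the compiler is doing
-deprecation Output source locations where deprecated APIs are used
-classpath <path> Specify where to find user class files and annotation processors
-cp <path> Specify where to find user class files and annotation processors
-sourcepath <path> Specify where to find input source files
-bootclasspath <path> Override location of bootstrap class files
-extdirs <dirs> Override location of installed extensions
-endorseddirs <dirs> Override location of endorsed standards path
-proc:{none,only} Control whether annotation processing and/or compilation is done.
-processor
-processorpath <path> Specify where to find annotation processors
-d <directory> Specify where to place generated class files
-s <directory> Specify where to place generated source files
-implicit:{none,class} Specify whether or not to generate class files for implicitly referenced files
-encoding <encoding> Specify character encoding used by source files
-source <release> Provide source compatibility with specified release
-target <release> Generate class files for specific VM version
-version Version information
-help Print a synopsis of standard options
-Akey[=value] Options to pass to annotation processors
-X Print a synopsis of nonstandard options
-J<flag> Pass <flag> directly to the runtime system
-Werror Terminate compilation if warnings occur
@<filename> Read options and filenames from file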
Check the Hadoop version with hadoop version
[root@Slave2 hadoop]# hadoop version
Hadoop 2.5.2
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r cc72e9b000545b86b75a61f4835eb86d57bfafc0
Compiled by jenkins on 2014-11-14T23:45Z
Compiled with protoc 2.5.0
From source with checksum df7537a4faa4658983d397abf4514320
This command was run using /opt/hadoop/hadoop-2.5.2/share/hadoop/common/hadoop-common-2.5.2.jar
Configure Hadoop (TODO)
Enter the configuration directory: cd /opt/hadoop/hadoop-2.5.2/etc/hadoop
/opt/hadoop/hadoop-2.5.2/etc/hadoop/core-site.xml
/opt/hadoop/hadoop-2.5.2/etc/hadoop/hdfs-site.xml
/opt/hadoop/hadoop-2.5.2/etc/hadoop/mapred-site.xml
/opt/hadoop/hadoop-2.5.2/etc/hadoop/yarn-site.xml
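The original post leaves the contents of these four files as a TODO. Purely as a starting point, the sketch below writes a minimal set of properties that a cluster like this one typically needs. Everything in it is an assumption rather than the author's actual configuration: the helper name write_minimal_site_configs, the NameNode port 9000, and the choice of properties. The replication factor of 2 is taken from the file listing at the end of this guide, and the ResourceManager host from the job log.

```shell
# write_minimal_site_configs CONF_DIR
# Writes minimal core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml
# into CONF_DIR, overwriting existing files. (Hypothetical helper; the values
# below are assumptions, not the original author's settings.)
write_minimal_site_configs() {
    conf_dir="$1"
    # Default file system URI; port 9000 is an assumed value
    cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://Master.Hadoop:9000</value></property>
</configuration>
EOF
    # Two DataNodes, so keep two copies of each block
    cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property><name>dfs.replication</name><value>2</value></property>
</configuration>
EOF
    # Run MapReduce jobs on YARN
    cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
</configuration>
EOF
    # ResourceManager location and the shuffle service needed by MapReduce
    cat > "$conf_dir/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property><name>yarn.resourcemanager.hostname</name><value>Master.Hadoop</value></property>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
</configuration>
EOF
}

# Example (on every node; back up the existing files first):
#   write_minimal_site_configs /opt/hadoop/hadoop-2.5.2/etc/hadoop
```

On a fresh cluster the NameNode usually has to be formatted once on the Master with hdfs namenode -format before HDFS is started for the first time.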
Configure /opt/hadoop/hadoop-2.5.2/etc/hadoop/slaves
Slave1.Hadoop
Slave2.Hadoop
Configure the startup scripts; this has to be done on every node
vim /opt/hadoop/hadoop-2.5.2/etc/hadoop/hadoop-env.sh
# The java implementation to use.
export JAVA_HOME=${JAVA_HOME}
# The jsvc implementation to use. Jsvc is required to run secure datanodes.
#export JSVC_HOME=${JSVC_HOME}
# TODO SHENGLING
export JAVA_HOME=/opt/hadoop/jdk1.7.0_51/
vim /opt/hadoop/hadoop-2.5.2/etc/hadoop/yarn-env.sh
# some Java parameters
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
# TODO SHENGLING
export JAVA_HOME=/opt/hadoop/jdk1.7.0_51/
Configure passwordless login from the other servers to the Master (TODO)
Disable the firewall and SELinux on every node
service iptables stop
[root@Slave1 hadoop]# service iptables stop
iptables: Setting chains to policy ACCEPT: filter [ OK ]
iptables: Flushing firewall rules: [ OK ]
iptables: Unloading modules: [ OK ]
vim /etc/sysconfig/selinux
SELINUX=disabled
setenforce 0
getenforce
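Note that service iptables stop and setenforce 0 only last until the next reboot. To make both changes permanent, something along the following lines could be used. The helper name set_selinux_disabled is an assumption. On CentOS 6, /etc/sysconfig/selinux is normally a symlink to /etc/selinux/config, so the sketch edits the real file directly, because sed -i would replace a symlink with a regular file.

```shell
# set_selinux_disabled FILE
# Rewrites the SELINUX=... line in FILE to SELINUX=disabled and leaves every
# other line (including SELINUXTYPE=...) untouched.
# (Hypothetical helper, shown for illustration.)
set_selinux_disabled() {
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$1"
}

# Example (as root, on every node):
#   chkconfig iptables off                     # do not start iptables at boot
#   set_selinux_disabled /etc/selinux/config   # takes effect after a reboot
```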
Run WordCount, the "Hello World" program of Hadoop
Start Hadoop
cd /opt/hadoop/hadoop-2.5.2/sbin/
./start-dfs.sh
./start-yarn.sh
In a browser, open http://192.168.133.150:50070/ to view HDFS and http://192.168.133.150:8088/cluster to view YARN
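Besides the web pages, the jps tool from the JDK lists the Java processes running on a node, which gives a quick check that the daemons actually started. The sketch below compares jps output against a list of expected daemon names. The helper name missing_daemons is an assumption, and the expected names in the example assume default settings, where the SecondaryNameNode runs on the Master.

```shell
# missing_daemons "NAME1 NAME2 ..."   (reads jps output on stdin)
# Prints every expected daemon name that does not appear in the jps output.
# (Hypothetical helper, shown for illustration.)
missing_daemons() {
    expected="$1"
    # jps prints "<pid> <class name>"; keep only the second column
    running=$(awk '{print $2}')
    for d in $expected; do
        printf '%s\n' "$running" | grep -qx "$d" || echo "$d"
    done
}

# Example:
#   on the Master:  jps | missing_daemons "NameNode SecondaryNameNode ResourceManager"
#   on each Slave:  jps | missing_daemons "DataNode NodeManager"
```

No output means every expected daemon was found. Any name that is printed points to a log file worth checking under /opt/hadoop/hadoop-2.5.2/logs.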
Upload files to the Hadoop file system
=======================================================================
[root@Master workspace]# hadoop fs -mkdir -p input # create the input directory
15/03/21 11:39:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@Master workspace]# hadoop fs -ls # check that the input directory was created
15/03/21 11:39:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxr-xr-x - root supergroup 0 2015-03-21 11:39 input
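The transcript only shows the directory being created, yet the WordCount job log reports two input paths, so two files must have been uploaded into input at some point. As an illustration of that missing step, the sketch below creates two small text files and shows how they could be uploaded. The helper name make_sample_input, the file names and the file contents are all made up and do not reproduce the author's data.

```shell
# make_sample_input DIR
# Creates two small text files in DIR to serve as WordCount input.
# (Hypothetical helper with invented sample contents.)
make_sample_input() {
    mkdir -p "$1"
    printf 'hello world\n'  > "$1/file1.txt"
    printf 'hello hadoop\n' > "$1/file2.txt"
}

# Example (on the Master, with HDFS running):
#   make_sample_input /tmp/wc-input
#   hadoop fs -put /tmp/wc-input/file1.txt /tmp/wc-input/file2.txt input
#   hadoop fs -ls input
```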
Run WordCount
========================================================================
hadoop jar /opt/hadoop/hadoop-2.5.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar wordcount input /opt/hadoop/workspace/output/
15/03/21 11:41:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/03/21 11:41:14 INFO client.RMProxy: Connecting to ResourceManager at Master.Hadoop/192.168.133.150:8032
15/03/21 11:41:15 INFO input.FileInputFormat: Total input paths to process : 2
15/03/21 11:41:15 INFO mapreduce.JobSubmitter: number of splits:2
15/03/21 11:41:16 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1426962869860_0002
15/03/21 11:41:16 INFO impl.YarnClientImpl: Submitted application application_1426962869860_0002
15/03/21 11:41:16 INFO mapreduce.Job: The url to track the job: http://Master.Hadoop:8088/proxy/application_1426962869860_0002/
15/03/21 11:41:16 INFO mapreduce.Job: Running job: job_1426962869860_0002
15/03/21 11:41:30 INFO mapreduce.Job: Job job_1426962869860_0002 running in uber mode : false
15/03/21 11:41:30 INFO mapreduce.Job: map 0% reduce 0%
15/03/21 11:41:44 INFO mapreduce.Job: map 100% reduce 0%
15/03/21 11:41:56 INFO mapreduce.Job: map 100% reduce 100%
15/03/21 11:41:56 INFO mapreduce.Job: Job job_1426962869860_0002 completed successfully
15/03/21 11:41:57 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=54
FILE: Number of bytes written=291890
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=248
HDFS: Number of bytes written=23
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=24448
Total time spent by all reduces in occupied slots (ms)=8693
Total time spent by all map tasks (ms)=24448
Total time spent by all reduce tasks (ms)=8693
Total vcore-seconds taken by all map tasks=24448
Total vcore-seconds taken by all reduce tasks=8693
Total megabyte-seconds taken by all map tasks=25034752
Total megabyte-seconds taken by all reduce tasks=8901632
Map-Reduce Framework
Map input records=2
Map output records=4
Map output bytes=40
Map output materialized bytes=60
Input split bytes=224
Combine input records=4
Combine output records=4
Reduce input groups=3
Reduce shuffle bytes=60
Reduce input records=4
Reduce output records=3
Spilled Records=8
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=345
CPU time spent (ms)=2670
Physical memory (bytes) snapshot=510644224
Virtual memory (bytes) snapshot=2516897792
Total committed heap usage (bytes)=257171456
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=24
File Output Format Counters
Bytes Written=23
View the output
[root@Master output]# hadoop fs -ls /opt/hadoop/workspace/output/ # the same HDFS path that was given when the job was run
15/03/21 11:52:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 2 root supergroup 0 2015-03-21 11:41 /opt/hadoop/workspace/output/_SUCCESS
-rw-r--r-- 2 root supergroup 23 2015-03-21 11:41 /opt/hadoop/workspace/output/part-r-00000
Copy the file to the local disk to view it
hadoop fs -get /opt/hadoop/workspace/output/part-r-00000 /tmp/part-r-00000
vim /tmp/part-r-00000
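Each line of the WordCount result holds a word and its count separated by a tab. For larger outputs it can be handier to rank the words than to scroll through them in an editor. A small sketch of that follows; the helper name top_words is an assumption.

```shell
# top_words FILE [N]
# Prints the N most frequent words (default 10) from a WordCount result file
# whose lines have the form "word<TAB>count". Ties are ordered by word.
# (Hypothetical helper, shown for illustration.)
top_words() {
    sort -t "$(printf '\t')" -k2,2nr -k1,1 "$1" | head -n "${2:-10}"
}

# Example:
#   hadoop fs -cat /opt/hadoop/workspace/output/part-r-00000   # print without copying
#   top_words /tmp/part-r-00000 5
```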
========================================================================
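Copyright notice
Title: 5 - Hadoop Cluster Setup Guide
Author: 盛领 (Sheng Ling)
Published: 2015-04-02 01:44:50
Original link: http://blog.xiaoyuyu.net/post/decbe4c9.html
License: Attribution-NonCommercial-NoDerivatives 4.0 International. Please keep the original link and the author's name when reposting.
For commercial cooperation or licensing inquiries, please leave me a message: sunsetxiao@126.com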