Hadoop Cluster Practice (2): HBase & Zookeeper Setup


Series Contents
Hadoop Cluster Practice (0): Overall Architecture Design [Hadoop(HDFS), HBase, Zookeeper, Flume, Hive]
Hadoop Cluster Practice (1): Hadoop (HDFS) Setup
Hadoop Cluster Practice (2): HBase & Zookeeper Setup
Hadoop Cluster Practice (3): Flume Setup
Hadoop Cluster Practice (4): Hive Setup

This Article
Hadoop Cluster Practice (2): HBase & Zookeeper Setup

References
http://blog.csdn.net/hguisu/article/details/7244413
http://www.yankay.com/wp-content/hbase/book.html
http://blog.nosqlfan.com/html/3694.html

Installing and Configuring HBase & Zookeeper
Environment
OS: Ubuntu 10.10 Server 64-bit
Servers:
hadoop-master: 10.6.1.150, 1024 MB RAM
- namenode,jobtracker;hbase-master,hbase-thrift;
- secondarynamenode;
- zookeeper-server;
- datanode,taskTracker

hadoop-node-1: 10.6.1.151, 640 MB RAM
- datanode,taskTracker;hbase-regionServer;
- zookeeper-server;

hadoop-node-2: 10.6.1.152, 640 MB RAM
- datanode,taskTracker;hbase-regionServer;
- zookeeper-server;

A brief introduction to the roles above:
namenode - manages the namespace of the entire HDFS
secondarynamenode - performs periodic checkpoints of the namenode's metadata (despite the name, it is not a hot standby)
jobtracker - manages MapReduce jobs
datanode - stores and serves HDFS blocks
tasktracker - executes MapReduce tasks
hbase-master - HBase's management service
hbase-regionserver - serves client inserts, deletes, queries, and other data requests
zookeeper-server - ZooKeeper's coordination and configuration management service

HBase and ZooKeeper are covered in the same article because:
a distributed HBase deployment depends on a ZooKeeper cluster, and every node and client must be able to reach ZooKeeper.

A convention used in this article, to avoid confusion when configuring multiple servers:
any command line that begins with a bare $ and no hostname must be run on all servers, unless a trailing // comment says otherwise.

1. Prerequisites
Hadoop Cluster Practice (1): Hadoop (HDFS) Setup has been completed.

Configure NTP time synchronization
dongguo@hadoop-master:~$ sudo /etc/init.d/ntp start
dongguo@hadoop-node-1:~$ sudo ntpdate hadoop-master
dongguo@hadoop-node-2:~$ sudo ntpdate hadoop-master
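
Clock skew between nodes can make HBase region servers abort, so a one-off ntpdate is fragile. A periodic re-sync via cron is one option; the file name below is hypothetical and the entry is only a sketch, not part of the original setup:

```shell
# /etc/cron.d/ntpdate-hadoop  (hypothetical file name)
# Re-sync the node's clock against hadoop-master every hour.
0 * * * * root /usr/sbin/ntpdate hadoop-master >/dev/null 2>&1
```

Running the ntpd daemon on every node instead would also work, and avoids sudden clock jumps.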

Configure the ulimit and nproc parameters
$ sudo vim /etc/security/limits.conf

hdfs		-	nofile		32768
hbase		-	nofile		32768

hdfs		soft	nproc		32000
hdfs		hard	nproc		32000

hbase		soft	nproc		32000
hbase		hard	nproc		32000

$ sudo vim /etc/pam.d/common-session

session required	pam_limits.so

Log out and back in over SSH for the settings to take effect.
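
After logging back in, the new limits can be spot-checked from the shell (run as the user whose limits matter, e.g. hbase):

```shell
# Print the limits for the current session; with the limits.conf
# entries above in effect, the hdfs and hbase users should see
# 32768 open files and 32000 processes.
ulimit -n   # max open file descriptors (nofile)
ulimit -u   # max user processes (nproc)
```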

2. Install hbase-master on hadoop-master
dongguo@hadoop-master:~$ sudo apt-get install hadoop-hbase-master
dongguo@hadoop-master:~$ sudo apt-get install hadoop-hbase-thrift

3. Install hbase-regionserver on the hadoop-node servers
dongguo@hadoop-node-1:~$ sudo apt-get install hadoop-hbase-regionserver
dongguo@hadoop-node-2:~$ sudo apt-get install hadoop-hbase-regionserver

4. Create the HBase directory in HDFS
dongguo@hadoop-master:~$ sudo -u hdfs hadoop fs -mkdir /hbase
dongguo@hadoop-master:~$ sudo -u hdfs hadoop fs -chown hbase /hbase

5. Configure hbase-env.sh
$ sudo vim /etc/hbase/conf/hbase-env.sh

export JAVA_HOME="/usr/lib/jvm/java-6-sun"
export HBASE_MANAGES_ZK=false

ZooKeeper is installed and run as a standalone service in step 8, so HBASE_MANAGES_ZK is set to false to keep HBase from starting and managing its own ZooKeeper instance.

6. Configure hbase-site.xml
$ sudo vim /etc/hbase/conf/hbase-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://hadoop-master:8020/hbase</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>hadoop-master,hadoop-node-1,hadoop-node-2</value>
</property>
</configuration>

hbase.cluster.distributed selects HBase's run mode: false for standalone, true for fully distributed.
hbase.rootdir is the directory shared by the region servers, where HBase persists its data.

hbase.zookeeper.quorum is the comma-separated list of hosts in the ZooKeeper ensemble.

Running a single ZooKeeper server is possible, but in production you should deploy 3, 5, or 7 nodes.
More servers means higher reliability; use an odd number, since an even number of servers tolerates no more failures than the next smaller odd number.
Give each ZooKeeper server about 1 GB of RAM and, if possible, a dedicated disk, to keep ZooKeeper fast.
If your cluster is heavily loaded, do not run ZooKeeper on the same machines as your RegionServers, DataNodes, and TaskTrackers.
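
The odd-number advice follows from ZooKeeper's majority quorum: an ensemble of n servers stays available only while a majority is alive, so it tolerates f = floor((n-1)/2) failures. A quick sketch of the arithmetic:

```shell
# Failures tolerated by a ZooKeeper ensemble of n servers:
# a majority (floor(n/2) + 1) must survive, hence f = (n - 1) / 2
# with integer division.
for n in 1 2 3 4 5 6 7; do
  echo "n=$n tolerates $(( (n - 1) / 2 )) failure(s)"
done
```

Note that n=4 tolerates no more failures than n=3, which is why even ensemble sizes buy nothing.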

7. Configure regionservers
$ sudo vim /etc/hbase/conf/regionservers

hadoop-node-1
hadoop-node-2

8. Install ZooKeeper
$ sudo apt-get install hadoop-zookeeper-server

$ sudo vim /etc/zookeeper/zoo.cfg

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/zookeeper
clientPort=2181
maxClientCnxns=0
server.1=hadoop-master:2888:3888
server.2=hadoop-node-1:2888:3888
server.3=hadoop-node-2:2888:3888

$ sudo mkdir /data/zookeeper
$ sudo chown zookeeper:zookeeper /data/zookeeper
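
The timing parameters in zoo.cfg are expressed in ticks: tickTime is the base unit in milliseconds, initLimit caps how long a follower may take for its initial sync with the leader, and syncLimit caps how far it may lag afterwards. With the values above:

```shell
# Values from the zoo.cfg above.
tickTime=2000    # base time unit, in milliseconds
initLimit=10     # ticks allowed for a follower's initial sync
syncLimit=5      # ticks a follower may lag behind the leader

echo "initial sync timeout: $(( initLimit * tickTime )) ms"   # 20000 ms
echo "lag timeout:          $(( syncLimit * tickTime )) ms"   # 10000 ms
```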

9. Create the myid files
dongguo@hadoop-master:~$ sudo -u zookeeper vim /data/zookeeper/myid

1

dongguo@hadoop-node-1:~$ sudo -u zookeeper vim /data/zookeeper/myid

2

dongguo@hadoop-node-2:~$ sudo -u zookeeper vim /data/zookeeper/myid

3
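
Each myid file must contain exactly the N from the matching server.N line in zoo.cfg. Instead of typing the three files by hand, the id can be derived from the config; myid_for_host below is a hypothetical helper, shown here against a copy of the server lines used in this article:

```shell
# Extract the ZooKeeper server id for a given host from
# zoo.cfg-style "server.N=host:2888:3888" lines.
myid_for_host() {
  conf="$1"; host="$2"
  sed -n "s/^server\.\([0-9]*\)=${host}:.*/\1/p" "$conf"
}

# Sample config matching the ensemble configured above.
cat > /tmp/zoo.cfg.example <<'EOF'
server.1=hadoop-master:2888:3888
server.2=hadoop-node-1:2888:3888
server.3=hadoop-node-2:2888:3888
EOF

myid_for_host /tmp/zoo.cfg.example hadoop-node-1   # prints 2
# On a real node one could then write, e.g.:
#   myid_for_host /etc/zookeeper/zoo.cfg "$(hostname)" | sudo -u zookeeper tee /data/zookeeper/myid
```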

10. Start the HBase and ZooKeeper services
On hadoop-master:
dongguo@hadoop-master:~$ sudo /etc/init.d/hadoop-zookeeper-server start
dongguo@hadoop-master:~$ sudo /etc/init.d/hadoop-hbase-master start
dongguo@hadoop-master:~$ sudo /etc/init.d/hadoop-hbase-thrift start

On the hadoop-node servers:
dongguo@hadoop-node-1:~$ sudo /etc/init.d/hadoop-zookeeper-server start
dongguo@hadoop-node-1:~$ sudo /etc/init.d/hadoop-hbase-regionserver start
dongguo@hadoop-node-2:~$ sudo /etc/init.d/hadoop-zookeeper-server start
dongguo@hadoop-node-2:~$ sudo /etc/init.d/hadoop-hbase-regionserver start

11. Check the status of the services
Via the web UI: http://10.6.1.150:60010

12. Practice HBase operations in the HBase Shell
dongguo@hadoop-master:~$ hbase shell

HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.90.4-cdh3u3, r, Thu Jan 26 10:13:36 PST 2012

# Create a table named test with a single column family, cf.
# We can then list tables to verify the creation, and insert some values.
hbase(main):001:0> create 'test','cf'
0 row(s) in 5.1280 seconds

hbase(main):002:0> list 'table'
TABLE
0 row(s) in 0.0540 seconds

hbase(main):003:0> put 'test','row1','cf:a','value 1'
0 row(s) in 0.5000 seconds

hbase(main):004:0> put 'test','row2','cf:b','value 2'
0 row(s) in 0.0180 seconds

hbase(main):005:0> put 'test','row3','cf:c','value 3'
0 row(s) in 0.0150 seconds


# Above we inserted three rows. The first has row key row1, column cf:a, and value "value 1".
# A column name in HBase consists of the column family prefix and a qualifier, separated by a colon; the qualifier of this column is a.

# To verify the inserts, scan the table:
hbase(main):006:0> scan 'test'
ROW                                   COLUMN+CELL
 row1                                 column=cf:a, timestamp=1349697932338, value=value 1
 row2                                 column=cf:b, timestamp=1349697945102, value=value 2
 row3                                 column=cf:c, timestamp=1349697953054, value=value 3
3 row(s) in 0.1250 seconds

# Get a single row:
hbase(main):007:0> get 'test','row1'
COLUMN                                CELL
 cf:a                                 timestamp=1349697932338, value=value 1
1 row(s) in 0.0220 seconds

# disable and then drop the table to undo what we just did:
hbase(main):008:0> disable 'test'
0 row(s) in 2.1400 seconds

hbase(main):009:0> drop 'test'
0 row(s) in 1.2590 seconds

# Close the shell
hbase(main):010:0> exit
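
The family:qualifier convention used above is a plain prefix split on the first colon, which is easy to mirror outside HBase (illustrative only):

```shell
# Split an HBase column name into column family and qualifier.
col="cf:a"
family="${col%%:*}"     # text before the first ':'  -> cf
qualifier="${col#*:}"   # text after the first ':'   -> a
echo "family=${family} qualifier=${qualifier}"
```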

13. Operate ZooKeeper via zookeeper-client
dongguo@hadoop-master:~$ zookeeper-client

[zk: localhost:2181(CONNECTED) 0] ls /
[hbase, zookeeper]
[zk: localhost:2181(CONNECTED) 1] ls /hbase
[splitlog, unassigned, rs, root-region-server, table, master, shutdown]
[zk: localhost:2181(CONNECTED) 2] get /hbase/rs

cZxid = 0x100000004
ctime = Mon Oct 08 19:31:27 CST 2012
mZxid = 0x100000004
mtime = Mon Oct 08 19:31:27 CST 2012
pZxid = 0x10000001c
cversion = 2
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 2
[zk: localhost:2181(CONNECTED) 4] get /hbase/master
hadoop-master:60000
cZxid = 0x100000007
ctime = Mon Oct 08 19:31:27 CST 2012
mZxid = 0x100000007
mtime = Mon Oct 08 19:31:27 CST 2012
pZxid = 0x100000007
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x23a4021bcb60000
dataLength = 19
numChildren = 0

14. This completes the HBase & Zookeeper setup.

15. Next, we can proceed with:
Hadoop Cluster Practice (3): Flume Setup
Hadoop Cluster Practice (4): Hive Setup
