Articles tagged 'Oozie'
Sharing an Oozie Job Debug Script
Posted by mcsrainbow in Linux&Unix, 2016/05/24
References:
https://oozie.apache.org/docs/4.0.0/WebServicesAPI.html
Background:
Our production Hadoop cluster uses Oozie for workflow management, and quite a few workflows fail during execution for various reasons.
We used to troubleshoot these through the Oozie Web Console, but the whole process is tedious. After studying the Oozie API, I wrote a script that automates most of the troubleshooting steps for us.
Details:
The troubleshooting logic the script emulates is as follows:
1. Fetch the information of every action in the workflow; the common statuses are: OK, RUNNING, FAILED, KILLED, ERROR
2. For actions in FAILED, KILLED, or ERROR status, first fetch the consoleUrl, then the more valuable logsLinks; print the related debug information and export the action's XML configuration file
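The two-step logic above maps onto the Oozie Web Services API, where GET /oozie/v1/job/&lt;job-id&gt;?show=info returns the workflow's actions. A minimal Python sketch of the filtering step, using a made-up sample payload instead of a live API call:

```python
# Sketch of the triage logic above, applied to a job-info payload shaped
# like the one returned by the Oozie Web Services API
# (GET /oozie/v1/job/<job-id>?show=info). The sample data is made up.

def failed_actions(job_info):
    """Return the actions whose status needs troubleshooting."""
    bad = {"FAILED", "KILLED", "ERROR"}
    return [a for a in job_info.get("actions", []) if a.get("status") in bad]

sample = {
    "id": "0061222-160121234010195-oozie-oozi-W",
    "actions": [
        {"name": ":start:", "status": "OK"},
        {"name": "extract-web-profiles", "status": "OK"},
        {"name": "extract-nobid-profiles", "status": "ERROR",
         "consoleUrl": "http://idc1-rm1.heylinux.com:8100/proxy/application_1458783473169_227279"},
    ],
}

for action in failed_actions(sample):
    # For each bad action, the real script then follows consoleUrl to the
    # logsLinks and exports the action's XML configuration.
    print(action["name"], action["status"], action.get("consoleUrl"))
```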
Script: https://github.com/mcsrainbow/python-demos/blob/master/demos/debug_oozie_job.py
Example run:
[dong@idc1-server1 ~]$ debug_oozie_job.py --server idc1-hive1 --job_id 0011096-160121234010195-oozie-oozi-C@2387
##################################
externalId: 0061222-160121234010195-oozie-oozi-W
status: 'OK', name: 'fork-1'
status: 'OK', name: ':start:'
status: 'OK', name: 'check-point'
status: 'OK', name: 'daily-decision'
status: 'OK', name: 'extract-labelpair-profiles'
status: 'OK', name: 'extract-web-profiles'
status: 'ERROR', name: 'extract-nobid-profiles'
consoleUrl: 'http://idc1-rm1.heylinux.com:8100/proxy/application_1458783473169_227279'
logsLinks:
http://idc1-rm1.heylinux.com:19888/jobhistory/logs/idc1-node1.heylinux.com:43483/container_1458783473169_227279_01_000002/attempt_1458783473169_227279_m_000000_0/oozie
*DEBUG*:
status: 'ERROR'
retries: '0'
transition: 'email-error'
stats: 'None'
startTime: 'Tue, 24 May 2016 02:50:19 GMT'
toString: 'Action name[extract-nobid-profiles] status[ERROR]'
cred: 'null'
errorMessage: 'None'
errorCode: 'None'
consoleUrl: 'http://idc1-rm1.heylinux.com:8100/proxy/application_1458783473169_227279'
externalId: 'job_1458783473169_227279'
externalStatus: 'FAILED/KILLED'
conf: '/tmp/0011096-160121234010195-oozie-oozi-C@2387_extract-nobid-profiles.xml'
type: 'map-reduce'
trackerUri: 'idc1-rm1:8032'
externalChildIDs: ''
endTime: 'Tue, 24 May 2016 03:24:42 GMT'
data: 'None'
id: '0061222-160121234010195-oozie-oozi-W@extract-nobid-profiles'
name: 'extract-nobid-profiles'
status: 'OK', name: 'extract-data-profiles'
status: 'OK', name: 'extract-optout-profiles'
status: 'OK', name: 'fail'
status: 'OK', name: 'email-error'
##################################
Please check the URLs in "logsLinks" above for detailed information.
Do NOT ignore the messages in "Log Type: stdout".
Hadoop Ops Notes: Upgrading from CDH5.0.0 to CDH5.3.0
Posted by mcsrainbow in Corporation, Linux&Unix, 2015/05/08
References:
Hadoop: http://www.cloudera.com/content/cloudera/en/documentation/core/v5-3-x/topics/cdh_ig_earlier_cdh5_upgrade.html?scroll=topic_8
Oozie: http://www.cloudera.com/content/cloudera/en/documentation/core/v5-3-x/topics/cdh_ig_oozie_upgrade.html
Hive: http://www.cloudera.com/content/cloudera/en/documentation/core/v5-3-x/topics/cdh_ig_hive_upgrade.html
Pig: http://www.cloudera.com/content/cloudera/en/documentation/core/v5-3-x/topics/cdh_ig_pig_upgrade.html
1. Stop Monit on all Hadoop servers (we use Monit in production to monitor processes)
Log in to idc2-admin1 (our production admin host, which also serves as the Yum repo server)
# mkdir /root/cdh530_upgrade_from_500
# cd /root/cdh530_upgrade_from_500
# pssh -i -h idc2-hnn-rm-hive 'service monit stop'
# pssh -i -h idc2-hmr.active 'service monit stop'
2. Confirm that the local CDH5.3.0 Yum repo server is ready
http://idc2-admin1/repo/cdh/5.3.0/
http://idc2-admin1/repo/cloudera-gplextras5.3.0/
3. Update the corresponding repo template in Ansible (we use Ansible as our configuration management tool)
{% if "idc2" in group_names %}
...
{% if "cdh5-all" in group_names %}
[heylinux.el6.cloudera-cdh5.3.0]
name=el6 yum cloudera cdh5.3.0
baseurl=http://idc2-admin1/repo/cdh/5.3.0
enabled=1
gpgcheck=0
[heylinux.el6.cloudera-gplextras5.3.0]
name=el6 yum cloudera gplextras5.3.0
baseurl=http://idc2-admin1/repo/cloudera-gplextras5.3.0
enabled=1
gpgcheck=0
{% else %}
...
{% endif %}
4. Update the repo file (/etc/yum.repos.d/heylinux.repo) on all Hadoop servers
# ansible-playbook --private-key /path/to/key_root -u root --vault-password-file=/path/to/vault_passwd.file base.yml -i hosts.idc2 --tags localrepos --limit cdh5-all
5. Upgrade HDFS
5.1. Find the current Active NameNode (on our production DNS servers we maintain a CNAME that always checks for and points to the Active NameNode)
# host active-idc2-hnn
active-idc2-hnn.heylinux.com is an alias for idc2-hnn2.heylinux.com
idc2-hnn2.heylinux.com has address 172.16.2.12
5.2. On the Active NameNode, enter safe mode, save a new fsimage, and wait for the whole process to finish.
# sudo -u hdfs hdfs dfsadmin -safemode enter
# sudo -u hdfs hdfs dfsadmin -saveNamespace
5.3. Stop all Hadoop services
Go back to the working directory on idc2-admin1
# cd /root/cdh530_upgrade_from_500
Use pssh to batch-check and then stop the Hadoop processes on the NameNode, ResourceManager, and Hive servers (write the corresponding server address or hostname lists into the files idc2-hnn-rm-hive and idc2-hmr.active)
# pssh -i -h idc2-hnn-rm-hive 'for x in `cd /etc/init.d ; ls hadoop-*` ; do sudo service $x status ; done'
# pssh -i -h idc2-hmr.active 'for x in `cd /etc/init.d ; ls hadoop-*` ; do sudo service $x status ; done'
# pssh -i -h idc2-hnn-rm-hive 'for x in `cd /etc/init.d ; ls hadoop-*` ; do sudo service $x stop ; done'
# pssh -i -h idc2-hmr.active 'for x in `cd /etc/init.d ; ls hadoop-*` ; do sudo service $x stop ; done'
Check for a libhadoop.so that conflicts with the new version, and delete it if present (we have Snappy installed in production; it builds its own libhadoop.so, which conflicts with the one shipped in CDH5.3.0, and drops it into the current JDK lib directory).
# pssh -i -h idc2-hnn-rm-hive 'rm -f /usr/java/jdk1.7.0_45/jre/lib/amd64/libhadoop.so'
# pssh -i -h idc2-hmr.active 'rm -f /usr/java/jdk1.7.0_45/jre/lib/amd64/libhadoop.so'
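Before running the yum upgrades, it can be worth confirming that no stray copy remains. A small sketch (the JDK path and layout are from our environment, not universal) that lists any libhadoop.so left under a JDK tree:

```python
# A pre-upgrade check (a sketch, assuming our JDK layout) that reports
# any libhadoop.so left under a JDK directory tree, where the Snappy
# install drops its conflicting copy.
import glob
import os

def stray_libhadoop(jdk_root):
    """Return paths of libhadoop.so files found anywhere under jdk_root."""
    pattern = os.path.join(jdk_root, "**", "libhadoop.so")
    return sorted(glob.glob(pattern, recursive=True))

if __name__ == "__main__":
    for path in stray_libhadoop("/usr/java/jdk1.7.0_45"):
        print("conflicting library found:", path)
```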
Back up the HDFS metadata on the NameNodes (our production HA pair consists of two NameNodes, idc2-hnn1 and idc2-hnn2):
# mkdir /root/cdh530upgrade
# cd /root/cdh530upgrade
# tar -cf /root/nn_backup_data.data1.`date +%Y%m%d`.tar /data1/dfs/nn
# tar -cf /root/nn_backup_data.data2.`date +%Y%m%d`.tar /data2/dfs/nn
6. Upgrade the Hadoop packages
Log in to the Hive server idc2-hive1 and upgrade it
# yum clean all; yum upgrade hadoop
Log in to the ResourceManager servers idc2-rm1 and idc2-rm2 and upgrade them
# yum clean all; yum upgrade hadoop
Go back to idc2-admin1 and upgrade all the DataNode servers idc2-hmr*
# pssh -i -h idc2-hmr.active 'yum clean all; yum upgrade hadoop hadoop-lzo -y'
Log in to idc2-hnn1 (the Standby NameNode, as determined earlier by the host active-idc2-hnn command) and upgrade it
# yum clean all; yum upgrade hadoop hadoop-lzo
Log in to idc2-hnn2 (the Active NameNode, as determined earlier by the host active-idc2-hnn command) and upgrade it
# yum clean all; yum upgrade hadoop hadoop-lzo
Go back to idc2-admin1 and upgrade all the Hadoop clients
# pssh -i -h idc2-client 'yum clean all; yum upgrade hadoop -y'
7. Start the services
Log in to the JournalNode servers (idc2-hnn1, idc2-hnn2, and idc2-rm1 in our production environment) and start the service
# service hadoop-hdfs-journalnode start
Log in to every DataNode (the idc2-hmr* servers in production) and start the service
# service hadoop-hdfs-datanode start
Log in to the Active NameNode and upgrade the HDFS metadata
# service hadoop-hdfs-namenode upgrade
# tailf /var/log/hadoop/hadoop-hdfs-namenode-`hostname -s`.heylinux.com.log
Wait until the whole process finishes, i.e. until a line like the following appears in the log:
/var/lib/hadoop-hdfs/cache/hadoop/dfs/<name> is complete.
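Rather than watching tailf by hand, the wait can be scripted by scanning the NameNode log for that completion marker. A sketch of just the matching logic (the exact log wording may vary by release, so treat the marker string as an assumption; reading from the live log file is left out):

```python
# Sketch of waiting for the metadata upgrade: scan NameNode log lines
# for the "... is complete." marker quoted above.

def upgrade_finished(log_lines):
    """True once the fsimage upgrade completion marker has appeared."""
    return any("is complete" in line for line in log_lines)

lines = [
    "INFO namenode.FSImage: Saving image file ...",
    "INFO namenode.FSImage: Upgrade of /var/lib/hadoop-hdfs/cache/hadoop/dfs/name is complete.",
]
print(upgrade_finished(lines))
```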
Wait until the NameNode has left Safe Mode, then restart the Standby NameNode
Log in to the Standby NameNode and restart the service
# sudo -u hdfs hdfs namenode -bootstrapStandby
# service hadoop-hdfs-namenode start
Log in to every ResourceManager and start the service
# service hadoop-yarn-resourcemanager start
Log in to every NodeManager (the idc2-hmr* servers in production) and start the service
# service hadoop-yarn-nodemanager start
Start the HistoryServer on the Active ResourceManager (idc2-rm1 in production)
# service hadoop-mapreduce-historyserver start
This completes the Hadoop upgrade itself; the sections below cover upgrading Hive, Oozie, and Pig.
8. Upgrade the Hive and Oozie servers (in production both are installed on a single host, idc2-hive1)
8.1. Upgrade the Hive server
Back up the Metastore database
# mkdir -p /root/backupfiles/hive
# cd /root/backupfiles/hive
# mysqldump -uoozie -pheylinux metastore > metastore.sql.bak.`date +%Y%m%d`
Update hive-site.xml and confirm the following settings are present:
<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>false</value>
</property>
<property>
  <name>datanucleus.fixedDatastore</name>
  <value>true</value>
</property>
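To verify those two settings mechanically rather than by eye, a small sketch that parses Hadoop-style configuration XML (the inline sample stands in for the real hive-site.xml):

```python
# Sketch that verifies the two datanucleus settings above are present in
# a Hadoop-style configuration file. The inline sample stands in for the
# real hive-site.xml.
import xml.etree.ElementTree as ET

REQUIRED = {
    "datanucleus.autoCreateSchema": "false",
    "datanucleus.fixedDatastore": "true",
}

def check_settings(xml_text):
    """Return the required settings that are missing or mis-set."""
    root = ET.fromstring(xml_text)
    found = {p.findtext("name"): p.findtext("value")
             for p in root.iter("property")}
    return {k: v for k, v in REQUIRED.items() if found.get(k) != v}

sample = """<configuration>
  <property><name>datanucleus.autoCreateSchema</name><value>false</value></property>
  <property><name>datanucleus.fixedDatastore</name><value>true</value></property>
</configuration>"""

print(check_settings(sample))  # an empty dict means both settings are correct
```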
Stop the Hive services
# service hive-server2 stop
# service hive-metastore stop
Upgrade the Hive packages
# yum upgrade hive hive-metastore hive-server2 hive-jdbc
# yum install hive-hbase hive-hcatalog hive-webhcat
Upgrade the Hive Metastore schema
# sudo -u oozie /usr/lib/hive/bin/schematool -dbType mysql -upgradeSchemaFrom 0.12.0
Start the Hive services
# service hive-metastore start
# service hive-server2 start
8.2. Upgrade the Oozie server
Back up the Oozie database
# mkdir -p /root/backupfiles/hive
# cd /root/backupfiles/hive
# mysqldump -uoozie -pheylinux oozie > oozie.sql.bak.`date +%Y%m%d`
Back up the Oozie configuration files
# tar cf oozie.conf.bak.`date +%Y%m%d` /etc/oozie/conf
Stop Oozie
# service oozie stop
Upgrade the Oozie packages
# yum upgrade oozie oozie-client
Carefully compare the parameters in the new configuration files against those in the old ones, and port the values from the old files into the new ones
Back up the Oozie lib directory
# tar cf oozie.lib.bak.`date +%Y%m%d` /var/lib/oozie
Upgrade the Oozie database
# sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh upgrade -run
Upgrade the Oozie Shared Library
# sudo -u oozie hadoop fs -mv /user/oozie/share /user/oozie/share.orig.`date +%Y%m%d`
# sudo oozie-setup sharelib create -fs hdfs://idc1-hnn2:8020 -locallib /usr/lib/oozie/oozie-sharelib-yarn.tar.gz
Move all libraries from /user/oozie/share/lib/lib_<new_date_string> to /user/oozie/share/lib (<new_date_string> is the timestamp in the name of the generated directory)
# sudo -u oozie mv /user/oozie/share/lib/lib_<new_date_string>/* /user/oozie/share/lib/
Check every file under /user/oozie/share in HDFS and compare it one by one against the files in the backed-up share.orig.`date +%Y%m%d`; for packages whose version string contains "cdh5", keep only the newer copy, and copy everything else into the new lib directory.
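The merge rule above can be sketched over plain file-name lists; the jar names below are hypothetical:

```python
# Sketch of the sharelib merge rule above, applied to file-name lists:
# for jars with "cdh5" in the name, keep only the copy already shipped in
# the new lib directory; carry everything else over from the old backup.
# The file names below are hypothetical.

def merge_sharelib(old_files, new_files):
    """Return the final contents of the new lib directory."""
    result = list(new_files)
    for name in old_files:
        if "cdh5" in name:
            continue  # the new lib already ships an updated cdh5 build
        if name not in result:
            result.append(name)  # custom/extra jars are copied over
    return sorted(result)

old = ["hive-exec-0.12.0-cdh5.0.0.jar", "mysql-connector-java.jar"]
new = ["hive-exec-0.13.1-cdh5.3.0.jar"]
print(merge_sharelib(old, new))
```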
Start the Oozie server
# service oozie start
9. Upgrade Pig
Kill all running Pig processes
# pkill -kill -f pig
Upgrade the Pig packages
# yum upgrade pig
10. Once all packages have been upgraded and HDFS is working properly, run the finalizeUpgrade command to wrap up
Log in to the Active NameNode and run the following command
# sudo -u hdfs hdfs dfsadmin -finalizeUpgrade
Hadoop Cluster (CDH4) Practice: (4) Setting up Oozie
Posted by mcsrainbow in Corporation, Linux&Unix, 2013/12/02
Table of Contents
Hadoop Cluster (CDH4) Practice: (0) Preface
Hadoop Cluster (CDH4) Practice: (1) Setting up Hadoop (HDFS)
Hadoop Cluster (CDH4) Practice: (2) Setting up HBase & ZooKeeper
Hadoop Cluster (CDH4) Practice: (3) Setting up Hive
Hadoop Cluster (CDH4) Practice: (4) Setting up Oozie
Hadoop Cluster (CDH4) Practice: (5) Installing Sqoop
This Article
Hadoop Cluster (CDH4) Practice: (4) Setting up Oozie
Environment
OS: CentOS 6.4 x86_64
Servers:
hadoop-master: 172.17.20.230, 10GB RAM
- namenode
- hbase-master
hadoop-secondary: 172.17.20.234, 10GB RAM
- secondarybackupnamenode,jobtracker
- hive-server,hive-metastore
- oozie
hadoop-node-1: 172.17.20.231, 10GB RAM
- datanode,tasktracker
- hbase-regionserver,zookeeper-server
hadoop-node-2: 172.17.20.232, 10GB RAM
- datanode,tasktracker
- hbase-regionserver,zookeeper-server
hadoop-node-3: 172.17.20.233, 10GB RAM
- datanode,tasktracker
- hbase-regionserver,zookeeper-server
A brief description of the roles above:
namenode - the namespace management service for the whole of HDFS
secondarynamenode - can be seen as a redundant backup of the namenode
jobtracker - the job management service for parallel computation
datanode - the HDFS storage node service
tasktracker - the job execution service for parallel computation
hbase-master - the HBase management service
hbase-regionserver - serves client inserts, deletes, queries, and so on
zookeeper-server - the ZooKeeper coordination and configuration management service
hive-server - the Hive management service
hive-metastore - the Hive metastore, which performs type checking and syntax analysis on metadata
oozie - Oozie is a Java web application for defining and managing workflows
Conventions used in this article, to avoid confusion when configuring multiple servers:
All of the steps below only need to be performed on the Oozie host, i.e. hadoop-secondary.
1. Preparation
Hadoop Cluster (CDH4) Practice: (3) Setting up Hive
2. Install Oozie
$ sudo yum install oozie oozie-client
3. Create the Oozie database
$ mysql -uroot -phiveserver
mysql> create database oozie;
mysql> grant all privileges on oozie.* to 'oozie'@'localhost' identified by 'oozie';
mysql> grant all privileges on oozie.* to 'oozie'@'%' identified by 'oozie';
mysql> exit;
4. Configure oozie-site.xml
$ sudo vim /etc/oozie/conf/oozie-site.xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>oozie.service.ActionService.executor.ext.classes</name>
    <value>
      org.apache.oozie.action.email.EmailActionExecutor,
      org.apache.oozie.action.hadoop.HiveActionExecutor,
      org.apache.oozie.action.hadoop.ShellActionExecutor,
      org.apache.oozie.action.hadoop.SqoopActionExecutor,
      org.apache.oozie.action.hadoop.DistcpActionExecutor
    </value>
  </property>
  <property>
    <name>oozie.service.SchemaService.wf.ext.schemas</name>
    <value>shell-action-0.1.xsd,shell-action-0.2.xsd,email-action-0.1.xsd,hive-action-0.2.xsd,hive-action-0.3.xsd,hive-action-0.4.xsd,hive-action-0.5.xsd,sqoop-action-0.2.xsd,sqoop-action-0.3.xsd,ssh-action-0.1.xsd,ssh-action-0.2.xsd,distcp-action-0.1.xsd</value>
  </property>
  <property>
    <name>oozie.system.id</name>
    <value>oozie-${user.name}</value>
  </property>
  <property>
    <name>oozie.systemmode</name>
    <value>NORMAL</value>
  </property>
  <property>
    <name>oozie.service.AuthorizationService.security.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>oozie.service.PurgeService.older.than</name>
    <value>30</value>
  </property>
  <property>
    <name>oozie.service.PurgeService.purge.interval</name>
    <value>3600</value>
  </property>
  <property>
    <name>oozie.service.CallableQueueService.queue.size</name>
    <value>10000</value>
  </property>
  <property>
    <name>oozie.service.CallableQueueService.threads</name>
    <value>10</value>
  </property>
  <property>
    <name>oozie.service.CallableQueueService.callable.concurrency</name>
    <value>3</value>
  </property>
  <property>
    <name>oozie.service.coord.normal.default.timeout</name>
    <value>120</value>
  </property>
  <property>
    <name>oozie.db.schema.name</name>
    <value>oozie</value>
  </property>
  <property>
    <name>oozie.service.JPAService.create.db.schema</name>
    <value>true</value>
  </property>
  <property>
    <name>oozie.service.JPAService.jdbc.driver</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>oozie.service.JPAService.jdbc.url</name>
    <value>jdbc:mysql://localhost:3306/oozie</value>
  </property>
  <property>
    <name>oozie.service.JPAService.jdbc.username</name>
    <value>oozie</value>
  </property>
  <property>
    <name>oozie.service.JPAService.jdbc.password</name>
    <value>oozie</value>
  </property>
  <property>
    <name>oozie.service.JPAService.pool.max.active.conn</name>
    <value>10</value>
  </property>
  <property>
    <name>oozie.service.HadoopAccessorService.kerberos.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>local.realm</name>
    <value>LOCALHOST</value>
  </property>
  <property>
    <name>oozie.service.HadoopAccessorService.keytab.file</name>
    <value>${user.home}/oozie.keytab</value>
  </property>
  <property>
    <name>oozie.service.HadoopAccessorService.kerberos.principal</name>
    <value>${user.name}/localhost@${local.realm}</value>
  </property>
  <property>
    <name>oozie.service.HadoopAccessorService.jobTracker.whitelist</name>
    <value> </value>
  </property>
  <property>
    <name>oozie.service.HadoopAccessorService.nameNode.whitelist</name>
    <value> </value>
  </property>
  <property>
    <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
    <value>*=/etc/hadoop/conf</value>
  </property>
  <property>
    <name>oozie.service.WorkflowAppService.system.libpath</name>
    <value>/user/${user.name}/share/lib</value>
  </property>
  <property>
    <name>use.system.libpath.for.mapreduce.and.pig.jobs</name>
    <value>false</value>
  </property>
  <property>
    <name>oozie.authentication.type</name>
    <value>simple</value>
  </property>
  <property>
    <name>oozie.authentication.token.validity</name>
    <value>36000</value>
  </property>
  <property>
    <name>oozie.authentication.signature.secret</name>
    <value>oozie</value>
  </property>
  <property>
    <name>oozie.authentication.cookie.domain</name>
    <value></value>
  </property>
  <property>
    <name>oozie.authentication.simple.anonymous.allowed</name>
    <value>true</value>
  </property>
  <property>
    <name>oozie.authentication.kerberos.principal</name>
    <value>HTTP/localhost@${local.realm}</value>
  </property>
  <property>
    <name>oozie.authentication.kerberos.keytab</name>
    <value>${oozie.service.HadoopAccessorService.keytab.file}</value>
  </property>
  <property>
    <name>oozie.authentication.kerberos.name.rules</name>
    <value>DEFAULT</value>
  </property>
  <property>
    <name>oozie.service.ProxyUserService.proxyuser.oozie.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>oozie.service.ProxyUserService.proxyuser.oozie.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>oozie.service.ProxyUserService.proxyuser.hue.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>oozie.service.ProxyUserService.proxyuser.hue.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>oozie.action.mapreduce.uber.jar.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
    <value>hdfs,viewfs</value>
  </property>
</configuration>
5. Configure the Oozie Web Console
$ cd /tmp/
$ wget http://archive.cloudera.com/gplextras/misc/ext-2.2.zip
$ cd /var/lib/oozie/
$ sudo unzip /tmp/ext-2.2.zip
$ cd ext-2.2/
$ sudo -u hdfs hadoop fs -mkdir /user/oozie
$ sudo -u hdfs hadoop fs -chown oozie:oozie /user/oozie
6. Configure the Oozie ShareLib
$ mkdir /tmp/ooziesharelib
$ cd /tmp/ooziesharelib
$ tar xzf /usr/lib/oozie/oozie-sharelib.tar.gz
$ sudo -u oozie hadoop fs -put share /user/oozie/share
$ sudo -u oozie hadoop fs -ls /user/oozie/share
$ sudo -u oozie hadoop fs -ls /user/oozie/share/lib
$ sudo -u oozie hadoop fs -put /usr/lib/hive/lib/hbase.jar /user/oozie/share/lib/hive/
$ sudo -u oozie hadoop fs -put /usr/lib/hive/lib/zookeeper.jar /user/oozie/share/lib/hive/
$ sudo -u oozie hadoop fs -put /usr/lib/hive/lib/hive-hbase-handler-0.10.0-cdh4.5.0.jar /user/oozie/share/lib/hive/
$ sudo -u oozie hadoop fs -put /usr/lib/hive/lib/guava-11.0.2.jar /user/oozie/share/lib/hive/
$ sudo ln -s /usr/share/java/mysql-connector-java.jar /var/lib/oozie/mysql-connector-java.jar
7. Start Oozie
$ sudo service oozie start
8. Access the Oozie Web Console
http://hadoop-secondary:11000/oozie
9. That completes the Oozie setup.