Archive for January, 2014

2013 Annual Review

I still remember most of the conversation when 'Tom' interviewed me last February; we had a good talk about Linux and a lot of operations topics.
At that time, my life was not going well: I was working overtime until after 11pm every day. I was looking for a job that was challenging but not overwhelmingly busy, and I wanted more time with my family.

Now I believe I have found it. In the past year I have gained a lot from my work: the knowledge, the team, a better life and a trip to the US.
The knowledge I've learned:
1. Python
I had heard of Python but had not used it before. For our project, I learned Python in about two weeks, then wrote some useful tools to improve our operations. The most helpful tools are:
"", which makes it much easier to create local virtual machines; all the VMs (200+) in our new colo were created with it.
"", which makes it much easier to manage the server inventory; it collects server information and automatically audits it into RackTables along with the rack space.
2. Puppet, Salt, Ansible
I used Chef for IT automation before, but our project uses Puppet, so I learned it and wrote modules to install Hadoop, Graphite, StatsD and MooseFS. Then I learned Salt and Ansible and wrote modules that do the same things as the Puppet ones. I also learned the HA solution for Puppet servers.
3. Hadoop
I had experience with Hadoop operations, but only on small clusters. Our project has two big clusters; I learned many troubleshooting skills from our team and the runbooks, and helped to fix some serious incidents when I was on call. I also learned how to integrate LDAP/Kerberos with Hadoop and how to upgrade CDH3 to CDH4 and Impala.
4. Nagios
I used Zabbix as a monitoring system before, but our project uses Nagios. I learned how Nagios works, then wrote some scripts to check new services, such as Dyn QPS reports, disk errors, web API connections, time servers and DNS.
5. Database
I learned MySQL auto-failover and Percona XtraDB Cluster to improve high availability. I also improved the backup scripts and fixed a backup issue.
6. Others
I also learned many other interesting things, such as MooseFS, DRBD, rpmbuild, Jenkins CI, Couchbase, Redis HA and BTSync.

The team I've gained:
1. A good team leader
'Tom' has a broad view, leads us to learn new technologies and improve our operations work, and is open to suggestions. So we can enjoy the work and improve our skills.
2. Warm-hearted co-workers
Our team is small, but it's warm. We learn from each other, and we help each other. Especially at the beginning of my on-call time, 'Jack' helped me a lot and was very patient. I love this team.

A better life I've gained:
I have more time to spend with my family. Before, when I was single, I thought that if I got married I wouldn't work as hard because I would have to take care of my family. But I was totally wrong: now I have the responsibility to work harder to make sure I can give them a better life.

A trip to the US:
Maybe this one is quite normal for many people, but it was amazing for me. Working in the US with you guys for half a month, I experienced many different things: the culture, the company, the people. In China, only a few companies are like that.

2013 was a great year for me, but I know clearly that I didn't do well in some areas:
1. Hadoop cluster operation
When the Hadoop clusters have incidents involving our log systems and Oozie workflows, I find it difficult to identify the root causes. Sometimes the runbooks don't cover all situations, and if I don't understand the systems very well, I can't resolve the incidents. I need to learn more about these two parts.
2. Suggestions for operation
I should bring more useful suggestions, not just follow the tickets, emails and on-call duties. For example, compared with Zabbix, the graphing tools in Nagios are poor. If something is better, I should learn more about it and push for it to improve our project.

Thanks. I will keep enjoying the work, work harder, and keep learning to make our company better and my life better.
