Installing Hadoop/Hive on Ubuntu

If you are a Hive noob like me, this post may be of use to you. I wanted to install Hadoop and Hive on my Ubuntu box (a Dell) so that I can run Hive commands (hive -e "") from Eclipse before I commit my Python scripts.
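For context, this is roughly what calling hive -e from a Python script looks like. It is a minimal sketch, not the scripts I actually commit: the helper names build_hive_command and run_hive are made up for illustration, and it assumes the hive binary is on the PATH.

```python
import subprocess


def build_hive_command(query, hive_bin="hive"):
    """Return the argument list for running one HiveQL statement via `hive -e`."""
    # hive_bin is an assumption: it expects `hive` to be reachable on the PATH.
    return [hive_bin, "-e", query]


def run_hive(query, hive_bin="hive"):
    """Run the query through the Hive CLI and return its standard output as text.

    Raises subprocess.CalledProcessError if hive exits with a non-zero status.
    """
    result = subprocess.run(
        build_hive_command(query, hive_bin),
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # turn a failed hive run into an exception
    )
    return result.stdout


# Example call (needs a working Hive install, so it is left commented out):
# print(run_hive("SHOW TABLES;"))
```

Passing the command as a list rather than a single string avoids shell quoting problems when the query itself contains quotes.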

Installing Hadoop and Hive is pretty straightforward, unless you are someone who always makes the wrong choices (like me!). I edited the wrong config file and spent considerable time figuring this simple thing out.

Installing Hadoop:

Installing Hive:

Following the above instructions, I was able to install Hive, except for the following step.

As mentioned in the above post, copying the lib directory from hive-0.12.0.tar.gz to the $HIVE_HOME directory (/opt/hive in my case) solved the problem.


By default, Hive stores its metadata in a Derby database. That is apparently good enough, but strangely it writes the derby.log file and the metastore_db directory in whichever directory we start the Hive shell from. There might be ways to fix this, but I decided to get rid of Derby and use MySQL instead.

Configuring Hive with MySQL:

If you follow the instructions above, you should be fine. Just make sure you edit the correct hive-site.xml file.

On my box,

hduser@learningbox:~$ locate hive-site.xml








As mentioned in the above blog post, editing /opt/hive/conf/hive-site.xml gets the job done, unless you edited another file, like I did.
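One way to check which copy of hive-site.xml actually carries the MySQL settings is to read the metastore connection property out of each candidate file. The sketch below is only an illustration: read_hive_property is a name I made up, and it assumes the usual layout of a configuration root with property children, each holding a name and a value. javax.jdo.option.ConnectionURL is the standard Hive property for the metastore JDBC URL.

```python
import xml.etree.ElementTree as ET


def read_hive_property(path, name):
    """Return the value of the property called `name` in a hive-site.xml file.

    Returns None if the file does not define that property.
    """
    root = ET.parse(path).getroot()
    # Assumes <configuration><property><name>..</name><value>..</value></property>...
    for prop in root.findall("property"):
        prop_name = (prop.findtext("name") or "").strip()
        if prop_name == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Example call against the file that turned out to be the right one on my box:
# print(read_hive_property("/opt/hive/conf/hive-site.xml",
#                          "javax.jdo.option.ConnectionURL"))
```

If the value printed is a Derby URL or None, that file is not the one carrying the MySQL metastore settings.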

Other Notes:

JSON serde.

Creating a table with a JSON SerDe and inserting data into it turned out to be frustrating. The solution that worked was to create the table, move the data to HDFS, and then add the partition.
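The three steps can be sketched as a small Python helper that builds the statements in order. Everything specific here is an assumption for illustration and not what I actually ran: the helper name json_table_steps, the id/name columns, the dt partition column, the use of an external table, the paths, and the SerDe class (org.openx.data.jsonserde.JsonSerDe is one commonly used JSON SerDe, and its jar has to be added to Hive first). The hdfs dfs commands assume a Hadoop 2 style CLI.

```python
def json_table_steps(table, local_file, hdfs_dir, dt,
                     serde="org.openx.data.jsonserde.JsonSerDe"):
    """Build the create-table, move-to-HDFS, add-partition steps in order.

    Returns a list of (kind, command) pairs, where kind is "hive" for a
    HiveQL statement and "shell" for an argument list to run in a shell.
    """
    partition_dir = f"{hdfs_dir}/dt={dt}"

    # Step 1: create the table (hypothetical columns, partitioned by dt).
    create = (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (id INT, name STRING) "
        f"PARTITIONED BY (dt STRING) "
        f"ROW FORMAT SERDE '{serde}' "
        f"LOCATION '{hdfs_dir}'"
    )

    # Step 2: move the JSON file into the partition's directory on HDFS.
    mkdir = ["hdfs", "dfs", "-mkdir", "-p", partition_dir]
    put = ["hdfs", "dfs", "-put", local_file, partition_dir]

    # Step 3: tell the metastore about the partition.
    add_partition = (
        f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION (dt='{dt}') "
        f"LOCATION '{partition_dir}'"
    )

    return [("hive", create), ("shell", mkdir), ("shell", put), ("hive", add_partition)]


# Example: print the steps for a hypothetical table and file.
for kind, command in json_table_steps("events", "/tmp/events.json",
                                      "/user/hduser/events", "2014-01-01"):
    print(kind, command)
```

The point of this ordering is that the data never goes through an INSERT: Hive only reads files that are already sitting in the partition directory.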

I also faced this issue in Hive 0.13.

Papernotes: Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm

Oops, I forgot to save my notes as a draft. I will come back to this later, since it seems to be a very important paper, and one I found hard to grasp.

These are my notes on the following legendary paper by Dawid and Skene.


Someone has implemented Dawid and Skene’s example problem (patients), with excellent comments.

Another implementation is available on pypi.

DS is so useful that someone tried to offer it as a service, though it doesn’t seem to be working now.
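Until I rewrite my notes, here is a small pure-Python sketch of the EM loop as I understand it from the paper. It is not any of the implementations mentioned above. The function name dawid_skene, the input format, the fixed iteration count, and the small smoothing constant (added only to avoid log(0)) are my own choices, and it assumes every item has at least one label.

```python
import math


def dawid_skene(responses, n_classes, n_iter=50, smoothing=1e-6):
    """Estimate true classes and observer error rates with EM.

    responses: {item: [(observer, label), ...]}, labels in range(n_classes).
    Returns (posteriors, priors, error_rates) where
      posteriors[item][j]       = P(true class of item is j),
      priors[j]                 = estimated share of class j,
      error_rates[obs][j][l]    = P(obs reports l | true class is j).
    """
    items = list(responses)
    observers = {obs for votes in responses.values() for obs, _ in votes}

    # Initialise the posteriors with each item's share of votes per class.
    post = {}
    for i in items:
        counts = [0.0] * n_classes
        for _, label in responses[i]:
            counts[label] += 1.0
        total = sum(counts)
        post[i] = [c / total for c in counts]

    for _ in range(n_iter):
        # M-step: class priors and one confusion matrix per observer.
        priors = [sum(post[i][j] for i in items) / len(items)
                  for j in range(n_classes)]
        rates = {obs: [[smoothing] * n_classes for _ in range(n_classes)]
                 for obs in observers}
        for i in items:
            for obs, label in responses[i]:
                for j in range(n_classes):
                    rates[obs][j][label] += post[i][j]
        for obs in observers:
            for j in range(n_classes):
                row_sum = sum(rates[obs][j])
                rates[obs][j] = [v / row_sum for v in rates[obs][j]]

        # E-step: posterior over each item's true class, in log space.
        for i in items:
            log_post = [math.log(priors[j] + smoothing) for j in range(n_classes)]
            for obs, label in responses[i]:
                for j in range(n_classes):
                    log_post[j] += math.log(rates[obs][j][label])
            peak = max(log_post)
            weights = [math.exp(v - peak) for v in log_post]
            total = sum(weights)
            post[i] = [w / total for w in weights]

    return post, priors, rates


# Toy data (made up): observers o1 and o2 always agree, o3 is noisy.
toy = {
    "a": [("o1", 0), ("o2", 0), ("o3", 1)],
    "b": [("o1", 1), ("o2", 1), ("o3", 1)],
    "c": [("o1", 0), ("o2", 0), ("o3", 0)],
    "d": [("o1", 1), ("o2", 1), ("o3", 0)],
}
posteriors, priors, error_rates = dawid_skene(toy, n_classes=2)
for item in sorted(posteriors):
    print(item, [round(p, 3) for p in posteriors[item]])
```

The initial posteriors are just vote shares, so the first M-step is effectively a majority-vote estimate; the later iterations are where the observers start to be weighted by their estimated error rates.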
