Installing Hadoop/Hive on Ubuntu

If you are a Hive noob like me, this post may be of use to you. I wanted to install Hadoop and Hive on my Ubuntu box (a Dell) so that I could run Hive commands (hive -e "") from Eclipse before committing my Python scripts.
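For reference, here is a minimal sketch of how a Python script might call hive -e. It assumes the hive binary is on the PATH; the function names build_hive_command and run_hive_query are made up for this post and are not part of Hive.

```python
import subprocess


def build_hive_command(query, hive_bin="hive"):
    """Return the argument list for running one query through `hive -e`."""
    # hive_bin is an assumption: pass a full path if hive is not on the PATH.
    return [hive_bin, "-e", query]


def run_hive_query(query, hive_bin="hive"):
    """Run the query with `hive -e` and return Hive's standard output as text."""
    result = subprocess.run(
        build_hive_command(query, hive_bin),
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the output to str
        check=True,           # raise CalledProcessError if hive exits non-zero
    )
    return result.stdout
```

Calling run_hive_query("SHOW TABLES;") from a script inside Eclipse should then return whatever the Hive CLI prints for that query. Passing the command as a list avoids shell quoting problems with the query string.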

Installing Hadoop and Hive is pretty straightforward, unless you are someone who always makes the wrong choices (like me!). I edited the wrong config file and spent considerable time figuring out this simple thing.

Installing Hadoop:

http://dmitrypukhov.pro/install-hadoop-on-ubuntu/

Installing Hive:

http://dmitrypukhov.pro/install-hive-on-ubuntu/

Following the above instructions, I was able to install Hive, except for one step, where I hit the missing hive-exec jar error discussed here:

http://wenda.baba.io/questions/5129058/missing-hive-execution-jar-usr-local-hadoop-hive-lib-hive-exec-jar.html

As mentioned in the above post, copying the lib directory from hive-0.12.0.tar.gz to the $HIVE_HOME directory (/opt/hive in my case) solved the problem.

*****

By default, Hive stores its metadata in a Derby database. That is apparently good enough, but it strangely writes a derby.log file and a metastore_db directory in whichever directory we start the Hive shell from. There might be ways to fix this, but I decided to get rid of Derby and use MySQL instead.

Configuring Hive with MySQL:

http://java.dzone.com/articles/how-configure-mysql-metastore

If you follow the instructions above, you should be fine. Just make sure you edit the correct hive-site.xml file.

On my box,

hduser@learningbox:~$ locate hive-site.xml
/etc/hive/conf.dist/hive-site.xml
/opt/hive/common/src/test/resources/hive-site.xml
/opt/hive/conf/hive-site.xml
/opt/hive/data/conf/hive-site.xml
/opt/hive/data/conf/tez/hive-site.xml
/opt/hive/hcatalog/conf/proto-hive-site.xml
/opt/hive/hcatalog/src/packages/templates/conf/hive-site.xml.template

As mentioned in the blog post above, editing /opt/hive/conf/hive-site.xml gets the job done, unless you edit another file, like I did.
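A quick way to check which copy actually carries the MySQL settings is to read the javax.jdo.option.ConnectionURL property out of each candidate file. This is only a sketch: the helper names are mine, and it assumes the usual configuration/property/name/value layout of hive-site.xml.

```python
import xml.etree.ElementTree as ET

# Property that holds the metastore JDBC URL in hive-site.xml.
CONNECTION_URL = "javax.jdo.option.ConnectionURL"


def find_property(root, name):
    """Return the <value> text of the named <property>, or None if it is absent."""
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


def metastore_url(path):
    """Parse one hive-site.xml file and return its metastore JDBC URL, or None."""
    return find_property(ET.parse(path).getroot(), CONNECTION_URL)


def report(paths):
    """Print the metastore URL found in each candidate file."""
    for path in paths:
        try:
            print(path, "->", metastore_url(path))
        except (OSError, ET.ParseError) as err:
            print(path, "-> could not read:", err)
```

Running report(["/etc/hive/conf.dist/hive-site.xml", "/opt/hive/conf/hive-site.xml"]) on my box would show which file has a jdbc:mysql URL and which one is still at its default.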

Other Notes:

JSON SerDe:

http://thornydev.blogspot.in/2013/07/querying-json-records-via-hive.html

https://github.com/rcongiu/Hive-JSON-Serde#start-of-content

Creating a table with the JSON SerDe and inserting into it turned out to be frustrating. The solution that worked was to create the table, move the data to HDFS, and add the partition.
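Roughly, those three steps could look like the sketch below. The table name, columns, partition column and HDFS paths are placeholders I made up, and the external-table layout is an assumption rather than exactly what I ran. The SerDe class is the one from the rcongiu project linked above, and its jar has to be available to Hive (for example via ADD JAR).

```python
def create_table_sql(table, columns, partition_col, location):
    """Build the CREATE TABLE statement for a partitioned JSON-backed table."""
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} ({cols}) "
        f"PARTITIONED BY ({partition_col} STRING) "
        f"ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' "
        f"LOCATION '{location}';"
    )


def partition_dir(location, partition_col, value):
    """Return the HDFS directory that holds one partition's files."""
    return f"{location}/{partition_col}={value}"


def upload_commands(local_file, target_dir):
    """Return the hdfs commands (as argument lists) that move the data to HDFS."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", target_dir],
        ["hdfs", "dfs", "-put", local_file, target_dir],
    ]


def add_partition_sql(table, partition_col, value, target_dir):
    """Build the ALTER TABLE statement that registers the uploaded directory."""
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({partition_col}='{value}') LOCATION '{target_dir}';"
    )
```

The statements would be passed to hive -e and the argument lists to subprocess.run, in that order: create the table, upload the file, then add the partition. Nothing is inserted through Hive itself, which is the part that was giving me trouble.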

I also ran into this issue in Hive 0.13:

https://issues.apache.org/jira/browse/HIVE-8538
