Chao Sun

Installing Hive and Hadoop locally

Posted on November 13, 2016 by Chao Sun

Recently, I got a new laptop and therefore needed to do a fresh installation of Hive and Hadoop. There are quite a few steps along the way. Here are my notes:

Setup SSH

ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys


However, I already have a key pair that is associated with a password. To avoid overwriting that key, I need to generate a separate key pair and use it to access localhost:

$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/Users/chao/.ssh/id_rsa): /Users/chao/.ssh/id_rsa_local
...
$ cat ~/.ssh/id_rsa_local.pub >> ~/.ssh/authorized_keys


Then, add this to ~/.ssh/config (or create the file if it doesn’t exist):
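A minimal sketch of the entry (the Host names and key path follow from what was generated above):

```
Host localhost 0.0.0.0
  IdentityFile ~/.ssh/id_rsa_local
```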

This tells SSH to use the key id_rsa_local when accessing localhost, or 0.0.0.0. This is necessary to start a one-node Hadoop cluster.

With this, now try:

ssh localhost


It should succeed.

Install Hadoop

For this step, you’ll need to download the Hadoop package and set up some configuration files. I use the CDH 5.4.x Hadoop version from here. After the download finishes, uncompress the file and put it in a proper location (e.g., /opt). Then, add the following to your .bash_profile:

export HADOOP_CLASSPATH=
export PATH=$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$PATH

Also, you’ll need to add configs to a few files under $HADOOP_HOME/etc/hadoop:
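For reference, the full block in my .bash_profile looks something like this (the HADOOP_HOME path is hypothetical; point it at wherever you uncompressed the tarball):

```shell
# Hypothetical install location -- adjust to wherever the tarball was uncompressed
export HADOOP_HOME=/opt/hadoop-2.6.0-cdh5.4.7
export HADOOP_CLASSPATH=
# Put the Hadoop daemon scripts (sbin) and CLI tools (bin) on the PATH
export PATH=$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$PATH
```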

1. core-site.xml:

<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>

2. hdfs-site.xml

<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/path/to/namenode/dir</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/path/to/datanode/dir</value>
</property>

3. mapred-site.xml

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

4. yarn-site.xml

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>


For 2), you’ll need to specify the dirs. I use a .hadoop dir under my home directory, with datanode and namenode dirs under it.
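Creating that layout is a one-liner (the path is just my choice; use whatever you put in hdfs-site.xml):

```shell
# Create the NameNode and DataNode storage dirs referenced in hdfs-site.xml
mkdir -p ~/.hadoop/namenode ~/.hadoop/datanode
```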

Also, (at least for CDH 5.4.x) there’s a bug in $HADOOP_HOME/libexec/hadoop-config.sh, which results in YARN using the hardcoded /bin/java instead of /usr/bin/java. This can be fixed by following the comments in this JIRA.

Now, you should be able to start HDFS and YARN by:

start-dfs.sh
start-yarn.sh

Check whether the services are started by using jps. You should see something like the following:

3296 NodeManager
3111 SecondaryNameNode
3017 DataNode
3210 ResourceManager
2943 NameNode

If the NameNode doesn’t start, you should check whether the datanode and namenode dirs have been created yet. Also, you can try:

hadoop namenode -format

to reformat the namenode dir.

Install MySQL

On Mac, MySQL can be installed via Homebrew. Simply use the following command:

brew install mysql

After this, you can start the MySQL server by:

mysql.server restart

Next, you’ll need to create a user for Hive. Go to the MySQL CLI and type:

> CREATE DATABASE metastore;
> CREATE USER 'hive'@'localhost' IDENTIFIED BY 'some_pass';
> USE metastore;
> GRANT ALL ON *.* TO 'hive'@'localhost' IDENTIFIED BY 'some_pass';

After setting up the MySQL database, you’ll need to run:

bin/schematool -dbType mysql -initSchema

to initialize the database for Hive usage. Note that you only need to run this ONCE, at the beginning.

Install Hive

First, you can clone the Hive repository here. Then, suppose you put the Hive repo under $HOME/git/hive; to build Hive, do:

mvn clean install -DskipTests -Phadoop-2


(Note that for upstream Hive, the -Phadoop-2 flag is no longer necessary.)

You also need to set the $HIVE_HOME env variable. I usually set it to $HOME/git/hive/packaging/target/$HIVE_VERSION-bin/$HIVE_VERSION-bin, where $HIVE_VERSION is the version of Hive you’re using, for instance apache-hive-1.1.0-cdh5.4.7. Also remember to add $HIVE_HOME/bin to your $PATH.

There are a few Hive configurations that need to be changed. First, in $HIVE_HOME/conf/hive-site.xml, you should have at least the following:

<property>
<name>javax.jdo.option.ConnectionURL</name>
<!-- useSSL to false to avoid warning messages -->
<value>jdbc:mysql://localhost/metastore?useSSL=false</value>
</property>

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
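Depending on how MySQL is set up, the connection will likely also need the credentials created earlier; a sketch, using the user and password from the GRANT statement above:

```
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>some_pass</value>
</property>
```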


which tells Hive to use MySQL as the underlying meta DB, and that metastore is the database to access.

To launch Hive, first you’ll need to start the MySQL server:

mysql.server start


To start the Hive Metastore (make sure the log dir, here /tmp/hive, exists first), do:

hive --service metastore &> /tmp/hive/metastore.log &


To start HiveServer2, do:

hive --service hiveserver2 &> /tmp/hive/hiveserver2.log &


Then you can launch Beeline and connect to localhost on port 10000, like normal.
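For example, using the standard Beeline JDBC URL (assuming HiveServer2 authentication is left at its defaults, so no credentials are needed):

```
beeline -u jdbc:hive2://localhost:10000
```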