Setting Up a Hive Environment

2017/6/22 posted in Software Configuration

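Prerequisites

  1. A Hadoop cluster is already set up and running.
  2. The hadoop command is on PATH; otherwise, set HADOOP_HOME manually.
  3. apache-hive-2.1.1-bin.tar.gz is downloaded and unpacked, and Hive's bin directory is on PATH.
  4. A MySQL database is installed and running.
  5. The matching JDBC driver (for example, mysql-connector-java-5.1.41.jar) is downloaded and placed in Hive's lib directory.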
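The prerequisites above could be scripted roughly as follows. This is only a sketch: the install location under $HOME/Applications/Hive and the download location of the driver jar are assumptions, so point HIVE_HOME and JDBC_JAR at wherever the files actually live.

```shell
# Assumed install location; change it to where apache-hive-2.1.1-bin was unpacked.
HIVE_HOME="${HIVE_HOME:-$HOME/Applications/Hive/apache-hive-2.1.1-bin}"
export HIVE_HOME

# Put Hive's bin directory on PATH (add this line to your shell profile to keep it).
export PATH="$HIVE_HOME/bin:$PATH"

# HADOOP_HOME is only needed when the hadoop command is not already on PATH.
if ! command -v hadoop >/dev/null 2>&1; then
  echo "hadoop not found on PATH; set HADOOP_HOME manually" >&2
fi

# Assumed download location of the JDBC driver jar.
JDBC_JAR="${JDBC_JAR:-$HOME/Downloads/mysql-connector-java-5.1.41.jar}"
if [ -f "$JDBC_JAR" ] && [ -d "$HIVE_HOME/lib" ]; then
  cp "$JDBC_JAR" "$HIVE_HOME/lib/"
else
  echo "skipped driver copy: check JDBC_JAR and HIVE_HOME" >&2
fi
```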

Configuration

Edit conf/hive-site.xml:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- Storage paths on HDFS -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/hive/tmp</value>
    <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
  </property>

  <!-- Storage paths on the local file system -->
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/Users/tangjiujun/Applications/Hive/tmp/scratchdir</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/Users/tangjiujun/Applications/Hive/tmp/resources</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>
  <property>
    <name>hive.querylog.location</name>
    <value>/Users/tangjiujun/Applications/Hive/tmp/querylog</value>
    <description>Location of Hive run time structured log file</description>
  </property>
  <property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>/Users/tangjiujun/Applications/Hive/tmp/operation_logs</value>
    <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
  </property>

  <!-- Where the metastore data is stored -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive_metastore_db?createDatabaseIfNotExist=true&amp;useSSL=false</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>Username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
    <description>password to use against metastore database</description>
  </property>
  
  <!-- Default user name and password for Thrift connections -->
  <property>
    <name>hive.server2.thrift.client.user</name>
    <value>tangjiujun</value>
    <description>Username to use against thrift client</description>
  </property>
  <property>
    <name>hive.server2.thrift.client.password</name>
    <value>123456</value>
    <description>Password to use against thrift client</description>
  </property>
</configuration>
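The directories named in this file can be created ahead of time. The sketch below assumes a variable LOCAL_TMP (not part of Hive) that should match the base path used in the local <value> elements; the default /tmp/hive-local is a placeholder. Hive may create some of these directories itself, so treat this as a way to avoid permission surprises rather than a required step.

```shell
# Local directories referenced in hive-site.xml.
# LOCAL_TMP is an assumed helper variable; set it to the base path from the config.
LOCAL_TMP="${LOCAL_TMP:-/tmp/hive-local}"
for d in scratchdir resources querylog operation_logs; do
  mkdir -p "$LOCAL_TMP/$d"
done

# HDFS directories referenced in hive-site.xml (needs a running Hadoop cluster).
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -mkdir -p /hive/warehouse /hive/tmp
  hadoop fs -chmod g+w /hive/warehouse
  # 733 matches the permission mentioned in the hive.exec.scratchdir description.
  hadoop fs -chmod 733 /hive/tmp
else
  echo "hadoop not found on PATH; skipping HDFS directory creation" >&2
fi
```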

Startup

# Initialize Hive, using MySQL as the metastore database. Run this only once, before the first start.
schematool -dbType mysql -initSchema

# Start the server and connect to it in the same process; for testing only
beeline -u jdbc:hive2://

# Start HiveServer2. The default connection port is 10000, and the web UI is at http://localhost:10002
hiveserver2

# Connect to Hive
beeline -u jdbc:hive2://localhost:10000
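To check the connection without opening an interactive session, beeline can run a single statement and exit. In this sketch the helper hive_jdbc_url is hypothetical, and connecting as the current OS user is an assumption; pass whatever user your setup expects.

```shell
# Build a HiveServer2 JDBC URL. The function name is hypothetical.
# $1 = host, $2 = port (default 10000), $3 = database (default "default")
hive_jdbc_url() {
  printf 'jdbc:hive2://%s:%s/%s\n' "$1" "${2:-10000}" "${3:-default}"
}

URL="$(hive_jdbc_url localhost)"

# Run one statement non-interactively: -n sets the user name, -e the statement.
if command -v beeline >/dev/null 2>&1; then
  beeline -u "$URL" -n "$(id -un)" -e 'SHOW DATABASES;'
else
  echo "beeline not found on PATH; would connect to $URL" >&2
fi
```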

Common Problems

  1. User: tangjiujun is not allowed to impersonate anonymous

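    Add the following to etc/hadoop/core-site.xml in the Hadoop installation, inside the <configuration> element, then restart Hadoop. Replace tangjiujun in the property names with the user that runs HiveServer2.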

    <property>
        <name>hadoop.proxyuser.tangjiujun.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.tangjiujun.groups</name>
        <value>*</value>
    </property>
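Because the user name is part of the property key, the two entries have to be regenerated for each machine or account. A small sketch that prints them for a given user is shown here; the function name proxyuser_properties is hypothetical, and it assumes HiveServer2 runs as the current OS user.

```shell
# Print the hosts and groups proxyuser properties for one user.
# The function name is hypothetical.
proxyuser_properties() {
  user="$1"
  for key in hosts groups; do
    printf '<property>\n'
    printf '    <name>hadoop.proxyuser.%s.%s</name>\n' "$user" "$key"
    printf '    <value>*</value>\n'
    printf '</property>\n'
  done
}

# Assumption: HiveServer2 runs as the current user.
proxyuser_properties "$(id -un)"
```

Paste the output inside the <configuration> element of core-site.xml. The value * allows impersonation from any host and for any group, which is convenient for a local test setup; a narrower value is safer on a shared cluster.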