手把手教你零基础搭建Hive数据仓库保姆级教程

张开发
2026/4/8 21:25:14 15 分钟阅读

分享文章

手把手教你零基础搭建Hive数据仓库保姆级教程
手把手教你零基础搭建Hive数据仓库保姆级教程文章目录手把手教你零基础搭建Hive数据仓库保姆级教程步骤一上传解压安装文件步骤二配置Hive环境变量步骤三MySql的安装步骤四修改配置文件步骤五基本测试你是否曾面对 HDFS 上海量的日志文件却不知如何快速统计出关键指标你是否厌倦了编写冗长且难以调试的 MapReduce代码只为完成一个简单的分组求和操作你是否梦想着能用熟悉的 SQL 语句轻松驾驭 PB 级别的数据海洋如果你的答案是肯定的那么 Apache Hive 正是你一直在寻找的利器。 Hive 构建在 Hadoop之上它将复杂的数据处理任务抽象成一张张我们熟悉的“表”让我们能够使用类似 SQL 的 HiveQL语言对大数据进行查询和分析。它极大地降低了大数据处理的门槛让数据分析师和后端开发者都能快速上手。 万丈高楼平地起一个配置正确的 Hive 环境是所有后续工作的基石。本教程将详细拆解 Hive的搭建步骤从依赖环境的准备到核心配置的解读力求清晰、准确帮助你顺利搭建起自己的第一个 Hive 数据仓库。配套软件包环境下载Hive数据仓库搭建配套环境软件安装包下载链接步骤一上传解压安装文件利用xftp软件将apache-hive-2.3.6-bin.tar.gzmysql-5.7.18-1.el7.x86_64.rpm-bundle.tar和mysql-connector-java-5.1.34-bin.jar三个文件上传到/opt/software目录下。使用tar命令解压安装包到指定目录,并重命名。[rootmaster ~]# cd /opt/software/[rootmaster software]# lsapache-hive-2.3.6-bin.tar.gz mysql-5.7.18-1.el7.x86_64.rpm-bundle.tar hadoop-2.7.3.tar.gz mysql-connector-java-5.1.34-bin.jar jdk-8u231-linux-x64.tar.gz# 解压到统一安装目录[rootmaster software]# tar -zvxf apache-hive-2.3.6-bin.tar.gz -C /opt/module/[rootmaster software]# cd /opt/module/[rootmaster module]# lsapache-hive-2.3.6-bin hadoopjava[rootmaster module]# mv apache-hive-2.3.6-bin/ hive/[rootmaster module]# lshadoop hivejava步骤二配置Hive环境变量编辑/etc/profile文件。[rootmaster module]# cd[rootmaster ~]# vi /etc/profile将以下配置信息添加到/etc/profile文件的末尾。# HIVE_HOMEexportHIVE_HOME/opt/module/hiveexportPATH$HIVE_HOME/bin:$PATHexportHIVE_CONF_DIR$HIVE_HOME/conf执行source /etc/profile命令使配置的环境变量在系统全局范围生效。[rootmaster ~]# source /etc/profile以上命令只需要在hive安装的节点操作本文在master主节点进行Hive部署步骤三MySql的安装① 解压mysql安装包[rootmaster ~]# cd /opt/software/[rootmaster software]# lsapache-hive-2.3.6-bin.tar.gz mysql-5.7.18-1.el7.x86_64.rpm-bundle.tar hadoop-2.7.3.tar.gz mysql-connector-java-5.1.34-bin.jar jdk-8u231-linux-x64.tar.gz[rootmaster software]# mkdir mysql[rootmaster software]# tar -vxf mysql-5.7.18-1.el7.x86_64.rpm-bundle.tar -C mysql[rootmaster software]# rm -f mysql-5.7.18-1.el7.x86_64.rpm-bundle.tar② 安装mysql包[rootmaster software]# cd mysql#检测是否已安装了mysql和mariadb[rootmaster mysql]# rpm -qa|grep mysql[rootmaster mysql]# rpm -qa|grep mariadb#卸载mariadb[rootmaster mysql]# rpm -e --nodeps mariadb-libs-5.5.64-1.el7.x86_64psrpm -qa 列出所有已安装软件包rpm -e packagename 删除软件包rpm -e --nodeps packagename 强制删除软件和依赖包rpm -q 包名 查询包是否安gr装rpm -ivh 包名 安装软件包#使用rpm安装mysql[rootmaster mysql]# rpm -ivh mysql-community-common-5.7.18-1.el7.x86_64.rpm[rootmaster mysql]# rpm -ivh mysql-community-libs-5.7.18-1.el7.x86_64.rpm[rootmaster mysql]# rpm -ivh mysql-community-client-5.7.18-1.el7.x86_64.rpm#安装server前需要安装两个依赖perl和net-tools[rootmaster mysql]# yum -y install perl[rootmaster mysql]# yum -y install net-tools[rootmaster mysql]# rpm -ivh mysql-community-server-5.7.18-1.el7.x86_64.rpm③ 安装完查看安装情况[rootmaster mysql]# rpm -qa | grep mysqlmysql-community-common-5.7.18-1.el7.x86_64 mysql-community-client-5.7.18-1.el7.x86_64 mysql-community-libs-5.7.18-1.el7.x86_64 mysql-community-server-5.7.18-1.el7.x86_64④ 启动MySql服务并查看状态[rootmaster ~]# systemctl start mysqld[rootmaster ~]# systemctl status mysqld⑤ 查看MySql原始密码[rootmaster ~]# cat /var/log/mysqld.log | grep password2025-10-24T15:54:22.542871Z1[Note]A temporary password is generatedforrootlocalhost: jdNZjcex6Wo系统默认给的临时密码[rootmaster ~]# mysql -uroot -pEnter password: jdNZjcex6Wo刚才查到的密码复制过来密码不会显示粘贴完直接回车 也可以直接输入然后回车[rootmaster ~]# mysql -uroot -pjdNZjcex6Wo⑥ 进入MySql修改密码mysqlsetglobalvalidate_password_policy0;Query OK,0rows affected(0.00sec)mysqlsetglobalvalidate_password_length4;Query OK,0rows affected(0.00sec)mysqlalter userrootlocalhostidentified by1234;Query OK,0rows affected(0.00sec)mysqlexit;Bye⑦ 修改完重新登录[rootmaster ~]# mysql -uroot -p1234#进入mysql后修改所有用户登录权限和登录密码mysqlgrant all privileges on *.* toroot%identified by1234with grant option;grant赋权命令 all privileges当前用户的所有权限 on介词 *.*当前用户对所有数据库和表的相应操作权限 to介词 ‘root’’%’权限赋给root用户所有ip都能连接 Identified by1234连接时输入密码密码为1234 With grant option允许级联赋权#刷新权限mysqlflush privileges;#查看mysql中的数据库mysqlshow databases;--------------------|Database|--------------------|information_schema||mysql||performance_schema||sys|--------------------4rowsinset(0.00sec)⑧ 退出MySql用exit和quit都可以步骤四修改配置文件Hadoop 3.x版本中/opt/module/hadoop/share/hadoop/common/lib中的jar包版本为guava-27.0-jre.jar而hive中的jar包版本为guava-19.0.1.jar删除hive中版本低的jar包将hadoop中版本高的jar包复制到hive的lib中[rootmaster ~]# cd /opt/module/hive/lib[rootmaster libd]#rm -f guava-19.0.jar[rootmaster~]#cp /opt/module/hadoop/share/hadoop/common/lib/guava-27.0-jre-jar /opt/module/hive/lib① Hive-env.sh[rootmaster module]# cd hive[rootmaster hive]# lsbin conf hcatalog lib NOTICE scripts binary-package-licenses examples jdbc LICENSE RELEASE_NOTES.txt[rootmaster hive]# cd conf[rootmaster conf]# lsbeeline-log4j2.properties.template ivysettings.xml hive-default.xml.template llap-cli-log4j2.properties.template hive-env.sh.template llap-daemon-log4j2.properties.template hive-exec-log4j2.properties.template parquet-logging.properties hive-log4j2.properties.template#先复制一个模板过来进行修改[rootmaster conf]# cp hive-env.sh.template hive-env.sh[rootmaster conf]# lsbeeline-log4j2.properties.template hive-log4j2.properties.template hive-default.xml.template ivysettings.xml hive-env.sh llap-cli-log4j2.properties.template hive-env.sh.template llap-daemon-log4j2.properties.template hive-exec-log4j2.properties.template parquet-logging.properties[rootmaster conf]# vi hive-env.sh文件最后添加以下exportJAVA_HOME/opt/module/javaexportHADOOP_HOME/opt/module/hadoopexportHIVE_CONF_DIR/opt/module/hive/confexportHIVE_AUX_JARS_PATH/opt/module/hive/lib② Hive-site.xml[rootmaster conf]# vi hive-site.xml?xml version1.0??xml-stylesheet typetext/xsl hrefconfiguration.xsl?configurationpropertynamejavax.jdo.option.ConnectionURL/namevaluejdbc:mysql://master:3306/hive_db?createDatabaseIfNotExisttrue/value/property!--mysql用户名--propertynamejavax.jdo.option.ConnectionUserName/namevalueroot/value/property!--mysql中hive用户密码--propertynamejavax.jdo.option.ConnectionPassword/namevalue1234/value/property!--mysql驱动--propertynamejavax.jdo.option.ConnectionDriverName/namevaluecom.mysql.jdbc.Driver/value/propertypropertynamehive.downloaded.resources.dir/namevalue/opt/module/hive/tmp/value/propertypropertynamehive.exec.local.scratchdir/namevalue/opt/module/hive/tmp/${hive.session.id}_resources/value/propertypropertynamehive.querylog.location/namevalue/opt/module/hive/tmp/value/propertypropertynamehive.server2.logging.operation.log.location/namevalue/opt/module/hive/tmp/operation_logs/value/propertypropertynamehive.server2.webui.host/namevaluemaster/value/propertypropertynamehive.server2.webui.port/namevalue10002/value/property/configurationjavax.jdo.option.ConnectionURLJDBC连接字符串默认为自带的metastore_db数据库现以MySQL作为元数据存储地址需建立JDBC连接hive_db处为创建的MySQL数据库存储元数据。hive.downloaded.resources.dir远程资源下载的临时目录。hive.exec.local.scratchdirhive用来存储不同阶段的map/reduce的执行计划的目录同时也存储中间输出结果。hive.querylog.locationhive日志存放的地址hive.server2.logging.operation.log.locationhiveserver2操作日志存放地址hive.server2.webui.hosthiveserver2 WebUI页面访问地址hive.server2.webui.porthiveserver2 WebUI页面访问端口③ 将连接驱动jar包复制到lib目录下#复制jar包[rootmaster conf]# cp /opt/software/mysql-connector-java-5.1.34-bin.jar /opt/module/hive/lib/④ Hadoop集群启动#启动Hadoop集群 [rootmaster conf]# start-all.sh #在集群上创建文件夹 [rootmaster conf]# hdfs dfs -mkdir -p /user/hive/warehouse [rootmaster conf]# hdfs dfs -mkdir /tmp (如果存在tmp先删除再创建) [rootmaster conf]# hdfs dfs –rm –r /tmp [rootmaster conf]# hdfs dfs -mkdir /tmp⑤ 初始化数据库将Hive与MySQL连接把元数据写入MySql中生成一个hive_db的数据库[rootmaster conf]# schematool -initSchema -dbType mysql步骤五基本测试①进入MySql[rootmaster conf]# mysql -uroot -p1234mysql:[Warning]Using a password on thecommandline interface can be insecure. Welcome to the MySQL monitor. Commands end with;or\g. Your MySQL connectionidis12Server version:5.7.18 MySQL Community Server(GPL)Copyright(c)2000,2017, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Typehelp;or\hforhelp. Type\ctoclearthe current input statement.#显示所有数据库mysqlshow databases;--------------------|Database|--------------------|information_schema||hive_db||mysql||performance_schema||sys|--------------------5rowsinset(0.00sec)#选择hive_db数据库mysqluse hive_db;Reading table informationforcompletion of table andcolumnnames You can turn off this feature to get a quicker startup with-ADatabase changed#查看hive_db数据库中所有的表mysqlshow tables;---------------------------|Tables_in_hive_db|---------------------------|AUX_TABLE||BUCKETING_COLS||CDS||COLUMNS_V2||COMPACTION_QUEUE||COMPLETED_COMPACTIONS||COMPLETED_TXN_COMPONENTS||DATABASE_PARAMS||DBS||DB_PRIVS||DELEGATION_TOKENS||FUNCS||FUNC_RU||GLOBAL_PRIVS||HIVE_LOCKS||IDXS||INDEX_PARAMS||KEY_CONSTRAINTS||MASTER_KEYS||NEXT_COMPACTION_QUEUE_ID||NEXT_LOCK_ID||NEXT_TXN_ID||NOTIFICATION_LOG||NOTIFICATION_SEQUENCE||NUCLEUS_TABLES||PARTITIONS||PARTITION_EVENTS||PARTITION_KEYS||PARTITION_KEY_VALS||PARTITION_PARAMS||PART_COL_PRIVS||PART_COL_STATS||PART_PRIVS||ROLES||ROLE_MAP||SDS||SD_PARAMS||SEQUENCE_TABLE||SERDES||SERDE_PARAMS||SKEWED_COL_NAMES||SKEWED_COL_VALUE_LOC_MAP||SKEWED_STRING_LIST||SKEWED_STRING_LIST_VALUES||SKEWED_VALUES||SORT_COLS||TABLE_PARAMS||TAB_COL_STATS||TBLS||TBL_COL_PRIVS||TBL_PRIVS||TXNS||TXN_COMPONENTS||TYPES||TYPE_FIELDS||VERSION||WRITE_SET|---------------------------57rowsinset(0.00sec)mysqlexit;Bye②CLI访问hive[rootmaster conf]# hivewhich: no hbasein(/opt/module/hive/bin:/opt/module/hadoop/sbin:/opt/module/hadoop/bin:/opt/module/java/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found bindingin[jar:file:/opt/module/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: Found bindingin[jar:file:/opt/module/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.SLF4J: Actual binding is oftype[org.apache.logging.slf4j.Log4jLoggerFactory]Logging initialized using configurationinjar:file:/opt/module/hive/lib/hive-common-2.3.6.jar!/hive-log4j2.properties Async:trueHive-on-MR is deprecatedinHive2and may not be availableinthe future versions. Consider using a different execution engine(i.e. spark, tez)or using Hive1.X releases. hiveshow databases;OK default Time taken:9.547seconds, Fetched:1row(s)hiveuse dafault;FAILED: SemanticException[Error10072]: Database does not exist: dafault hiveuse default;OK Time taken:0.049seconds hiveshow tables;OK Time taken:0.07seconds hiveexit;③beeline远程访问hive#首先修改hadoop集群的配置文件core-site.xml在configuration标签中添加属性包括masterslave1和slave2propertynamehadoop.proxyuser.root.hosts/namevalue*/value/propertypropertynamehadoop.proxyuser.root.groups/namevalue*/value/property#然后重启Hadoop集群[rootmaster ~]# stop-all.sh[rootmaster ~]# start-all.sh#启动hiveserver2服务[rootmaster ~]# hiveserver2#另开一个窗口启动远程登录#退出beeline#启动了hiveserver2就可以去web页面端查看master:10002

更多文章