This article is based on the Bilibili video course 《2022最新黑马程序员大数据Hadoop入门视频教程》 (a Big Data Hadoop introductory course aimed at self-study from zero), episodes p51–p83. The software versions and order of presentation here may differ slightly from the video.
The required installation packages can be obtained by following the 【黑马程序员】 WeChat official account and replying with 【hadoop】.
I. Understanding Hive
Architecture diagram
II. Installation
1. Install MySQL
1.1 Remove mariadb
Remove the mariadb that ships with CentOS 7. First, look up the mariadb packages that are already installed.
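The lookup command itself is not preserved in these notes; presumably it is the standard rpm query:

```bash
# list installed mariadb packages (the version shown in the next step may differ on your system)
rpm -qa | grep mariadb
```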
Uninstall it:
```bash
rpm -e mariadb-libs-5.5.68-1.el7.x86_64 --nodeps
```
1.2 Install MySQL
```bash
mkdir /export/software/mysql
cd /export/software/mysql
tar xvf mysql-5.7.29-1.el7.x86_64.rpm-bundle.tar
yum -y install libaio
yum -y install net-tools
rpm -ivh mysql-community-common-5.7.29-1.el7.x86_64.rpm \
    mysql-community-libs-5.7.29-1.el7.x86_64.rpm \
    mysql-community-client-5.7.29-1.el7.x86_64.rpm \
    mysql-community-server-5.7.29-1.el7.x86_64.rpm
```
1.3 Initialize MySQL
```bash
# initialize the data directory
mysqld --initialize
# hand ownership of the data directory to the mysql user
chown mysql:mysql /var/lib/mysql -R
# start the service
systemctl start mysqld.service
# find the temporary root password in the log
cat /var/log/mysqld.log
```
1.4 Getting ready to use MySQL: change the root password, allow remote access, and enable start on boot.
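The login step is not spelled out in the notes; presumably you connect with the MySQL client and paste the temporary root password printed in /var/log/mysqld.log:

```bash
# log in with the temporary password generated by mysqld --initialize
mysql -u root -p
```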
Enter the password generated above and press Enter; then change the root password to hadoop with the command below.
```sql
alter user user() identified by "hadoop";
```
Grant remote access:
```sql
use mysql;
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'hadoop' WITH GRANT OPTION;
FLUSH PRIVILEGES;
```
Press Ctrl+D to exit the MySQL prompt.
Commands to stop, check, and start MySQL:
```bash
systemctl stop mysqld
systemctl status mysqld
systemctl start mysqld
```
Enable start on boot:
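The command itself is missing from these notes; the usual systemd way, assuming the service name used above, is:

```bash
# start mysqld automatically on boot
systemctl enable mysqld
```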
1.5 Clean uninstall
Completely removing MySQL 5.7 from CentOS 7:
```bash
# stop the mysql service
systemctl stop mysqld.service

# find the installed mysql rpm packages
[root@node3 ~]# rpm -qa | grep -i mysql
mysql-community-libs-5.7.29-1.el7.x86_64
mysql-community-common-5.7.29-1.el7.x86_64
mysql-community-client-5.7.29-1.el7.x86_64
mysql-community-server-5.7.29-1.el7.x86_64

# uninstall them
[root@node3 ~]# yum remove mysql-community-libs-5.7.29-1.el7.x86_64 mysql-community-common-5.7.29-1.el7.x86_64 mysql-community-client-5.7.29-1.el7.x86_64 mysql-community-server-5.7.29-1.el7.x86_64

# check that nothing is left
rpm -qa | grep -i mysql

# find mysql-related directories and delete them
[root@node1 ~]# find / -name mysql
/var/lib/mysql
/var/lib/mysql/mysql
/usr/share/mysql
[root@node1 ~]# rm -rf /var/lib/mysql
[root@node1 ~]# rm -rf /var/lib/mysql/mysql
[root@node1 ~]# rm -rf /usr/share/mysql

# delete the default config and logs
rm -rf /etc/my.cnf
rm -rf /var/log/mysqld.log
```
2. Install Hive
2.1 Upload the installation package
Upload the Hive installation package to /export/server/ and extract it:
```bash
cd /export/server/
tar zxvf apache-hive-3.1.2-bin.tar.gz
```
Resolve the guava version mismatch between Hive and Hadoop:
```bash
cd /export/server/apache-hive-3.1.2-bin/
rm -f lib/guava-19.0.jar
cp /export/server/hadoop-3.3.0/share/hadoop/common/lib/guava-27.0-jre.jar ./lib/
```
2.2 Modify the configuration files
hive-env.sh
```bash
cd /export/server/apache-hive-3.1.2-bin/conf
mv hive-env.sh.template hive-env.sh
vim hive-env.sh
```
Append at the end:
```bash
export HADOOP_HOME=/export/server/hadoop-3.3.0
export HIVE_CONF_DIR=/export/server/apache-hive-3.1.2-bin/conf
export HIVE_AUX_JARS_PATH=/export/server/apache-hive-3.1.2-bin/lib
```
hive-site.xml
Create the new file directly with vim (`vim hive-site.xml`) and edit it as follows:
```xml
<configuration>
    <!-- metastore database connection: MySQL on node1 -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://node1:3306/hive3?createDatabaseIfNotExist=true&amp;useSSL=false&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hadoop</value>
    </property>
    <!-- host HiveServer2 binds to -->
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>node1</value>
    </property>
    <!-- remote metastore address -->
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://node1:9083</value>
    </property>
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
</configuration>
```
2.3 Upload the JDBC driver
Upload the MySQL JDBC driver jar into the lib directory of the Hive installation.
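For example (the jar name below is only illustrative; use whatever mysql-connector-java version you actually downloaded):

```bash
# copy the JDBC driver into Hive's lib directory (illustrative jar name)
cp mysql-connector-java-5.1.32.jar /export/server/apache-hive-3.1.2-bin/lib/
```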
2.4 Initialize the metastore schema
```bash
cd /export/server/apache-hive-3.1.2-bin/
bin/schematool -initSchema -dbType mysql -verbose
```
2.5 Fix garbled Chinese in comments
Run the following statements in MySQL:
```sql
use hive3;
alter table hive3.COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
alter table hive3.TABLE_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
alter table hive3.PARTITION_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
alter table hive3.PARTITION_KEYS modify column PKEY_COMMENT varchar(4000) character set utf8;
alter table hive3.INDEX_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
```
2.6 Create the Hive storage directories
Create the Hive storage directories on HDFS (skip this if they already exist):
```bash
hadoop fs -mkdir /tmp
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse
```
III. Usage
1. Start Hive
1.1 Start the metastore service
```bash
# start the metastore service in the foreground
/export/server/apache-hive-3.1.2-bin/bin/hive --service metastore
# foreground start with debug logging printed to the console
/export/server/apache-hive-3.1.2-bin/bin/hive --service metastore --hiveconf hive.root.logger=DEBUG,console
# start in the background
nohup /export/server/apache-hive-3.1.2-bin/bin/hive --service metastore &
```
1.2 Start the hiveserver2 service
```bash
nohup /export/server/apache-hive-3.1.2-bin/bin/hive --service hiveserver2 &
```
2. Connect with the beeline client
Copy the installation directory from node1 to node3:
```bash
scp -r /export/server/apache-hive-3.1.2-bin/ node3:/export/server/
```
Connect:
```bash
/export/server/apache-hive-3.1.2-bin/bin/beeline
beeline> ! connect jdbc:hive2://node1:10000
beeline> root          # username
beeline>               # password: just press Enter
```
3. Connecting to Hive from DataGrip (omitted)
4. Database and table syntax
4.1 Databases
Show databases:
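The statement itself is not shown in the notes; in Hive it is simply:

```sql
show databases;
```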
Create a database:
```sql
create database [if not exists] ifnxs [comment "database description"] [with dbproperties ('createdBy'='dr34m')];
```
Use a database:
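Again the statement is omitted; using the ifnxs database created above as an example:

```sql
use ifnxs;
```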
Drop a database:
```sql
drop database [if exists] test [cascade];
```
cascade forces the drop; the default is restrict, which only drops the database if it is empty.
4.2 Tables
Show tables:
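The statement is omitted in the notes; for the current database it is:

```sql
show tables;
```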
Create a table:
```sql
create table [if not exists] [xxx.]zzz (
    col_name data_type [comment "column description"],
    ...
) [comment "table description"]
[row format delimited ...];
```
Data types: string and int are the most commonly used.
Delimiters are specified with the row format delimited clause.
Example:
```sql
create table ifnxs.t_user (
    id int comment "编号",
    name string comment "姓名"
) comment "用户表"
row format delimited fields terminated by "\t";
```
Create a table from a select (CTAS):
```sql
create table xxx as select id, name from ifnxs.t_user;
```
Show table structure:
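The statement is not shown; a common form that prints detailed, formatted metadata is:

```sql
describe formatted xxx;
```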
Drop a table:
```sql
drop table [if exists] xxx;
```
5. DML syntax and functions
5.1 Load syntax
```sql
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename;
```
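A concrete example (the file path is hypothetical), loading into the t_user table created earlier:

```sql
-- LOCAL: read the file from the local filesystem; without LOCAL the path is taken from HDFS.
-- OVERWRITE would replace the table's existing data instead of appending.
LOAD DATA LOCAL INPATH '/root/hivedata/user.txt' INTO TABLE ifnxs.t_user;
```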
5.2 Insert syntax
Hive recommends cleaning data into structured files and loading them with LOAD, rather than inserting rows directly.
insert + select example:
```sql
insert into user select id, name from student;
```
5.3 Select syntax
```sql
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[ORDER BY col_list]
[LIMIT [offset,] rows];
```
```sql
-- all columns
select * from t_usa_covid19;
-- specific columns
select county, cases, deaths from t_usa_covid19;
-- a constant expression
select 1 from t_usa_covid19;
-- current database
select current_database();
-- distinct on one column
select distinct state from t_usa_covid19;
-- distinct on multiple columns
select distinct county, state from t_usa_covid19;
```
5.3.1 WHERE conditions
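No examples survive here in the notes; a couple of sketches against the same t_usa_covid19 table used below:

```sql
-- equality condition
select * from t_usa_covid19 where state = "California";
-- comparison condition
select county, cases from t_usa_covid19 where cases > 1000;
```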
5.3.2 Aggregations: count, sum, max, min, avg
```sql
-- count non-null county values
select count(county) as county_cnts from t_usa_covid19;
-- count distinct counties
select count(distinct county) as county_cnts from t_usa_covid19;
-- count with a where condition
select count(county) from t_usa_covid19 where state = "California";
-- total deaths in Texas
select sum(deaths) from t_usa_covid19 where state = "Texas";
-- maximum case count
select max(cases) from t_usa_covid19;
```
5.3.3 GROUP BY
GROUP BY is used together with aggregate functions to group the result set by one or more columns.
```sql
-- county count per state (without showing the state column)
select count(county) from t_usa_covid19 where count_date = "2021-01-28" group by state;
-- county count per state
select state, count(county) as county_nums from t_usa_covid19 where count_date = "2021-01-28" group by state;
-- multiple aggregates per group
select state, count(county), sum(deaths) from t_usa_covid19 where count_date = "2021-01-28" group by state;
```
5.3.4 HAVING
Because of SQL execution order, WHERE runs before grouping, so aggregate functions cannot be used in WHERE. The following is a counterexample that fails:
```sql
-- wrong: aggregate functions are not allowed in WHERE
select state, sum(deaths) from t_usa_covid19 where count_date = "2021-01-28" and sum(deaths) > 10000 group by state;
```
Correct examples:
```sql
select state, sum(deaths) from t_usa_covid19 where count_date = "2021-01-28" group by state having sum(deaths) > 10000;
-- or filter on the alias
select state, sum(deaths) as cnts from t_usa_covid19 where count_date = "2021-01-28" group by state having cnts > 10000;
```
5.3.6 ORDER BY
```sql
-- ascending order by default
select * from t_usa_covid19 order by cases;
select * from t_usa_covid19 order by cases asc;
-- descending order
select * from t_usa_covid19 where state = "California" order by cases desc;
```
5.3.7 LIMIT
```sql
select * from t_usa_covid19 where count_date = "2021-01-28" and state = "California";
-- return only the first 5 rows
select * from t_usa_covid19 where count_date = "2021-01-28" and state = "California" limit 5;
-- skip 2 rows, then return 3 rows (offset, rows)
select * from t_usa_covid19 where count_date = "2021-01-28" and state = "California" limit 2,3;
```
5.3.8 Execution order
from > where > group by (including aggregates) > having > order by > select
Aggregate functions (sum, min, max, avg, count) are evaluated before the HAVING clause.
The WHERE clause is evaluated before the aggregate functions (sum, min, max, avg, count).
5.4 JOIN

join_table:
```sql
table_reference [INNER] JOIN table_factor [join_condition]
| table_reference {LEFT} [OUTER] JOIN table_reference join_condition
```
join_condition: the ON expression that relates the two tables (e.g. `on e.id = e_a.id` in the examples below).
5.4.1 inner join
Inner join; the inner keyword can be omitted: inner join == join.
```sql
-- explicit inner join
select e.id, e.name, e_a.city, e_a.street
from employee e inner join employee_address e_a
on e.id = e_a.id;

-- equivalent: inner can be omitted
select e.id, e.name, e_a.city, e_a.street
from employee e join employee_address e_a
on e.id = e_a.id;

-- equivalent: implicit join written in the where clause
select e.id, e.name, e_a.city, e_a.street
from employee e, employee_address e_a
where e.id = e_a.id;
```
5.4.2 left join
Left outer join, also called left join; the outer keyword can be omitted.
```sql
-- left join
select e.id, e.name, e_conn.phno, e_conn.email
from employee e left join employee_connection e_conn
on e.id = e_conn.id;

-- equivalent: outer can be omitted
select e.id, e.name, e_conn.phno, e_conn.email
from employee e left outer join employee_connection e_conn
on e.id = e_conn.id;
```
5.5 Functions
5.5.1 Overview
Show all available functions:
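The statement is omitted in the notes; it is:

```sql
show functions;
```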
Describe a function's usage:
```sql
describe function extended xxxx;
```
Example:
```sql
describe function extended count;
```
Function categories
5.5.2 Common built-in functions
See the official documentation for the full list.
String Functions
```sql
-- string length
select length("itcast");
-- reverse
select reverse("itcast");
-- concatenate
select concat("angela", "baby");
-- concatenate with a separator
select concat_ws('.', 'www', array('itcast', 'cn'));
-- substring: last 2 characters (negative start counts from the end)
select substr("angelababy", -2);
-- substring: from position 2, length 2
select substr("angelababy", 2, 2);
-- split into an array, then index into it
select split('apache hive', ' ');
select split('apache hive', ' ')[0];
select split('apache hive', ' ')[1];
```
Date Functions
```sql
-- current date
select current_date();
-- current unix timestamp
select unix_timestamp();
-- date string to unix timestamp
select unix_timestamp("2011-12-07 13:01:03");
-- date string to unix timestamp with an explicit format
select unix_timestamp('20111207 13:01:03', 'yyyyMMdd HH:mm:ss');
-- unix timestamp to date string
select from_unixtime(1618238391);
select from_unixtime(0, 'yyyy-MM-dd HH:mm:ss');
-- difference in days between two dates
select datediff('2012-12-08', '2012-05-09');
-- add / subtract days
select date_add('2012-02-28', 10);
select date_sub('2012-01-1', 10);
```
An extra addition beyond the video; commonly used in day-to-day work:
```sql
-- format the current date as year-month
select date_format(current_date(), 'yyyy-MM');
-- filter rows by month
select * from userlist where date_format(create_time, 'yyyy-MM') = '2022-01';
```
Mathematical Functions
```sql
-- round to the nearest integer
select round(3.1415926);
-- round to 4 decimal places
select round(3.1415926, 4);
-- random number in [0, 1)
select rand();
-- random number with a fixed seed (repeatable)
select rand(3);
```
Conditional Functions
```sql
-- if(condition, value_if_true, value_if_false)
select if(1 = 2, 100, 200);
select if(sex = '男', 'M', 'W') from student limit 3;
-- case ... when
select case 100 when 50 then 'tom' when 100 then 'mary' else 'tim' end;
select case sex when '男' then 'male' else 'female' end from student limit 3;
-- nvl: return the second argument when the first is null
select nvl("allen", "itcast");
select nvl(null, "itcast");
```