Hadoop + Hive + MySQL: Data Analysis and Loading
環(huán)境準(zhǔn)備
- JDK 1.8
- Hadoop 3.3.0, cluster mode (pseudo-distributed also works)
- Hive 3.1.2
- MySQL 5.7
- Sqoop 1.4.7 (the `_hadoop` bundle)
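Before going further, it can help to check which of these components are visible from the shell. This is only a sketch: it assumes each tool exposes a same-named command on PATH, which depends on how your environment variables are set up.

```shell
# Report which of the required tools are visible on PATH. Purely
# illustrative; installs that rely on $HADOOP_HOME, $HIVE_HOME, etc.
# without PATH entries will show up as MISSING here.
for tool in java hadoop hive mysql sqoop; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING"
  fi
done
```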
Installation steps for the tools above are omitted; for Hadoop cluster installation, see the earlier reference.

Configuring and Initializing Hive

Point the metastore at MySQL by editing $HIVE_HOME/conf/hive-site.xml:

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://rm-2ze8m9018ujn2k1s3.mysql.rds.aliyuncs.com:3306/hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>iwanvi_wr</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>2019@iwanvi_rds</value>
</property>
```

Initialize the schema; note that -dbType must match the configuration above:

```shell
$HIVE_HOME/bin/schematool -dbType mysql -initSchema
## dbType can be one of several values:
# derby       - embedded local database, written in Java, lightweight
# mysql
# postgresql
```

Using Hive

This article gives only simple examples; see the official documentation for details. Note: Hadoop must be running before Hive is started.

Accessing Hive:

```shell
# For CLI-only use, start hive directly:
$HIVE_HOME/bin/hive
# To serve both Hive and the web UI, start HiveServer2 instead:
$HIVE_HOME/bin/hiveserver2
## HiveServer2 listens on port 10000 and can be accessed with beeline;
## if authentication is enabled, log in as a valid user
beeline -u jdbc:hive2://localhost:10000
## the web UI listens on port 10002
```

Hive DDL (simple examples only; see the official documentation):

```sql
-- create; show; describe; drop; alter
hive> create database db_name;

-- Create a table
-- ROW FORMAT: row format
-- FIELDS TERMINATED BY: field delimiter
-- LOCATION: where the table data is stored
-- PARTITIONED BY: partition column
hive> CREATE TABLE IF NOT EXISTS dmp_tb (
        `ip` string comment 'source ip',
        `time` bigint comment 'log record time',
        `did` string comment 'user device unique sign',
        `host` string comment 'ad land url host address')
      PARTITIONED BY (`hour` string)
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY '|'
      LOCATION '/data/logs/mysql/dmp_mysql_db';
```

Hive DML (simple examples only; see the official documentation):

```sql
-- load; insert; select; update; delete
-- Note: Hive favors append-only workloads; UPDATE and DELETE are
-- disabled by default and require transactions to be enabled.

-- Load data. LOCAL means the path is a local file, otherwise it is looked
-- up on HDFS; OVERWRITE replaces the existing table contents.
hive> LOAD DATA [LOCAL] INPATH '/data/logs/test.log' [OVERWRITE] INTO TABLE dmp_tb;

-- Insert new rows (use INSERT OVERWRITE TABLE dmp_tb ... to replace
-- the existing table contents instead)
hive> insert into dmp_tb(`ip`,`time`,`did`,`host`)
      values('192.168.78.100','165885454','dfsdsfew5465','www.test.com');

-- Insert rows produced by a query
hive> insert into dmp_tb select
        dmp.`ip`,dmp.`time`,dmp.`did`,dmp.`host` from dmp_tb dmp;
```
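Since dmp_tb declares '|' as its field delimiter, it pays to validate a log file before running LOAD DATA. A minimal sketch, assuming a made-up sample record and the hypothetical path /tmp/dmp_sample.log:

```shell
# Generate a sample record matching the dmp_tb layout (ip|time|did|host).
# Both the record and the path are made-up examples.
printf '192.168.78.100|165885454|dfsdsfew5465|www.test.com\n' > /tmp/dmp_sample.log

# Verify every line splits into exactly 4 fields on '|' before loading;
# malformed lines would otherwise surface as NULL columns in Hive.
awk -F'|' 'NF != 4 { bad++ } END { print (bad ? "BAD" : "OK") }' /tmp/dmp_sample.log
```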
```sql
-- select can read not only from tables but also from function results
hive> select * from dmp_tb;
```

Syncing Hive Tables to MySQL

This article uses the sqoop tool for synchronization; see `sqoop help` for full command details. You need the following information:

- MySQL: connection URL, user, password, target table, and the columns to sync (optional when syncing all columns)
- Hive: the table (or table data directory) to sync and its field delimiter

```shell
# Example
sqoop export \
  --connect "jdbc:mysql://db.test.com:3306/hive?characterEncoding=UTF-8" \
  --username root --password root \
  --table dmp_test \
  --fields-terminated-by '|' \
  --columns "ip,time,did,host" \
  --export-dir '/data/logs/mysql/dmp_mysql_db'
# Tip: --password root can be replaced with -P, which prompts for the
# password instead of exposing it on the command line
```

Tip: if any of the commands above fail, check the logs printed at the time; for anything involving Hadoop, the MapReduce task logs are available on the ResourceManager web UI at hadoop-server:8088. Adjust for your actual environment.

Troubleshooting Hive Issues

1. HADOOP_HOME is not configured:

```
Cannot find hadoop installation: $HADOOP_HOME or $HADOOP_PREFIX must be set or hadoop must be in the path
```

2. Some Hive jars conflict with the Hadoop version (guava.jar): replace Hive's copy ($HIVE_HOME/lib/) with Hadoop's ($HADOOP_HOME/share/hadoop/common/lib/):

```
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
```

3. ClassLoader cannot be cast: a JDK version problem. JDK 11 was used at first and Hive failed to start; switch to JDK 1.8 (the official docs suggest 1.7):

```
java.lang.RuntimeException: Error applying authorization policy on hive configuration: class jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class java.net.URLClassLoader (jdk.internal.loader.ClassLoaders$AppClassLoader and java.net.URLClassLoader are in module java.base of loader 'bootstrap')
```

4. Hadoop is not running: Hadoop must be started first; Hive connects to it on startup and recovers automatically once Hadoop is up:

```
java.lang.RuntimeException: Error applying authorization policy on hive configuration: java.net.ConnectException: Call From tmp.dongle.com/192.168.78.100 to tmp.dongle.com:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
```
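For the connection-refused case above, a quick way to confirm whether the NameNode RPC port is even reachable before blaming Hive. A sketch assuming bash and coreutils' timeout are available; localhost stands in for your NameNode host:

```shell
# Probe the NameNode RPC port (9000 in the error above) using bash's
# built-in /dev/tcp. 'localhost' is a placeholder for the NameNode host.
host=localhost
port=9000
if timeout 2 bash -c "echo > /dev/tcp/$host/$port" 2>/dev/null; then
  echo "port $port on $host is reachable"
else
  echo "connection to $host:$port refused or timed out - start Hadoop first"
fi
```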
5. beeline connection error: root is not allowed to impersonate:

```
org.apache.hive.service.cli.HiveSQLException: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: root is not allowed to impersonate anonymous
```

Edit Hadoop's core-site.xml and add the following (substitute the actual user for root):

```xml
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
```

6. Hive has no permission to access HDFS files:

```
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=anonymous, access=WRITE, inode="/data/logs/test/dmp":hadoop:hadoop:drwxr-xr-x
```

Edit Hadoop's hdfs-site.xml:

```xml
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
```

7. Unknown Hadoop hostname:

```
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.IllegalArgumentException: java.net.UnknownHostException: hadoop-master)
FAILED: SemanticException Unable to determine if hdfs://hadoop-master:9000/data/logs/test/dmp is encrypted: java.lang.IllegalArgumentException: java.net.UnknownHostException: hadoop-master
```

One point worth stressing: if Hadoop is configured by hostname and that hostname changes, it is best to re-initialize Hive, or to make the following change:

```sql
-- If transactions are not enabled, run these directly against the metastore
-- database; running them through Hive will fail with a "transactions not
-- enabled" error.
update DBS set DB_LOCATION_URI=REPLACE(DB_LOCATION_URI,'old host','new host');
update SDS set LOCATION=REPLACE(LOCATION,'old host','new host');
```

8. Transactions are not supported:

```
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
```

UPDATE and DELETE require transaction support, and Hive discourages modifying data at all; prefer append-only inserts. To enable transactions, add the following to hive-site.xml:

```xml
<property><name>hive.support.concurrency</name><value>true</value></property>
<property><name>hive.enforce.bucketing</name><value>true</value></property>
<property><name>hive.exec.dynamic.partition.mode</name><value>nonstrict</value></property>
<property><name>hive.txn.manager</name><value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value></property>
<property><name>hive.compactor.initiator.on</name><value>true</value></property>
<property><name>hive.compactor.worker.threads</name><value>1</value></property>
<property><name>hive.in.test</name><value>true</value></property>
```
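The hostname fix in problem 7 can be previewed before touching the metastore by running the same substitution over a sample LOCATION value. A sketch where hadoop-master (from the error above) and new-master are placeholder hostnames:

```shell
# Preview the substitution that UPDATE ... REPLACE(...) would perform on
# the metastore LOCATION values; hostnames here are placeholders.
echo 'hdfs://hadoop-master:9000/data/logs/test/dmp' | sed 's/hadoop-master/new-master/'
# -> hdfs://new-master:9000/data/logs/test/dmp
```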