Uploading Spark dependency packages to HDFS: using spark.yarn.jars and spark.yarn.archive
2020-12-13 04:26
Original article: https://www.cnblogs.com/yyy-blog/p/11110388.html
I. Parameter overview
When a Spark job is launched without spark.yarn.archive or spark.yarn.jars configured, you will see the jars being uploaded over and over, which is very time-consuming. Setting spark.yarn.archive greatly reduces job startup time. The whole procedure is as follows.
II. Using spark.yarn.archive
1. Create a zip file locally
silent@bd01:~/env/spark$ cd jars/
silent@bd01:~/env/spark$ zip spark2.0.0.zip ./*
Note: the zip is a full package, i.e. it contains all of the jars.
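Spark's YARN documentation states that the jar files should sit in the root directory of the archive, which is what zipping ./* from inside jars/ produces. A minimal sketch for checking this before uploading, assuming unzip is installed; the helper name check_flat_listing is hypothetical and not part of the original article:

```shell
# Hypothetical helper: reads archive entry names from stdin (one per line)
# and fails if any entry sits in a subdirectory or if the listing is empty.
check_flat_listing() {
  count=0
  while IFS= read -r entry; do
    case "$entry" in
      */*) echo "not at archive root: $entry" >&2; return 1 ;;
    esac
    count=$((count + 1))
  done
  # An empty listing means the archive has no entries at all.
  [ "$count" -gt 0 ]
}
```

One way to use it is to pipe in the entry names printed by unzip in zipinfo mode: unzip -Z1 spark2.0.0.zip | check_flat_listing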
2. Upload to HDFS and change the permissions
silent@bd01:~/env/spark$ /usr/ndp/current/hdfs_client/bin/hdfs dfs -mkdir /tmp/spark-archive
silent@bd01:~/env/spark$ /usr/ndp/current/hdfs_client/bin/hdfs dfs -put ./spark2.0.0.zip /tmp/spark-archive
silent@bd01:~/env/spark$ /usr/ndp/current/hdfs_client/bin/hdfs dfs -chmod 775 /tmp/spark-archive/spark2.0.0.zip
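The upload steps above can be wrapped so that they are safe to rerun. A sketch, assuming the hdfs client is on PATH (the article itself calls it by its full path, /usr/ndp/current/hdfs_client/bin/hdfs); the function name upload_spark_archive is hypothetical:

```shell
# Hypothetical helper: upload a local archive to an HDFS directory.
# Usage: upload_spark_archive <local-zip> <hdfs-dir>
upload_spark_archive() {
  archive=$1
  target_dir=$2
  name=$(basename "$archive")

  # -p: create parent directories and do not fail if the directory exists.
  hdfs dfs -mkdir -p "$target_dir" || return 1
  # -f: overwrite an archive left over from an earlier upload.
  hdfs dfs -put -f "$archive" "$target_dir/" || return 1
  # Same mode (775) as in the steps above.
  hdfs dfs -chmod 775 "$target_dir/$name"
}
```

For the paths used here the call would be: upload_spark_archive ./spark2.0.0.zip /tmp/spark-archive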
3. Configure spark-defaults.conf
spark.yarn.archive hdfs:///tmp/spark-archive/spark2.0.0.zip
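To confirm that one of the two properties is actually set, the configuration file can be scanned before submitting a job. A minimal sketch; the helper name has_yarn_jar_setting is hypothetical, and the location of spark-defaults.conf depends on the installation:

```shell
# Hypothetical helper: succeeds if the given spark-defaults.conf sets
# spark.yarn.archive or spark.yarn.jars on a non-comment line.
has_yarn_jar_setting() {
  conf_file=$1
  # Key at line start (optional indentation), followed by whitespace or '='.
  grep -Eq '^[[:space:]]*spark\.yarn\.(archive|jars)([[:space:]]|=)' "$conf_file"
}
```

A typical call, assuming SPARK_HOME points at the Spark installation, is: has_yarn_jar_setting "$SPARK_HOME/conf/spark-defaults.conf" || echo "neither property is set"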
III. Using spark.yarn.jars
1. Upload the dependency jars
silent@bd01:~/env/spark$ /usr/ndp/current/hdfs_client/bin/hdfs dfs -mkdir hdfs://bd01/user/asiainfo/jars/
silent@bd01:~/env/spark$ /usr/ndp/current/hdfs_client/bin/hdfs dfs -put ./*.jar hdfs://bd01/user/asiainfo/jars/
silent@bd01:~/env/spark$ /usr/ndp/current/hdfs_client/bin/hdfs dfs -chmod -R 775 hdfs://bd01/user/asiainfo/jars/
2. Configure spark-defaults.conf
spark.yarn.jars=local:/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/lib/spark/jars/*,local:/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/lib/spark/hive/*,hdfs://bd01/user/asiainfo/jars/*.jar
Note: use the local: scheme for jars that already exist on the local filesystem, and an hdfs:// path for jars stored in an HDFS directory.
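The value of spark.yarn.jars is a comma-separated list, so entries with different schemes can be mixed as in the line above. A small sketch that assembles such a line; the helper name build_yarn_jars is hypothetical and the paths in the usage note are illustrative only:

```shell
# Hypothetical helper: join all arguments with commas into one
# spark.yarn.jars line suitable for spark-defaults.conf.
build_yarn_jars() {
  # Run in a subshell so the IFS change does not leak to the caller.
  (IFS=','; printf 'spark.yarn.jars=%s\n' "$*")
}
```

Arguments should be quoted so that the shell does not expand the globs locally, for example: build_yarn_jars 'local:/opt/spark/jars/*' 'hdfs://bd01/user/asiainfo/jars/*.jar'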
Article link: http://soscw.com/essay/29492.html