Tag: Spark

Writing Spark Computation Results to MySQL

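We recommend the relational-database write methods added in Spark 1.3.0 instead; see "Writing a Spark RDD to an RDBMS (MySQL), Method 2".

In "Integrating Spark with MySQL (JdbcRDD)" we showed how to read MySQL data from Spark. When that article was written, Spark did not yet offer a Java API for JdbcRDD; current versions of Spark do.

Today we will mainly discuss how to take the results computed by Spark…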

w397090770 (2015-03-10)
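A common way to write an RDD to MySQL is to call `rdd.foreachPartition` and open one JDBC connection per partition instead of one per record, sending rows in batches. The sketch below models only that batching logic in plain Scala, so it runs without Spark or a database. `BatchSink`, `RecordingSink`, `writePartitions` and `batchSize` are hypothetical names introduced for illustration. In real code the sink would wrap a `java.sql.Connection` and a `PreparedStatement` using `addBatch`/`executeBatch`.

```scala
// Hypothetical stand-in for a JDBC connection. In real code this would wrap a
// java.sql.Connection and a PreparedStatement (addBatch / executeBatch).
trait BatchSink[R] {
  def writeBatch(rows: Seq[R]): Unit
  def close(): Unit
}

// Models rdd.foreachPartition { rows => ... }: each partition opens its own
// sink once, writes its rows in fixed-size batches, and always closes the sink.
def writePartitions[R](
    partitions: Seq[Iterator[R]],
    openSink: () => BatchSink[R],
    batchSize: Int
): Unit = {
  require(batchSize > 0, "batchSize must be positive")
  partitions.foreach { rows =>
    val sink = openSink()
    try rows.grouped(batchSize).foreach(batch => sink.writeBatch(batch))
    finally sink.close()
  }
}

// In-memory sink that records what would have been sent to the database.
final class RecordingSink[R] extends BatchSink[R] {
  val batches = scala.collection.mutable.ArrayBuffer.empty[Seq[R]]
  var closed = false
  def writeBatch(rows: Seq[R]): Unit = batches += rows
  def close(): Unit = closed = true
}

val opened = scala.collection.mutable.ArrayBuffer.empty[RecordingSink[Int]]
writePartitions[Int](
  Seq(Iterator(1, 2, 3), Iterator(4, 5)),
  () => { val s = new RecordingSink[Int]; opened += s; s },
  batchSize = 2
)
println(opened.map(_.batches.map(_.toList).toList).toList)
// List(List(List(1, 2), List(3)), List(List(4, 5)))
```

Two partitions open two sinks, however many rows they hold. The batch size trades the number of round trips against the memory each batch needs.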

Spark

Advantages of Apache Spark over Hadoop

The following is from Reynold Xin, an Apache Spark committer.

In many respects, Spark is the best implementation of the MapReduce model. Take the programming abstraction, for example:

1. It generalizes the two-stage Map/Reduce abstraction to support arbitrary DAGs of tasks. Most computation maps (no pun intended) into many maps and reduces with dependencies among them. In Spark's RDD…

w397090770 (2015-03-09)

Spark

Spark Functions Explained: coalesce

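Merges (coalesces) the partitions of an RDD into a new number of partitions. Function prototype:
[code lang="scala"]
def coalesce(numPartitions: Int, shuffle: Boolean = false)
            (implicit ord: Ordering[T] = null): RDD[T]
[/code]
Returns a new RDD with numPartitions partitions. If shuffle is true, the data is shuffled. Without a shuffle, coalesce can only reduce the number of partitions: requesting more partitions than the RDD already has leaves the count unchanged. Example: …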

w397090770 (2015-03-09)
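To make the two modes concrete, here is a plain-Scala model in which each partition is a `Vector`. `coalesceModel` is a hypothetical helper, not Spark code. It simplifies Spark's behaviour in two ways:
- Without a shuffle, it merges neighbouring partitions and ignores data locality.
- With a shuffle, it deals elements out round-robin without Spark's random starting offset.

In Spark itself you would call `rdd.coalesce(2)`, or `rdd.repartition(n)`, which is `coalesce(n, shuffle = true)`.

```scala
// Plain-Scala model of coalesce semantics; a "partition" is just a Vector.
def coalesceModel[T](
    parts: Vector[Vector[T]],
    numPartitions: Int,
    shuffle: Boolean = false
): Vector[Vector[T]] = {
  require(numPartitions > 0, "numPartitions must be positive")
  if (shuffle) {
    // With a shuffle every element may move: deal them out round-robin,
    // so the partition count can go up as well as down.
    val all = parts.flatten
    Vector.tabulate(numPartitions) { i =>
      all.zipWithIndex.collect { case (x, j) if j % numPartitions == i => x }
    }
  } else if (numPartitions >= parts.size) {
    // Without a shuffle, coalesce cannot increase the partition count.
    parts
  } else {
    // Input partition i goes to output group i * numPartitions / parts.size,
    // so each output partition is a run of adjacent input partitions.
    parts.zipWithIndex
      .groupBy { case (_, i) => i * numPartitions / parts.size }
      .toVector
      .sortBy(_._1)
      .map { case (_, group) => group.flatMap(_._1) }
  }
}

val four = Vector(Vector(1), Vector(2), Vector(3), Vector(4))
println(coalesceModel(four, 2))                      // Vector(Vector(1, 2), Vector(3, 4))
println(coalesceModel(four, 8).size)                 // 4: no shuffle, no growth
println(coalesceModel(four, 8, shuffle = true).size) // 8
```

The no-shuffle case only merges existing partitions, which is why it avoids moving data across the network and why it cannot create new partitions.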

Spark

Spark Functions Explained: Series Index

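Over the coming days this blog will cover every function in the Spark 1.2.1 RDD API, one post per day. Each post explains the function and gives examples and caveats, so stay tuned. The functions to be covered are listed below in alphabetical order; those that are clickable have already been published: aggregate, aggregateByKey, cache, cartesian, checkpoint, coalesce, cogroup/groupWith, collect/toArray, collectAsMap, combineByKey, compute, context/spar…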

w397090770 (2015-03-08)

Spark

Spark Functions Explained: checkpoint

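Marks the current RDD for checkpointing. The RDD is saved as binary files in the checkpoint directory, which is set with SparkContext.setCheckpointDir(). During checkpointing, all of the RDD's references to its parent RDDs are removed. Calling checkpoint does not take effect immediately: the checkpoint is only written when an action runs. Function prototype:
[code lang="scala"]
def checkpoint(): Unit
[/code]
Example: …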

w397090770 (2015-03-08)
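The two behaviours described above can be illustrated with a toy model:
- `checkpoint()` only marks the RDD.
- The lineage is cut once an action has materialised the data.

`ToyRdd` is a hypothetical class, not Spark's. It also simplifies one detail: real Spark writes the checkpoint in a separate job after the action and recomputes the RDD for it unless it is cached. In Spark itself the sequence is `sc.setCheckpointDir(dir)`, then `rdd.checkpoint()`, then an action such as `rdd.count()`.

```scala
// Toy model of an RDD with lineage; ToyRdd is hypothetical, not a Spark class.
final class ToyRdd[T](compute: () => Seq[T], initialParents: List[ToyRdd[_]]) {
  private var parentRefs: List[ToyRdd[_]] = initialParents
  private var marked = false
  private var saved: Option[Seq[T]] = None // stands in for the checkpoint files

  def parents: List[ToyRdd[_]] = parentRefs
  def isCheckpointed: Boolean = saved.isDefined

  // Lazy, like Spark: nothing is written when checkpoint() is called.
  def checkpoint(): Unit = marked = true

  // An "action": reads the saved copy if there is one; otherwise computes the
  // data and, if the RDD was marked, saves it and drops the parent references.
  def collect(): Seq[T] = saved.getOrElse {
    val data = compute()
    if (marked) {
      saved = Some(data)
      parentRefs = Nil
    }
    data
  }
}

val source = new ToyRdd(() => Seq(1, 2, 3), Nil)
val doubled = new ToyRdd(() => source.collect().map(_ * 2), List(source))
doubled.checkpoint()
println(doubled.parents.size) // 1: still attached, no action has run yet
println(doubled.collect())    // List(2, 4, 6)
println(doubled.parents.size) // 0: lineage truncated after the action
```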

Spark

Spark Functions Explained: cartesian

As the name suggests, this function computes the Cartesian product of two RDDs. The official documentation says: Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of elements (a, b) where a is in `this` and b is in `other`. Function prototype:
[code lang="scala"]
def cartesian[U: ClassTag](other: RDD[U]): RDD[(T, U)]
[/code]
The function returns an RDD of pairs; the result…

w397090770 (2015-03-07)
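The definition quoted above can be written out as a plain-Scala model over ordinary collections. `cartesianModel` is a hypothetical helper, not Spark code, and its nested-loop ordering is just what a `for` comprehension produces. In Spark, the product of an RDD with m partitions and one with n partitions has m × n partitions, and the output grows just as fast.

```scala
// Plain-Scala model of cartesian: every pair (a, b) with a from the first
// collection and b from the second, in nested-loop order.
def cartesianModel[T, U](xs: Seq[T], ys: Seq[U]): Seq[(T, U)] =
  for {
    a <- xs
    b <- ys
  } yield (a, b)

val pairs = cartesianModel(Seq(1, 2, 3), Seq("a", "b"))
println(pairs)      // List((1,a), (1,b), (2,a), (2,b), (3,a), (3,b))
println(pairs.size) // 6 = 3 * 2
```

Because the size is the product of the two input sizes, cartesian on two large RDDs quickly becomes very expensive.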

Spark

Spark Functions Explained: cache

Caches the RDD with the MEMORY_ONLY storage level; internally it simply calls persist(). The official documentation says: Persist this RDD with the default storage level (`MEMORY_ONLY`). Function prototype:
[code lang="scala"]
def cache(): this.type
[/code]
Example: …

w397090770 (2015-03-04)
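The toy model below shows what `cache()` buys you:
- Without it, every action re-evaluates the lineage.
- With it, the first action stores the result and later actions reuse it.

`CachedComputation` and `computeCount` are hypothetical names used only for illustration. The toy also ignores one detail of Spark's MEMORY_ONLY level: partitions that do not fit in memory are not cached and are recomputed when needed.

```scala
// Toy model (hypothetical, not Spark's API) of caching a computation.
final class CachedComputation[T](compute: () => Seq[T]) {
  private var cached = false
  private var stored: Option[Seq[T]] = None
  var computeCount = 0 // how many times the "lineage" was actually evaluated

  // Lazy, like Spark's cache(): it only sets a flag and returns this.
  def cache(): this.type = { cached = true; this }

  // An "action": reuses stored data if present, otherwise recomputes.
  def collect(): Seq[T] = stored.getOrElse {
    computeCount += 1
    val data = compute()
    if (cached) stored = Some(data)
    data
  }
}

val plain = new CachedComputation(() => (1 to 3).map(_ * 10))
plain.collect()
plain.collect()
println(plain.computeCount) // 2

val withCache = new CachedComputation(() => (1 to 3).map(_ * 10)).cache()
withCache.collect()
withCache.collect()
println(withCache.computeCount) // 1
```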

Spark

Spark Functions Explained: aggregateByKey

This function is similar to aggregate, but it operates on pair RDDs (RDDs of key-value pairs). It was introduced in Spark 1.1.0. The official documentation defines it as: Aggregate the values of each key, using given combine functions and a neutral "zero value". This function can return a different result type, U, than the type of the values in this RDD, V. Thus, we need one operation for merging a V into a U and one operation for merging two U's, as in scala.Traversabl…

w397090770 (2015-03-02)
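The two merge operations in that definition work in two levels, which this plain-Scala model makes explicit:
- Within each partition, the values of a key are folded into the zero value with `seqOp`.
- The per-partition results for each key are then merged with `combOp`.

`aggregateByKeyModel` is a hypothetical helper, not Spark code, and a pair RDD is modelled as a list of partitions of `(key, value)` pairs.

```scala
// Plain-Scala model of aggregateByKey's two-level semantics (not Spark code).
def aggregateByKeyModel[K, V, U](partitions: Seq[Seq[(K, V)]], zeroValue: U)(
    seqOp: (U, V) => U,
    combOp: (U, U) => U
): Map[K, U] = {
  // Level 1: fold each key's values within a partition, starting from zeroValue.
  val perPartition: Seq[Map[K, U]] = partitions.map { part =>
    part.foldLeft(Map.empty[K, U]) { case (acc, (k, v)) =>
      acc.updated(k, seqOp(acc.getOrElse(k, zeroValue), v))
    }
  }
  // Level 2: merge the per-partition results for each key with combOp.
  perPartition.flatten.groupBy(_._1).map { case (k, kvs) =>
    k -> kvs.map(_._2).reduce(combOp)
  }
}

// Max per key inside each partition, then the sum of those maxima.
val data = Seq(
  Seq(("a", 3), ("a", 2), ("b", 5)),
  Seq(("a", 4), ("b", 1))
)
val result = aggregateByKeyModel(data, 0)((u, v) => math.max(u, v), _ + _)
// result("a") == 3 + 4 == 7, result("b") == 5 + 1 == 6
```

Unlike `aggregate`, the zero value here is applied once per key per partition and is not added again when partitions are merged.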

Scala

Spark Functions Explained: aggregate

Let's start with the official documentation for aggregate: Aggregate the elements of each partition, and then the results for all the partitions, using given combine functions and a neutral "zero value". This function can return a different result type, U, than the type of this RDD, T. Thus, we need one operation for merging a T into an U and one operation for merging two U's, as in scala.TraversableOnce. Both of these functions…

w397090770 (2015-02-12)
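The definition above can be modelled in plain Scala:
- Each partition is folded with `seqOp` starting from the zero value.
- The partition results are then merged with `combOp`, again starting from the zero value, which is how Spark's `aggregate` combines task results on the driver.

`aggregateModel` is a hypothetical helper, not Spark code. In Spark, task results may arrive in any order, so `combOp` should be associative and commutative.

```scala
// Plain-Scala model of RDD.aggregate (not Spark code).
def aggregateModel[T, U](partitions: Seq[Seq[T]], zeroValue: U)(
    seqOp: (U, T) => U,
    combOp: (U, U) => U
): U =
  partitions
    .map(_.foldLeft(zeroValue)(seqOp)) // per-partition result
    .foldLeft(zeroValue)(combOp)       // merge results, starting from zeroValue

val parts = Seq(Seq(1, 2, 3), Seq(4, 5))

// (sum, count) in one pass: the result type U = (Int, Int) differs from T = Int.
val (sum, count) = aggregateModel(parts, (0, 0))(
  (acc, x) => (acc._1 + x, acc._2 + 1),
  (a, b) => (a._1 + b._1, a._2 + b._2)
)
println(s"sum=$sum count=$count") // sum=15 count=5

// A non-neutral zero value is applied once per partition plus once more when
// merging: 1 + (1 + 1 + 2 + 3) + (1 + 4 + 5) = 18, not 15 + 1.
println(aggregateModel(parts, 1)(_ + _, _ + _)) // 18
```

The second call shows why the zero value should be neutral for both operations (0 for addition, an empty collection for concatenation, and so on).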

Spark

Learning Spark: Complete Edition Download

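The Learning Spark link here is to the complete edition, which differs from the earlier preview edition; this is not clickbait. The e-book is available in three formats: mobi, pdf, and epub. If you have an Amazon Kindle, you can read the mobi and pdf files directly; on a computer, you can download a corresponding desktop reader. If you need a reader, contact me. If you want to keep up with the latest on Spark, Hadoop, or HBase…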

w397090770 (2015-02-11)
