Category: Spark Functions

Spark Functions Explained: cache

Caches the RDD at the MEMORY_ONLY storage level. Internally, cache() simply calls persist(). Official documentation definition: Persist this RDD with the default storage level (`MEMORY_ONLY`). Function prototype: [code lang="scala"]def cache() : this.type[/code] Example: [code lang="scala"]/** * User: 过往记忆 * Date: 15-03-04 * Time: 06:30 PM * blog: * Article URL: /archives/1274 * 过往记忆 blog, …

w397090770, 9 years ago (2015-03-04), 14170 views, 0 comments, 8 likes
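A minimal sketch of the relationship described above, written in plain Scala so it runs without Spark. `MiniRDD`, `Level`, `MemoryOnly` and the other names are hypothetical stand-ins, not Spark classes. The model assumes only what the excerpt states: cache() delegates to persist(), and the default level is MEMORY_ONLY. In real Spark code (with spark-core on the classpath), the equivalent check would be calling `rdd.cache()` and then comparing `rdd.getStorageLevel` with `StorageLevel.MEMORY_ONLY`.

```scala
// Hypothetical model, NOT Spark's source code. It only mirrors the
// documented rule: cache() == persist() == persist(MEMORY_ONLY).
object CacheModel {
  // Simplified stand-ins for a few storage levels (hypothetical names).
  sealed trait Level
  case object NoneLevel extends Level
  case object MemoryOnly extends Level
  case object MemoryAndDisk extends Level

  // Hypothetical mini "RDD" that only tracks its storage level.
  class MiniRDD {
    private var level: Level = NoneLevel
    def storageLevel: Level = level

    // Record the requested level and return this same instance.
    def persist(newLevel: Level): this.type = {
      level = newLevel
      this
    }

    // No-argument persist() uses the default level, MEMORY_ONLY.
    def persist(): this.type = persist(MemoryOnly)

    // cache() is just a shorthand for persist().
    def cache(): this.type = persist()
  }

  // Subclass used to show that this.type preserves the static type.
  class NamedRDD(val name: String) extends MiniRDD
}
```

Because cache() returns `this.type`, calling it on a subclass instance returns the same object with the subclass's static type, so it can sit in the middle of a method chain without a cast.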

Spark Functions Explained: aggregateByKey

This function is similar to aggregate, but it operates on pair RDDs (RDDs of key-value pairs). It was officially introduced in Spark 1.1.0. Official documentation definition: Aggregate the values of each key, using given combine functions and a neutral "zero value". This function can return a different result type, U, than the type of the values in this RDD, V. Thus, we need one operation for merging a V into a U and one operation for merging two U's, as in scala.Traversabl…

w397090770, 9 years ago (2015-03-02), 39554 views, 2 comments, 35 likes
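The sketch below models the semantics of aggregateByKey in plain Scala. It assumes the RDD's partitions are given explicitly as a `Seq` of `Seq`s and that the zero value is immutable. The `AggregateByKeyModel.aggregateByKey` helper is hypothetical and is not Spark's implementation. It illustrates the key point: the zero value seeds each key once per partition (via seqOp), and combOp then merges the per-partition results without using the zero value again. In Spark, `sc.parallelize(List(("a", 1), ("a", 2), ("b", 3), ("a", 4)), 2).aggregateByKey(0)(math.max(_, _), _ + _)` should behave the same way on that two-partition split.

```scala
// Hypothetical model of aggregateByKey semantics, NOT Spark's implementation.
object AggregateByKeyModel {
  def aggregateByKey[K, V, U](partitions: Seq[Seq[(K, V)]], zero: U)(
      seqOp: (U, V) => U,
      combOp: (U, U) => U): Map[K, U] = {
    // Step 1: inside each partition, fold every key's values with seqOp,
    // starting from the zero value the first time a key is seen there.
    val perPartition: Seq[Map[K, U]] = partitions.map { part =>
      part.foldLeft(Map.empty[K, U]) { case (acc, (k, v)) =>
        acc.updated(k, seqOp(acc.getOrElse(k, zero), v))
      }
    }
    // Step 2: merge the partial results of all partitions with combOp.
    // The zero value is not used again at this stage.
    perPartition.foldLeft(Map.empty[K, U]) { (acc, partial) =>
      partial.foldLeft(acc) { case (merged, (k, u)) =>
        merged.updated(k, merged.get(k).map(combOp(_, u)).getOrElse(u))
      }
    }
  }
}
```

With partitions `[("a",1), ("a",2)]` and `[("b",3), ("a",4)]`, a per-partition max gives a → 2 in the first partition and a → 4, b → 3 in the second. Summing across partitions yields `Map(a -> 6, b -> 3)`. Changing the zero value to 3 raises a's first partial result from 2 to 3 and its total to 7. This is why the zero value should normally be neutral for seqOp.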

Spark Functions Explained: aggregate

Let's start with the official documentation's definition of aggregate: Aggregate the elements of each partition, and then the results for all the partitions, using given combine functions and a neutral "zero value". This function can return a different result type, U, than the type of this RDD, T. Thus, we need one operation for merging a T into an U and one operation for merging two U's, as in scala.TraversableOnce. Both of these functions…

w397090770, 9 years ago (2015-02-12), 37281 views, 5 comments, 23 likes
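Below is a plain-Scala model of the two-level fold that the documentation describes. It assumes the partitions are supplied explicitly, and the `AggregateModel.aggregate` helper is hypothetical. Each partition is folded with seqOp starting from the zero value. The partial results are then folded with combOp, again starting from the zero value. That second use of the zero value explains why a non-neutral zero changes the answer. For partitions [1, 2, 3] and [4, 5, 6], with zero value 1 and `+` for both operations, the result is 1 + (1+1+2+3) + (1+4+5+6) = 24 rather than 21. The Spark call `sc.parallelize(List(1, 2, 3, 4, 5, 6), 2).aggregate(1)(_ + _, _ + _)` splits the data the same way.

```scala
// Hypothetical model of RDD.aggregate semantics, NOT Spark's implementation.
object AggregateModel {
  def aggregate[T, U](partitions: Seq[Seq[T]], zero: U)(
      seqOp: (U, T) => U,
      combOp: (U, U) => U): U = {
    // Fold every partition independently, each one starting from zero.
    val partials: Seq[U] = partitions.map(_.foldLeft(zero)(seqOp))
    // Merge the partition results, starting from zero once more.
    partials.foldLeft(zero)(combOp)
  }
}
```

Because U may differ from T, a common use is computing a (sum, count) pair in a single pass, for example with zero value `(0, 0)`, `seqOp = (acc, x) => (acc._1 + x, acc._2 + 1)`, and `combOp = (a, b) => (a._1 + b._1, a._2 + b._2)`. The model merges partial results in partition order. In Spark, task results can arrive in any order, so combOp should be associative and commutative.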
