For shared big-data material on Hadoop, Spark, Flink, Hive, HBase, Flume, and more, follow the WeChat public account: iteblog_hadoop

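This is the table of contents for Learning Spark. Click a title to open the corresponding reading page. A PDF version is still being compiled, and the download link will be published once it is complete.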



Table of Contents

  1. Preface
    1. Audience
    2. How This Book is Organized
    3. Supporting Books
    4. Code Examples
    5. Early Release Status and Feedback
  2. Chapter 1. Introduction to Data Analysis with Spark
    1. What is Apache Spark?
    2. A Unified Stack
      1. Spark Core
      2. Spark SQL
      3. Spark Streaming
      4. MLlib
      5. GraphX
      6. Cluster Managers
    3. Who Uses Spark, and For What?
      1. Data Science Tasks
      2. Data Processing Applications
    4. A Brief History of Spark
    5. Spark Versions and Releases
    6. Spark and Hadoop
  3. Chapter 2. Downloading and Getting Started
    1. Downloading Spark
    2. Introduction to Spark’s Python and Scala Shells
    3. Introduction to Core Spark Concepts
    4. Standalone Applications
      1. Initializing a SparkContext
    5. Conclusion
  4. Chapter 3. Programming with RDDs
    1. RDD Basics
    2. Creating RDDs
    3. RDD Operations
      1. Transformations
      2. Actions
      3. Lazy Evaluation
    4. Passing Functions to Spark
      1. Python
      2. Scala
      3. Java
    5. Common Transformations and Actions
      1. Basic RDDs
        1. Transformations
        2. Element-wise transformations
        3. Pseudo Set Operations
        4. Actions
      2. Converting Between RDD Types
        1. Scala
        2. Java
        3. Python
    6. Persistence (Caching)
    7. Conclusion
  5. Chapter 4. Working with Key-Value Pairs
    1. Motivation
    2. Creating Pair RDDs
    3. Transformations on Pair RDDs
      1. Aggregations
        1. Tuning the Level of Parallelism
      2. Grouping Data
      3. Joins
      4. Sorting Data
    4. Actions Available on Pair RDDs
    5. Data Partitioning
      1. Determining an RDD’s Partitioner
      2. Operations that Benefit from Partitioning
      3. Operations that Affect Partitioning
      4. Example: PageRank
      5. Custom Partitioners
    6. Conclusion
  6. Chapter 5. Loading and Saving Your Data
    1. Motivation
    2. Choosing a Format
    3. Formats
      1. Text Files
      2. JSON
      3. CSV (Comma Separated Values) / TSV (Tab Separated Values)
      4. Sequence Files
      5. Object Files
      6. Hadoop Input and Output Formats
        1. Protocol Buffers
      7. Hive and Parquet
    4. File Systems
      1. Local/“Regular” FS
        1. Amazon S3
      2. HDFS
    5. Compression
    6. Databases
      1. Elasticsearch
      2. Mongo
      3. Cassandra
      4. HBase
      5. Java Database Connectivity (JDBC)
    7. Conclusion
  7. About the Authors
  8. Copyright
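Unless otherwise stated, all articles on this blog are original.
When reposting this article, please add: Reposted from 过往记忆 (https://www.iteblog.com/)
Link to this article: [Learning Spark (Table of Contents)] (https://www.iteblog.com/learning-spark-table-of-contents/)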