Spark 2.0.1-1611 API Changes

This topic describes the public API changes that occurred between Apache Spark 1.6.1 and Spark 2.0.1.

Removed Methods

The following items have been removed from Apache Spark 2.0.1:
  • Bagel (the Spark implementation of Google Pregel)
  • Most of the deprecated methods from Spark 1.x, including:
Category | Subcategory | Instead of this removed API... | Use...
-------- | ----------- | ------------------------------ | ------
GraphX | | mapReduceTriplets | aggregateMessages
GraphX | | runSVDPlusPlus | run
GraphX | | GraphKryoRegistrator | 
SQL | DataType | DataType.fromCaseClassString | DataType.fromJson
SQL | DecimalType | DecimalType() | DecimalType(precision, scale) to provide precision explicitly
SQL | DecimalType | DecimalType(Option[PrecisionInfo]) | DecimalType(precision, scale)
SQL | DecimalType | PrecisionInfo | DecimalType(precision, scale)
SQL | DecimalType | precisionInfo | precision and scale
SQL | DecimalType | Unlimited | (No longer supported)
SQL | Column | Column.in() | isin()
SQL | DataFrame | toSchemaRDD | toDF
SQL | DataFrame | createJDBCTable | write.jdbc()
SQL | DataFrame | saveAsParquetFile | write.parquet()
SQL | DataFrame | saveAsTable | write.saveAsTable()
SQL | DataFrame | save | write.save()
SQL | DataFrame | insertInto | write.mode(SaveMode.Append).saveAsTable()
SQL | DataFrameReader | DataFrameReader.load(path) | option("path", path).load()
SQL | Functions | cumeDist | cume_dist
SQL | Functions | denseRank | dense_rank
SQL | Functions | percentRank | percent_rank
SQL | Functions | rowNumber | row_number
SQL | Functions | inputFileName | input_file_name
SQL | Functions | isNaN | isnan
SQL | Functions | sparkPartitionId | spark_partition_id
SQL | Functions | callUDF | udf
Core | SparkContext | preferredNodeLocationData constructor parameter | (Constructors no longer take this parameter)
Core | SparkContext | tachyonFolderName | externalBlockStoreFolderName
Core | SparkContext | initLocalProperties, clearFiles, clearJars | (No longer needed)
Core | SparkContext | allowLocal parameter of runJob | (The runJob method no longer takes this parameter)
Core | SparkContext | defaultMinSplits | defaultMinPartitions
Core | SparkContext | [Double, Int, Long, Float]AccumulatorParam | implicit objects from AccumulatorParam
Core | SparkContext | rddTo[Pair, Async, Sequence, Ordered]RDDFunctions | implicit functions from RDD
Core | SparkContext | [double, numeric]RDDToDoubleRDDFunctions | implicit functions from RDD
Core | SparkContext | intToIntWritable, longToLongWritable, floatToFloatWritable, doubleToDoubleWritable, boolToBoolWritable, bytesToBytesWritable, stringToText | implicit functions from WritableFactory
Core | SparkContext | [int, long, double, float, boolean, bytes, string, writable]WritableConverter | implicit functions from WritableConverter
Core | TaskContext | runningLocally | isRunningLocally
Core | TaskContext | addOnCompleteCallback | addTaskCompletionListener
Core | TaskContext | attemptId | attemptNumber
Core | JavaRDDLike | splits | partitions
Core | JavaRDDLike | toArray | collect
Core | JavaSparkContext | defaultMinSplits | defaultMinPartitions
Core | JavaSparkContext | clearJars, clearFiles | (No longer needed)
Core | PairRDDFunctions | PairRDDFunctions.reduceByKeyToDriver | reduceByKeyLocally
Core | RDD | mapPartitionsWithContext | TaskContext.get
Core | RDD | mapPartitionsWithSplit | mapPartitionsWithIndex
Core | RDD | mapWith | mapPartitionsWithIndex
Core | RDD | flatMapWith | mapPartitionsWithIndex and flatMap
Core | RDD | foreachWith | mapPartitionsWithIndex and foreach
Core | RDD | filterWith | mapPartitionsWithIndex and filter
Core | RDD | toArray | collect
Core | TaskInfo | TaskInfo.attempt | TaskInfo.attemptNumber
Core | Guava Optional | Guava Optional | org.apache.spark.api.java.Optional
Core | Vector | Vector, VectorSuite | 
Configuration options and params | | --name | 
Configuration options and params | | --driver-memory | spark.driver.memory
Configuration options and params | | --driver-cores | spark.driver.cores
Configuration options and params | | --executor-memory | spark.executor.memory
Configuration options and params | | --executor-cores | spark.executor.cores
Configuration options and params | | --queue | spark.yarn.queue
Configuration options and params | | --files | spark.yarn.dist.files
Configuration options and params | | --archives | spark.yarn.dist.archives
Configuration options and params | | --addJars | spark.yarn.dist.jars
Configuration options and params | | --py-files | spark.submit.pyFiles
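For example, code that wrote a DataFrame through the removed shortcut methods now goes through the DataFrameWriter API. The following before/after sketch in Scala is illustrative only; the application name, input path, output path, and table name are hypothetical:

    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder().appName("migration-sketch").getOrCreate()
    val df = spark.read.json("/tmp/input.json") // hypothetical input path

    // Spark 1.6.1 (removed in 2.0.1):
    //   df.saveAsParquetFile("/tmp/output.parquet")
    //   df.insertInto("events")

    // Spark 2.0.1 equivalents:
    df.write.parquet("/tmp/output.parquet")
    df.write.mode(SaveMode.Append).saveAsTable("events")

Similarly, the removed RDD.mapPartitionsWithContext pattern can be expressed by calling TaskContext.get inside mapPartitions, as in this sketch:

    import org.apache.spark.TaskContext

    val rdd = spark.sparkContext.parallelize(1 to 10, 2)
    val tagged = rdd.mapPartitions { iter =>
      // The task context is now obtained via TaskContext.get rather than
      // being passed in by mapPartitionsWithContext.
      val partitionId = TaskContext.get.partitionId()
      iter.map(x => (partitionId, x))
    }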
Note also the following changes:
  • Python DataFrame methods that returned an RDD have moved to dataframe.rdd. For example, df.map is now df.rdd.map.
  • Some streaming connectors (Twitter, Akka, MQTT, and ZeroMQ) have been removed.
  • org.apache.spark.shuffle.hash.HashShuffleManager has been removed. SortShuffleManager has been the default since Spark 1.2.
  • DataFrame is no longer a class. In the Scala API it is now a type alias for Dataset[Row], as shown in the sketch after this list.
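Because DataFrame is now only an alias, an existing DataFrame value can be used anywhere a Dataset[Row] is expected. A minimal Scala sketch (the application name and file path are hypothetical):

    import org.apache.spark.sql.{Dataset, Row, SparkSession}

    val spark = SparkSession.builder().appName("alias-sketch").getOrCreate()

    // In Spark 2.x the Scala API defines: type DataFrame = Dataset[Row]
    val df = spark.read.json("/tmp/people.json") // hypothetical path
    val ds: Dataset[Row] = df                    // compiles; they are the same type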

Behavior Changes

Spark 2.0.1 implements the following behavior changes:

Other Deprecated Items