This topic describes the public API changes that occurred between Apache Spark 1.6.1 and Spark 2.0.1.
| Category | Subcategory | Instead of this removed API... | Use... |
|---|---|---|---|
| GraphX | | mapReduceTriplets | aggregateMessages |
| | | runSVDPlusPlus | run |
| | | GraphKryoRegistrator | GraphXUtils.registerKryoClasses |
| SQL | DataType | DataType.fromCaseClassString | DataType.fromJson |
| | DecimalType | DecimalType() | DecimalType(precision, scale) to provide precision explicitly |
| | | DecimalType(Option[PrecisionInfo]) | DecimalType(precision, scale) |
| | | PrecisionInfo | DecimalType(precision, scale) |
| | | precisionInfo | precision and scale |
| | | Unlimited | (No longer supported) |
| | Column | Column.in() | isin() |
| | DataFrame | toSchemaRDD | toDF |
| | | createJDBCTable | write.jdbc() |
| | | saveAsParquetFile | write.parquet() |
| | | saveAsTable | write.saveAsTable() |
| | | save | write.save() |
| | | insertInto | write.mode(SaveMode.Append).saveAsTable() |
| | DataFrameReader | DataFrameReader.load(path) | option("path", path).load() |
| | Functions | cumeDist | cume_dist |
| | | denseRank | dense_rank |
| | | percentRank | percent_rank |
| | | rowNumber | row_number |
| | | inputFileName | input_file_name |
| | | isNaN | isnan |
| | | sparkPartitionId | spark_partition_id |
| | | callUDF | udf |
| Core | SparkContext | Constructors no longer take the preferredNodeLocationData param | |
| | | tachyonFolderName | externalBlockStoreFolderName |
| | | initLocalProperties, clearFiles, clearJars | (No longer needed) |
| | | runJob no longer takes the allowLocal param | |
| | | defaultMinSplits | defaultMinPartitions |
| | | [Double, Int, Long, Float]AccumulatorParam | implicit objects from AccumulatorParam |
| | | rddTo[Pair, Async, Sequence, Ordered]RDDFunctions | implicit functions from RDD |
| | | [double, numeric]RDDToDoubleRDDFunctions | implicit functions from RDD |
| | | intToIntWritable, longToLongWritable, floatToFloatWritable, doubleToDoubleWritable, boolToBoolWritable, bytesToBytesWritable, stringToText | implicit functions from WritableFactory |
| | | [int, long, double, float, boolean, bytes, string, writable]WritableConverter | implicit functions from WritableConverter |
| | TaskContext | runningLocally | isRunningLocally |
| | | addOnCompleteCallback | addTaskCompletionListener |
| | | attemptId | attemptNumber |
| | JavaRDDLike | splits | partitions |
| | | toArray | collect |
| | JavaSparkContext | defaultMinSplits | defaultMinPartitions |
| | | clearJars, clearFiles | (No longer needed) |
| | PairRDDFunctions | PairRDDFunctions.reduceByKeyToDriver | reduceByKeyLocally |
| | RDD | mapPartitionsWithContext | TaskContext.get |
| | | mapPartitionsWithSplit | mapPartitionsWithIndex |
| | | mapWith | mapPartitionsWithIndex |
| | | flatMapWith | mapPartitionsWithIndex and flatMap |
| | | foreachWith | mapPartitionsWithIndex and foreach |
| | | filterWith | mapPartitionsWithIndex and filter |
| | | toArray | collect |
| | TaskInfo | TaskInfo.attempt | TaskInfo.attemptNumber |
| | Guava Optional | Guava Optional | org.apache.spark.api.java.Optional |
| | Vector | Vector, VectorSuite | |
| Configuration options and params | | --name | spark.app.name |
| | | --driver-memory | spark.driver.memory |
| | | --driver-cores | spark.driver.cores |
| | | --executor-memory | spark.executor.memory |
| | | --executor-cores | spark.executor.cores |
| | | --queue | spark.yarn.queue |
| | | --files | spark.yarn.dist.files |
| | | --archives | spark.yarn.dist.archives |
| | | --addJars | spark.yarn.dist.jars |
| | | --py-files | spark.submit.pyFiles |
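The short sketches below illustrate several of the migrations in the table; the names, data, and paths in them are illustrative, not taken from the release. First, the GraphX change: `mapReduceTriplets` returned messages from an `Iterator`, while `aggregateMessages` sends them through an `EdgeContext`. A minimal sketch, assuming a graph with `Double` vertex and edge attributes (the `weightedInDegree` name is hypothetical):

```scala
import org.apache.spark.graphx.{Graph, TripletFields, VertexRDD}

// Before 2.0 (removed):
//   graph.mapReduceTriplets[Double](t => Iterator((t.dstId, t.attr)), _ + _)
// After: send messages through the EdgeContext instead of returning an Iterator.
def weightedInDegree(graph: Graph[Double, Double]): VertexRDD[Double] =
  graph.aggregateMessages[Double](
    ctx => ctx.sendToDst(ctx.attr), // "map" side: send each edge's weight to its destination
    _ + _,                          // "reduce" side: sum the messages per vertex
    TripletFields.EdgeOnly          // hint: only the edge attribute is read
  )
```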
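The removed `DataFrame` save methods map onto the `DataFrameWriter` returned by `write`. A sketch, with a placeholder DataFrame and hypothetical table name, output path, and JDBC URL:

```scala
import java.util.Properties
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("writer-migration").getOrCreate()
val df = spark.range(10).toDF("id") // placeholder DataFrame

df.write.parquet("/tmp/out")                          // was df.saveAsParquetFile("/tmp/out")
df.write.saveAsTable("events")                        // was df.saveAsTable("events")
df.write.mode(SaveMode.Append).saveAsTable("events")  // was df.insertInto("events")
df.write.jdbc("jdbc:postgresql://db/test", "events",  // was df.createJDBCTable(...)
  new Properties())
```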
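The function renames are purely mechanical: the camelCase builders in `org.apache.spark.sql.functions` were dropped in favor of their snake_case equivalents. A sketch over a made-up DataFrame with `dept` and `salary` columns:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{cume_dist, dense_rank, row_number}

val spark = SparkSession.builder().appName("fn-renames").getOrCreate()
import spark.implicits._
val df = Seq(("eng", 100), ("eng", 120), ("ops", 90)).toDF("dept", "salary")

val w = Window.partitionBy("dept").orderBy("salary")
// rowNumber(), denseRank(), and cumeDist() were removed in 2.0;
// the snake_case functions are drop-in replacements.
val ranked = df
  .withColumn("rn", row_number().over(w))
  .withColumn("dr", dense_rank().over(w))
  .withColumn("cd", cume_dist().over(w))
```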
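On the Core side, the removed RDD helpers (`mapWith`, `mapPartitionsWithSplit`, and friends) collapse onto `mapPartitionsWithIndex`, and the task context is now reached through `TaskContext.get()`. A minimal sketch, assuming a local master for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext, TaskContext}

val sc  = new SparkContext(new SparkConf().setAppName("rdd-migration").setMaster("local[*]"))
val rdd = sc.parallelize(1 to 100)

// was rdd.mapPartitionsWithContext((ctx, iter) => ...)
val byPartition = rdd.mapPartitions { iter =>
  val pid = TaskContext.get().partitionId() // fetch the context inside the task
  iter.map(x => (pid, x))
}

// was rdd.mapWith / rdd.mapPartitionsWithSplit
val indexed = rdd.mapPartitionsWithIndex { (idx, iter) =>
  iter.map(x => (idx, x))
}

// was rdd.toArray
val all = rdd.collect()
```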
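Finally, each removed YARN command-line argument corresponds to an ordinary Spark property from the table, which can be set on `SparkConf` or passed with `--conf` to spark-submit. A sketch with hypothetical values:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.yarn.queue", "prod")          // was --queue
  .set("spark.yarn.dist.files", "cfg.txt")  // was --files
  .set("spark.yarn.dist.jars", "dep.jar")   // was --addJars
  .set("spark.executor.memory", "4g")       // was --executor-memory
```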
Spark 2.0.1 implements the following behavior changes: