MapR supports most Spark features. However, there are a few exceptions.
- Spark Thrift JDBC/ODBC Server Support
  - Running the Spark Thrift JDBC/ODBC Server on a secure cluster is supported only on Spark 2.1.0 or later.
  - You can run the Spark Thrift JDBC/ODBC Server to enable connections to Hive 1.2.1 using Beeline; however, you can connect only to Hive versions supported by your Spark version.
- Spark SQL and Hive Support for Spark 2.1.0
  - Spark 2.1.0 can connect to a Hive 2.1 Metastore; however, only Hive 1.2 features are supported.
- Spark SQL and Hive Support for Spark 2.0.1
  - Spark SQL is supported, but it is not fully compatible with Hive. For details, see the Apache Spark documentation. The following Hive features are not supported in Spark SQL:
    - Tables with buckets
    - UNION type
    - Unique join
    - Column statistics collecting
    - Output formats: File format (for CLI), Hadoop Archive
    - Block-level bitmap indexes and virtual columns
    - Automatic determination of the number of reducers for JOIN and GROUP BY
    - Metadata-only query
    - Skew data flag
    - STREAMTABLE hint in JOIN
    - Merging of multiple small files for query results
- Spark SQL and Hive Support for Spark 1.6.1
  - Spark SQL is supported, but it is not fully compatible with Hive. For details, see the Apache Spark documentation. The following table shows which Spark SQL operations are supported for each Hive table format:
| | Hive 1.2 Table Format | | | | |
|---|---|---|---|---|---|
| Spark SQL Operations | AVRO | ORC | Parquet | RC | default |
| create | Yes | Yes | Yes | Yes | Yes |
| drop | Yes | Yes | Yes | Yes | Yes |
| insert into | Yes | Yes | Yes | Yes | Yes |
| insert overwrite | Yes | Yes | Yes | Yes | Yes |
| select | Yes | Yes | Yes | Yes | Yes |
| load data | Yes | Yes | Yes | Yes | Yes |
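
To make the Spark 1.6.1 operations listed above concrete, here is a minimal PySpark sketch that builds one HiveQL statement per operation for a chosen table format and optionally runs them through a `HiveContext`. The table name `events`, the source table `staging_events`, the two-column schema, and the path `/tmp/events` are hypothetical placeholders, not names from the MapR documentation. The mapping from the table's column labels to `STORED AS` clauses is an assumption based on standard Hive DDL.

```python
# Assumed mapping from the format labels in the table to Hive DDL clauses.
# "default" means no STORED AS clause, so Hive uses its default file format.
STORED_AS = {
    "AVRO": "STORED AS AVRO",
    "ORC": "STORED AS ORC",
    "Parquet": "STORED AS PARQUET",
    "RC": "STORED AS RCFILE",
    "default": "",
}


def build_statements(table, table_format, source_table, data_path):
    """Return (operation, HiveQL) pairs covering each operation in the table."""
    clause = STORED_AS[table_format]
    create = "CREATE TABLE {0} (id INT, name STRING) {1}".format(table, clause).strip()
    return [
        ("create", create),
        ("insert into",
         "INSERT INTO TABLE {0} SELECT id, name FROM {1}".format(table, source_table)),
        ("insert overwrite",
         "INSERT OVERWRITE TABLE {0} SELECT id, name FROM {1}".format(table, source_table)),
        ("select", "SELECT id, name FROM {0}".format(table)),
        # LOAD DATA moves files as-is, so the files under data_path must
        # already be stored in the table's format.
        ("load data",
         "LOAD DATA INPATH '{0}' INTO TABLE {1}".format(data_path, table)),
        ("drop", "DROP TABLE {0}".format(table)),
    ]


def run_with_hive_context(statements):
    """Execute the statements on a cluster; requires PySpark with Hive support."""
    # Imported lazily so build_statements works without Spark installed.
    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="hive-format-check")
    sql_context = HiveContext(sc)
    try:
        for operation, sql in statements:
            print(operation + ": " + sql)
            sql_context.sql(sql).collect()
    finally:
        sc.stop()
```

For example, `run_with_hive_context(build_statements("events", "ORC", "staging_events", "/tmp/events"))` would exercise every row of the table for the ORC format, provided the hypothetical source table and input path exist on your cluster.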