With MEPs 5.0.4 or 6.3.0 and later, you can enable high availability for the Spark Thrift Server.
Note the following characteristics of high availability for the Spark Thrift Server:
- Unlike a HiveServer2 high-availability (HA) configuration, all Spark Thrift Servers are
in an active state. ZooKeeper tracks the registered Thrift Servers, selects one of them to
serve requests, and records the choice. If that Thrift Server goes down, ZooKeeper selects
another Thrift Server, records the new choice, and works with it.
- After configuration, you can use Beeline to connect to the Spark Thrift Server on each
node. The Control System displays one Thrift Server as active and the others as standby, but
you can connect to any of them.
- If a Spark Thrift Server stops or fails, ZooKeeper removes the record for the failed
Spark Thrift Server, and the client connects to the next one in the ZooKeeper list.
- At its core, the running Spark Thrift Server is a job that you can start in YARN mode.
This makes it possible to configure queues for the Spark Thrift Server in a multi-tenant
cluster if high availability is enabled. To do this, run the
./sbin/start-thriftserver.sh script and apply the properties
that YARN provides for managing queues.
- You don't need to configure load balancing. Spark handles load balancing automatically
through the use of parallelized requests and efficient resource management.
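As a rough illustration of the YARN queue point above, the following sketch assembles a start command that places the Thrift Server in a specific queue. It assumes that start-thriftserver.sh forwards spark-submit options such as --master and --queue, as it does in Apache Spark; the Spark home path and the queue name thrift_queue are hypothetical values chosen for the example, not values from this page.

```python
import shlex
from typing import List


def thriftserver_start_command(spark_home: str, queue: str) -> List[str]:
    """Build the argument list for starting the Thrift Server in a YARN queue.

    Assumption: start-thriftserver.sh passes these options through to
    spark-submit, where --queue selects the YARN queue.
    """
    return [
        f"{spark_home}/sbin/start-thriftserver.sh",
        "--master", "yarn",
        "--queue", queue,
    ]


# Hypothetical Spark home and queue name, for illustration only.
cmd = thriftserver_start_command("/opt/mapr/spark/spark-2.4.4", "thrift_queue")

# Print a shell-safe command line that an administrator could review and run.
print(shlex.join(cmd))
```

The command is only printed, not executed, so the sketch can be tried on a machine without Spark installed.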
To enable high availability, use the following steps:
- Install Spark Thrift Server on all the cluster nodes where it is needed:
  - On Ubuntu:
    apt-get install mapr-spark-thriftserver
  - On Red Hat / CentOS:
    yum install mapr-spark-thriftserver
  - On SUSE:
    zypper install mapr-spark-thriftserver
- Add the following properties to the
/opt/mapr/spark/spark-<spark_version>/conf/hive-site.xml file on all
the nodes where the Spark Thrift Server is installed:
<property>
<name>hive.zookeeper.quorum</name>
<value><zk_host_1>,<zk_host_2>,…,<zk_host_n></value>
</property>
<property>
<name>hive.zookeeper.client.port</name>
<value><zk_port></value>
</property>
<property>
<name>hive.server2.support.dynamic.service.discovery</name>
<value>true</value>
</property>
<property>
<name>hive.server2.zookeeper.namespace</name>
<value><zk_namespace></value>
</property>
For example:
<property>
<name>hive.zookeeper.quorum</name>
<value>node1.cluster.com,node2.cluster.com,node3.cluster.com</value>
</property>
<property>
<name>hive.zookeeper.client.port</name>
<value>5181</value>
</property>
<property>
<name>hive.server2.support.dynamic.service.discovery</name>
<value>true</value>
</property>
<property>
<name>hive.server2.zookeeper.namespace</name>
<value>ts2-ts2</value>
</property>
Note: The value that you provide for the hive.server2.zookeeper.namespace
property in the Spark hive-site.xml file should be different from the value in the
Hive hive-site.xml file.
- Restart the Spark Thrift Server on all configured nodes to apply the changes. Either run
the scripts in the sbin directory at /opt/mapr/spark/spark-<spark_version>/:
./sbin/stop-thriftserver.sh
./sbin/start-thriftserver.sh
or run a maprcli command:
maprcli node services -nodes <host_1>,<host_2>,<host_n> -name spark-thriftserver -action restart
- Launch the ZooKeeper command-line interface, and check the Spark Thrift Server znode by
running the following commands:
/opt/mapr/zookeeper/zookeeper-<version>/bin/zkCli.sh -server <ip:port of zookeeper instance>
ls /<hive.server2.zookeeper.namespace>
For example:
/opt/mapr/zookeeper/zookeeper-3.4.11/bin/zkCli.sh -server node1.cluster.com:5181
ls /ts2-ts2
[serverUri=node1.cluster.com:2304;version=;sequence=0000000000]
- Using Beeline, connect to the Spark Thrift Server by using the following connection
string:
beeline> !connect jdbc:hive2://<hostname -f>:5181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=<hive.server2.zookeeper.namespace>;
For example:
./bin/beeline
Warning: Unable to determine $DRILL_HOME
Beeline version 1.2.0-mapr-spark-MEP-6.0.0-1912 by Apache Hive
beeline> !connect jdbc:hive2://node1.cluster.com:5181/default;ssl=true;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=ts2-ts2;auth=maprsasl;
Connecting to jdbc:hive2://node1.cluster.com:5181/default;ssl=true;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=ts2-ts2;auth=maprsasl;
20/03/29 21:38:19 WARN MaprSaslClient: SASL Server qopProperty: auth-confis different from Client: auth-conf,auth-int,auth.Using Server one
Connected to: Spark SQL (version 2.4.4.0-mapr-630)
Driver: Hive JDBC (version 1.2.0-mapr-spark-MEP-6.0.0-1912)
Transaction isolation: TRANSACTION_REPEATABLE_READ
1: jdbc:hive2://node1.cluster.com:5181/defaul> show databases;
+-----------------+
| databaseName    |
+-----------------+
| default         |
+-----------------+
1 row selected (0.11 seconds)
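The verification and connection steps above can be sketched in a few lines of Python. The sketch reads a znode entry in the format shown in the zkCli.sh output and assembles a discovery-mode connection string in the same shape as the Beeline example. The function names parse_znode_entry and build_discovery_url are hypothetical helpers invented for this illustration, and the parameter order in the generated string differs from the example above.

```python
from typing import Dict


def parse_znode_entry(entry: str) -> Dict[str, str]:
    """Split a 'key=value;key=value' znode entry into a dictionary."""
    fields: Dict[str, str] = {}
    # Drop the surrounding brackets that zkCli.sh prints around the child list.
    for part in entry.strip("[]").split(";"):
        if part:
            key, _, value = part.partition("=")
            fields[key] = value
    return fields


def build_discovery_url(zk_address: str, namespace: str,
                        database: str = "default",
                        **session_params: str) -> str:
    """Assemble a JDBC URL that uses ZooKeeper service discovery."""
    params = {
        "serviceDiscoveryMode": "zooKeeper",
        "zooKeeperNamespace": namespace,
    }
    # Extra settings such as ssl or auth are appended after the discovery settings.
    params.update(session_params)
    suffix = ";".join(f"{key}={value}" for key, value in params.items())
    return f"jdbc:hive2://{zk_address}/{database};{suffix}"


# Values taken from the examples on this page.
entry = parse_znode_entry(
    "[serverUri=node1.cluster.com:2304;version=;sequence=0000000000]"
)
url = build_discovery_url("node1.cluster.com:5181", "ts2-ts2",
                          ssl="true", auth="maprsasl")
print(entry["serverUri"])
print(url)
```

Note that the URL points at the ZooKeeper address (port 5181 in the example), not at the Thrift Server address recorded in serverUri; the client resolves the Thrift Server through the namespace.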
Note: High availability for the Spark Thrift Server can be used in conjunction with
HiveServer2 high availability. For more information about HiveServer2 high availability, see
Enabling High Availability for Hive.
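When both HA configurations run on the same cluster, it may help to check that the Spark and Hive hive-site.xml files do not share a ZooKeeper namespace. The following is a minimal sketch of such a check, not a supported tool. It works on XML text held in strings; the sample documents are invented for the example, and the Hive namespace value hiveserver2 is an assumption. A property that is missing from a file is not resolved to its default value here.

```python
import xml.etree.ElementTree as ET
from typing import Optional

NAMESPACE_PROPERTY = "hive.server2.zookeeper.namespace"


def read_property(hive_site_xml: str, name: str) -> Optional[str]:
    """Return the value of a named property from hive-site.xml text, or None."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


def namespaces_differ(spark_site_xml: str, hive_site_xml: str) -> bool:
    """True if the Spark namespace is set and is not the same as the Hive one."""
    spark_ns = read_property(spark_site_xml, NAMESPACE_PROPERTY)
    hive_ns = read_property(hive_site_xml, NAMESPACE_PROPERTY)
    return spark_ns is not None and spark_ns != hive_ns


# Sample file contents, invented for illustration.
SPARK_SITE = """<configuration>
  <property>
    <name>hive.server2.zookeeper.namespace</name>
    <value>ts2-ts2</value>
  </property>
</configuration>"""

HIVE_SITE = """<configuration>
  <property>
    <name>hive.server2.zookeeper.namespace</name>
    <value>hiveserver2</value>
  </property>
</configuration>"""

print(namespaces_differ(SPARK_SITE, HIVE_SITE))
```

In practice the two strings would be read from the hive-site.xml files in the Spark and Hive configuration directories.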