Storing dataframe into HBase using Spark


I am writing a DataFrame to an HBase table from PySpark on CDP 7, following this example. The components I use are:

  • Spark version 3.1.1
  • Scala version 2.12.10
  • shc-core-1.1.1-2.1-s_2.11.jar

The command that I use:

spark3-submit  --packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11 --repositories --files /etc/hbase/conf/hbase-site.xml
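For reference, the write path from the SHC example looks roughly like the sketch below. The table, column family, and column names are placeholders, not the ones from my actual job; SHC describes the target table with a JSON "catalog" that maps DataFrame columns to HBase columns.

```python
import json

# SHC catalog: "rowkey" is the reserved column family for the row key.
# All names here (my_table, cf1, key, name) are placeholder values.
writeCatalog = json.dumps({
    "table": {"namespace": "default", "name": "my_table"},
    "rowkey": "key",
    "columns": {
        "key":  {"cf": "rowkey", "col": "key",  "type": "string"},
        "name": {"cf": "cf1",    "col": "name", "type": "string"},
    },
})

# The failing call from the traceback then looks like:
#   writeDF.write \
#       .options(catalog=writeCatalog, newtable=5) \
#       .format("org.apache.spark.sql.execution.datasources.hbase") \
#       .save()
```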

However, I got this error. It is quite long, so I have put the relevant part below:

error snippet:

Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/", line 45, in <module>
  File "/opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/", line 24, in main
    writeDF.write.options(catalog=writeCatalog, newtable=5).format(dataSourceFormat).save()
  File "/opt/cloudera/parcels/SPARK3-", line 1107, in save
  File "/opt/cloudera/parcels/SPARK3-", line 1305, in __call__
  File "/opt/cloudera/parcels/SPARK3-", line 111, in deco
  File "/opt/cloudera/parcels/SPARK3-", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling
: java.lang.NoClassDefFoundError: scala/Product$class
    at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.<init>(HBaseRelation.scala:73)
    at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:59)

What should I do to fix the error? I tried to find another connector, but only found SHC. I'm not using any Maven repo here, but I am not sure if there are missing dependencies or some other error.

This is a Scala version conflict. Your shc-core jar is compiled for Scala 2.11, but you're using Scala 2.12, which is not binary compatible with 2.11.
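You can read the target Scala binary version straight off the artifact name; by convention it is the `_2.11` / `_2.12` suffix. A quick sketch:

```shell
# Scala artifacts encode the Scala binary version after the final "_".
jar="shc-core-1.1.1-2.1-s_2.11.jar"
suffix="${jar##*_}"         # strips up to the last "_" -> "2.11.jar"
scala_bin="${suffix%.jar}"  # strips the extension     -> "2.11"
echo "$scala_bin"           # a Scala 2.12 build would end in _2.12
```

A `NoClassDefFoundError: scala/Product$class` at runtime is the classic symptom of this mismatch: `Product$class` exists in the Scala 2.11 standard library but was removed in 2.12.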

The easiest fix would be to recompile shc-core from source for Scala 2.12 (although you may still end up with compatibility issues, since the project is evidently not tested against Scala 2.12).

Other ways you can explore to solve your issue:
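For example, CDP 7 ships the Apache hbase-spark connector (the `hbase-connectors` project), which has Scala 2.12 builds and avoids SHC entirely. Instead of a JSON catalog it takes an `hbase.columns.mapping` option of the form `df_col TYPE cf:qualifier`, with `:key` marking the row key. A minimal sketch, assuming placeholder table and column names:

```python
# Each entry maps a DataFrame column to an HBase target ("cf:qualifier");
# ":key" marks the row key. All names here are placeholders.
COLUMN_MAP = {
    "key":   ":key",
    "name":  "cf1:name",
    "email": "cf1:email",
}

def hbase_columns_mapping(schema, column_map):
    """Build the hbase.columns.mapping string, e.g.
    'key STRING :key, name STRING cf1:name'."""
    return ", ".join(
        f"{col} {dtype} {column_map[col]}" for col, dtype in schema
    )

mapping = hbase_columns_mapping(
    [("key", "STRING"), ("name", "STRING"), ("email", "STRING")],
    COLUMN_MAP,
)

# With the connector jar on the classpath and hbase-site.xml shipped as
# before (spark3-submit --files /etc/hbase/conf/hbase-site.xml ...),
# the write itself would look like:
#
#   writeDF.write \
#       .format("org.apache.hadoop.hbase.spark") \
#       .option("hbase.table", "my_table") \
#       .option("hbase.columns.mapping", mapping) \
#       .save()
```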