PySpark job too slow - tried all optimisations

import os
import sys
import time
from pyspark.sql import SparkSession



spark_packages = ",".join(['org.postgresql:postgresql:42.2.10','org.apache.hadoop:hadoop-aws:2.7.0' ,'com.oracle.database.jdbc:ojdbc6:11.2.0.4'])

spark = (
    SparkSession
    .builder
    .appName("Spark_E6")
    .config("spark.jars.packages", spark_packages)
    .config("spark.sql.execution.arrow.pyspark.fallback.enabled", "false")
    .config("spark.scheduler.mode", "FAIR")
    .config("spark.scheduler.allocation.file", "/home/hadoop/fairscheduler.xml")
    .config("spark.driver.extraClassPath", "/home/hadoop/cdata.jdbc.netsuite.jar")
    .config("spark.executor.extraClassPath", "/home/hadoop/cdata.jdbc.netsuite.jar")
    .config("spark.sql.inMemoryColumnarStorage.compressed", True)
    .config("spark.sql.inMemoryColumnarStorage.batchSize", 1000)
    .config("spark.sql.shuffle.partitions", 12)
    .config("spark.dynamicAllocation.minExecutors", 10)
    .config("spark.dynamicAllocation.enabled", True)
    .config("spark.dynamicAllocation.maxExecutors", 30)
    .config("spark.dynamicAllocation.initialExecutors", 10)
    .config("spark.sql.files.maxPartitionBytes", 268435456)
    .config("spark.hadoop.fs.s3a.access.key", "**")
    .config("spark.hadoop.fs.s3a.secret.key", "**")
    .config("spark.hadoop.fs.s3a.multipart.size", 104857600)
    .config("jars", "/home/hadoop/*")
    .getOrCreate()
)

for lines in data[-1]:
    url = "{}".format(data[0]['url'])
    table_name = lines
    start_time = time.time()
    source_df = (
        spark.read.format("jdbc")
        .option("url", url)
        .option("dbtable", table_name)
        .option("fetchSize", "5000")
        .option("numPartitions", 10)
        .option("partitionColumn", "part")
        .option("lowerBound", 1)
        .option("upperBound", 10)
        .option("driver", "cdata.jdbc.netsuite.NetSuiteDriver")
        .load()
    )


    
    mode = "overwrite"
    properties = {"user": "{}".format(data[1]['db_user']), "password": "{}".format(data[1]['db_pwd']),
                  "driver": "org.postgresql.Driver", "batchsize": "5000"}
                  
    source_df.write.jdbc(url=url, table="db.{}".format(table_name), mode=mode, properties=properties)


    print("to_sql total duration: {} seconds".format(time.time() - start_time))

The code above is what I have used. I have tried everything:

1. FAIR scheduling
2. Partitioning
3. Multiple executors / parallelism

Still the job is too slow, and it is just a read from JDBC and a write to Postgres: 35,000 records take 40 minutes.
Any suggestions on how I can speed up the job?

Have you tried analyzing the Postgres table? It is sometimes the gathering of statistics on the table that causes this. In my opinion it is not linked to Spark.
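
For example, a minimal sketch of refreshing those statistics from Python, assuming psycopg2 is available and using placeholder connection details and a hypothetical table name:

import psycopg2

# Placeholder connection details and table name - replace with your own.
conn = psycopg2.connect(host="your-postgres-host", dbname="your_db",
                        user="db_user", password="db_pwd")
with conn.cursor() as cur:
    cur.execute("ANALYZE db.your_table;")  # refresh planner statistics for the target table
conn.commit()
conn.close()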

You didn't mention how many executor cores you are using. By default it is 1 core per executor.
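
As an illustration only (the numbers below are placeholders to tune for your cluster, not recommendations), more cores and memory per executor can be requested when the session is built:

spark = (
    SparkSession.builder
    .appName("Spark_E6")
    .config("spark.executor.cores", 4)      # placeholder: cores per executor
    .config("spark.executor.memory", "8g")  # placeholder: memory per executor
    .getOrCreate()
)

The same settings can also be passed with --conf on spark-submit.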

For a better understanding, check the Spark UI and see which job/stage is taking the most time. Then look at what you can tune on that one to decrease the time.
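
One small aid (a sketch, not part of the original job) is to label each loop iteration so its jobs are easy to find in the Spark UI:

# Assumed to be called at the top of the per-table loop.
spark.sparkContext.setJobDescription("sync table {}".format(table_name))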

Also check whether the read can be tuned so that each fetch pulls more records at a time.
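
For example, a sketch of the same read with a larger fetch size; the bounds are placeholders and should match the real MIN/MAX of the part column so the 10 partitions are evenly sized:

source_df = (
    spark.read.format("jdbc")
    .option("url", url)
    .option("dbtable", table_name)
    .option("driver", "cdata.jdbc.netsuite.NetSuiteDriver")
    .option("fetchSize", "10000")   # more rows per round trip than the original 5000
    .option("numPartitions", 10)
    .option("partitionColumn", "part")
    .option("lowerBound", 1)        # placeholder: use the actual MIN(part)
    .option("upperBound", 35000)    # placeholder: use the actual MAX(part)
    .load()
)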

How many CPU cores do you have? Choose a number of parallel executors equal to the number of CPU cores.
Add some logging to capture how much time the read from JDBC and the write to Postgres each take, i.e. the latency of these IO operations. What is the data size of each read and write query? Are they too large? Also check the network bandwidth you have available.
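
A sketch of how the two phases could be timed separately inside the loop; caching the DataFrame and the repartition count are assumptions added for illustration, not part of the original job:

read_start = time.time()
source_df = source_df.cache()   # materialise once so the write does not re-read over JDBC
row_count = source_df.count()   # forces the JDBC read
print("jdbc read: {} rows in {:.1f}s".format(row_count, time.time() - read_start))

write_start = time.time()
(source_df
    .repartition(10)            # illustrative: match write parallelism to cores / DB capacity
    .write.jdbc(url=url, table="db.{}".format(table_name), mode=mode, properties=properties))
print("postgres write: {:.1f}s".format(time.time() - write_start))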