Traceback (most recent call last):
  File "/cosr/back/spark/jobs/pagerank.py", line 459, in <module>
    job.run()
  File "/cosr/back/cosrlib/spark.py", line 207, in run
    self.run_job(sc, sqlc)
  File "/cosr/back/spark/jobs/pagerank.py", line 75, in run_job
    self.custom_pagerank(sc, sqlc)
  File "/cosr/back/spark/jobs/pagerank.py", line 289, in custom_pagerank
    compression="gzip" if self.args.gzip else "none"
  File "/usr/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 632, in text
  File "/usr/spark/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
  File "/usr/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
pyspark.sql.utils.AnalysisException: u'path file:/cosr/back/out/pagerank already exists.;'
This would avoid errors late in the job like the one above (reported by @HenriqueLimas).
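
A fail-fast check along these lines could run at the start of the job, before any computation happens. This is only a minimal sketch, not the project's actual code: `check_output_path` and `OutputPathExistsError` are hypothetical names, and it assumes the output is a plain local path or a `file:` URI like the one in the error message. Outputs on HDFS, S3, or other filesystems would need a different existence check.

```python
import os

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


class OutputPathExistsError(Exception):
    """Hypothetical error raised before the job starts when the output path is taken."""


def check_output_path(path):
    """Raise early if a local output path already exists.

    Hypothetical helper: only plain paths and file: URIs are checked.
    Any other scheme (hdfs:, s3a:, ...) is skipped, not validated.
    """
    parsed = urlparse(path)
    if parsed.scheme not in ("", "file"):
        # Remote filesystems are out of scope for this sketch.
        return
    local_path = parsed.path if parsed.scheme == "file" else path
    if os.path.exists(local_path):
        raise OutputPathExistsError("path %s already exists" % path)
```

Called early in `run_job` with the configured output path, a check like this would surface the problem right away rather than at the final write.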