PageRank & other jobs: check if output directory already exists #62

@sylvinus

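Checking up front whether the output directory already exists would avoid errors that only surface late in the job, after the computation has already run, like this one: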

Traceback (most recent call last):
  File "/cosr/back/spark/jobs/pagerank.py", line 459, in <module>
    job.run()
  File "/cosr/back/cosrlib/spark.py", line 207, in run
    self.run_job(sc, sqlc)
  File "/cosr/back/spark/jobs/pagerank.py", line 75, in run_job
    self.custom_pagerank(sc, sqlc)
  File "/cosr/back/spark/jobs/pagerank.py", line 289, in custom_pagerank
    compression="gzip" if self.args.gzip else "none"
  File "/usr/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 632, in text
  File "/usr/spark/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
  File "/usr/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
pyspark.sql.utils.AnalysisException: u'path file:/cosr/back/out/pagerank already exists.;'

reported by @HenriqueLimas
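
A minimal sketch of such an early check, assuming the output goes to the local filesystem (a plain path or a `file:` URI, as in the traceback above). The helper names `local_output_path` and `check_output_dir_free` are hypothetical and not part of cosrlib; the sketch is written for Python 3 and deliberately skips other schemes such as `hdfs://` or `s3a://`, which would need a different existence check.

```python
import os
from urllib.parse import urlparse


def local_output_path(uri):
    """Return the local path for a plain path or a file: URI, else None.

    Hypothetical helper: other schemes (hdfs://, s3a://, ...) are not
    handled by this sketch and yield None.
    """
    parsed = urlparse(uri)
    if parsed.scheme in ("", "file"):
        return parsed.path
    return None


def check_output_dir_free(uri):
    """Raise ValueError if a local output path already exists.

    Meant to be called before any expensive computation starts, so the
    job fails immediately instead of at the final write.
    """
    path = local_output_path(uri)
    if path is not None and os.path.exists(path):
        raise ValueError("Output path %s already exists" % uri)
```

Calling something like `check_output_dir_free("file:/cosr/back/out/pagerank")` at the start of the job would fail right away instead of after the PageRank iterations finish. Whether the job should abort, or instead overwrite the directory (for example through the writer's `mode("overwrite")`), is a separate design decision.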
