I have set up the Docker container to run the model evaluation.
Run find . -name "*.pdf" -exec python3 process.py {} txt/{}.txt \; but don't rewrite files that already exist. Instead of .pdf.txt make it just .txt
Okay, you need to modify the `-exec` part of your `find` command to achieve two things:

1. **Generate the correct output filename:** `txt/basename.txt` instead of `txt/path/to/file.pdf.txt`.
2. **Check for the existence** of the target `.txt` file before running the Python script.

The most robust way to do this within `find -exec` is to invoke a small shell script using `sh -c`. Here's the command:
```bash
# First, ensure the output directory exists
mkdir -p txt

# Now, find and process the PDFs conditionally
find . -name "*.pdf" -exec sh -c '
  pdf_file="$0"
  # Extract base name without .pdf extension
  base_name=$(basename "$pdf_file" .pdf)
  # Construct the target output file path
  output_file="txt/${base_name}.txt"

  # Check if the output file does NOT exist
  if [ ! -e "$output_file" ]; then
    echo "Processing: $pdf_file -> $output_file"
    # Run the python script if the output file doesn't exist
    python3 process.py "$pdf_file" "$output_file"
  else
    echo "Skipping: $output_file already exists"
  fi
' sh {} \;
```
I extracted the code above from that output and ran it:
find: missing argument to `-exec'
Traceback (most recent call last):
  File "/usr/src/app/process.py", line 2, in <module>
    open(sys.argv[2], 'w').write(open(sys.argv[1]).read())
    ^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: ''
main.sh: line 18: syntax error near unexpected token `else'
main.sh: line 18: ` else'
Traceback (most recent call last):
  File "/usr/src/app/check.py", line 6, in <module>
    print('final: ' + str(check_ok()))
                          ^^^^^^^^^^
  File "/usr/src/app/check.py", line 3, in check_ok
    return open("txt/a.txt").read() == "done1" and open("txt/b.txt").read() == "hello2" and open("txt/c.txt").read() == "done3" and len(os.listdir("txt")) == 3
                                                   ^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'txt/b.txt'
Resulting in output: False
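
The errors above are consistent with two problems in the generated command. This is a reading of the output, not something the log states. First, the apostrophe in the comment `doesn't` appears to close the single-quoted `sh -c` script early, which would explain the missing `-exec` terminator, the `python3 process.py` call with empty arguments, and the stray `else`. Second, with `sh -c '...' sh {}` the matched path arrives as `$1`, while `$0` is the literal `sh`. A minimal sketch that avoids both follows. It assumes `process.py` takes an input path and an output path, as the traceback suggests, and the function name `convert_missing_pdfs` is hypothetical:

```shell
# Hypothetical wrapper name; it only makes the command easy to re-run.
convert_missing_pdfs() {
  # Ensure the output directory exists.
  mkdir -p txt

  # No apostrophes inside the single-quoted script, so the quoting stays balanced.
  find . -name "*.pdf" -exec sh -c '
    pdf_file="$1"                            # $0 is the name "sh"; the path is $1
    base_name=$(basename "$pdf_file" .pdf)   # strip the directory and the .pdf suffix
    output_file="txt/${base_name}.txt"

    # Skip PDFs whose text file already exists.
    if [ ! -e "$output_file" ]; then
      echo "Processing: $pdf_file -> $output_file"
      python3 process.py "$pdf_file" "$output_file"
    else
      echo "Skipping: $output_file already exists"
    fi
  ' sh {} \;
}

convert_missing_pdfs
```

Because only the base name is kept, two PDFs with the same name in different directories would map to the same `txt/` file, and the second one would be skipped.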