I have set up the Docker container to run the model evaluation.
Run find . -name "*.pdf" -exec python3 process.py {} txt/{}.txt \; but don't rewrite files that already exist. Instead of .pdf.txt make it just .txt
To modify the command so that it:
1. Only processes PDFs whose output file does not already exist
2. Changes the output naming from `.pdf.txt` to just `.txt`

Here's the solution:
mkdir -p txt && find . -name "*.pdf" -exec bash -c '
    pdf="$1"
    base=$(basename "$pdf" .pdf)
    dest="txt/${base}.txt"
    if [ ! -f "$dest" ]; then
        echo "Processing: $pdf -> $dest"
        python3 process.py "$pdf" "$dest"
    else
        echo "Skipping: $dest already exists"
    fi
' _ {} \;
Running that program produced the following output:
Processing: ./b.pdf -> txt/b.txt
Skipping: txt/c.txt already exists
Skipping: txt/a.txt already exists
final: True
Resulting in output: True
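The output name in this run comes from `basename "$pdf" .pdf`, which strips both the directory and the `.pdf` suffix. A minimal sketch of that one step in isolation may make the behavior easier to see. The helper name `dest_for` and the sample paths are hypothetical and not part of the original command.

```shell
#!/bin/sh
# dest_for is a hypothetical helper that mirrors the naming step of the command.
dest_for() {
  base=$(basename "$1" .pdf)      # drop the directory part and the .pdf suffix
  printf 'txt/%s.txt\n' "$base"
}

# Hypothetical input paths, in the form find would pass them.
for pdf in ./b.pdf ./reports/b.pdf "./my file.pdf"; do
  echo "$pdf -> $(dest_for "$pdf")"
done
```

Because the directory is dropped, `./b.pdf` and `./reports/b.pdf` map to the same `txt/b.txt`. That is harmless for a flat directory like the one in this run, but with nested PDFs the skip-if-exists check would silently skip the second file.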
I have set up the Docker container to run the model evaluation.
Run find . -name "*.pdf" -exec python3 process.py {} txt/{}.txt \; but don't rewrite files that already exist. Instead of .pdf.txt make it just .txt
To meet your requirements, I'll modify the command to:
1. Skip processing PDF files if the corresponding txt file already exists
2. Change the output filename from `.pdf.txt` to just `.txt`

Here's the command:
find . -name "*.pdf" -exec sh -c '
  in_file="$1"
  # Remove leading ./ if present
  in_file="${in_file#./}"
  # Compute the output file path (remove .pdf and add .txt)
  out_file="txt/${in_file%.pdf}.txt"
  # Create the output directory if it doesn't exist
  mkdir -p "$(dirname "$out_file")"
  # Only process if output file doesn't exist
  if [ ! -f "$out_file" ]; then
    echo "Processing $1 -> $out_file"
    python3 process.py "$1" "$out_file"
  else
    echo "Skipping $1 (output file already exists)"
  fi
' sh {} \;
Running that program produced the following output:
Processing ->
find: missing argument to `-exec'
Traceback (most recent call last):
  File "/usr/src/app/process.py", line 2, in <module>
    open(sys.argv[2], 'w').write(open(sys.argv[1]).read())
    ^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: ''
main.sh: line 17: unexpected EOF while looking for matching `''
Traceback (most recent call last):
  File "/usr/src/app/check.py", line 6, in <module>
    print('final: ' + str(check_ok()))
                          ^^^^^^^^^^
  File "/usr/src/app/check.py", line 3, in check_ok
    return open("txt/a.txt").read() == "done1" and open("txt/b.txt").read() == "hello2" and open("txt/c.txt").read() == "done3" and len(os.listdir("txt")) == 3
                                                   ^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'txt/b.txt'
Resulting in output: False
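The failure in this run looks like a shell quoting problem rather than a logic error. The script passed to `sh -c` is wrapped in single quotes, and the comments containing "doesn't" include an apostrophe, which closes the quoted string early. That would account for `find: missing argument to -exec`, for the empty `Processing ->` line (the rest of the script ran in the outer shell, where `$1` and `$out_file` are empty), and for the unmatched-quote error at the end. Below is a minimal sketch of the same approach with apostrophe-free comments. The scratch directory, the sample files, and the use of `cp` as a stand-in for `python3 process.py` are assumptions made so the sketch runs on its own.

```shell
#!/bin/sh
# Scratch setup: hypothetical files that mimic what check.py expects.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p txt
printf 'hello1' > a.pdf
printf 'hello2' > b.pdf
printf 'hello3' > c.pdf
printf 'done1' > txt/a.txt   # existing outputs that must not be rewritten
printf 'done3' > txt/c.txt

# Same logic as the failing command, but with no apostrophes inside the
# single-quoted script. cp stands in for: python3 process.py "$1" "$out_file"
find . -name "*.pdf" -exec sh -c '
  in_file="${1#./}"                    # strip the leading ./
  out_file="txt/${in_file%.pdf}.txt"   # a.pdf becomes txt/a.txt
  mkdir -p "$(dirname "$out_file")"
  if [ ! -f "$out_file" ]; then
    echo "Processing $1 -> $out_file"
    cp "$1" "$out_file"
  else
    echo "Skipping $1 (output file already exists)"
  fi
' sh {} \;
```

An apostrophe cannot be escaped inside a single-quoted string, so the simplest fix is to keep comments in the embedded script free of them, or to move the script into its own file.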