I have set up the Docker container to run the model evaluation.
Run find . -name "*.pdf" -exec python3 process.py {} txt/{}.txt \; but don't rewrite files that already exist. Instead of .pdf.txt make it just .txt
To modify your `find` command so that it:

1. **Does not overwrite existing `.txt` files**, and
2. **Outputs files with a `.txt` extension instead of `.pdf.txt`**

you can use a combination of `find`, `bash`, and conditional checks within a shell command. Here's how you can achieve this:

### Updated Command

    find . -name "*.pdf" -exec bash -c '
      for filepath do
        # Construct the output path by replacing the .pdf extension with .txt and prefixing with txt/
        output="txt/${filepath%.pdf}.txt"

        # Check if the output file already exists
        if [ ! -f "$output" ]; then
          # Create the necessary directories for the output file
          mkdir -p "$(dirname "$output")"

          # Run the Python script with the input and output paths
          python3 process.py "$filepath" "$output"
        fi
      done
    ' _ {} +

`for filepath do`

`output="txt/${filepath%.pdf}.txt"`

`if [ ! -f "$output" ]; then`

`mkdir -p "$(dirname "$output")"`

`python3 process.py "$filepath" "$output"`

    find . -name "*.pdf" -print0 | while IFS= read -r -d '' filepath; do
      output="txt/${filepath%.pdf}.txt"

      if [ ! -f "$output" ]; then
        mkdir -p "$(dirname "$output")"
        python3 process.py "$filepath" "$output"
      fi
    done

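As a small illustration of the path construction used in the commands above, the following sketch shows what `${filepath%.pdf}` produces for paths of the form `find .` emits. The helper name `to_txt_path` and the sample paths are hypothetical and exist only for this example.

```shell
#!/bin/bash
# Hypothetical helper: derive the output path the same way the loops above do.
to_txt_path() {
    local filepath=$1
    # ${filepath%.pdf} strips a trailing ".pdf" from the value, if present.
    printf '%s\n' "txt/${filepath%.pdf}.txt"
}

to_txt_path "./a.pdf"        # prints txt/./a.txt
to_txt_path "./docs/b.pdf"   # prints txt/./docs/b.txt
```

Because the directory part of the input is preserved, nested PDFs end up in matching subdirectories under `txt/`, which is why the commands above call `mkdir -p "$(dirname "$output")"` before writing.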
I extracted the following code from that output:

    #!/bin/bash

    find . -name "*.pdf" -exec bash -c '
      for filepath do
        # Construct the output path by replacing the .pdf extension with .txt and prefixing with txt/
        output="txt/${filepath%.pdf}.txt"

        # Check if the output file already exists
        if [ ! -f "$output" ]; then
          # Create the necessary directories for the output file
          mkdir -p "$(dirname "$output")"

          # Run the Python script with the input and output paths
          python3 process.py "$filepath" "$output"
        fi
      done
    ' _ {} +

Running the following program:

    #!/bin/bash

    find . -name "*.pdf" -exec bash -c '
      for filepath do
        # Construct the output path by replacing the .pdf extension with .txt and prefixing with txt/
        output="txt/${filepath%.pdf}.txt"

        # Check if the output file already exists
        if [ ! -f "$output" ]; then
          # Create the necessary directories for the output file
          mkdir -p "$(dirname "$output")"

          # Run the Python script with the input and output paths
          python3 process.py "$filepath" "$output"
        fi
      done
    ' _ {} +

final: True
Resulting in output: True
I have set up the Docker container to run the model evaluation.
Run find . -name "*.pdf" -exec python3 process.py {} txt/{}.txt \; but don't rewrite files that already exist. Instead of .pdf.txt make it just .txt
To modify your `find` command so that it:

1. **Doesn't overwrite existing `.txt` files** in the `txt/` directory.
2. **Changes the output filename** from `filename.pdf.txt` to `filename.txt`.

You can use a combination of `find`, `sh`, and shell scripting to achieve this. Here's how you can do it:

    find . -name "*.pdf" -exec sh -c '
      for filepath do
        # Extract the base filename without the .pdf extension
        filename=$(basename "$filepath" .pdf)

        # Define the output file path in the txt/ directory
        outfile="txt/${filename}.txt"

        # Check if the output file already exists
        if [ ! -f "$outfile" ]; then
          # If it doesn't exist, process the PDF and create the txt file
          python3 process.py "$filepath" "$outfile"
        fi
      done
    ' sh {} +


    find . -name "*.pdf" -exec sh -c 'outfile="txt/$(basename "$1" .pdf).txt"; [ ! -f "$outfile" ] && python3 process.py "$1" "$outfile"' _ {} \;


    mkdir -p txt

I extracted the following code from that output:

    #!/bin/bash

    find . -name "*.pdf" -exec sh -c '
      for filepath do
        # Extract the base filename without the .pdf extension
        filename=$(basename "$filepath" .pdf)

        # Define the output file path in the txt/ directory
        outfile="txt/${filename}.txt"

        # Check if the output file already exists
        if [ ! -f "$outfile" ]; then
          # If it doesn\'t exist, process the PDF and create the txt file
          python3 process.py "$filepath" "$outfile"
        fi
      done
    ' sh {} +

Running the following program:

    #!/bin/bash

    find . -name "*.pdf" -exec sh -c '
      for filepath do
        # Extract the base filename without the .pdf extension
        filename=$(basename "$filepath" .pdf)

        # Define the output file path in the txt/ directory
        outfile="txt/${filename}.txt"

        # Check if the output file already exists
        if [ ! -f "$outfile" ]; then
          # If it doesn\'t exist, process the PDF and create the txt file
          python3 process.py "$filepath" "$outfile"
        fi
      done
    ' sh {} +

find: missing argument to `-exec'
Traceback (most recent call last):
  File "/usr/src/app/process.py", line 2, in <module>
    open(sys.argv[2], 'w').write(open(sys.argv[1]).read())
FileNotFoundError: [Errno 2] No such file or directory: ''
main.sh: line 16: syntax error near unexpected token `fi'
main.sh: line 16: ` fi'
Traceback (most recent call last):
  File "/usr/src/app/check.py", line 6, in <module>
    print('final: ' + str(check_ok()))
  File "/usr/src/app/check.py", line 3, in check_ok
    return open("txt/a.txt").read() == "done1" and open("txt/b.txt").read() == "hello2" and open("txt/c.txt").read() == "done3" and len(os.listdir("txt")) == 3
FileNotFoundError: [Errno 2] No such file or directory: 'txt/b.txt'
Resulting in output: False
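The errors in this run are consistent with the apostrophe in the comment `doesn\'t` ending the single-quoted `sh -c` script early: inside single quotes a backslash is literal, so the quote closes there, `find` is left without its terminating `+`, and the remaining lines run as top-level shell code with empty variables. The following is a minimal sketch of the same loop with no apostrophe in the quoted body, run against a throwaway fixture. The fixture contents are hypothetical (loosely modeled on the filenames in `check.py`), and `cp` is an assumed stand-in for `python3 process.py`, which per the traceback copies its input to its output.

```shell
#!/bin/bash
# Throwaway fixture (hypothetical contents): three PDFs, one output already present.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p txt
printf 'hello1' > a.pdf
printf 'hello2' > b.pdf
printf 'hello3' > c.pdf
printf 'done1' > txt/a.txt   # existing output that must not be rewritten

# Same loop as the extracted script, but the single-quoted body contains
# no apostrophe. cp is a stand-in for: python3 process.py INPUT OUTPUT
find . -name "*.pdf" -exec sh -c '
  for filepath do
    outfile="txt/$(basename "$filepath" .pdf).txt"
    # Skip inputs whose output file is already present
    if [ ! -f "$outfile" ]; then
      cp "$filepath" "$outfile"
    fi
  done
' sh {} +

printf '%s\n' "$(cat txt/a.txt)"   # prints done1 (kept as-is)
printf '%s\n' "$(cat txt/b.txt)"   # prints hello2 (newly created)
```

Keeping comments out of the quoted script, or avoiding contractions in them, sidesteps this class of quoting error without changing the logic.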