I have set up the docker container to run the model evaluation.
Run find . -name "*.pdf" -exec python3 process.py {} txt/{}.txt \; but don't rewrite files that already exist. Instead of .pdf.txt make it just .txt
If you want to search for PDF files and convert them to text files without overwriting existing ones, and ensure the output files have a `.txt` extension instead of `.pdf.txt`, you can't accomplish this with a simple `find` command as you've described. The modification required involves checking if the output `.txt` file already exists before deciding to process the PDF file. You can achieve this by using a shell script or a one-liner that loops through the found files, checks for the existence of the target `.txt` file, and then processes the file if the `.txt` version doesn't exist. One efficient way to do this is by using `find` with a `while` loop or an `if` condition inside a bash script. Here is a solution using a simple bash one-liner:
```bash
find . -name "*.pdf" | while read -r file; do
    output="txt/$(basename "${file%.pdf}").txt"
    if [ ! -f "$output" ]; then
        python3 process.py "$file" "$output"
    fi
done
```
I extracted the following code from that output:
```bash
find . -name "*.pdf" | while read -r file; do
    output="txt/$(basename "${file%.pdf}").txt"
    if [ ! -f "$output" ]; then
        python3 process.py "$file" "$output"
    fi
done
```
Running the following program:
```bash
find . -name "*.pdf" | while read -r file; do
    output="txt/$(basename "${file%.pdf}").txt"
    if [ ! -f "$output" ]; then
        python3 process.py "$file" "$output"
    fi
done
```
final: True Resulting in output: True
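A note on the naming in this run: the loop passes the stripped path through `basename`, so every output lands directly in `txt/`, and two PDFs with the same file name in different directories would map to the same `.txt`. The sketch below only illustrates the two parameter-expansion forms that appear across these runs. The path stored in `file` is a made-up example, not something taken from the evaluation.

```shell
#!/bin/sh
# Hypothetical input path, used only for illustration.
file="./reports/2023/summary.pdf"

# ${file%.pdf} removes the shortest trailing match of ".pdf".
flat="txt/$(basename "${file%.pdf}").txt"   # directory part dropped
nested="txt/${file%.pdf}.txt"               # directory part kept

printf '%s\n' "$flat"     # prints txt/summary.txt
printf '%s\n' "$nested"   # prints txt/./reports/2023/summary.txt
```

The nested form mirrors what the original `txt/{}.txt` would produce (minus the `.pdf`), but it needs the intermediate directories to exist before anything is written there.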
I have set up the docker container to run the model evaluation.
Run find . -name "*.pdf" -exec python3 process.py {} txt/{}.txt \; but don't rewrite files that already exist. Instead of .pdf.txt make it just .txt
To modify the command so that it doesn't overwrite existing `.txt` files and correctly formats the output file name (replacing `.pdf` with `.txt`), you need to incorporate a conditional check before executing the `python3` script. However, the `find` command itself doesn't directly support conditional execution based on the existence of files. Instead, you can use a shell loop or an if-statement within the `-exec` option by invoking a shell. Here's how you can do it:
```bash
find . -name "*.pdf" | while read pdf; do
    txt="txt/${pdf%.pdf}.txt"
    if [ ! -f "$txt" ]; then
        python3 process.py "$pdf" "$txt"
    fi
done
```
I extracted the following code from that output:
```bash
find . -name "*.pdf" | while read pdf; do
    txt="txt/${pdf%.pdf}.txt"
    if [ ! -f "$txt" ]; then
        python3 process.py "$pdf" "$txt"
    fi
done
```
Running the following program:
```bash
find . -name "*.pdf" | while read pdf; do
    txt="txt/${pdf%.pdf}.txt"
    if [ ! -f "$txt" ]; then
        python3 process.py "$pdf" "$txt"
    fi
done
```
final: True Resulting in output: True
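The loop in this run uses a bare `read pdf`. Without `-r`, `read` treats backslashes as escape characters, and without `IFS=` it trims leading and trailing whitespace, so unusual file names can be altered before they reach the converter. A minimal sketch of a more defensive loop follows, run in a scratch directory. The `convert` function is a hypothetical stand-in for `python3 process.py`, and the file names are invented for the demonstration. Names containing newlines would still break a line-based loop like this one.

```shell
#!/bin/sh
# Self-contained demo in a scratch directory; no real files are touched.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p docs txt/docs
: > 'docs/a b.pdf'                    # name with a space
: > 'docs/c\d.pdf'                    # name with a backslash
: > 'docs/done.pdf'
echo "keep me" > txt/docs/done.txt    # existing output that must survive

# Hypothetical stand-in for: python3 process.py "$1" "$2"
convert() { printf 'converted %s\n' "$1" > "$2"; }

find . -name '*.pdf' | while IFS= read -r pdf; do
    rel=${pdf#./}                     # drop the leading "./" that find adds
    out="txt/${rel%.pdf}.txt"         # swap .pdf for .txt under txt/
    if [ ! -f "$out" ]; then
        mkdir -p "$(dirname "$out")"  # create nested output directories
        convert "$pdf" "$out"
    fi
done
```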
I have set up the docker container to run the model evaluation.
Run find . -name "*.pdf" -exec python3 process.py {} txt/{}.txt \; but don't rewrite files that already exist. Instead of .pdf.txt make it just .txt
To modify your command so that it doesn't overwrite existing `.txt` files and changes the output filename from `.pdf.txt` to just `.txt`, you can use a small shell script snippet within the `-exec` option of `find`. This script will check if the output `.txt` file already exists before attempting to convert the `.pdf` file. Here's how you can do it:
```bash
find . -name "*.pdf" -exec sh -c 'output="txt/${1%.pdf}.txt"; [ ! -f "$output" ] && python3 process.py "$1" "$output"' _ {} \;
```
I extracted the following code from that output:
```bash
find . -name "*.pdf" -exec sh -c 'output="txt/${1%.pdf}.txt"; [ ! -f "$output" ] && python3 process.py "$1" "$output"' _ {} \;
```
Running the following program:
```bash
find . -name "*.pdf" -exec sh -c 'output="txt/${1%.pdf}.txt"; [ ! -f "$output" ] && python3 process.py "$1" "$output"' _ {} \;
```
final: True Resulting in output: True
I have set up the docker container to run the model evaluation.
Run find . -name "*.pdf" -exec python3 process.py {} txt/{}.txt \; but don't rewrite files that already exist. Instead of .pdf.txt make it just .txt
To modify the provided `find` command so that it converts PDF files to text files without overwriting existing files and names the output files with a `.txt` extension (replacing `.pdf.txt`), you can use a small bash script snippet within the `-exec` option. This script will check if the output `.txt` file already exists before attempting to process a PDF file. Here's how you could do it:
```bash
find . -name "*.pdf" -exec bash -c 'f="{}"; out="txt/${f%.*}.txt"; [ ! -f "$out" ] && python3 process.py "$f" "$out"' \;
```
I extracted the following code from that output:
```bash
find . -name "*.pdf" -exec bash -c 'f="{}"; out="txt/${f%.*}.txt"; [ ! -f "$out" ] && python3 process.py "$f" "$out"' \;
```
Running the following program:
```bash
find . -name "*.pdf" -exec bash -c 'f="{}"; out="txt/${f%.*}.txt"; [ ! -f "$out" ] && python3 process.py "$f" "$out"' \;
```
final: True Resulting in output: True
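This run places `{}` inside the quoted script string. POSIX leaves it implementation-defined whether `find` substitutes `{}` when it is only part of an argument, and a substituted file name becomes shell source text, so quotes or `$(...)` in a name would be interpreted. A hedged sketch of the alternative, passing file names as positional parameters, is shown below. `stub.sh` is a hypothetical stand-in for `process.py`, and the directory layout is invented for the demo.

```shell
#!/bin/sh
# Self-contained demo in a scratch directory.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p in txt/in
: > in/report.pdf
: > in/old.pdf
echo 'existing' > txt/in/old.txt   # must not be overwritten

# Hypothetical stand-in for process.py: records which input it was given.
cat > stub.sh <<'EOF'
#!/bin/sh
printf 'from %s\n' "$1" > "$2"
EOF
chmod +x stub.sh

# "{} +" hands batches of file names to one sh invocation as "$@";
# the script text itself never contains a file name.
find . -name '*.pdf' -exec sh -c '
  for pdf do
    rel=${pdf#./}
    out="txt/${rel%.pdf}.txt"
    [ -f "$out" ] && continue        # skip outputs that already exist
    mkdir -p "$(dirname "$out")"
    ./stub.sh "$pdf" "$out"
  done
' sh {} +
```

Batching with `{} +` also starts fewer shells than `\;`, which may matter for large trees.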
I have set up the docker container to run the model evaluation.
Run find . -name "*.pdf" -exec python3 process.py {} txt/{}.txt \; but don't rewrite files that already exist. Instead of .pdf.txt make it just .txt
To accomplish this task, you need to modify the command slightly to check if the output `.txt` file already exists before executing `process.py` on the `.pdf` file. One way to do this is by using a shell loop instead of `-exec` directly with `find`, because you need to add some logic to check for the existence of the output file. Here's how you can do it:
```bash
find . -name "*.pdf" | while read pdf; do
    txt="txt/${pdf%.pdf}.txt"  # Replace the extension .pdf with .txt and prepend directory name
    if [ ! -f "$txt" ]; then  # Check if the .txt file does not exist
        mkdir -p "$(dirname "$txt")"  # Ensure the target directory exists
        python3 process.py "$pdf" "$txt"
    fi
done
```
I extracted the following code from that output:
```bash
find . -name "*.pdf" | while read pdf; do
    txt="txt/${pdf%.pdf}.txt"  # Replace the extension .pdf with .txt and prepend directory name
    if [ ! -f "$txt" ]; then  # Check if the .txt file does not exist
        mkdir -p "$(dirname "$txt")"  # Ensure the target directory exists
        python3 process.py "$pdf" "$txt"
    fi
done
```
Running the following program:
```bash
find . -name "*.pdf" | while read pdf; do
    txt="txt/${pdf%.pdf}.txt"  # Replace the extension .pdf with .txt and prepend directory name
    if [ ! -f "$txt" ]; then  # Check if the .txt file does not exist
        mkdir -p "$(dirname "$txt")"  # Ensure the target directory exists
        python3 process.py "$pdf" "$txt"
    fi
done
```
final: True Resulting in output: True