Accuracy is moderate, but it still beats typing everything out by hand.

Convert PDF to TXT

How to use

1. sudo apt install -y poppler-utils
2. pdftotext file.pdf file.txt
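
pdftotext handles one input PDF per call, so a folder of PDFs needs a loop. The following is a minimal sketch, assuming bash and that each text file should sit next to its PDF; the helper name pdf_dir_to_txt is made up for this example and is not part of poppler-utils.

```shell
# Sketch: convert every PDF in a directory to a .txt file with the same base name.
# pdf_dir_to_txt is a hypothetical helper name, not a poppler-utils command.
pdf_dir_to_txt() {
    local dir="${1:-.}"
    local pdf
    for pdf in "$dir"/*.pdf; do
        # Skip the unexpanded pattern when the directory contains no PDFs.
        [ -e "$pdf" ] || continue
        # pdftotext takes one input PDF and one output text file.
        pdftotext "$pdf" "${pdf%.pdf}.txt"
    done
}
```

Calling pdf_dir_to_txt ~/scans would then write one .txt file per PDF in that (example) directory.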

How to convert an image to TXT

How to use (this takes a few more steps)

1. sudo apt install -y tesseract-ocr tesseract-ocr-chi-sim 
2. convert file.png out.tif
3. tesseract out.tif out    (writes out.txt; add -l chi_sim to recognize Simplified Chinese)

Installing Tesseract in Ubuntu / Linux

sudo apt-get install -y tesseract-ocr tesseract-ocr-chi-sim

You can also install additional language packages if required.
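
To see whether a language pack is already present, the output of tesseract --list-langs can be checked. This is a small sketch, assuming bash; has_tesseract_lang is a hypothetical helper name used only for illustration.

```shell
# Sketch: check whether a Tesseract language pack is installed.
# has_tesseract_lang is a hypothetical helper name.
has_tesseract_lang() {
    # --list-langs prints one language code per line; some older builds
    # write the list to stderr, so merge both streams before matching.
    tesseract --list-langs 2>&1 | grep -qx -- "$1"
}
```

For example, has_tesseract_lang chi_sim || sudo apt-get install -y tesseract-ocr-chi-sim installs the Simplified Chinese pack only when it is missing.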

Before using Tesseract, convert the image files (PNG/JPG) to TIFF, the input format that older Tesseract releases require; recent releases can usually read PNG and JPEG directly. Use the following command (you may need to install the imagemagick package):

convert file_name.png out_file_name.tif

Now you can read the content with Tesseract. It appends .txt to the output name, so the command below writes output_content.txt.

tesseract your_scanned_file.tif output_content
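
The convert and tesseract steps above can be wrapped in one function. This is a rough sketch, assuming bash and that ImageMagick's convert is installed; image_to_txt is a hypothetical helper name, and the default language eng is an assumption rather than something these notes specify.

```shell
# Sketch: OCR one image into <basename>.txt.
# image_to_txt is a hypothetical helper; the default language is an assumption.
image_to_txt() {
    local img="$1"
    local lang="${2:-eng}"
    local base="${img%.*}"
    # Convert to TIFF first (ImageMagick), as described above.
    convert "$img" "$base.tif" || return 1
    # Tesseract appends .txt to the output base name itself.
    tesseract "$base.tif" "$base" -l "$lang"
}
```

With the chi-sim package installed, image_to_txt file.png chi_sim would produce file.tif and file.txt next to the image.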
