Extracting Word Format Source Files from PDF

Nearly every customer finds themselves in the embarrassing situation of not having a critical source file for translation in the appropriate desktop publishing (DTP) format. Sometimes, all that is available is a PDF file derived from a “lost” source file in FrameMaker, InDesign or Word format.

Regardless of the reason, you and your translation company have several tools and methods available to create an editable source file from a PDF. This blog post covers two of the more effective techniques that our production staff here at GPI has found while performing multilingual desktop publishing.

1. Save the PDF as an MS Word file.

This option is best suited for PDF files of 20-30 pages that were created by printing from editable applications such as InDesign, FrameMaker and MS Word. Adobe Acrobat X Pro and Nitro PDF Professional are two programs that allow you to do this.

The resulting MS Word file will preserve the text formatting (font family, size, and color) and the graphics. If your other source files are FrameMaker or InDesign, you can easily import the resulting MS Word file into these applications. This will enable you to recreate a properly formatted, editable source file that is suitable for language translation.

2. Convert the PDF using a standalone PDF to MS Word conversion tool.

You have a wide range of tools to choose from, including ABBYY FineReader and deskUNPDF. In addition, there are some tools which can be used directly online, like Zamzar, PDFonline, and PDF to Word Converter.

For more options, see the list "30+ Tools of PDF Converter, PDF Creator and PDF Reader".

When converting a PDF file using one of these tools:

  • We can edit each page individually
  • We can indicate which parts of the page should be converted as a graphic or a table
  • We can even set the format of the text before saving the conversion in Word format

These options are highly useful when the application does not detect the correct format on its own.

Option to indicate PDF text language

Another great thing about these tools is that we can indicate the language of the text in the PDF file before starting the conversion process.

We can create a list of languages we use most often and the program will remember them, or we can specify the language(s) for the file we are converting on an as-needed basis.

Specifying the language improves the conversion results, especially with scans and large files with multiple graphics and complex formatting.

Optimizing Word files for translation

The results are not always perfect, regardless of the method used. You will usually need to prepare the Word documents for translation before submitting them to your translation agency.

Some paragraphs might have their lines split by hard or soft returns (as indicated in the screen capture below this paragraph). The text on some of the graphics might also be converted to editable text.

[Screen capture: paragraph lines split by hard and soft returns]
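For text that has been copied out of the converted document, the split-line cleanup can be roughed out in a few lines of Python. This is only a sketch under stated assumptions, not a tool we use in production: the function name `rejoin_split_lines` is hypothetical, hard returns are assumed to appear as newlines and soft returns as vertical tabs, and a line is assumed to end a paragraph only when it ends with sentence punctuation. Headings and list items break that rule, so the output still needs a human review.

```python
# Assumption: a line ending with one of these characters closes a paragraph.
SENTENCE_END = (".", "!", "?", ":")


def rejoin_split_lines(text: str) -> str:
    """Merge lines that a PDF conversion split in the middle of a sentence."""
    # Assumption: soft returns arrive as vertical tabs, hard returns as newlines.
    lines = [line.strip() for line in text.replace("\v", "\n").split("\n")]

    paragraphs = []
    buffer = []
    for line in lines:
        if not line:
            # A blank line always closes the paragraph collected so far.
            if buffer:
                paragraphs.append(" ".join(buffer))
                buffer = []
            continue
        buffer.append(line)
        if line.endswith(SENTENCE_END):
            # Sentence punctuation at the end of a line closes the paragraph.
            paragraphs.append(" ".join(buffer))
            buffer = []
    if buffer:
        paragraphs.append(" ".join(buffer))
    return "\n".join(paragraphs)


if __name__ == "__main__":
    sample = (
        "The resulting file will preserve\n"
        "the text formatting\vand the graphics.\n"
        "Next paragraph here."
    )
    print(rejoin_split_lines(sample))
```

The heuristic is deliberately conservative: it only joins lines, so no text is ever dropped, and a wrongly merged paragraph is easy to spot and split again by hand.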

Find the best tool for your files

To obtain the best results with the least amount of work, we need to make sure we are using the best tool for the type of PDF file we need to convert. Begin the process by analyzing the PDF file to determine its source, layout and format complexity.
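One quick way to get a hint about a PDF's source is to read the Creator and Producer entries that many PDF files carry in their document information. The sketch below is a rough illustration, not a full PDF parser: the function name `pdf_origin_hints` is hypothetical, and it assumes the entries are stored uncompressed as simple text strings without nested parentheses. When that assumption does not hold, it returns nothing and the file properties dialog of a PDF viewer is the better place to look.

```python
import re


def pdf_origin_hints(pdf_bytes: bytes) -> dict:
    """Return the Creator and Producer entries found in raw PDF bytes, if any."""
    hints = {}
    for key in (b"Creator", b"Producer"):
        # Assumption: the value is a plain "(...)" string with no nested parentheses.
        match = re.search(rb"/" + key + rb"\s*\(([^)]*)\)", pdf_bytes)
        if match:
            hints[key.decode("ascii")] = match.group(1).decode("latin-1")
    return hints


if __name__ == "__main__":
    # Synthetic bytes for illustration only; a real file would be read with
    # open(path, "rb").read().
    sample = (
        b"1 0 obj\n"
        b"<< /Creator (Adobe InDesign CS5) /Producer (Adobe PDF Library 9.9) >>\n"
        b"endobj\n"
    )
    print(pdf_origin_hints(sample))
```

A Creator value naming a layout or word-processing application suggests the PDF was produced from an editable file, while a scanner or imaging product name suggests a scan that will need OCR.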