PDF to Text

PDF to marked text

I specialize in extracting all text, images, tables and other information from PDF documents and putting it into a clean text format that is easy to work with, while keeping all information and styles.

Below you can see the steps in the process and more information.

1

PDF document

Whether you have PDFs of books, articles or scientific research, and whether they are clean digital documents or scans, I can extract all the information from them into a usable form.

2

Production

The PDF document is loaded into a program where every element (main text, footnotes, etc.) can be marked, the useful information selected, and the useless information, such as page numbers, discarded.

3

The extracted text

With the help of macros, every style is replaced by tags. For example, a sentence part in italics becomes a tagged sentence part: <i> a sentence part in italics. </i>

The text can then be used in whatever program is needed, for example SmartCat for translation, without losing any stylistic information.

Headings, tables and images are also tagged.
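
As a rough illustration of the tagging step, here is a minimal Python sketch. It is not the macros actually used in production; the run format (a list of text and style pairs), the style names and the function name tag_runs are assumptions made for this example only.

```python
# Hypothetical mapping from a style name to the tag used in the extracted text.
STYLE_TAGS = {"italic": "i", "bold": "b", "superscript": "h"}

def tag_runs(runs):
    """Join (text, style) runs into one string, wrapping styled runs in tags."""
    parts = []
    for text, style in runs:
        tag = STYLE_TAGS.get(style)
        if tag is None:
            parts.append(text)  # plain or unknown style: keep the text as-is
        else:
            parts.append(f"<{tag}>{text}</{tag}>")
    return "".join(parts)

runs = [("This is ", None), ("a sentence part in italics", "italic"), (".", None)]
print(tag_runs(runs))  # This is <i>a sentence part in italics</i>.
```

Because the style is carried by plain-text tags, the result should survive any tool that only handles raw text.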

Examples of tags

<i> italics</i>

<b> bold </b>

<h> superscript </h>

# Heading outside of main text

## First level heading

### Second level heading

$$$ Table

$$$ TABLE 1

A timeline of events.

Date Event
JUL 1947 ROSWELL UFO CRASH
FEB 1961 THE OUTER LIMITS
SEP 1961 BETTY & BARNEY HILL
OCT 1975 THE UFO INCIDENT
NOV 1975 TRAVIS WALTON ABDUCTION
DEC 1977 CLOSE ENCOUNTERS OF THE THIRD KIND
DEC 1985 WHITLEY STRIEBER ABDUCTION
NOV 1989 COMMUNION
MAR 1993 FIRE IN THE SKY
MAY 1993 INTRUDERS
SEP 1993 THE X-FILES

Tables are remade for easy use.
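
One way such a remade table could be picked up again later is sketched below in Python. It assumes every row starts with a month abbreviation and a year, as in the example table; the function names split_row and rows_to_tsv are hypothetical, and tab-separated output is just one possible target format.

```python
def split_row(row):
    """Split one table row into (date, event), assuming 'MON YYYY EVENT ...'."""
    month, year, event = row.split(" ", 2)
    return f"{month} {year}", event

def rows_to_tsv(rows, header=("Date", "Event")):
    """Return the table as tab-separated lines, header first."""
    lines = ["\t".join(header)]
    lines += ["\t".join(split_row(row)) for row in rows]
    return "\n".join(lines)

rows = ["JUL 1947 ROSWELL UFO CRASH", "SEP 1993 THE X-FILES"]
print(rows_to_tsv(rows))
```

Tab-separated text can be pasted into a spreadsheet or converted to a table in Word without retyping.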

4

Text ready for use

When you are ready to use the extracted text, whether it is in Word, InDesign or on a website, it is easy to replace all the tags with the style itself.
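
For a website, replacing the tags could look roughly like the Python sketch below. The mapping to HTML elements (<i> to <em>, <b> to <strong>, <h> to <sup>, and #, ##, ### to h1, h2, h3) is an assumption for illustration, as is the function name to_html; Word and InDesign would use their own find-and-replace on styles instead.

```python
import re

# Assumed mapping from the extraction tags to HTML elements.
INLINE_TAGS = {"i": "em", "b": "strong", "h": "sup"}
HEADING_LEVELS = {"#": "h1", "##": "h2", "###": "h3"}

def to_html(line):
    """Convert one line of tagged text into an HTML heading or paragraph."""
    # Swap each opening and closing inline tag for its HTML counterpart.
    for tag, html_tag in INLINE_TAGS.items():
        line = re.sub(rf"<(/?){tag}>", rf"<\1{html_tag}>", line)
    # Lines starting with one to three '#' and a space are headings.
    match = re.match(r"(#{1,3}) (.*)", line)
    if match:
        element = HEADING_LEVELS[match.group(1)]
        return f"<{element}>{match.group(2)}</{element}>"
    return f"<p>{line}</p>"

print(to_html("## First level heading"))
print(to_html("Some <i>italics</i> and a note<h>2</h>"))
```

The sketch handles one line at a time and leaves tables and images aside; those would need their own rules.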

If needed, I can also do the redesign for both print and web.

Please contact me if I can be of service.

[email protected]