Convert PDF to text

PDF-to-Text extracts all the text from a PDF file so that you can analyze it further or use it in applications such as question answering. You can save the extracted text into a knowledge-set to avoid repeating the PDF-to-Text step.

This page introduces the Relevance AI Tool step that converts PDF to text.

How to use Convert PDF to text step

Add the component

Add the Convert PDF to text step to your Tool (see how to get started with creating a Tool).

File URL

The PDF-to-text converter requires a file as input. If your file is publicly accessible on the web (i.e., it requires no authentication or sign-up), provide the URL directly or through a text input. Otherwise, add a File-to-URL input. In either case, use the {{variable name}} syntax to pass the value to the converter.
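
To make the {{variable name}} syntax concrete, here is a purely conceptual Python sketch of how a placeholder could be replaced by the value of an input. It is not Relevance AI's implementation; the function name fill_placeholders, the input name file_url, and the example URL are assumptions made for illustration.

```python
import re

# Matches {{ name }} and captures the name without surrounding spaces.
PLACEHOLDER = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def fill_placeholders(template: str, variables: dict) -> str:
    """Replace every {{name}} in template with the matching value (hypothetical helper)."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"No value provided for variable: {name}")
        return str(variables[name])

    return PLACEHOLDER.sub(replace, template)


# Hypothetical input: a text input named file_url holding a public PDF link.
inputs = {"file_url": "https://example.com/report.pdf"}
print(fill_placeholders("{{file_url}}", inputs))
```

In the Tool builder you only type the placeholder; the platform resolves it when the step runs.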

Use OCR

OCR (optical character recognition) is needed for image-based PDFs, such as scanned documents. This option uses more credits, so activate it only for image-based PDFs.

Available converters

Fast converter: Relevance AI’s default PDF-to-text converter, which is fast and reasonably accurate
Quality converter: Slower and more accurate compared to the previous option

Follow the links below for more information about:

How to run a step
How to delete a step
How to configure output
How to configure a default value
How to move a step in a Tool
How to duplicate a step
How to add a condition to a step (i.e. execute only if a condition is met)
How to loop a step (i.e. run one step multiple times)

Access the step output

The output is a dictionary with two keys, text and number_of_pages, containing the extracted text and the number of pages in the file, respectively. The samples below use pdf_to_text, the default name assigned to the step. Note that a step name is different from a step title: the title appears at the top left of the step, while the name is shown at the bottom left in a smaller font, highlighted in green.

pdf_to_text.text
pdf_to_text.number_of_pages
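
If you pass the step output to your own code, the structure described above can be pictured as a small Python dictionary. The sketch below is illustrative only: the type name PdfToTextOutput, the helper summarize, and the sample values are assumptions, while the keys text and number_of_pages come from the description above.

```python
from typing import TypedDict


class PdfToTextOutput(TypedDict):
    """Assumed shape of the step output: two keys, as described above."""

    text: str
    number_of_pages: int


def summarize(output: PdfToTextOutput, preview_chars: int = 80) -> str:
    """Build a one-line summary: the page count plus the start of the text."""
    preview = output["text"][:preview_chars]
    return f"{output['number_of_pages']} page(s): {preview}"


# Illustrative values only; real values come from the converted file.
pdf_to_text: PdfToTextOutput = {
    "text": "Quarterly report for the first quarter",
    "number_of_pages": 3,
}
print(summarize(pdf_to_text))
```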

Common errors

Unsupported protocol

An error similar to the one noted below indicates that the provided input is not a valid URL.

Error:
Only HTTP(S) protocols are supported
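
To catch this error before running the step, you can check the input yourself. The following is a minimal Python sketch using only the standard library; the function name is_supported_url and the example URLs are assumptions, and the check is not necessarily identical to the validation the step performs.

```python
from urllib.parse import urlparse


def is_supported_url(value: str) -> bool:
    """Return True if value looks like an absolute HTTP(S) URL."""
    parsed = urlparse(value.strip())
    # Require both an http/https scheme and a host name.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Example inputs (hypothetical): only the first uses a supported protocol.
for candidate in (
    "https://example.com/report.pdf",
    "ftp://example.com/report.pdf",
    "report.pdf",
):
    print(candidate, is_supported_url(candidate))
```

A plain file name or a local path fails this check, which is the typical cause of the error above; in that case, use a File-to-URL input instead.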
