How to Run Nougat with an API on Clarifai’s Python SDK


Introduction

Nougat is a Visual Transformer model developed by researchers at Meta AI that converts images of document pages into structured text. It takes an image of a document page as input and outputs Markdown-style markup, with mathematical expressions written in LaTeX.

The key advantage of Nougat is that it works from the document image alone and does not depend on text from a separate OCR engine. This lets it recover structure that OCR typically loses, such as mathematical equations. It was trained on millions of pages of academic documents, mostly from arXiv and PubMed Central, so it has learned the formatting and language patterns of research papers.

Model Architecture

Nougat uses a Visual Transformer encoder-decoder architecture. The encoder, a Swin Transformer, encodes the document image into latent embeddings, processing it hierarchically with shifted windows. The decoder then generates the output text tokens autoregressively, using self-attention over the tokens generated so far and cross-attention over the encoder outputs.
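To make the shifted-window idea more concrete, here is a toy Python sketch. It partitions a small 2D grid (standing in for a feature map) into non-overlapping M×M windows, then cyclically shifts the grid by M/2 before partitioning again, which is how Swin lets neighboring windows exchange information in alternating layers. It only illustrates the partitioning: the attention inside each window, the attention mask Swin applies to wrapped-around cells, patch merging, and Nougat's real dimensions are all omitted, and the function names are hypothetical.

```python
from typing import List

Grid = List[List[int]]


def window_partition(grid: Grid, m: int) -> List[Grid]:
    """Split an H x W grid into non-overlapping m x m windows, in row-major order."""
    h, w = len(grid), len(grid[0])
    if h % m or w % m:
        raise ValueError("grid dimensions must be divisible by the window size")
    windows = []
    for top in range(0, h, m):
        for left in range(0, w, m):
            windows.append([row[left:left + m] for row in grid[top:top + m]])
    return windows


def cyclic_shift(grid: Grid, s: int) -> Grid:
    """Roll the grid up and left by s cells, wrapping around the edges."""
    h, w = len(grid), len(grid[0])
    return [[grid[(i + s) % h][(j + s) % w] for j in range(w)] for i in range(h)]


if __name__ == "__main__":
    # A 4 x 4 "feature map" whose cell values are simply their flat indices
    fmap = [[i * 4 + j for j in range(4)] for i in range(4)]
    m = 2

    # Regular layer: each window covers one fixed 2 x 2 block
    print(window_partition(fmap, m)[0])  # [[0, 1], [4, 5]]

    # Shifted layer: after shifting by m // 2, the first window mixes cells
    # that belonged to four different windows before the shift
    shifted = cyclic_shift(fmap, m // 2)
    print(window_partition(shifted, m)[0])  # [[5, 6], [9, 10]]
```

Alternating regular and shifted partitions keeps attention cost local to each window while still letting information cross window boundaries from one layer to the next.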

Running the Nougat Model with Python

You can run Nougat with Clarifai’s Python SDK in just a few lines of code. To get started, sign up for Clarifai and get your Personal Access Token (PAT) by following the instructions in the Clarifai documentation.

Export your PAT as an environment variable:

	export CLARIFAI_PAT="YOUR_PAT_HERE"
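Before running the examples, it can help to fail fast when the variable is missing rather than getting an authentication error from the API. A minimal Python sketch; `require_pat` is a hypothetical helper, not part of the Clarifai SDK, and it only checks that `CLARIFAI_PAT` is set and non-empty, not that the token is valid.

```python
import os


def require_pat(var: str = "CLARIFAI_PAT") -> str:
    """Return the PAT from the environment, or raise a clear error if it is unset."""
    pat = os.environ.get(var, "").strip()
    if not pat:
        raise RuntimeError(
            f"{var} is not set; export your Clarifai Personal Access Token first"
        )
    return pat


if __name__ == "__main__":
    pat = require_pat()
    # Print only a short prefix so the full token never ends up in logs
    print("Found PAT starting with", pat[:4] + "...")
```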

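Then run the model with the code below:

	from clarifai.client.model import Model

	# URL of the document page image to transcribe (fill this in)
	image_url = ''

	# Load the hosted Nougat model; the SDK reads your PAT from CLARIFAI_PAT
	model = Model(url="https://clarifai.com/facebook/nougat/models/nougat-base")

	# Run the prediction on the image URL
	model_prediction = model.predict_by_url(image_url, input_type="image")

	# Print the transcribed markup returned by the model
	print(model_prediction.outputs[0].data.text.raw)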

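Nougat transcribes one page image at a time, so a multi-page document needs one prediction per page. The sketch below is a hypothetical helper (not part of the SDK) that accepts any `predict` callable, so it can be exercised offline with a stand-in. It assumes each response exposes the transcription at `outputs[0].data.text.raw`, the usual shape of Clarifai text outputs; check this against the SDK version you use.

```python
from types import SimpleNamespace
from typing import Callable, Iterable


def transcribe_pages(page_urls: Iterable[str], predict: Callable[[str], object]) -> str:
    """Run one prediction per page image and join the pages' markup in order."""
    pages = []
    for url in page_urls:
        response = predict(url)
        # Assumed response shape: response.outputs[0].data.text.raw
        pages.append(response.outputs[0].data.text.raw.strip())
    # A blank line between pages keeps Markdown paragraphs from merging
    return "\n\n".join(pages)


def fake_predict(url: str) -> object:
    """Offline stand-in that mimics the assumed response shape."""
    text = SimpleNamespace(raw=f"# Page from {url}\n")
    return SimpleNamespace(outputs=[SimpleNamespace(data=SimpleNamespace(text=text))])


if __name__ == "__main__":
    print(transcribe_pages(["page1.png", "page2.png"], fake_predict))

    # With the hosted model (requires CLARIFAI_PAT and network access):
    # from clarifai.client.model import Model
    # model = Model(url="https://clarifai.com/facebook/nougat/models/nougat-base")
    # text = transcribe_pages(urls, lambda u: model.predict_by_url(u, input_type="image"))
```

Injecting `predict` keeps the joining logic testable without an API call and makes it easy to add retries or rate limiting around the real SDK call later.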

Running the Nougat Model with JavaScript

You can also call the model from JavaScript through the Clarifai REST API:

	///////////////////////////////////////////////////////////////////////////////////////////////////
	// In this section, we set the user authentication, user and app ID, model details, and the URL
	// of the image we want as an input. Change these strings to run your own example.
	//////////////////////////////////////////////////////////////////////////////////////////////////

	// Your PAT (Personal Access Token) can be found in the portal under Authentication
	const PAT = '';
	// Specify the model owner's user_id/app_id pairing,
	// since you're making inferences outside your app's scope
	const USER_ID = 'facebook';
	const APP_ID = 'nougat';
	// Change these to whatever model and image URL you want to use;
	// for Nougat, the image should be a document page
	const MODEL_ID = 'nougat-base';
	const MODEL_VERSION_ID = 'c7c1393511d24e008d0dde5a044b8513';
	const IMAGE_URL = 'https://samples.clarifai.com/metro-north.jpg';

	///////////////////////////////////////////////////////////////////////////////////
	// YOU DO NOT NEED TO CHANGE ANYTHING BELOW THIS LINE TO RUN THIS EXAMPLE
	///////////////////////////////////////////////////////////////////////////////////

	const raw = JSON.stringify({
	  "user_app_id": {
	    "user_id": USER_ID,
	    "app_id": APP_ID
	  },
	  "inputs": [
	    {
	      "data": {
	        "image": {
	          "url": IMAGE_URL
	        }
	      }
	    }
	  ]
	});

	const requestOptions = {
	  method: 'POST',
	  headers: {
	    'Accept': 'application/json',
	    'Authorization': 'Key ' + PAT
	  },
	  body: raw
	};

	// NOTE: MODEL_VERSION_ID is optional, you can also call prediction with the MODEL_ID only
	// https://api.clarifai.com/v2/models/{YOUR_MODEL_ID}/outputs
	// this will default to the latest version_id

	fetch("https://api.clarifai.com/v2/models/" + MODEL_ID + "/versions/" + MODEL_VERSION_ID + "/outputs", requestOptions)
	  .then(response => response.text())
	  .then(result => console.log(result))
	  .catch(error => console.log('error', error));


You can also run Nougat through Clarifai's other client libraries, such as Java, NodeJS, and PHP, or call the REST API directly with cURL.
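If no client library fits your stack, the JavaScript request is plain HTTPS and JSON. This Python sketch builds the same POST request using only the standard library. The endpoint, the `Key` authorization header, and the body layout come from the JavaScript example; the function name, the placeholder page URL, and the response path used in the `__main__` block are assumptions.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.clarifai.com/v2"


def build_nougat_request(
    pat: str,
    image_url: str,
    user_id: str = "facebook",
    app_id: str = "nougat",
    model_id: str = "nougat-base",
    version_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the JavaScript example."""
    # Without a version ID the API defaults to the model's latest version
    path = f"/models/{model_id}"
    if version_id:
        path += f"/versions/{version_id}"
    body = {
        "user_app_id": {"user_id": user_id, "app_id": app_id},
        "inputs": [{"data": {"image": {"url": image_url}}}],
    }
    return urllib.request.Request(
        API_BASE + path + "/outputs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Authorization": "Key " + pat,
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    import os

    # Hypothetical page image URL; replace with a real document page
    req = build_nougat_request(os.environ["CLARIFAI_PAT"], "https://example.com/paper-page-1.png")
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    # Assumed location of the transcription in the JSON response
    print(payload["outputs"][0]["data"]["text"]["raw"])
```

Separating request construction from sending makes the payload easy to inspect or test before any network call is made.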

Model Demo in the Clarifai Platform

Try out the Nougat model here: https://clarifai.com/facebook/nougat/models/nougat-base

Best Use Cases

Nougat is useful across a range of document understanding and information extraction tasks. Key use cases include:

Research Paper Parsing: Nougat can parse research papers from page images, extracting text, tables, and equations; figure captions are kept, but the figures themselves are not reproduced. This makes the content of research papers accessible to downstream applications.
Data Extraction: Because the model converts document images into structured text, it can pull data out of academic papers for research, analysis, and data-driven decision-making.
Summarization: Nougat can feed text summarization pipelines that extract and summarize the content of research papers automatically, saving researchers time and effort.
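As a starting point for the parsing and summarization pipelines described above, here is a small Python sketch that splits Nougat-style markup into sections and pulls out display equations. It assumes headings start with `#` and display math is wrapped in `\[ ... \]`, as in Nougat's Markdown-style output; verify these conventions against real model output. The function names are hypothetical and the sample text is made up.

```python
import re
from typing import Dict, List

# Assumed conventions: Markdown "#" headings and \[ ... \] display math
HEADING = re.compile(r"^#+[ \t]+(.*)$", re.MULTILINE)
DISPLAY_MATH = re.compile(r"\\\[(.+?)\\\]", re.DOTALL)


def split_sections(markup: str) -> Dict[str, str]:
    """Map each heading to the text that follows it, up to the next heading."""
    sections = {}
    matches = list(HEADING.finditer(markup))
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markup)
        sections[match.group(1).strip()] = markup[match.end():end].strip()
    return sections


def display_equations(markup: str) -> List[str]:
    """Return the LaTeX source of every display equation, in order."""
    return [eq.strip() for eq in DISPLAY_MATH.findall(markup)]


if __name__ == "__main__":
    sample = (
        "# Introduction\n"
        "We study a simple model.\n\n"
        "# Method\n"
        "The loss is\n\n"
        "\\[ L = \\sum_i (y_i - \\hat{y}_i)^2 \\]\n"
    )
    print(list(split_sections(sample)))  # ['Introduction', 'Method']
    print(display_equations(sample))  # ['L = \\sum_i (y_i - \\hat{y}_i)^2']
```

Each section's text can then be passed to a summarizer on its own, which keeps inputs short and preserves the paper's structure in the summary.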

Keep up to speed with AI

Follow us on X (formerly Twitter) to get the latest on LLMs
Join our Discord to talk about LLMs!
