nougat-base

Nougat is a Meta AI-developed visual transformer model that converts document images, including complex math equations, into structured text, offering advancements in academic paper parsing.

Input

The maximum number of tokens to generate (max_tokens). Lower limits return results faster, but the output for a dense page may be cut off once the limit is reached.
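Text is generated one token at a time, so max_tokens directly limits how many decoder steps run for each image. Below is a minimal sketch of a greedy autoregressive loop with such a cap. The toy next_token function, EOS_ID, and the token ids are hypothetical stand-ins. This is not Nougat's decoder or Clarifai's serving code.

```python
from typing import Callable, List

EOS_ID = 2  # hypothetical end-of-sequence token id


def greedy_decode(next_token: Callable[[List[int]], int],
                  bos_id: int,
                  max_tokens: int) -> List[int]:
    """Generate tokens one at a time until EOS or until max_tokens are produced."""
    tokens = [bos_id]
    for _ in range(max_tokens):
        # One decoder step: predict the next token from the prefix so far.
        tok = next_token(tokens)
        tokens.append(tok)
        if tok == EOS_ID:
            break
    return tokens[1:]  # drop the start token


def toy_next_token(prefix: List[int]) -> int:
    """Hypothetical 'model': counts upward and emits EOS after token 9."""
    last = prefix[-1]
    return EOS_ID if last >= 9 else last + 1


print(greedy_decode(toy_next_token, bos_id=3, max_tokens=1024))  # [4, 5, 6, 7, 8, 9, 2]
print(greedy_decode(toy_next_token, bos_id=3, max_tokens=3))     # [4, 5, 6] (cut off)
```

The second call stops before reaching EOS, which is the truncation you trade for speed when you lower max_tokens.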


Notes

Introduction

Nougat is a visual transformer model that performs optical character recognition (OCR) on scientific documents and converts them into a markup language.

Nougat Model

Nougat is a visual transformer model developed by researchers at Meta AI that can convert images of document pages into structured text. It takes a rasterized image of a document page as input and outputs text in a lightweight markup language.

The key advantage of Nougat is that it relies solely on the document image and does not need any OCR output or embedded PDF text. Because of this, it can correctly recover semantic structure such as mathematical equations. It is trained on millions of pages from academic papers on arXiv and PubMed Central, which teaches it the formatting and language patterns of research papers.

Model Details

Nougat uses a visual transformer encoder-decoder architecture. The encoder, a Swin Transformer, encodes the document image into latent embeddings. It processes the image hierarchically, computing self-attention within local windows that are shifted between successive layers. The decoder, a transformer decoder based on mBART, then generates the output text tokens autoregressively and attends to the encoder outputs through cross-attention.
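To make "shifted windows" concrete, here is a minimal pure-Python sketch of the indexing alone. It shows how a Swin-style layer groups a 2D grid of patch features into non-overlapping windows. It also shows how the next layer cyclically shifts the grid by half a window before partitioning, so that some windows straddle the previous boundaries. The 4x4 grid, the window size of 2, and the function names are illustrative assumptions. The attention computed inside each window is omitted, as is the mask Swin applies to windows that contain wrapped-around patches.

```python
from typing import List

Grid = List[List[int]]


def cyclic_shift(grid: Grid, shift: int) -> Grid:
    """Roll the grid up and left by `shift` cells, wrapping around the edges."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r + shift) % h][(c + shift) % w] for c in range(w)]
            for r in range(h)]


def window_partition(grid: Grid, window: int) -> List[List[int]]:
    """Split an HxW grid into non-overlapping window x window blocks, each flattened row-major."""
    h, w = len(grid), len(grid[0])
    assert h % window == 0 and w % window == 0, "grid must tile evenly"
    windows = []
    for r0 in range(0, h, window):
        for c0 in range(0, w, window):
            windows.append([grid[r][c]
                            for r in range(r0, r0 + window)
                            for c in range(c0, c0 + window)])
    return windows


# A 4x4 grid of patch ids 0..15, partitioned into 2x2 windows.
patches = [[r * 4 + c for c in range(4)] for r in range(4)]
regular = window_partition(patches, 2)
shifted = window_partition(cyclic_shift(patches, 1), 2)

print(regular[0])  # [0, 1, 4, 5]
print(shifted[0])  # [5, 6, 9, 10]: one patch from each of the four regular windows
```

Alternating the two partitions between layers lets information flow across window borders while attention remains local and cheap.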

Nougat is trained end-to-end on pairs of page images and their markup text using the AdamW optimizer. Data augmentations such as erosion, dilation, and elastic transforms are applied to the images to improve robustness. An anti-repetition augmentation is also used during training to reduce repeated text in the output. It randomly perturbs tokens so the model learns to recover from wrong predictions.
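Here is a minimal sketch of the kind of token perturbation described above, using a simplified scheme. Each token is independently replaced, with a small probability p, by a token id drawn uniformly from the vocabulary. The names perturb_tokens, p, and vocab_size are hypothetical, and Nougat's actual sampling procedure may differ. The sketch also assumes that the perturbed copy is fed to the decoder as input while the loss is still computed against the clean sequence.

```python
import random
from typing import List


def perturb_tokens(tokens: List[int], vocab_size: int, p: float,
                   rng: random.Random) -> List[int]:
    """Return a copy of `tokens` in which each position is, with probability p,
    replaced by a uniformly random id in [0, vocab_size). The random id can
    occasionally equal the original token."""
    out = list(tokens)  # never modify the clean target sequence
    for i in range(len(out)):
        if rng.random() < p:
            out[i] = rng.randrange(vocab_size)
    return out


rng = random.Random(0)
target = list(range(20))  # hypothetical clean token ids
noisy = perturb_tokens(target, vocab_size=50_000, p=0.1, rng=rng)
changed = sum(a != b for a, b in zip(target, noisy))
print(f"{changed} of {len(target)} tokens perturbed")
```

Seeing a corrupted prefix during training discourages the model from copying its own previous mistakes, which is one way to break repetition loops.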

Running Nougat with an API

You can run the Nougat model using Clarifai’s Python SDK.

Check out the code below:

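import os
os.environ["CLARIFAI_PAT"] = "your personal access token"

from clarifai.client.model import Model

# Set this to a publicly accessible URL of the document page image
image_url = ''

# Upper bound on generated tokens; lower values return faster
inference_params = dict(max_tokens=1024)

# Model Predict
model_prediction = Model("https://clarifai.com/facebook/nougat/models/nougat-base").predict_by_url(
    image_url, "image", inference_params=inference_params
)

# The generated markup is returned as raw text on the first output
print(model_prediction.outputs[0].data.text.raw)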

You can also call the Nougat API from other Clarifai client libraries, such as Java, Node.js, and PHP, or directly with cURL.
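The response is markup rather than plain prose, so using it for data extraction usually requires a small post-processing step. Here is a minimal sketch that pulls display and inline math out of the returned text. It assumes the output follows Nougat's Mathpix-Markdown-style conventions, with display math wrapped in \[ ... \] and inline math in \( ... \). Check this assumption against real responses before relying on it. The sample string and the extract_math name are made up for illustration.

```python
import re
from typing import Dict, List

# Assumed delimiters: \[ ... \] for display math, \( ... \) for inline math.
DISPLAY_MATH = re.compile(r"\\\[(.+?)\\\]", re.DOTALL)
INLINE_MATH = re.compile(r"\\\((.+?)\\\)", re.DOTALL)


def extract_math(markup: str) -> Dict[str, List[str]]:
    """Collect the LaTeX source of display and inline equations in `markup`."""
    return {
        "display": [m.strip() for m in DISPLAY_MATH.findall(markup)],
        "inline": [m.strip() for m in INLINE_MATH.findall(markup)],
    }


# Hypothetical model output, e.g. model_prediction.outputs[0].data.text.raw
sample = (
    "## Results\n"
    "The loss \\(\\mathcal{L}\\) is minimized, where\n"
    "\\[\\mathcal{L} = -\\sum_t \\log p(y_t \\mid y_{<t}, x)\\]\n"
)
print(extract_math(sample))
```

The non-greedy (.+?) pattern stops at the first closing delimiter, and re.DOTALL lets display equations span several lines.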

Use Cases

The Nougat model has a range of applications in document understanding and information extraction. Key use cases include:

  • Research Paper Parsing: Nougat parses images of research paper pages, extracting text, tables, equations, and figure captions. This makes the content of papers that exist only as PDFs or scans machine-readable for downstream applications.
  • Data Extraction: Because the model converts document images into structured text, it is useful for extracting data from academic papers for research, analysis, and data-driven decision-making.
  • Summarization: Nougat can serve as the text-extraction step in a summarization pipeline. It supplies clean markup that a separate summarization model condenses, saving researchers time and effort.

Model Type ID: Image To Text
Input Type: image
Output Type: text
Last Updated: Oct 17, 2024
Privacy: PUBLIC