Nougat is a visual transformer model from Meta AI that converts document images, including complex math equations, into structured text, which makes it particularly useful for parsing academic papers.
You can now try out nougat-base on the Clarifai Platform and access it through the API.
Nougat is a visual transformer model developed by researchers at Meta AI that converts images of document pages into structured text. It takes an image of a document page as input and outputs text in a lightweight markup language (a Markdown variant, with math written in LaTeX).
The key advantage of Nougat is that it relies solely on the document image and does not need any OCR text as input. Working directly from the pixels lets it recover semantic structure, such as mathematical equations, that is often lost in extracted text. It was trained on millions of pages of academic papers, mainly from arXiv and PubMed Central, to learn the formatting and language patterns of research papers.
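Because Nougat emits markup rather than plain OCR text, equations survive as LaTeX that can be pulled out in a post-processing step. The sketch below assumes the output marks display math with `\[...\]` and inline math with `\(...\)`; `extract_math` is a hypothetical helper, not part of Nougat or the Clarifai SDK, and the sample text is made up for illustration.

```python
import re
from typing import Dict, List

# Assumed delimiters: \[ ... \] for display math, \( ... \) for inline math.
DISPLAY_MATH = re.compile(r"\\\[(.+?)\\\]", re.DOTALL)
INLINE_MATH = re.compile(r"\\\((.+?)\\\)", re.DOTALL)


def extract_math(markup: str) -> Dict[str, List[str]]:
    """Return the LaTeX bodies of display and inline equations, in document order."""
    return {
        "display": [m.strip() for m in DISPLAY_MATH.findall(markup)],
        "inline": [m.strip() for m in INLINE_MATH.findall(markup)],
    }


# Made-up sample in the assumed output style.
sample = r"""## Method
The loss \(L(\theta)\) is minimized:
\[ L(\theta) = -\sum_t \log p(y_t \mid y_{<t}, x) \]
"""

print(extract_math(sample))
```

Non-greedy matching (`.+?`) keeps two equations on the same line from being merged into one match.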
Nougat uses a visual transformer encoder-decoder architecture. The encoder is a Swin Transformer that turns the document image into latent embeddings, processing the image hierarchically with shifted attention windows. The decoder then generates the output text tokens autoregressively, attending to the previously generated tokens through self-attention and to the encoder outputs through cross-attention.
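The autoregressive loop on the decoder side can be sketched in plain Python. This is a toy illustration of greedy decoding, not Nougat's actual implementation: `toy_step` is a hypothetical stand-in for the real encoder-decoder stack that simply reads the next token id out of a fake encoder output, and the token ids are arbitrary.

```python
from typing import Callable, List, Sequence

# A decoder step maps (encoder output, tokens so far) -> one score per vocabulary entry.
DecoderStep = Callable[[Sequence[float], List[int]], List[float]]


def greedy_decode(step: DecoderStep, memory: Sequence[float],
                  bos_id: int, eos_id: int, max_len: int = 32) -> List[int]:
    """Generate tokens one at a time, feeding each choice back into the decoder."""
    tokens = [bos_id]
    for _ in range(max_len):
        scores = step(memory, tokens)  # conditioned on image features and the prefix
        next_id = max(range(len(scores)), key=scores.__getitem__)  # greedy argmax
        if next_id == eos_id:  # end-of-sequence: stop generating
            break
        tokens.append(next_id)
    return tokens[1:]  # drop the start token


def toy_step(memory: Sequence[float], tokens: List[int]) -> List[float]:
    """Hypothetical stand-in decoder: emits memory[i] at step i, then EOS (id 0)."""
    vocab_size = 10
    pos = len(tokens) - 1  # tokens generated so far, excluding BOS
    target = int(memory[pos]) if pos < len(memory) else 0
    scores = [0.0] * vocab_size
    scores[target] = 1.0
    return scores


print(greedy_decode(toy_step, memory=[5.0, 3.0, 7.0], bos_id=1, eos_id=0))
```

The point the sketch makes is that every step sees the full encoder output plus all tokens emitted so far, which is why generation is inherently sequential.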
You can run Nougat with Clarifai's Python SDK in just a few lines of code. To get started, sign up for Clarifai and create a Personal Access Token (PAT) by following the instructions in Clarifai's documentation.
Export your PAT as an environment variable:
export CLARIFAI_PAT={your personal access token}
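If you plan to call the API from Python, a small helper can read the exported token and build the authorization header. This is a sketch: `get_clarifai_pat` and `auth_headers` are hypothetical names, and the `Authorization: Key <PAT>` header format is the one used in Clarifai's REST examples.

```python
import os


def get_clarifai_pat(var: str = "CLARIFAI_PAT") -> str:
    """Read the exported Personal Access Token, failing with a clear message if missing."""
    pat = os.environ.get(var, "").strip()
    if not pat:
        raise RuntimeError(f"{var} is not set; export it before calling the API")
    return pat


def auth_headers(pat: str) -> dict:
    """Build request headers carrying the PAT as 'Authorization: Key <PAT>'."""
    return {"Authorization": f"Key {pat}", "Content-Type": "application/json"}
```

Failing early with an explicit error is easier to debug than a 401 response from the server.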
With the PAT exported, you can call the model from Python.
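A minimal sketch of such a call, swapping the SDK for a direct request to Clarifai's REST API using only the standard library. The user, app, and model IDs come from the model URL on this page; the endpoint path, request body, and the `outputs[0].data.text.raw` response field follow the general pattern of Clarifai's REST API but are assumptions here and should be checked against the current API reference. `build_predict_request` and `predict_text` are hypothetical helper names.

```python
import json
import os
import urllib.request

# IDs taken from the model URL https://clarifai.com/facebook/nougat/models/nougat-base
USER_ID, APP_ID, MODEL_ID = "facebook", "nougat", "nougat-base"
API_BASE = "https://api.clarifai.com/v2"  # assumed REST base URL


def build_predict_request(image_url: str, pat: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the model to transcribe one image."""
    endpoint = f"{API_BASE}/users/{USER_ID}/apps/{APP_ID}/models/{MODEL_ID}/outputs"
    body = {"inputs": [{"data": {"image": {"url": image_url}}}]}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Key {pat}", "Content-Type": "application/json"},
        method="POST",
    )


def predict_text(image_url: str) -> str:
    """Send the request; the generated markup is assumed to be at outputs[0].data.text.raw."""
    req = build_predict_request(image_url, os.environ["CLARIFAI_PAT"])
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    return payload["outputs"][0]["data"]["text"]["raw"]
```

With `CLARIFAI_PAT` exported, `print(predict_text("<URL of a page image>"))` would send one page and print the returned markup. Building the request separately from sending it makes the payload easy to inspect before any network call is made.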
You can also run it with Clarifai's JavaScript client.
Nougat is also available through Clarifai's other client libraries and interfaces, such as Java, Node.js, PHP, and cURL.
Try out the Nougat model here: https://clarifai.com/facebook/nougat/models/nougat-base
The Nougat model has a wide range of applications in document understanding and information extraction.
© 2023 Clarifai, Inc. Terms of Service Content TakedownPrivacy Policy