Clarifai's platform offers predictive modeling for images, videos and text: three of the most common forms of data in the world. One of the most exciting things about a platform that can work with all three input types is how they can work together to solve complex problems. Let's take a look at our public "Visual Text Recognition" workflow and how it can help you connect the dots between text in images and encoded text.
Setting up an app to work with "Visual Text Recognition"
To begin, simply create a new application and choose "Visual Text Recognition" as your base workflow.
Workflow ID: Visual-Text-Recognition
Owner: Clarifai
Input: Image
Output: Text
Once we've created our app, let's create a custom Text Aggregation model. To do this, click the "Model Mode" tab on the left-hand side of the screen. The Text Aggregation Operator offers parameters that let you tune the width and height of the window within which words are considered part of the same line. You can adjust these parameters for optimal performance based on the type of image data you will be processing (road signs, for example, will have different visual characteristics than scanned documents). Adjust as needed, give your model a descriptive name, and click "Create new model".
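This post doesn't spell out how the Text Aggregation Operator decides which words share a line, so the following is only a rough Python sketch of the idea behind a width/height window, not Clarifai's implementation. The `Word` box format and the `max_gap_x`/`max_gap_y` parameters are hypothetical stand-ins for the operator's width and height settings: a word joins a line when its vertical center is within `max_gap_y` of that line's last word and it starts no more than `max_gap_x` to the right of it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    # Hypothetical box format: text plus pixel coordinates of its bounding box.
    text: str
    left: float
    top: float
    right: float
    bottom: float


def aggregate_lines(words: List[Word], max_gap_x: float, max_gap_y: float) -> List[str]:
    """Group recognized words into lines of text (illustrative sketch only).

    max_gap_x: largest horizontal gap allowed between neighbouring words.
    max_gap_y: largest vertical offset allowed between their box centers.
    """
    lines: List[List[Word]] = []  # each line keeps its words left-to-right
    for word in sorted(words, key=lambda w: w.left):
        center_y = (word.top + word.bottom) / 2
        for line in lines:
            last = line[-1]
            last_center_y = (last.top + last.bottom) / 2
            same_row = abs(center_y - last_center_y) <= max_gap_y
            close_enough = word.left - last.right <= max_gap_x
            if same_row and close_enough:
                line.append(word)
                break
        else:
            # No existing line is inside the window, so start a new one.
            lines.append([word])
    # Order lines top-to-bottom by the position of their first word.
    lines.sort(key=lambda line: line[0].top)
    return [" ".join(w.text for w in line) for line in lines]


words = [
    Word("STOP", 10, 10, 50, 30),
    Word("HERE", 60, 11, 100, 31),
    Word("ON", 10, 50, 30, 70),
    Word("RED", 40, 51, 70, 71),
]
print(aggregate_lines(words, max_gap_x=15, max_gap_y=5))  # → ['STOP HERE', 'ON RED']
print(aggregate_lines(words, max_gap_x=5, max_gap_y=5))   # → ['STOP', 'HERE', 'ON', 'RED']
```

The second call shows why the window matters: with a narrower horizontal window the 10-pixel gaps between words are no longer bridged, so every word ends up on its own line. Widely spaced text such as signage may need a larger window than densely printed documents.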
Next, visit the "Workflows" tab in Model Mode. Navigate to your "Visual Text Recognition" workflow and click the "Copy to New Workflow" button.
This will grab all of the underlying models in the Visual Text Recognition workflow and take you to the "Create a Workflow" page. From here, let's add our Text Aggregation model to the copied workflow.
Finally, we need to set each model's "Input Node" so that data flows through the workflow in the right order: connect the "1.0 Cropper" to the "Visual Text Detection" model, the "Visual Text Recognition" model to the "1.0 Cropper", and the "Text Aggregation" model to the "Visual Text Recognition" model. Then click "Create Workflow".
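One way to think about these connections is as a dependency graph in which each model names the node it takes input from. The Python sketch below is only a mental model, not Clarifai's workflow format: the node names are the display names from the UI used as hypothetical IDs, and the standard-library `graphlib` module (Python 3.9+) derives an order in which every model runs after its input node.

```python
from graphlib import CycleError, TopologicalSorter  # CycleError is raised on loops

# Each node maps to the set of nodes it takes input from, mirroring the
# connections described above. Names are illustrative, not API identifiers.
WORKFLOW = {
    "Visual Text Detection": set(),  # reads the input image directly
    "1.0 Cropper": {"Visual Text Detection"},
    "Visual Text Recognition": {"1.0 Cropper"},
    "Text Aggregation": {"Visual Text Recognition"},
}


def execution_order(graph):
    """Return the nodes in an order where each runs after its input nodes."""
    # An input node that isn't part of the workflow usually means a
    # mis-wired connection; graphlib would otherwise accept it silently.
    missing = {dep for deps in graph.values() for dep in deps if dep not in graph}
    if missing:
        raise ValueError(f"unknown input nodes: {sorted(missing)}")
    # static_order() raises CycleError if the connections form a loop.
    return list(TopologicalSorter(graph).static_order())


print(execution_order(WORKFLOW))
# → ['Visual Text Detection', '1.0 Cropper', 'Visual Text Recognition', 'Text Aggregation']
```

Because the four models form a single chain, only one valid order exists, and it matches the detect, crop, recognize, aggregate sequence the finished workflow performs.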
Now let's upload some images and test out the results. Upload your images through Data Mode or our API, and view them in the Explorer tab. In the right-hand sidebar, select the App Workflow tab, click the gear icon and select your new workflow.
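For the API route, the sketch below only builds the upload request with Python's standard library and does not send it. It assumes Clarifai's v2 REST API: a `POST` to `https://api.clarifai.com/v2/inputs` with an `Authorization: Key ...` header and a JSON body listing image URLs. Endpoint paths and authentication schemes have changed between API versions, so treat these as assumptions and confirm them against the current API reference; the function name `build_upload_request` is hypothetical.

```python
import json
import urllib.request
from typing import List

# Assumed v2 REST base URL; check the current Clarifai API reference.
API_BASE = "https://api.clarifai.com/v2"


def build_upload_request(api_key: str, image_urls: List[str]) -> urllib.request.Request:
    """Build (but do not send) a request that adds images to an app by URL.

    Assumes the v2 `POST /inputs` endpoint and `Authorization: Key <key>` auth.
    """
    body = {"inputs": [{"data": {"image": {"url": url}}} for url in image_urls]}
    return urllib.request.Request(
        f"{API_BASE}/inputs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_upload_request("YOUR_API_KEY", ["https://example.com/road-sign.jpg"])
# To actually upload: urllib.request.urlopen(req)
```

Building the request separately from sending it makes the payload easy to inspect before any network call; once the images are uploaded, the workflow can be selected in the Explorer sidebar as described above.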
Your new workflow now detects, crops, recognizes and aggregates your image text into encoded text.