Last December, we announced Custom Training, a feature that lets Clarifai users build their own custom-trained models to predict against. Now, we’re excited to build on that functionality with a new tool called Model Evaluation, which gives you the ability to test your model’s performance before using it in a production environment.
Since we launched Custom Training, we’ve seen a ton of interesting use cases and custom visual recognition models built on our platform, from flower recognition models to shoe recognition models, and even muppet recognition models! But the number one question we’ve gotten from users is, “How do I make my custom model more accurate?” Well, we’ve been hard at work coming up with a solution, and we’re proud to introduce the Model Evaluation tool for Custom Training. Here’s how it works!
How it works
Model Evaluation performs a 5-split cross-validation on the data used to train your custom model. You might be wondering what that means. Here’s a graphic to visually showcase the process.
We take all the training data you’ve given us for your custom-trained model and split it into 5 parts. Next, we set aside 1 part as a test set and use the remaining 80% of the data to train a new model.
Once that model is created, we use it to make predictions against the test set and compare those predictions to the actual labels you gave those inputs. We then repeat this process until every part has served as the test set once.
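If it helps to see the idea in code, the sketch below walks through the same 5-split mechanics using scikit-learn’s KFold. It’s only a conceptual illustration of what happens behind the scenes (the real evaluation runs on Clarifai’s servers), and the file names and labels are placeholders.

```python
# Conceptual sketch of the 5-split evaluation; not Clarifai's actual implementation.
from sklearn.model_selection import KFold

# Placeholder inputs and labels standing in for your labeled training data.
inputs = [f"garage_{i}.jpg" for i in range(50)]
labels = ["open" if i % 2 == 0 else "closed" for i in range(50)]

kfold = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(kfold.split(inputs), start=1):
    train_set = [inputs[i] for i in train_idx]  # ~80% used to train a new model
    test_set = [inputs[i] for i in test_idx]    # ~20% held out as the test set
    # 1) train a model on train_set
    # 2) make predictions on test_set with that model
    # 3) compare the predictions to the true labels for those inputs
    print(f"Split {fold}: train on {len(train_set)} inputs, test on {len(test_set)}")
```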
Let’s try it out
What better way to introduce this feature than by creating a simple model and using Model Evaluation to make it better? Let’s begin by creating a model using our Preview UI, a visual way to build and train models. If you’re not familiar with it, you can get more info here.
The first couple of steps below will just quickly walk us through creating a model. If you already have one, feel free to skip to Step Four.
Step One: Sign up with Clarifai and create an application
If you haven’t already done so, you can quickly sign up here.
After the signup process, go to your dashboard and create an application.
I’ll be calling my application “modelEvaluation”. This application will house our model.
Step Two: Go to the Preview UI
Next, click on the eye icon to open up our custom model builder, the Preview UI. (Note: this step is only needed if you don’t already have your own custom-built model.)
Step Three: Custom build and train a model
For this blog post, we’re going to quickly build a custom model to run our evaluation tool against. The model I’ll be building will look at a photo of a garage door and let us know whether it’s open or closed.
1. Add Inputs (Images)
In the interface above, I’ll drag in some photos of garage doors in various open and closed positions.
2. Create Model & Concepts
After that, make sure you create a model and its concepts using the menu on the left side. In this case, I created the model “garage_eval” with the concepts “open” and “closed”.
3. Label each image and train model
Next, let’s go through each photo and label it with one of the concepts. If you don’t see the concepts you just created, make sure to click on the gear icon and select “show all concepts”. Once completed, click Train, and you now have a simple custom-trained model that will let you know whether it sees an open or closed garage door.
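If you prefer working in code, the steps above map roughly onto the Clarifai Python client. The sketch below is a minimal, hedged example assuming the v2 Python client (pip install clarifai) and an API key from your application’s dashboard; the image URLs are placeholders you’d replace with your own garage door photos.

```python
# Hedged sketch: building and training the same model with the Clarifai Python client (v2.x).
from clarifai.rest import ClarifaiApp

app = ClarifaiApp(api_key='YOUR_API_KEY')  # API key from the "modelEvaluation" application

# Add inputs, labeling each image with a concept as it is uploaded (URLs are placeholders).
app.inputs.create_image_from_url('https://example.com/garage_open_1.jpg', concepts=['open'])
app.inputs.create_image_from_url('https://example.com/garage_closed_1.jpg', concepts=['closed'])

# Create the model with its two concepts, then train it.
model = app.models.create('garage_eval', concepts=['open', 'closed'])
model = model.train()
```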
Step Four: Run Model Evaluation
If you click on the model name in the menu on the left-hand side of the screen, it will take you to a new page showing the model details. On this page, make sure to click on the Versions tab.
1. Click on the Versions tab
2. Click on the Evaluate button
This screen shows all the versions of the model you have trained. You can evaluate any version by clicking the Evaluate button.
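If you want to trigger the same evaluation programmatically, the sketch below posts to the model metrics endpoint with Python’s requests library. The endpoint path, model ID, and version ID here are assumptions for illustration, so confirm the exact route and IDs against our API docs before relying on it.

```python
# Hedged sketch: starting a model evaluation over the REST API with the requests library.
# The endpoint path and IDs are illustrative; confirm the exact route in the API docs.
import requests

API_KEY = 'YOUR_API_KEY'
MODEL_ID = 'garage_eval'
VERSION_ID = 'YOUR_MODEL_VERSION_ID'  # the model version you want to evaluate

url = f'https://api.clarifai.com/v2/models/{MODEL_ID}/versions/{VERSION_ID}/metrics'
headers = {'Authorization': f'Key {API_KEY}'}

# POST kicks off the evaluation; a later GET on the same URL fetches the results once ready.
response = requests.post(url, headers=headers)
print(response.json())
```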
Step Five: Interpret the results
Once the evaluation completes, the “Evaluate” button will turn into a “View” button. Click it to view the evaluation results, which should look similar to this:
The results are shown in 3 main parts: the Evaluation Summary Table, the Concept by Concept Matrix, and the Selection Details. The Evaluation Summary Table shows how the model performed when predicting against the test set of a single split, which is why the total number of labeled inputs in this table is around 20% of the size of the original training set you used. Feel free to adjust the threshold bar to find the right balance between your precision and recall rates.
In this example, you can quickly see from the results above that our model will need more data to give stronger predictions for open garage doors in certain instances. The model gave a low probability score for the “open” concept to a picture of a slightly open garage door; you can see this particular input in the Selection Details section. To improve the model, we would want to add more images of a partially open door labeled as “open” so the model can start to recognize those images as the “open” concept.
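To make the threshold bar more concrete, here’s a small standalone sketch of how precision and recall shift as the threshold moves. The probability scores and labels are made up purely for illustration and aren’t taken from the evaluation above.

```python
# Illustrative sketch: how the prediction threshold trades off precision and recall.
# The probabilities and labels below are made-up examples, not real evaluation output.
predictions = [
    # (predicted probability for "open", true label)
    (0.95, 'open'), (0.90, 'open'), (0.40, 'open'),      # a partially open door scores low
    (0.45, 'closed'), (0.10, 'closed'), (0.05, 'closed'),
]

for threshold in (0.3, 0.5, 0.7):
    predicted_open = [(p, label) for p, label in predictions if p >= threshold]
    true_positives = sum(1 for _, label in predicted_open if label == 'open')
    actual_open = sum(1 for _, label in predictions if label == 'open')

    precision = true_positives / len(predicted_open) if predicted_open else 0.0
    recall = true_positives / actual_open
    print(f"threshold={threshold:.1f}  precision={precision:.2f}  recall={recall:.2f}")
```

Raising the threshold makes the remaining “open” predictions more trustworthy (higher precision) but misses borderline cases like the partially open door (lower recall), which is exactly the trade-off the threshold bar lets you explore.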
For a detailed breakdown on how to interpret the results and best practices around building your custom model, check out our docs.
We want to hear from you!
If you have any other questions or thoughts on this blog post, the Model Evaluation tool, or Clarifai in general, feel free to reach out at feedback@clarifai.com! We look forward to hearing from you.