#TipTuesday: Image Moderation with A.I

AndrewD HLV Staff
edited June 25 in Talk Community #1

Hey gang! It's Tuesday somewhere. I'm going to deviate a little bit from the usual quick tips for using Vanilla and instead jump into something a little more complicated (and long-winded).

Let's get an A.I to moderate images on our communities!

My goal here is to create a way to quickly flag posts with uploaded images when those images hit a certain "inappropriate" threshold. The flagged post will be sent to a human moderator to make the final call and take appropriate action.

What do you need?

The first two there are probably items everyone reading this is familiar with. The A.I-That-Can-See-Pictures part may be the one that's new to you. There are a lot of A.Is floating around the internet that are already trained for this task. The work is done and you just need to find one you like.

Image AI Options

There are a lot of available options to tap into this new power:

All of these have an API that you can tap into to build out an integration. For this build, however, I'm going to recommend that everyone cheat and use a platform called EdenAI.

Why is EdenAI cheating? It is already integrated with all the A.Is listed above (and more) and it ties really nicely into Zapier. When you use EdenAI, you can even tell it to use the APIs from some A.I platforms but not others. The work is done and I'm all about working smarter, not harder. You can do this with individual A.I APIs, but it does make things a little more complicated. For the rest of this guide, assume that EdenAI is our platform of choice.

Setup On Vanilla

We need three things configured on your Vanilla community before moving on to setting up the integration.

First, create a category for the integration to post into. In my testing, I just used the "Reported Posts" category that is created when you enable the Reporting addon, but if you'd rather have a separate "queue" category just for images, you can set that up instead.

Next, create a role for your Image Moderation Bot. This role should have standard user permissions, the ability to create Tokens and the permission to View any category that you want it to be active in. It only needs permission to Post in the Reporting category you created in the first step. We will come back to make a slight tweak to these View permissions later on.

Last, we're going to make an Image Bot user. Create a user in the Dashboard that uses the Image Moderation role you created in the second step. It's the API Token for this user that we will use when building out the Vanilla part of the integration in Zapier.

To Zapier!

Now we're kicking this off on Zapier. Here is what we're going to be building toward:

I'm going to go over each step, from the top:

Starting action: Discussion Created In Community

Add a Zapier Filter that continues the Zap as long as the post contains an Image URL. The Image URL is where image data is kept in the post body (typically uploaded to the Vanilla CDN).
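If you want to picture what that filter is checking, here is a minimal Python sketch of the same idea. It assumes the post body arrives as rendered HTML with images in ordinary img tags; the function names are made up for illustration, and in Zapier itself the trigger may already hand you an Image URL field, so you would not need any of this.

```python
from html.parser import HTMLParser


class ImageSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag in a post body."""

    def __init__(self):
        super().__init__()
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.image_urls.append(value)


def find_image_urls(post_body_html):
    """Return every image URL found in the (assumed HTML) post body."""
    collector = ImageSrcCollector()
    collector.feed(post_body_html)
    return collector.image_urls


def should_continue(post_body_html):
    """Mirror of the Zapier Filter: only continue if there is an image."""
    return len(find_image_urls(post_body_html)) > 0
```

Posts without an image stop here, which also keeps you from paying for scans you don't need.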

At this phase, we tap into EdenAI and let some magic happen over there.

Event Config: Explicit Content Detection

Action Config: Select your providers and

Quick Break: What Is Happening At This Stage?
This is where the interesting robot stuff is happening. EdenAI grabs the image from the Image URL we passed over via Zapier and runs the provider A.I models over it. These A.Is have been trained to look for certain things, and when one finds those things, it increases the score for the matching category (Violence, Hate, Adult Content etc). When the detection phase is complete, it shoots out a number that averages those together. If the number is high, then it believes the image isn't safe for work.

Here is an example of the return if I send Google's SafeSearch AI a picture of the statue of David:

{
  "google": {
    "status": "success",
    "nsfw_likelihood": 5,
    "nsfw_likelihood_score": 1,
    "items": [
      {
        "label": "Adult",
        "likelihood": 3,
        "likelihood_score": 0.6,
        "category": "Sexual",
        "subcategory": null
      },
      {
        "label": "Spoof",
        "likelihood": 1,
        "likelihood_score": 0.2,
        "category": "Other",
        "subcategory": null
      },
      {
        "label": "Medical",
        "likelihood": 2,
        "likelihood_score": 0.4,
        "category": "Content",
        "subcategory": null
      },
      {
        "label": "Violence",
        "likelihood": 2,
        "likelihood_score": 0.4,
        "category": "Violence",
        "subcategory": null
      },
      {
        "label": "Racy",
        "likelihood": 5,
        "likelihood_score": 1,
        "category": "Hate",
        "subcategory": null
      }
    ],
    "cost": 0.0015
  }
}

You can see how it breaks down the image into categories based on the training it received. The whole thing is interesting to go through, but really, we want to focus on the main nsfw_likelihood score. This is a 1-5 value, with 5 meaning that the A.I does not believe the image is safe for work at all. The nsfw_likelihood_score is a value between 0 and 1, and operates as a 'confidence' score. In the case of an image of David being uploaded here, the A.I is 100% confident that the image is not safe for work.
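For anyone who wants to poke at a return like this outside of Zapier, here is a small Python sketch that pulls out the headline score and the per-label likelihoods. The SAMPLE string is a trimmed copy of the return shown above (only two of the five items), and summarize is a made-up helper name, not something EdenAI provides.

```python
import json

# Trimmed copy of the sample return from this post (two items only).
SAMPLE = """
{
  "google": {
    "status": "success",
    "nsfw_likelihood": 5,
    "nsfw_likelihood_score": 1,
    "items": [
      {"label": "Adult", "likelihood": 3, "likelihood_score": 0.6,
       "category": "Sexual", "subcategory": null},
      {"label": "Violence", "likelihood": 2, "likelihood_score": 0.4,
       "category": "Violence", "subcategory": null}
    ],
    "cost": 0.0015
  }
}
"""


def summarize(response, provider):
    """Return the overall NSFW values plus a label -> likelihood map."""
    result = response[provider]
    labels = {item["label"]: item["likelihood"] for item in result["items"]}
    return {
        "nsfw_likelihood": result["nsfw_likelihood"],
        "nsfw_likelihood_score": result["nsfw_likelihood_score"],
        "labels": labels,
    }


summary = summarize(json.loads(SAMPLE), "google")
print(summary["nsfw_likelihood"])  # the 1-5 headline value
print(summary["labels"])           # per-label 1-5 likelihoods
```

In this sample, each likelihood_score happens to equal the likelihood divided by 5. I haven't confirmed that this holds for every provider, so treat it as an observation about this one return rather than a rule.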

It's not hard to understand why it would get this score from an A.I model - if you ignore the idea of art, the subject matter is inappropriate. However, we are human beings and have a lot more context for images like this, which is why before our Zap is completed, we introduce a Human Moderator to be the final check.

Back To Zapier!
Now we're going to add a filter so only potentially NSFW images go through the rest of our Zap.


We're telling Zapier to grab the Eden AI NSFW Likelihood value, which takes an average of the NSFW Likelihood from the providers that you're tapping into. You can make this as sensitive as you like - just remember that the likelihood score is a range from 1-5 with 1 being unlikely to be inappropriate and 5 being HIGHLY likely to be inappropriate.
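Here is a rough Python sketch of what this filter amounts to. EdenAI does the averaging for you and Zapier just compares the result, so this is only an illustration: the plain mean, the handling of failed providers and the default threshold of 4 are my assumptions, and "other_provider" is a made-up provider name.

```python
def average_nsfw_likelihood(response):
    """Plain mean of nsfw_likelihood across providers that succeeded.

    Returns None when no provider returned a usable value.
    """
    values = [
        result["nsfw_likelihood"]
        for result in response.values()
        if result.get("status") == "success" and "nsfw_likelihood" in result
    ]
    if not values:
        return None
    return sum(values) / len(values)


def passes_filter(response, threshold=4):
    """Mirror of the Zapier Filter: continue only at or above the threshold."""
    average = average_nsfw_likelihood(response)
    return average is not None and average >= threshold


# Two providers that disagree: one says 5, the other says 2.
example = {
    "google": {"status": "success", "nsfw_likelihood": 5},
    "other_provider": {"status": "success", "nsfw_likelihood": 2},
}
print(passes_filter(example, threshold=4))
print(passes_filter(example, threshold=3))
```

Lowering the threshold makes the Zap more sensitive, which means more flagged posts for your moderators to review.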

There are two ways you can progress after this filter.

Easy Way

Here, the only thing that matters is the averaged score. This doesn't rely on Zapier Paths (which require a paid subscription to the platform). Just create an HL-Vanilla action to create a new discussion:

The text of the action is up to you, but in my example, I made something like this:

This gets posted to the category we set up at the very beginning of this #TipTuesday article. Remember, this category should be one that only moderators have access to. You know what data your moderator team needs with flagged posts like this, but I highly recommend including the post URL so they can easily jump to it and take action.

The very last thing you should do is go back to your community and edit your Image Moderation role so it no longer has permission to View your moderation category (where it's sending flagged posts). You don't want to accidentally create a loop where it's flagging its own posts.
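As a rough idea of the kind of text a moderator might want to see, here is a Python sketch that assembles a title and body for the flagged-post discussion. This is not my exact wording from the Zap, and build_flag_post and its field names are made up for illustration; in Zapier you would simply type the text and drop in the mapped fields.

```python
def build_flag_post(post_title, post_url, nsfw_likelihood):
    """Build the title and body for the moderator-only discussion."""
    title = f"Flagged image: {post_title}"
    body = "\n".join(
        [
            "An image in this post was flagged as potentially NSFW.",
            f"NSFW likelihood (1-5): {nsfw_likelihood}",
            f"Post URL: {post_url}",
            "Please review and take action if needed.",
        ]
    )
    return {"title": title, "body": body}


post = build_flag_post(
    "Check out my vacation photos",
    "https://community.example.com/discussion/123",
    5,
)
print(post["title"])
print(post["body"])
```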

After this step, you're done.

Complicated Way

When I was building this out originally, something I considered was that different communities may have different priorities when it comes to flagging NSFW images. A technical support community may be fine taking a straight average for all images and working off that data for moderation, but a community with an Artistic focus may be less sensitive to images that contain Adult content and require a particularly high nsfw_likelihood return before an image like that gets flagged. Alternatively, an international news community or video game community may not care as much about Violence depicted in images, but have a very strict policy against hate symbols.

By using Paths, you can take images that pass through the last Filter we set up and change the action taken based on why that post was getting flagged.

With my Zap, images can go down different paths based on whether they were flagged due to Violence, Adult Content or Hate Content.

Each path does essentially the same thing, but the Discussion posted to the community will have different language depending on which path a flagged post goes down. This also creates an opportunity for you to configure other external triggers for specific image types - for instance, if the Adult Path gets triggered on your community, you could have it create a high-priority Zendesk ticket or ping a specific Slack channel to get eyes on the post faster.

Let's take a look at what I'm doing with the Violence Path:

I have several Continue If and Or rules set up with each path. This requires a closer look at the full return that EdenAI provides with each image it scans. There are different item labels in each NSFW category, and each gets its own score. This is an area where you can choose how sensitive you want to be to each subject. In my Zap, the Violence Path watches for:

GraphicViolenceOrGore with a likelihood over 3
or
Violence with a likelihood over 4
or
WeaponViolence with a likelihood over 4


If any post gets directed into my Violence Path and has an image that hits the thresholds set by any of those checks, it will post a discussion to the moderator section of the community indicating a Violent image was posted and it should be checked out.
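The same Or rules can be sketched in a few lines of Python. The labels and thresholds are the ones listed above; the function name and the shape of the items list (matching the sample return earlier in this post) are my assumptions. I'm reading "over 3" as strictly greater than 3, so on a 1-5 scale "over 4" only ever matches a 5.

```python
# Label -> value the likelihood must exceed (strictly greater than).
VIOLENCE_RULES = {
    "GraphicViolenceOrGore": 3,
    "Violence": 4,
    "WeaponViolence": 4,
}


def matches_violence_path(items, rules=VIOLENCE_RULES):
    """Return True if any item breaks its threshold (the Or logic)."""
    for item in items:
        minimum = rules.get(item["label"])
        if minimum is not None and item["likelihood"] > minimum:
            return True
    return False


# A Violence likelihood of 4 is not "over 4", so this one is not flagged.
print(matches_violence_path([{"label": "Violence", "likelihood": 4}]))
# GraphicViolenceOrGore at 4 is over 3, so this one is flagged.
print(matches_violence_path([{"label": "GraphicViolenceOrGore", "likelihood": 4}]))
```

Keeping the thresholds in one table like this makes it easy to see, and argue about, how sensitive each label is.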

You can find the different categories and subcategories that EdenAI sends over by looking at their API reference guide. Expand the section next to the 200 success information, and scroll to the eden-ai section.

Copying from The Easy Way: The very last thing you should do is go back to your community and edit your Image Moderation role so it no longer has permission to View your moderation category (where it's sending flagged posts). You don't want to accidentally create a loop where it's flagging its own posts.

And you're done. Publish the Zap and let A.I take some of your moderation workload away.

A.I Image Moderation: General Callouts

There are a few items I want to make sure we cover before closing out this article.

This technology is still pretty new. False positives are bound to happen. That is why I think it's very important for a human being to be the one that actually takes action on flagged posts, rather than letting the A.I decide based on the numbers it calculates.

This costs money per image scan. It is fractions of a cent, but if you have a large community, it can add up fast. While it's easy to use something like EdenAI and select all providers, it is smarter to pick a couple that you want to primarily lean on. This will help reduce costs and make it easier to learn how an A.I model works and what it's likely to respond to.

Don't forget comments. Images can be posted in comments too. I set this Zap up to trigger on Discussions only. If you copy the template and change the initial trigger to Comment Added, you can cover more bases. Be aware that this will increase costs.

Tweak As Necessary. It is very difficult to know how an A.I model is going to see each image. If you launch this on your community, don't be afraid to adjust the threshold for when an image gets flagged to make it more or less sensitive. This is only helpful to your moderators if it isn't flooding them with false positives.

Be transparent with your community regarding use of A.I. There are a lot of conversations happening around the use of A.I in the community space. If this is an integration you plan to set up, I recommend telling your users when it's going to be set up, why you're doing it and then be open to feedback. Image Moderation definitely isn't something that every community needs, but it can be a very powerful safety feature.
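To put some rough numbers on the cost callout above, here is a back-of-the-envelope Python sketch. The 0.0015 figure is the "cost" value from the sample return earlier in this post; the image counts are made up, and I'm assuming every selected provider charges once per image scanned. Check your own provider pricing before relying on any of this.

```python
def monthly_cost(images_per_month, providers, cost_per_scan=0.0015):
    """Estimated spend: one scan per image per selected provider."""
    return images_per_month * providers * cost_per_scan


# A hypothetical community with 10,000 image posts a month.
print(monthly_cost(10_000, providers=1))  # one provider
print(monthly_cost(10_000, providers=3))  # three providers
```

Under those assumptions, going from one provider to three triples the bill, which is why picking a couple of providers to lean on is the smarter move.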


That's All, Folks

That's all I have for this #TipTuesday! Let me know if you found this helpful or if you have any questions. While I'm no expert (is anyone truly an expert on A.I yet?), I have had a lot of fun playing around with A.I image moderation on my own site, and I may be able to help guide you if you're trying to do something very specific.

Have a great weekend everyone!
