While the recent explosion of AI has brought genuine excitement, this new technological frontier also raises security concerns that both businesses and individuals should be aware of. In this article, we focus on one such concern: the risk that AI bots may crawl your site and collect your data for AI training without your consent.
To combat this, Higher Logic Vanilla (Vanilla) provides a built-in security feature, called AI Bot Shield, that prevents known AI bots and crawlers from harvesting your community data. It works by identifying requests that come from these crawlers and blocking them, denying them access to your data and your users' data.
📝 NOTE: An AI crawler isn't necessarily a malicious entity, and many of them appear to crawl sites in a similar way to search engines. Unlike search engines, though, these bots take data from your site to train their artificial intelligence models. This data is not attributed to your site or your users, and it is taken without consent.
What about search engines?
AI Bot Shield will not affect your ranking in search engines. Crawlers for Google Search, Bing, Yahoo, Baidu, and others are still allowed.
Some companies, like Google, have AI crawlers and search crawlers. In these cases, only the AI crawlers are blocked when AI Bot Shield is turned on.
What bots are blocked?
Currently, the following crawlers are blocked:
| User-Agent(s) | Company / notes |
|---|---|
| ChatGPT-User, GPTBot | OpenAI |
| cohere-ai | Cohere |
| anthropic-ai | Anthropic |
| Bytespider | ByteDance / TikTok |
| CCBot | Common Crawl (used by many bots) |
| FacebookBot | Meta (used for training AI speech recognition) |
| Google-Extended | Google (Bard / Vertex / Gemini) |
| omgili | webz.io (data resold to other companies) |
Additionally, AI bots listed in Cloudflare Radar's verified bots directory are blocked.
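To illustrate the general idea of identifying crawlers by their User-Agent, here is a minimal sketch that checks a request's User-Agent header against the tokens in the table above. This is a hypothetical example, not Vanilla's actual implementation: the function names `is_ai_crawler` and `handle_request`, the case-insensitive substring match, and the 403 response code are all assumptions made for illustration, and the sketch does not model the Cloudflare verified bots check.

```python
from typing import Optional

# User-Agent tokens copied from the table of blocked crawlers (illustrative only).
AI_CRAWLER_TOKENS = (
    "ChatGPT-User",
    "GPTBot",
    "cohere-ai",
    "anthropic-ai",
    "Bytespider",
    "CCBot",
    "FacebookBot",
    "Google-Extended",
    "omgili",
)


def is_ai_crawler(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent contains a known AI crawler token.

    Assumption: a simple case-insensitive substring match. A real shield
    may rely on additional signals (for example, verified bot lists).
    """
    if not user_agent:
        return False
    ua = user_agent.lower()
    return any(token.lower() in ua for token in AI_CRAWLER_TOKENS)


def handle_request(user_agent: Optional[str], shield_enabled: bool) -> int:
    """Return a hypothetical HTTP status: 403 for blocked AI crawlers, else 200."""
    if shield_enabled and is_ai_crawler(user_agent):
        return 403
    return 200


if __name__ == "__main__":
    gptbot = "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"
    googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(handle_request(gptbot, shield_enabled=True))     # 403: AI crawler blocked
    print(handle_request(googlebot, shield_enabled=True))  # 200: search crawler allowed
    print(handle_request(gptbot, shield_enabled=False))    # 200: shield turned off
```

Note that the search crawler passes because none of the AI tokens appear in its User-Agent, which mirrors the behavior described under "What about search engines?".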
How do I enable this feature?
- Access the Dashboard.
- Navigate to Settings > Technical > AI Settings.
- Toggle ON the AI Bot Shield option.
📝 NOTE: For sites using our Hub / Node features, this setting can only be managed on the hub, and it applies to all sites on the hub.