Distil organizes incoming threats according to the types listed below.
NOTE: Use this information when reviewing the Bad Bots report to better understand the bots targeting your site.
Known Violators
Distil maintains a curated list of known content thieves and malicious bots. This is a global IP and bot-signature access control list (ACL), maintained using information gathered from all of our customers.
A subset of this list is the "Known Violator User Agents" shown under the "Bad Bots" section of Threat Analysis. These are browsers whose user agents failed low-level checks; such user agents always indicate scraping behavior. Any visitor presenting one of these Known Violator User Agents is automatically added to the global Known Violators list.
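To make the relationship between the global list and Known Violator User Agents concrete, here is a minimal sketch. It assumes the ACL can be modeled as a set of IP addresses plus a set of bot signatures, and that a Known Violator User Agent can be detected by substring match. The class, its methods, and the sample user agent strings are hypothetical illustrations, not Distil's implementation or API.

```python
class KnownViolatorsACL:
    """Hypothetical model of a global IP + bot-signature ACL."""

    def __init__(self, known_violator_user_agents):
        self.ips = set()
        self.signatures = set()
        # Assumed: lowercase substrings that always indicate scraping.
        self.violator_uas = [ua.lower() for ua in known_violator_user_agents]

    def is_known_violator(self, ip, signature):
        # A visitor is a violator if either its IP or its signature is listed.
        return ip in self.ips or signature in self.signatures

    def inspect(self, ip, signature, user_agent):
        """Return True if the visitor is a known violator.

        A visitor presenting a Known Violator User Agent is added to the
        list automatically before the lookup.
        """
        ua = user_agent.lower()
        if any(pattern in ua for pattern in self.violator_uas):
            self.ips.add(ip)
            self.signatures.add(signature)
        return self.is_known_violator(ip, signature)


# Hypothetical example user agents, for illustration only.
acl = KnownViolatorsACL(["python-requests", "scrapy"])
```

With this model, once `acl.inspect("203.0.113.7", "sig-a", "python-requests/2.31")` lists the visitor, later requests from the same IP are flagged even if they switch to an ordinary browser user agent.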
Bad User Agents
Similar to our Known Violators list, Distil also maintains a curated list of user agents known to be regularly associated with malicious and resource-hogging bots. User agents that appear on this list are flagged as "Bad User Agents" within the Distil Portal.
Aggregator User Agents
These are known user agents associated with content aggregators such as RSS feed readers and WordPress plugins. As with the Bad User Agents list, user agents that appear on this list are flagged as "Aggregator User Agents." In general, leave this setting on Monitor only if your site publishes syndicated (RSS/Atom) content.
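The two curated lists above work as lookups: a request's user agent is compared against each list and flagged with the matching category. The sketch below assumes simple case-insensitive substring matching. The list contents are placeholder strings, because the real Distil lists are not published here.

```python
# Placeholder patterns; the real curated lists are maintained by Distil.
BAD_USER_AGENTS = ["badbot", "resourcehog"]
AGGREGATOR_USER_AGENTS = ["feedfetcher", "wordpress"]


def classify_user_agent(user_agent):
    """Return the flag for a user agent, or None if it is on neither list."""
    ua = user_agent.lower()
    if any(pattern in ua for pattern in BAD_USER_AGENTS):
        return "Bad User Agent"
    if any(pattern in ua for pattern in AGGREGATOR_USER_AGENTS):
        return "Aggregator User Agent"
    return None
```

Checking the Bad User Agents list first is a design choice of this sketch: a user agent that matches both lists is treated as the more severe category.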
On each connection, Distil validates that specific request headers and request behaviors are consistent with the specifications of the reported user agent. For example, a browser that reports itself as an Internet Explorer variant but sends a Firefox- or Chrome-specific header would fail this check.
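A simplified version of this consistency check is sketched below. It assumes a request fails if it claims Internet Explorer (detected by the `MSIE` or `Trident/` tokens) but sends a Chromium client-hint header such as `Sec-CH-UA`, which Internet Explorer does not send. The `FOREIGN_HEADERS` mapping is an illustrative assumption and not Distil's actual rule set.

```python
# Illustrative assumption: headers that a given browser family never sends.
# Sec-CH-UA and Sec-CH-UA-Platform are Chromium client hints.
FOREIGN_HEADERS = {
    "internet_explorer": {"sec-ch-ua", "sec-ch-ua-platform"},
}


def reported_family(user_agent):
    """Very rough browser-family detection from the User-Agent string."""
    if "MSIE" in user_agent or "Trident/" in user_agent:
        return "internet_explorer"
    return "other"


def headers_match_user_agent(headers):
    """Return False if the request sends a header its claimed browser wouldn't."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    family = reported_family(normalized.get("user-agent", ""))
    forbidden = FOREIGN_HEADERS.get(family, set())
    return not (set(normalized) & forbidden)
```

A production check would cover many more families and behaviors. This sketch only shows the mismatch case named in the text.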
Distil allows you to block other websites from referring to content on your website, a practice often referred to as hotlinking. Many customers use this feature to prevent scrapers from linking to their content.
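Referrer blocking typically inspects the HTTP `Referer` header (spelled that way in the HTTP specification) and compares its host against a blocklist. The sketch below assumes a domain-based blocklist in which a listed domain also covers its subdomains. `BLOCKED_REFERRER_DOMAINS` and `is_blocked_referrer` are hypothetical names.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist of referring domains.
BLOCKED_REFERRER_DOMAINS = {"scraper.example"}


def is_blocked_referrer(referer_header, blocked=BLOCKED_REFERRER_DOMAINS):
    """Return True if the Referer host is a blocked domain or a subdomain of one."""
    if not referer_header:
        return False  # no Referer header: nothing to block on
    host = (urlsplit(referer_header).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in blocked)
```

Matching on `"." + domain` prevents an unrelated host such as `notscraper.example` from matching `scraper.example`.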
When trying to prevent fraud, knowing where your users are accessing your website from can be an important first step. Distil subscribes to a GeoIP data set that allows us to identify when a visitor is accessing your website via an anonymous proxy that is masking their true IP address.
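The proxy check amounts to looking up the visitor's IP address in a data set of known anonymous-proxy networks. The sketch below substitutes a hard-coded list for the GeoIP data set, using the RFC 5737 documentation range `203.0.113.0/24` as a placeholder. It shows only the lookup, not the actual data Distil subscribes to.

```python
import ipaddress

# Placeholder standing in for a commercial GeoIP anonymous-proxy data set.
ANONYMOUS_PROXY_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_anonymous_proxy(ip):
    """Return True if the IP falls inside any listed anonymous-proxy network."""
    addr = ipaddress.ip_address(ip)
    # Membership tests between IPv4 and IPv6 objects simply return False.
    return any(addr in network for network in ANONYMOUS_PROXY_NETWORKS)
```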
Rate Limiting Settings
Because Distil has deep insight into your website's traffic patterns, we can give recommendations for rate limiting settings that would otherwise be unavailable.
Pages Per Minute
The Pages Per Minute setting works as you would expect from basic rate limiting: it flags visitors who exceed the number of pages per minute entered in the configurable box. Note that if this setting is set to Monitor and a value greater than 0 is entered in the box, Distil still reports when a visitor exceeds the pages per minute threshold; it simply takes no action on those requests.
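The following sketch illustrates the Pages Per Minute behavior, including the Monitor case that reports without acting. It uses a sliding 60-second window per visitor, which is one common way to implement basic rate limiting. Distil's actual algorithm is not described here, and the action names (`"monitor"`, `"captcha"`, `"block"`) are hypothetical.

```python
from collections import defaultdict, deque


class PagesPerMinuteLimiter:
    """Sliding-window page counter; a sketch, not Distil's implementation."""

    def __init__(self, limit, action="monitor", window=60.0):
        self.limit = limit      # value entered in the configurable box
        self.action = action    # hypothetical: "monitor", "captcha", "block"
        self.window = window    # seconds
        self.hits = defaultdict(deque)

    def check(self, visitor_id, now):
        """Record a page view at `now` (seconds); return (violated, action_taken)."""
        times = self.hits[visitor_id]
        times.append(now)
        # Drop page views that have fallen out of the last `window` seconds.
        while times and times[0] <= now - self.window:
            times.popleft()
        if self.limit > 0 and len(times) > self.limit:
            # Monitor still reports the violation but takes no action.
            return True, (None if self.action == "monitor" else self.action)
        return False, None
```

In Monitor mode the fourth page view within a minute against a limit of 3 returns `(True, None)`: the violation is reported, but no action is taken.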
Pages Per Session
Pages Per Session defines the number of pages a single browser can access before having to leave the website for 30 minutes. For example, if you set Pages Per Session to "CAPTCHA" and "45", a visitor can access 45 pages on your website before being prompted to complete a CAPTCHA. If they continue to access your website after completing the CAPTCHA, they will keep receiving CAPTCHAs (approximately every 5-10 minutes) until they leave the website for 30 minutes, which resets the timer.
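The sketch below models the Pages Per Session example above. A session ends after 30 minutes of inactivity. Once a visitor exceeds the page limit, they are challenged, and after solving a CAPTCHA they are challenged again once a re-prompt interval has passed. The text gives that interval only as approximately 5-10 minutes, so the fixed 5-minute default here is an assumption, as are all class and method names.

```python
SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that end a session


class PagesPerSession:
    """Hypothetical per-session page limit with recurring CAPTCHAs."""

    def __init__(self, max_pages, recaptcha_interval=5 * 60):
        self.max_pages = max_pages
        # Assumed fixed interval; the article says roughly 5-10 minutes.
        self.recaptcha_interval = recaptcha_interval
        self.sessions = {}  # visitor_id -> session state

    def on_page_view(self, visitor_id, now):
        """Return True if this page view should be answered with a CAPTCHA."""
        s = self.sessions.get(visitor_id)
        if s is None or now - s["last_seen"] >= SESSION_TIMEOUT:
            # New visitor, or away for 30 minutes: start a fresh session.
            s = {"pages": 0, "last_seen": now, "solved_at": None}
            self.sessions[visitor_id] = s
        s["pages"] += 1
        s["last_seen"] = now
        if s["pages"] <= self.max_pages:
            return False
        # Over the limit: challenge unless a CAPTCHA was solved recently.
        return s["solved_at"] is None or now - s["solved_at"] >= self.recaptcha_interval

    def on_captcha_solved(self, visitor_id, now):
        # Assumes the visitor already has an active session.
        self.sessions[visitor_id]["solved_at"] = now
```

With `PagesPerSession(45)`, page views 1 through 45 pass, and the 46th returns `True`. After a solved CAPTCHA, the next challenge comes once the interval elapses, and a 30-minute absence resets the count.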
Session Length Exceeded
Session Length Exceeded defines the number of minutes a user can continuously access your website before having to leave for 30 minutes (the Distil session expiration time). Users who access your website continuously for longer than this are flagged as "Session Length Exceeded" violators.
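Session Length Exceeded can be modeled by tracking when each session started and resetting it after 30 minutes of inactivity, the session expiration time stated above. This is a minimal sketch under that assumption. `SessionLengthTracker` and its method are hypothetical names.

```python
SESSION_TIMEOUT = 30 * 60  # Distil session expiration time, in seconds


class SessionLengthTracker:
    """Flags visitors whose continuous session exceeds a minute limit."""

    def __init__(self, max_minutes):
        self.max_seconds = max_minutes * 60
        self.sessions = {}  # visitor_id -> (session_start, last_seen)

    def on_request(self, visitor_id, now):
        """Return True if the visitor has exceeded the allowed session length."""
        start, last = self.sessions.get(visitor_id, (now, now))
        if now - last >= SESSION_TIMEOUT:
            start = now  # away for 30 minutes: a new session begins
        self.sessions[visitor_id] = (start, now)
        return now - start > self.max_seconds
```

With a 60-minute limit, a visitor making a request every 10 minutes is flagged at the 70-minute mark. If they then stay away for 30 minutes, their next request starts a new session and is not flagged.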
Unverified Signature
The Unverified Signature threat is an experimental Distil trap type that occurs when a browser attempts to access Distil with a bot signature that has not yet been verified as valid. This behavior is often tied to bots that visit a website and attempt to randomly cycle their Distil signature to evade detection. Because this trap type is experimental, it is not currently actionable, though Distil is collecting and monitoring the information.
Unclassified User Agents
User agents that have not yet been classified into any of the above categories because they do not match any of the patterns Distil has defined as known bots. Further research is required before they can be categorized. Often these are tied to the “Reporting As” title; that is, they appear to be normal browser user agents, but Distil believes the browser may not actually be making the request.