By Jackie Glade
•
28 March 2026
As you may know, we take anti-bot measures very seriously at Glade Art; helping our fellow users learn their craft is one of our top priorities. We also enjoy trolling bots by trapping them in an endless maze of useless data. Such traps are commonly known as “honeypots” or “digital tar pits”. After 6.8 million requests over the last 55 days at the time of writing, we have some substantial data, so get ready and let’s share it with you. : )

> 1. Quick explanation.

For starters, these bots do not follow robots.txt. That is to be expected of unethical companies, but it doesn’t make it any better. (A robots.txt file is a plain text file placed on a website that contains rules for where bots are and are not allowed to go. Good bots like search engine crawlers follow these rules, while bad bots do not.) To avoid trapping good bots, we’ve set our robots.txt to disallow all bots from visiting this site’s tar pit.

> 2. Pages and content.

The 2 traps with the most bot activity on this site are:

GladArt (DOT) com (SLASH) data-export (over 6.8 million requests in the last 55 days).
GladArt(dot)com(slash)grow (over 84k requests in the last 35 days).

(Note: Use a VPN on these pages if you don’t want your IP shown in the logs, but among millions of others it won’t matter anyway.)

As you can see by visiting the pages, GRO produces more book-like text, while Data Export’s text is, well, whatever that is. Data Export is much more successful than GRO. It seems safe to assume that these companies are looking for number-rich data for better facts and content. Fake personal information like emails or phone numbers is also very attractive for scraping.

> 3. Features of these bots.

Most of the time, the IPs of these bots do not come from datacenters or VPNs; the overwhelming majority come from residential and mobile networks. Almost all of them are located in Asian countries, Indonesia included.
By taking advantage of cheap computers in such countries and using residential IPs, they can appear to many websites as purely human traffic while scraping at a large scale. However, there is some good news: these bots do not execute JavaScript, at least not when scraping random sites across the web. Just imagine the compute cost if they had to run headless browsers while scraping millions of sites every hour! This makes proof-of-work (PoW) challenges extremely effective against them. Traffic that looks this human turns out to come from bots, which raises the question: “How much of Internet traffic comes from bots?”

> 4. How much traffic on the Internet comes from bots?

A 2024 report says that about 51% of the traffic on the internet comes from bots. That sounds like a lot, and it is, but the real figure is probably much worse. Such estimates depend heavily on where the IP addresses originate: whether they come from a datacenter or not. As our data shows, a very high proportion of bots do not come from datacenters at all. They can certainly be configured to execute JavaScript on high-value sites, and many sites, such as Wikipedia and old Reddit, don’t require JS at all. With this in mind, it would not be unreasonable to assume that bot traffic on the Internet is even higher, perhaps over 70%.

> 5. Some experiments on these bots.

Of course we did some experiments on these bots. Quick fact: Anubis is a program that puts a proof-of-work challenge in front of a website, which a visitor’s browser must solve before the site can be reached. We enabled Anubis on the tar pit at difficulty 1 (the lowest setting) while requests were coming in 24/7. Before it was enabled, the tar pit was receiving several hundred thousand requests every day. Once Anubis became active, that dropped to about 11 requests after 24 hours, most of which came from curious humans. Was it a coincidence? No, it was not. We repeated the test on several other occasions, with very similar results.
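For readers who have not met proof of work before, here is a minimal Python sketch of the general idea. It is a generic hash-based scheme written for illustration, not Anubis's actual implementation; the function names, the challenge string, and the "leading zero hex digits" rule are all assumptions made for this example.

```python
import hashlib
from itertools import count


def solve(challenge: str, difficulty: int) -> int:
    """Client side: try nonces until the SHA-256 digest of
    challenge + nonce starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


if __name__ == "__main__":
    # Hypothetical challenge string; a real server would issue a random one.
    nonce = solve("example-challenge", difficulty=1)
    print(nonce, verify("example-challenge", nonce, difficulty=1))
```

The asymmetry is the point: the client has to loop, while the server checks with a single hash. A client that never runs the challenge code at all, which is what the bots above do, never gets past even the lowest difficulty.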
These results confirm that bots don’t like PoW challenges, even extremely easy ones. Only the few bots that execute JS will solve a challenge; the search engine crawler Googlebot is one example.

> 6. Whose bots are these?

These bots are almost certainly scraping data for AI training; typical bad actors don’t have the money to throw millions of unique IPs at a page. They probably belong to many different companies. Perhaps they sell their scraped data to AI companies, or they are AI companies themselves. We can’t tell for sure, but we can guess, because there aren’t that many big AI corporations out there.

> 7. How can you protect your sites from these bots?

If you have a large number of pages on your site, these bots can noticeably increase your server’s resource usage when they crawl everything. The best choices in this case would be Cloudflare or Anubis. Alternatively, you can add a simple JS requirement at your web server, for example Nginx (this will not be as effective, but will be enough for most sites). We also recommend adding hCaptcha to sign-up and similar forms. Overall, a correctly configured Anubis eliminates almost all bot traffic on your site.

> 8. Server resource usage.

Our server usage for the tar pit endpoint is quite low. For example, when Data Export was hitting the global rate limit of 1000 requests per minute, the server’s CPU usage (i5 4460) was no higher than at idle. RAM usage was also very low, less than 500 MB. And since only text was being sent, upload traffic did not exceed 700 KiB/s.

> 9. Fun facts.

On average, the Data Export tar pit generates 9000 characters per request. Doing the math, 6.8 million loads × 9000 characters comes to ~61 billion characters, or roughly 120,000 novels’ worth of text generated and sent since January 29, 2026.

> 10. Download a log file.
Here’s a huge log file for some activity in the Data Export tar pit:
https://mega.nz/file/69Rh3IpS#ThlagHz8e58jLvU-vWn9U9m9T_WegL4SE0H2mhZRcZY

Caution: This file decompresses to about 1.1 GB. Standard text editors will have difficulty opening it.

Note: This file contains logs from January 29 to March 22, 2026.
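Since a file this size is awkward to open in an editor, one option is to stream it line by line instead. The sketch below is a guess at how that could look in Python; it assumes the file has already been decompressed to plain text and that each line begins with the client IP as its first whitespace-separated field, which may not match the actual log format. The function name and the file name `tarpit.log` are placeholders.

```python
from collections import Counter
from typing import Iterable


def top_clients(lines: Iterable[str], n: int = 10) -> list[tuple[str, int]]:
    """Count requests per client while reading one line at a time,
    so the whole log never has to fit in memory."""
    counts = Counter()
    for line in lines:
        fields = line.split(maxsplit=1)
        if fields:  # skip blank lines
            counts[fields[0]] += 1  # assumption: first field is the client IP
    return counts.most_common(n)


# Hypothetical usage; "tarpit.log" is a placeholder file name:
# with open("tarpit.log", encoding="utf-8", errors="replace") as log:
#     for ip, hits in top_clients(log):
#         print(ip, hits)
```

Passing the open file object straight in works because Python reads text files lazily, one line per iteration.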
[This is for educational purposes only].

<> Outro.

From this information we can see how bad the bot situation on the internet is right now. However, look on the positive side: trolling bots is fun! We also recommend you add your own tar pit to your site; the more, the better. Just be sure to disallow it in your robots.txt so good bots don’t get stuck. Bad bots often visit a page precisely because robots.txt disallows it. Thank you for reading! : )

<>
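As a postscript to the robots.txt advice in the outro, here is a small Python sketch for checking that a tar pit path really is disallowed before you deploy it. It uses the standard library's `urllib.robotparser`; the rules, the `/tarpit/` path, and the `ExampleBot` user agent are made up for illustration and are not this site's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; "/tarpit/" is a placeholder path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tarpit/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(path: str, agent: str = "ExampleBot") -> bool:
    """Return True if a rule-following bot would be allowed to fetch `path`."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    print(may_fetch("/tarpit/page-1"))  # disallowed for well-behaved bots
    print(may_fetch("/blog/"))          # no rule matches, so allowed
```

A well-behaved crawler runs the same kind of check and stays out; a bot that ignores the file walks straight into the trap.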