*Summary:* AllResearch’s attitude is that if they tell you what they’re going to do with the content they take from you, you won’t give it to them, so they have no choice but to lie. This makes sense to them. They can’t imagine why anyone would not want to give them content. I liken it to a bank robber who, when asked why he robs banks, answers incredulously “Well, if I just walked up and asked for money they wouldn’t give it to me, would they?!”

I just discovered that we had a visitor to our site that viewed 173,025 pages in 3 days. Seemed a bit high. So I looked into it and checked the IP address. A reverse DNS lookup shows that “”: owns the IP address.
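For anyone who wants to run the same kind of check, here is a minimal Python sketch of a reverse DNS lookup. It is only an illustration, not the exact tool used for this post. The function name `reverse_dns` and its `resolver` parameter are hypothetical, and the address in the usage comment is a documentation-only placeholder, not the crawler's real address.

```python
import socket

def reverse_dns(ip, resolver=socket.gethostbyaddr):
    """Return the hostname registered for an IP, or None if the lookup fails."""
    try:
        # gethostbyaddr returns (hostname, alias_list, ip_address_list)
        hostname, _aliases, _addresses = resolver(ip)
        return hostname
    except (socket.herror, socket.gaierror):
        # No PTR record, or the name service could not answer
        return None

# Usage (needs network access; 192.0.2.10 is an RFC 5737 placeholder):
#   reverse_dns("192.0.2.10")
```

The hostname that comes back is set by whoever controls the address range, so it is a strong hint about the owner rather than proof.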

Normally this isn’t a big deal. Googlebot does 10x as much traffic on the site. And there are plenty of other bots from other companies. But AllResearch’s bot does something that most bots don’t do: it lies. Every request to a web site carries a user-agent string in which the client tells the site what it is, such as Internet Explorer, Netscape, etc. AllResearch’s bot says it’s IE. In fact, it says it is different versions of IE so as to not arouse suspicion. Sometimes it says it’s IE 5 for the Mac (Mozilla/4.0 (compatible; MSIE 5.12; Mac_PowerPC)) and sometimes it says it’s IE 6 for Windows (Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)). Why is it easier to go to the trouble to alternate user-agent strings than it is to just provide one honest static string?
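If you want to look for this pattern in your own logs, the sketch below is one way to do it in Python. It assumes the server writes the common "combined" access log format; the function name `suspicious_clients` and the thresholds are made up for illustration. It flags any IP address that makes a lot of requests while claiming to be more than one browser.

```python
import re
from collections import defaultdict

# Assumed layout (combined log format):
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def suspicious_clients(lines, min_requests=1000, min_agents=2):
    """Return {ip: (request_count, sorted_agents)} for heavy clients with several user-agents."""
    hits = defaultdict(int)
    agents = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        hits[match["ip"]] += 1
        agents[match["ip"]].add(match["agent"])
    return {
        ip: (count, sorted(agents[ip]))
        for ip, count in hits.items()
        if count >= min_requests and len(agents[ip]) >= min_agents
    }

# Usage (hypothetical log path):
#   with open("access.log") as log:
#       print(suspicious_clients(log))
```

Several real people behind one proxy can also show up as one IP with different browsers, so treat a hit as a reason to look closer, not as a verdict.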

Lying about the user-agent is a violation of our Terms of Service, and they would have known that had they bothered to read it. The damage is that they are effectively blocking legitimate use of the site by real people, placing an unnecessary burden on our servers, and costing us money in bandwidth. And what are they doing with the data they retrieve? They are reselling it through their business. This is simply theft. If they want our content, they can do what every other bot we allow does: tell us the truth. The content is free, just be honest about it and everyone’s happy.

I called the company to ask them why their bot lies. I talked to one guy who sounded like he’d heard this before (“Uhh.. hold on…”) and was immediately passed on to someone else. I asked this guy the same question and his answer was that if they are honest about their user-agent string, sites won’t serve them content. Shocking! Maybe they don’t want you stealing their content! Did he ever think of that? Apparently not, he just answered “You can call it stealing if you want, we consider it a valuable service to our customers”. I love how people rationalize their actions when they know they’re wrong. And he fully admits that he’s taking content to provide it to others. I wonder if their customers know they are paying for stolen content.

Google does a fine job being honest about their user-agent. If it’s good enough for Google, isn’t it good enough for AllResearch? It is unless AllResearch doesn’t want anyone to know they’re crawling sites. Why wouldn’t they want anyone to know? Honest people don’t need to cover their tracks.

When I insisted that they be honest in their user-agent string, he answered “Don’t you have anything better to do on a Friday afternoon than complain about our bot?”. In other words, he wants me to stop complaining and get back to work creating content for AllResearch to steal.

So just like this person did when they discovered AllResearch’s bot, we now block their IP address. This will work until everyone does this to them and they change their IP address.
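A block like this can be sketched in a few lines of Python. This is an illustration under assumptions, not our actual server configuration: the names `BLOCKED_NETWORKS`, `is_blocked` and `status_for` are hypothetical, and the network shown is a documentation-only placeholder that you would replace with the range from your own logs.

```python
import ipaddress

# Placeholder range from RFC 5737; substitute the network seen in your own logs.
BLOCKED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]

def is_blocked(remote_addr, networks=BLOCKED_NETWORKS):
    """Return True if remote_addr falls inside any blocked network."""
    try:
        address = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # not a parseable IP address; leave it to other checks
    return any(address in network for network in networks)

def status_for(remote_addr):
    """Return the HTTP status to answer with: 403 if blocked, otherwise 200."""
    return 403 if is_blocked(remote_addr) else 200
```

Blocking a whole network rather than a single address buys a little time when the crawler hops between neighboring addresses, but it has the same weakness: it stops working as soon as they move to a new range.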

15 thoughts on “AllResearch is a thief”

  1. Even though I banned them by IP, they keep hitting me once an hour, on the hour. Every time, I give them a 403 Forbidden. And one hour later, there they are again. They may be obnoxious but at least they’re stupid. Makes my life easier. Meanwhile, I’m still keeping an eye on any IP that gets overzealous.

  2. I guess I’m not understanding something. When you say they are stealing content to sell do you mean recipes? If so why and how would they do that? I guess what I’m asking is what is this company trying to do? I understand your problem with bandwidth etc. I just don’t understand their idea. I don’t know what they think they are gaining besides getting to annoy someone.

  3. AllResearch takes content from sites (not just recipe sites, but in their words “tens of thousands of sites”) who don’t want to provide it to them. That’s “stealing”. Google, Yahoo, MSN, etc. all have bots that take content from sites for their search engines but their bots tell the site that they are Google, Yahoo, MSN, etc. so that content providers can choose whether or not to give it to them. AllResearch’s bot lies about who they are (they claim they’re Internet Explorer to fool sites into thinking it’s a human browsing the site) so that content providers are unable to make that decision. And they are doing that so that they can take content without your knowledge. Taking something against the owner’s will and without paying for it is the definition of “stealing”.

    They then take the content and put it behind their pay-service “wall” where they charge people to look at the content. They have several businesses that sell the content to different markets for different reasons. Taking something you acquired and charging others for it is the definition of “reselling”.

    AllResearch plays semantic games with the words “theft” and “reselling”, but that doesn’t disguise what they are doing. The fact that their bot surreptitiously pulls content from sites makes it clear that they know what they’re doing is dishonest. If they themselves believed it was honest, they wouldn’t try to cover their tracks.

    They claim they simply can’t be honest because sites won’t serve them content if they are honest. That’s the point they fail to grok… the sites should choose who gets their content and if sites don’t want AllResearch to have their content, AllResearch should respect their wishes. Otherwise, it’s theft.

  4. Thank you for explaining, I understand what they are now. They are the high tech version of the person that dresses as a repairman, comes into your home, takes your jewelry and goes and sells it.

  5. I have another question now, when you say, “They have several businesses that sell the content to different markets for different reasons”, are these businesses owned by them or innocent companies that are unaware that they are purchasing stolen content?

    Either way it seems they have a type of laundering scheme for the stolen content so that the end purchaser has no idea that they are purchasing stolen property. Is that a correct assumption?

  6. As far as I know, there are only two companies: AllResearch and is “an AllResearch Inc company” and as near as I can tell, and given the name, this copies content from web sites and serves it up to paying customers.

    AllResearch sells the content to different markets. From their web site, these are “competitive market studies”, “TrademarkTracker”, “Realtime Newswire Filtering”, “Wild Public Opinion Polling”, etc. The CEO & founder, Noah Silverman, is very vague on what they do with the content, preferring to insult my technical understanding of how the web works rather than explain his business(es). And because their businesses are all fee-based, truly seeing what content they provide would require paying for their services.

  7. I’ve got three of their IP’s blocked and have been fighting the refer war for only a few days now. Sending out plenty of 403s – but not stopping anybody from coming back. I had allresearch outta the way for a while — but now they’ve come back with — and they’ve made up over 2% of my traffic in December alone – 143 visits. Your IP address matches one of mine — the other two end in .11 and .17. So — how do we stop em?

  8. Try calling them and asking them to stop crawling your site. That’s what I had to do. The owner of the company was not polite and was condescending, but they did stop.

  9. yeah — i just found noah’s email and emailed him a note last night — I’m still waiting to hear back from him. They’ve crawled through 30 times since…

  10. I’m starting to chronicle my email exchange with this company at my website. I suppose I should expect an email demanding its removal soon enough. 🙂 In the meantime, I’ve added a RewriteCond to my .htaccess:

    # Answer 403 Forbidden to every request from 38.144.36.x
    RewriteEngine On
    RewriteCond %{REMOTE_ADDR} ^38\.144\.36\.
    RewriteRule .* - [F,L]

    His bot just stops when it gets a 403 trying to access my feed URL and then it comes back in an hour. That’s better than the hammering I was getting once an hour!

  11. I see they did to you what Noah repeatedly did to me: tell you that you just don’t understand the technology because if you did, you wouldn’t complain. Other bots are successful at crawling sites without annoying the site owners but AllResearch is the only one who understands technology. The saving grace is that with management and technical know-how like Noah, they aren’t going to be in business long.

  12. Well I see you’re having the same trouble as I’m having. Well I’ve put something in place which is similar to what Larry has and it redirects all traffic from their IP addresses to my special ‘Bog Off’ page here. It is quite handy because here you’ve warned them to cease and desist or you’ll report them to law enforcement. 🙂

  13. OK, I have a solution of sorts to referrer spam – I bounce it back to them. I have almost eliminated it, except for one from h*hproftclub who refuses to go away. To be really EVIL, I’ve redirected that spammer’s output alone to the signup page at Probably not entirely ethical but then we’re dealing with two unethical orgs – why not play them off against each other? (Yeah I know, two wrongs…)

    It does mean more crap in my log file (the way I have it set up) but it’s fun. Now, my log file sees an attempt to access a page on my site, which is redirected. However, because of the way it is redirected the bot still thinks it is on MY site – so I also get to see the cgi request. For example: - - [04/Mar/2005:06:19:37 -0600] "GET /cgi-bin/signup.cgi?cname=web+poker&caddress=+%3Ch1%3EYou+can+a...

    So now webclippings are being auto-spammed by a spammer! We hate them both – now they can hate each other. And I think I am effectively hidden from this loop, thanks to how these redirects work. It just looks like h**hprofit is spamming them direct!

    Ha ha ha! Death to the spammer…

