Note: this is a fairly novice topic that’s covered at least in part elsewhere… but I figured it may be helpful, so here you go.
I don’t care about SEO for this site. I have installed plenty of SEO plugins, no doubt. But mostly to test things for other sites… and some of them make it easy to edit things like .htaccess without having to ssh in. But once in a while, I do look at numbers. This time, I was using one of those shady domain valuation tools. I wanted to show a client how useless those tools are, so I put this site in as an example. Because I’ve had this domain for almost 15 years, the tool said it’s worth over $10k. Ha! I doubt anyone on GoDaddy auction is going to offer me 1/100th of that. There seems to be only one other Jay Ratkowski out there that the internet is aware of. He doesn’t seem to have much of an online identity, and probably wants nothing to do with his own dotcom.
Anyway, in some of the stats rattled off by that tool, it said I had like 3 results indexed in Bing and -1 in Yahoo!. How do you have a negative number of pages indexed? Well, I guess that speaks to the quality of such tools.
In reality, I have 22 pages in Yahoo! and 21 in Bing. Still, not what it should be. Google has almost 700 pages indexed, which is pretty good considering how many times I’ve redone my URL structure and taken entire sections of content offline. I only have about 550 pages in my sitemaps, and the remaining pages are probably either random orphans or results of pagination, tags, etc. Regardless, I think about 400 pages is a fair mark for unique information that should be indexed. Anything extra is a bonus.
So why only 22 pages in Bing?
Well, you have to check the obvious stuff first. My robots.txt is not blocking bingbot. It’s not setting any funky crawl delay either. Crawl delays are largely used, in my opinion, by developers with really outdated logic who think that a search engine crawling more than one page every other second is going to take their servers down. If googlebot or any other reputable search engine spider hurts site performance in a meaningful way, you’ve got serious problems. Crawl delays are not the solution.
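If you’d rather not eyeball robots.txt, here’s a minimal sketch of the same check using Python’s standard-library `urllib.robotparser`. The robots.txt contents, the example.com URLs, and the `check_bot` helper are all made up for illustration; swap in your own file. Note that `site_maps()` needs Python 3.8 or later, and that Python’s parser won’t necessarily resolve conflicting Allow/Disallow rules the same way a real search engine does, so treat this as a sanity check only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /wp-admin/

Sitemap: http://www.example.com/sitemap.xml
"""


def check_bot(robots_text, bot, url):
    """Report whether `bot` may fetch `url`, plus any crawl delay and sitemaps."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return {
        "allowed": parser.can_fetch(bot, url),
        "crawl_delay": parser.crawl_delay(bot),  # None when no delay is set
        "sitemaps": parser.site_maps(),          # None when no Sitemap line exists
    }


report = check_bot(SAMPLE_ROBOTS, "bingbot", "http://www.example.com/some-post/")
print(report)
```

For the sample file above, this should report the URL as allowed, no crawl delay, and the one sitemap.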
Anyway, I’m on Apache, so I checked .htaccess as well. Not blocking Bing there either.
I double-checked a few other common stumbling points:
- Meta Robots: I leave it off, which means index, follow.
- Errors & redirects: I have very few, if any.
- Canonicals: In almost all instances, URLs redirect/rewrite to the canonical version. So Bing shouldn’t be seeing much duplication.
- Log files: Bing doesn’t visit my site much at all.
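For the log file check, a rough sketch like the one below counts what bingbot actually requests each day. It assumes Apache’s combined log format; the sample lines (IPs, dates, paths) are invented for illustration, and `bingbot_hits` is just a name I picked. In practice you’d feed it the lines of your own access log.

```python
import re
from collections import Counter

# Matches Apache "combined" log lines; adjust if your LogFormat differs.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?:\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def bingbot_hits(lines):
    """Count bingbot requests per (day, path)."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and "bingbot" in m.group("agent").lower():
            hits[(m.group("day"), m.group("path"))] += 1
    return hits


# Hypothetical log lines, for illustration only.
SAMPLE_LOG = [
    '157.55.39.1 - - [10/Oct/2012:06:25:24 -0500] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '66.249.66.1 - - [10/Oct/2012:06:26:01 -0500] "GET /some-post/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '157.55.39.1 - - [11/Oct/2012:07:02:10 -0500] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
]

for (day, path), count in sorted(bingbot_hits(SAMPLE_LOG).items()):
    print(day, path, count)
```

If the only path that ever shows up is /robots.txt, the bot is stopping at the front door. User-agent strings can be spoofed, so this is a quick look rather than proof.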
Next, I wanted to see what else I could do to get Bing to pay a little more attention, especially since server logs indicate that most days Bing gets to my robots.txt file and gives up.
So I made two changes as a test.
- Added my sitemap location to robots.txt (if you don’t know, the format is Sitemap: http://www.yourdomain/yoursitemap.xml) and waited a week.
- Added the line:
Again waited a week.
The first change yielded no results. That’s logical, as Bing already knows my sitemap exists, but I was still curious whether they’d view it more often once it was in the robots.txt. Ditto for the second change. For reference, I gave bots access to everything first in robots.txt, and after that gave directives to stay out of specific areas.
Next I went to look at my stats in Bing Webmaster Tools.
I thought I might never have submitted my sitemap there. At least I hadn’t via my current Bing account… but it looks like I must have at some point, because after verifying, it was in there. Anyway, that’s kind of a good sign, as Bing seems generally clueless about sites unless you go through the trouble of letting them know.
Bing is saying they’ve indexed 374 pages. So why is bing.com only returning 22 of them? (Side note: at this point, if you search my site in Bing/Yahoo and go to page 2, the results expand and you actually get 40 pages. Still.)
There’s no commonality among the pages that actually appear in results. Some have backlinks, some don’t. Some are blog posts, some are category/tag archives. Some are new, some are old.
Where it gets interesting is when you modify the query with some keywords…
See, the lesson here is that while in Google doing a site: search returns nearly all pages indexed (often within 5%), Bing is just full of shit. They’re seemingly returning the most relevant pages, or something similar to that. Why someone using advanced search operators would want a limited result set, I have no clue. But Bing seems to think it’s a good idea. Whether it’s intentional or just built into their model and they didn’t even think about it, it’s dumb.
Remember: don’t freak out over what Bing search tells you for indexation. Always go to their Webmaster Tools reports.
Again, Bing is just full of shit.
Regardless, if you have legitimately low indexation in Bing, I think I just outlined all the common-sense stuff to check on. Make sure you do that. Then go back to caring about Google entirely.