Indexing Limits – When do the spiders stop?
November 1, 2007
During a regular call, one of my clients, whose site is in the middle of a redesign, asked how large a page can really be. My rule of thumb has been that anything over 50K is probably too much and most likely won’t get fully crawled. In practice, I have seen much larger pages get crawled, and have heard the same from many other SEOs, so I wanted to see real data to support the 50K maximum rule.
I came across a great experiment on SitePoint, in which 25 pages of different sizes (from 45 KB to 4151 KB) were created, with unique, non-existent keywords inserted into each page at 10 KB intervals. These pages were clearly generated specifically for this experiment and not for human use.
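To make the setup concrete, here is a rough sketch of how a test page like that could be generated. It is not the code used in the SitePoint experiment; the function name, the filler text, and the "zqxv" keyword prefix are all made up for illustration, and 1 KB is assumed to mean 1024 bytes.

```python
def build_test_page(total_kb, interval_kb=10, prefix="zqxv"):
    """Build an HTML page of at least total_kb KB with a made-up keyword every interval_kb KB.

    Returns (page, markers), where markers maps a KB offset to the keyword placed there.
    All text is ASCII, so character counts and byte counts are the same.
    """
    filler = "<p>lorem ipsum dolor sit amet</p>\n"
    chunks = ["<html><body>\n"]
    size = len(chunks[0])
    markers = {}
    next_mark_kb = interval_kb

    while size < total_kb * 1024:
        if size >= next_mark_kb * 1024:
            # Hypothetical nonsense word that should not exist anywhere else on the web.
            word = f"{prefix}{next_mark_kb}kb"
            markers[next_mark_kb] = word
            piece = f"<p>{word}</p>\n"
            next_mark_kb += interval_kb
        else:
            piece = filler
        chunks.append(piece)
        size += len(piece)

    chunks.append("</body></html>\n")
    return "".join(chunks), markers


if __name__ == "__main__":
    page, markers = build_test_page(45)
    print(len(page), sorted(markers))
```

Once pages like this are indexed, searching for each keyword shows how deep a spider read: the keyword with the largest offset that still returns the page marks roughly where that engine stopped.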
I was pretty surprised to learn that, according to the experiment, the leading search engines differ considerably in the amount of page text they’re able to crawl. For Yahoo!, the limit is 210 KB; for Google, 520 KB; and for MSN, 1030 KB.
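As a quick illustration of how those numbers could be used during a redesign, the sketch below flags which engines might cut a page off based on its size. The limits are just the figures reported above, not official values from any search engine, and the function and variable names are my own; 1 KB is again assumed to be 1024 bytes.

```python
# Limits as reported by the experiment, in KB. These are observations from
# one test, not documented search engine limits.
REPORTED_LIMITS_KB = {"Yahoo!": 210, "Google": 520, "MSN": 1030}


def engines_likely_truncating(page_bytes):
    """Return the engines whose reported limit is smaller than the page size."""
    size_kb = page_bytes / 1024
    return [name for name, limit in REPORTED_LIMITS_KB.items() if size_kb > limit]


if __name__ == "__main__":
    # A 300 KB page is over the reported Yahoo! limit but under the other two.
    print(engines_likely_truncating(300 * 1024))
```

A page at the old 50K rule of thumb comes back with an empty list, so by these figures it sits well inside what all three engines were reported to read.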
Filed under: SEO Factors
2 Comments
1.
Zeetarian | September 23, 2008 at 2:27 am
Can you give me more detail on the case study you carried out for this purpose?
2.
admin | September 23, 2008 at 1:14 pm
Sure – here’s the link to the page about the experiment I was referring to: http://www.sitepoint.com/article/indexing-limits-where-bots-stop/ The article is 2 years old now, and I think the bots are more sophisticated today and can probably take on as much of the page or more. I think the biggest issue to consider should be page download time rather than how much of the page the spiders can index, which is especially true for visitors who still have a dial-up connection.