Legacy Pages Out There?
dthomsen8
07-31-2007, 08:19 AM
Does anyone here have any idea how many legacy pages are out there in general? How many HTML 2.0 and 3.2 pages are still out there on the web? Surely someone on the web has attempted to collect statistics.
Google has a big study on web page code (http://code.google.com/webstats/index.html), with a great deal of detail. I have not read much of it yet, but I will.
I found the Google study through this page (http://triin.net/2006/06/12/HTML), a smaller study that has interesting statistics of its own.
My own perception is that there is a great deal of HTML 4.01 out there which has never been upgraded. Also, many web sites do not specify a DOCTYPE. Of those that do, only a few validate with the www.w3c.org (http://www.w3c.org) validation service. The only large web site that I have found that validates as XHTML is the W3C site itself. Doubtless there are others.
iamback
08-02-2007, 07:45 AM
"Google has a big study on web page code (http://code.google.com/webstats/index.html), with a great deal of detail. I have not read much of it yet, but I will.
I found this study from here. (http://triin.net/2006/06/12/HTML) This smaller study does have interesting statistics."
Interesting stuff in both. Google drops a few balls (not surprising, since apparently they've only recently become aware that there are such things as standards!). The biggest blooper I found (though I didn't read all the pages):
"...revisit-after, supposedly used to tell search engines how often to recrawl the page. To our knowledge only one search engine has ever supported it, and that search engine was never widely used — at this point, it is nothing more than a good luck charm. A remarkably widely used one. More pages use the completely worthless <meta name="revisit-after"> than use the <em> element!"
Google seems to be unaware that there are such things as local search engines, which would be perfectly happy to be told not to revisit a page for a while! If your search engine's indexer evaluates this directive, it's far from "completely worthless", and if public search engines ignore it, so what? It still serves its function.
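To make the local-engine point concrete: a site's own indexer can simply read the tag and schedule its next visit from it. The Python below is a minimal sketch under assumptions, not how ht://Dig or any other real engine handles the tag. The accepted value format ("7 days", "2 weeks", "1 month"), the one-day fallback, the 30-day month and the names `revisit_interval` and `next_crawl` are all hypothetical.

```python
import re
from datetime import datetime, timedelta
from html.parser import HTMLParser

# Hypothetical fallback when a page gives no usable revisit-after hint.
DEFAULT_INTERVAL = timedelta(days=1)

# Assumed value format, e.g. "7 days" or "2 weeks"; real-world values vary.
_VALUE_RE = re.compile(r"^\s*(\d+)\s*(day|week|month)s?\s*$", re.IGNORECASE)
_UNIT_DAYS = {"day": 1, "week": 7, "month": 30}  # a month approximated as 30 days


class RevisitAfterParser(HTMLParser):
    """Records the content of the first <meta name="revisit-after" content="...">."""

    def __init__(self):
        super().__init__()
        self.value = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta" or self.value is not None:
            return
        attributes = {name: (val or "") for name, val in attrs}
        if attributes.get("name", "").lower() == "revisit-after":
            self.value = attributes.get("content", "")


def revisit_interval(html):
    """Return how long to wait before recrawling, honoring revisit-after if present."""
    parser = RevisitAfterParser()
    parser.feed(html)
    parser.close()
    if parser.value:
        match = _VALUE_RE.match(parser.value)
        if match:
            count, unit = int(match.group(1)), match.group(2).lower()
            return timedelta(days=count * _UNIT_DAYS[unit])
    return DEFAULT_INTERVAL


def next_crawl(html, last_crawled):
    """Schedule the next visit relative to the last crawl time."""
    return last_crawled + revisit_interval(html)


page = '<html><head><meta name="revisit-after" content="7 days"></head><body></body></html>'
print(next_crawl(page, datetime(2007, 8, 2)))  # 2007-08-09 00:00:00
```

A public engine that ignores the hint loses nothing, while a site's own indexer can use it to avoid recrawling pages the owner knows rarely change.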
"My own perception is that there is a great deal of HTML 4.01 out there which has never been upgraded. Also, many web sites do not specify a DOCTYPE."
My bet is that many of those that don't specify a DOCTYPE are actually (sort of) HTML 3.2, written when that was effectively the only game in town but no one cared much about standards.
"Of those that do, only a few validate with the www.w3c.org (http://www.w3c.org) validation service."
And for pages that don't specify a DOCTYPE, the validator assumes HTML 4.01, so they probably won't validate anyway if they were actually written as HTML 3.2 (or as something intended to be sort of that).
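Counting how many pages are HTML 2.0, 3.2 and so on starts with sorting them by what they declare. Below is a rough Python sketch, assuming the pages are already fetched as strings and that substring matching on the DOCTYPE's public identifier is good enough; it ignores DOCTYPEs hidden in comments, unusual identifiers, and whether a page actually conforms to what it declares. `classify` and `tally` are hypothetical names. Pages with no DOCTYPE get their own bucket, since their intended version can't be read off a declaration.

```python
import re
from collections import Counter

# Checked in order, most specific first, so "HTML 4.01" is not counted as "HTML 4.0".
# The "DTD " prefix used below keeps "XHTML 1.0" from matching an "HTML ..." entry.
_VERSIONS = ["XHTML 1.1", "XHTML 1.0", "HTML 4.01", "HTML 4.0", "HTML 3.2", "HTML 2.0"]

_DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+html\b([^>]*)>", re.IGNORECASE)
_PUBLIC_ID_RE = re.compile(r'PUBLIC\s+"([^"]*)"', re.IGNORECASE)


def classify(html):
    """Label a page by the HTML version its DOCTYPE public identifier names."""
    doctype = _DOCTYPE_RE.search(html)
    if doctype is None:
        return "no DOCTYPE"
    public_id = _PUBLIC_ID_RE.search(doctype.group(1))
    if public_id is None:
        return "other"  # e.g. a DOCTYPE without a PUBLIC identifier
    ident = public_id.group(1).upper()
    for version in _VERSIONS:
        if "DTD " + version.upper() in ident:
            return version
    return "other"


def tally(pages):
    """Count pages per declared version."""
    return Counter(classify(page) for page in pages)


pages = [
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"><html>...</html>',
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
    '"http://www.w3.org/TR/html4/loose.dtd"><html>...</html>',
    "<html><body>No declaration at all</body></html>",
]
print(tally(pages))
```

Only the declaration is inspected here; telling whether a "no DOCTYPE" page is really HTML 3.2 would need a further step, such as validating it against each candidate DTD.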
dthomsen8
08-02-2007, 05:28 PM
What do you mean by "local search engines"? Do you mean those for a limited geographical area, or those run by local ISPs, or what?
iamback
08-03-2007, 10:40 AM
"What do you mean by "local search engines"? Do you mean those for a limited geographical area, or those run by local ISPs, or what?"
Those running on a site (server) itself! Or on a server, for the collection of sites hosted on it. Or within a company, for all of that company's websites, internal or external. And so on. In general: search engines operated by the owners of websites, usually (but not necessarily) running on the servers where those websites are hosted.
ht://Dig (http://htdig.org/) (an OSS engine I once made some contributions to) is one example, but there are many more!
CarlSeiler
08-18-2007, 06:01 AM
"My own perception is that there is a great deal of HTML 4.01 out there which has never been upgraded. Also, many web sites do not specify a DOCTYPE. Of those that do, only a few validate with the www.w3c.org (http://www.w3c.org) validation service. The only large web site that I have found that validates as XHTML is the W3C site itself. Doubtless there are others."
IBM's web site (http://www.ibm.com/) has validated for a number of years. It currently uses XHTML 1.0 Strict.