Are the RSC allowing web spiders into the site, and are major updates submitted to the web search engines so as to ensure a renewed grab by the spider? Also, do the titles of the webpages repeat the name of the organisation as well as the topic of the webpage, to enforce the association(?) If not, it is not surprising that little comes up on the RSC(?).
From month to month, spiders can account for 20 to 75% of downloads from the academic domain I look after. Which is good: this means (in theory) the info can be easy to find on the search engines. http://www.ccp14.ac.uk
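As a rough way of checking the first question above (whether a site lets spiders in at all), one can test a site's robots.txt rules against a given URL. The following is only a minimal sketch using Python's standard urllib.robotparser; the robots.txt text, the spider name "ExampleSpider" and the example.org URLs are all made up for illustration and do not describe the RSC site or any real server.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def spider_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleSpider" and the example.org URLs are assumed names.
    for url in ("http://www.example.org/index.html",
                "http://www.example.org/private/notes.html"):
        print(url, spider_allowed(EXAMPLE_ROBOTS_TXT, "ExampleSpider", url))
```

A site that disallows everything to "*" would, on this test, be invisible to any well-behaved spider, whatever its titles or meta-data say.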
On this theme, we have written a little program called "meta-hunter", which seeks out meta-data in whatever form from remote web sites. Arguably, since meta-data is often given a weight factor of 10-20 times that of "body text", carefully selected meta-data can significantly improve the quality of search results. AltaVista is one search engine that clearly indicates that meta-data is indexed by them.

Our trawl of a number of carefully selected sites, including the RSC, reveals that predominantly the only meta-data most sites have is the "generator" field, which tells the company that wrote the authoring software used to prepare the site that their product was used. At most, what one also gets is "description" and "keywords".

We are urging people to use the Dublin Core variant of meta-data. I would like to propose that one element of this scheme, namely <META NAME="DC.Type" CONTENT="chemical">, be used to identify that the page is "predominantly" chemical in its content. Since such a field can give a weight of, say, 10 or 20 to the term "chemical", any subsequent search of an index using "X and chemical" as the search term would return only pages which have genuine chemistry in their content.

To see how Dublin Core works, go to http://www.ukoln.ac.uk/metadata/dcdot/ If you point it at our web site, for example http://www.ch.ic.ac.uk/, you will see the meta-data we have entered, at least on the root document of our site. We will release the meta-hunter program in the near future (at the moment it's a Java applet, but rather hungry in its memory requirements, which we are working on reducing).

Finally, assuming that the chemical community agrees on at least some common meta-data declarations, the stage would be set for genuine chemical indices of high-value content. Still, I remain cynical that the community will ever be happy with a "single" chemistry portal into such content. Maybe I should also suggest that we all move to XML. Once that is done, the world will change again!
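For anyone wanting to try the idea before meta-hunter is released, the kind of meta-data harvest described above can be sketched in a few lines. This is not the meta-hunter program itself (which is a Java applet); it is only an illustrative Python sketch using the standard html.parser module. The class and function names (MetaCollector, collect_meta, is_chemical) and the sample page are assumptions invented for this example, and the test for "chemical" simply follows the DC.Type proposal made above.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect NAME/CONTENT pairs from the META tags of an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}  # lower-cased field name -> list of content strings

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        fields = dict(attrs)
        name = fields.get("name")
        content = fields.get("content")
        if name and content is not None:
            self.meta.setdefault(name.lower(), []).append(content)


def collect_meta(html_text):
    """Return a dict of all named meta-data fields found in html_text."""
    collector = MetaCollector()
    collector.feed(html_text)
    collector.close()
    return collector.meta


def is_chemical(meta):
    """True if the page declares DC.Type = chemical, as proposed above."""
    return any(value.strip().lower() == "chemical"
               for value in meta.get("dc.type", []))


# Hypothetical page, for illustration only.
SAMPLE_PAGE = """<HTML><HEAD>
<TITLE>Hypothetical sample page</TITLE>
<META NAME="generator" CONTENT="SomeAuthoringTool">
<META NAME="DC.Type" CONTENT="chemical">
</HEAD><BODY>Body text goes here.</BODY></HTML>"""

if __name__ == "__main__":
    found = collect_meta(SAMPLE_PAGE)
    print(found)
    print("chemical page:", is_chemical(found))
```

An indexer built along these lines could apply whatever weight it likes to the harvested fields; the weighting itself is left out here, since it depends entirely on the search engine.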
You ain't seen nothing yet! Watch this space for some interesting stuff in this area!!

Dr Henry Rzepa, Dept. Chemistry, Imperial College, LONDON SW7 2AY;
mailto:rzepa@ic.ac.uk; Tel (44) 171 594 5774; Fax: (44) 171 594 5804.
URL: http://www.ch.ic.ac.uk/rzepa/

chemweb: A list for Chemical Applications of the Internet.
To post to list: mailto:chemweb@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/chemweb/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe chemweb
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)