Re: www.google.com Searches & Royal Society of Chemistry (fwd)

15 Apr 1999

      ...
Are the RSC allowing Web-spiders into the site and are major updates
submitted to the web-search engines so as to ensure a renewed
grab by the spider? Also, do the titles of the webpages
repeat the name of the organisation as well as the topic of the
webpage to reinforce the association(?)
If not - it is not surprising that little comes up on the RSC(?).
...
From month to month - spiders can be 20 to
75% of downloads to the academic domain I look after.
Which is good - this means (in theory) the info can
be easy to find on the search engines.
   http://www.ccp14.ac.uk
On this theme, we have written a little program called "meta-hunter",
which seeks out metadata in whatever form from remote web sites.
Arguably, since meta-data is often given a weight factor 10-20 times
that of "body text", carefully selected metadata can significantly improve
the quality of search results. AltaVista is one search engine that
clearly indicates that meta-data is indexed by them.
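
As an illustration of the kind of extraction described above, here is a
minimal sketch in Python. It is not the meta-hunter program itself; the
names MetaCollector and extract_meta, and the sample page, are
hypothetical and chosen only for this example. It simply collects the
NAME/CONTENT pairs from any META tags in a page that has already been
fetched.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect (name, content) pairs from <META NAME=... CONTENT=...> tags."""

    def __init__(self):
        super().__init__()
        self.meta = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...>
        # and <meta name=...> are treated alike; attribute values keep their case.
        if tag != "meta":
            return
        fields = dict(attrs)
        name = fields.get("name")
        content = fields.get("content")
        if name is not None and content is not None:
            self.meta.append((name, content))


def extract_meta(html_text):
    """Return the list of (name, content) pairs found in html_text."""
    parser = MetaCollector()
    parser.feed(html_text)
    parser.close()
    return parser.meta


# Hypothetical sample page, used only to exercise the function.
sample = """<HTML><HEAD><TITLE>Example</TITLE>
<META NAME="generator" CONTENT="SomeEditor 1.0">
<META NAME="DC.Type" CONTENT="chemical">
</HEAD><BODY>Body text</BODY></HTML>"""

print(extract_meta(sample))
```

Pointing something like this at a fetched page would show at a glance
whether a site carries anything beyond a "generator" entry.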

Our trawl of a number of carefully selected sites, including the RSC,
reveals that for most sites the only meta-data present is the
"generator" field, which merely tells the company that wrote the
authoring software that its product was used to prepare the site. At
most, one also gets "description" and "keywords".

We are urging people to use the Dublin Core variant of meta-data.
I would like to propose that one element of this scheme, namely

<META NAME="DC.Type" CONTENT="chemical">

be used to identify that the page is "predominantly" chemical in
its content. Since such a field can give, say, a weight of 10 or 20
to the term "chemical", any subsequent search of an index using

   X and chemical

as the search term would return only pages which have genuine
chemistry in their content. To see how Dublin Core works, go to
http://www.ukoln.ac.uk/metadata/dcdot/
If you point it at our web site, for example
http://www.ch.ic.ac.uk/ , you will see the metadata we have entered,
at least on the root document of our site.
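
To make the weighting argument concrete, here is a small sketch in
Python of how an index might treat such a field. It is an illustration
under stated assumptions, not a description of how any real search
engine works: the weight of 10, the function names term_weights and
search_and, and the two sample pages are all invented for the example.

```python
import re

META_WEIGHT = 10  # assumed weight of a metadata term relative to body text


def term_weights(body_text, meta_values):
    """Weight each word: 1 per body occurrence, META_WEIGHT per metadata occurrence."""
    weights = {}
    for word in re.findall(r"[a-z0-9]+", body_text.lower()):
        weights[word] = weights.get(word, 0) + 1
    for value in meta_values:
        for word in re.findall(r"[a-z0-9]+", value.lower()):
            weights[word] = weights.get(word, 0) + META_WEIGHT
    return weights


def search_and(index, terms, required=None):
    """Return pages containing every term, highest summed weight first.

    required optionally maps a term to the minimum weight it must reach,
    e.g. {"chemical": META_WEIGHT} keeps only pages tagged as chemical.
    """
    required = required or {}
    hits = []
    for page, weights in index.items():
        if all(weights.get(t, 0) >= required.get(t, 1) for t in terms):
            hits.append((sum(weights[t] for t in terms), page))
    return [page for score, page in sorted(hits, key=lambda h: (-h[0], h[1]))]


# Hypothetical pages: "a" carries DC.Type = chemical, "b" only mentions the word.
index = {
    "a": term_weights("benzene ring synthesis", ["chemical"]),
    "b": term_weights("a chemical-free ring cleaning tip", []),
}

print(search_and(index, ["ring", "chemical"]))
print(search_and(index, ["ring", "chemical"], {"chemical": META_WEIGHT}))
```

In this toy index the tagged page outranks the page that merely
mentions "chemical" in passing, and setting a threshold on that term
drops the untagged page altogether, which is the effect argued for
above.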

We will release the meta-hunter program in the near future (at the moment
it's a Java applet, but rather hungry in its memory requirements, which we
are working on reducing).

Finally, assuming that the chemical community agrees on at least some
common meta-data declarations, the stage would be set for genuine
chemical indices of high value content. Still, I remain cynical that
the community will ever be happy with a "single" chemistry portal
into such content.

Maybe I should also suggest that we all move to  XML. Once that is
done, the world will change again! You ain't seen nothing yet!
Watch this space for some interesting stuff in this area!!

Dr Henry Rzepa,  Dept. Chemistry,  Imperial College,  LONDON SW7 2AY;
mailto:rzepa@ic.ac.uk; Tel  (44) 171 594 5774; Fax: (44) 171 594 5804.
URL: http://www.ch.ic.ac.uk/rzepa/ 

chemweb: A list for Chemical Applications of the Internet.
To post to list:  mailto:chemweb@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/chemweb/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe chemweb
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)
